This project has retired. For details please refer to its Attic page.
Lucy::Index::DataWriter – Apache Lucy Documentation

NAME

Lucy::Index::DataWriter - Write data to an index.

SYNOPSIS

# Abstract base class.

DESCRIPTION

DataWriter is an abstract base class for writing index data, generally in segment-sized chunks. Each component of an index (e.g. stored fields, lexicon, postings, deletions) is represented by a DataWriter/DataReader pair.

Components may be specified per index by subclassing Architecture (Lucy::Plan::Architecture).

CONSTRUCTORS

new

my $writer = MyDataWriter->new(
    snapshot   => $snapshot,      # required
    segment    => $segment,       # required
    polyreader => $polyreader,    # required
);

Abstract constructor.

  • snapshot - The Snapshot that will be committed at the end of the indexing session.
  • segment - The Segment in progress.
  • polyreader - A PolyReader representing all existing data in the index. (If the index is brand new, the PolyReader will have no sub-readers.)

ABSTRACT METHODS

add_segment

$data_writer->add_segment(
    reader  => $reader,     # required
    doc_map => $doc_map,    # default: undef
);

Add content from an existing segment into the one currently being written.

  • reader - The SegReader containing content to add.
  • doc_map - An array of integers mapping old document ids to new ones. Deleted documents are mapped to 0, indicating that they should be skipped.
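To make the doc_map convention concrete, here is a small pure-Perl sketch that builds such an array. It assumes that document ids start at 1 (so slot 0 is unused and 0 can safely mean “skip”) and that surviving documents are appended after a given number of documents already present in the new segment. The helper build_doc_map and its arguments are hypothetical and not part of the Lucy API; a real doc_map may also be a Lucy integer-array object rather than a plain Perl array.

```perl
use strict;
use warnings;

# Hypothetical helper: build a doc_map for an old segment holding $doc_max
# documents, some of them deleted, whose survivors are appended after
# $offset documents already written to the new segment.
# Assumes 1-based doc ids, so slot 0 is unused and 0 means "skip".
sub build_doc_map {
    my ( $doc_max, $deleted, $offset ) = @_;
    my @doc_map = (0);    # slot 0: no document has id 0
    my $new_id  = $offset;
    for my $old_id ( 1 .. $doc_max ) {
        if ( $deleted->{$old_id} ) {
            push @doc_map, 0;    # deleted: mapped to 0, to be skipped
        }
        else {
            push @doc_map, ++$new_id;    # survivor: next free id
        }
    }
    return \@doc_map;
}

# Five old documents, ids 2 and 4 deleted, 10 documents already written.
my $doc_map = build_doc_map( 5, { 2 => 1, 4 => 1 }, 10 );
print join( ', ', @$doc_map ), "\n";    # 0, 11, 0, 12, 0, 13
```

Surviving documents 1, 3 and 5 are renumbered 11, 12 and 13, while the deleted ones map to 0.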

finish

$data_writer->finish();

Complete the segment: close all streams, store metadata, etc.

format

my $int = $data_writer->format();

Every writer must specify a file format revision number, which should be incremented each time the format changes. Responsibility for revision checking is left to the companion DataReader.
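Since revision checking is left to the companion DataReader, a reader might compare the stored revision against the one it understands. The sketch below assumes the revision was stored under the “format” key of the component's metadata, as the default metadata() implementation does. CURRENT_FORMAT and check_format are hypothetical names, and this version rejects anything but the current revision; a reader could instead choose to keep accepting older ones.

```perl
use strict;
use warnings;

# Hypothetical current revision of a custom component's file format.
use constant CURRENT_FORMAT => 2;

# Sketch of a check a companion reader might run against the metadata its
# writer stored (assumed here to be a hash with a "format" key).
sub check_format {
    my $metadata = shift;
    my $format   = $metadata->{format};
    die "No format revision recorded\n" unless defined $format;
    die "Unsupported format $format, expected " . CURRENT_FORMAT . "\n"
        unless $format == CURRENT_FORMAT;
    return 1;
}

check_format( { format => 2 } );    # current revision: accepted
eval { check_format( { format => 1 } ) } or print "rejected: $@";
```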

METHODS

delete_segment

$data_writer->delete_segment($reader);

Remove a segment’s data. The default implementation is a no-op, as all files within the segment directory will be automatically deleted. Subclasses which manage their own files outside of the segment system should override this method and use it as a trigger for cleaning up obsolete data.

  • reader - The SegReader whose content is to be removed, which must represent a segment that is part of the current snapshot.
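As an illustration of the kind of cleanup an override might perform, the sketch below assumes a component that keeps one side file per segment in its own directory, outside the segment system, named after the segment. The directory layout, the .side suffix and both helper names are hypothetical. In a real override, the segment name would likely come from the SegReader argument (for example via its get_seg_name() accessor).

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical: the component keeps one side file per segment, outside the
# segment directories, named after the segment (e.g. "seg_1.side").
my $side_dir = tempdir( CLEANUP => 1 );

sub side_file_for {
    my $seg_name = shift;
    return File::Spec->catfile( $side_dir, "$seg_name.side" );
}

# What an overriding delete_segment() might do once it has the name of the
# obsolete segment: remove that segment's side file if it exists.
# Returns 1 if a file was removed, 0 if there was nothing to clean up.
sub delete_side_data {
    my $seg_name = shift;
    my $path     = side_file_for($seg_name);
    return 0 unless -e $path;
    unlink $path or die "Cannot remove $path: $!";
    return 1;
}

# Usage: create a side file for seg_1, then clean it up.
my $path = side_file_for('seg_1');
open my $fh, '>', $path or die "Cannot create $path: $!";
print {$fh} "side data\n";
close $fh or die "Cannot close $path: $!";
delete_side_data('seg_1');
```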

merge_segment

$data_writer->merge_segment(
    reader  => $reader,     # required
    doc_map => $doc_map,    # default: undef
);

Move content from an existing segment into the one currently being written.

The default implementation calls add_segment() then delete_segment().

  • reader - The SegReader containing content to merge, which must represent a segment that is part of the current snapshot.
  • doc_map - An array of integers mapping old document ids to new ones. Deleted documents are mapped to 0, indicating that they should be skipped.
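The default merge behavior described above can be mimicked with a stand-in class. In this sketch My::MockWriter is hypothetical and only records the calls it receives, and the readers are plain strings rather than SegReader objects. A real subclass would inherit merge_segment() from Lucy::Index::DataWriter and implement add_segment() (and, if it manages files of its own, override delete_segment()) instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DataWriter subclass that only records calls.
package My::MockWriter;
sub new { return bless { calls => [] }, shift }

sub add_segment {
    my ( $self, %args ) = @_;
    push @{ $self->{calls} }, "add_segment:$args{reader}";
}

sub delete_segment {
    my ( $self, $reader ) = @_;
    push @{ $self->{calls} }, "delete_segment:$reader";
}

# Mirrors the documented default: add the content, then delete the source.
sub merge_segment {
    my ( $self, %args ) = @_;
    $self->add_segment( reader => $args{reader}, doc_map => $args{doc_map} );
    $self->delete_segment( $args{reader} );
}

package main;
my $writer = My::MockWriter->new;
$writer->merge_segment( reader => 'seg_3', doc_map => [ 0, 1, 0, 2 ] );
print "$_\n" for @{ $writer->{calls} };
```

The order matters: the content is copied into the new segment before the old segment is marked for removal.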

metadata

my $hashref = $data_writer->metadata();

Arbitrary metadata to be serialized and stored by the Segment. The default implementation supplies a hash with a single key-value pair, “format”, whose value is the revision number returned by format().
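A subclass that needs to record more than the format revision would typically extend the default hash rather than replace it. The sketch below uses hypothetical stand-in classes so it runs without Lucy installed: My::BaseWriter plays the part of the default implementation, and the doc_count key and its value are illustrative only. In a real subclass, SUPER::metadata() would resolve to Lucy::Index::DataWriter's method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default behavior: metadata() returns a
# hash holding only the format revision.
package My::BaseWriter;
sub new      { return bless {}, shift }
sub format   { return 1 }
sub metadata { my $self = shift; return { format => $self->format } }

# Hypothetical subclass: bumps the revision and records one extra key.
package My::CountingWriter;
use parent -norequire, 'My::BaseWriter';
sub format { return 3 }

sub metadata {
    my $self     = shift;
    my $metadata = $self->SUPER::metadata();    # start from the default hash
    $metadata->{doc_count} = 42;                # illustrative extra entry
    return $metadata;
}

package main;
my $meta = My::CountingWriter->new->metadata;
print join( ', ', map { $_ . '=' . $meta->{$_} } sort keys %$meta ), "\n";
# doc_count=42, format=3
```

Because the subclass calls the inherited method first, the “format” entry always reflects the subclass's own format() value.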

get_snapshot

my $snapshot = $data_writer->get_snapshot();

Accessor for “snapshot” member var.

get_segment

my $segment = $data_writer->get_segment();

Accessor for “segment” member var.

get_polyreader

my $poly_reader = $data_writer->get_polyreader();

Accessor for “polyreader” member var.

get_schema

my $schema = $data_writer->get_schema();

Accessor for “schema” member var.

get_folder

my $folder = $data_writer->get_folder();

Accessor for “folder” member var.

INHERITANCE

Lucy::Index::DataWriter isa Clownfish::Obj.