Project Lucy has retired. For details please refer to its Attic page.
Lucy::Index::SegWriter – C API Documentation

Lucy::Index::SegWriter

parcel: Lucy
class variable: LUCY_SEGWRITER
struct symbol: lucy_SegWriter
class nickname: lucy_SegWriter
header file: Lucy/Index/SegWriter.h

Name

Lucy::Index::SegWriter – Write one segment of an index.

Description

SegWriter is a conduit through which information fed to Indexer passes. It manages Segment and Inverter, invokes the Analyzer chain, and feeds low-level DataWriters such as PostingListWriter and DocWriter.

The sub-components of a SegWriter are determined by Architecture. DataWriter components which are added to the stack of writers via Add_Writer() have Add_Inverted_Doc() invoked for each document supplied to SegWriter’s Add_Doc().

Methods

Register
void
lucy_SegWriter_Register(
    lucy_SegWriter *self,
    cfish_String *api,
    lucy_DataWriter *component // decremented
);

Register a DataWriter component with the SegWriter. Note that registration only makes the component available via Fetch(); for it to receive documents, you will usually also want to call Add_Writer().

api

The name of the DataWriter API which the component implements.

component

A DataWriter.
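The difference between registering a component and adding it to the writer stack can be modeled with a small self-contained sketch. Everything below is hypothetical (ToyComponent, ToySegWriter, the toy_ functions, and the "toy.api" name are stand-ins, not the Lucy API), and returning NULL for an unknown name is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical stand-in for a DataWriter component. */
typedef struct {
    const char *label;
} ToyComponent;

/* Hypothetical stand-in for SegWriter: a lookup registry plus a
 * separate stack of writers that actually receive documents. */
typedef struct {
    size_t        num_registered;
    size_t        num_writers;
    const char   *apis[TOY_MAX];    /* registry keys */
    ToyComponent *by_api[TOY_MAX];  /* registry values */
    ToyComponent *stack[TOY_MAX];   /* writers fed on each document */
} ToySegWriter;

/* Models Register(): makes the component retrievable by name only. */
void toy_register(ToySegWriter *self, const char *api, ToyComponent *component) {
    assert(self->num_registered < TOY_MAX);
    self->apis[self->num_registered] = api;
    self->by_api[self->num_registered] = component;
    self->num_registered++;
}

/* Models Fetch(): look up a registered component by name. */
ToyComponent *toy_fetch(const ToySegWriter *self, const char *api) {
    for (size_t i = 0; i < self->num_registered; i++) {
        if (strcmp(self->apis[i], api) == 0) {
            return self->by_api[i];
        }
    }
    return NULL; /* sketch-only convention for "not registered" */
}

/* Models Add_Writer(): puts the component on the stack of writers. */
void toy_add_writer(ToySegWriter *self, ToyComponent *writer) {
    assert(self->num_writers < TOY_MAX);
    self->stack[self->num_writers++] = writer;
}
```

In this model, calling toy_register() alone leaves num_writers at zero, so the component can be fetched but would never be fed a document until toy_add_writer() is also called.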

Fetch
cfish_Obj*
lucy_SegWriter_Fetch(
    lucy_SegWriter *self,
    cfish_String *api
);

Retrieve a registered component.

api

The name of the DataWriter API which the component implements.

Add_Writer
void
lucy_SegWriter_Add_Writer(
    lucy_SegWriter *self,
    lucy_DataWriter *writer // decremented
);

Add a DataWriter to the SegWriter’s stack of writers.

Add_Doc
void
lucy_SegWriter_Add_Doc(
    lucy_SegWriter *self,
    lucy_Doc *doc,
    float boost
);

Add a document to the segment. Inverts doc, increments the Segment’s internal document id, then calls Add_Inverted_Doc(), feeding all sub-writers.
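The order of operations described above can be sketched with a toy model. The types and functions here (FeedSubWriter, FeedSegWriter, feed_add_doc, and so on) are hypothetical and do not come from the Lucy headers; the inversion step is left out, and the sketch assumes document ids start at 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEED_MAX_WRITERS 4

/* Hypothetical sub-writer that records what it was fed. */
typedef struct {
    int32_t last_doc_id;
    int     docs_seen;
} FeedSubWriter;

/* Hypothetical stand-in for SegWriter plus its Segment's doc counter. */
typedef struct {
    int32_t        doc_count;
    size_t         num_writers;
    FeedSubWriter *writers[FEED_MAX_WRITERS];
} FeedSegWriter;

void feed_add_writer(FeedSegWriter *self, FeedSubWriter *writer) {
    assert(self->num_writers < FEED_MAX_WRITERS);
    self->writers[self->num_writers++] = writer;
}

/* Models a sub-writer's Add_Inverted_Doc(). */
void feed_add_inverted_doc(FeedSubWriter *writer, int32_t doc_id) {
    writer->last_doc_id = doc_id;
    writer->docs_seen++;
}

/* Models Add_Doc(): (inversion omitted), bump the document id,
 * then feed every writer on the stack. Returns the id assigned. */
int32_t feed_add_doc(FeedSegWriter *self) {
    int32_t doc_id = ++self->doc_count; /* assumption: ids start at 1 */
    for (size_t i = 0; i < self->num_writers; i++) {
        feed_add_inverted_doc(self->writers[i], doc_id);
    }
    return doc_id;
}
```

Every writer on the stack sees each document exactly once and with the same id, which is the property the sub-writers rely on to keep their per-document data aligned.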

Add_Segment
void
lucy_SegWriter_Add_Segment(
    lucy_SegWriter *self,
    lucy_SegReader *reader,
    lucy_I32Array *doc_map
);

Add content from an existing segment into the one currently being written.

reader

The SegReader containing content to add.

doc_map

An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.
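As an illustration of what such a mapping might contain, the sketch below builds one from a table of deletion flags. It uses plain C arrays rather than lucy_I32Array, and toy_build_doc_map is a hypothetical helper. It also assumes that the map is indexed by old document id with slot 0 unused, and that new ids are assigned consecutively after an offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a toy doc map for old ids 1..doc_max.
 *   deleted[old_id] is true if the document was deleted.
 *   map[old_id] receives the new id, or 0 if the document is skipped.
 * Both arrays must hold doc_max + 1 entries; index 0 is unused.
 * Returns the number of documents that survive. */
int32_t toy_build_doc_map(const bool *deleted, int32_t doc_max,
                          int32_t offset, int32_t *map) {
    assert(deleted != NULL && map != NULL);
    assert(doc_max >= 0 && offset >= 0);
    int32_t kept = 0;
    map[0] = 0;
    for (int32_t old_id = 1; old_id <= doc_max; old_id++) {
        if (deleted[old_id]) {
            map[old_id] = 0;             /* deleted: skip */
        } else {
            kept++;
            map[old_id] = offset + kept; /* next free id in the new segment */
        }
    }
    return kept;
}
```

For example, with five documents of which the second and fourth are deleted, and an offset of 10, the map for old ids 1 through 5 comes out as 11, 0, 12, 0, 13.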

Merge_Segment
void
lucy_SegWriter_Merge_Segment(
    lucy_SegWriter *self,
    lucy_SegReader *reader,
    lucy_I32Array *doc_map
);

Move content from an existing segment into the one currently being written.

The default implementation calls Add_Segment() then Delete_Segment().

reader

The SegReader containing content to merge, which must represent a segment that is part of the current snapshot.

doc_map

An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.

Delete_Segment
void
lucy_SegWriter_Delete_Segment(
    lucy_SegWriter *self,
    lucy_SegReader *reader
);

Remove a segment’s data. The default implementation is a no-op, as all files within the segment directory will be automatically deleted. Subclasses which manage their own files outside of the segment system should override this method and use it as a trigger for cleaning up obsolete data.

reader

The SegReader for the segment whose data is being removed, which must represent a segment that is part of the current snapshot.
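The relationship between Merge_Segment(), Add_Segment(), and Delete_Segment() can be pictured as a default method that calls two overridable hooks. The sketch below is a toy model with hypothetical names (HookWriter, hook_merge_segment, and so on); it uses a plain integer in place of a SegReader and is not how Lucy dispatches methods internally.

```c
#include <assert.h>
#include <stddef.h>

typedef struct HookWriter HookWriter;

/* Hypothetical writer with overridable add/delete hooks. */
struct HookWriter {
    void (*add_segment)(HookWriter *self, int seg_id);
    void (*delete_segment)(HookWriter *self, int seg_id);
    int added;    /* segments whose content was copied in */
    int cleaned;  /* segments whose external data was cleaned up */
};

void hook_add_segment(HookWriter *self, int seg_id) {
    (void)seg_id;
    self->added++;
}

/* Default delete: nothing to do, since files inside the segment
 * directory are removed automatically. */
void hook_delete_noop(HookWriter *self, int seg_id) {
    (void)self;
    (void)seg_id;
}

/* Override for a writer that keeps files outside the segment system. */
void hook_delete_external(HookWriter *self, int seg_id) {
    (void)seg_id;
    self->cleaned++; /* stands in for removing obsolete external data */
}

/* Models the default Merge_Segment(): add, then delete. */
void hook_merge_segment(HookWriter *self, int seg_id) {
    assert(self != NULL);
    assert(self->add_segment != NULL && self->delete_segment != NULL);
    self->add_segment(self, seg_id);
    self->delete_segment(self, seg_id);
}
```

A writer that only stores data inside the segment directory can keep the no-op hook, while one with external files swaps in its own cleanup and gets called at the right moment during a merge.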

Finish
void
lucy_SegWriter_Finish(
    lucy_SegWriter *self
);

Complete the segment: close all streams, store metadata, etc.

Methods inherited from Lucy::Index::DataWriter

Metadata
cfish_Hash* // incremented
lucy_SegWriter_Metadata(
    lucy_SegWriter *self
);

Arbitrary metadata to be serialized and stored by the Segment. The default implementation supplies a hash with a single key-value pair for “format”.

Format (abstract)
int32_t
lucy_SegWriter_Format(
    lucy_SegWriter *self
);

Every writer must specify a file format revision number, which should increment each time the format changes. Responsibility for revision checking is left to the companion DataReader.
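One way a companion reader might check the revision number is sketched below. The names (FMT_CURRENT, fmt_writer_format, fmt_reader_accepts) and the acceptance policy are assumptions made for illustration; the source does not say which revisions a given DataReader accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical current revision of a toy file format. */
#define FMT_CURRENT 2

/* Models a writer's Format(): report the revision it writes. */
int32_t fmt_writer_format(void) {
    return FMT_CURRENT;
}

/* Models the reader-side check. Assumed policy for this sketch:
 * accept any revision from 1 up to the one this code knows about,
 * and reject anything newer or nonsensical. */
bool fmt_reader_accepts(int32_t stored_format) {
    assert(FMT_CURRENT >= 1);
    return stored_format >= 1 && stored_format <= FMT_CURRENT;
}
```

Under this policy an index written by a newer version of the writer is rejected up front instead of being misread.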

Get_Snapshot
lucy_Snapshot*
lucy_SegWriter_Get_Snapshot(
    lucy_SegWriter *self
);

Accessor for “snapshot” member var.

Get_Segment
lucy_Segment*
lucy_SegWriter_Get_Segment(
    lucy_SegWriter *self
);

Accessor for “segment” member var.

Get_PolyReader
lucy_PolyReader*
lucy_SegWriter_Get_PolyReader(
    lucy_SegWriter *self
);

Accessor for “polyreader” member var.

Get_Schema
lucy_Schema*
lucy_SegWriter_Get_Schema(
    lucy_SegWriter *self
);

Accessor for “schema” member var.

Get_Folder
lucy_Folder*
lucy_SegWriter_Get_Folder(
    lucy_SegWriter *self
);

Accessor for “folder” member var.

Inheritance

Lucy::Index::SegWriter is a Lucy::Index::DataWriter is a Clownfish::Obj.