parcel | Lucy
class variable | LUCY_INDEXER
struct symbol | lucy_Indexer
class nickname | lucy_Indexer
header file | Lucy/Index/Indexer.h
Lucy::Index::Indexer – Build inverted indexes.
The Indexer class is Apache Lucy’s primary tool for managing the content of inverted indexes, which may later be searched using IndexSearcher.
In general, only one Indexer at a time may write to an index safely. If a write lock cannot be secured, new() will throw an exception.
If an index is located on a shared volume, each writer application must identify itself by supplying an IndexManager with a unique host id to Indexer’s constructor, or index corruption will occur. See FileLocking for a detailed discussion.
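The single-writer rule can be pictured with a small self-contained model. This is a sketch under stated assumptions, not Lucy's implementation: `ToyIndex`, `ToyWriter`, `toy_writer_open()`, and `toy_writer_close()` are hypothetical names, the lock is a plain in-memory flag, and a failed open returns `false` where the real constructor throws an exception. Shared volumes and host ids are outside the scope of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an index; the lock is just an in-memory flag. */
typedef struct {
    bool write_locked;
} ToyIndex;

/* Hypothetical writer handle; index is NULL when no lock is held. */
typedef struct {
    ToyIndex *index;
} ToyWriter;

/* Try to become the single writer for an index.
 * Returns false if another writer already holds the lock. */
bool toy_writer_open(ToyWriter *writer, ToyIndex *index) {
    assert(writer != NULL && index != NULL);
    if (index->write_locked) {
        writer->index = NULL;
        return false;
    }
    index->write_locked = true;
    writer->index = index;
    return true;
}

/* Release the write lock, if this writer holds it. */
void toy_writer_close(ToyWriter *writer) {
    assert(writer != NULL);
    if (writer->index != NULL) {
        writer->index->write_locked = false;
        writer->index = NULL;
    }
}
```

In this model a second `toy_writer_open()` on the same `ToyIndex` fails until the first writer calls `toy_writer_close()`.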
Note: at present, Delete_By_Term() and Delete_By_Query() affect only documents that were previously committed to the index, not documents added during the current indexing session but not yet committed. This may change in a future update.
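The visibility rule in the note above can be illustrated with a toy model. It is only a sketch, assuming one term per document and a fixed-size array; `ToyDocStore` and the `toy_store_*` functions are hypothetical names and do not correspond to anything in Lucy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DOCS 16

typedef struct {
    const char *term;  /* one term per doc keeps the model small */
    bool committed;
    bool deleted;
} ToyDoc;

typedef struct {
    ToyDoc docs[TOY_MAX_DOCS];
    size_t count;
} ToyDocStore;

/* Add a doc to the current session; it stays uncommitted for now. */
bool toy_store_add(ToyDocStore *store, const char *term) {
    assert(store != NULL && term != NULL);
    if (store->count >= TOY_MAX_DOCS) {
        return false;
    }
    ToyDoc *doc = &store->docs[store->count++];
    doc->term = term;
    doc->committed = false;
    doc->deleted = false;
    return true;
}

/* Mark matching docs as deleted, skipping docs not yet committed.
 * Returns the number of docs marked. */
size_t toy_store_delete_by_term(ToyDocStore *store, const char *term) {
    assert(store != NULL && term != NULL);
    size_t marked = 0;
    for (size_t i = 0; i < store->count; i++) {
        ToyDoc *doc = &store->docs[i];
        if (doc->committed && !doc->deleted && strcmp(doc->term, term) == 0) {
            doc->deleted = true;
            marked++;
        }
    }
    return marked;
}

/* End the session: every pending doc becomes committed. */
void toy_store_commit(ToyDocStore *store) {
    assert(store != NULL);
    for (size_t i = 0; i < store->count; i++) {
        store->docs[i].committed = true;
    }
}

/* Count committed docs that are not marked as deleted. */
size_t toy_store_live_count(const ToyDocStore *store) {
    assert(store != NULL);
    size_t live = 0;
    for (size_t i = 0; i < store->count; i++) {
        if (store->docs[i].committed && !store->docs[i].deleted) {
            live++;
        }
    }
    return live;
}
```

If a document with term "apple" is committed, and a second "apple" document is added before `toy_store_delete_by_term()` runs in the next session, only the first is marked; the second survives the following commit.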
lucy_Indexer* // incremented
lucy_Indexer_new(
    lucy_Schema *schema,
    cfish_Obj *index,
    lucy_IndexManager *manager,
    int32_t flags
);
Open a new Indexer. If the index already exists, update it.
schema: A Schema.
index: Either a string filepath or a Folder.
manager: An IndexManager.
flags: Flags governing behavior.
lucy_Indexer*
lucy_Indexer_init(
    lucy_Indexer *self,
    lucy_Schema *schema,
    cfish_Obj *index,
    lucy_IndexManager *manager,
    int32_t flags
);
Initialize an Indexer.
schema: A Schema.
index: Either a string filepath or a Folder.
manager: An IndexManager.
flags: Flags governing behavior.
void
lucy_Indexer_Add_Doc(
    lucy_Indexer *self,
    lucy_Doc *doc,
    float boost
);
Add a document to the index.
doc: A Lucy::Document::Doc object.
boost: A floating point weight which affects how this document scores.
void
lucy_Indexer_Add_Index(
    lucy_Indexer *self,
    cfish_Obj *index
);
Absorb an existing index into this one. The two indexes must have matching Schemas.
index: Either an index path name or a Folder.
void
lucy_Indexer_Delete_By_Term(
    lucy_Indexer *self,
    cfish_String *field,
    cfish_Obj *term
);
Mark documents which contain the supplied term as deleted, so that they will be excluded from search results and eventually removed altogether. The change is not apparent to search apps until after Commit() succeeds.
field: The name of an indexed field. (If the field is not specified as indexed, an error will occur.)
term: The term which identifies docs to be marked as deleted. If field is associated with an Analyzer, term will be processed automatically (so don’t pre-process it yourself).
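The point about not pre-processing the term can be sketched with a toy analyzer. Everything here is an assumption made for illustration: `toy_analyze()` and `toy_term_matches()` are hypothetical, and the analysis is plain ASCII lowercasing, which says nothing about what a real Analyzer does. The sketch only shows the caller passing a raw term and the library side normalizing it before comparison.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analyzer: copy `in` to `out`, lowercasing ASCII letters.
 * Output is truncated to fit and always NUL-terminated. */
void toy_analyze(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return;
    }
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < out_size; i++) {
        out[i] = (char)tolower((unsigned char)in[i]);
    }
    out[i] = '\0';
}

/* Compare a raw, caller-supplied term against an already analyzed token.
 * The raw term is analyzed here, so the caller passes it unmodified. */
bool toy_term_matches(const char *indexed_token, const char *raw_term) {
    assert(indexed_token != NULL && raw_term != NULL);
    char analyzed[64];
    toy_analyze(raw_term, analyzed, sizeof analyzed);
    return strcmp(indexed_token, analyzed) == 0;
}
```

With this model, the raw term "Apple" matches the indexed token "apple" without any work on the caller's side.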
void
lucy_Indexer_Delete_By_Query(
    lucy_Indexer *self,
    lucy_Query *query
);
Mark documents which match the supplied Query as deleted.
query: A Query.
void
lucy_Indexer_Delete_By_Doc_ID(
    lucy_Indexer *self,
    int32_t doc_id
);
Mark the document identified by the supplied document ID as deleted.
doc_id: A document id.
void
lucy_Indexer_Optimize(
    lucy_Indexer *self
);
Optimize the index for search-time performance. This may take a while, as it can involve rewriting large amounts of data.
Every Indexer session which changes index content and ends in a Commit() creates a new segment. Once written, segments are never modified. However, they are periodically recycled by feeding their content into the segment currently being written.
The Optimize() method causes all existing index content to be fed back into the Indexer. When Commit() completes after an Optimize(), the index will consist of one segment. So Optimize() must be called before Commit(). Also, optimizing a fresh index created from scratch has no effect.
Historically, there was a significant search-time performance benefit to collapsing down to a single segment versus even two segments. Now the effect of collapsing is much less significant, and calling Optimize() is rarely justified.
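The segment behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch: `ToySegIndex` and the `toy_seg_*` functions are hypothetical, a segment is reduced to a document count, and the periodic recycling of segments that happens without Optimize() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SEGMENTS 8

typedef struct {
    size_t seg_docs[TOY_MAX_SEGMENTS]; /* doc count per committed segment */
    size_t seg_count;
    size_t pending_docs;               /* docs added in the current session */
    bool optimize_requested;
} ToySegIndex;

/* Add n docs to the segment currently being written. */
void toy_seg_add_docs(ToySegIndex *ix, size_t n) {
    assert(ix != NULL);
    ix->pending_docs += n;
}

/* Request that all existing content be fed into the next commit. */
void toy_seg_optimize(ToySegIndex *ix) {
    assert(ix != NULL);
    ix->optimize_requested = true;
}

/* Commit the session. Without optimize, new content becomes one new
 * segment. With optimize, all content collapses into a single segment.
 * Returns false if the toy segment table is full. */
bool toy_seg_commit(ToySegIndex *ix) {
    assert(ix != NULL);
    if (ix->optimize_requested) {
        size_t total = ix->pending_docs;
        for (size_t i = 0; i < ix->seg_count; i++) {
            total += ix->seg_docs[i];
        }
        ix->seg_count = 0;
        if (total > 0) {
            ix->seg_docs[0] = total;
            ix->seg_count = 1;
        }
    } else if (ix->pending_docs > 0) {
        if (ix->seg_count >= TOY_MAX_SEGMENTS) {
            return false;
        }
        ix->seg_docs[ix->seg_count++] = ix->pending_docs;
    }
    ix->pending_docs = 0;
    ix->optimize_requested = false;
    return true;
}
```

Two plain sessions leave two segments in the model; a third session that calls `toy_seg_optimize()` before `toy_seg_commit()` leaves one. On a fresh index, the result is a single segment with or without the optimize request, which mirrors the remark that optimizing a fresh index has no effect.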
void
lucy_Indexer_Commit(
    lucy_Indexer *self
);
Commit any changes made to the index. Until this is called, none of the changes made during an indexing session are permanent.
Calling Commit() invalidates the Indexer, so if you want to make more changes you’ll need a new one.
void
lucy_Indexer_Prepare_Commit(
    lucy_Indexer *self
);
Perform the expensive setup for Commit() in advance, so that Commit() completes quickly. (If Prepare_Commit() is not called explicitly by the user, Commit() will call it internally.)
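The relationship between Prepare_Commit() and Commit() can be sketched as a small state machine. This is a model only: `ToyCommitter`, `toy_prepare_commit()`, and `toy_commit()` are hypothetical names, and returning `false` for a repeated call is an assumption of the sketch, not documented Lucy behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_OPEN, TOY_PREPARED, TOY_COMMITTED } ToyState;

typedef struct {
    ToyState state;
    int prepare_runs;  /* how many times the expensive setup ran */
} ToyCommitter;

/* Run the expensive setup ahead of time.
 * Returns false unless the committer is still open. */
bool toy_prepare_commit(ToyCommitter *c) {
    assert(c != NULL);
    if (c->state != TOY_OPEN) {
        return false;
    }
    c->prepare_runs++;
    c->state = TOY_PREPARED;
    return true;
}

/* Finish the session. Runs the setup itself if the caller skipped it.
 * After a successful commit the committer is no longer usable. */
bool toy_commit(ToyCommitter *c) {
    assert(c != NULL);
    if (c->state == TOY_COMMITTED) {
        return false;
    }
    if (c->state == TOY_OPEN) {
        (void)toy_prepare_commit(c);
    }
    c->state = TOY_COMMITTED;
    return true;
}
```

Either path runs the setup exactly once in this model: calling `toy_commit()` alone triggers it internally, while calling `toy_prepare_commit()` first leaves only the cheap final step for `toy_commit()`.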
lucy_Schema*
lucy_Indexer_Get_Schema(
    lucy_Indexer *self
);
Accessor for schema.
Lucy::Index::Indexer is a Clownfish::Obj.
Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
Apache License, Version 2.0.