parcel | Lucy |
class variable | LUCY_COMPILER |
struct symbol | lucy_Compiler |
class nickname | lucy_Compiler |
header file | Lucy/Search/Compiler.h |
Lucy::Search::Compiler – Query-to-Matcher compiler.
The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Matcher object that can do real work.
The simplest Compiler subclasses – such as those associated with constant-scoring Query types – might simply implement a Make_Matcher() method which passes along information verbatim from the Query to the Matcher’s constructor.
However, it is common for the Compiler to perform some calculations which affect its “weight” – a floating point multiplier that the Matcher will factor into each document’s score. If that is the case, then the Compiler subclass may wish to override Get_Weight(), Sum_Of_Squared_Weights(), and Apply_Norm_Factor().
Compiling a Matcher is a two-stage process.
The first stage takes place during the Compiler’s construction, which is where the Query object meets a Searcher object for the first time. Searchers operate on a specific document collection and they can tell you certain statistical information about the collection – such as how many total documents are in the collection, or how many documents in the collection a particular term is present in. Lucy’s core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler’s weight; custom subclasses might do something similar.
The second stage of compilation is the Make_Matcher() method, which is where the Compiler meets a SegReader object. SegReaders are associated with a single segment within a single index on a single machine, and are thus lower-level than Searchers, which may represent a document collection spread out over a search cluster (comprising several indexes and many segments). The Compiler object can use new information supplied by the SegReader – such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searcher – when figuring out what to feed to the Matcher’s constructor, or whether Make_Matcher() should return a Matcher at all.
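The second stage can be pictured with the following sketch, which again avoids the Lucy API. SegmentInfo, SketchMatcher and sketch_make_matcher() are hypothetical stand-ins; the point is only that a term absent from the local segment yields NULL instead of a matcher.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-segment information from a SegReader. */
typedef struct {
    int local_doc_freq;  /* documents in this segment containing the term */
} SegmentInfo;

/* Hypothetical stand-in for a Matcher. */
typedef struct {
    float weight;      /* multiplier computed during stage one */
    int   need_score;  /* whether the caller will ask for scores */
} SketchMatcher;

/* Stage two: return a new matcher, or NULL when the term has no postings
 * in this segment. The caller owns the result and must free() it.
 * For brevity, an allocation failure also returns NULL. */
static SketchMatcher*
sketch_make_matcher(float weight, const SegmentInfo *seg, int need_score) {
    if (seg->local_doc_freq == 0) { return NULL; }
    SketchMatcher *matcher = malloc(sizeof(SketchMatcher));
    if (matcher == NULL) { return NULL; }
    matcher->weight     = weight;
    matcher->need_score = need_score;
    return matcher;
}
```

A caller would check the result for NULL and skip the segment in that case.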
lucy_Compiler*
lucy_Compiler_init(
lucy_Compiler *self,
lucy_Query *parent,
lucy_Searcher *searcher,
lucy_Similarity *similarity,
float boost
);
Abstract initializer.
parent – The parent Query.
searcher – A Lucy::Search::Searcher, such as an IndexSearcher.
similarity – A Similarity.
boost – An arbitrary scoring multiplier. Defaults to the boost of the parent Query.
lucy_Matcher* // incremented
lucy_Compiler_Make_Matcher(
lucy_Compiler *self,
lucy_SegReader *reader,
bool need_score
);
Factory method returning a Matcher.
reader – A SegReader.
need_score – Indicate whether the Matcher must implement Score().
Returns: a Matcher, or NULL if the Matcher would have matched no documents.
float
lucy_Compiler_Get_Weight(
lucy_Compiler *self
);
Return the Compiler’s numerical weight, a scoring multiplier. By default, returns the object’s boost.
lucy_Similarity*
lucy_Compiler_Get_Similarity(
lucy_Compiler *self
);
Accessor for the Compiler’s Similarity object.
lucy_Query*
lucy_Compiler_Get_Parent(
lucy_Compiler *self
);
Accessor for the Compiler’s parent Query object.
float
lucy_Compiler_Sum_Of_Squared_Weights(
lucy_Compiler *self
);
Compute and return a raw weighting factor. (This quantity is used by Normalize().) By default, simply returns 1.0.
void
lucy_Compiler_Apply_Norm_Factor(
lucy_Compiler *self,
float factor
);
Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.
The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.
factor – The multiplier.
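The difference between a leaf and a combining class can be sketched as follows. SketchNode and sketch_apply_norm_factor() are hypothetical and do not mirror the real TermCompiler or ORCompiler structs; the sketch only shows the factor being multiplied in at the leaves and passed down everywhere else.

```c
#include <stddef.h>

/* Hypothetical compiler node: a leaf has num_children == 0. */
typedef struct SketchNode {
    float               weight;
    struct SketchNode **children;
    size_t              num_children;
} SketchNode;

/* Leaves scale their own weight; combining nodes pass the factor down. */
static void
sketch_apply_norm_factor(SketchNode *node, float factor) {
    if (node->num_children == 0) {
        node->weight *= factor;
        return;
    }
    for (size_t i = 0; i < node->num_children; i++) {
        sketch_apply_norm_factor(node->children[i], factor);
    }
}
```

In this sketch the combining node leaves its own weight untouched, which is a simplification; a real combining class may do more.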
void
lucy_Compiler_Normalize(
lucy_Compiler *self
);
Take a newly minted Compiler object and apply query-specific normalization factors. Should be invoked by Query subclasses during Make_Compiler() for top-level nodes.
For a TermQuery, the scoring formula is approximately:
(tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)
Normalize() is theoretically concerned with applying the second half of that formula to the Compiler’s weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
bool
lucy_Compiler_Equals(
lucy_Compiler *self,
cfish_Obj *other
);
Indicate whether two objects are the same. By default, compares the memory address.
other – Another Obj.
cfish_String* // incremented
lucy_Compiler_To_String(
lucy_Compiler *self
);
Generic stringification: “ClassName@hex_mem_address”.
lucy_Compiler* // incremented
lucy_Compiler_Make_Compiler(
lucy_Compiler *self,
lucy_Searcher *searcher,
float boost,
bool subordinate
);
Abstract factory method returning a Compiler derived from this Query.
searcher – A Searcher.
boost – A scoring multiplier.
subordinate – Indicates whether the Query is a subquery (as opposed to a top-level query). If false, the implementation must invoke Normalize() on the newly minted Compiler object before returning it.
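The contract around the subordinate flag can be sketched as below. This is not Lucy code: SketchCompiler, sketch_normalize() and sketch_make_compiler() are hypothetical, and the normalized field exists only to record that normalization ran.

```c
#include <stdbool.h>

/* Hypothetical compiler; `normalized` just records that normalization ran. */
typedef struct {
    float boost;
    bool  normalized;
} SketchCompiler;

/* Placeholder for the real normalization work. */
static void
sketch_normalize(SketchCompiler *c) {
    c->normalized = true;
}

/* Build a compiler by value; only top-level queries normalize it here. */
static SketchCompiler
sketch_make_compiler(float boost, bool subordinate) {
    SketchCompiler c = { boost, false };
    if (!subordinate) { sketch_normalize(&c); }
    return c;
}
```

Compilers built for subqueries are returned without normalization in this sketch, on the assumption that it is driven from the top-level node.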
void
lucy_Compiler_Set_Boost(
lucy_Compiler *self,
float boost
);
Set the Query’s boost.
float
lucy_Compiler_Get_Boost(
lucy_Compiler *self
);
Get the Query’s boost.
cfish_Obj* // incremented
lucy_Compiler_Dump(
lucy_Compiler *self
);
cfish_Obj* // incremented
lucy_Compiler_Load(
lucy_Compiler *self,
cfish_Obj *dump
);
Lucy::Search::Compiler is a Lucy::Search::Query is a Clownfish::Obj.
Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
Apache License, Version 2.0.
Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
respective owners.