This project has retired. For details please refer to its Attic page.
Lucy::Docs::Cookbook::CustomQuery
Apache Lucy™

Sample subclass of Query

Explore Apache Lucy’s support for custom query types by creating a “PrefixQuery” class to handle trailing wildcards.

Query, Compiler, and Matcher

To add support for a new query type, we need three classes: a Query, a Compiler, and a Matcher.

  • PrefixQuery - a subclass of Query, and the only class that client code will deal with directly.

  • PrefixCompiler - a subclass of Compiler, whose primary role is to compile a PrefixQuery to a PrefixMatcher.

  • PrefixMatcher - a subclass of Matcher, which does the heavy lifting: it applies the query to individual documents and assigns a score to each match.

The PrefixQuery class on its own isn’t enough because a Query object’s role is limited to expressing an abstract specification for the search. A Query is basically nothing but metadata; execution is left to the Query’s companion Compiler and Matcher.

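A Searcher’s hits() method ties the three classes together. It asks the Query for a Compiler, then asks that Compiler for a Matcher for each segment of the index. Finally, it iterates over each Matcher to collect the matching doc ids and their scores.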
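As a rough, self-contained illustration of that flow, here is a toy C sketch. The types and functions (ToyQuery, ToyCompiler, ToyMatcher, toy_hits() and friends) are hypothetical stand-ins invented for this example, not Lucy's actual C API. Each toy "segment" holds exactly one term per document, and segment-local doc ids are shifted by an offset into a single global range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types mirroring the three roles; not Lucy's C API. */
typedef struct {                  /* Query: metadata only */
    const char *prefix;
} ToyQuery;

typedef struct {                  /* Compiler: a Query bound to one search */
    const ToyQuery *parent;
} ToyCompiler;

typedef struct {                  /* one segment: terms[i] is the only term of doc i + 1 */
    const char *const *terms;
    size_t num_docs;
} ToySegment;

typedef struct {                  /* Matcher: walks one segment's matching docs */
    const ToySegment *seg;
    const char *prefix;
    size_t pos;                   /* number of docs examined so far */
} ToyMatcher;

static ToyCompiler toy_make_compiler(const ToyQuery *query) {
    ToyCompiler compiler = { query };
    return compiler;
}

static ToyMatcher toy_make_matcher(const ToyCompiler *compiler, const ToySegment *seg) {
    ToyMatcher matcher = { seg, compiler->parent->prefix, 0 };
    return matcher;
}

/* Return the next matching doc id (ids start at 1), or 0 once exhausted. */
static int toy_matcher_next(ToyMatcher *m) {
    size_t len = strlen(m->prefix);
    while (m->pos < m->seg->num_docs) {
        size_t i = m->pos++;
        if (strncmp(m->seg->terms[i], m->prefix, len) == 0) {
            return (int)(i + 1);
        }
    }
    return 0;
}

static float toy_matcher_score(const ToyMatcher *m) {
    (void)m;
    return 1.0f;                  /* fixed score, as in this recipe */
}

/* Simplified hits(): one Compiler per search, one Matcher per segment.
 * Stores up to `cap` global doc ids and scores; returns how many were stored. */
static size_t toy_hits(const ToyQuery *query, const ToySegment *segs, size_t num_segs,
                       int *doc_ids, float *scores, size_t cap) {
    assert(query != NULL);
    ToyCompiler compiler = toy_make_compiler(query);
    size_t count = 0;
    int offset = 0;               /* shifts segment-local ids into one global id space */
    for (size_t s = 0; s < num_segs; s++) {
        ToyMatcher matcher = toy_make_matcher(&compiler, &segs[s]);
        int doc_id;
        while (count < cap && (doc_id = toy_matcher_next(&matcher)) != 0) {
            doc_ids[count] = offset + doc_id;
            scores[count] = toy_matcher_score(&matcher);
            count++;
        }
        offset += (int)segs[s].num_docs;
    }
    return count;
}
```

For example, take two segments: one holding {"apple", "banana", "apricot"} and one holding {"avocado", "apex"}. A query for the prefix "ap" collects global doc ids 1, 3 and 5, each with a score of 1.0.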

PrefixQuery

Our PrefixQuery class will have two attributes: a query string and a field name.

PrefixQuery’s constructor collects and validates the attributes.
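Below is a minimal C sketch of the two attributes and a validating constructor, written as a plain struct rather than Lucy's generated class code. The names prefix_query_new(), copy_string() and the accessors are hypothetical. The specific checks are our own assumption of what "validates" means here: both attributes are required, and the query string must end in `*` once trailing whitespace is ignored. Releasing the copied strings is the destructor's job, so this block leaves it out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C stand-in for PrefixQuery; not Lucy's C API. */
typedef struct {
    char *field;          /* name of the field to search, e.g. "content" */
    char *query_string;   /* e.g. "luc*" */
} PrefixQuery;

/* Heap copy of a string (a portable stand-in for POSIX strdup). */
static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Assumed rule: the query string must end in '*', ignoring trailing whitespace. */
static int has_trailing_wildcard(const char *s) {
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        len--;
    }
    return len > 0 && s[len - 1] == '*';
}

/* Collect and validate both attributes; returns NULL if either is missing
 * or invalid, or if memory runs out. */
static PrefixQuery *prefix_query_new(const char *field, const char *query_string) {
    if (field == NULL || query_string == NULL || !has_trailing_wildcard(query_string)) {
        return NULL;
    }
    PrefixQuery *self = malloc(sizeof *self);
    if (self == NULL) {
        return NULL;
    }
    self->field = copy_string(field);
    self->query_string = copy_string(query_string);
    if (self->field == NULL || self->query_string == NULL) {
        free(self->field);
        free(self->query_string);
        free(self);
        return NULL;
    }
    return self;
}

/* Read-only accessors for the two attributes. */
static const char *prefix_query_get_field(const PrefixQuery *self) {
    assert(self != NULL);
    return self->field;
}

static const char *prefix_query_get_query_string(const PrefixQuery *self) {
    assert(self != NULL);
    return self->query_string;
}
```

Returning NULL for bad input keeps the example short. A real binding would more likely raise an error that describes which attribute was wrong.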

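Since this is an inside-out class, its attribute values are stored outside the object itself (keyed by the object's identity), so we’ll need a destructor to clean them up.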
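A C analogue has no inside-out hashes, but its destructor takes the same shape: release the attributes this class owns, then hand off to the parent class's destructor. The sketch below models that chain with a hypothetical Query base struct embedded as the first member. The functions query_destroy() and prefix_query_destroy() are illustrative assumptions, not Lucy API, and the live_queries counter exists only so the example can be checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not Lucy's C API. */
typedef struct {
    float boost;                  /* example of state owned by the parent class */
} Query;

typedef struct {
    Query base;                   /* parent part first, so a PrefixQuery* is a Query* */
    char *field;
    char *query_string;
} PrefixQuery;

static int live_queries = 0;      /* instrumentation for the example only */

static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Parent "destructor": releases the object's memory itself. */
static void query_destroy(Query *self) {
    free(self);                   /* &pq->base has the same address as pq */
    live_queries--;
}

/* Subclass destructor: free our own members, then chain to the parent. */
static void prefix_query_destroy(PrefixQuery *self) {
    if (self == NULL) {
        return;
    }
    free(self->field);
    free(self->query_string);
    query_destroy(&self->base);
}

static PrefixQuery *prefix_query_new(const char *field, const char *query_string) {
    assert(field != NULL && query_string != NULL);  /* validation omitted here */
    PrefixQuery *self = calloc(1, sizeof *self);    /* zeroed, so free() is safe */
    if (self == NULL) {
        return NULL;
    }
    live_queries++;
    self->base.boost = 1.0f;
    self->field = copy_string(field);
    self->query_string = copy_string(query_string);
    if (self->field == NULL || self->query_string == NULL) {
        prefix_query_destroy(self);
        return NULL;
    }
    return self;
}
```

Freeing the subclass's members before calling the parent matters. After query_destroy() returns, `self` no longer points at valid memory.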

The equals() method determines whether two Queries are logically equivalent.
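Here is a minimal sketch of equals() in C. It assumes a hypothetical type tag (QueryKind) in the base struct, so the method can tell whether the other object is also a PrefixQuery. Two PrefixQueries count as equivalent here when both of their attributes match. These names are illustrative, not Lucy's C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { QUERY_KIND_TERM, QUERY_KIND_PREFIX } QueryKind;  /* hypothetical type tag */

typedef struct {
    QueryKind kind;               /* lets equals() check the dynamic type */
} Query;

typedef struct {
    Query base;                   /* must be the first member */
    const char *field;
    const char *query_string;
} PrefixQuery;

/* Logically equivalent: the other object is also a PrefixQuery with the
 * same field and the same query string. */
static bool prefix_query_equals(const PrefixQuery *self, const Query *other) {
    assert(self != NULL);
    if (other == NULL || other->kind != QUERY_KIND_PREFIX) {
        return false;
    }
    const PrefixQuery *that = (const PrefixQuery *)other;
    return strcmp(self->field, that->field) == 0
        && strcmp(self->query_string, that->query_string) == 0;
}
```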

The last thing we’ll need is a make_compiler() factory method, which returns an instance of a Compiler subclass.
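In C, the factory pattern can be sketched with a function pointer in a hypothetical Query base struct. The Searcher calls query->make_compiler() without knowing the concrete Query type, and PrefixQuery's implementation returns a PrefixCompiler that remembers its parent query. All names here are assumptions for illustration, and any further arguments the real method takes are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not Lucy's C API. */
typedef struct Query Query;

typedef struct {
    const Query *parent;          /* the Query this Compiler was made from */
    const char *class_name;       /* which Compiler subclass was produced */
} Compiler;

struct Query {
    /* Factory "method": each Query subclass plugs in its own. */
    Compiler *(*make_compiler)(const Query *self);
};

typedef struct {
    Query base;                   /* first member, so &pq->base == (Query *)pq */
    const char *field;
    const char *query_string;
} PrefixQuery;

typedef struct {
    Compiler base;                /* PrefixCompiler "is a" Compiler */
} PrefixCompiler;

static Compiler *prefix_query_make_compiler(const Query *self) {
    PrefixCompiler *compiler = malloc(sizeof *compiler);
    if (compiler == NULL) {
        return NULL;
    }
    compiler->base.parent = self;
    compiler->base.class_name = "PrefixCompiler";
    return &compiler->base;       /* same address as compiler, so free() works on it */
}

static void prefix_query_init(PrefixQuery *q, const char *field, const char *query_string) {
    q->base.make_compiler = prefix_query_make_compiler;
    q->field = field;
    q->query_string = query_string;
}

/* What a Searcher might do at search time, knowing only the base type. */
static Compiler *searcher_compile(const Query *query) {
    assert(query != NULL && query->make_compiler != NULL);
    return query->make_compiler(query);
}
```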

PrefixCompiler

PrefixQuery’s make_compiler() method will be called internally at search time by Searcher subclasses such as IndexSearcher.

A Searcher is associated with a particular collection of documents. These documents may all reside in one index, as with IndexSearcher, or they may be spread out across multiple indexes on one or more machines, as with LucyX::Remote::ClusterSearcher.

Searcher objects have access to certain statistical information about the collections they represent. For instance, a Searcher can tell you how many documents are in the collection, or how many documents a specific term appears in.
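To make the two statistics concrete, here is a toy in-memory version. ToySearcher, toy_doc_max() and toy_doc_freq() are hypothetical names that loosely echo Searcher's doc_max() and doc_freq(). Because the toy collection has a single field, toy_doc_freq() takes only a term.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TERMS_PER_DOC 8

/* Hypothetical in-memory collection: each doc is a list of terms in one field. */
typedef struct {
    const char *terms[MAX_TERMS_PER_DOC];
    size_t num_terms;
} ToyDoc;

typedef struct {
    const ToyDoc *docs;
    size_t num_docs;
} ToySearcher;

/* How many documents are in the collection. With no deletions in this toy,
 * this is also the highest doc id. */
static size_t toy_doc_max(const ToySearcher *s) {
    assert(s != NULL);
    return s->num_docs;
}

/* How many documents contain `term` at least once. */
static size_t toy_doc_freq(const ToySearcher *s, const char *term) {
    size_t freq = 0;
    for (size_t d = 0; d < s->num_docs; d++) {
        for (size_t t = 0; t < s->docs[d].num_terms; t++) {
            if (strcmp(s->docs[d].terms[t], term) == 0) {
                freq++;
                break;            /* count each document only once */
            }
        }
    }
    return freq;
}
```

A weighting Compiler could combine these two numbers. For example, it could give rare terms more weight than common ones. This recipe skips weighting entirely.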

Such information can be used by sophisticated Compiler implementations to assign more or less heft to individual queries or sub-queries. However, we’re not going to bother with weighting for this demo; we’ll just assign a fixed score of 1.0 to each matching document.

We don’t need to write a constructor, as it will suffice to inherit new() from Lucy::Search::Compiler. The only method we need to implement for PrefixCompiler is make_matcher().

When make_matcher() is called, PrefixCompiler receives a SegReader object. From the SegReader and its sub-components LexiconReader and PostingListReader, we acquire a Lexicon, scan through the Lexicon’s unique terms, and acquire a PostingList for each term that matches our prefix.

Each of these PostingList objects represents a set of documents which match the query.
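The scan can be sketched over a hypothetical in-memory segment. Here the Lexicon is a byte-sorted array of terms, each paired with its PostingList of doc ids. Terms that share a prefix sit next to each other in sorted order, so the sketch jumps to the first term >= the prefix with a binary search and stops at the first term that no longer matches. That binary-search seek is our own choice for this example. The prefix is assumed to have already had its trailing `*` removed, and none of these names are Lucy API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-segment data: one posting list per lexicon term. */
typedef struct {
    const char *term;
    const int *doc_ids;           /* ascending doc ids containing the term */
    size_t num_docs;
} ToyPostingList;

typedef struct {
    const ToyPostingList *entries; /* sorted by term (strcmp order), like a Lexicon */
    size_t num_entries;
} ToySegReader;

/* Seek: index of the first term >= prefix (a lower-bound binary search). */
static size_t lexicon_seek(const ToySegReader *reader, const char *prefix) {
    size_t lo = 0, hi = reader->num_entries;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(reader->entries[mid].term, prefix) < 0) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Gather the posting lists of every term starting with `prefix`.
 * Returns how many were found; 0 means no Matcher is needed for this segment. */
static size_t collect_prefix_postings(const ToySegReader *reader, const char *prefix,
                                      const ToyPostingList **out, size_t cap) {
    assert(reader != NULL && prefix != NULL);
    size_t len = strlen(prefix);
    size_t count = 0;
    for (size_t i = lexicon_seek(reader, prefix); i < reader->num_entries; i++) {
        /* Terms are sorted, so the first non-matching term ends the run. */
        if (strncmp(reader->entries[i].term, prefix, len) != 0) {
            break;
        }
        if (count < cap) {
            out[count++] = &reader->entries[i];
        }
    }
    return count;
}
```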

PrefixMatcher

The Matcher subclass is the most involved.

The doc ids must be in ascending order, or some will be ignored. That is why PrefixMatcher’s constructor sorts the doc ids it gathers from the PostingLists.
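The constructor can be sketched in C in three steps. It concatenates the doc ids from every PostingList, sorts them, and drops duplicates, since a document containing several matching terms shows up in several lists. ToyPostingList, PrefixMatcher and prefix_matcher_new() are hypothetical stand-ins. Concatenate-and-sort is chosen for simplicity, not speed; a k-way merge of the already-sorted lists would avoid the full sort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not Lucy's C API. */
typedef struct {
    const int *doc_ids;           /* ascending doc ids for one matching term */
    size_t num_docs;
} ToyPostingList;

typedef struct {
    int *doc_ids;                 /* merged: ascending, no duplicates */
    size_t num_docs;
    ptrdiff_t tick;               /* current position; -1 until next() is called */
} PrefixMatcher;

static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

static PrefixMatcher *prefix_matcher_new(const ToyPostingList *lists, size_t num_lists) {
    assert(lists != NULL || num_lists == 0);
    size_t total = 0;
    for (size_t i = 0; i < num_lists; i++) {
        total += lists[i].num_docs;
    }
    PrefixMatcher *self = malloc(sizeof *self);
    if (self == NULL) {
        return NULL;
    }
    /* Allocate at least one slot so a NULL result always means failure. */
    self->doc_ids = malloc((total > 0 ? total : 1) * sizeof *self->doc_ids);
    if (self->doc_ids == NULL) {
        free(self);
        return NULL;
    }
    /* 1. Concatenate every posting list's doc ids. */
    size_t n = 0;
    for (size_t i = 0; i < num_lists; i++) {
        if (lists[i].num_docs > 0) {
            memcpy(self->doc_ids + n, lists[i].doc_ids, lists[i].num_docs * sizeof(int));
            n += lists[i].num_docs;
        }
    }
    /* 2. Sort ascending: a Matcher must hand out doc ids in order. */
    qsort(self->doc_ids, n, sizeof *self->doc_ids, compare_ints);
    /* 3. Drop duplicates: a doc may contain several terms with the prefix. */
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        if (unique == 0 || self->doc_ids[i] != self->doc_ids[unique - 1]) {
            self->doc_ids[unique++] = self->doc_ids[i];
        }
    }
    self->num_docs = unique;
    self->tick = -1;
    return self;
}

static void prefix_matcher_destroy(PrefixMatcher *self) {
    if (self == NULL) {
        return;
    }
    free(self->doc_ids);
    free(self);
}
```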

In addition to the constructor and destructor, there are three methods that must be overridden.

next() advances the Matcher to the next valid matching doc and returns its doc id, or 0 once the Matcher is exhausted.
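Here is a sketch of next() over the sorted, de-duplicated doc id array. It assumes a hypothetical PrefixMatcher struct whose tick field starts at -1 and indexes the current doc. Once tick runs past the end it stops advancing, so repeated calls keep returning 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher state; not Lucy's C API. */
typedef struct {
    const int *doc_ids;           /* sorted ascending, no duplicates; ids start at 1 */
    size_t num_docs;
    ptrdiff_t tick;               /* index of the current doc; -1 before the first next() */
} PrefixMatcher;

static void prefix_matcher_init(PrefixMatcher *self, const int *doc_ids, size_t num_docs) {
    self->doc_ids = doc_ids;
    self->num_docs = num_docs;
    self->tick = -1;
}

/* Advance to the next matching doc and return its id, or 0 once exhausted. */
static int prefix_matcher_next(PrefixMatcher *self) {
    assert(self != NULL);
    if (self->tick < (ptrdiff_t)self->num_docs) {
        self->tick++;             /* stop incrementing once past the end */
    }
    return self->tick < (ptrdiff_t)self->num_docs ? self->doc_ids[self->tick] : 0;
}
```

A caller typically drives it with a loop such as `while ((doc_id = prefix_matcher_next(&m)) != 0) { ... }`.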

get_doc_id() returns the current document id, or 0 if the Matcher is exhausted. (Document numbers start at 1, so 0 is a sentinel.)
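get_doc_id() simply reports whatever doc next() last landed on. In this hypothetical sketch it also returns 0 before the first call to next(). That behavior is our assumption; the text above only requires the sentinel once the Matcher is exhausted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher state; not Lucy's C API. */
typedef struct {
    const int *doc_ids;           /* sorted ascending, no duplicates; ids start at 1 */
    size_t num_docs;
    ptrdiff_t tick;               /* index of the current doc; -1 before the first next() */
} PrefixMatcher;

static void prefix_matcher_init(PrefixMatcher *self, const int *doc_ids, size_t num_docs) {
    self->doc_ids = doc_ids;
    self->num_docs = num_docs;
    self->tick = -1;
}

static int prefix_matcher_next(PrefixMatcher *self) {
    if (self->tick < (ptrdiff_t)self->num_docs) {
        self->tick++;
    }
    return self->tick < (ptrdiff_t)self->num_docs ? self->doc_ids[self->tick] : 0;
}

/* Current doc id, or the sentinel 0 when not positioned on a match. */
static int prefix_matcher_get_doc_id(const PrefixMatcher *self) {
    assert(self != NULL);
    if (self->tick >= 0 && self->tick < (ptrdiff_t)self->num_docs) {
        return self->doc_ids[self->tick];
    }
    return 0;
}
```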

score() returns the relevance score of the current match. We’ll just use a fixed score of 1.0.
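With a fixed score, score() ignores the current document entirely. The sketch below pairs it with a hypothetical total_score() loop, which stands in for whatever collects the hits, to show how next() and score() are used together. As before, PrefixMatcher is an illustrative struct, not Lucy's C API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher state; not Lucy's C API. */
typedef struct {
    const int *doc_ids;           /* sorted ascending, no duplicates */
    size_t num_docs;
    ptrdiff_t tick;               /* index of the current doc; -1 before the first next() */
} PrefixMatcher;

static int prefix_matcher_next(PrefixMatcher *self) {
    if (self->tick < (ptrdiff_t)self->num_docs) {
        self->tick++;
    }
    return self->tick < (ptrdiff_t)self->num_docs ? self->doc_ids[self->tick] : 0;
}

/* Every match is equally relevant; the current doc does not matter. */
static float prefix_matcher_score(const PrefixMatcher *self) {
    (void)self;
    return 1.0f;
}

/* Hypothetical consumer: visit every match and add up its score. */
static float total_score(PrefixMatcher *m) {
    assert(m != NULL);
    float total = 0.0f;
    while (prefix_matcher_next(m) != 0) {
        total += prefix_matcher_score(m);
    }
    return total;
}
```

With three matching docs, total_score() returns 3.0.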

Usage

To get a basic feel for PrefixQuery, insert the FlatQueryParser module described in CustomQueryParser (which supports PrefixQuery) into the search.cgi sample app.

If you’re planning on using PrefixQuery in earnest, though, you may want to switch to an analyzer that doesn’t stem. Stemming, another approach to prefix conflation, is not perfectly compatible with prefix searches.
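Here is a toy example of the mismatch, assuming the prefix itself is not stemmed. Once a stemmer has reduced "searches" to "search" in the index, the prefix "searche*" no longer finds it, whereas an unstemmed index would. toy_stem() is a deliberately crude, hypothetical suffix stripper, not Lucy's stemmer and not the Snowball algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Deliberately crude, hypothetical stemmer: strips a trailing "es" or "s".
 * Writes the result into out, which must hold at least cap bytes. */
static void toy_stem(const char *word, char *out, size_t cap) {
    assert(cap > 0);
    size_t len = strlen(word);
    if (len > 2 && strcmp(word + len - 2, "es") == 0) {
        len -= 2;
    } else if (len > 1 && word[len - 1] == 's') {
        len -= 1;
    }
    if (len >= cap) {
        len = cap - 1;            /* truncate rather than overflow */
    }
    memcpy(out, word, len);
    out[len] = '\0';
}

/* Does term start with prefix? The prefix is used as-is, not stemmed. */
static bool has_prefix(const char *term, const char *prefix) {
    return strncmp(term, prefix, strlen(prefix)) == 0;
}
```

Shorter prefixes such as "sear*" still match the stemmed term. Trouble starts only when the user's prefix extends past the point where the stemmer cut the word.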