Until now, our search app has had only a single search box. In this tutorial chapter, we’ll move towards an “advanced search” interface, by adding a “category” drop-down menu. Three new classes will be required:
QueryParser - Turn a query string into a Query object.
TermQuery - Query for a specific term within a specific field.
ANDQuery - “AND” together multiple Query objects to produce an intersected result set.
Our new “category” field will be a StringType field rather than a FullTextType field, because we will only be looking for exact matches. It needs to be indexed, but since we won’t display its value, it doesn’t need to be stored.
{
    String *field_str = Str_newf("category");
    StringType *type = StringType_new();
    StringType_Set_Stored(type, false);
    Schema_Spec_Field(schema, field_str, (FieldType*)type);
    DECREF(type);
    DECREF(field_str);
}
There will be three possible values: “article”, “amendment”, and “preamble”, which we’ll hack out of the source file’s name in our parse_file function:
const char *category = NULL;
if (S_starts_with(filename, "art")) {
    category = "article";
}
else if (S_starts_with(filename, "amend")) {
    category = "amendment";
}
else if (S_starts_with(filename, "preamble")) {
    category = "preamble";
}
else {
    fprintf(stderr, "Can't derive category for %s\n", filename);
    exit(1);
}
...
{
    // Store 'category' field
    String *field = Str_newf("category");
    String *value = Str_new_from_utf8(category, strlen(category));
    Doc_Store(doc, field, (Obj*)value);
    DECREF(field);
    DECREF(value);
}
The “category” constraint will be exposed through a new -c command-line option, so the search program’s usage message needs updating:
static void
S_usage_and_exit(const char *arg0) {
    printf("Usage: %s [-c <category>] <querystring>\n", arg0);
    exit(1);
}
Next, we parse the command-line arguments to extract the optional category and the query string.
const char *category = NULL;
int i = 1;
while (i < argc - 1) {
    if (strcmp(argv[i], "-c") == 0) {
        if (i + 1 >= argc) {
            S_usage_and_exit(argv[0]);
        }
        i += 1;
        category = argv[i];
    }
    else {
        S_usage_and_exit(argv[0]);
    }
    i += 1;
}
if (i + 1 != argc) {
    S_usage_and_exit(argv[0]);
}
const char *query_c = argv[i];
QueryParser’s constructor requires a “schema” argument. We can get that from our IndexSearcher:
IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
Schema *schema = IxSearcher_Get_Schema(searcher);
QueryParser *qparser = QParser_new(schema, NULL, NULL, NULL);
Previously, we have been handing raw query strings to IndexSearcher. Behind the scenes, IndexSearcher has been using a QueryParser to turn those query strings into Query objects. Now, we will bring QueryParser into the foreground and parse the strings explicitly.
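// Wrap the query from argv in a Clownfish String for the parser.
String *query_str = Str_newf("%s", query_c);
Query *query = QParser_Parse(qparser, query_str);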
If the user has specified a category, we’ll use an ANDQuery to join our parsed query together with a TermQuery representing the category.
if (category) {
    String *category_name = Str_newf("category");
    String *category_str = Str_newf("%s", category);
    TermQuery *category_query
        = TermQuery_new(category_name, (Obj*)category_str);
    // Vec_Push takes ownership of each child query.
    Vector *children = Vec_new(2);
    Vec_Push(children, (Obj*)query);
    Vec_Push(children, (Obj*)category_query);
    query = (Query*)ANDQuery_new(children);
    DECREF(children);
    DECREF(category_str);
    DECREF(category_name);
}
Now when we execute the query…
Hits *hits = IxSearcher_Hits(searcher, (Obj*)query, 0, 10, NULL);
… we’ll get a result set which is the intersection of the parsed query and the category query.
When querying full text fields, the easiest approach is to create query objects with QueryParser. Sometimes, though, you want to create a TermQuery for a single term in a FullTextType field directly. In that case, the search term has to be run through the field’s analyzer so that it is normalized the same way as the field’s content.
Query*
make_term_query(Schema *schema, String *field, String *term) {
    FieldType *type = Schema_Fetch_Type(schema, field);
    String *token = NULL;
    if (Obj_is_a((Obj*)type, FULLTEXTTYPE)) {
        // Run the term through the full text analysis chain.
        Analyzer *analyzer = FullTextType_Get_Analyzer((FullTextType*)type);
        Vector *tokens = Analyzer_Split(analyzer, term);
        if (Vec_Get_Size(tokens) != 1) {
            // If the term expands to more than one token, or no
            // tokens at all, it will never match a single token in
            // the full text field.
            DECREF(tokens);
            return (Query*)NoMatchQuery_new();
        }
        token = (String*)Vec_Delete(tokens, 0);
        DECREF(tokens);
    }
    else {
        // Exact match for other types.
        token = (String*)INCREF(term);
    }
    TermQuery *term_query = TermQuery_new(field, (Obj*)token);
    DECREF(token);
    return (Query*)term_query;
}
You’ve made it to the end of the tutorial.
For additional thematic documentation, see the Apache Lucy Cookbook.
ANDQuery has a companion class, ORQuery, and a close relative, RequiredOptionalQuery.
Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
Apache License, Version 2.0.
Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
respective owners.