This project has retired. For details please refer to its Attic page.
Lucy::Index::Similarity – C API Documentation
Apache Lucy™

Lucy::Index::Similarity

parcel Lucy
class variable LUCY_SIMILARITY
struct symbol lucy_Similarity
class nickname lucy_Sim
header file Lucy/Index/Similarity.h

Name

Lucy::Index::Similarity – Judge how well a document matches a query.

Description

After determining whether a document matches a given query, a score must be calculated which indicates how well the document matches the query. The Similarity class is used to judge how “similar” the query and the document are to each other; the closer the resemblance, the higher the document scores.

The default implementation uses Lucene’s modified cosine similarity measure. Subclasses might tweak the existing algorithms, or might be used in conjunction with custom Query subclasses to implement arbitrary scoring schemes.

Most of the methods operate on single fields, but some are used to combine scores from multiple fields.

Functions

new
lucy_Similarity* // incremented
lucy_Sim_new(void);

Constructor. Takes no arguments.

init
lucy_Similarity*
lucy_Sim_init(
    lucy_Similarity *self
);

Initialize a Similarity.

Methods

Length_Norm
float
lucy_Sim_Length_Norm(
    lucy_Similarity *self,
    uint32_t num_tokens
);

Dampen the scores of long documents.

After a field is broken up into terms at index-time, each term must be assigned a weight. One of the factors in calculating this weight is the number of tokens that the original field was broken into.

Typically, we assume that the more tokens a field contains, the less important any one of them is, so that, e.g., 5 mentions of “Kafka” in a short article are given more heft than 5 mentions of “Kafka” in an entire book. The default implementation of Length_Norm expresses this using an inverted square root.

However, the inverted square root has a tendency to reward very short fields highly, which isn’t always appropriate for fields you expect to have a lot of tokens on average.

Equals
bool
lucy_Sim_Equals(
    lucy_Similarity *self,
    cfish_Obj *other
);

Indicate whether two objects are the same. By default, compares the memory addresses, so two objects are equal only if they are the same object.

other

Another Obj.
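As a rough illustration of what comparing by memory address means, here is a standalone sketch. The SketchObj type and sketch_equals function are hypothetical stand-ins and do not reproduce the Clownfish object model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an object type; not a Clownfish struct. */
typedef struct SketchObj {
    int payload;
} SketchObj;

/* Identity comparison: true only when both pointers refer to the same
 * object in memory, regardless of the objects' contents. */
static bool
sketch_equals(const SketchObj *self, const void *other) {
    return (const void*)self == other;
}
```

With this kind of comparison, two distinct objects holding identical data are still reported as unequal; a subclass that wants value-based equality would need to override the method.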

Inheritance

Lucy::Index::Similarity is a Clownfish::Obj.