This project has retired. For details please refer to its Attic page.
Lucy::Analysis::SnowballStopFilter – C API Documentation
Apache Lucy™

Lucy::Analysis::SnowballStopFilter

parcel Lucy
class variable LUCY_SNOWBALLSTOPFILTER
struct symbol lucy_SnowballStopFilter
class nickname lucy_SnowStop
header file Lucy/Analysis/SnowballStopFilter.h

Name

Lucy::Analysis::SnowballStopFilter – Suppress a “stoplist” of common words.

Description

A “stoplist” is a collection of “stopwords”: words that are common enough to be of little value when determining search results. For example, so many documents in English contain “the”, “if”, and “maybe” that blocking them may improve both performance and relevance.

Before filtering stopwords:

("i", "am", "the", "walrus")

After filtering stopwords:

("walrus")
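The before/after lists above can be reproduced with a minimal, standalone sketch in plain C. It does not use the Lucy API and is not Lucy's implementation; the three-word stoplist and the helper names `is_stopword` and `filter_stopwords` are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stoplist for illustration only; not Lucy's English list. */
static const char *const STOPLIST[] = { "i", "am", "the" };
static const size_t STOPLIST_LEN = sizeof(STOPLIST) / sizeof(STOPLIST[0]);

/* Return 1 if `word` appears in the stoplist, 0 otherwise. */
int is_stopword(const char *word) {
    for (size_t i = 0; i < STOPLIST_LEN; i++) {
        if (strcmp(word, STOPLIST[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Compact `tokens` in place, dropping stopwords.
 * Returns the number of tokens kept; their relative order is preserved. */
size_t filter_stopwords(const char **tokens, size_t count) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_stopword(tokens[i])) {
            tokens[kept++] = tokens[i];
        }
    }
    return kept;
}
```

Calling `filter_stopwords` on the four tokens "i", "am", "the", "walrus" leaves one token, "walrus", at index 0.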

SnowballStopFilter provides default stoplists for the languages listed below, courtesy of the Snowball project, or you may supply your own.

|-----------------------|
| ISO CODE | LANGUAGE   |
|-----------------------|
| da       | Danish     |
| de       | German     |
| en       | English    |
| es       | Spanish    |
| fi       | Finnish    |
| fr       | French     |
| hu       | Hungarian  |
| it       | Italian    |
| nl       | Dutch      |
| no       | Norwegian  |
| pt       | Portuguese |
| sv       | Swedish    |
| ru       | Russian    |
|-----------------------|
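An application may want to check a language code against the table above before constructing a filter. The following standalone sketch mirrors the table in plain C; the `LanguageEntry` type and the `language_for_code` helper are hypothetical and are not part of Lucy. It assumes exact, lowercase matching, which this page does not specify.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table above. Hypothetical type, not part of Lucy. */
typedef struct {
    const char *iso_code;
    const char *language;
} LanguageEntry;

static const LanguageEntry SUPPORTED[] = {
    { "da", "Danish" },     { "de", "German" },  { "en", "English" },
    { "es", "Spanish" },    { "fi", "Finnish" }, { "fr", "French" },
    { "hu", "Hungarian" },  { "it", "Italian" }, { "nl", "Dutch" },
    { "no", "Norwegian" },  { "pt", "Portuguese" },
    { "sv", "Swedish" },    { "ru", "Russian" },
};

/* Return the language name for `iso_code`, or NULL if the code is not
 * in the table. Matching is exact and case-sensitive (an assumption). */
const char *language_for_code(const char *iso_code) {
    size_t n = sizeof(SUPPORTED) / sizeof(SUPPORTED[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(iso_code, SUPPORTED[i].iso_code) == 0) {
            return SUPPORTED[i].language;
        }
    }
    return NULL;
}
```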

Functions

new
lucy_SnowballStopFilter* // incremented
lucy_SnowStop_new(
    cfish_String *language,
    cfish_Hash *stoplist
);

Create a new SnowballStopFilter.

stoplist

A hash with stopwords as the keys.

language

The ISO code for a supported language.
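The stoplist is described as a hash keyed by stopword, which makes a membership test depend only on whether a key is present. The standalone sketch below illustrates that idea with a small fixed-size hash set in plain C. It is not Clownfish's `cfish_Hash` and makes no claim about how Lucy stores or looks up stopwords; `StopSet`, `stopset_add`, and `stopset_contains` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 64  /* power of two; must exceed the number of keys */

/* Hypothetical key-only hash set. A NULL entry marks an empty slot.
 * Initialize with `StopSet set = { { NULL } };` before use. */
typedef struct {
    const char *keys[SLOT_COUNT];
} StopSet;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Add `key` using linear probing. The pointer is stored, not copied,
 * so the string must outlive the set.
 * Returns 0 on success (or if already present), -1 if the set is full. */
int stopset_add(StopSet *set, const char *key) {
    size_t start = hash_str(key) & (SLOT_COUNT - 1);
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        size_t slot = (start + i) & (SLOT_COUNT - 1);
        if (set->keys[slot] == NULL) {
            set->keys[slot] = key;
            return 0;
        }
        if (strcmp(set->keys[slot], key) == 0) {
            return 0;
        }
    }
    return -1;
}

/* Return 1 if `key` is present in the set, 0 otherwise. */
int stopset_contains(const StopSet *set, const char *key) {
    size_t start = hash_str(key) & (SLOT_COUNT - 1);
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        size_t slot = (start + i) & (SLOT_COUNT - 1);
        if (set->keys[slot] == NULL) {
            return 0;  /* an empty slot ends the probe sequence */
        }
        if (strcmp(set->keys[slot], key) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Only the keys matter here, which matches the description of the stoplist parameter: a word is suppressed when it is present as a key.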

init
lucy_SnowballStopFilter*
lucy_SnowStop_init(
    lucy_SnowballStopFilter *self,
    cfish_String *language,
    cfish_Hash *stoplist
);

Initialize a SnowballStopFilter.

stoplist

A hash with stopwords as the keys.

language

The ISO code for a supported language.

Methods

Transform
lucy_Inversion* // incremented
lucy_SnowStop_Transform(
    lucy_SnowballStopFilter *self,
    lucy_Inversion *inversion
);

Take a single Inversion as input and return an Inversion, either the same one (presumably transformed in some way), or a new one.

inversion

An inversion.

Equals
bool
lucy_SnowStop_Equals(
    lucy_SnowballStopFilter *self,
    cfish_Obj *other
);

Indicate whether two objects are the same. By default, compares the memory address.

other

Another Obj.

Dump
cfish_Obj* // incremented
lucy_SnowStop_Dump(
    lucy_SnowballStopFilter *self
);

Dump the analyzer as a hash.

Subclasses should call Dump() on the superclass. The returned object is a hash which should be populated with parameters of the analyzer.

Returns: A hash containing a description of the analyzer.

Load
cfish_Obj* // incremented
lucy_SnowStop_Load(
    lucy_SnowballStopFilter *self,
    cfish_Obj *dump
);

Reconstruct an analyzer from a dump.

Subclasses should first call Load() on the superclass. The returned object is an analyzer which should be reconstructed by setting the dumped parameters from the hash contained in dump.

Note that the invocant analyzer is unused.

dump

A hash.

Returns: An analyzer.

Methods inherited from Lucy::Analysis::Analyzer

Transform_Text
lucy_Inversion* // incremented
lucy_SnowStop_Transform_Text(
    lucy_SnowballStopFilter *self,
    cfish_String *text
);

Kick off an analysis chain, creating an Inversion from string input. The default implementation simply creates an initial Inversion with a single Token, then calls Transform(), but occasionally subclasses will provide an optimized implementation which minimizes string copies.

text

A string.

Split
cfish_Vector* // incremented
lucy_SnowStop_Split(
    lucy_SnowballStopFilter *self,
    cfish_String *text
);

Analyze text and return an array of token texts.

text

A string.

Inheritance

Lucy::Analysis::SnowballStopFilter is a Lucy::Analysis::Analyzer is a Clownfish::Obj.