This project has retired. For details please refer to its Attic page.
Lucy::Analysis::PolyAnalyzer – C API Documentation

Lucy::Analysis::PolyAnalyzer

parcel: Lucy
class variable: LUCY_POLYANALYZER
struct symbol: lucy_PolyAnalyzer
class nickname: lucy_PolyAnalyzer
header file: Lucy/Analysis/PolyAnalyzer.h

Name

Lucy::Analysis::PolyAnalyzer – Multiple Analyzers in series.

Description

A PolyAnalyzer is a series of Analyzers, each of which will be called upon to “analyze” text in turn. You can either provide the Analyzers yourself, or you can specify a supported language, in which case a PolyAnalyzer consisting of a CaseFolder, a RegexTokenizer, and a SnowballStemmer will be generated for you.

The language parameter is DEPRECATED. Use EasyAnalyzer instead.

Supported languages:

en => English,
da => Danish,
de => German,
es => Spanish,
fi => Finnish,
fr => French,
hu => Hungarian,
it => Italian,
nl => Dutch,
no => Norwegian,
pt => Portuguese,
ro => Romanian,
ru => Russian,
sv => Swedish,
tr => Turkish,

Functions

new
lucy_PolyAnalyzer* // incremented
lucy_PolyAnalyzer_new(
    cfish_String *language,
    cfish_Vector *analyzers
);

Create a new PolyAnalyzer.

language

An ISO code from the list of supported languages. DEPRECATED. Use EasyAnalyzer instead.

analyzers

An array of Analyzers. The order of the analyzers matters. Don’t put a SnowballStemmer before a RegexTokenizer (can’t stem whole documents or paragraphs – just individual words), or a SnowballStopFilter after a SnowballStemmer (stemmed words, e.g. “themselv”, will not appear in a stoplist). In general, the sequence should be: tokenize, normalize, stopalize, stem.

init
lucy_PolyAnalyzer*
lucy_PolyAnalyzer_init(
    lucy_PolyAnalyzer *self,
    cfish_String *language,
    cfish_Vector *analyzers
);

Initialize a PolyAnalyzer.

language

An ISO code from the list of supported languages. DEPRECATED. Use EasyAnalyzer instead.

analyzers

An array of Analyzers. The order of the analyzers matters. Don’t put a SnowballStemmer before a RegexTokenizer (can’t stem whole documents or paragraphs – just individual words), or a SnowballStopFilter after a SnowballStemmer (stemmed words, e.g. “themselv”, will not appear in a stoplist). In general, the sequence should be: tokenize, normalize, stopalize, stem.

Methods

Get_Analyzers
cfish_Vector*
lucy_PolyAnalyzer_Get_Analyzers(
    lucy_PolyAnalyzer *self
);

Getter for “analyzers” member.

Transform
lucy_Inversion* // incremented
lucy_PolyAnalyzer_Transform(
    lucy_PolyAnalyzer *self,
    lucy_Inversion *inversion
);

Take a single Inversion as input and return an Inversion, either the same one (presumably transformed in some way) or a new one.

inversion

An inversion.

Transform_Text
lucy_Inversion* // incremented
lucy_PolyAnalyzer_Transform_Text(
    lucy_PolyAnalyzer *self,
    cfish_String *text
);

Kick off an analysis chain, creating an Inversion from string input. The default implementation simply creates an initial Inversion with a single Token, then calls Transform(), but occasionally subclasses will provide an optimized implementation which minimizes string copies.

text

A string.

Equals
bool
lucy_PolyAnalyzer_Equals(
    lucy_PolyAnalyzer *self,
    cfish_Obj *other
);

Indicate whether two objects are the same. By default, compares the memory address.

other

Another Obj.

Dump
cfish_Obj* // incremented
lucy_PolyAnalyzer_Dump(
    lucy_PolyAnalyzer *self
);

Dump the analyzer as a hash.

Subclasses should call Dump() on the superclass. The returned object is a hash which should be populated with parameters of the analyzer.

Returns: A hash containing a description of the analyzer.

Load
lucy_PolyAnalyzer* // incremented
lucy_PolyAnalyzer_Load(
    lucy_PolyAnalyzer *self,
    cfish_Obj *dump
);

Reconstruct an analyzer from a dump.

Subclasses should first call Load() on the superclass. The returned object is an analyzer which should be reconstructed by setting the dumped parameters from the hash contained in dump.

Note that the invocant analyzer is unused.

dump

A hash.

Returns: An analyzer.

Methods inherited from Lucy::Analysis::Analyzer

Split
cfish_Vector* // incremented
lucy_PolyAnalyzer_Split(
    lucy_PolyAnalyzer *self,
    cfish_String *text
);

Analyze text and return an array of token texts.

text

A string.

Inheritance

Lucy::Analysis::PolyAnalyzer is a Lucy::Analysis::Analyzer, which is a Clownfish::Obj.