This project has retired. For details please refer to its Attic page.
Lucy::Analysis::Normalizer – C API Documentation
Apache Lucy™

Lucy::Analysis::Normalizer

parcel Lucy
class variable: LUCY_NORMALIZER
struct symbol: lucy_Normalizer
class nickname: lucy_Normalizer
header file: Lucy/Analysis/Normalizer.h

Name

Lucy::Analysis::Normalizer – Unicode normalization, case folding and accent stripping.

Description

Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it performs Unicode case folding and converts accented characters to their base character.

If you use highlighting, Normalizer should be run after tokenization because it might add or remove characters.
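As an illustration of how normalization can add or remove characters, the following sketch compares the same word in composed (NFC) and decomposed (NFD) form. It is plain standard C with no dependency on Lucy; the helper `count_code_points` and the two sample constants are hypothetical names introduced only for this example, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. */
static size_t
count_code_points(const char *utf8) {
    assert(utf8 != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char*)utf8; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* "café" with a precomposed U+00E9 (composed form, as in NFC). */
static const char CAFE_NFC[] = "caf\xC3\xA9";

/* "café" as "e" followed by U+0301 COMBINING ACUTE ACCENT
 * (decomposed form, as in NFD). */
static const char CAFE_NFD[] = "cafe\xCC\x81";
```

Here `count_code_points(CAFE_NFC)` is 4 and `count_code_points(CAFE_NFD)` is 5, although both strings render as the same word. Offsets recorded against the original text therefore stop lining up once the text has been normalized.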

Functions

new
lucy_Normalizer* // incremented
lucy_Normalizer_new(
    cfish_String *normalization_form,
    bool case_fold,
    bool strip_accents
);

Create a new Normalizer.

normalization_form

Unicode normalization form; one of ‘NFC’, ‘NFKC’, ‘NFD’, or ‘NFKD’. Defaults to ‘NFKC’.

case_fold

Perform case folding. Defaults to true.

strip_accents

Strip accents. Defaults to false.
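To make the two boolean options more concrete, the sketch below applies heavily simplified versions of case folding and accent stripping to UTF-8 text that is already decomposed. It is not Lucy's implementation: the function name `toy_normalize` is hypothetical, folding is limited to ASCII letters, and "accents" are assumed to be only the combining marks in the range U+0300 to U+036F. Real Unicode case folding and mark stripping cover far more characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Lucy. Rewrites `text` in place and
 * returns the new length in bytes. Assumes valid, NUL-terminated UTF-8
 * that is already in a decomposed form. */
static size_t
toy_normalize(char *text, bool case_fold, bool strip_accents) {
    assert(text != NULL);
    unsigned char *in  = (unsigned char*)text;
    unsigned char *out = (unsigned char*)text;

    while (*in != '\0') {
        /* U+0300..U+036F encode as CC 80..CC BF and CD 80..CD AF. */
        bool combining =
            (in[0] == 0xCC && in[1] >= 0x80 && in[1] <= 0xBF)
            || (in[0] == 0xCD && in[1] >= 0x80 && in[1] <= 0xAF);
        if (strip_accents && combining) {
            in += 2;    /* drop the two-byte combining mark */
            continue;
        }
        unsigned char c = *in++;
        if (case_fold && c >= 'A' && c <= 'Z') {
            c = (unsigned char)(c - 'A' + 'a');    /* ASCII-only folding */
        }
        *out++ = c;
    }
    *out = '\0';
    return (size_t)(out - (unsigned char*)text);
}
```

With both options enabled, the decomposed input `"Cafe\xCC\x81"` (6 bytes) becomes `"cafe"` (4 bytes). With `strip_accents` set to false, only the leading `C` changes and the combining mark is kept.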

init
lucy_Normalizer*
lucy_Normalizer_init(
    lucy_Normalizer *self,
    cfish_String *normalization_form,
    bool case_fold,
    bool strip_accents
);

Initialize a Normalizer.

normalization_form

Unicode normalization form; one of ‘NFC’, ‘NFKC’, ‘NFD’, or ‘NFKD’. Defaults to ‘NFKC’.

case_fold

Perform case folding. Defaults to true.

strip_accents

Strip accents. Defaults to false.

Methods

Transform
lucy_Inversion* // incremented
lucy_Normalizer_Transform(
    lucy_Normalizer *self,
    lucy_Inversion *inversion
);

Take a single Inversion as input and return an Inversion, either the same one (presumably transformed in some way) or a new one.

inversion

An inversion.

Dump
cfish_Hash* // incremented
lucy_Normalizer_Dump(
    lucy_Normalizer *self
);

Dump the analyzer as a hash.

Subclasses should call Dump() on the superclass. The returned object is a hash which should be populated with parameters of the analyzer.

Returns: A hash containing a description of the analyzer.

Load
lucy_Normalizer* // incremented
lucy_Normalizer_Load(
    lucy_Normalizer *self,
    cfish_Obj *dump
);

Reconstruct an analyzer from a dump.

Subclasses should first call Load() on the superclass. The returned object is an analyzer which should be reconstructed by setting the dumped parameters from the hash contained in dump.

Note that the invocant analyzer is unused.

dump

A hash.

Returns: An analyzer.
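The round trip that Dump and Load are meant to support can be sketched without Lucy. Everything in the block below is a hypothetical stand-in: `ToySettings` is not a Lucy type, and the dump is a short text record rather than a Clownfish hash. The sketch only shows the general idea that loading a dump yields a new object carrying the same parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an analyzer's settings; not a Lucy type. */
typedef struct {
    char form[8];        /* "NFC", "NFKC", "NFD" or "NFKD" */
    bool case_fold;
    bool strip_accents;
} ToySettings;

/* "Dump": write the settings as text. Returns false if buf is too small. */
static bool
toy_dump(const ToySettings *s, char *buf, size_t size) {
    assert(s != NULL && buf != NULL);
    int n = snprintf(buf, size, "%s %d %d", s->form,
                     (int)s->case_fold, (int)s->strip_accents);
    return n >= 0 && (size_t)n < size;
}

/* "Load": fill a fresh settings object from a dump.
 * Returns false if the dump cannot be parsed. */
static bool
toy_load(const char *dump, ToySettings *out) {
    assert(dump != NULL && out != NULL);
    int fold = 0, strip = 0;
    if (sscanf(dump, "%7s %d %d", out->form, &fold, &strip) != 3) {
        return false;
    }
    out->case_fold     = (fold != 0);
    out->strip_accents = (strip != 0);
    return true;
}

/* Compare two settings objects by value rather than by address. */
static bool
toy_equals(const ToySettings *a, const ToySettings *b) {
    assert(a != NULL && b != NULL);
    return strcmp(a->form, b->form) == 0
           && a->case_fold == b->case_fold
           && a->strip_accents == b->strip_accents;
}
```

Dumping a `ToySettings` value and loading the result into a second object gives two distinct objects for which `toy_equals` returns true.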

Equals
bool
lucy_Normalizer_Equals(
    lucy_Normalizer *self,
    cfish_Obj *other
);

Indicate whether two objects are the same. By default, compares the memory address.

other

Another Obj.

Methods inherited from Lucy::Analysis::Analyzer

Transform_Text
lucy_Inversion* // incremented
lucy_Normalizer_Transform_Text(
    lucy_Normalizer *self,
    cfish_String *text
);

Kick off an analysis chain, creating an Inversion from string input. The default implementation simply creates an initial Inversion with a single Token, then calls Transform(), but occasionally subclasses will provide an optimized implementation which minimizes string copies.

text

A string.

Split
cfish_Vector* // incremented
lucy_Normalizer_Split(
    lucy_Normalizer *self,
    cfish_String *text
);

Analyze text and return an array of token texts.

text

A string.

Inheritance

Lucy::Analysis::Normalizer is a Lucy::Analysis::Analyzer is a Clownfish::Obj.