Lucy::Analysis::Normalizer – Apache Lucy Documentation

NAME

Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping.

SYNOPSIS

my $normalizer = Lucy::Analysis::Normalizer->new;

my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $tokenizer, $normalizer, $stemmer ],
);

DESCRIPTION

Normalizer is an Analyzer that normalizes tokens to one of the Unicode normalization forms. Optionally, it also performs Unicode case folding and strips accents by converting accented characters to their base characters.

If you use highlighting, run Normalizer after tokenization. Normalization might add or remove characters, so offsets computed on already-normalized text may no longer line up with the original text.
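
To make the "add or remove characters" point concrete, the following sketch uses the core Unicode::Normalize module rather than Lucy itself. It only illustrates how Unicode normalization can change string length; it makes no claim about how Normalizer is implemented internally, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC NFD);

# U+FB01 LATIN SMALL LIGATURE FI is a single character;
# NFKC replaces it with the two characters "f" and "i".
my $ligature = "\x{FB01}";
my $expanded = NFKC($ligature);

# U+00E9 (precomposed e-acute) is a single character; NFD splits it
# into "e" followed by U+0301 COMBINING ACUTE ACCENT.
my $precomposed = "\x{00E9}";
my $decomposed  = NFD($precomposed);

printf "ligature: %d -> %d characters\n", length $ligature,    length $expanded;
printf "e-acute:  %d -> %d characters\n", length $precomposed, length $decomposed;
```

Both strings grow from one character to two, which is enough to shift every offset that follows them in a field.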

CONSTRUCTORS

new

my $normalizer = Lucy::Analysis::Normalizer->new(
    normalization_form => 'NFKC',
    case_fold          => 1,
    strip_accents      => 0,
);

Create a new Normalizer.

normalization_form - Unicode normalization form, can be one of ‘NFC’, ‘NFKC’, ‘NFD’, ‘NFKD’. Defaults to ‘NFKC’.
case_fold - Perform case folding, default is true.
strip_accents - Strip accents, default is false.
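
As a rough mental model of what the three options mean, here is a sketch in plain Perl. The helper normalize_text is hypothetical and not part of Lucy; it assumes accent stripping can be approximated by decomposing and dropping combining marks, and the order of the steps is an assumption that may differ from what Normalizer actually does.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);    # fc requires Perl 5.16 or later
use Unicode::Normalize qw(NFC NFKC NFD NFKD);

my %FORM = (NFC => \&NFC, NFKC => \&NFKC, NFD => \&NFD, NFKD => \&NFKD);

# Hypothetical helper (not a Lucy API) that mimics the documented
# options and their documented defaults.
sub normalize_text {
    my ($text, %opt) = @_;
    my $form  = $opt{normalization_form} // 'NFKC';
    my $fold  = $opt{case_fold}          // 1;
    my $strip = $opt{strip_accents}      // 0;
    my $normalize = $FORM{$form}
        or die "Unknown normalization form: $form";

    if ($strip) {
        # Assumption: decompose, then drop nonspacing combining marks.
        $text = NFD($text);
        $text =~ s/\p{Mn}+//g;
    }
    $text = fc($text) if $fold;        # Unicode case folding
    return $normalize->($text);        # finish in the requested form
}

binmode STDOUT, ':encoding(UTF-8)';
print normalize_text("Caf\x{E9}"), "\n";                      # defaults
print normalize_text("Caf\x{E9}", strip_accents => 1), "\n";  # accents stripped too
print normalize_text("\x{FB01}ne", case_fold => 0), "\n";     # NFKC only
```

With the defaults, "Café" is folded to lower case and keeps its accent; with strip_accents enabled it becomes "cafe"; and NFKC alone expands the "fi" ligature so the last line prints "fine".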

METHODS

transform

my $inversion = $normalizer->transform($inversion);

Takes a single Inversion as input and returns an Inversion, either the same one (presumably transformed in some way) or a new one.

inversion - An inversion.

INHERITANCE

Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Clownfish::Obj.

Copyright © 2010-2015 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.