Lucy::Analysis::SnowballStopFilter - Suppress a “stoplist” of common words.
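    use Lucy;

    my $tokenizer  = Lucy::Analysis::StandardTokenizer->new;
    my $normalizer = Lucy::Analysis::Normalizer->new;
    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        language => 'fr',
    );
    my $stemmer    = Lucy::Analysis::SnowballStemmer->new( language => 'fr' );

    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
    );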
A “stoplist” is a collection of “stopwords”: words that are common enough to be of little value when determining search results. For example, so many English documents contain “the”, “if”, and “maybe” that blocking them may improve both performance and relevance.
Before filtering stopwords:
("i", "am", "the", "walrus")
After filtering stopwords:
("walrus")
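The before/after example above can be modeled in a few lines of plain Perl. This is a minimal sketch of the idea only, not Lucy's implementation: `filter_stopwords` is a hypothetical helper, and the stoplist is a tiny hand-written hash whose keys are the stopwords.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only tokens whose text is NOT a key in %$stoplist.
# This models the behavior; SnowballStopFilter does the real work inside Lucy.
sub filter_stopwords {
    my ( $stoplist, @tokens ) = @_;
    return grep { !exists $stoplist->{$_} } @tokens;
}

# A tiny English stoplist; the keys are the stopwords.
my %stoplist = map { $_ => 1 } qw( i am the if maybe );

my @before = ( "i", "am", "the", "walrus" );
my @after  = filter_stopwords( \%stoplist, @before );

print join( ", ", @after ), "\n";    # prints "walrus"
```

Because the lookup is an exact hash-key match, "I" and "i" are different keys, which is why a case-folding step normally runs before the stop filter.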
SnowballStopFilter provides default stoplists for several languages, courtesy of the Snowball project, or you may supply your own.
    |-----------------------|
    | ISO CODE | LANGUAGE   |
    |-----------------------|
    | da       | Danish     |
    | de       | German     |
    | en       | English    |
    | es       | Spanish    |
    | fi       | Finnish    |
    | fr       | French     |
    | hu       | Hungarian  |
    | it       | Italian    |
    | nl       | Dutch      |
    | no       | Norwegian  |
    | pt       | Portuguese |
    | sv       | Swedish    |
    | ru       | Russian    |
    |-----------------------|
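If your application takes the language code from configuration or user input, you may want to check it against the table above before building a filter. The sketch below assumes that approach; `%SNOWBALL_STOPLISTS` and `stoplist_language` are hypothetical names in your own code, not part of Lucy.

```perl
use strict;
use warnings;

# ISO codes from the table above, mapped to language names.
my %SNOWBALL_STOPLISTS = (
    da => 'Danish',    de => 'German',     en => 'English',
    es => 'Spanish',   fi => 'Finnish',    fr => 'French',
    hu => 'Hungarian', it => 'Italian',    nl => 'Dutch',
    no => 'Norwegian', pt => 'Portuguese', sv => 'Swedish',
    ru => 'Russian',
);

# Hypothetical helper: die early on a code with no built-in stoplist,
# otherwise return the language name for that code.
sub stoplist_language {
    my $code = lc(shift);
    die "No built-in stoplist for '$code'\n"
        unless exists $SNOWBALL_STOPLISTS{$code};
    return $SNOWBALL_STOPLISTS{$code};
}

print stoplist_language('fr'), "\n";    # prints "French"
```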
    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        language => 'de',
    );

    # or...

    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        stoplist => \%stoplist,
    );
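The `stoplist` option takes a hash reference. A minimal sketch, assuming you start from a plain word list: `build_stoplist` is a hypothetical helper, and the words are lowercased on the assumption that tokens reach the filter already case-folded (for example by a Normalizer earlier in the chain, as in the synopsis). Lucy is loaded at runtime so the sketch still runs where Lucy is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a word list into a hash whose keys are stopwords.
sub build_stoplist {
    my @words = @_;
    # Lowercase each entry so it matches case-folded token text.
    my %stoplist = map { lc($_) => 1 } @words;
    return \%stoplist;
}

my $stoplist = build_stoplist(qw( The If Maybe ));

# Build the filter only when Lucy is available.
if ( eval { require Lucy; 1 } ) {
    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        stoplist => $stoplist,
    );
}
```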
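Create a new SnowballStopFilter. Supply either language, an ISO code from the table above that selects a built-in Snowball stoplist, or stoplist, a reference to a hash whose keys are the stopwords.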
my $inversion = $snowball_stop_filter->transform($inversion);
Takes a single Inversion as input and returns an Inversion: either the same one (presumably transformed in some way) or a new one.
Lucy::Analysis::SnowballStopFilter isa Lucy::Analysis::Analyzer isa Clownfish::Obj.
Copyright © 2010-2015 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.