Lucy::Analysis::Token - Unit of text.
my $token = Lucy::Analysis::Token->new(
    text         => 'blind',
    start_offset => 8,
    end_offset   => 13,
);

$token->set_text('mice');
Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses.
Each Token has 5 attributes: text, start_offset, end_offset, boost, and pos_inc.
The text attribute is a Unicode string encoded as UTF-8.
start_offset is the start point of the token text,
measured in Unicode code points from the top of the stored field;
end_offset delimits the corresponding closing boundary.
Together, start_offset and end_offset locate the Token within a larger context,
even if the Token’s text attribute gets modified, for instance by stemming.
The Token for “beating” in the text “beating a dead horse” begins life with a start_offset of 0 and an end_offset of 7; after stemming,
the text is “beat”,
but the start_offset is still 0 and the end_offset is still 7.
This allows “beating” to be highlighted correctly after a search matches “beat”.
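The "beating" example above can be sketched in plain Perl. This is only an illustration under stated assumptions: an ordinary hash stands in for a Token so that the snippet runs without Lucy installed, the stemming step is simulated by a direct assignment, and the <b> tags are an arbitrary highlight marker rather than anything Lucy emits.

```perl
use strict;
use warnings;

# The stored field, and a plain hash standing in for its first Token
# (hypothetical stand-in; a real Token would be a Lucy::Analysis::Token).
my $field = 'beating a dead horse';
my %token = ( text => 'beating', start_offset => 0, end_offset => 7 );

# Simulate a stemmer: the text changes, the offsets do not.
$token{text} = 'beat';

# The unchanged offsets still select the original word in the field.
my $length      = $token{end_offset} - $token{start_offset};
my $highlighted = $field;
substr( $highlighted, $token{start_offset}, $length )
    = '<b>' . substr( $field, $token{start_offset}, $length ) . '</b>';

print "$highlighted\n";    # prints "<b>beating</b> a dead horse"
```

Because the highlight is cut from the field using the offsets rather than from the token text, the full word "beating" is marked even though the indexed term is "beat".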
boost is a per-token weight.
Use this when you want to assign more or less importance to a particular token,
as you might for emboldened text within an HTML document, for example.
(Note: The field this token belongs to must be spec’d to use a posting of type RichPosting.)
pos_inc is the POSition INCrement,
measured in Tokens.
This attribute, which defaults to 1,
is an advanced tool for manipulating phrase matching.
Ordinarily, Tokens are assigned consecutive position numbers: 0, 1,
and 2 for
"three blind mice".
However, if you set the position increment for “blind” to, say, 1000,
then the three tokens will end up assigned to positions 0, 1000,
and 1001, and will no longer produce a phrase match for the query
"three blind mice".
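The position arithmetic described above can be sketched as a running sum of increments. This is a minimal illustration of the numbers given in the text, not Lucy's internal code: plain hashes stand in for Tokens, and starting the counter at -1 is an assumption chosen so that a first token with the default pos_inc of 1 lands at position 0.

```perl
use strict;
use warnings;

# Hashes standing in for Tokens (hypothetical stand-ins for illustration).
my @tokens = (
    { text => 'three', pos_inc => 1 },
    { text => 'blind', pos_inc => 1000 },    # the enlarged increment
    { text => 'mice',  pos_inc => 1 },
);

# Each token's position is the previous position plus its own pos_inc.
my $position = -1;
my @positions;
for my $token (@tokens) {
    $position += $token->{pos_inc};
    push @positions, $position;
}

print join( ', ', @positions ), "\n";    # prints "0, 1000, 1001"
```

A phrase query expects its terms at adjacent positions, so the gap between 0 and 1000 is what prevents "three blind mice" from matching.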
my $token = Lucy::Analysis::Token->new(
    text         => $text,          # required
    start_offset => $start_offset,  # required
    end_offset   => $end_offset,    # required
    boost        => 1.0,            # optional
    pos_inc      => 1,              # optional
);
my $text = $token->get_text;
Get the token's text.
$token->set_text($text);
Set the token's text.
my $int = $token->get_start_offset();
my $int = $token->get_end_offset();
my $float = $token->get_boost();
my $int = $token->get_pos_inc();
my $int = $token->get_len();
Lucy::Analysis::Token isa Clownfish::Obj.
Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
Apache License, Version 2.0.