Qizx/db 2.1 API

com.qizx.api
Interface Indexing.WordSieve

All Superinterfaces:
Indexing.Sieve
All Known Implementing Classes:
DefaultWordSieve
Enclosing interface:
Indexing

public static interface Indexing.WordSieve
extends Indexing.Sieve

Pluggable text analyzer for custom full-text indexing and query. Analyzes text chunks to extract and normalize words.

To parse words, the sieve is first initialized on a text chunk with start(String) or start(char[], int). The nextWord() method is then called repeatedly until it returns null, which signals that the end of the chunk has been reached.
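
The calling sequence described above can be sketched as follows. This is only an illustration: LoopDemoSieve is a hypothetical stand-in written for this example (it is not the Qizx DefaultWordSieve), and its splitting and lowercasing rules are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in exposing only the two methods used by the loop.
// Assumed rules: words are runs of letters or digits, normalized to lowercase.
class LoopDemoSieve {
    private char[] text = new char[0];
    private int length;
    private int pos;

    void start(String s) {
        text = s.toCharArray();
        length = text.length;
        pos = 0;
    }

    char[] nextWord() {
        // Skip separators before the next word.
        while (pos < length && !Character.isLetterOrDigit(text[pos])) pos++;
        if (pos >= length) return null;          // end of the chunk
        int begin = pos;
        while (pos < length && Character.isLetterOrDigit(text[pos])) pos++;
        char[] word = new char[pos - begin];     // a new array for each word
        for (int i = 0; i < word.length; i++) {
            word[i] = Character.toLowerCase(text[begin + i]);
        }
        return word;
    }
}

class ParseLoop {
    // Initializes the sieve on a chunk, then calls nextWord() until it returns null.
    static List<String> words(LoopDemoSieve sieve, String chunk) {
        List<String> out = new ArrayList<>();
        sieve.start(chunk);
        for (char[] w = sieve.nextWord(); w != null; w = sieve.nextWord()) {
            out.add(new String(w));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello, xml, world]
        System.out.println(words(new LoopDemoSieve(), "Hello, XML World!"));
    }
}
```

With a real Indexing.WordSieve implementation, the same loop shape should apply, since only start and nextWord are involved.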


Method Summary
 char charAt(int ahead)
          Returns the source character at a given position.
 Indexing.WordSieve copy()
          Creates a carbon copy of this object.
 boolean isWordPart(char c)
          Returns true if the char can be part of a word.
 boolean isWordStart(char c)
          Returns true if the char can be at the start of a word.
 char mapChar(char c)
          Normalizes a character.
 char multiCharsWildcard()
          Returns the wildcard character which matches several characters.
 char nextChar()
          Moves to the next source character and returns it, or returns 0 if at the end.
 char[] nextWord()
          Returns the next normalized word, or null if the end of the fragment to analyze is reached.
 char singleCharWildcard()
          Returns the wildcard character which matches a single character.
 void start(char[] text, int length)
          Starts the analysis of a new text chunk.
 void start(String text)
          Starts the analysis of a new text chunk.
 int wordLength()
          Returns the original length of the last word returned by nextWord.
 int wordOffset()
          Returns the offset of the last word returned by nextWord.
 
Methods inherited from interface com.qizx.api.Indexing.Sieve
getParameters, setParameters
 

Method Detail

start

public void start(char[] text,
                  int length)
Starts the analysis of a new text chunk.

Parameters:
text - characters to analyze, at indexes 0 to length - 1
length - number of characters to analyze in the text array

start

public void start(String text)
Starts the analysis of a new text chunk.

Parameters:
text - fragment to analyze

nextWord

public char[] nextWord()
Returns the next normalized word, or null if the end of the fragment to analyze is reached.

Returns:
a character array containing the word found. Caution: an implementation must return a *new* char array for each word.

wordOffset

public int wordOffset()
Returns the offset of the last word returned by nextWord.

Returns:
an index in the source text fragment

wordLength

public int wordLength()
Returns the original length of the last word returned by nextWord. This is most often equal to the length of the array returned by nextWord, but can differ if normalization or stemming is performed.

Returns:
word length
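
As an illustration of how wordOffset and wordLength relate to the array returned by nextWord, here is a minimal sketch. OffsetDemoSieve is hypothetical: it lowercases words and strips one trailing 's' as a crude stand-in for stemming, which is an assumption made only to produce a word whose original length differs from its normalized length.

```java
// Hypothetical sieve for illustration only; not Qizx code.
class OffsetDemoSieve {
    private char[] text = new char[0];
    private int length, pos, lastOffset, lastLength;

    void start(String s) {
        text = s.toCharArray();
        length = text.length;
        pos = 0;
    }

    char[] nextWord() {
        while (pos < length && !Character.isLetter(text[pos])) pos++;
        if (pos >= length) return null;
        lastOffset = pos;
        while (pos < length && Character.isLetter(text[pos])) pos++;
        lastLength = pos - lastOffset;            // length in the source text
        int n = lastLength;
        // Crude "stemming" (assumption): drop one trailing 's'.
        if (n > 1 && Character.toLowerCase(text[pos - 1]) == 's') n--;
        char[] word = new char[n];                // a new array for each word
        for (int i = 0; i < n; i++) {
            word[i] = Character.toLowerCase(text[lastOffset + i]);
        }
        return word;
    }

    int wordOffset() { return lastOffset; }
    int wordLength() { return lastLength; }
}

class OffsetDemo {
    // Wraps each source word in [ ] using wordOffset() and wordLength().
    static String mark(String chunk) {
        OffsetDemoSieve sieve = new OffsetDemoSieve();
        sieve.start(chunk);
        StringBuilder out = new StringBuilder();
        int copied = 0;
        while (sieve.nextWord() != null) {
            int from = sieve.wordOffset();
            int to = from + sieve.wordLength();
            out.append(chunk, copied, from).append('[').append(chunk, from, to).append(']');
            copied = to;
        }
        return out.append(chunk, copied, chunk.length()).toString();
    }

    // Lists "normalized:arrayLength/wordLength" for each word.
    static String lengths(String chunk) {
        OffsetDemoSieve sieve = new OffsetDemoSieve();
        sieve.start(chunk);
        StringBuilder out = new StringBuilder();
        for (char[] w = sieve.nextWord(); w != null; w = sieve.nextWord()) {
            if (out.length() > 0) out.append(' ');
            out.append(w).append(':').append(w.length).append('/').append(sieve.wordLength());
        }
        return out.toString();
    }
}
```

For the chunk "Two cats.", mark returns "[Two] [cats]." and lengths returns "two:3/3 cat:3/4": the offsets and lengths refer to the source text, while the returned arrays hold the normalized words.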

charAt

public char charAt(int ahead)
Returns the source character at a given position.

Parameters:
ahead - an offset relative to the current position of the sieve in the source text. If equal to 0, the character at the current position is returned.
Returns:
the character at the specified position, or character 0 if out of bounds (no exception is raised).

nextChar

public char nextChar()
Moves to the next source character and returns it, or returns 0 if at the end.

Returns:
the next source character
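
The lookahead contract of charAt and nextChar could be implemented along these lines. LookaheadCursor is a hypothetical class written for this sketch. It assumes that nextChar advances by one position and then returns the character at the new position, and it uses only non-negative values of ahead, since the description above does not say whether negative offsets are allowed.

```java
// Hypothetical cursor illustrating the documented contract; not Qizx code.
class LookaheadCursor {
    private final char[] text;
    private int pos;   // current position in the source text

    LookaheadCursor(String s) {
        text = s.toCharArray();
    }

    // Character at (current position + ahead), or 0 when out of bounds.
    char charAt(int ahead) {
        int i = pos + ahead;
        return (i >= 0 && i < text.length) ? text[i] : 0;
    }

    // Advances one character and returns the new current character, or 0 at the end.
    char nextChar() {
        if (pos < text.length) pos++;
        return charAt(0);
    }
}

class LookaheadDemo {
    // Reads the whole source through charAt(0) and nextChar() until 0 is returned.
    static String readAll(String s) {
        LookaheadCursor c = new LookaheadCursor(s);
        StringBuilder out = new StringBuilder();
        for (char ch = c.charAt(0); ch != 0; ch = c.nextChar()) {
            out.append(ch);
        }
        return out.toString();
    }
}
```

Because 0 doubles as the end marker, a loop like readAll stops early if the source itself contains a NUL character.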

mapChar

public char mapChar(char c)
Normalizes a character.

Parameters:
c - a source character to convert to its normalized value in the returned word, for example by converting it to uppercase.
Returns:
normalized character
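
One possible normalization, sketched with standard JDK calls only: lowercase the character and drop diacritics when the character decomposes into a base letter followed by combining marks. This policy is an assumption chosen for illustration; the description above only requires that a normalized character be returned.

```java
import java.text.Normalizer;

class CharMapping {
    // Hypothetical mapChar: folds case and strips diacritics where possible.
    static char mapChar(char c) {
        String d = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
        // Keep the base letter only when the rest of the decomposition
        // consists of combining marks (for example E + U+0301 for U+00C9).
        boolean marksOnly = d.length() > 1;
        for (int i = 1; i < d.length(); i++) {
            if (Character.getType(d.charAt(i)) != Character.NON_SPACING_MARK) {
                marksOnly = false;
            }
        }
        char base = marksOnly ? d.charAt(0) : c;
        return Character.toLowerCase(base);
    }

    // Applies mapChar to every character of a word.
    static String normalize(String word) {
        char[] out = new char[word.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = mapChar(word.charAt(i));
        }
        return new String(out);
    }
}
```

Mapping one char to one char keeps the normalized word the same length as the source word; a normalization that changes the length is the case where wordLength differs from the length of the returned array.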

isWordStart

public boolean isWordStart(char c)
Returns true if the char can be at the start of a word.

Parameters:
c - a source character
Returns:
true if the char can be at the start of a word.

isWordPart

public boolean isWordPart(char c)
Returns true if the char can be part of a word.

Parameters:
c - a source character
Returns:
true if the char can be part of a word (not a punctuation or space).
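
To show how isWordStart and isWordPart can work together, here is a small sketch of a splitter driven by the two predicates. The policy is hypothetical: a word must start with a letter and may continue with letters or digits. How an actual sieve uses these methods internally is not stated above, so the scanning loop is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

class WordRules {
    // Hypothetical policy: a word starts with a letter...
    static boolean isWordStart(char c) {
        return Character.isLetter(c);
    }

    // ...and continues with letters or digits (no punctuation or space).
    static boolean isWordPart(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Splits text into words using only the two predicates.
    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isWordStart(text.charAt(i))) {
                i++;                       // not a valid first character, skip it
                continue;
            }
            int begin = i++;
            while (i < text.length() && isWordPart(text.charAt(i))) i++;
            words.add(text.substring(begin, i));
        }
        return words;
    }
}
```

Under this policy, split("arm64, 2 cores") yields arm64 and cores: the digits inside arm64 are accepted as word parts, while the standalone 2 is skipped because a digit cannot start a word.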

multiCharsWildcard

public char multiCharsWildcard()
Returns the wildcard character which matches several characters. Used for full-text query parsing. In SQL LIKE patterns it is '%'; in Unix glob patterns it is '*'.

Returns:
the wildcard character

singleCharWildcard

public char singleCharWildcard()
Returns the wildcard character which matches a single character. Used for full-text query parsing. In SQL LIKE patterns it is '_'; in Unix glob patterns it is '?'.

Returns:
the wildcard character
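
As a rough illustration of what a query parser could do with the two wildcard characters, the sketch below translates a query term into a java.util.regex.Pattern. The glob-style characters '*' and '?' are hypothetical choices for this example, and treating the multi-character wildcard as "zero or more characters" is an assumption. This is not how Qizx evaluates full-text queries.

```java
import java.util.regex.Pattern;

class WildcardTerm {
    // Hypothetical return values, glob style.
    static char multiCharsWildcard() { return '*'; }
    static char singleCharWildcard() { return '?'; }

    // Converts a query term with wildcards into an equivalent regular expression.
    static Pattern toPattern(String term) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == multiCharsWildcard()) {
                regex.append(".*");        // assumed: zero or more characters
            } else if (c == singleCharWildcard()) {
                regex.append('.');         // exactly one character
            } else {
                // Quote everything else so regex metacharacters stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String term, String word) {
        return toPattern(term).matcher(word).matches();
    }
}
```

For example, matches("inde*", "indexing") and matches("t?xt", "text") are both true, while matches("a.c", "abc") is false because the dot is treated as a literal character.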

copy

public Indexing.WordSieve copy()
Creates a carbon copy of this object.

Returns:
a new copy of this object

© 2008 Axyana Software