Qizx/db 2.1 API

com.qizx.api.util.text
Class DefaultWordSieve

java.lang.Object
  extended by com.qizx.api.util.text.SieveBase
      extended by com.qizx.api.util.text.DefaultWordSieve
All Implemented Interfaces:
Indexing.Sieve, Indexing.WordSieve, Serializable

public class DefaultWordSieve
extends SieveBase
implements Indexing.WordSieve, Serializable

A basic word extractor suitable for most European languages.

All methods can be overridden in subclasses.

See Also:
Serialized Form

Field Summary
 
Fields inherited from class com.qizx.api.util.text.SieveBase
parameters
 
Constructor Summary
DefaultWordSieve()
          Builds a case-insensitive and accent-insensitive sieve.
DefaultWordSieve(boolean caseSensitive, boolean accentSensitive)
          Builds a sieve with the specified case and accent sensitivity.
 
Method Summary
 char charAt(int ahead)
          Returns the source character at a given position.
 Indexing.WordSieve copy()
          Creates a carbon copy of this object.
 boolean isWordPart(char c)
          Returns true if the char can be part of a word.
 boolean isWordStart(char c)
          Returns true if the char can be at start of a word.
 char mapChar(char c)
          Normalizes a character.
 char multiCharsWildcard()
          Returns the wildcard character which matches several characters.
 char nextChar()
          Moves to next source character and returns it, returns 0 if at end.
 char[] nextWord()
          Returns the next normalized word, or null if the end of the fragment to analyze is reached.
 void setParameters(String[] parameters)
          Defines optional parameters for the sieve.
protected  void setup(boolean caseSensitive, boolean accentSensitive)
           
 char singleCharWildcard()
          Returns the wildcard character which matches a single character.
 void start(char[] text, int length)
          Starts the analysis of a new text chunk.
 void start(String text)
          Starts the analysis of a new text chunk.
 int wordLength()
          Returns the original length of the last word returned by nextWord.
 int wordOffset()
          Returns the offset of the last word returned by nextWord.
 
Methods inherited from class com.qizx.api.util.text.SieveBase
addParameter, getParameters, toString, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.qizx.api.Indexing.Sieve
getParameters
 

Constructor Detail

DefaultWordSieve

public DefaultWordSieve()
Builds a case-insensitive and accent-insensitive sieve.


DefaultWordSieve

public DefaultWordSieve(boolean caseSensitive,
                        boolean accentSensitive)
Builds a sieve with the specified case and accent sensitivity.

Parameters:
caseSensitive - if false, uppercase and lowercase characters are equivalent.
accentSensitive - if false, a letter with diacritic signs is equivalent to the same letter without diacritic signs, for example 'é' is equivalent to 'e'.
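
To illustrate what the two flags mean for matching, here is a minimal sketch that folds a word according to the same pair of booleans. FoldingSketch and fold are hypothetical names, not part of Qizx, and the use of java.text.Normalizer and of lowercase as the canonical case are assumptions; the actual normalization performed by DefaultWordSieve is not described here.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, NOT part of the Qizx API: shows one way to make
// two words compare equal regardless of case and/or diacritics.
class FoldingSketch {
    static String fold(String word, boolean caseSensitive, boolean accentSensitive) {
        String s = word;
        if (!accentSensitive) {
            // Decompose each letter into base letter + combining marks,
            // then drop the combining marks.
            s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (!caseSensitive) {
            // Assumption: lowercase is used as the canonical case.
            s = s.toLowerCase(Locale.ROOT);
        }
        return s;
    }

    public static void main(String[] args) {
        String word = "\u00C9t\u00E9"; // "Été", written with escapes
        System.out.println(fold(word, false, false)); // prints "ete"
        System.out.println(fold(word, true, false));  // prints "Ete"
    }
}
```

With both flags false, the folded forms of "Été" and "ete" are identical, which is the equivalence the parameter descriptions refer to.
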
Method Detail

start

public void start(char[] text,
                  int length)
Description copied from interface: Indexing.WordSieve
Starts the analysis of a new text chunk.

Specified by:
start in interface Indexing.WordSieve
Parameters:
text - characters to analyze, index from 0 to length - 1
length - number of characters in the text array

start

public void start(String text)
Description copied from interface: Indexing.WordSieve
Starts the analysis of a new text chunk.

Specified by:
start in interface Indexing.WordSieve
Parameters:
text - fragment to analyze

nextWord

public char[] nextWord()
Description copied from interface: Indexing.WordSieve
Returns the next normalized word, or null if the end of the fragment to analyze is reached.

Specified by:
nextWord in interface Indexing.WordSieve
Returns:
a character array containing the word found. Caution: implementations must return a *new* char array for each word.
See Also:
Indexing.WordSieve.nextWord()
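
The start/nextWord contract can be sketched with a small stand-in class. MiniSieve below is hypothetical and is not the Qizx implementation: its word boundaries (Character.isLetterOrDigit) and its lowercase normalization are assumptions chosen for illustration. It does follow the documented rules: nextWord returns null at the end of the fragment and allocates a fresh array for every word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in following the start/nextWord contract described above.
// Real code would call the same methods on a DefaultWordSieve instance.
class MiniSieve {
    private char[] text;
    private int length, pos, wordStart, wordLen;

    void start(String s) { start(s.toCharArray(), s.length()); }

    void start(char[] text, int length) {
        this.text = text;
        this.length = length;
        this.pos = 0;
    }

    char[] nextWord() {
        // Skip separators before the next word.
        while (pos < length && !Character.isLetterOrDigit(text[pos])) pos++;
        if (pos >= length) return null; // end of fragment
        wordStart = pos;
        while (pos < length && Character.isLetterOrDigit(text[pos])) pos++;
        wordLen = pos - wordStart;
        char[] word = new char[wordLen]; // a new array for each word
        for (int i = 0; i < wordLen; i++) {
            word[i] = Character.toLowerCase(text[wordStart + i]);
        }
        return word;
    }

    int wordOffset() { return wordStart; }
    int wordLength() { return wordLen; }

    // Typical caller loop: iterate until nextWord() returns null.
    static List<String> words(String s) {
        MiniSieve sieve = new MiniSieve();
        sieve.start(s);
        List<String> out = new ArrayList<>();
        for (char[] w = sieve.nextWord(); w != null; w = sieve.nextWord()) {
            out.add(new String(w));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, big World!")); // prints [hello, big, world]
    }
}
```

Because each call allocates its own array, a caller may keep the returned arrays without them being overwritten by later calls.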

isWordStart

public boolean isWordStart(char c)
Description copied from interface: Indexing.WordSieve
Returns true if the char can be at start of a word.

Specified by:
isWordStart in interface Indexing.WordSieve
Parameters:
c - a source character
Returns:
true if the char can be at start of a word.

isWordPart

public boolean isWordPart(char c)
Description copied from interface: Indexing.WordSieve
Returns true if the char can be part of a word.

Specified by:
isWordPart in interface Indexing.WordSieve
Parameters:
c - a source character
Returns:
true if the char can be part of a word (not a punctuation or space).
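
Since the start and part predicates are separate methods, a subclass could apply different rules to the first character of a word and to the following ones. The sketch below is a hypothetical rule set, not the behavior of DefaultWordSieve: a word must begin with a letter but may continue with letters or digits. WordCharRules and firstWord are illustrative names.

```java
// Hypothetical rule set, NOT the DefaultWordSieve behavior:
// words start with a letter and continue with letters or digits.
class WordCharRules {
    static boolean isWordStart(char c) {
        return Character.isLetter(c);
    }

    static boolean isWordPart(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Extracts the first word of s using the two predicates,
    // or returns an empty string if there is none.
    static String firstWord(String s) {
        int i = 0;
        while (i < s.length() && !isWordStart(s.charAt(i))) i++;
        int j = i;
        while (j < s.length() && isWordPart(s.charAt(j))) j++;
        return s.substring(i, j);
    }

    public static void main(String[] args) {
        System.out.println(firstWord("42 x86 cpu")); // prints "x86"
    }
}
```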

multiCharsWildcard

public char multiCharsWildcard()
Description copied from interface: Indexing.WordSieve
Returns the wildcard character which matches several characters. Used for full-text query parsing. In SQL LIKE patterns, it is '%', in Unix-glob patterns, it is '*'.

Specified by:
multiCharsWildcard in interface Indexing.WordSieve
Returns:
the wildcard character

singleCharWildcard

public char singleCharWildcard()
Description copied from interface: Indexing.WordSieve
Returns the wildcard character which matches a single character. Used for full-text query parsing. In SQL LIKE patterns, it is '_', in Unix-glob patterns, it is '?'.

Specified by:
singleCharWildcard in interface Indexing.WordSieve
Returns:
the wildcard character
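
As a rough illustration of how a query parser might use the two wildcard characters, the sketch below translates a wildcard pattern into a java.util.regex.Pattern. WildcardSketch is a hypothetical helper, not part of Qizx. The characters actually returned by DefaultWordSieve are not stated here, so they are passed in as arguments, and treating the multi-character wildcard as "zero or more characters" is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of the Qizx API.
class WildcardSketch {
    static Pattern toRegex(String pattern, char multi, char single) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == multi) {
                sb.append(".*");   // assumption: matches zero or more characters
            } else if (c == single) {
                sb.append('.');    // matches exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        // Unix-glob style wildcards
        System.out.println(toRegex("wor*", '*', '?').matcher("world").matches()); // prints true
        // SQL LIKE style wildcards
        System.out.println(toRegex("w_rd", '%', '_').matcher("word").matches());  // prints true
    }
}
```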

mapChar

public char mapChar(char c)
Description copied from interface: Indexing.WordSieve
Normalizes a character.

Specified by:
mapChar in interface Indexing.WordSieve
Parameters:
c - a source character, which is converted to its normalized value in the returned word (for example, converted to uppercase).
Returns:
normalized character

charAt

public char charAt(int ahead)
Description copied from interface: Indexing.WordSieve
Returns the source character at a given position.

Specified by:
charAt in interface Indexing.WordSieve
Parameters:
ahead - an offset relative to the current position of the sieve in the source text. If equal to 0, the character at the current position is returned.
Returns:
the character at the specified position, or character 0 if out of bounds (no exception is raised).

nextChar

public char nextChar()
Description copied from interface: Indexing.WordSieve
Moves to next source character and returns it, returns 0 if at end.

Specified by:
nextChar in interface Indexing.WordSieve
Returns:
the next source character
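
The lookahead and advance behavior of charAt and nextChar can be sketched as a cursor over a character array. CursorSketch is a hypothetical class, not the Qizx implementation; in particular, the assumption that the cursor initially sits on the first character is not confirmed by this page.

```java
// Hypothetical cursor, NOT the Qizx implementation. It shows the documented
// conventions: out-of-bounds reads yield character 0 instead of an exception.
class CursorSketch {
    private final char[] text;
    private final int length;
    private int pos = 0; // assumption: cursor starts on the first character

    CursorSketch(String s) {
        text = s.toCharArray();
        length = text.length;
    }

    // Returns the character 'ahead' positions from the cursor, or 0 if out of bounds.
    char charAt(int ahead) {
        int i = pos + ahead;
        return (i >= 0 && i < length) ? text[i] : 0;
    }

    // Advances the cursor by one and returns the character there, or 0 at end.
    char nextChar() {
        if (pos < length) pos++;
        return charAt(0);
    }

    public static void main(String[] args) {
        CursorSketch c = new CursorSketch("ab");
        System.out.println(c.charAt(1));          // prints "b"
        System.out.println((int) c.charAt(2));    // prints 0
    }
}
```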

wordOffset

public int wordOffset()
Description copied from interface: Indexing.WordSieve
Returns the offset of the last word returned by nextWord.

Specified by:
wordOffset in interface Indexing.WordSieve
Returns:
an index in the source text fragment

wordLength

public int wordLength()
Description copied from interface: Indexing.WordSieve
Returns the original length of the last word returned by nextWord. Most often equal to the length of the array returned by nextWord, but can be different if normalization or stemming is performed.

Specified by:
wordLength in interface Indexing.WordSieve
Returns:
word length
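
One reason the original offset and length are reported separately from the normalized word is that a caller can map a match back onto the source text, for instance to highlight it. The sketch below uses a toy normalizer that lowercases and drops a trailing 's', so the normalized word is shorter than the source span. OffsetSketch, normalize, and highlight are hypothetical names, and the toy stemming rule is an assumption made only to show the length difference.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of the Qizx API.
class OffsetSketch {
    // Toy "stemmer" (assumption for illustration): lowercase, drop a trailing 's'.
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 1 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Brackets the source span [offset, offset + length), using values like
    // those returned by wordOffset() and wordLength().
    static String highlight(String source, int offset, int length) {
        return source.substring(0, offset)
                + "[" + source.substring(offset, offset + length) + "]"
                + source.substring(offset + length);
    }

    public static void main(String[] args) {
        String source = "two Cats here";
        System.out.println(normalize("Cats"));          // prints "cat" (3 chars, source span is 4)
        System.out.println(highlight(source, 4, 4));    // prints "two [Cats] here"
    }
}
```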

setup

protected void setup(boolean caseSensitive,
                     boolean accentSensitive)

setParameters

public void setParameters(String[] parameters)
Description copied from interface: Indexing.Sieve
Defines optional parameters for the sieve.

Specified by:
setParameters in interface Indexing.Sieve
Parameters:
parameters - an array of even size containing alternately a parameter name and a parameter value.
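
The alternating name/value layout of the parameters array can be read as shown in this sketch. ParamSketch is a hypothetical helper, not part of Qizx, and the parameter names used in main are placeholders: the names actually recognized by DefaultWordSieve are not listed on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the Qizx API: converts an even-sized
// array of alternating names and values into an ordered map.
class ParamSketch {
    static Map<String, String> toMap(String[] parameters) {
        if (parameters.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of entries");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < parameters.length; i += 2) {
            map.put(parameters[i], parameters[i + 1]); // name at i, value at i + 1
        }
        return map;
    }

    public static void main(String[] args) {
        // Placeholder names and values, for illustration only.
        String[] parameters = { "name1", "value1", "name2", "value2" };
        System.out.println(toMap(parameters)); // prints {name1=value1, name2=value2}
    }
}
```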

copy

public Indexing.WordSieve copy()
Description copied from interface: Indexing.WordSieve
Creates a carbon copy of this object.

Specified by:
copy in interface Indexing.WordSieve
Returns:
a new copy of this object

© 2008 Axyana Software