org.exist.storage.analysis
Class SimpleTokenizer

java.lang.Object
  extended by org.exist.storage.analysis.SimpleTokenizer
All Implemented Interfaces:
Tokenizer

public class SimpleTokenizer
extends java.lang.Object
implements Tokenizer

This is the default tokenizer used by the full-text indexer to split a string into words. The known token types are defined by the class Token.
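
The following is a minimal, self-contained sketch of the pull-style contract suggested by the method signatures on this page: the caller supplies the text once with setText and then calls nextToken repeatedly. It does not use or reproduce SimpleTokenizer. MiniTokenizer, TokenizerSketch and tokenize are hypothetical names, the splitting rule (runs of letters or digits) is a simplification, and returning null at the end of input is an assumption made for this sketch, since this page does not document the end-of-input behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {

    /** Hypothetical stand-in for a setText/nextToken style tokenizer; not eXist code. */
    static class MiniTokenizer {
        private CharSequence text = "";
        private int pos = 0;

        /** Supplies the text to tokenize and rewinds to its start. */
        void setText(CharSequence text) {
            this.text = text;
            this.pos = 0;
        }

        /**
         * Returns the next run of letters or digits.
         * Assumption for this sketch: null signals the end of input.
         */
        String nextToken() {
            // Skip whitespace, punctuation and anything else that is not a letter or digit.
            while (pos < text.length() && !Character.isLetterOrDigit(text.charAt(pos))) {
                pos++;
            }
            if (pos >= text.length()) {
                return null;
            }
            int start = pos;
            // Consume the run of letters and digits that forms one word.
            while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) {
                pos++;
            }
            return text.subSequence(start, pos).toString();
        }
    }

    /** Drains the tokenizer into a list, in the way an indexer loop might. */
    static List<String> tokenize(CharSequence input) {
        MiniTokenizer tokenizer = new MiniTokenizer();
        tokenizer.setText(input);
        List<String> words = new ArrayList<>();
        String token;
        while ((token = tokenizer.nextToken()) != null) {
            words.add(token);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world 42")); // prints [Hello, world, 42]
    }
}
```

With the real class, the loop would presumably have the same shape, with TextToken objects returned instead of strings.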

Author:
Wolfgang Meier

Constructor Summary
SimpleTokenizer()
SimpleTokenizer(boolean stem)

Method Summary
protected TextToken alpha(TextToken token, boolean allowWildcards)
protected TextToken alphanum(TextToken token, boolean allowWildcards)
protected void consume()
protected TextToken eof()
int getLength()
java.lang.String getText()
protected TextToken nextTerminalToken(boolean wildcards)
TextToken nextToken()
TextToken nextToken(boolean wildcards)
protected TextToken number()
protected TextToken p()
void setStemming(boolean stem)
void setText(java.lang.CharSequence text)
protected TextToken whitespace()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleTokenizer

public SimpleTokenizer()

SimpleTokenizer

public SimpleTokenizer(boolean stem)

Method Detail

setStemming

public void setStemming(boolean stem)
Specified by:
setStemming in interface Tokenizer

alpha

protected TextToken alpha(TextToken token,
                          boolean allowWildcards)

alphanum

protected TextToken alphanum(TextToken token,
                             boolean allowWildcards)

consume

protected void consume()

eof

protected TextToken eof()

getLength

public int getLength()

getText

public java.lang.String getText()

nextTerminalToken

protected TextToken nextTerminalToken(boolean wildcards)

nextToken

public TextToken nextToken()
Specified by:
nextToken in interface Tokenizer

nextToken

public TextToken nextToken(boolean wildcards)
Specified by:
nextToken in interface Tokenizer

number

protected TextToken number()

p

protected TextToken p()

setText

public void setText(java.lang.CharSequence text)
Specified by:
setText in interface Tokenizer

whitespace

protected TextToken whitespace()


Copyright (C) Wolfgang Meier. All rights reserved.