org.exist.storage
Class TextSearchEngine

java.lang.Object
  extended by java.util.Observable
      extended by org.exist.storage.TextSearchEngine
Direct Known Subclasses:
NativeTextEngine

public abstract class TextSearchEngine
extends java.util.Observable

This is the base class for all classes providing access to the fulltext index. It declares methods for adding text and attribute nodes to the fulltext index and for searching for nodes that match selected search terms.

Author:
wolf

Field Summary
protected  DBBroker broker
           
protected  Configuration config
           
protected  boolean indexNumbers
           
protected  boolean stem
           
protected  PorterStemmer stemmer
           
protected  java.util.TreeSet stoplist
           
protected  Tokenizer tokenizer
           
 
Constructor Summary
TextSearchEngine(DBBroker broker, Configuration conf)
          Construct a new instance and configure it.
 
Method Summary
abstract  void close()
           
abstract  void flush()
           
abstract  NodeSet[] getNodesContaining(DocumentSet doc, java.lang.String[] expr)
          For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
abstract  NodeSet[] getNodesContaining(DocumentSet docs, java.lang.String[] expr, int type)
          For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
 Tokenizer getTokenizer()
          Returns the Tokenizer used for tokenizing strings into words.
abstract  void reindex(DocumentImpl oldDoc, NodeImpl node)
          Reindex a document or node.
abstract  void removeCollection(Collection collection)
          Remove index entries for an entire collection.
abstract  void removeDocument(DocumentImpl doc)
          Remove all index entries for the given document.
abstract  Occurrences[] scanIndexTerms(User user, Collection collection, java.lang.String start, java.lang.String end, boolean inclusive)
          Scan the fulltext index and return an Occurrences object for each of the index keys.
abstract  void storeAttribute(IndexPaths idx, AttrImpl text)
          Tokenize and index the given attribute node.
abstract  void storeText(IndexPaths idx, TextImpl text)
          Tokenize and index the given text node.
 
Methods inherited from class java.util.Observable
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

stoplist

protected java.util.TreeSet stoplist

broker

protected DBBroker broker

tokenizer

protected Tokenizer tokenizer

config

protected Configuration config

indexNumbers

protected boolean indexNumbers

stem

protected boolean stem

stemmer

protected PorterStemmer stemmer
Constructor Detail

TextSearchEngine

public TextSearchEngine(DBBroker broker,
                        Configuration conf)
Construct a new instance and configure it.

Parameters:
broker -
conf -
Method Detail

getTokenizer

public Tokenizer getTokenizer()
Returns the Tokenizer used for tokenizing strings into words.

Returns:
the Tokenizer used for tokenizing strings into words

storeText

public abstract void storeText(IndexPaths idx,
                               TextImpl text)
Tokenize and index the given text node.

Parameters:
idx -
text - the text node to tokenize and index
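
The following is a minimal sketch of the kind of token filtering that the stoplist and indexNumbers fields suggest happens before a token is indexed. It is not the engine's implementation: the class TokenFilterSketch, the method indexableTokens, the split-on-non-alphanumerics rule, and the lowercasing step are all assumptions made for illustration, and stemming is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical stand-in, not part of org.exist.storage.
final class TokenFilterSketch {

    // Splits text into tokens and keeps those that would be indexed under
    // the assumed rules: not in the stoplist, and numeric only if allowed.
    static List<String> indexableTokens(String text,
                                        TreeSet<String> stoplist,
                                        boolean indexNumbers) {
        List<String> tokens = new ArrayList<>();
        // Assumption: a token is a run of letters or digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // leading separator produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (stoplist.contains(token)) {
                continue; // stop word, never indexed
            }
            boolean numeric = token.chars().allMatch(Character::isDigit);
            if (numeric && !indexNumbers) {
                continue; // numbers are skipped unless indexNumbers is set
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        TreeSet<String> stoplist = new TreeSet<>();
        stoplist.add("the");
        stoplist.add("was");
        stoplist.add("a");
        System.out.println(indexableTokens("The year 2004 was a good year", stoplist, false));
        System.out.println(indexableTokens("The year 2004 was a good year", stoplist, true));
    }
}
```

With the sample sentence, the first call keeps year, good, year and the second also keeps 2004. The real Tokenizer may split and normalize text differently.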

storeAttribute

public abstract void storeAttribute(IndexPaths idx,
                                    AttrImpl text)
Tokenize and index the given attribute node.

Parameters:
idx -
text - the attribute node to tokenize and index

flush

public abstract void flush()

close

public abstract void close()

getNodesContaining

public abstract NodeSet[] getNodesContaining(DocumentSet doc,
                                             java.lang.String[] expr)
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes. This method uses MATCH_EXACT for comparing search terms.

Parameters:
doc - the document set to search
expr - the search terms
Returns:
the node-sets of matching nodes

getNodesContaining

public abstract NodeSet[] getNodesContaining(DocumentSet docs,
                                             java.lang.String[] expr,
                                             int type)
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes. The type argument indicates whether search terms are compared exactly or as regular expressions. Valid values are DBBroker.MATCH_EXACT and DBBroker.MATCH_REGEXP.

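Parameters:
docs - the document set to search
expr - the search terms
type - the comparison mode, either DBBroker.MATCH_EXACT or DBBroker.MATCH_REGEXP
Returns:
the node-sets of matching nodes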
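
To illustrate the difference between the two comparison modes, here is a self-contained sketch that matches search terms against a plain list of indexed terms. Everything in it is a stand-in: MatchTypeSketch and termsMatching are hypothetical names, the constant values 0 and 1 are placeholders rather than the real values of DBBroker.MATCH_EXACT and DBBroker.MATCH_REGEXP, and whether the engine requires a regular expression to match the whole term (as assumed here) is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of org.exist.storage.
final class MatchTypeSketch {

    // Placeholder values; the real constants live in DBBroker.
    static final int MATCH_EXACT = 0;
    static final int MATCH_REGEXP = 1;

    // Returns one result list per search term, mirroring the array
    // return shape of getNodesContaining.
    static List<List<String>> termsMatching(List<String> indexedTerms,
                                            String[] expr,
                                            int type) {
        List<List<String>> results = new ArrayList<>();
        for (String term : expr) {
            // Compile the pattern once per search term, only in regexp mode.
            Pattern pattern = (type == MATCH_REGEXP) ? Pattern.compile(term) : null;
            List<String> hits = new ArrayList<>();
            for (String indexed : indexedTerms) {
                boolean match = (pattern != null)
                        ? pattern.matcher(indexed).matches() // assumed: whole-term match
                        : indexed.equals(term);
                if (match) {
                    hits.add(indexed);
                }
            }
            results.add(hits);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("index", "indexing", "search");
        String[] expr = {"index.*", "search"};
        System.out.println(termsMatching(indexed, expr, MATCH_REGEXP));
        System.out.println(termsMatching(indexed, expr, MATCH_EXACT));
    }
}
```

In regexp mode the term index.* selects both index and indexing. In exact mode the same string is compared literally and selects nothing.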

scanIndexTerms

public abstract Occurrences[] scanIndexTerms(User user,
                                             Collection collection,
                                             java.lang.String start,
                                             java.lang.String end,
                                             boolean inclusive)
                                      throws PermissionDeniedException
Scan the fulltext index and return an Occurrences object for each of the index keys. The start and end arguments restrict the range of keys returned. For example, start="a" and end="az" will return all keywords starting with letter "a".

Parameters:
user -
collection -
start - the lower bound of the range of keys returned
end - the upper bound of the range of keys returned
inclusive -
Returns:
Throws:
PermissionDeniedException
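
As a rough illustration of a key-range scan, the sketch below uses a java.util.TreeMap as a stand-in for the fulltext index. The class TermRangeScanSketch, its scan method, and the map of key to occurrence count are hypothetical. The sketch also assumes that keys are ordered by plain String comparison and that the inclusive flag controls whether the end key itself is returned; neither is confirmed by this documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in, not part of org.exist.storage.
final class TermRangeScanSketch {

    // Returns the keys between start and end in sorted order.
    // Assumption: start is always included, end only when inclusive is true.
    static List<String> scan(NavigableMap<String, Integer> index,
                             String start, String end, boolean inclusive) {
        return new ArrayList<>(index.subMap(start, true, end, inclusive).keySet());
    }

    public static void main(String[] args) {
        // Maps an index key to a made-up occurrence count.
        NavigableMap<String, Integer> index = new TreeMap<>();
        index.put("apple", 3);
        index.put("atom", 1);
        index.put("az", 2);
        index.put("banana", 5);

        System.out.println(scan(index, "a", "az", true));
        System.out.println(scan(index, "a", "az", false));
    }
}
```

With this sample data the first call prints [apple, atom, az] and the second prints [apple, atom].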

removeCollection

public abstract void removeCollection(Collection collection)
Remove index entries for an entire collection.

Parameters:
collection - the collection whose index entries are removed

removeDocument

public abstract void removeDocument(DocumentImpl doc)
Remove all index entries for the given document.

Parameters:
doc - the document whose index entries are removed

reindex

public abstract void reindex(DocumentImpl oldDoc,
                             NodeImpl node)
Reindex a document or node. If node is null, all levels of the document tree starting with DocumentImpl.reindexRequired() will be reindexed.

Parameters:
oldDoc -
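node - the node to reindex, or null to reindex all levels of the document tree starting with DocumentImpl.reindexRequired()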


Copyright (C) Wolfgang Meier. All rights reserved.