org.exist.storage
Class NativeTextEngine

java.lang.Object
  extended by java.util.Observable
      extended by org.exist.storage.TextSearchEngine
          extended by org.exist.storage.NativeTextEngine

public class NativeTextEngine
extends TextSearchEngine

This class is responsible for fulltext indexing. Text nodes are handed over to this class to be indexed. Method storeText() is called by RelationalBroker whenever it finds a text node. Method getNodesContaining() is used by the XPath engine to process queries that involve a fulltext operator. The class keeps two database tables: the words table stores each word found, together with its unique id; the inv_idx table contains the occurrences of every word id, per document.
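The two-table layout can be pictured with a small in-memory model. This is a hypothetical sketch: the class WordIndexSketch, its methods, and the use of HashMaps in place of the BFile and InvertedIndex structures are assumptions for illustration, not the engine's actual storage format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the "words" and "inv_idx" tables. */
public class WordIndexSketch {
    // "words" table: each distinct word -> unique integer id
    private final Map<String, Integer> words = new HashMap<>();
    // "inv_idx" table: word id -> document id -> positions of the word in that document
    private final Map<Integer, Map<Integer, List<Integer>>> invIdx = new HashMap<>();

    /** Returns the id of word, assigning the next free id on first sight. */
    int wordId(String word) {
        Integer id = words.get(word);
        if (id == null) {
            id = words.size();
            words.put(word, id);
        }
        return id;
    }

    /** Records one occurrence of word at position pos in document docId. */
    void addOccurrence(String word, int docId, int pos) {
        invIdx.computeIfAbsent(wordId(word), k -> new HashMap<>())
              .computeIfAbsent(docId, k -> new ArrayList<>())
              .add(pos);
    }

    /** Returns the ids of the documents that contain word (empty if unknown). */
    Set<Integer> documentsContaining(String word) {
        Integer id = words.get(word);
        if (id == null) {
            return Collections.emptySet();
        }
        return invIdx.getOrDefault(id, Collections.emptyMap()).keySet();
    }

    public static void main(String[] args) {
        WordIndexSketch idx = new WordIndexSketch();
        idx.addOccurrence("xml", 1, 0);
        idx.addOccurrence("database", 1, 1);
        idx.addOccurrence("xml", 2, 4);
        System.out.println(idx.documentsContaining("xml")); // [1, 2]
    }
}
```

Splitting the dictionary from the occurrence lists means each word string is stored once, while the inverted index only carries compact integer ids.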

Author:
Wolfgang Meier

Field Summary
protected  BFile dbWords
           
protected  org.apache.oro.text.regex.PatternCompiler globCompiler
           
protected  org.exist.storage.NativeTextEngine.InvertedIndex invIdx
           
protected  org.apache.oro.text.regex.PatternMatcher matcher
           
protected  org.apache.oro.text.regex.PatternCompiler regexCompiler
           
protected  boolean useCompression
           
 
Fields inherited from class org.exist.storage.TextSearchEngine
broker, config, indexNumbers, stem, stemmer, stoplist, tokenizer
 
Constructor Summary
NativeTextEngine(DBBroker broker, Configuration config)
           
 
Method Summary
 void close()
           
protected  void collect(java.util.HashSet words, java.util.Iterator domIterator)
          Collect all words in a document that are to be removed.
static boolean containsWildcards(java.lang.String str)
          Check whether the string contains non-letter characters (if so, it may be a regular expression).
 void flush()
           
 NodeSet[] getNodesContaining(DocumentSet docs, java.lang.String[] expr)
          Find all the nodes containing the search terms given by the array expr from the fulltext-index.
 NodeSet[] getNodesContaining(DocumentSet docs, java.lang.String[] expr, int type)
          Get all the nodes containing the search terms given by the array expr using the fulltext-index.
 NodeSet[] getNodesExact(DocumentSet docs, java.lang.String[] expr)
          Get all nodes whose content exactly matches the terms passed in expr.
 void reindex(DocumentImpl oldDoc, NodeImpl node)
          Reindex a document or node.
 void remove()
           
 void removeCollection(Collection collection)
          Remove the indexed words for an entire collection.
 void removeDocument(DocumentImpl doc)
          Remove all index entries for the specified document
 Occurrences[] scanIndexTerms(User user, Collection collection, java.lang.String start, java.lang.String end, boolean inclusive)
          Scan the fulltext index and return an Occurrences object for each of the index keys.
static boolean startsWithWildcard(java.lang.String str)
           
 void storeAttribute(IndexPaths idx, AttrImpl attr)
          Index an attribute value
 void storeText(IndexPaths idx, TextImpl text)
          Index a text node
 void sync()
           
 
Methods inherited from class org.exist.storage.TextSearchEngine
getTokenizer
 
Methods inherited from class java.util.Observable
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dbWords

protected BFile dbWords

invIdx

protected org.exist.storage.NativeTextEngine.InvertedIndex invIdx

useCompression

protected boolean useCompression

regexCompiler

protected org.apache.oro.text.regex.PatternCompiler regexCompiler

globCompiler

protected org.apache.oro.text.regex.PatternCompiler globCompiler

matcher

protected org.apache.oro.text.regex.PatternMatcher matcher

Constructor Detail

NativeTextEngine

public NativeTextEngine(DBBroker broker,
                        Configuration config)

Method Detail

containsWildcards

public static final boolean containsWildcards(java.lang.String str)
Check whether the string contains non-letter characters (if so, it may be a regular expression).

Parameters:
str - the search term to check
Returns:
true if str contains at least one non-letter character
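A minimal sketch of the documented rule, assuming "non-letter" means any character for which Character.isLetter returns false. The class name WildcardCheck is hypothetical, and the real method may look for specific wildcard characters instead.

```java
/** Hypothetical sketch of the documented containsWildcards() rule. */
public class WildcardCheck {
    /** Returns true if any character of str is not a letter. */
    static boolean containsWildcards(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i))) {
                return true; // e.g. '*', '?', '[' or a digit
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWildcards("database")); // false
        System.out.println(containsWildcards("data*"));    // true
    }
}
```

Under this rule digits count as non-letters too, so a term like "2003" would be flagged as a possible pattern; this is one reason the description only says the string "may be" a regular expression.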

startsWithWildcard

public static final boolean startsWithWildcard(java.lang.String str)

close

public void close()
Specified by:
close in class TextSearchEngine

collect

protected void collect(java.util.HashSet words,
                       java.util.Iterator domIterator)
Collect all words in a document that are to be removed.

Parameters:
words - the set to which the collected words are added
domIterator - iterator over the nodes of the document

flush

public void flush()
Specified by:
flush in class TextSearchEngine

reindex

public void reindex(DocumentImpl oldDoc,
                    NodeImpl node)
Description copied from class: TextSearchEngine
Reindex a document or node. If node is null, all levels of the document tree starting with DocumentImpl.reindexRequired() will be reindexed.

Specified by:
reindex in class TextSearchEngine
Parameters:
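oldDoc - the document to be reindexed
node - the node to reindex, or null to reindex all levels of the document tree starting with DocumentImpl.reindexRequired()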

remove

public void remove()

getNodesContaining

public NodeSet[] getNodesContaining(DocumentSet docs,
                                    java.lang.String[] expr)
Find all the nodes containing the search terms given by the array expr from the fulltext-index.

Specified by:
getNodesContaining in class TextSearchEngine
Parameters:
docs - the input document set
expr - array of search terms
Returns:
array containing a NodeSet for each of the search terms
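The return convention, one NodeSet per search term, can be modeled with plain collections. In this hypothetical sketch a List<Set<Long>> stands in for NodeSet[], a Set<Integer> of document ids stands in for DocumentSet, and node ids are assumed to be unique across documents; none of these types are eXist's.

```java
import java.util.*;

/** Hypothetical model of getNodesContaining(docs, expr). */
public class NodesContainingSketch {
    // term -> (document id -> ids of the nodes containing the term);
    // node ids are assumed to be unique across documents
    static final Map<String, Map<Integer, Set<Long>>> INDEX = new HashMap<>();

    /** Helper for the example: records that the given nodes of doc contain term. */
    static void add(String term, int doc, long... nodeIds) {
        Set<Long> nodes = INDEX.computeIfAbsent(term, k -> new HashMap<>())
                               .computeIfAbsent(doc, k -> new TreeSet<>());
        for (long id : nodeIds) {
            nodes.add(id);
        }
    }

    /** Returns one node set per search term, in the order of expr,
     *  restricted to the documents in docs. */
    static List<Set<Long>> getNodesContaining(Set<Integer> docs, String[] expr) {
        List<Set<Long>> result = new ArrayList<>();
        for (String term : expr) {
            Set<Long> nodes = new TreeSet<>();
            Map<Integer, Set<Long>> byDoc = INDEX.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, Set<Long>> e : byDoc.entrySet()) {
                if (docs.contains(e.getKey())) {
                    nodes.addAll(e.getValue());
                }
            }
            result.add(nodes); // an empty set marks a term with no hits
        }
        return result;
    }

    public static void main(String[] args) {
        add("xml", 1, 10L, 12L);
        add("xml", 2, 20L);
        add("query", 2, 21L);
        Set<Integer> docs = new HashSet<>(Arrays.asList(1));
        System.out.println(getNodesContaining(docs, new String[] {"xml", "query"})); // [[10, 12], []]
    }
}
```

Keeping one set per term, rather than merging them, leaves it to the caller to combine the results, for example by intersecting them for an AND query.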

getNodesContaining

public NodeSet[] getNodesContaining(DocumentSet docs,
                                    java.lang.String[] expr,
                                    int type)
Get all the nodes containing the search terms given by the array expr using the fulltext-index.

Specified by:
getNodesContaining in class TextSearchEngine
Parameters:
docs - the input document set
expr - array of search terms
type - either MATCH_EXACT or MATCH_REGEX
Returns:
array containing a NodeSet for each of the search terms
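The type parameter selects how each term is matched against the index keys. A minimal sketch, assuming a sorted set of keys in place of the dbWords file; the constant values and the class MatchTypeSketch are hypothetical (only the names MATCH_EXACT and MATCH_REGEX appear above), and java.util.regex stands in for the Jakarta ORO compilers held in regexCompiler and globCompiler.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical sketch of exact vs. regex matching of one search term. */
public class MatchTypeSketch {
    // Hypothetical values; only the names come from the documentation.
    static final int MATCH_EXACT = 0;
    static final int MATCH_REGEX = 1;

    /** Returns the index keys selected by a single search term. */
    static SortedSet<String> matchingKeys(SortedSet<String> keys, String term, int type) {
        SortedSet<String> out = new TreeSet<>();
        if (type == MATCH_EXACT) {
            if (keys.contains(term)) {
                out.add(term); // a single dictionary lookup
            }
        } else if (type == MATCH_REGEX) {
            Pattern p = Pattern.compile(term);
            for (String k : keys) { // test every key against the pattern
                if (p.matcher(k).matches()) {
                    out.add(k);
                }
            }
        } else {
            throw new IllegalArgumentException("unknown match type: " + type);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> keys = new TreeSet<>(Arrays.asList("index", "indexing", "node", "query"));
        System.out.println(matchingKeys(keys, "index", MATCH_EXACT));   // [index]
        System.out.println(matchingKeys(keys, "index.*", MATCH_REGEX)); // [index, indexing]
    }
}
```

In this sketch exact matching costs one lookup, while the regex branch tests every key, so pattern queries are the more expensive path.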

getNodesExact

public NodeSet[] getNodesExact(DocumentSet docs,
                               java.lang.String[] expr)
Get all nodes whose content exactly matches the terms passed in expr. Called by method getNodesContaining.

Parameters:
docs - the input document set
expr - array of search terms
Returns:
array containing a NodeSet for each of the search terms

scanIndexTerms

public Occurrences[] scanIndexTerms(User user,
                                    Collection collection,
                                    java.lang.String start,
                                    java.lang.String end,
                                    boolean inclusive)
                             throws PermissionDeniedException
Description copied from class: TextSearchEngine
Scan the fulltext index and return an Occurrences object for each of the index keys. Arguments start and end restrict the range of keys returned. For example, start="a" and end="az" will return all keywords starting with the letter "a".

Specified by:
scanIndexTerms in class TextSearchEngine
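Parameters:
user - the user performing the scan
collection - the collection whose index is scanned
start - the lower bound of the key range
end - the upper bound of the key range
inclusive -
Returns:
an Occurrences object for each index key in the range
Throws:
PermissionDeniedException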
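The key-range restriction can be illustrated with a sorted map of keys to frequencies. This hypothetical sketch treats both bounds as inclusive and uses a plain Integer count in place of an Occurrences object; how the real method uses its inclusive flag is not modeled.

```java
import java.util.*;

/** Hypothetical sketch of a lexicographic key-range scan. */
public class ScanTermsSketch {
    /** Returns (key, frequency) pairs for keys in [start, end], both bounds inclusive. */
    static SortedMap<String, Integer> scan(NavigableMap<String, Integer> index, String start, String end) {
        return index.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> index = new TreeMap<>();
        index.put("apple", 3);
        index.put("axis", 1);
        index.put("az", 2);
        index.put("azure", 4);
        index.put("banana", 5);
        System.out.println(scan(index, "a", "az")); // {apple=3, axis=1, az=2}
    }
}
```

Because "azure" sorts after "az", it falls outside the range in a plain lexicographic scan. To cover every key starting with "a" under this ordering, an upper bound such as "a" + Character.MAX_VALUE would be needed.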

removeCollection

public void removeCollection(Collection collection)
Remove the indexed words for an entire collection.

Specified by:
removeCollection in class TextSearchEngine
Parameters:
collection - the collection whose index entries are removed

removeDocument

public void removeDocument(DocumentImpl doc)
Remove all index entries for the specified document

Specified by:
removeDocument in class TextSearchEngine
Parameters:
doc - The document
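The protected collect() method suggests a two-phase removal: first gather the distinct words of the document, then visit only those entries of the inverted index. A sketch under that assumption; the class RemoveDocumentSketch, the signatures (a word list replaces the DOM iterator), and the word-to-occurrence-count layout are all hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of collect-then-remove for a single document. */
public class RemoveDocumentSketch {
    // word -> (document id -> occurrence count); hypothetical layout
    static final Map<String, Map<Integer, Integer>> INV_IDX = new HashMap<>();

    /** Phase 1: collect the distinct words of the document being removed. */
    static Set<String> collect(List<String> documentWords) {
        return new HashSet<>(documentWords);
    }

    /** Phase 2: visit only the collected words and drop the document's entries. */
    static void removeDocument(int docId, List<String> documentWords) {
        for (String word : collect(documentWords)) {
            Map<Integer, Integer> byDoc = INV_IDX.get(word);
            if (byDoc == null) {
                continue;
            }
            byDoc.remove(docId);
            if (byDoc.isEmpty()) {
                INV_IDX.remove(word); // no document uses the word any more
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> xml = new HashMap<>();
        xml.put(1, 2);
        xml.put(2, 1);
        Map<Integer, Integer> legacy = new HashMap<>();
        legacy.put(1, 1);
        INV_IDX.put("xml", xml);
        INV_IDX.put("legacy", legacy);
        removeDocument(1, Arrays.asList("xml", "legacy", "xml"));
        System.out.println(INV_IDX); // {xml={2=1}}
    }
}
```

Collecting into a set first means each word's entry is touched once, even if the word occurs many times in the document.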

storeAttribute

public void storeAttribute(IndexPaths idx,
                           AttrImpl attr)
Index an attribute value

Specified by:
storeAttribute in class TextSearchEngine
Parameters:
idx - IndexPaths object passed in by the broker
attr - the attribute to be indexed

storeText

public void storeText(IndexPaths idx,
                      TextImpl text)
Index a text node

Specified by:
storeText in class TextSearchEngine
Parameters:
idx - IndexPaths object passed in by the broker
text - the text node to be indexed
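Before a text node is stored, its content has to be split into words; the inherited tokenizer and stoplist fields suggest that stop words are filtered out at this step. The sketch below is hypothetical: the class StoreTextSketch, the regular-expression tokenizer, the lowercasing, and the choice to let stop words still advance the word position are all assumptions, not the engine's actual rules.

```java
import java.util.*;

/** Hypothetical sketch of tokenizing a text node before indexing. */
public class StoreTextSketch {
    /** Splits text into lowercase words, drops stop words, and returns
     *  (word, position) pairs that would be written to the index. */
    static List<Map.Entry<String, Integer>> tokenize(String text, Set<String> stoplist) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int pos = 0;
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator (or empty text) yields an empty token
            }
            if (!stoplist.contains(token)) {
                out.add(new AbstractMap.SimpleEntry<>(token, pos));
            }
            pos++; // stop words still advance the position
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(tokenize("The history of XML", stop)); // [history=1, xml=3]
    }
}
```

Keeping the original positions for the remaining words preserves the distance between them, which would matter for any phrase or proximity matching built on top of the index.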

sync

public void sync()


Copyright (C) Wolfgang Meier. All rights reserved.