Methods
|
|
|
|
__getitem__
|
__getitem__ ( self, word )
Return an InvertedIndex-style result "list"
|
|
__init__
|
__init__ (
self,
id=None,
ignore_ex=None,
call_methods=None,
lexicon=None,
)
Create an index.
The arguments are:
- id: the name of the item attribute to index. This is
  either an attribute name or a record key.
- ignore_ex: tells the indexer to ignore exceptions that
  are raised when indexing an object.
- call_methods: tells the indexer to call methods instead
  of using getattr or getitem to get an attribute.
- lexicon: the lexicon object to use. If None, the
  index will use a private lexicon.
There is a ZCatalog UML model that sheds some light on what is
going on here. _index is a BTree which maps word ids to a
mapping from document id to score. Something like:
{'bob' : {1 : 5, 2 : 3, 42 : 9}}
{'uncle' : {1 : 1}}
The _unindex attribute is a mapping from document id to word
ids. This mapping allows the catalog to unindex an object:
{42 : ('bob', 'is', 'your', 'uncle')}
This isn't exactly how things are represented in memory; many
optimizations happen along the way.
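The two mappings described above can be sketched with plain dictionaries. This is only an illustration: the real index keeps these structures in BTrees keyed by word ids, whereas the sketch uses the words themselves. The helper name toy_index and the use of a raw occurrence count as the score are assumptions made for the example, not the index's actual code.

```python
from collections import Counter

def toy_index(index, unindex, docid, words):
    """Record `words` for `docid` in both mappings (hypothetical helper)."""
    counts = Counter(words)
    for word, score in counts.items():
        # word -> {document id -> score}; the score here is a raw count
        index.setdefault(word, {})[docid] = score
    # document id -> the distinct words it contributed, kept for unindexing
    unindex[docid] = tuple(counts)

index, unindex = {}, {}
toy_index(index, unindex, 42, ["bob", "is", "your", "uncle"])
toy_index(index, unindex, 1, ["uncle", "bob", "bob"])
```

After these two calls, index maps each word to the documents that contain it, and unindex records which words each document touched, which is what makes later removal possible without scanning the whole index.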
|
|
__len__
|
__len__ ( self )
|
|
__setstate
|
__setstate ( self, state )
|
|
_apply_index
|
_apply_index (
self,
request,
cid='',
ListType=[],
)
Apply the index to the query parameters given in the argument
request.
The argument should be a mapping object.
If the request does not contain the needed parameters, then
None is returned.
Otherwise two objects are returned. The first object is a
ResultSet containing the record numbers of the matching
records. The second object is a tuple containing the names of
all data fields used.
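The return contract above (None when the request lacks the parameter, otherwise a pair of matches and field names) can be sketched as follows. The function toy_apply_index, the field name "text", the plain set standing in for a ResultSet, and the union of per-word matches are all assumptions made for illustration; this is not the index's own implementation.

```python
def toy_apply_index(index, request, field="text"):
    """Return None if `field` is absent, else (matching ids, used fields)."""
    if field not in request:
        return None  # the request does not contain the needed parameter
    matches = set()
    for word in request[field].split():
        # Union the ids of all documents containing any query word
        matches |= set(index.get(word, {}))
    return matches, (field,)

index = {"bob": {1: 5, 2: 3, 42: 9}, "uncle": {1: 1}}
missing = toy_apply_index(index, {"title": "bob"})
found = toy_apply_index(index, {"text": "bob uncle"})
```

A caller can therefore tell "this index was not asked anything" (None) apart from "this index was asked and matched nothing" (an empty result plus the field names).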
|
|
_subindex
|
_subindex (
self,
isrc,
d,
old,
last,
)
|
|
clear
|
clear ( self )
|
|
evaluate
|
evaluate (
self,
q,
ListType=type([] ),
)
Evaluate a parsed query
|
|
getLexicon
|
getLexicon ( self, vocab_id )
This is a bit of a hack: indexes have been made acquirers so that
they can acquire a vocabulary object from the object system in
Zope. I don't think indexes were ever intended to participate
in this way, but I don't see too much of a problem with it.
|
|
get_operands
|
get_operands (
self,
q,
i,
ListType=type([] ),
StringType=type( '' ),
)
Evaluate and return the left and right operands for an operator
|
|
index_object
|
index_object (
self,
i,
obj,
threshold=None,
tupleType=type(() ),
dictType=type({} ),
strType=type( "" ),
callable=callable,
)
Index an object:
i is the integer id of the document.
obj is the object to be indexed.
threshold is the number of words to process between
committing subtransactions. If None, subtransactions are
not used.
The remaining four arguments exist only as default-value optimizations.
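The role of threshold can be sketched as a counter that triggers a commit every so many words. The function toy_index_words and its commit callback are hypothetical stand-ins for the subtransaction machinery, which this reference does not describe in detail.

```python
def toy_index_words(words, threshold=None, commit=lambda: None):
    """Count words, calling `commit` after every `threshold` words."""
    counts = {}
    since_commit = 0
    for word in words:
        counts[word] = counts.get(word, 0) + 1
        since_commit += 1
        if threshold is not None and since_commit >= threshold:
            commit()  # stand-in for committing a subtransaction
            since_commit = 0
    return counts

commits = []
counts = toy_index_words("a b a c a".split(), threshold=2,
                         commit=lambda: commits.append(1))
```

With threshold=None the callback is never invoked, which mirrors the statement that subtransactions are not used in that case.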
|
|
positions
|
positions (
self,
docid,
words,
obj,
)
Return the positions, within the document identified by docid,
of the word or words given in words.
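As a rough illustration of what "positions" could mean here, the sketch below returns the offsets at which any of the requested words occurs in a document that has already been split into words. The function toy_positions and the assumption that positions are zero-based word offsets are not taken from the index itself.

```python
def toy_positions(doc_words, words):
    """Return the offsets in `doc_words` at which any of `words` occurs."""
    wanted = set(words)
    return [i for i, word in enumerate(doc_words) if word in wanted]

doc = ["bob", "is", "your", "uncle", "bob"]
hits = toy_positions(doc, ["bob"])
```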
|
|
query
|
query (
self,
s,
default_operator=Or,
ws=( string.whitespace, ),
)
This is called by TextIndexes. A query term, the string s, is
passed in, along with an index object. s is parsed, then the
wildcards are parsed, then the result is parsed a second time,
and finally the whole thing is evaluated.
|
|
unindex_object
|
unindex_object (
self,
i,
tt=type(() ),
)
Carefully unindex the document with integer id i from the text
index, and do not fail if it does not exist.
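A minimal sketch of this "remove if present, otherwise do nothing" behaviour, using plain dictionaries in place of the index's BTrees. The function toy_unindex and its boolean return value are assumptions for the example; the reverse mapping it consults plays the role described for _unindex.

```python
def toy_unindex(index, unindex, docid):
    """Remove `docid` from both mappings; do nothing if it is unknown."""
    words = unindex.pop(docid, None)
    if words is None:
        return False  # never indexed: not treated as an error
    for word in words:
        scores = index.get(word)
        if scores is None:
            continue
        scores.pop(docid, None)
        if not scores:
            del index[word]  # drop words no document uses any more
    return True

index = {"bob": {1: 5, 42: 9}, "uncle": {42: 1}}
unindex = {42: ("bob", "uncle"), 1: ("bob",)}
removed = toy_unindex(index, unindex, 42)
removed_again = toy_unindex(index, unindex, 42)
```

The second call finds no entry for document 42 and returns without raising, which is the "do not fail" part of the description.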
|
|