Maps words to word ids and then some
The Lexicon object is an attempt to abstract vocabularies out of
Text indexes. This abstraction is not fully cooked yet: this
module still includes the parser for the Text Index Query
Language and a few other hacks.
Methods
Splitter
__getitem__
__init__
__len__
get
grep
query_hook
set
set_stop_syn
Splitter
Splitter ( self, astring, words=None )
Wrap the splitter.
__getitem__
__getitem__ ( self, key )
__init__
__init__ ( self, stop_syn=None )
__len__
__len__ ( self )
get
get ( self, key, default=None )
grep
grep ( self, query )
Regular expression search through the lexicon.
Do not use unless you know what you're doing!
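To illustrate what a regular expression search over a vocabulary might look like, here is a minimal sketch. It is not the actual grep implementation: the helper name grep_words, the plain dict of words to word ids, and the returned list of ids are all assumptions made for this example.

```python
import re

def grep_words(pattern, word_ids):
    """Return the ids of all words matching pattern (hypothetical helper)."""
    expr = re.compile(pattern)
    # Linear scan over every word: simple, but slow for large vocabularies.
    return [wid for word, wid in word_ids.items() if expr.search(word)]

# Made-up vocabulary: word -> word id
word_ids = {"index": 1, "indexes": 2, "lexicon": 3}
matches = grep_words(r"^index", word_ids)
```

A scan like this touches every word in the vocabulary, which may be one reason for the warning above.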
query_hook
query_hook ( self, q )
Leave the query unmodified; this lexicon does not rewrite queries.
set
set ( self, word )
Return the word id of word.
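The word-to-id mapping behind set, get, __getitem__ and __len__ can be sketched roughly as follows. This is a toy stand-in, not the real Lexicon: the class name ToyLexicon, the sequential integer ids, the choice to assign a fresh id to an unseen word in set, and the return values of get and __getitem__ are all assumptions.

```python
class ToyLexicon:
    """Hypothetical stand-in; not the real Lexicon implementation."""

    def __init__(self):
        self._ids = {}      # word -> word id
        self._counter = 0   # last id handed out

    def set(self, word):
        """Return the word id of word, assigning one if unseen (assumption)."""
        if word not in self._ids:
            self._counter += 1
            self._ids[word] = self._counter
        return self._ids[word]

    def get(self, key, default=None):
        """Return the word id of key, or default if key is unknown."""
        return self._ids.get(key, default)

    def __getitem__(self, key):
        # Assumption: unknown words raise KeyError, as with a plain dict.
        return self._ids[key]

    def __len__(self):
        return len(self._ids)

lex = ToyLexicon()
first = lex.set("zope")
again = lex.set("zope")   # same word, same id
```

Under these assumptions, calling set twice with the same word yields the same id, and len(lex) counts distinct words.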
set_stop_syn
set_stop_syn ( self, stop_syn )
Pass in a mapping of stop words and synonyms. The format is:
{'word' : [syn1, syn2, ..., synx]}
Vocabularies do not necessarily need to implement this if their
splitters do not support stemming or stopping.
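As an illustration of the mapping format described above, here is a minimal sketch. The words are made up, and two conventions in it are assumptions that this page does not confirm: that an empty list marks a stop word, and that the first synonym in the list replaces the word. The normalize helper is likewise hypothetical.

```python
# Hypothetical stop-word/synonym mapping in the documented shape:
# {'word': [syn1, syn2, ..., synx]}
stop_syn = {
    "the": [],                    # assumption: empty list marks a stop word
    "automobile": ["car"],
    "quick": ["fast", "rapid"],
}

def normalize(words, stop_syn):
    """Drop assumed stop words and map the rest to their first synonym."""
    result = []
    for word in words:
        syns = stop_syn.get(word)
        if syns is None:
            result.append(word)       # not in the mapping: keep as-is
        elif syns:
            result.append(syns[0])    # assumption: first synonym replaces the word
        # empty list: treated as a stop word and dropped
    return result

normalized = normalize(["the", "quick", "automobile", "stops"], stop_syn)
```

A mapping of this shape is what would be passed as the stop_syn argument of set_stop_syn or __init__.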