CJKSplitter v0.2
New release of CJKSplitter, a Chinese, Japanese, and Korean word splitter for ZCTextIndex
CJKSplitter is a ZCTextIndex splitter for CJK (Chinese-Japanese-Korean) text stored as Unicode. It uses a simple but workable "hack" instead of trying to do real word splitting from dictionaries. Compared to a dictionary-based word splitter, this results in a bigger index and more matches than necessary, but that is a small price to pay for the reduced complexity.
Version 0.2 improves on the previous release in a number of ways: it uses Unicode internally (not UTF-8), replaces the configuration file with lookups in the unicodedata module to identify CJK characters and symbols, adds unit tests, and adds detailed English instructions for installation etc. It may even work for Korean and Japanese (untested).
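
To give a rough idea of what a dictionary-free splitter built on unicodedata lookups can look like, here is a minimal sketch. It is not the CJKSplitter source. The function names is_cjk and split_text are hypothetical, and emitting each CJK character as its own token is only one possible form of the "hack", assumed here for illustration. The sketch is written for Python 3; only unicodedata.name() is a real library call.

```python
import unicodedata

# Name prefixes treated as "CJK" in this sketch (an assumption, not the
# list used by CJKSplitter itself).
CJK_NAME_PREFIXES = ("CJK ", "HIRAGANA ", "KATAKANA ", "HANGUL ")


def is_cjk(ch):
    """Return True if the character's Unicode name marks it as CJK text."""
    # unicodedata.name() returns the default ("") for unnamed code points.
    return unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES)


def split_text(text):
    """Split text into tokens without a dictionary.

    Each CJK character becomes its own token (assumed strategy); runs of
    other alphanumeric characters become lowercased words; everything
    else, including CJK punctuation, acts as a separator.
    """
    tokens = []
    word = []

    def flush():
        # Emit the pending non-CJK word, if any.
        if word:
            tokens.append("".join(word))
            del word[:]

    for ch in text:
        if is_cjk(ch):
            flush()
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch.lower())
        else:
            flush()
    flush()
    return tokens


print(split_text("Zope 中文 index"))
```

Indexing every character separately is what makes the index larger and the matches looser than with a dictionary-based splitter: a query for one character matches every document that contains it, whatever word it belongs to.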