Dublin Core Elements

The Dublin Core metadata element set is a standard for cross-domain information resource description.
| Element | Description | Value |
|---|---|---|
| Identifier | resource ID | http://old.zope.org/Members/bjorn/CJKSplitter |
| Title | resource name | ZCTextIndex splitter that works with Chinese, Japanese, and Korean text |
| Description | resource summary | CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex. CJKSplitter is a ZCTextIndex splitter for CJK (Chinese, Japanese, Korean) text stored as Unicode. It uses a simple but workable "hack" instead of trying to do real word splitting from dictionaries. Compared to a dictionary-based word splitter, this results in a bigger index and more matches than necessary, but that is a cheap price to pay for the reduced complexity. Changes Summary: - Version 0.2 ("[email protected]":mailto:[email protected]) improves on the previous version in a number of ways: it uses Unicode internally (not UTF-8), replaces the configuration file with lookups via the unicodedata module to identify CJK characters and symbols, and adds unit tests and detailed English instructions for installation etc. - Version 0.1 ("[email protected]":mailto:[email protected]): original version. Known Problems: - Text must (well, should) be stored as Unicode. - Cannot search single characters. - Could do a better job at identifying CJK characters. - May match more than is strictly necessary due to the algorithm used. (See source code for details.) Please join the "zopeasia project on SourceForge":http://sourceforge.net/projects/zopeasia/ to participate in the development. |
| Creator | resource creator | ZopeOrgSite |
| Date | default date | 2004-01-18 03:16:35 |
| Format | resource format | text/html |
| Type | resource type | Software Package |
| Subject | resource keywords | Internationalization, SoftwareProduct, ZCatalog, catalog, i18n |
| Contributors | resource collaborators | |
| Language | resource language | |
| Publisher | resource publisher | No publisher |
| Rights | resource copyright | |
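
The description above does not say what the "hack" actually is; it points to the source code for details. As an illustration only, the sketch below assumes one common dictionary-free technique, overlapping bigrams, which would be consistent with the stated trade-offs (a bigger index and extra matches). The names `CJK_NAME_PREFIXES`, `is_cjk`, `bigrams`, and `split` are hypothetical, it is written for Python 3, and it is not the CJKSplitter implementation. Only the use of the standard `unicodedata` module to recognise CJK characters is taken from the description.

```python
import unicodedata

# Hypothetical list of Unicode name prefixes treated as CJK in this sketch;
# an assumption for illustration, not the rule CJKSplitter itself uses.
CJK_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL")


def is_cjk(ch):
    """Return True if the character's Unicode name starts with a CJK prefix."""
    return unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES)


def bigrams(run):
    """Return overlapping two-character terms for a run of CJK characters."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split(text):
    """Split text into lowercased non-CJK words and overlapping CJK bigrams."""
    terms = []
    run, run_is_cjk = "", False
    for ch in text + " ":  # the trailing space flushes the final run
        cjk = is_cjk(ch)
        keep = cjk or ch.isalnum()
        # Flush the current run when the script changes or a separator is hit.
        if run and (cjk != run_is_cjk or not keep):
            terms.extend(bigrams(run) if run_is_cjk else [run.lower()])
            run = ""
        if keep:
            run += ch
            run_is_cjk = cjk
    return terms


print(split("Zope 中文索引 test"))
```

Under this assumed scheme a four-character CJK run yields three index terms, which is where the larger index would come from, and a phrase query can match any document containing the same adjacent pairs, even across real word boundaries. A one-character query produces a one-character term that matches no stored bigram, which may be what the "Cannot search single characters" item refers to.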


Additional Zope Elements

| Element | Description | Value |
|---|---|---|
| CreationDate | date resource created | 2003-03-09 21:08:43 |
| ModificationDate | date resource last modified | 2004-01-18 03:16:35 |
| EffectiveDate | date resource becomes effective | None |
| ExpirationDate | date resource expires | None |
