
Using Keyword Indexes in ZCatalogs

A simple example

Version 0.2

This doc explains how to use keyword indexes to create a keyword search form. It is based on information collected on the Zope lists (including postings from Michel Pelletier and Stuart 'Zen' Bishop). Feel free to send your feedback.

Version History

  • Version 0.2 - 24th November 1999 - Minor changes
  • Version 0.1 - 22nd November 1999 - First draft

Index types in ZCatalogs

Traditionally ZCatalogs support two index types: Text Indexes and Field Indexes. In Zope 2.1b1 and later you can also use Keyword Indexes.

A Field Index looks at the entire property value as a single value. A Text Index breaks the property up into words. For example, if you have a Field Index for "category", and you search for "Green", you will only find the items with the category "Green" (but not "Green Shirts" or "Green Eggs"). If this were a Text Index, searching for "Green" would match all three of those.

So, you use Field Indexes for distinct values that you will be looking for (category, last name, zip code, etc.). Use Text Indexes for things like titles, or content that you want to be full text searchable.
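To make the difference concrete, here is a small Python sketch of the two matching styles, using the "Green" example above. It is a toy illustration only, not ZCatalog code: the function names field_match and text_match are made up, and the text matching is reduced to plain word splitting.

```python
# Toy illustration only: this is not ZCatalog code.
categories = ["Green", "Green Shirts", "Green Eggs"]

def field_match(value, query):
    # Field Index style: the whole property value must equal the query.
    return value == query

def text_match(value, query):
    # Text Index style (simplified): match if the query is one of the words.
    # A real Text Index also stems words and drops stop words; omitted here.
    return query in value.split()

field_hits = [c for c in categories if field_match(c, "Green")]
text_hits = [c for c in categories if text_match(c, "Green")]

print(field_hits)  # ['Green']
print(text_hits)   # ['Green', 'Green Shirts', 'Green Eggs']
```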

For the record, here is further technical information (stolen from a posting by Michel Pelletier):

  • ZCatalog indexes objects into an arbitrary set of various kinds of indexes. Each index is responsible for indexing one particular property of an object. If an object being indexed does not have a property that an index is looking for, the object is simply not indexed in that index (but it may have a property that another index is looking for, and therefore will be indexed in *that* index).

  • Text Index: property values are applied against a lexicon object that stems, stops, and parses the value into a full text index. The index may be queried with a simple boolean query language that allows 'and', 'or', phrasing, parenthesized boolean expressions, and proximity matching. Relevance ranking is supported and returns the sum of the occurrences of all query terms in the "hit". A score normalized from 0 to 100 over the whole result set is also provided.

  • Field Index: property values are treated atomically. Indexes can be queried for all objects that match that value. Range searches can also be done on indexed object values that support comparison (like numbers, dates, special purpose "length" objects, etc). Indexes can also be queried for the set of unique values in the index, for example, you can ask for the set of unique "meta_types" of all objects indexed. A good example of this is the search by "type" on the Zope site.
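The Field Index behaviour described in the list above (exact match, range search, unique values) can be sketched in a few lines of Python. This is a toy stand-in and not how ZCatalog is implemented: the class ToyFieldIndex, its method names, and the sample zip codes and document ids are all made up for illustration.

```python
# Toy stand-in for a Field Index: maps each whole property value to doc ids.
# The class, document ids and values are invented for illustration.
from collections import defaultdict

class ToyFieldIndex:
    def __init__(self):
        self._docs_by_value = defaultdict(set)

    def index(self, doc_id, value):
        # The value is stored atomically, never split into parts.
        self._docs_by_value[value].add(doc_id)

    def unique_values(self):
        # The set of distinct values currently in the index.
        return sorted(self._docs_by_value)

    def search(self, value):
        # Exact match on the whole value.
        return set(self._docs_by_value.get(value, ()))

    def search_range(self, low, high):
        # Range search works for any values that support comparison.
        hits = set()
        for value, doc_ids in self._docs_by_value.items():
            if low <= value <= high:
                hits |= doc_ids
        return hits

zip_index = ToyFieldIndex()
zip_index.index("doc1", 10001)
zip_index.index("doc2", 10002)
zip_index.index("doc3", 94105)
zip_index.index("doc4", 10001)

print(zip_index.unique_values())                     # [10001, 10002, 94105]
print(sorted(zip_index.search(10001)))               # ['doc1', 'doc4']
print(sorted(zip_index.search_range(10000, 19999)))  # ['doc1', 'doc2', 'doc4']
```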

OK. What about keyword indexes then?

Keyword Indexes allow you to index a sequence of "keywords" or "key phrases" as a single property of an object. They have the same behavior as field indexes, except that property values are treated as a sequence of keywords. Thus you get the best of both worlds: multiple values in one property with 100% matching for every keyword (and case is relevant).
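Here is a rough Python sketch of that idea: each entry of the sequence is indexed as a whole, the way a Field Index would treat a single value. It is only a toy model under my own assumptions, not the ZCatalog implementation, and the class name ToyKeywordIndex and the document ids are invented.

```python
# Toy stand-in for a Keyword Index; not the ZCatalog implementation.
from collections import defaultdict

class ToyKeywordIndex:
    def __init__(self):
        self._docs_by_keyword = defaultdict(set)

    def index(self, doc_id, keywords):
        # Every entry in the sequence is indexed as a whole value:
        # no word splitting and no case folding.
        for keyword in keywords:
            self._docs_by_keyword[keyword].add(doc_id)

    def search(self, keyword):
        # 100% match on one keyword or key phrase.
        return set(self._docs_by_keyword.get(keyword, ()))

kw_index = ToyKeywordIndex()
kw_index.index("doc1", ["Squids", "Cephalopods"])
kw_index.index("doc2", ["Squids are beautiful"])

print(sorted(kw_index.search("Squids")))                # ['doc1']
print(sorted(kw_index.search("Squids are beautiful")))  # ['doc2']
print(sorted(kw_index.search("squids")))                # []
```

Note that one document can sit under several keywords at once, which is the whole point of a Keyword Index.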

This is useful for building categorical hierarchies and many-to-many relationships, said the gurus. Well, so far, I've only built a keyword search form. Here's how.

Creating a simple keyword search form

This section describes how to add a keyword field in a ZClass, register the field as a keyword index in the ZCatalog, then add a drop-down list with individual keywords in the search form. I won't go into details that are covered elsewhere. For more info see the following docs.

Information sources

Here is a selection of docs. You can find many more on the Zope site.

You'll need to read this to get started:

Then read this to add features:

Adding a keyword field in your ZClass

Let's create a product called MyDoc. It contains a ZClass called MyDocClass. This ZClass subclasses CatalogAware (so that it self-registers into a ZCatalog), DTML Document and DTML Method. Its metatype is "MyDoc".

Note you may need to work around a couple of issues such as the Resource not found bug (that may strike when you create a ZClass instance) and the Missing ID bug (when you try to access an instance id from DTML).

Now in your ZClass, create a custom propertysheet called Info. On this propertysheet, create a lines property called doc_keywords. This property will store the keywords in the ZClass instances. (You need to store your properties on a custom propertysheet if you want your ZCatalog to be updated when properties are edited. For more info see the CatalogAware Howto.)

To edit your properties, create a form called myPropForm in the ZClass. Here is a code example:

<form action="myPropHandler">
  <table>
  ...
  <tr><th>Keywords</th>
  <td>
  <textarea name="doc_keywords:lines" rows="6" cols="40"><dtml-var "_.string.join(doc_keywords,'\n')" html_quote></textarea>
  </td></tr>
  ...
  <tr><td>
  <input type="submit" value=" Save "></td></tr>
  </table>
</form>

Note two important points:

  • In the textarea tag, add :lines to the name attribute so that the submitted text is converted to a list of lines before it is stored in the doc_keywords property.
  • Use _.string.join(doc_keywords,'\n') to turn the current list in the doc_keywords property back into one line per keyword before it is displayed in the text area.
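If the round trip between the text area and the lines property is unclear, this Python sketch models it. It is only an approximation: the real conversion is done by Zope when it sees the :lines suffix, the helper names to_lines and to_textarea are made up, and the stripping of blank lines is my own assumption to keep the example tidy.

```python
# Rough model of the round trip; Zope does the real conversion when it
# sees the ":lines" suffix on the form field name.
def to_lines(textarea_value):
    # Split the submitted text into one list entry per line.
    # Stripping whitespace and dropping blank lines is an assumption
    # made here, not documented Zope behaviour.
    lines = [line.strip() for line in textarea_value.splitlines()]
    return [line for line in lines if line]

def to_textarea(keywords):
    # Same idea as _.string.join(doc_keywords, '\n') in the form.
    return "\n".join(keywords)

submitted = "Squids\nSquids are beautiful\n"
doc_keywords = to_lines(submitted)

print(doc_keywords)  # ['Squids', 'Squids are beautiful']
print(to_textarea(doc_keywords) == "Squids\nSquids are beautiful")  # True
```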

To save the property values, create a handler method whose name must be specified in the FORM tag. Here it is called myPropHandler.

In myPropHandler, add code to store the property values. Example:

<dtml-call "propertysheets.Info.manage_changeProperties(REQUEST)">
<dtml-call reindex_object>

Note you need to specify the propertysheet name (Info here) when using a custom propertysheet. The second line is useful if you build a CatalogAware ZClass: it will update the property values in the ZCatalog.

Remember to map myPropForm to a view so that you can easily edit properties in your ZClass instance. In your ZClass definition, click on the Views tab and map a Tab name to a file name (myPropForm here).

Registering the keyword index in the ZCatalog

Edit your ZCatalog. On the Indexes tab, enter doc_keywords in the Add Index field and choose KeywordIndex in the Index Type list. Then click the Add button.

On the MetaData Table tab, enter doc_keywords in the "Add column to the Meta Data table" field and click the Add button.

Creating ZClass instances

Now create a couple of MyDoc instances and add keywords in the Keywords field. Add one keyword per line. These keywords can actually be "key phrases"; they can contain several words. For instance, enter "Squids" in one document and "Squids are beautiful" (as one line) in another one.

If you created a CatalogAware ZClass, your MyDoc instances and their properties will be registered into the ZCatalog. If not, edit the ZCatalog and "force-index" them from the Find Items to ZCatalog tab. Your ZClass instances should then be listed on the Cataloged Objects tab.

Creating a search form

Create a new ZSearch Interface in your database. Select your ZCatalog in the searchable objects list. Let's call the search form mySearch and the result report myResults.

In mySearch, replace the default code for doc_keywords with the following code:

<SELECT name="doc_keywords:list" multiple size="4">
 <OPTION value=""></OPTION>
 <dtml-in "Catalog.uniqueValuesFor('doc_keywords')">
  <OPTION value="<dtml-var sequence-item>">
   <dtml-var sequence-item>
  </OPTION>
 </dtml-in>
</SELECT>

Here Catalog is the name of the ZCatalog the docs are indexed in. Note the following points:

  • Specify :list in the SELECT tag so that the selection is returned as a list.
  • The uniqueValuesFor('doc_keywords') method returns one instance of all existing keywords. These values are used to prefill the selection list with keywords actually used in the docs.
  • The multiple attribute in the SELECT tag allows multiple selections in the list. The catalog will then return any document that matches at least one of the selected keywords/key phrases (OR search).
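The two behaviours in the list above (filling the list with the keywords actually in use, then an OR search over the selection) can be modelled in plain Python. This is a toy sketch, not the catalog's code: the docs dictionary, the document ids and the function names are invented, and the real catalog may return unique values in a different order.

```python
# Toy model of the search form's behaviour; the document ids are made up.
docs = {
    "doc1": ["Squids", "Cephalopods"],
    "doc2": ["Squids are beautiful"],
    "doc3": ["Octopuses"],
}

def unique_values(docs):
    # One instance of every keyword in use, as needed to fill the SELECT list.
    seen = []
    for keywords in docs.values():
        for keyword in keywords:
            if keyword not in seen:
                seen.append(keyword)
    return seen

def search_any(docs, selected):
    # OR search: a document matches if it carries at least one selected keyword.
    wanted = set(selected)
    return [doc_id for doc_id, keywords in docs.items() if wanted & set(keywords)]

print(unique_values(docs))
# ['Squids', 'Cephalopods', 'Squids are beautiful', 'Octopuses']
print(search_any(docs, ["Squids", "Octopuses"]))  # ['doc1', 'doc3']
```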

[To do: how can I control the sort order in the selection list?]

In myResults, delete unnecessary fields, then display mySearch and try out the search feature. In the selection list, click one or more keyword/phrases. Note every keyword/phrase is displayed as one list entry. Try searching for "Squids". This matches at least one document. It doesn't match the document with the "Squids are beautiful" phrase.

Searching from DTML

You can also search the ZCatalog from DTML, just as you would with any other index. Example:

<dtml-in "Catalog.searchResults(doc_keywords='Squids')">
...
</dtml-in>

or

<dtml-in "Catalog.searchResults(doc_keywords=['Squids', 'foo', 'bar'])">
...
</dtml-in>