Catalog Query How-To

This product is the realization of a feature that I have wanted in Zope since I started using it. ZCatalogs are an integral part of Zope development, especially any time you store content or data in the ZODB, which I find myself doing all the time.

Although getting data indexed in a catalog is quite easy, getting the data back out with anything but the simplest criteria is a chore. The biggest limitation is that the built-in query mechanism cannot do "or" logic across multiple indexes. Range searches are also kludgy (but at least doable).

The goal of this product is to make it easy to extract data from a catalog using an easy-to-understand query language that can represent arbitrary query logic intuitively. This has been accomplished by adding (through a hotfix) a new "query" method to ZCatalog. You pass a query string to this method and it returns the matching catalog result set.

The Implementation

There is actually a large amount of "hidden" functionality that DC has stuffed into ZCatalog, indexes and the BTree data structures deep in the bowels of Zope. My product does not really invent any of this functionality; it just exposes what was already there and previously inaccessible.

A major design goal was to avoid changing the basic code underlying the ZCatalog and indexes. Although I was skeptical at first that I could accomplish this, in the end this goal was realized. This product changes nothing about the way that ZCatalogs function under the hood. It merely adds a new interface to functionality that already existed.

Because this product is a hotfix, it can be uninstalled by simply removing the product directory from your Zope installation.

Performing ZCatalog Queries

OK, let’s look at how this puppy works! As I mentioned, this product adds a new method named "query" to ZCatalog. It takes two arguments: the query string, and an optional mapping or namespace object to be used in evaluating expressions embedded in the query.

Calling from DTML:

<dtml-in expr="Catalog.query(...)">
    ...
</dtml-in>

Calling from Python scripts:

results = context.Catalog.query(...)

Query Language

The query language is designed to look very much like a Python expression. If you are more familiar with SQL than Python, it will look much like a WHERE clause in a SQL statement. The basic syntax is:

<index name> <operator> <value expression> [and|or ...]

Let’s look at a few basic query examples:

title == 'spam'

Returns all objects where the index "title" matches "spam". The exact behavior depends on the type of index (TextIndex, FieldIndex, KeywordIndex). This is equivalent to:

Catalog(title='spam')

Let’s extend that a bit:

title == 'spam' and id == 'index_html'

This is equivalent to:

Catalog(title='spam', id='index_html')

OK, so far we haven’t done anything that we couldn’t do before, but how about this:

title == 'spam' or PrincipiaSearchSource == 'spam'

or this:

title == 'spam' and PrincipiaSearchSource != 'eggs'
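
One way to picture what these last two queries ask for is as set operations on the record ids that each index hands back. The sketch below is standalone Python with made-up data, not the product's code: INDEXES, ALL_IDS, match and not_match are all hypothetical names used only for illustration.

```python
# Toy stand-in for catalog indexes: index name -> value -> set of record ids.
# The data and the helper names are hypothetical.
INDEXES = {
    "title": {"spam": {1, 2}, "eggs": {3}},
    "PrincipiaSearchSource": {"spam": {2, 4}, "eggs": {1, 3}},
}
ALL_IDS = {1, 2, 3, 4}


def match(index, value):
    """Record ids whose `index` matches `value` (the "==" operator)."""
    return set(INDEXES[index].get(value, set()))


def not_match(index, value):
    """Record ids whose `index` does not match `value` (the "!=" operator)."""
    return ALL_IDS - match(index, value)


# title == 'spam' or PrincipiaSearchSource == 'spam'
either = match("title", "spam") | match("PrincipiaSearchSource", "spam")

# title == 'spam' and PrincipiaSearchSource != 'eggs'
spam_without_eggs = match("title", "spam") & not_match("PrincipiaSearchSource", "eggs")
```

In this picture "or" is a union, "and" is an intersection, and "!=" is a difference against everything in the catalog.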

Catalog Query Operators

Let’s take a look at the operators available to you in query expressions:

Operator   Description

==         index matches value
!=         index doesn’t match value
>=         index greater than or equal to value
<=         index less than or equal to value
>          index greater than value
<          index less than value
between    index between a list of two values
in         index in a list of values
not in     index not in a list of values

These operators are the same as in Python, except for the addition of the "between" operator. Let’s look at using it in a query:

bobobase_modification_time between [start_date, end_date]

This is equivalent to:

    Catalog(bobobase_modification_time=[start_date, end_date],
            bobobase_modification_time_usage="range:min:max")

If I never see the latter one again it will be too soon! 8^)
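
To make the idea of a range search concrete, here is a minimal standalone sketch of "between" over a sorted list of keys. It assumes both bounds are inclusive, and it uses plain integers in place of DateTime values. ENTRIES, KEYS and the between function are hypothetical and have nothing to do with how the indexes store their data.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index: sorted (modification day, record id) pairs.
# Plain integers stand in for DateTime values.
ENTRIES = [(5, "a"), (10, "b"), (15, "c"), (20, "d"), (25, "e")]
KEYS = [key for key, _ in ENTRIES]


def between(bounds):
    """Return record ids whose key lies in [low, high], both ends included."""
    low, high = bounds
    start = bisect_left(KEYS, low)    # first position with key >= low
    stop = bisect_right(KEYS, high)   # first position with key > high
    return [record_id for _, record_id in ENTRIES[start:stop]]


recent = between([10, 20])
```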

Expressions in Queries

As I mentioned earlier, you can write an expression for the value that the index is matched against. If you want to use names from the namespace in your queries, you will need to pass REQUEST or _ as the second argument to the query method. Let’s look at another example with bobobase_modification_time:

Catalog.query('bobobase_modification_time >= ZopeTime()-30', _)

This returns all objects modified in the last 30 days. The value is calculated using the ZopeTime function. You must pass _ as the second argument so that the query machinery can access ZopeTime.

The only limitation on expressions is that they cannot contain the "and" or "or" Python operators. Instead use "&" and "|" respectively.
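
As a rough illustration of how value expressions and the namespace argument fit together, the sketch below splits a query into terms at the words "and" and "or", then evaluates each value expression against a mapping. Splitting on those words is only my guess at why they cannot appear inside an expression; this is not the product's actual parser. TERM_SPLIT and evaluate_values are hypothetical names, and the sketch handles only single-word operators (so not "not in").

```python
import re

# Hypothetical splitter: break a query into terms at the words "and"/"or".
# An assumption for illustration, not the product's real parser.
TERM_SPLIT = re.compile(r"\s+(and|or)\s+")


def evaluate_values(query, namespace):
    """Split `query` into terms and evaluate each value expression."""
    terms = []
    for part in TERM_SPLIT.split(query):
        if part in ("and", "or"):
            terms.append(part)
            continue
        # Single-word operators only: index, operator, rest of the term.
        index, operator, expression = part.split(None, 2)
        # eval() on untrusted input is unsafe; this is illustration only.
        value = eval(expression, {"__builtins__": {}}, dict(namespace))
        terms.append((index, operator, value))
    return terms


namespace = {"today": 100, "flags": 6}
parsed = evaluate_values("modified >= today-30 and mask == flags & 3", namespace)
```

Note that the "&" inside the second expression survives the split, while a literal "and" there would have been taken for a connective between terms.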

DTML Quoting Conundrums

The keen-eyed amongst you may have already noticed a hangup when trying to match against string values in a catalog query from DTML. Since the query itself is a string inside the expr="..." attribute, both single and double quotes are already spoken for before we even get to the query string.

Here is an illustration of the problem. Take the query:

title == 'spam'

You cannot write this into DTML and have it work:

<dtml-in expr="Catalog.query('title == 'spam'')">

nor can you write this: (and triple-quotes don’t work either)

<dtml-in expr="Catalog.query('title == "spam"')">

There are two solutions, one built into Zope and one that I came up with. The first is to "escape" the single quotes using backslashes (\):

<dtml-in expr="Catalog.query('title == \'spam\'')">

I find this to be quite ugly, but it does work. Another solution that I added was the ability to delimit strings with double back-quotes (``). This allows you to legally write this as:

<dtml-in expr="Catalog.query('title == ``spam``')">

Still not 100% the best, but an improvement. Internally, the double back-quotes are replaced with real double-quotes (") before being fed to the query parser. If anyone has a better idea on how to implement this, let me know.
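
The replacement step described above amounts to something like the following standalone sketch. The function name normalize_quotes is hypothetical; only the substitution itself comes from the description.

```python
def normalize_quotes(query):
    """Replace each pair of back-quotes with one real double quote."""
    return query.replace("``", '"')


dtml_safe = "title == ``spam``"
normalized = normalize_quotes(dtml_safe)
```

A plain text substitution like this cannot tell delimiters from data, so a value that itself contains two back-quotes in a row would be altered as well.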

To avoid this problem altogether, use Python Scripts!

Things That Are Missing

This being the first release and all, there are things that I would like to see added (feel free to suggest other improvements or contribute code).

  • You cannot group terms in query expressions using parentheses. Evaluation always proceeds from left to right (but at least it’s predictable).

  • The "==" and "in" operators behave exactly the same, as do "!=" and "not in". In later versions I would like to differentiate them, although the index machinery seems to feel otherwise...

  • I would like to allow you to create query objects in Zope to embody Catalog queries. This would allow greater optimization and the creation of a "non-programmer" query creation interface.

  • You cannot as yet specify an index to sort the results on.

  • There are plenty of additional optimizations that could be done internally to make it more efficient.
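
The left-to-right rule in the first item above is worth a small illustration, since it differs from Python's own precedence. This standalone sketch uses made-up result sets; evaluate_left_to_right is a hypothetical helper, not part of the product.

```python
from functools import reduce

# Hypothetical result sets for three query terms a, b and c.
a, b, c = {1, 2}, {3}, {1}


def evaluate_left_to_right(first, rest):
    """Fold (connective, result_set) pairs onto `first`, strictly in order."""
    def step(accumulated, item):
        connective, ids = item
        if connective == "or":
            return accumulated | ids
        return accumulated & ids
    return reduce(step, rest, first)


# "a or b and c" evaluated left to right means (a or b) and c
left_to_right = evaluate_left_to_right(a, [("or", b), ("and", c)])

# Python's own precedence would instead give a or (b and c)
python_precedence = a | (b & c)

# Reordering the terms as "b and c or a" recovers the second meaning
reordered = evaluate_left_to_right(b, [("and", c), ("or", a)])
```

Until grouping is supported, ordering the terms so that the tighter group comes first is one way to get the result you intend.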

I hope you enjoy using query expressions. If you have a comment, suggestion, critique or bug to tell me about, send me an email at: [email protected]