You are here: Home » Members » Toby Dickenson » howto » HTTP Caching and Zope

HTTP Caching and Zope

Isn't Caching Automatic?

There are several layers of caching built in to Zope:

  • ZODB caching of objects.
  • Caching of DTML parse trees.
  • SQL methods caching queries.
  • Products such as CachePool that cache arbitrary data.

These caches reduce the time taken to execute any Zope query, but they are all ultimately limited. They cannot reduce the time for a single request below a hard limit that is made up of:

  1. The time for the client to send the HTTP request over the network. This depends on network latency and bandwidth.
  2. The time for ZServer to parse the request, ZPublisher to traverse to the appropriate object, check authorization, and dispatch to the appropriate method.
  3. The time for the method to execute.
  4. The time for the response to be sent back to the client over the network.

Zope itself cannot do anything about these overheads. They can only be eliminated by something outside of Zope - and that means exploiting the HTTP caching mechanisms.

This HowTo is a brief guide to using caching in Zope. For the full story, see RFC 2616.

Why Isn't Caching Automatic?

Caching isn't a win-win solution. It carries the risk of making your Zope system more complicated than it needs to be, and the possibility of clients seeing outdated information. Without care, it is possible for caching to waste more time (calculating and checking the HTTP caching headers) than it saves.

Think of caching as an optimisation technique. Like any other optimisation, it's only worth using after performance measurements show that there is a problem worth solving.

Who Provides the Caches?

The closer a cache is to the client, the more effective it is for that client. The first few techniques presented here target caches built in to browsers, or caches shared by a group of users.

The technique of using a cache close to the Zope server (possibly on the same machine) is discussed further below.

What can Benefit from Caching?

Frequently accessed documents

Any document that is accessed many more times than it changes is a good candidate for caching. Examples are:

  • A home page, or central index page.
  • CSS files that are referenced from almost every page.
  • Images.

Users browsing through the site should, ideally, have to download each of these items only once. This can be achieved by using DTML to set the Cache-Control header. At its simplest:

<dtml-call "RESPONSE.setHeader('Cache-Control','max-age=3600')">

The 3600 is the number of seconds for which a client (or intermediate cache) should not re-request the object. You need to tune the number for your application. In HTTP terms, the response is considered fresh until the specified time has elapsed. In general, an object will not be requested again while the client (or intermediate cache) knows it is fresh.
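The freshness rule can be sketched from the cache's point of view. This is a simplified illustration only: the function name and arguments are hypothetical, and a real cache also accounts for the Age header and clock skew when it computes a response's age.

```python
import time

def is_fresh(stored_at, max_age, now=None):
    """Return True while a cached response is still within max-age seconds.

    stored_at and now are POSIX timestamps; max_age is the value of the
    max-age directive. Simplified: the age is just the time since storage.
    """
    if now is None:
        now = time.time()
    age = now - stored_at
    return age < max_age
```

With max-age=3600, a response stored at time t is served from the cache up to (but not including) t + 3600, after which the cache must go back to Zope.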

The Expires header is an alternative to max-age if your documents expire at a specific time, for example daily news that is always updated at 9AM.

<dtml-call "RESPONSE.setHeader('Expires','Tue, 01 Aug 2000 09:00:00 GMT')">
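In practice the expiry date has to be computed rather than hard-coded. The following is a minimal sketch in plain Python, assuming the update happens at a fixed hour GMT every day; the function name next_expiry is hypothetical, and the result would be passed to RESPONSE.setHeader('Expires', ...).

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def next_expiry(now, hour=9):
    """Return the HTTP-date of the next daily update at `hour`:00 GMT.

    now must be a timezone-aware datetime in UTC.
    """
    expiry = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if expiry <= now:
        # Today's update has already happened; expire at tomorrow's.
        expiry += timedelta(days=1)
    return format_datetime(expiry, usegmt=True)

# Example: a request served on the afternoon of 31 July 2000.
print(next_expiry(datetime(2000, 7, 31, 15, 30, tzinfo=timezone.utc)))
# → Tue, 01 Aug 2000 09:00:00 GMT
```

Letting the library format the date also avoids hand-written mistakes in the weekday name.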

Using this with images on HTML pages is particularly effective because it allows the browser to display the image (using the cached data) as soon as the HTML is loaded.

Unfortunately, Zope's standard Image and File objects do not provide an easy way to set this header (as of version 2.2; this may be added in a later version). A workaround is to write a DTML method that sets the header, put it somewhere in the acquisition context, and name it in the Precondition field. This method will then be called before the File or Image data is returned.

Documents that are not secret

A browser will send authorization information for every request after the first one that needs it. This authorization information prevents subsequent responses from being stored in a shared cache. If your objects would be publicly visible anyway, then this can result in underusing a shared cache.

This can be avoided by adding a 'public' cache control directive:

<dtml-call "RESPONSE.setHeader('Cache-Control','public, max-age=3600')">

If you are developing a reusable product (ZClass or Python product) then you may not know whether your object should be public. The following Python code can be used to determine whether an anonymous user can call that method object, and therefore whether the public directive is appropriate.

class MyObject:  # base classes omitted
    def mymethod(self, REQUEST, RESPONSE):
        """Something that gets used a lot by authenticated users, but
        probably isn't private.
        """
        # Add 'public' only if an anonymous user could call this method.
        if anonymous_access(REQUEST, self, self.mymethod__roles__):
            RESPONSE.setHeader('Cache-Control', 'max-age=60, public')
        else:
            RESPONSE.setHeader('Cache-Control', 'max-age=60')

def anonymous_access(REQUEST, ob, roles):
    """Check whether an anonymous user would be able to access
    the given object, with the given roles.
    """
    # Walk up the acquisition chain, asking each user folder in turn.
    while 1:
        if hasattr(ob, '__allow_groups__'):
            # Passing None as the credentials makes this an anonymous check.
            other_user = ob.__allow_groups__.validate(REQUEST, None, roles)
            if other_user is not None:
                return 1
        if hasattr(ob, 'aq_parent'):
            ob = ob.aq_parent
        else:
            break
    # No user folder in the chain accepted the anonymous request.
    return 0

Large documents

If a document is sufficiently large that the overhead of transmitting its content is a problem, then using the If-Modified-Since header can be effective. This allows the server to return a small response (including a 304 status code) if the document is unchanged since the client last retrieved it.

This technique is used by the standard File and Image objects, and that code is easy to borrow if you are developing a product that behaves in a similar way. Beware that this code has a flaw: the Last-Modified, Content-Type and Cache-Control headers really should be set in the response even when returning a 304 status code.
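The If-Modified-Since decision can be sketched independently of Zope. This is an illustration under stated assumptions, not the code used by File and Image: the function name, its arguments, and the max-age value are hypothetical, and the caller is assumed to copy the returned status and headers onto RESPONSE.

```python
from email.utils import formatdate, parsedate_to_datetime

def conditional_response(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET request.

    if_modified_since is the raw request header value, or None.
    last_modified is the document's modification time (POSIX timestamp).
    """
    # Set the caching headers on every response, including a 304.
    headers = {
        'Last-Modified': formatdate(int(last_modified), usegmt=True),
        'Cache-Control': 'max-age=3600',  # illustrative value only
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: behave as if the header was absent
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and int(last_modified) <= since:
            return 304, headers, b''
    return 200, headers, body
```

A client that echoes back the Last-Modified value it received gets the small 304 response until the document changes.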

The real complication with this technique is calculating a last-modified time for each document. For objects that store their data in one ZODB object, the last-modified time is easily obtained from the _p_mtime attribute. If the data comes from a file system file, use os.stat.

For most complex documents the answer involves a kludgey process of determining which ZODB objects and files make up the document, and finding the maximum of all the individual modification times. This might be a problem to consider at an early stage when designing a cache-aware product.
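Taking the maximum of the individual modification times might look like the sketch below. The function name and arguments are hypothetical: zodb_mtimes stands for the _p_mtime values of the ZODB objects involved, and file_paths for any file system files the document depends on.

```python
import os

def latest_mtime(zodb_mtimes, file_paths):
    """Return the newest modification time among ZODB objects and files.

    zodb_mtimes: iterable of POSIX timestamps (for example _p_mtime values).
    file_paths: iterable of paths to files that contribute to the document.
    """
    times = list(zodb_mtimes)
    times.extend(os.stat(path).st_mtime for path in file_paths)
    # max() raises ValueError if there are no contributing sources at all.
    return max(times)
```

The hard part remains knowing which objects and files to pass in, which is why it pays to consider this early in the design.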

Documents that are a little bit secret

It is possible to allow documents which normally require authorization to be stored in a shared cache, by including a cache-control header that forces the cache to revalidate the response every time (allowing Zope to check authorization). This is only a benefit when combined with If-Modified-Since processing. This is not a high-security solution, since it relies on the shared caches being well behaved.

This can be achieved using the following mix of cache-control directives:

  • 'public' to allow it to be cached at all.
  • 's-maxage=0' to force any shared cache to always revalidate it.
  • 'proxy-revalidate' so that it gets revalidated, even by caches that have been configured to allow stale responses. (This directive is redundant if 'must-revalidate' is already present.)
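The directives above are combined into a single comma-separated header value. A minimal sketch follows, in which the helper name cache_control is hypothetical:

```python
def cache_control(*directives):
    """Join cache-control directives into one header value."""
    return ', '.join(directives)

value = cache_control('public', 's-maxage=0', 'proxy-revalidate')
print(value)  # → public, s-maxage=0, proxy-revalidate
```

The resulting string would be passed to RESPONSE.setHeader('Cache-Control', value), as in the earlier examples.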

Calculated documents

Caching is also useful for documents that are expensive to generate, since it is possible to avoid recalculating the document on subsequent requests. Caching is used to reduce server load, rather than to improve performance for individual users.

The Zope logic is no different to that already described; however, the same techniques are more effective when an external HTTP proxy cache is used close to the Zope server.

Note that an external cache is only useful if your expensive documents are not confidential. If they require authorization then the external cache will not store them.

When Caching is a Problem

Stale documents

Some caches can be configured to return documents that have not been proved to be fresh. If it is a problem for users of your application to ever see out-of-date information then add the 'must-revalidate' cache control directive.

Note, however, that they might still see old pages by pressing a Back button.

Dynamic Content

HTTP 1.1 includes a mechanism to cover documents that depend on the value of a request header (perhaps User-Agent, or Accept-Language). Unfortunately the 'Vary' header is not yet supported by browsers or caches. If you have documents like this, the best solution is to include a 'private' cache control directive (and 'must-revalidate' too if you expect this header field to change).
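To illustrate what the Vary mechanism is meant to achieve, here is a sketch of how a cache that did support it might build its lookup key. This is an assumption-laden illustration, not the behaviour of any particular cache, and the function name cache_key is hypothetical.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary.

    request_headers: dict of request header names to values.
    vary: the value of the response's Vary header, e.g. 'Accept-Language'.
    """
    names = [n.strip().lower() for n in vary.split(',') if n.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Headers not named in Vary do not affect the key.
    return (url,) + tuple((n, lowered.get(n, '')) for n in sorted(names))
```

Two requests for the same URL that differ in a header named by Vary get different keys, so each variant is stored and served separately.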

Cookies

The presence of Cookie headers in requests or Set-Cookie headers in responses does not affect whether or not an HTTP reply can be cached. If the content of a document depends on a cookie then you should treat the document as described for 'Dynamic Content'.

Although a response that included a 'set-cookie' header can be cached, the 'set-cookie' header itself will not be. To make sure that your 'set-cookie' headers always take effect, you might need to add a 'no-cache' cache control directive too.

Private documents

If your documents are intended only for a single user, include the 'private' cache control directive.

If the documents contain sensitive information that should not even be kept in a private cache (where it might escape onto a backup tape, for example) then include 'no-store' in addition to 'private'.