Configuring Squid as an Accelerator for Zope

This HowTo explains how to configure Squid as a caching http accelerator for Zope. In this configuration Squid looks to the outside world like an ordinary http server; behind the scenes, however, it obtains its page content from your Zope http server.

Some elements described in this HowTo can be omitted from a simpler configuration. However, simple configurations always grow into bigger ones. The example given here includes elements that every Squid installation will need sooner or later:

  1. Multiple virtual hosts served from the same Squid
  2. Multiple redundant Zope servers for each virtual host
  3. Load balancing and failover between those Zope servers
  4. https

To make best use of your caching server you will need to set suitable caching headers in your Zope application. See another HowTo for more information on doing this manually. Alternatively, Zope 2.3 has limited built-in support for setting caching headers.
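For illustration, here is a minimal sketch of setting such a header from a Script (Python) in Zope; the one hour value is only an example, and you should choose whatever suits your content:

# Script (Python): allow caches to keep this response for up to an hour
response = context.REQUEST.RESPONSE
response.setHeader('Cache-Control', 'max-age=3600')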

Apache?

Some people have reported success with getting Apache to perform caching, using ProxyPass. For detailed instructions see http://www.zope.org/Members/rbeer/caching. I prefer Squid since it is easier to configure and uses fewer system resources. It also provides more detailed cache-specific logs, and detailed runtime statistics through its web-based Cache Manager CGI.

Configuration

For the full details see Squid's documentation on its http accelerator mode.

You must use Squid version 2.3.STABLE4 or later. Earlier versions do not support everything described in this HowTo.

Edit squid.conf and change the following values

http_port

Specify the port (and optionally the address) on which Squid should listen for http requests.
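For a public accelerator this is normally the standard http port:

http_port 80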

https_port

Specify the https ports on which Squid should listen. You will need one port for each SSL key and certificate. Note that https support requires Squid 2.5 or later.
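A sketch of a single https listener (the certificate and key file paths are examples only):

https_port 443 cert=/usr/local/squid/etc/www.example.com.cert key=/usr/local/squid/etc/www.example.com.key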

no_cache

The default configuration prevents Squid from caching any URL containing cgi-bin or ?. This is inappropriate for a Zope accelerator, so you probably want to remove this no_cache line from the default configuration file.
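In the standard squid.conf the lines to remove (or comment out) typically look like this:

acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY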

hierarchy_stoplist

This is inappropriate for the same reason as no_cache. You probably want to remove the default hierarchy_stoplist line.
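The default line typically looks like this:

hierarchy_stoplist cgi-bin ?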

request_body_max_size

Increase this number if you want to allow uploads larger than 1 MB.
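For example, to allow uploads of up to roughly 10 MB (the value is only an example):

request_body_max_size 10 MB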

redirect_program

All front-end proxy solutions involve the problem of mapping incoming URLs onto requests to backend servers. Squid solves this by allowing you to provide an external program or script that translates external URLs into internal URLs. The content of this script is explained below; you must enter the name of the script as redirect_program.
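For example, if the script described below were installed as /usr/local/squid/bin/zope_redirect.py (the path is only an example), squid.conf would contain something like the following; redirect_children controls how many copies of the script Squid keeps running:

redirect_program /usr/local/squid/bin/zope_redirect.py
redirect_children 5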

(For comparison, Apache's ProxyPass uses regular-expression-based rules in its configuration file. I prefer the Squid solution, mainly because I can easily test the redirector script outside of Squid. However, it is a little more intimidating if you are not familiar with writing text-processing scripts.)

http_access

Squid's default access rules prevent http access except from the listed IP addresses. The easiest change is to replace http_access deny all with http_access allow all.

Such a configuration is secure as long as your redirector script only emits URLs for your backend servers, but a single error in that script could turn your Squid into an open relay. For extra depth of security you may want to restrict http access to the backend hosts themselves, as sketched below, and leave the http_access deny all line intact.
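A sketch of that tighter configuration, where ipofbackendhost1 and ipofbackendhost2 stand for the real IP addresses of your backend Zope hosts:

acl accel_hosts dst ipofbackendhost1 ipofbackendhost2
http_access allow accel_hosts
http_access deny all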

httpd_accel_host

httpd_accel_host virtual

httpd_accel_port

httpd_accel_port 0

httpd_accel_with_proxy

httpd_accel_with_proxy on

httpd_accel_uses_host_header

httpd_accel_uses_host_header on

Backend Glue

All that remains is to tell Squid about your backend servers. This HowTo only deals with Zope, although it is possible to use other web servers as backends.

Option 1

Squid provides two different ways to make an http request to a backend server. The first option is the most traditional: the backend server acts as an origin server, and Squid makes an ordinary http request to it.

To implement this your redirector script must translate a URL such as http://www.example.com/a/b/c into http://backend-zope.dmz.example.com:8080/VirtualHostBase/http/www.example.com:80/a/b/c

Note that the host name in the output URL is the backend host, which probably should be inaccessible from the public internet. The trailing /a/b/c has been copied from the incoming URL, and the VirtualHostBase/http/www.example.com:80 segment is some VirtualHostMonster magic so that Zope can reconstruct the URL used by the original requester (in case it has to use similar URLs in its pages). www.zope.org contains full documentation on installing and using VirtualHostMonster.
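The heart of the redirector for this option is a rewrite along the lines of the following sketch; the backend host name is the example used above, and a complete script skeleton is given in the Redirector Scripts section below.

def rewrite(url):
    # url is e.g. http://www.example.com/a/b/c
    parts = url[len('http://'):].split('/', 1)
    host = parts[0].split(':')[0]
    if len(parts) > 1:
        path = parts[1]
    else:
        path = ''
    return 'http://backend-zope.dmz.example.com:8080/VirtualHostBase/http/%s:80/%s' % (host, path)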

Option 2

Treating the Zope backend as an origin server as described in option 1 is by far the easiest to set up, but not the most effective. I believe this second option to be better, although it is a little unconventional.

Squid can also make http requests to other caches (peers), using a request format that Zope can understand. Squid contains some sophisticated logic for managing connections to a pool of other caches, and these features prove to be useful for managing a pool of backend Zope servers too.

To implement this solution your redirector script must output a URL where the hostname part of the URL is a keyword which describes a pool of backend servers, such as http://backendpool/VirtualHostBase/http/www.example.com:80/a/b/c. Note that the hostname part of the URL is not a real host; it is a keyword that will be used in squid's configuration.

In addition you must configure Squid with the backend Zope servers as peers, and configure its access control rules so that all requests for that 'host' keyword are directed to those peers. If you have two Zope instances that serve redundant copies of the same virtual host, then squid.conf needs to contain lines such as:

cache_peer backendzope1.dmz.example.com parent 8080 8080 no-digest no-netdb-exchange round-robin
cache_peer backendzope2.dmz.example.com parent 8080 8080 no-digest no-netdb-exchange round-robin

acl in_backendpool dstdomain backendpool
cache_peer_access backendzope1.dmz.example.com allow in_backendpool
cache_peer_access backendzope1.dmz.example.com deny all
cache_peer_access backendzope2.dmz.example.com allow in_backendpool
cache_peer_access backendzope2.dmz.example.com deny all

never_direct allow all

The never_direct line ensures that Squid does not try to resolve the backendpool 'host' keyword as if it were a real host name, or connect to it directly if all the peers are down. You may need a more sophisticated never_direct acl if you have some backend servers which are not configured as peers.

The configuration above assumes that the two backend Zopes are providing http and ICP on port 8080. To use ICP you will need to enable it with the --icp command line switch, and you will need some patches for Zope versions before 2.6. Alternatively, include the no-query directive in the cache_peer lines.
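If you take the no-query route, the cache_peer lines would look something like this sketch (with ICP disabled the ICP port value is ignored, so 0 is conventional):

cache_peer backendzope1.dmz.example.com parent 8080 0 no-query no-digest no-netdb-exchange round-robin
cache_peer backendzope2.dmz.example.com parent 8080 0 no-query no-digest no-netdb-exchange round-robin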

Redirector Scripts

Squid feeds the redirector one request per line on standard input: the URL followed by the client address, ident and method. It expects the rewritten URL on standard output, one line per request; a blank line means the URL is to be left unchanged.
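Below is a minimal sketch of a redirector script for the option 2 setup described above. It is only an illustration: the virtual host name and the backendpool keyword are assumptions that must match your own squid.conf, and you should extend the table to cover every virtual host you serve.

#!/usr/bin/env python
# Minimal redirector sketch for option 2. Each public virtual host is
# mapped to the 'backendpool' keyword that squid.conf directs to the
# backend Zope peers. Requests for unknown hosts are passed through
# unchanged.
import sys

POOLS = {
    'www.example.com': 'backendpool',
}

def rewrite(url):
    if not url.startswith('http://'):
        return ''                 # not plain http: leave the request unchanged
    parts = url[len('http://'):].split('/', 1)
    host = parts[0].split(':')[0]
    if len(parts) > 1:
        path = parts[1]
    else:
        path = ''
    pool = POOLS.get(host)
    if pool is None:
        return ''                 # unknown virtual host: leave unchanged
    return 'http://%s/VirtualHostBase/http/%s:80/%s' % (pool, host, path)

while 1:
    line = sys.stdin.readline()
    if not line:
        break                     # Squid has closed the pipe
    # Squid passes one request per line: URL client-address ident method
    fields = line.split()
    if not fields:
        continue
    sys.stdout.write(rewrite(fields[0]) + '\n')
    sys.stdout.flush()            # Squid waits for a reply to every line

Because the script simply reads standard input and writes standard output, you can test it from a shell by typing request lines at it before plugging it into redirect_program. For option 1 the only change is that rewrite() returns the real backend host and port instead of the pool keyword, as in the sketch shown earlier.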

You should ensure that your redirector script will only output URLs targeting your backend servers. If your script can output URLs to arbitrary hosts then your accelerator is effectively an open http proxy.

Credits

Thanks to Jim Washington for pointing out the security concerns for the redirector script.

Thanks to Robert Collins for the more robust http_access option.

Thanks to Wankyu Choi for pointing out the typos in the examples.