Changes to Zope Developers Guide, Chapter 2, Object Publishing

* Add the following before the section 'HTTP Responses' under 'Stringifying 
the published object'

Character Encodings for Responses

 If the published method returns an object of type 'string', a plain
 8-bit character string, the publisher will use it directly as the body of the
 response. 

 Things are different if the published method returns a unicode string,
 because the publisher has to apply some character encoding. The published
 method can choose which character encoding it uses by setting a
 'Content-Type' response header which includes a 'charset' property
 (setting response headers is explained later in this chapter). A
 common choice of character encoding is UTF-8. To cause the publisher
 to send unicode results as UTF-8 you need to set a
 'Content-Type' header with the value 'text/html; charset=UTF-8'.

 If the 'Content-Type' header does not include a charset property (or if this
 header has not been set by the published method) then the publisher will
 choose a default character encoding. Today this default is ISO-8859-1
 (also known as Latin-1) for compatibility with old versions of Zope which
 did not include Unicode support. At some time in the future this default
 is likely to change to UTF-8.
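
 The rules above can be sketched in a few lines of Python. This is only an
 illustration of the described behaviour, not the publisher's actual code:
 the names 'charset_from_content_type' and 'encode_body' are hypothetical,
 and the sketch assumes Python 3, where 'str' plays the role of the unicode
 string and 'bytes' the role of the plain 8-bit string.

```python
DEFAULT_CHARSET = "iso-8859-1"  # the default described in the text above


def charset_from_content_type(content_type):
    """Return the charset property of a Content-Type value, or None."""
    # Skip the media type itself and look only at the ';'-separated properties.
    for part in (content_type or "").split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"') or None
    return None


def encode_body(body, content_type=None):
    """Hypothetical helper: turn a published result into response bytes."""
    if isinstance(body, bytes):
        # A plain 8-bit string is used directly as the response body.
        return body
    # A unicode string is encoded with the declared charset, else the default.
    charset = charset_from_content_type(content_type) or DEFAULT_CHARSET
    return body.encode(charset)
```

 With this sketch, the same unicode result produces different bytes depending
 on whether the 'Content-Type' header names a charset.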


* Inside the section 'Argument Conversion' is a list of type conversion 
marshalling tags. Insert the following definition of 'ustring' under 'string'

 ustring
  Converts a variable to a Python unicode string.

* and insert this definition at the bottom of the list

 ulines, utokens, utext
  Like lines, tokens, text, but using unicode strings instead of
  plain strings.

* Insert this section before 'Method Arguments'

Character Encodings for Arguments

 The publisher needs to know what character encoding was used by the browser
 to encode form fields into the request. That depends on whether the form
 was submitted using GET or POST (which the publisher can work out for itself)
 and on the character encoding used by the page which contained the form
 (for which the publisher needs your help).

 In some cases you need to add a specification of the character encoding
 to each field's type converter. The full details of how this works are
 explained below, but most users do not need to deal with them:

 1 If your pages all use the UTF-8 character encoding (or at least all the
   pages that contain forms) the browsers will always use UTF-8 for
   arguments. You need to add ':utf8' to all argument type converters. For
   example:

   <input type="text" name="name:utf8:ustring">
   <input type="checkbox" name="numbers:list:int:utf8" value="1">
   <input type="checkbox" name="numbers:list:int:utf8" value="2">

 2 If your pages all use a character encoding which has ASCII as a subset
   (such as Latin-1 or UTF-8) then you do not need to specify any
   character encoding for boolean, int, long, float, and date types.
   You can also omit the character encoding type converter from string,
   tokens, lines, and text types if you only need to handle ASCII characters
   in that form field.
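
 To make the field name syntax concrete, here is a minimal sketch of how a
 name such as 'name:utf8:ustring' could be split into the argument name, its
 type converters, and an optional character encoding. It is not the
 publisher's real tag parser: 'split_field_name' is a hypothetical helper,
 the converter list is limited to the names mentioned in this chapter, and
 any other tag is assumed to be an encoding known to Python's 'codecs' module.

```python
import codecs

# Converter names mentioned in this chapter (not necessarily the full set).
KNOWN_CONVERTERS = {
    "boolean", "int", "long", "float", "date", "list",
    "string", "tokens", "lines", "text",
    "ustring", "utokens", "ulines", "utext",
}


def split_field_name(field_name):
    """Split 'name:tag:tag' into (name, converters, encoding or None)."""
    name, *tags = field_name.split(":")
    converters = []
    encoding = None
    for tag in tags:
        if tag in KNOWN_CONVERTERS:
            converters.append(tag)
            continue
        try:
            codecs.lookup(tag)  # assumption: any other tag names an encoding
        except LookupError:
            raise ValueError("unknown tag %r in %r" % (tag, field_name))
        encoding = tag
    return name, converters, encoding
```

 For example, 'split_field_name("numbers:list:int:utf8")' separates the
 'list' and 'int' converters from the 'utf8' encoding, while a name such as
 'age:int' carries no encoding at all, which matches the second easy case.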

  Character Encodings for Arguments; The Full Story

   If you are not in one of those two easy categories, you first need
   to determine which character encoding will be used by the browser to
   encode the arguments in submitted forms.

   1. Forms submitted using GET, or using POST with 
      "application/x-www-form-urlencoded" (the default) 

      1. Page uses an encoding of unicode:
         Forms are submitted using UTF-8, as required by RFC 2718,
         section 2.2.5.

      2. Page uses another regional 8-bit encoding:
         Forms are often submitted using the same encoding as the
         page. If you choose to use such an encoding then you should
         also verify how the browsers you support actually behave.

   2. Forms submitted using "multipart/form-data": 

      According to HTML 4.01 (section 17.13.4) browsers should state which
      character encoding they are using for each field in a Content-Type
      header, but this is poorly supported. The current crop of
      browsers appears to use the same encoding as the page containing
      the form.
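
   The reason the encoding matters can be seen with Python's standard
   'urllib.parse' module, used here only to imitate what a browser sends.
   The field name and value are made up for illustration, 'decode_query' is
   a hypothetical helper, and the sketch assumes Python 3.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form field containing one non-ASCII character (e with acute).
form_values = {"name": "caf\u00e9"}

# The same value is sent as different bytes depending on the page encoding.
as_utf8 = urlencode(form_values, encoding="utf-8")         # name=caf%C3%A9
as_latin1 = urlencode(form_values, encoding="iso-8859-1")  # name=caf%E9


def decode_query(query, encoding):
    """Decode a urlencoded query string, assuming the given encoding."""
    return parse_qs(query, encoding=encoding, errors="replace")
```

   Decoding 'as_utf8' as UTF-8 recovers the original value, but decoding it
   as ISO-8859-1 yields two wrong characters in place of the accented one,
   and decoding 'as_latin1' as UTF-8 yields a replacement character. Nothing
   in the request itself says which interpretation is right.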

   Every field needs that character encoding name appended to its converter.
   The tag parser insists that tags must only use alphanumeric characters
   or an underscore, so you might need to use a short form of the
   encoding name from the Python 'encodings' library package (such
   as utf8 rather than UTF-8).
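
   One way to find a usable tag for a given encoding is to replace the
   disallowed characters with underscores and confirm that Python still
   resolves the result to the same codec. The helper below is a hypothetical
   sketch of that check, not part of Zope, and assumes the tag rule is
   exactly as stated above (alphanumeric characters or an underscore).

```python
import codecs
import re

TAG_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")  # the tag rule described above


def encoding_tag(encoding_name):
    """Return a tag-safe spelling of encoding_name, or raise ValueError."""
    # Replace every run of disallowed characters with a single underscore.
    tag = re.sub(r"[^A-Za-z0-9]+", "_", encoding_name).strip("_").lower()
    if not TAG_PATTERN.match(tag):
        raise ValueError("cannot build a tag from %r" % encoding_name)
    try:
        same = codecs.lookup(tag).name == codecs.lookup(encoding_name).name
    except LookupError:
        raise ValueError("unknown encoding %r" % encoding_name)
    if not same:
        raise ValueError("%r does not name the same codec" % tag)
    return tag
```

   For 'UTF-8' this gives 'utf_8', which Python treats as the same codec as
   the shorter alias 'utf8' used in the examples above.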