Zope 2.6 Unicode Changes

Zope 2.6 includes better support for using Unicode. This support is based on the patches by Toby Dickenson, previously distributed at http://www.zope.org/Members/htrd/wstring.

Changes to ZPublisher

ZPublisher has been changed to handle a Unicode response slightly differently from a non-Unicode response. If the response is not Unicode, it behaves exactly as before. However, if (and only if) the response is Unicode, ZPublisher applies the character encoding specified by the charset property in the Content-Type header. (This applies to all text/* content types.) If the Content-Type header does not include a charset property, or if the header is blank (in which case ZPublisher guesses 'text/html'), the Unicode string is encoded into latin-1 using Python's 'replace' error policy, which replaces every non-latin-1 character with a question mark.

Changes to DTML

Unicode strings can be mixed freely with plain strings in DTML. DTML returns a Unicode string if any of its constituents are Unicode; otherwise it returns a plain string as before. When Unicode strings are mixed with plain strings, the plain string is converted to Unicode on the assumption that it contains latin-1 characters. Note that this differs from what happens when you mix Unicode and plain strings in Python, where the plain string is decoded as ASCII and a UnicodeError is raised if it contains any non-ASCII characters.
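The ZPublisher and DTML rules above can be modeled in a few lines of modern Python. This is a sketch under stated assumptions, not Zope's implementation. Zope 2 ran on Python 2, so here plain strings are modeled as bytes and Unicode strings as str. The helper names render_dtml and encode_response are hypothetical.

```python
# Python 3 model of the Zope 2.6 rules (hypothetical helpers, not Zope APIs).
# Assumption: Zope 2 "plain strings" are modeled as bytes, "Unicode strings" as str.
import re


def render_dtml(parts):
    """Join rendered DTML fragments following the DTML mixing rule.

    If any fragment is Unicode (str), every plain (bytes) fragment is decoded
    as latin-1 and the result is Unicode. Otherwise the result stays plain.
    """
    if any(isinstance(p, str) for p in parts):
        return "".join(
            p.decode("latin-1") if isinstance(p, bytes) else p for p in parts
        )
    return b"".join(parts)


def encode_response(body, content_type):
    """Encode a text/* response body following the ZPublisher rule."""
    if isinstance(body, bytes):
        # Non-Unicode responses are passed through unchanged.
        return body
    if not content_type:
        # Blank header: ZPublisher guesses text/html, which has no charset.
        content_type = "text/html"
    match = re.search(r"charset\s*=\s*([^\s;]+)", content_type, re.IGNORECASE)
    if match:
        # Use the charset named in the Content-Type header.
        return body.encode(match.group(1).strip("\"'"))
    # No charset: latin-1, replacing unencodable characters with '?'.
    return body.encode("latin-1", "replace")


if __name__ == "__main__":
    page = render_dtml([b"caf\xe9 ", "\u20ac"])  # latin-1 bytes + euro sign
    print(repr(page))                                         # 'café €'
    print(encode_response(page, "text/html; charset=UTF-8"))  # b'caf\xc3\xa9 \xe2\x82\xac'
    print(encode_response(page, ""))                          # b'caf\xe9 ?'
```

The latin-1 assumption also explains the risk of mixing encodings. A call such as render_dtml(["café".encode("utf-8"), ""]) returns 'cafÃ©', because UTF-8 bytes held in a plain string are reinterpreted as latin-1 once any Unicode value joins the page.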
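If you expect that your pages might include Unicode data, change your standard_html_header to something like the following example:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <dtml-call "RESPONSE.setHeader('Content-Type','text/html; charset=UTF-8')">
    <title><dtml-var title_or_id></title>
    <dtml-var "u''">
    </head>
    <body>

The setHeader call declares the UTF-8 charset that ZPublisher uses when it encodes a Unicode response. The empty Unicode string u'' ensures that the rendered page is always Unicode, because DTML returns Unicode whenever any constituent is Unicode.

Changes for Properties

Property pages and property sheets now include extra types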
Changes for Forms

ZPublisher has special processing for form field names that include a type converter. This mechanism has been extended so that a field name can also specify the character encoding used by the form response. You need to know which encoding the browser will use, and include an appropriate encoding tag in the field name.
Four extra type converters have been added: 'ustring', 'utokens', 'utext', and 'ulines'. These are the Unicode equivalents of the existing string types. If a field name does not include a character encoding tag, ZPublisher assumes that the form was submitted in latin-1.

Character Encoding Used In Form Responses

As explained above, you need to know which character encoding the browser will use when it submits responses to your forms, and include the name of that encoding in the names of your form controls. The encoding a browser uses depends on the encoding of the page containing the form, and on the type of form.
You are right to think that this is harder than it should be. A no-brainer policy is to use UTF-8 for every page, in which case form responses are also always UTF-8.

Changes to the ZMI (Zope Management Interface)

Previously the ZMI did not specify the character encoding used by its management interface, leaving each browser to guess. From Zope 2.6 the default character encoding for the ZMI is latin-1. In future it may change to UTF-8.

Product authors can override this default character encoding for their own Unicode-aware management pages by setting the XXXXXTBD REQUEST header before calling XXXXX, as shown in the following example. The Properties page currently uses this technique to display the values of Unicode properties correctly.

EXAMPLE TBD

Pages That Do Not Expect Unicode

There are many DTML pages that are not currently Unicode-aware, including most of Zope's management interface. Many of these pages use their own choice of character encoding and store encoded character data in plain strings. These Unicode changes have been designed to let such DTML pages remain unchanged, provided that no Unicode property is used on the page. If a Unicode property is used, DTML converts the page's plain strings to Unicode on the assumption that they are latin-1, which garbles any text stored in a different encoding.

Problem Areas

The following issues remain a problem.
Changes Since The Last Release

The support in Zope 2.6 is based on the patches previously distributed at http://www.zope.org/Members/htrd/wstring. The following changes have been made since the last release of that patch: