Customizing the document processor

  The document processor is driven by two tables.

  The first table, named 'paragraph_types', is a sequence of callable objects or method names for coloring paragraphs. If a table entry is a string, then it is the name of a method of the document processor to be used. For each input paragraph, the objects in the table are called until one returns a value (not 'None'). The value returned replaces the original input paragraph in the output. If none of the objects in the paragraph types table returns a value, then a copy of the original paragraph is used.

  The new object returned by calling a paragraph type should implement the ReadOnlyDOM, StructuredTextColorizable, and StructuredTextSubparagraphContainer interfaces. See the 'Document.py' source file for examples. A paragraph type may also return a list or tuple of replacement paragraphs, thus allowing a paragraph to be split into multiple paragraphs.

  The second table, 'text_types', is a sequence of callable objects or method names for coloring text. The callable objects in this table are used in sequence to transform the input text into new text or objects. Each callable is passed a string and returns either nothing ('None') or a three-element tuple consisting of:

    - a replacement object,

    - a starting position, and

    - an ending position.

  The text from the starting position up to the ending position is (logically) replaced with the replacement object. The replacement object is typically an object that implements the ReadOnlyDOM and StructuredTextColorizable interfaces. The replacement object can also be a string or a list of strings or objects. Replacement is done from beginning to end, and the text after the replacement's ending position is passed to the text type objects for further processing.

  To create a new StructuredText format based on the document processor, simply subclass the document processor's class and override the processing tables or the methods that the processing tables reference.
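  To make the two-table protocol concrete, here is a minimal sketch in plain Python 3. It is not the StructuredText package's code: 'Node', 'MiniProcessor', 'color_paragraph', 'color_text' and the 'doc_rule' paragraph type are hypothetical names, and the simplified 'doc_literal' and 'doc_emphasize' patterns are assumptions rather than the real rules. The real processor also colorizes nested objects and uses richer node classes. The sketch only shows how string entries are resolved to methods, how the first non-'None' paragraph type wins, and how '(replacement, start, end)' tuples split text from left to right.

```python
import re


class Node:
    """Hypothetical stand-in for a colorized object (literal, emphasis, ...)."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __repr__(self):
        return "%s(%r)" % (self.kind, self.text)


class MiniProcessor:
    """Hypothetical, simplified model of the two-table design (not DocumentClass)."""

    # Table entries are method names (strings) or plain callables.
    paragraph_types = ["doc_rule"]
    text_types = ["doc_literal", "doc_emphasize"]

    def _resolve(self, entry):
        # A string names a method of the processor; anything else is called as-is.
        return getattr(self, entry) if isinstance(entry, str) else entry

    # Paragraph types: the first non-None result replaces the paragraph.
    def doc_rule(self, para):
        return Node("rule", "") if para.strip() == "----" else None

    def color_paragraph(self, para):
        for entry in self.paragraph_types:
            result = self._resolve(entry)(para)
            if result is not None:
                return result
        return para  # nothing matched: keep the original paragraph

    # Text types: each returns None or (replacement, start, end).
    def doc_literal(self, s, expr=re.compile(r"'([^']+)'").search):
        m = expr(s)
        return (Node("literal", m.group(1)), m.start(), m.end()) if m else None

    def doc_emphasize(self, s, expr=re.compile(r"\*([^*]+)\*").search):
        m = expr(s)
        return (Node("emphasis", m.group(1)), m.start(), m.end()) if m else None

    def color_text(self, text):
        pieces = [text]
        for entry in self.text_types:
            func = self._resolve(entry)
            out = []
            for piece in pieces:
                if not isinstance(piece, str):
                    out.append(piece)  # already replaced by an earlier text type
                    continue
                # Scan left to right; after each match, keep scanning the
                # text that follows the match's ending position.
                while piece:
                    result = func(piece)
                    if result is None:
                        out.append(piece)
                        break
                    obj, start, end = result
                    if end <= 0:
                        raise ValueError("a text type must consume at least one character")
                    if start > 0:
                        out.append(piece[:start])
                    out.append(obj)
                    piece = piece[end:]
            pieces = out
        return pieces


if __name__ == "__main__":
    p = MiniProcessor()
    print(p.color_text("use 'x+1' or *really*"))
    # prints: ['use ', literal('x+1'), ' or ', emphasis('really')]

    class NoLiteral(MiniProcessor):
        # Filtering the inherited table in a subclass removes one text type.
        text_types = [t for t in MiniProcessor.text_types if t != "doc_literal"]

    print(NoLiteral().color_text("use 'x+1' or *really*"))
    # prints: ["use 'x+1' or ", emphasis('really')]
```

  Running text types one after another, and skipping pieces that an earlier type already replaced, is a design choice of this sketch; it keeps quoted text from being re-scanned for emphasis markers.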
  The class of the document processor can be found in the 'DocumentClass' module of the StructuredText package.

  Example 1, Disabling use of single quotes for literal inline text

    Many people don't like the ClassicStructuredTextRule that causes single-quoted strings to be translated to literal text (e.g. HTML 'code' tags). We can disable this in two ways.

    First, we can modify the 'text_types' table to remove this text type. The original 'text_types' table in the 'DocumentClass' class looks like::

      text_types = [
          'doc_href',
          'doc_strong',
          'doc_emphasize',
          'doc_literal',
          ]

    We can create our own document processor class with a different table::

      import StructuredText, StructuredText.DocumentClass, re

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):
          # Keep every text type from the base class except 'doc_literal'.
          # A list (not a lazy filter object) so the table can be iterated repeatedly.
          text_types = [t for t in StructuredText.DocumentClass.DocumentClass.text_types
                        if t != 'doc_literal']

      Document = myDocumentClass()

      src = open('mydata').read()        # get some source text
      basic = StructuredText.Basic(src)  # convert it to a basic document
      doc = Document(basic)              # apply document-style processing
      html = StructuredText.HTML(doc)    # generate HTML

    Note that we built the subclass table by filtering the base class table, so that we still pick up new text types as they are added to the base class.

    Another approach would be to replace the method that detects literal text with one that does nothing::

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):
          def doc_literal(self, s):
              return None  # never match, so no literal text is produced

  Example 2, Provide an alternate literal format

    Rather than disable the ability to provide literal text, we could simply change it by providing a function that implements a different rule.
    For example, we might want to allow literal inline text to be spelled with two backquotes before the text and two single quotes after it, as in::

      We can use expressions in the DTML var tag as in ``<dtml-var "x+'.txt'">''

    In this case, we simply override the method that recognizes literal text with one that implements this rule::

      import re
      import StructuredText.DocumentClass

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

          def doc_literal(
              self, s,
              expr=re.compile(
                  r"(?:\s|^)``"            # open
                  r"([^\n]+?)"             # contents
                  r"''(?:\s|[,.;:!?]|$)"   # close
                  ).search):

              r = expr(s)
              if r:
                  start, end = r.span(1)
                  # Widen the contents span by 2 on each side to cover the `` and '' delimiters.
                  return (
                      StructuredText.DocumentClass.StructuredTextLiteral(s[start:end]),
                      start - 2, end + 2)
              else:
                  return None
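    The regular expression and the 'start - 2' / 'end + 2' arithmetic above can be checked without the StructuredText package. In this sketch, 'literal_search' and 'find_literal' are hypothetical helpers that reproduce only the matching step of the method, returning the literal's contents together with the span a text type would report for replacement.

```python
import re

# The pattern from Example 2, written with raw strings so "\s" is a regex
# escape rather than a (deprecated) Python string escape.
literal_search = re.compile(
    r"(?:\s|^)``"            # open: two backquotes at the start or after whitespace
    r"([^\n]+?)"             # contents: shortest non-empty run on one line
    r"''(?:\s|[,.;:!?]|$)"   # close: two single quotes, then whitespace, punctuation, or end
).search


def find_literal(s):
    """Mimic doc_literal's span arithmetic (hypothetical helper).

    Returns (contents, start, end) where s[start:end] is the whole literal
    including both delimiters, or None when the rule does not match.
    """
    r = literal_search(s)
    if r is None:
        return None
    start, end = r.span(1)  # span of the contents group only
    return s[start:end], start - 2, end + 2


if __name__ == "__main__":
    text = "We can use expressions in the DTML var tag as in ``<dtml-var \"x+'.txt'\">''"
    contents, start, end = find_literal(text)
    print(contents)         # <dtml-var "x+'.txt'">
    print(text[start:end])  # ``<dtml-var "x+'.txt'">''
```

    Because 'span(1)' covers only the contents, widening it by two characters on each side makes the replaced range include both delimiters but not the whitespace consumed by the opening '(?:\s|^)' alternative.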