An Introduction to Structured Text
By Paul Everitt
Engineers spend a lot of time communicating, primarily by email but also in documentation. However, writing by engineers is complicated by a simple fact: the world consumes writing largely in presentation formats such as HTML and PDF.
In theory this should be no problem: we would all march happily off and write in "DocBook" (or perhaps LaTeX), the supposed lingua franca of documentation. However, most tools don't support DocBook (or LaTeX) very well, and even if the tools were mature, most engineers would reject them.
Why? Engineers spend most of their time communicating in plain text. Their tools (vi and Emacs) are oriented toward text. The vast majority of words they communicate are in email. Finally, what little documentation you can squeeze out of engineers is in the form of "docstrings" in source code.
Wouldn't it be nice if there were a non-tag, text-oriented system for engineers to express semantic meaning? This is the problem Structured Text tackles. With Structured Text, format-independent writing becomes convenient and natural once a few rules are learned. Furthermore, Structured Text can be extended to cover advanced and customized uses.
To get a quick idea of what Structured Text does, consider an example. The following words in Structured Text::

    Sometimes the *best* approach to complexity is simplicity. A good structured text system is:

    - Convenient

    - Rich
is rendered into the following HTML:
    <p>
    Sometimes the <em>best</em> approach to complexity is simplicity.
    A good structured text system is:
    </p>
    <ul>
      <li><p>Convenient</p></li>
      <li><p>Rich</p></li>
    </ul>
and the following DocBook XML:
    <para>
    Sometimes the <emphasis>best</emphasis> approach to complexity is simplicity.
    A good structured text system is:
    </para>
    <itemizedlist>
      <listitem><para>Convenient</para></listitem>
      <listitem><para>Rich</para></listitem>
    </itemizedlist>
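The point of the example is that one source yields several output formats. A minimal Python sketch of how that can work, assuming a hypothetical parsed tree of '(kind, content)' tuples and two hypothetical tag tables (this is not the data structure Zope actually uses):

```python
# Hypothetical, format-neutral representation of the example above:
# a paragraph made of inline runs, followed by a bulleted list.
DOC = [
    ("para", [("text", "Sometimes the "), ("em", "best"),
              ("text", " approach to complexity is simplicity. "
                       "A good structured text system is:")]),
    ("bullets", ["Convenient", "Rich"]),
]

# One tag table per output format; "{}" marks where the content goes.
HTML_TAGS = {
    "para": "<p>{}</p>", "em": "<em>{}</em>",
    "list": "<ul>{}</ul>", "item": "<li><p>{}</p></li>",
}
DOCBOOK_TAGS = {
    "para": "<para>{}</para>", "em": "<emphasis>{}</emphasis>",
    "list": "<itemizedlist>{}</itemizedlist>",
    "item": "<listitem><para>{}</para></listitem>",
}


def render(doc, tags):
    """Render the same tree with whichever tag table is passed in."""
    out = []
    for kind, content in doc:
        if kind == "para":
            # Plain text runs pass through; other runs are wrapped in tags.
            inline = "".join(
                text if run == "text" else tags[run].format(text)
                for run, text in content
            )
            out.append(tags["para"].format(inline))
        elif kind == "bullets":
            items = "".join(tags["item"].format(item) for item in content)
            out.append(tags["list"].format(items))
    return "\n".join(out)


print(render(DOC, HTML_TAGS))
print(render(DOC, DOCBOOK_TAGS))
```

Under this design, adding a third format would only mean adding another tag table; the parsing step does not change.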
In fact, the text of this article is written in Structured Text. In this article, we'll look at the basics of Structured Text, organizing large text into sections, advanced formatting, and metadata issues.
Structured Text Basics
Let's plunge into structured text and look at the basics by correlating it to ideas in HTML.
The most basic idea in Structured Text is a paragraph. The following snippet::

    This is the first paragraph.

    This is the second paragraph.
...is converted to the following in HTML:
    <p>This is the first paragraph.</p>
    <p>This is the second paragraph.</p>
That is, white space matters in Structured Text: a blank line separates one paragraph from the next. This is an intuitive idea, because paragraphs in email are separated the same way.
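A minimal Python sketch of the paragraph rule alone, assuming that any run of blank (or whitespace-only) lines ends a paragraph; the function name 'paragraphs_to_html' is hypothetical and this is not the Zope implementation:

```python
import re
from html import escape


def paragraphs_to_html(text):
    """Wrap each blank-line-separated paragraph in <p> tags.

    Sketch of the paragraph rule only; inline markup, headings, and
    lists are ignored here.
    """
    html = []
    # A newline, optional whitespace (including further blank lines),
    # then another newline marks a paragraph boundary.
    for block in re.split(r"\n\s*\n", text):
        # Lines inside one paragraph are joined with single spaces.
        words = " ".join(line.strip() for line in block.splitlines()
                         if line.strip())
        if words:
            html.append("<p>" + escape(words, quote=False) + "</p>")
    return "\n".join(html)


sample = "This is the first paragraph.\n\nThis is the second paragraph.\n"
print(paragraphs_to_html(sample))
# <p>This is the first paragraph.</p>
# <p>This is the second paragraph.</p>
```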
To introduce emphasis, Structured Text uses another text convention: asterisks. Note the following snippet:
    This is the *first* paragraph.

    This is the **second** paragraph.
In HTML, this snippet introduces the 'em' tag and the 'strong' tag:
    <p>This is the <em>first</em> paragraph.</p>
    <p>This is the <strong>second</strong> paragraph.</p>
Again, this is a common pattern in email. Several other common patterns are supported, such as referring to a piece of jargon:
    When you see 'STX', you know this is shorthand for 'Structured Text'.
The HTML output is as follows:
    <p>When you see <code>STX</code>, you know this is shorthand for <code>Structured Text</code>.</p>
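The three inline conventions can be approximated with regular expressions. This is a sketch, not the Structured Text parser: the patterns, the hypothetical 'inline_to_html' function, and the rule that an apostrophe inside a word is not a code delimiter are all assumptions made for illustration:

```python
import re

# Each rule pairs a pattern with its HTML replacement. Order matters:
# **strong** is handled before *em* so that the single-asterisk rule
# does not consume half of a double-asterisk pair.
INLINE_RULES = [
    (re.compile(r"\*\*([^*\n]+)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*([^*\n]+)\*"), r"<em>\1</em>"),
    # A quote preceded or followed by a word character is not treated
    # as a code delimiter, so apostrophes in "don't" are left alone.
    (re.compile(r"(?<![\w'])'([^'\n]+)'(?![\w'])"), r"<code>\1</code>"),
]


def inline_to_html(text):
    """Apply each inline rule in order to one paragraph of text."""
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(inline_to_html("This is the *first* paragraph."))
print(inline_to_html("This is the **second** paragraph."))
print(inline_to_html("When you see 'STX', you know this is shorthand "
                     "for 'Structured Text'."))
```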
Using Indentation
The preceding section focused on text conventions that convey a semantic meaning. This semantic meaning, when processed by Structured Text, produces certain HTML tags.
In Structured Text, indentation is also very important in conveying semantic meaning. The most basic example is the HTML idea of headings.
In the following snippet, indentation is used to convey an outline-like structure::
    Using Indentation

      The preceding section focused on text conventions that convey a
      semantic meaning. This semantic meaning, when processed by
      Structured Text, produces certain HTML tags.
This produces the following HTML:
    <h1>Using Indentation</h1>
    <p>The preceding section focused on text conventions that convey a
    semantic meaning. This semantic meaning, when processed by Structured
    Text, produces certain HTML tags.</p>
That is, the indentation conveyed a semantic meaning: the paragraph was subordinate to the heading, and HTML expresses that relationship. In fact, the outline can be nested further::
    Using Indentation

      The preceding section focused on text conventions that convey a
      semantic meaning. This semantic meaning, when processed by
      Structured Text, produces certain HTML tags.

      Basics of Indentation

        In this section we will investigate the basics of indentation...

      Hyperlinks
This produces the following HTML:
    <h1>Using Indentation</h1>
    <p>The preceding section focused on text conventions that convey a
    semantic meaning. This semantic meaning, when processed by Structured
    Text, produces certain HTML tags.</p>
    <h2>Basics of Indentation</h2>
    <p>In this section we will investigate the basics of indentation...</p>
    <h2>Hyperlinks</h2>
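One way to derive headings from indentation is sketched below in Python. The rule used here is an assumption: a paragraph counts as a heading when the paragraph after it is indented more deeply, and its heading level is its depth in the outline. Under this simplified rule, a heading with nothing beneath it, such as the trailing "Hyperlinks" above, would come out as a plain paragraph. The functions 'split_paragraphs' and 'outline_to_html' are hypothetical:

```python
import re


def split_paragraphs(text):
    """Return (indent, text) pairs, one per blank-line-separated paragraph."""
    result = []
    for block in re.split(r"\n\s*\n", text):
        lines = [line for line in block.splitlines() if line.strip()]
        if lines:
            first = lines[0]
            indent = len(first) - len(first.lstrip(" "))
            result.append((indent, " ".join(line.strip() for line in lines)))
    return result


def outline_to_html(paragraphs):
    """Treat a paragraph as a heading when the next paragraph is indented
    more deeply; the heading level is its depth in the outline."""
    html = []
    open_headings = []  # indentation of each heading still in scope
    for i, (indent, text) in enumerate(paragraphs):
        # Leaving a heading's block: drop headings at this indent or deeper.
        while open_headings and open_headings[-1] >= indent:
            open_headings.pop()
        next_indent = paragraphs[i + 1][0] if i + 1 < len(paragraphs) else -1
        if next_indent > indent:
            open_headings.append(indent)
            level = len(open_headings)
            html.append(f"<h{level}>{text}</h{level}>")
        else:
            html.append(f"<p>{text}</p>")
    return "\n".join(html)


source = """Using Indentation

  The preceding section focused on text conventions.

  Basics of Indentation

    In this section we will investigate the basics of indentation...
"""
print(outline_to_html(split_paragraphs(source)))
```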
Lists and Items
Lists are also supported in Structured Text, including unordered, ordered, and descriptive lists. The convention for unordered lists is a common pattern in text-based communication::
    HTML has three kinds of lists:

    - Unordered lists

    - Ordered lists

    - Descriptive lists
Structured Text allows you to use the symbols '*', 'o', and '-' to mark list items. The above example produces this HTML:
    <p>HTML has three kinds of lists:</p>
    <ul>
      <li><p>Unordered lists</p></li>
      <li><p>Ordered lists</p></li>
      <li><p>Descriptive lists</p></li>
    </ul>
The Structured Text convention for ordered lists is shown below:
    HTML has three kinds of lists:

    1. Unordered lists

    2. Ordered lists

    3. Descriptive lists
This produces:
    <p>HTML has three kinds of lists:</p>
    <ol>
      <li><p>Unordered lists</p></li>
      <li><p>Ordered lists</p></li>
      <li><p>Descriptive lists</p></li>
    </ol>
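The two list conventions can be sketched as a classifier that looks only at the start of each paragraph. The marker patterns and the hypothetical 'classify' and 'items_to_html' functions are assumptions for illustration; consecutive items of the same kind are grouped into a single list:

```python
import re

# Assumed markers: "-", "*" or "o" for bullets, "1." style for numbers.
BULLET = re.compile(r"\s*[-*o]\s+(.+)", re.DOTALL)
NUMBERED = re.compile(r"\s*\d+\.\s+(.+)", re.DOTALL)


def classify(paragraph):
    """Return ("ul" | "ol" | "p", body text) for one paragraph."""
    for kind, pattern in (("ul", BULLET), ("ol", NUMBERED)):
        match = pattern.fullmatch(paragraph)
        if match:
            return kind, " ".join(match.group(1).split())
    return "p", " ".join(paragraph.split())


def items_to_html(paragraphs):
    """Group consecutive list items of the same kind into one list."""
    html, open_list = [], None
    for paragraph in paragraphs:
        kind, body = classify(paragraph)
        if open_list and kind != open_list:
            html.append(f"</{open_list}>")  # close the list we are leaving
            open_list = None
        if kind == "p":
            html.append(f"<p>{body}</p>")
            continue
        if open_list is None:
            html.append(f"<{kind}>")
            open_list = kind
        html.append(f"<li><p>{body}</p></li>")
    if open_list:
        html.append(f"</{open_list}>")
    return "\n".join(html)


print(items_to_html(["HTML has three kinds of lists:",
                     "1. Unordered lists", "2. Ordered lists",
                     "3. Descriptive lists"]))
```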
Descriptive lists are also easily accommodated using double dashes:
    Unordered Lists -- Generally includes a series of bullets when viewed in HTML.

    Ordered Lists -- HTML viewers convert the list items into a numbered series.

    Descriptive Lists -- Usually used for definitional lists such as glossaries.
This becomes the following HTML:
    <dl>
      <dt>Unordered Lists</dt>
      <dd><p>Generally includes a series of bullets when viewed in HTML.</p></dd>
      <dt>Ordered Lists</dt>
      <dd><p>HTML viewers convert the list items into a numbered series.</p></dd>
      <dt>Descriptive Lists</dt>
      <dd><p>Usually used for definitional lists such as glossaries.</p></dd>
    </dl>
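A descriptive-list item can likewise be recognized by its first ' -- ' separator. This Python sketch assumes every paragraph passed in is such an item; 'definitions_to_html' is a hypothetical name:

```python
import re

# "term -- definition": the first " -- " separates the term from its text.
DEFINITION = re.compile(r"\s*(.+?)\s+--\s+(.+)", re.DOTALL)


def definitions_to_html(paragraphs):
    """Render consecutive "term -- definition" paragraphs as a <dl>."""
    html = ["<dl>"]
    for paragraph in paragraphs:
        match = DEFINITION.fullmatch(paragraph)
        if match is None:
            raise ValueError(f"not a descriptive list item: {paragraph!r}")
        term = match.group(1).strip()
        definition = " ".join(match.group(2).split())
        html.append(f"<dt>{term}</dt><dd><p>{definition}</p></dd>")
    html.append("</dl>")
    return "\n".join(html)


print(definitions_to_html([
    "Ordered Lists -- HTML viewers convert the list items into a numbered series.",
    "Descriptive Lists -- Usually used for definitional lists such as glossaries.",
]))
```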
Example Code
As mentioned above, Structured Text authors can use an easy convention to get the monospace semantics of the HTML 'code' tag. For instance::
    When you see the dialog box, hit the 'Ok' button.
...is rendered into the following HTML:
    <p>When you see the dialog box, hit the <code>Ok</code> button.</p>
However, sometimes you want long passages of code. For instance, what if you wanted to document a Python function in the middle of an article discussing Python? You can indicate a code block by ending a paragraph with '::' and indenting the following paragraph(s). For instance, this Structured Text snippet:
    In our next Python example, we convert human years to dog years::

      def dog_years(age):
          """Convert an age to dog years"""
          return age*7
...would be converted to the following HTML:
    <p>In our next Python example, we convert human years to dog years:</p>
    <pre>
    def dog_years(age):
        """Convert an age to dog years"""
        return age*7
    </pre>
Ending a paragraph with '::' and indenting the following block does more than apply 'code' semantics: it also escapes the indented block. That is how the Structured Text and HTML snippets in this article are left alone, rather than being rendered.
For example, the less than, greater than, and ampersand symbols in this code block are escaped:
    Here's an HTML example::

      <html>
      <p>This is a page about dogs & cats.</p>
      </html>
...to produce this HTML:
    <p>Here's an HTML example:</p>
    <pre>
    &lt;html&gt;
    &lt;p&gt;This is a page about dogs &amp; cats.&lt;/p&gt;
    &lt;/html&gt;
    </pre>
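Both behaviors of '::' (introducing an indented code block, and escaping that block) can be sketched in a few lines of Python. This sketch assumes the block consists of every following paragraph indented more deeply than the '::' paragraph; the names 'render_blocks' and 'indent_of' are hypothetical, and only the standard library ('re', 'html', 'textwrap') is used:

```python
import html
import re
import textwrap


def indent_of(block):
    """Number of leading spaces on a block's first line."""
    return len(block) - len(block.lstrip(" "))


def render_blocks(text):
    """Render paragraphs, treating '::' paragraphs as code-block introducers.

    Every following paragraph indented more deeply than the '::' paragraph
    is collected, dedented, HTML-escaped, and wrapped in <pre>.
    """
    blocks = [b for b in re.split(r"\n\s*\n", text.strip("\n")) if b.strip()]
    out, i = [], 0
    while i < len(blocks):
        block, i = blocks[i], i + 1
        para = " ".join(block.split())
        if not para.endswith("::"):
            out.append("<p>" + html.escape(para, quote=False) + "</p>")
            continue
        # The reader sees "::" as a single ":".
        out.append("<p>" + html.escape(para[:-1], quote=False) + "</p>")
        literal = []
        while i < len(blocks) and indent_of(blocks[i]) > indent_of(block):
            literal.append(blocks[i])
            i += 1
        if literal:
            code = textwrap.dedent("\n\n".join(literal))
            out.append("<pre>\n" + html.escape(code, quote=False) + "\n</pre>")
    return "\n".join(out)


source = """Here's an HTML example::

  <html>
  <p>This is a page about dogs & cats.</p>
  </html>
"""
print(render_blocks(source))
```

The escaping step is what keeps '<', '>', and '&' inside the block from being interpreted as markup by the browser.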
Hyperlinks
In the previous sections we focused on ways to get certain presentation semantics in HTML by using common text conventions.
But the web isn't just HTML. Linking words and phrases to other information and including images are equally important. Fortunately Structured Text supports conventions for hyperlinks and image tags.
Let's start with a simple hyperlink. If we have a Structured Text paragraph discussing Python::
    For more information on Python, please visit the "Python website" :http://www.python.org/.
This becomes:
    <p>For more information on Python, please visit the <a href="http://www.python.org/">Python website</a>.</p>
The convention is fairly simple:
- The text of the reference is enclosed in quotes.
- The second quotation mark is immediately followed by a colon and a URL.
- The URL can be followed by punctuation.
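The three rules above translate into one regular expression plus a small clean-up step for trailing punctuation. This is a sketch under assumptions: the pattern, the set of punctuation characters treated as ending the sentence, and the function names 'links_to_html' and '_to_anchor' are illustrative rather than taken from Zope:

```python
import re

# "link text":URL with no space before the colon. Trailing punctuation
# is treated as part of the sentence, not the URL (an assumption).
LINK = re.compile(r'"([^"\n]+)":(\S+)')
TRAILING_PUNCTUATION = ".,;:!?)"


def _to_anchor(match):
    text, url = match.group(1), match.group(2)
    href = url.rstrip(TRAILING_PUNCTUATION)
    tail = url[len(href):]  # punctuation that followed the URL
    return f'<a href="{href}">{text}</a>{tail}'


def links_to_html(text):
    return LINK.sub(_to_anchor, text)


print(links_to_html('For more information on Python, please visit the '
                    '"Python website":http://www.python.org/.'))
```

The same pattern also accepts relative and 'mailto' URLs, since it only requires non-space characters after the colon.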
This basic convention has a number of variations. For instance, relative URLs are possible, as are 'mailto' URLs.
(Note: in the example above, there should be no space between the closing quote and the colon. The space is there only because of a bug in the version of Structured Text currently running on Zope.org; this bug has been fixed in more recent versions of Zope.)
Advanced Usage
There are more obscure extensions to Structured Text to handle cross references, tables, images, and more.
One of the great things about Structured Text is that if you don't like its rules, it's fairly easy to extend. This is made possible by the recent rewrite of Structured Text, sometimes referred to as "Structured Text NG". For example, you could create a LaTeX outputter, or you could change Structured Text to recognize a different syntax for hyperlinks.
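The article does not show the Structured Text NG extension interface, so the following Python sketch only illustrates the general idea behind a new outputter: the markup is recognized once, and each output format supplies its own methods for emitting it. 'HTMLWriter', 'LaTeXWriter', and 'render' are hypothetical names:

```python
import re


class HTMLWriter:
    """Hypothetical output writer: one method per semantic element."""

    def strong(self, text):
        return f"<strong>{text}</strong>"

    def emphasis(self, text):
        return f"<em>{text}</em>"

    def paragraph(self, text):
        return f"<p>{text}</p>"


class LaTeXWriter:
    """A new outputter with the same methods and a different format.
    (Escaping of LaTeX special characters is omitted in this sketch.)"""

    def strong(self, text):
        return rf"\textbf{{{text}}}"

    def emphasis(self, text):
        return rf"\emph{{{text}}}"

    def paragraph(self, text):
        return text + "\n"


def render(paragraph, writer):
    """Recognize the markup once; let the writer decide how to emit it."""
    text = re.sub(r"\*\*([^*\n]+)\*\*", lambda m: writer.strong(m.group(1)),
                  paragraph)
    text = re.sub(r"\*([^*\n]+)\*", lambda m: writer.emphasis(m.group(1)), text)
    return writer.paragraph(text)


sample = "A *good* system is **rich**."
print(render(sample, HTMLWriter()))   # <p>A <em>good</em> system is <strong>rich</strong>.</p>
print(render(sample, LaTeXWriter()))  # A \emph{good} system is \textbf{rich}.
```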
Structured Text is available in Zope (and is integrated into the Zope Content Management Framework), but you can also use it outside of Zope. To use Structured Text in Zope, just create a document or file containing Structured Text, then call it like so::
This will give you the HTML representation of 'my_document'.
The Zope Book is an example of a project that uses Structured Text outside of Zope. The book was written in Structured Text with some modifications to support figure handling and the publisher's in-house markup format. Python scripts parse the input and create output in HTML and PDF.
Structured Text is also used in Python docstrings. A number of Python documentation extraction tools support Structured Text, and work is currently under way in the Python Doc-SIG to develop docstring conventions and a docstring processing system.
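As an illustration, here is a made-up Python function whose docstring uses the conventions covered earlier (asterisk emphasis, a quoted code fragment, and a '::' code block). The standard library's 'inspect.getdoc' returns the docstring with its common indentation removed, which is the plain text a documentation tool would then hand to a Structured Text parser:

```python
import inspect


def dog_years(age):
    """Convert an age to dog years.

    The *age* argument is a number of human years; the result is
    'age*7'.

    For example::

      dog_years(3)
    """
    return age * 7


# getdoc() removes the indentation shared by the docstring's lines,
# leaving plain Structured Text for a documentation tool to render.
doc = inspect.getdoc(dog_years)
print(doc)
```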
Conclusion
Structured Text gives you an easy way to express yourself in plain text. The Structured Text implementation allows you to tailor the syntax and output. Structured Text is integrated into Zope and is also usable outside Zope.
Resources
Structured Text Wiki - discusses structured text and STXNG.
reStructuredText - A Structured Text alternative being developed as a Python docstring standard.
Comment
Missing Fragment
Posted by: dle at 2004-04-22
This article contains the paragraphs:
Structured Text is available in Zope (and is integrated into the Zope Content Management Framework,) but you can also use it outside of Zope. To use Structured Text in Zope just create a document or file containing structured text, then call it like so::
This will give you the HTML representation of my_document.
Can someone clarify this by inserting the missing code fragment?
Comment
STX examples unclear
Posted by: heiko.stoermer at 2007-05-30
The actual STX code that is used and cited to produce the HTML outputs displayed above is unclear. Every block of STX code should go into a code environment to precisely show the characters one has to type to achieve the desired result (e.g. the "bulleted list" example cannot be reproduced).