ZODB

Documentation
A persistent object system (POS) provides persistence to application objects. Ideally, providing object persistence has little or no impact on application code. The Zope Object Database (ZODB) achieves this goal by providing highly transparent persistence for application objects. While there is some impact on applications, the impact is far less than with other solutions, such as relational databases or shelve.

The Bobo POS system was the predecessor to ZODB. I did not consider Bobo POS to be an object database system because it made only limited provisions for databases that are much larger than (desired) application memory. Bobo POS had a feature for deferring object loading until needed. However, it lacked a facility for unloading objects from memory when they were no longer needed. To automatically remove unneeded objects, it was necessary to track object usage. This was not possible with a Python-only implementation without paying a prohibitive performance penalty.

Digital Creations added the previously-commercial code that enhanced Bobo POS into ZODB, thus providing database semantics. ZODB achieves this by providing a C-based persistence mix-in class that allows object access to be tracked efficiently and a C-based object cache that automatically deactivates and deallocates objects as needed, allowing virtually unlimited database size without excessive consumption of memory.
ZODB 3 is a major enhancement to the Zope story. It applies lessons learned from earlier versions. In particular, it:

- More fully integrates transaction management into interaction with a storage manager.

- Integrates support for long-running transactions, which are called "versions".

- Redefines and clarifies the storage interface, which was muddied by BoboPOS2, to facilitate implementations on top of relational databases.

Perhaps the most important feature of ZODB 3 is support for multiple concurrent threads of execution.
TBD multiple databases
The current design is aimed at supporting transactions that span multiple databases in the future. This is likely to cause some complications that have not yet been given significant thought. For example:

- It will be necessary for transaction identifiers to be globally unique.

- It will be necessary for objects to have cross-database references and for database connections to be established automatically when sub-objects from foreign databases are encountered.
ZPublisher Issue
ZPublisher will need to support some protocol for retrying requests. This may impose requirements on publishers as well, since it will be necessary to keep request state (standard input and output and the environment) around in a reusable state.

Also, there will be some interaction with "streaming" output. Of course, streaming output is inherently incompatible with transactions and should probably only be used for read-only transactions or in cases where the application is willing to take over transaction control and deal with conflict errors.
Concurrency
ZODB 3 supports multiple concurrent threads by providing each thread with a separate copy of the persistent object space. The persistent object space consists of all in-memory objects that are reachable from the system root object. Connection objects are used to manage persistent object space copies.

Separate threads of execution can execute more or less independently, because each has its own copy of persistent objects. Care is still required when dealing with non-persistent global or "static" data, such as module and class variables, and mutable default function arguments.
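The caveat about non-persistent shared data can be illustrated with plain Python. The sketch below does not use ZODB at all; the names `append_item`, `safe_append`, `Counter`, and `worker` are hypothetical and exist only for this illustration. It shows that a mutable default argument and a class variable are shared by every thread, and one common way to avoid or guard that sharing.

```python
import threading

def append_item(item, bucket=[]):
    # Pitfall: the default list is created once, at function definition,
    # so every call (from every thread) appends to the same list.
    bucket.append(item)
    return bucket

def safe_append(item, bucket=None):
    # Safer variant: build a fresh list for each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

class Counter:
    # Class variable: one list shared by all instances and all threads.
    hits = []

lock = threading.Lock()

def worker(n):
    append_item(n)              # silently shared across threads
    with lock:                  # shared state guarded explicitly
        Counter.hits.append(n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

shared = append_item("last")    # still the same list the threads used
```

After the threads finish, `shared` holds the four items appended by the workers plus `"last"`, while each call to `safe_append` without a `bucket` argument returns a new one-element list.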

Threads are synchronized through database commits. Transactions are single-threaded, and each thread has its own transaction. Transactions can be applied to multiple (database) connections to multiple databases. Only one transaction can commit to a particular database at one time. An optimistic concurrency strategy is used. Objects are not locked. Each connection manages a collection of object IDs for objects that have been "invalidated". An object has been invalidated if it has been modified through another connection in the current transaction or in a transaction that committed after the start of the current transaction.

When a transaction is to be committed, all of the objects modified by the transaction are checked to see if they have been invalidated. If any objects modified by the transaction have been invalidated, then the transaction is aborted and a ZODB3.ConflictError exception is raised. An application (e.g. the Python Object Publisher) should catch the ZODB3.ConflictError exception and attempt to re-execute the transaction.
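The invalidation check and the retry pattern described above can be sketched with a small in-memory model. This is not the ZODB API: `ToyDatabase`, `ToyConnection`, `run_with_retry`, and the local `ConflictError` class are hypothetical stand-ins, and the model keeps only the behavior named in the text (per-connection invalidated IDs, abort on conflict, re-execution by the caller).

```python
class ConflictError(Exception):
    """Hypothetical stand-in for the conflict exception named in the text."""

class ToyDatabase:
    def __init__(self):
        self.data = {}           # oid -> committed state
        self.connections = []

    def open(self):
        conn = ToyConnection(self)
        self.connections.append(conn)
        return conn

class ToyConnection:
    def __init__(self, db):
        self.db = db
        self.invalidated = set() # oids committed by other connections
        self.modified = {}       # oid -> new state, pending commit

    def read(self, oid):
        return self.modified.get(oid, self.db.data.get(oid))

    def write(self, oid, state):
        self.modified[oid] = state

    def abort(self):
        # Drop pending changes; the next transaction starts from fresh state.
        self.modified.clear()
        self.invalidated.clear()

    def commit(self):
        # Optimistic check: fail if anything we modified was invalidated.
        if self.invalidated & set(self.modified):
            self.abort()
            raise ConflictError("object modified through another connection")
        self.db.data.update(self.modified)
        for other in self.db.connections:
            if other is not self:
                other.invalidated.update(self.modified)  # send invalidations
        self.modified.clear()
        self.invalidated.clear()

def run_with_retry(conn, work, attempts=3):
    # Re-execute the whole unit of work when a conflict is detected.
    for _ in range(attempts):
        try:
            work(conn)
            conn.commit()
            return
        except ConflictError:
            continue
    raise ConflictError("gave up after %d attempts" % attempts)

db = ToyDatabase()
db.data["counter"] = 0
a, b = db.open(), db.open()
seen = []

def increment(conn):
    value = conn.read("counter")
    if not seen:
        # First attempt only: connection b commits between a's read and write.
        b.write("counter", b.read("counter") + 10)
        b.commit()
    seen.append(value)
    conn.write("counter", value + 1)

run_with_retry(a, increment)
```

The first attempt reads a stale value and is aborted; the second attempt re-reads the committed value and succeeds, which is why the unit of work must be safe to run more than once.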

Object revisions are assigned 32-bit serial numbers. The serial number is updated whenever a change to an object is committed. Note that objects will have different serial numbers in different versions. Committing a version increments the base serial number and discards the version serial number. Active objects have the serial number corresponding to the revision from which the object was read. When an object is changed and the change is committed, the object's serial number is compared to the serial number stored in the database. If the serial numbers don't agree, then the object has been updated by some other thread or process and a ConflictError is raised. Checking serial numbers prevents overwriting revisions in cases where invalidation messages are lost, especially in multi-process situations.
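The serial-number comparison can be sketched as a compare-before-store check. The `ToyStorage` class, its `load` and `store` methods, and the integer counter used for serials are assumptions made for this illustration only; they are not the storage interface of ZODB, and versions are left out entirely.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for the conflict exception named in the text."""

class ToyStorage:
    def __init__(self):
        self.records = {}        # oid -> (serial, state)
        self.next_serial = 1     # assumption: a simple counter as serial

    def load(self, oid):
        # Return the committed serial together with the state.
        return self.records[oid]

    def store(self, oid, state, read_serial):
        # Compare the serial the caller read against the stored serial.
        current_serial, _ = self.records.get(oid, (0, None))
        if current_serial != read_serial:
            raise ConflictError(
                "oid %r: read serial %d, stored serial %d"
                % (oid, read_serial, current_serial))
        serial = self.next_serial
        self.next_serial += 1
        self.records[oid] = (serial, state)
        return serial

storage = ToyStorage()
storage.store("account", 100, read_serial=0)    # initial revision

# Two readers load the same revision.
serial_a, state_a = storage.load("account")
serial_b, state_b = storage.load("account")

# The first writer succeeds and advances the serial.
storage.store("account", state_a + 50, read_serial=serial_a)

# The second writer still holds the old serial, so its store is rejected
# even if no invalidation message ever reached it.
try:
    storage.store("account", state_b - 10, read_serial=serial_b)
    conflict = False
except ConflictError:
    conflict = True
```

Because the check happens at store time against the committed record, it works as a last line of defense independently of invalidation delivery.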
Database organization
The database has an extremely simple organization. The database consists of a single network of objects rooted in a single system root object. The system root object maps names to application root objects.

The database provides almost no high-level data structures or indexes. Data structures, such as tables and other collections, and collection indexes are provided by application code. The only data structure exposed by the database is the system root object. The only indexes provided by the system are used to provide fast retrieval of objects given object ids and fast retrieval of root objects given object names.
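The organization described above can be pictured with a toy model: one index from object ids to objects, one root mapping from names to application root objects, and everything else built by application code. `ToyObjectDatabase` and its methods are hypothetical names for this sketch and do not reflect how ZODB stores objects.

```python
class ToyObjectDatabase:
    ROOT_OID = 0

    def __init__(self):
        # The only system-provided index: object id -> object.
        self._by_oid = {self.ROOT_OID: {}}
        self._next_oid = 1

    def root(self):
        # The system root object maps names to application root objects.
        return self._by_oid[self.ROOT_OID]

    def add(self, obj):
        oid = self._next_oid
        self._next_oid += 1
        self._by_oid[oid] = obj
        return oid

    def get(self, oid):
        return self._by_oid[oid]

db = ToyObjectDatabase()

# Application code supplies its own collection and its own index.
customers = []                       # application-level "table"
customers_by_name = {}               # application-level index
db.root()["customers"] = customers
db.root()["customers_by_name"] = customers_by_name

alice = {"name": "alice", "balance": 10}
alice_oid = db.add(alice)
customers.append(alice)
customers_by_name[alice["name"]] = alice

found_by_name = db.root()["customers_by_name"]["alice"]
found_by_oid = db.get(alice_oid)
```

Both lookups reach the same object: one through the application's own index hanging off the root, the other through the system's object id index.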

More than one database can be used in an application. Objects in one database can reference objects in another database. (The details of how this will work are still TBD and may not be resolved in the initial releases.)
Release notes
This is version 2.0 of the ZODB 3 architecture description


version 2.2

Temporary versions have been abandoned in favor of subtransactions.

A sync method has been added to connections to bring connections up-to-date without having to close and reopen them.

version 2.1

This release reflects the implementation of Zope 2.0.0 alpha 1.

There have been many minor changes since 2.0 (of this document).

The biggest changes are:

- Finalization of serial numbers. Time-stamps are used for serial
numbers. These are 8-byte strings that can be converted to and
from date-time values using the provided TimeStamp class.

- There is now a packaging chapter that describes how ZODB components
are made available to applications.

- The storage interface has expanded slightly. It will probably need to be
expanded further to:

- provide some kind of version summaries,

- allow a storage implementer to not implement the history feature and
declare as much.

version 2.0

This version is a significant change from version 1 in that this
version is a UML model, rather than a word-processing document.

Many details not specified earlier are specified here.

The most substantive changes are:

- Objects now carry a serial number to make sure conflicting
changes are not made.

- Storages can now send invalidation messages to databases:

o When transactions are undone,

o When versions are committed or aborted,

o When other processes modify objects.