PortalSynchronization

CMF Synchronization

Purpose

Allow serialization of content and logic to the filesystem in a human-editable form, so that the system can be checked into CVS. Also support incremental loading of the serialized forms.

This document can be considered a follow-up to the Team Zope article. That document focused on code/logic, presentation, and configuration, while this one focuses on data. That said, it should also be noted that complete serialization offers the ability to version all of these aspects.

Use Cases

  • data/code migration from dev <-> production
  • concurrent distributed developers
  • scm integration for content
  • support of alternative storages for content, for external tool support (analysis, integration, etc).
  • site upgrades

Design

Problems

In Zope2, synchronization is fundamentally complicated by the class-mixin nature of Zope2 development. For example, the Portal class that makes portal objects has over 30 unique base classes. The majority of these mixin classes are stateless and mostly provide methods for participating in the Zope framework. However, several of them do provide state, and developers are free to create subclasses that mix in these stateful classes, producing instances whose attributes are managed by several classes.

The issue of filesystem serialization, and even synchronization, has been tackled a number of times in the Zope community. To date, the implementations have focused on a one-to-one mapping of classes to serialization and loading.

  • FSDump (serialization only) http://www.zope.org/Members/tseaver
  • LocalFS (later versions) http://www.zope.org/Members/jfarr
  • ZFS http://www.zope.org/Members/andym

In general this approach is problematic as it does not allow for general object serialization in a manner which allows complete recreation of the object's state.

These frameworks do provide some handling for these mixins, like Security, but because they take a meta_type as their base unit of serialization, they require hacking on the source to handle additional stateful mixins. Additionally, the mixins they do handle are not in all cases adequate for complete recreation of the object from its serialized form.

This problem is more evident when attempting a CMF serialization, as the CMF mixes in some new stateful mixins, like DublinCore, and opaque objects like Discussions.

Aspects

I propose a different solution for serialization, whose basis is to change the base unit of serialization from a class to an aspect, where an aspect is defined as a collection of attributes for a particular domain.

Components

An aspect is composed of four different components.

Serializers

Handle decomposing an object's attributes that form the concern of this aspect into a neutral format.

Writers

Operate on the aspect's domain attributes to turn them into a serialized form.

Readers

Handle reconstruction of an object's attributes from a serialized form.

Adapter/Deserializer

Adapts an object (creating it if necessary) to the read attributes.

The separation of writers and readers allows developers to plug in custom serialization formats by altering just the readers and writers for an aspect. Some of these roles might get consolidated in implementation, i.e. by consolidating the reader/writer and serializer/deserializer pairs.
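
As a rough illustration of how the four roles could fit together, here is a minimal sketch for a Dublin Core style aspect. All class names, method names, and the XML layout are assumptions made for this example; none of them come from the prototype or from the CMF.

```python
import xml.etree.ElementTree as ET


class DublinCoreSerializer:
    """Serializer: decompose the object's aspect attributes into a neutral dict."""
    attributes = ("title", "description", "rights")  # hypothetical attribute set

    def serialize(self, obj):
        return {name: getattr(obj, name, "") for name in self.attributes}


class XMLWriter:
    """Writer: turn the neutral dict into a serialized form (XML text here)."""

    def write(self, aspect_name, state):
        root = ET.Element("aspect", name=aspect_name)
        for key, value in sorted(state.items()):
            ET.SubElement(root, "attribute", name=key).text = str(value)
        return ET.tostring(root, encoding="unicode")


class XMLReader:
    """Reader: rebuild the attribute dict from the serialized form."""

    def read(self, text):
        root = ET.fromstring(text)
        return {node.get("name"): node.text or ""
                for node in root.findall("attribute")}


class DublinCoreAdapter:
    """Adapter/Deserializer: apply read attributes to an object, creating it if needed."""

    def adapt(self, state, obj=None, factory=None):
        if obj is None:
            obj = factory()
        for name, value in state.items():
            setattr(obj, name, value)
        return obj


class Document:
    """Stand-in content class used only for this example."""
    title = ""
    description = ""
    rights = ""
```

Swapping the serialization format would then mean replacing only XMLWriter and XMLReader, while the serializer and adapter stay untouched.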

Classification

Aspects are classified into three different types based on their relationship with an object.

Memento

Memento aspects correspond to mixin classes that an object's class might inherit from. Some examples include:

  • security (local roles, ownership)
  • workflow history
  • dublin core
  • properties
  • identity (meta_type, modification_time)

Opaque

Opaque aspects correspond to object attributes which are handled by external entities of which the object may not be aware.

  • discussions

Payload

Corresponding to the

  • python scripts
  • dtml files
  • content objects

These classifications are not hard boundaries, in that which classification a set of attributes falls under is up to the developer to determine. An example being the workflow history... this part needs some more thinking about ;-)

Aspect Registry

Aspects are registered into a registry, based on a one to one association with stateful mixin classes.
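
A registry of this kind could be as small as a dictionary keyed by mixin class. The sketch below is only an assumption about its shape; AspectRegistry, register, and lookup are hypothetical names, and the mixin class is a stand-in.

```python
class AspectRegistry:
    """Maps each stateful mixin class to exactly one aspect."""

    def __init__(self):
        self._by_mixin = {}

    def register(self, mixin_class, aspect):
        # Enforce the one-to-one association described above.
        if mixin_class in self._by_mixin:
            raise ValueError("aspect already registered for %s" % mixin_class.__name__)
        self._by_mixin[mixin_class] = aspect

    def lookup(self, mixin_class):
        # Stateless mixins are simply absent and yield None.
        return self._by_mixin.get(mixin_class)


class DublinCoreMixin:
    """Stand-in for a stateful mixin class."""


registry = AspectRegistry()
registry.register(DublinCoreMixin, "dublin_core")
```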

Composite Aspects aka Synchronization Components

The class of a typical instance to be serialized will correspond to several aspects. Within the serialization framework the aspects are aggregated by type. Roughly, each type/composite aspect corresponds to a single file in a filesystem representation.

My thoughts on the default serialization for composite aspects are that memento aspects get serialized to an XML file, payload aspects get serialized to a single file if there is one, or a MIME-encoded file if there are multiple, and likewise for opaque attributes. This allows for easy editing of the most relevant aspect of the object.
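
The "single file if one, MIME-encoded file if multiple" rule for payloads could look roughly like the following, using the standard library email package. The function names and the choice of a Content-Disposition filename to label each part are assumptions for illustration only.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def pack_payloads(payloads):
    """payloads maps a name to its text; return the text of one file."""
    if len(payloads) == 1:
        # A lone payload is written as-is, so it stays directly editable.
        return next(iter(payloads.values()))
    container = MIMEMultipart()
    for name, body in sorted(payloads.items()):
        part = MIMEText(body, "plain")
        part.add_header("Content-Disposition", "attachment", filename=name)
        container.attach(part)
    return container.as_string()


def unpack_payloads(text):
    """Inverse of pack_payloads for the multi-payload case."""
    message = message_from_string(text)
    result = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the container itself
        charset = part.get_content_charset() or "utf-8"
        result[part.get_filename()] = part.get_payload(decode=True).decode(charset)
    return result
```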

Synchronizers

Synchronizers correspond to an instance's class, and are used to synchronize objects to and from the filesystem.

Synchronizer Construction

Synchronizers can either be explicitly specified, or implicitly built via introspection of an object's classes, mapping those classes to aspects and aggregating the aspects into composites based on their types.
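
The implicit route might be sketched as a walk over the class's method resolution order, collecting the aspect registered for each base class and grouping by aspect type. The Aspect tuple, the stand-in classes, and the plain-dict registry below are all hypothetical.

```python
from collections import defaultdict, namedtuple

# kind is one of "memento", "opaque" or "payload"
Aspect = namedtuple("Aspect", "name kind")


class RoleManager:
    """Stand-in stateful mixin (security)."""


class DublinCore:
    """Stand-in stateful mixin (metadata)."""


class Traversable:
    """Stand-in stateless mixin; it has no registered aspect."""


class Document(Traversable, DublinCore, RoleManager):
    """Stand-in content class."""


REGISTRY = {
    RoleManager: Aspect("security", "memento"),
    DublinCore: Aspect("dublin_core", "memento"),
    Document: Aspect("content", "payload"),
}


def build_composites(obj, registry):
    """Group the aspects found on obj's classes by aspect type."""
    composites = defaultdict(list)
    for klass in type(obj).__mro__:
        aspect = registry.get(klass)
        if aspect is not None:
            composites[aspect.kind].append(aspect.name)
    return dict(composites)
```

Each key of the returned dict would then correspond to one composite aspect, and so roughly to one file on disk.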

Synchronization Service

Provides the functionality to synchronize an object tree to and from the fs. Does lookup of object/file synchronizers and traversal of trees. If an instance's synchronizer is constructed, it will be cached for the lifetime of the sync operation.
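
The traversal with a per-operation cache could be sketched as below. The children attribute, the build_synchronizer factory, and the stand-in classes are assumptions; a real service would go through Zope's own containment API instead.

```python
def sync_tree(root, build_synchronizer):
    """Walk an object tree, constructing at most one synchronizer per class.

    build_synchronizer(klass) is a hypothetical factory that returns a
    callable which synchronizes a single object.
    Returns the number of synchronizers that were constructed.
    """
    cache = {}  # class -> synchronizer, discarded when this operation ends
    pending = [root]
    while pending:
        obj = pending.pop()
        klass = type(obj)
        if klass not in cache:
            cache[klass] = build_synchronizer(klass)
        cache[klass](obj)
        pending.extend(getattr(obj, "children", ()))
    return len(cache)


class Folder:
    """Stand-in container."""

    def __init__(self, *children):
        self.children = list(children)


class Document:
    """Stand-in leaf object."""
    children = ()
```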

Synchronization Tool

A CMF tool that provides the public API for usage, configuration management, and log/stat viewing.

Conflict Resolution

We try to take the simplest approach to conflict resolution.

All of this is open to change, and discussion (hint ;-)

In a sync to fs operation:

  • if the object is modified more recently than the corresponding files, then the files are deleted and rewritten.
  • if there are files which do not correspond to an object, they are deleted.
  • if the files are more recently modified than the object, we check whether they have been modified more recently than the most recent sync to fs operation; if so we overwrite, else we continue to the next object.
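
The three rules for a sync to fs can be read as a small decision function. This is only a sketch: the function name, the action strings, and the use of None for "missing" are invented here, and two cases the rules do not cover (an object with no files yet, and equal timestamps) are filled with labeled assumptions.

```python
def fs_sync_action(object_mtime, file_mtime, last_sync_to_fs):
    """Decide what a sync to fs does for one object/file-set pair.

    Timestamps are comparable numbers; None means that side does not exist.
    """
    if object_mtime is None:
        # Files with no corresponding object are deleted.
        return "delete-files"
    if file_mtime is None:
        # Assumption: not stated in the rules, a new object is simply written.
        return "write-files"
    if object_mtime > file_mtime:
        # Object is newer: delete and rewrite the files.
        return "rewrite-files"
    # Files are newer than the object (assumption: equal timestamps land here too).
    if file_mtime > last_sync_to_fs:
        # Files were edited since the last sync to fs: overwrite them.
        return "rewrite-files"
    return "skip"
```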

In a sync to the zodb operation:

  • if the objects do not correspond to a file set, they are deleted.
  • if the files are more recently modified than the last sync operation, then the objects are created/modified.
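
The sync to the zodb direction reduces to a similar decision. Again a sketch only: the name, action strings, and None convention are hypothetical, and "skip" for files unchanged since the last sync is implied by the rules rather than stated.

```python
def zodb_sync_action(has_object, file_mtime, last_sync):
    """Decide what a sync to the zodb does for one object/file-set pair.

    file_mtime is None when no file set exists for the object.
    """
    if file_mtime is None:
        # Objects with no corresponding file set are deleted.
        return "delete-object" if has_object else "skip"
    if file_mtime > last_sync:
        # Files changed since the last sync: create or modify the object.
        return "modify-object" if has_object else "create-object"
    # Implied by the rules: unchanged files leave the object alone.
    return "skip"
```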

We should probably bypass the standard object creation mechanism, except for cataloging, when creating a new object... hmm. This needs to get thought through some more.

Bootstrap

For initial development, I think it easiest to ignore the tools for all operations, with the goal of adding in the tools after the system is stable.

Risks

  • unicode: I'm not well versed in unicode issues.
  • speed

Prototype

Some initial code for a prototype resides in the collective CVS (CMFSynchronization). Currently (10/25/2k) it is out of date with this document. It's more a ball of mud than a working prototype.

Design Influences

AdaptableStorage

I noticed some similarities between this approach to synchronization and Shane Hathaway's excellent AdaptableStorage product, which serializes objects at the zodb storage level via different aspects/attribute handlers to different storage layers (rdbms, fs).

Unfortunately, I also find it problematic for my usage scenarios, as it would require all content to reside on the mounted storage and would still require implementation of all the deserializing logic. At this stage, it's also alpha software. More importantly, I find it, a bit arbitrarily I admit, a little safer tackling this issue at the application layer.

FSSync

This is the Zope3 implementation of a synchronization system for the zodb. The concerns raised by the Team Zope paper are very much answered at the infrastructure layer by Zope3. The fs sync service provides the necessary infrastructure in z3 for synchronization of objects/data to the fs. Its use in a z2 setting is not possible, as it depends on corresponding architecture and infrastructure from z3. It should also be noted that the default implementation for serialization is the classic z2 import/export facility, aka xml pickle.

Author

kapil thangavelu @2002

License

FDL