Python Database Performance: ZODB (object-oriented) vs. SQLite (relational)
Flávio Codeço Coelho compares the performance of ZODB, an object-oriented database, with SQLite, a relational database, in Python. His results may surprise you.
. . . for up to 100,000 inserts per transaction, ZODB’s performance is comparable to SQLite3. Since ZODB allows you to store arbitrarily complex objects, you don’t have to cook up complex SQL queries to get at the data you need; the relation between each datum is given by the design of the object you are storing.
In some apps of mine, I have to write code to extract the data from my Python objects, put it in table format (to store in a relational db), and then, when I read it back, I need more code to put it back where it belongs. With ZODB, none of that is necessary.
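To make the overhead described above concrete, here is a minimal sketch of that flatten-and-rebuild code using the standard library's sqlite3 module. The Experiment class, the table layout, and the save/load helpers are all hypothetical, chosen only for illustration; they are not taken from the benchmark.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical application object: a name plus a list of measurements.
    name: str
    samples: list = field(default_factory=list)


def save(conn, exp):
    # Flatten the object: one row for the experiment, one row per sample.
    cur = conn.execute("INSERT INTO experiment (name) VALUES (?)", (exp.name,))
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sample (experiment_id, position, value) VALUES (?, ?, ?)",
        [(exp_id, i, v) for i, v in enumerate(exp.samples)],
    )
    conn.commit()
    return exp_id


def load(conn, exp_id):
    # Rebuild the object from its rows, restoring the original sample order.
    (name,) = conn.execute(
        "SELECT name FROM experiment WHERE id = ?", (exp_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT value FROM sample WHERE experiment_id = ? ORDER BY position",
        (exp_id,),
    ).fetchall()
    return Experiment(name, [value for (value,) in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE sample (experiment_id INTEGER, position INTEGER, value REAL)"
)

original = Experiment("run-1", [0.5, 1.25, 2.0])
restored = load(conn, save(conn, original))
```

Even for this two-field object, the schema and both helpers have to be kept in sync by hand. That is the bookkeeping an object database takes over.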
Like SQLite, ZODB stores your data in a file; however, it also supports other storage types.
You might also wish to look at Durus:
The MEMS Exchange software development team developed Durus after using ZODB successfully for three years. We were very satisfied by the general architecture used by ZODB, but were also aware that much of the complexity of the ZODB source code existed to support features that the MEMS Exchange would never use: most notably the support for multiple threads in a process. Durus is primarily a re-implementation of the subset of the ZODB architecture that we use for our web applications.
Durus stores instances in a file so that they can be used later. Instances must be converted to strings before they can be written to a file. Durus uses the Python pickler for this serialization. The pickler is powerful in that it can serialize instance graphs, even if they contain circular references. In Durus, we want to manage change in a rather large instance graph, with, for example, hundreds of thousands of instances. The pickle of the whole instance graph (our “universe”) might be hundreds of MB: too big for fast writes.
Fortunately, the pickler includes hooks that make it possible to put bounds on the part of the instance graph that is actually serialized. In Durus, the pickler behaves specially whenever it encounters a reference to an instance of the “Persistent” class (described in detail in a section below). Instead of crawling forward and including the instance in the pickle in the usual way, a reference to a Persistent instance is replaced with an identifier that can be used to locate that instance later. If the referred-to instance does not already have an assigned identifier, the pickler assigns one.
The critical benefit of this behavior is that it effectively partitions an instance graph into distinct components, one component for each Persistent instance, and with a distinct identifier for each instance. With this behavior, we can record a change in one Persistent instance by writing the corresponding pickle instead of pickling the universe. Moreover, we can unpickle a single Persistent instance’s state when we need it instead of being forced to load the whole universe of instances. In Durus, the state of a Persistent instance is not loaded until it is used, as, for example, when application code looks for the value of an attribute.
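The partitioning described above can be sketched with the standard pickle module's persistent_id and persistent_load hooks. This is a toy illustration, not Durus's code: the Persistent and Store classes and their methods are hypothetical, records live in an in-memory dict rather than a file, and referenced instances are loaded eagerly rather than on first attribute access.

```python
import io
import pickle


class Persistent:
    """Hypothetical stand-in for a persistent base class (not Durus's own)."""

    def __init__(self, **attrs):
        self.oid = None                  # identifier, assigned on first write
        self.__dict__.update(attrs)


class _Pickler(pickle.Pickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store
        self.referenced = []             # Persistent instances seen in this pickle

    def persistent_id(self, obj):
        if isinstance(obj, Persistent):
            self.store.assign_oid(obj)   # assign an identifier if missing
            self.referenced.append(obj)
            return obj.oid               # store the identifier, not the instance
        return None                      # everything else is pickled normally


class _Unpickler(pickle.Unpickler):
    def __init__(self, file, store, cache):
        super().__init__(file)
        self.store = store
        self.cache = cache

    def persistent_load(self, oid):
        return self.store.load(oid, self.cache)


class Store:
    """Toy store: one small pickle per Persistent instance."""

    def __init__(self):
        self.records = {}                # oid -> pickled state
        self.next_oid = 1

    def assign_oid(self, obj):
        if obj.oid is None:
            obj.oid = self.next_oid
            self.next_oid += 1

    def write(self, obj):
        # Pickle only this instance's state; references become identifiers.
        self.assign_oid(obj)
        buf = io.BytesIO()
        pickler = _Pickler(buf, self)
        pickler.dump(vars(obj))
        self.records[obj.oid] = buf.getvalue()
        return pickler.referenced

    def save(self, root):
        # Write the root and every Persistent instance reachable from it.
        pending, done = [root], set()
        while pending:
            obj = pending.pop()
            if id(obj) not in done:
                done.add(id(obj))
                pending.extend(self.write(obj))
        return root.oid

    def load(self, oid, cache=None):
        cache = {} if cache is None else cache
        if oid not in cache:
            obj = Persistent.__new__(Persistent)
            cache[oid] = obj             # register first so cycles terminate
            state = _Unpickler(io.BytesIO(self.records[oid]), self, cache).load()
            obj.__dict__.update(state)
        return cache[oid]


store = Store()
a = Persistent(name="a")
b = Persistent(name="b", partner=a)
a.partner = b                            # circular reference between instances
root_oid = store.save(a)                 # two small records, not one big pickle

b.name = "b-renamed"
store.write(b)                           # only b's record is rewritten
restored = store.load(root_oid)
```

Because each record holds identifiers rather than nested instances, changing b rewrites only b's record, and the cycle between a and b does not need to be serialized in a single pickle.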
See also Shove:
[Shove is a] common object storage frontend that supports dictionary-style access, object serialization and compression, and multiple storage and caching backends.
At a lower level of abstraction, there’s cPickle.
cPickle — A faster pickle
The cPickle module supports serialization and de-serialization of Python objects, providing an interface and functionality nearly identical to the pickle module. There are several differences, the most important being performance and subclassability.
First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.
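A common way to pick up the faster implementation is the import fallback sketched below. On Python 2 it selects cPickle; on Python 3, where cPickle no longer exists as a separate module, the plain pickle module already uses its C accelerator when one is available. The node dictionary is a made-up example, used to show that a circular reference survives the round trip.

```python
try:
    import cPickle as pickle   # Python 2: the C implementation
except ImportError:
    import pickle              # Python 3: C accelerator is used automatically

# A small object graph with a circular reference (child points back to parent).
node = {"name": "root", "children": []}
node["children"].append({"name": "leaf", "parent": node})

data = pickle.dumps(node, pickle.HIGHEST_PROTOCOL)   # serialize to a byte string
clone = pickle.loads(data)                           # rebuild an independent copy
```

The rest of the program can then call pickle.dumps() and pickle.loads() without caring which implementation was imported.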
bsddb can be used as a storage backend for pickled objects.
bsddb — Interface to Berkeley DB library
Availability: Unix, Windows.
The bsddb module provides an interface to the Berkeley DB library. Users can create hash, btree or record based library files using the appropriate open call. Bsddb objects behave generally like dictionaries. Keys and values must be strings, however, so to use other objects as keys or to store other kinds of objects the user must serialize them somehow, typically using marshal.dumps() or pickle.dumps().
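The serialize-before-storing pattern described above looks roughly like the sketch below. Since bsddb was removed from the standard library in Python 3, this example substitutes dbm.dumb, another dictionary-like store with the same restriction to string (bytes) keys and values. The key names and stored objects are invented for illustration.

```python
import dbm.dumb
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # dbm.dumb stands in for bsddb here; both behave like string-only dicts.
    db = dbm.dumb.open(os.path.join(tmp, "objects"), "c")
    try:
        # Values that are not strings must be serialized first.
        db[b"point:1"] = pickle.dumps({"x": 1, "y": 2})

        # A non-string key can be serialized too and used as the lookup key.
        tuple_key = pickle.dumps(("grid", 3, 4))
        db[tuple_key] = pickle.dumps([1.5, 2.5])

        # Reading back means deserializing the stored bytes.
        point = pickle.loads(db[b"point:1"])
        cell = pickle.loads(db[tuple_key])
    finally:
        db.close()
```

Using pickled bytes as keys only works if the same object always serializes to the same bytes, so plain string keys are the safer choice where they are practical.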