Electric Magic | Portfolio | Storage and Databases

WireOver - Fast, Free, Secure File Sending

WireOver is a Mac, Windows and Linux application for sending and receiving files. It's easy to use and very low friction, requiring only entry of an email address and click on a received web link to set up. Receiving an unsolicited send is even easier, as acting on the receipt of the notifying email naturally associates that email with the recipient.

Transfers (by subscribers) use perfect forward secrecy, and can be entirely asynchronous, so sender and receiver need never be online at the same time. Of course if they are, then transmission goes as fast as the physical network allows.

We built the backend and client mainly in Python, supporting Mac and Windows back to 10.5 and XP respectively. For Linux we support 32 and 64 bit Ubuntu 12 and later.

Much of my focus has been on keeping OS variations from impacting the main body of the code. To that end, we're using wxWidgets to abstract the GUI, and I've implemented such native code as needed to interface with unwrapped aspects of the OS and third-party libraries. In some cases the fluent Python bindings to OS facilities just aren't fast enough, and native code is used there also.

Wellness!

As a new parent, especially once the children started at school, I was amazed at the number of prescriptions for minor ailments that we'd have to manage at once. Hence the iOS application Wellness!, developed in collaboration with IRMAcreative.

Knowledge Forum 4.6

Knowledge Forum 4.6's minor version number change belies just how different from its predecessor it really is.

Under the hood KF 4.6 has moved from using ZDBase for its backing store, instead using a tuplebase. This makes it possible to split HTML page generation into separate processes, potentially running on multiple front end machines. It also restores support for a rich client application, now written in Java, using ZTSoup to efficiently communicate changes in the tuplebase, whether made by other clients or by the web interface.

ZTSoup: UI-Friendly Tuplebase Access

The tuplebase API is well-suited to data processing needs. But it's clumsy as the mechanism by which data in a tuplebase is to be presented and maintained in a live UI, rather than web pages or generated reports.

ZTSoup gets its name and inspiration from the Apple Newton concept. A ZTSoup is backed by the same data as a tuplebase, but that data is accessed by instantiating a ZTCrouton for each tuple that code is interested in, and a ZTSieve for the result set of any query of interest. Why the funny names? A soup has croutons floating in it, interesting ones are sieved out of it.

There are two problems with connecting a UI to a transaction-based store like the tuplebase. First, how do you efficiently identify the minimal set of interesting changes that have been made in the tuplebase. Second, in reading or writing data, what do you do when a transaction fails to commit?

ZTSoup addresses these problems by maintaining objects locally that record a snapshot of the subset of the tuplebase in which the software is interested. Local changes are recorded in those objects, and they are sent to the tuplebase as a single atomic update operation. It's simply a matter of hooking a call to the soup's Update method into the normal event loop -- all additions and removals of interest in croutons and sieves are sent to the tuplebase, and remote changes to the croutons and sieves are returned.

Because the soup may be talking to a tuplebase on the end of a comms link the update work is split into a synchronous piece, which executes in a small amount of time with no blocking, and an async piece that accumulates work done by the synchronous piece and communicates with the server as fast as latency and server CPU will allow.

Java Tuplebase Access

Initially I provided Java access to a tuplebase instance by implementing Java classes whose most interesting methods were marked as native, and thus invoked via JNI. This was very powerful because Java could use any tuplebase implementation simply by calling the appropriate factory function and I could expose any existing C++ functionality simply by implementing the appropriate JNI glue.

But native code can't be included with unsigned applets, so I ported ZTBRep_Client from C++ to Java. ZTBRep_Client is the most useful tuplebase implementation in that it talks to a ZTBServer instance on the end of a comms link, generally a network socket. It's also fairly large, and the async nature of the protocol makes things somewhat tricky. The normal Java synchronization model is based around monitors, which are not directly useful for implementing certain types of concurrent processing, so I implemented condition vars and mutexes to provide more appropriate tools.

Knowledge Forum 4.5

Knowledge Forum 4.5 was a significant refinement of Knowledge Forum 4.0, with a much richer web interface, although still constrained by the need to support Netscape 4.x-era web browsers.

New features included support for upload and download of files as attachments to views and notes, headings within views, full text and attribute-based searches, and notification by email. It also supported graphs of useful metrics, implemented across platforms using ZDC_ZooLib.

On the back end, a significant change was moving from the original blockstore implementation to one based on B+Trees that uses a neat upward propagting copy-on-write of nodes to make it virtually bullet-proof - only a physical failure of the backing store can cause data loss.

Tuplebase

ZooLib's tuplebase is derived from the tuplespace concept initially explored in the Linda coordination language, another well known derivation of which is Sun's JavaSpaces system. Whereas JavaSpaces is Java-only and relies on many of that languages's features, ZooLib's tuplebase works today with C++ and Java, and is well suited to work with other languages.

Language agnosticism is a key feature of tuplebase. Another is its runtime flexibility, which contrasts with most relational database implementations.

The power of a relational database is that the data it stores conforms to a rigorously defined structure. For a single piece of software it's always possible to find a single structure that supports all the work to be performed. As the nature of that work changes the database schema is evolved along with it. But this evolution must be coordinated so that all software that depends on the schema is updated to accommodate the changes, or virtual views are created to preserve a logical interface to the data.

Tuplebase takes a very different approach. Rather than enforcing a structure it allows any structure at all to be stored. Different pieces of software with overlapping concerns obviously must coordinate amongst themselves, but there's no requirement that there be any single authority managing everything. And because there's no mandated structure it's possible at runtime to treat multiple physical tuplebases as a single logical tuplebase, even if they contain data of different provenances. If the code using the data is written to be accommodating to the gradual migration of meaning then newer code can correctly interpret older data, and older code will ignore (but preserve) newer data.

IFF and QuickTime File Formats

IFF is a venerable data meta-format, introduced in 1985 by Electronic Arts as a standard framing mechanism for multimedia data. In short it defines a nested chunked format, where the file as a whole is considered to be a chunk. A chunk has as its first four bytes a tag that indicates the type of data in the payload, a 32 bit (four bytes again) count of the number of bytes in the payload, and then that number bytes being the payload itself. Some chunk types are known to contain zero or more other chunks in their payload, so an arbitrarily complex hiearchy can be established. There are other details, but chunk types and sizes are IFF's essentials. QuickTime's file format is almost the same, except that the size comes first, and includes the eight bytes of the chunk header.

IFF/QT are simpler than most file formats, but can still be fiddly to work with. To make it easier for me to parse and generate QT files manually (on platforms which don't have QT libraries) I put together a suite of ZooLib streams that take care of all the bookkeeping.

Knowledge Forum 4.0

Knowledge Forum 4.0 took the radical direction of being web-only. We'd had some support for web access by virtue of a perl program that used the client's communications protocol to talk directly to the server, but perl wasn't pre-installed on Mac OS (Classic) or Windows, and it had some performance problems. So we ported the perl software to C++ and incorporated it into the server directly.

For a time we had native client and web access running simultaneously, but never in a released version. The 3.x client remains buildable as of May 2006, but still talks only to a 3.x server.

Files

Most of my work till this point had not required 'interesting' operations with file systems, being restricted to creating and opening files in externally determined locations, then reading and writing the files' contents. When it became necessary to ennumerate the contents of directories, and to deal with permissions and locking I took the opportunity to define an API that would be consistent across Windows, Mac and UNIX whilst cleanly accomodating their differences.

The ZooLib file API doesn't address more esoteric file systems like VMS or IBM's MVS. It models a hierarchical system, with accomodations for special roots (like Windows' UNC paths and Mac Classic's multiple volumes with the same name). Files are byte-oriented, and as with other stream operations ZooLib's file Open and Create return a ref counted streamer object, and optionally an error enum providing more detail in the case of failure. The representation of a pointer to a node in a file system also follows ZooLib's practice of a using a smart object with value semantics.

The ZFile suite was the first place I formalized the notion of a 'trail'. This was another case where it was important to come up with a new name that was suggestive of an existing notion, but distinct enough from it that its different semantics could be defined. Picture a tree; a trail is simply the list of steps to be taken to navigate from some node to some other node. A step is either to traverse the link from the node to its parent, or to traverse the link from the node to one of its children. We represent a step as a string. An empty string indicates a 'bounce', a traversal to the parent of the node. Any other string indicates a traversal to the node with that name. Whether the name is attached to the child node, or attached to the link to the child node doesn't matter in this model.

The notion of a trail through a tree shows up in assets and in the handling of HTTP requests. Theoretically it could also be applied to tuples but I'm not quite ready to harmonize everything to that extent.

BlockStore: File System in a File

Applications often have to satisfy two conflicting requirements. On the one hand the data created by a user has a structure whose parts should be managed independently of one another, ideally with each piece placed in its own file. On the other hand users like to think of their data as a single entity which can be copied, emailed and backed up in its entirety. Although Mac OS X has the notion of a bundle, a specially marked directory which behaves like a single file when manipulated by the Finder, other operating systems do not.

So, a ZBlockStore takes a single file (or file-like entity) and allows an application to work with resizable blocks allocated within it, easily, efficiently and (with ZBlockStore_PhaseTree) safely.

Easy to use — blocks are managed with the same API as real files.
Efficient — all the data lives in a wide fan-out B+ Tree.
Safe — modifications to nodes in the tree cause the entire ancestor branch to be replicated into existing free space, and the new root is atomically updated only when descendant nodes are known to be in stable storage.

Assets: Portable resources

I picked the name 'asset' as an alternative to 'resource', a term that already has too many disparate meanings on different platforms. That said, assets are used in the same situations that MacOS/Win32/BeOS resources would be, although the mechanism is more flexible.

The data making up an asset tree is directly usable by any processor, big or little-endian. It can be kept in a file, loaded into RAM, memory-mapped from disk or accessed from a stream.

ZooLib

ZooLib is an Open Source (MIT License) C++ library that makes it easy to write one set of source and build an application for Windows, Mac and UNIX. It provides a foundation suite of facilities that in essence form a virtual operating system API, and wide range of higher-level facilities that build on that foundation.

ZooLib abstracts threads, files, networking, graphics and the user interface. And there are escape hatches whereby OS-specific features can be accessed without compromising the platform-agnostic nature of the rest of the code.

ZooLib has been under continual development since 1992, and incorporates fixes for every OS bug or anomaly I've encountered, and generally provides cleaner mechanisms for most tasks than OSes traditionally provide. It's been the basis of Measurement in Motion, NetPhone, and every version of the Knowledge Forum server and client applications.

Knowledge Forum 3.0

Knowledge Forum 3.0 took advantage of major enhancements in ZooLib that let the client be released for Windows as well as Mac OS. Much of Knowledge Forum 2.0 had been built around the Mac-only Zoom framework, and so had to be reimplemented. The parts of Knowledge Forum 2.0 built with ZooLib were simply carried forward, with refinements and enhancements.

By this time ZooLib abstracted Mac OS, Windows and UNIX, so the server could be hosted on all three OSes.
The text editing in Knowledge Forum 3.0's note windows used the PAIGE text engine, which was a fairly crusty 60,000 lines by this time, but was portable across Mac and Windows. So although the rest of the client could be built for UNIX, without the text editor it wasn't particularly useful and we never released a UNIX client, although that wasn't a big deal in 2000.
Unfortunately we also had to abandon Kevin Parichan's excellent painting implementation, as it was also Mac-specific. Instead we used ZooLib's NPainter, which used only ZooLib features and was thus completely portable.

Knowledge Forum 2.0

Knowledge Forum 2.0 is a computer supported collaboration environment designed to foster the growth of knowledge building communities. It's based on the CSILE project, developed at the Ontario Institute for Studies in Education and is published by published by Learning in Motion.

CSILE was pioneering and innovative, but in May 1994 limitations in its feature set and a desire to see it in wider use led to a collaboration with Learning in Motion to redesign it. The scope of the necessary changes neccessitated a complete reimplementation and the decision was taken that it would be released as a commercial product.

Cyrus Ghalambor started work on the KF 2.0 client application in July 1995. I was busy with NetPhone till December 1995, when I started working full time on the client's data API and the server that would satisfy its requests. We looked at several existing database libraries, but they were either too limiting or could not run on Mac OS (Classic, of course). So I implemented ZDBase, a portable database engine that was built on ZooLib. Client and server communications and the server's administration UI were also built using ZooLib.

The client application's event loop, menus and windows were handled by a refinement of Berkeley Systems' Zoom framework. It had two main types of windows. The note window used Zoom for its controls, and the WASTE text engine for its text editing panel. The view window provided a two-dimensional icon-based representation of collections of content, its operation was closely modeled on that of Apple's Finder, and it was implemented using ZooLib. The note and view windows also contained pictures, in-line with the text in the note window and in a layer behind the icons in the view window. Pictures were edited in place using a nice painting implementation by Kevin Parichan.

Other dialogs and property inspectors were based on Zoom or ZooLib, depending on whether they were written by Cyrus or by me.

ZDBase: Portable Database Engine

ZDBase is a relatively simple database storage engine. It's more of an alternative to Berkeley DB (as it was) than a replacement for MySQL. It supports an arbitrary number of tables, records and typed/named fields within those records. Each table maintains as many indices as desired, and the schema can be updated at any time. All data is kept in a blockstore, so everything lives in a single file, and the code and data are portable across platforms.

This was my first serious use of B-Trees, the essential basis of most high-performance disk-based storage systems. The searching of B-Tree content and the insertion of new data is well documented in numerous sources, but the deletion of data is invariably left as an exercise for the reader, which was a bit frustrating but ultimately empowering.

Marrakech

Marrakech took the hypermedia concepts I explored in WorkSpace and applied them to the problem of managing workflow and assets for multimedia development.

WorkSpace

WorkSpace was an interesting application that took user interface ideas from Andy Hertzfeld's Servant, and hypermedia ideas from all over, and combined them into a personal information manager that used the web of links between entities to represent meaning. It was never released as a product, but was where I first started creating ZooLib, and formed the basis of Marrakech and ultimately of Measurement in Motion.

February 2014

WireOver - Fast, Free, Secure File Sending

May 2011

Wellness!

May 2006

Knowledge Forum 4.6

May 2005

ZTSoup: UI-Friendly Tuplebase Access

May 2004

Java Tuplebase Access

August 2003

Knowledge Forum 4.5

October 2002

Tuplebase

October 2002

IFF and QuickTime File Formats

September 2002

Knowledge Forum 4.0

April 2002

Files

June 2001

BlockStore: File System in a File

March 2001

Assets: Portable resources

December 2000

ZooLib

January 2000

Knowledge Forum 3.0

August 1997

Knowledge Forum 2.0

January 1996

ZDBase: Portable Database Engine

November 1992

Marrakech

June 1991

WorkSpace

Projects by Category

Recent Storage and Databases

Project Archives