Awhile back I was asked to look at a file in our repository which had the extension OMF. It was not identified by DROID and didn’t appear to be in PRONOM. It didn’t take long to find quite a bit of information on the file format as it was used by many important software titles, or at least it used to. Exploring the details of this file format led me on quite the rabbit hole. You see, the OMFI format is based on a container format that once was heralded as the a better open choice over the Microsoft OLE container format growing in popularity.
This all started with a multi-platform approach to an open document format started by Apple Computers in the early 1990’s called OpenDoc. It was originally an alliance between Apple, IBM, and Motorola. The idea was to have a framework any developer could use to develop software or components that would all work seamlessly together. Many developers were on board initially with many promised software titles being developed, but ultimately with much confusion surrounding the framework and Steve Jobs return to Apple in 1997, the project was scrapped.
There are four key ideas in the Bento format:
- everything in the container is an object,
- objects have persistent IDs,
- all the metadata lives in the TOC (Table of Contents),
- objects consist entirely of values, and
- each value knows its own property, type, and data location.
The idea of a data model with such an organized structure was so appealing the digital preservation community there was excited to push for a Universal Preservation Format specifically for multimedia based on Bento. The idea was presented to AMIA in 1996!
Open Media Framework (OMF) Interchange
Avid Technology, a leader in audio/video editing systems, used the Bento specification to design a container format for multimedia. This allowed easy interchange of projects between many different software titles. Original specifications were published in 1994, while the 2.1 specifications released in 1997. Software titles such as Pro Tools, Cubase, Adobe Audition, Adobe Premiere, Apple Logic Pro, Apple Final Cut Pro, and many others supported the OMF format, at least for awhile. OMFI was migrated to Microsoft’s Structured Storage container format to form the core of (AAF) Advanced Authoring Format in the late 1990’s.
In order to identify an OMF file we first need to understand what is part of the OMF specifications and what is part of the Bento format. OpenDoc may not have lived very long but the Bento format held on long enough to be the structure used by a few different file formats. I am aware of the following, but there was other software being developed at the time.
Samples from each of these formats show some similar patterns. In the Bento specifications we can see:
The only version of the specifications I can find are version 1.0d5 released in 1993, but we know there was also a version 2 released later. The magic bytes are not defined in the 1.0d5 spec, but looking at the code in the Open Doc Developer Release in 1996, we can find reference to the magic bytes used in “Containr.h”.
#define MagicByteSequence "\xA4""CM""\xA5""Hdr""\xD7"
The Bento specification also defines this header information as, “Our solution to this is to define the standard Bento format to have the label at the end of the container.” Which means this byte sequence will frequently be found at the End of File. The “CM” refers to “Container Manager” and “Hdr” refers to “Header”.
Now that we have the magic bytes for the Bento container we can look at what makes the OMF file unique from others. We can find the answer in the Bento specifications.
We know that every Bento container must have a object, so in version 1.0 of the specifications on page 65 we find.
Each object must have the property OMFI:ObjID. The value of OMFI:ObjID is required and is listed in the property description for each object.
The OMFI:ObjID can also be found in version 2.0 of the specification, but in addition it defines:
The OMFI:ObjID property has been renamed the OMFI:OOBJ:ObjClass property, which eliminates the concept of generic properties and makes the class model easier to understand. The name ObjClass is more descriptive because the property identifies the class of the object rather than containing an
ID number for the object.
Since both are required it seems appropriate to use those strings for identification in a PRONOM signature. You can check out the proposed signature and samples on my GitHub page.
There is so much history wrapped up in these formats and the potential they had to change how we preserve files in our archives. Luckily we have the Internet Archive WayBack machine to help us discover or remember ideas that once existed, some which may find their way back to inspire future file formats.