The 1990’s was a an exciting time for Desktop Publishing. I got my first taste of design in the early 90’s with Aldus PageMaker. QuarkXPress was king in commercial publishing world. For the most part designers and commercial printers used Macintosh computers which QuarkXpress catered to. For those who could not afford the high prices, or used a PC, there was a few options. Microsoft Publisher, TimeWorks Publish It!, and Express Publisher were a few. There was many debates during that time on which software was the best.
I have submitted signatures to PRONOM for many of these:
Express Publisher was proving elusive for finding software and sample files. Express Publisher was developed by Power Up who had been developing a DOS version since the late 1980’s. At one point Power Up decided to sue QuarkXPress for the use of the name XPress. In 1991 Power Up sold all their assets to Spinnaker around the time they released the first Windows version of Express Publisher.
When I first took a look at some samples from the Windows version 1.0 of Express Publisher, the magic header looked familiar.
If it looks familiar to you it is similar to the famous, well nerd famous, JAVA Class file format.
The story goes that James Gosling needed a magic number for his new class format and was in a place they called Cafe Dead when he realized CAFE was a hex value, he soon used CAFEBABE and CAFED00D for his new formats. JAVA was released by Gosling in 1995 for SUN Microsystems.
File Format magic numbers are often used when designing a file to be used with software. Often times it is meant to be a sequence of hex values or a string indicating the file supported by certain software, this is more accurate than the simple extension at the end of most files. They are not required to be there, in fact there are a few formats which are difficult to identify as they don’t use this type of magic number in the header. To learn more read Ross Spencer’s post on magic numbers for digital preservation.
At first I thought it was some sort of homage to the JAVA Class format until I realized the Express Publisher file format was released 3 years earlier. Just a coincidence? I am sure whomever developed this format probably has an interesting story behind it.
These formats are not in PRONOM so lets take a look at what is needed.
The document format for Express Publisher version 1 for Windows 3 uses the EWD extension, as well as EWT for templates. Magic numbers work best for a signature when they are at least 4 bytes long, this gives enough to have little chance of conflicting with another file format. So our PRONOM signature byte sequence would look like this:
<ByteSequence Reference="BOFoffset"> <SubSequence MinFragLength="0" Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"> <Sequence>CAFEBEEF</Sequence> <DefaultShift>5</DefaultShift> <Shift Byte="CA">4</Shift> <Shift Byte="FE">3</Shift> <Shift Byte="BE">2</Shift> <Shift Byte="EF">1</Shift> </SubSequence> </ByteSequence>
In looking at the earlier DOS versions of Express Publisher they used the extension EPD for a document and EPT for templates. I only have a few samples of version 2 and version 3, but they have different headers.
Version 2 & 3 has consistent bytes starting at offset 4, version 2 using the string PAGES, and version 3 the string EP300. I will have to dig a little more to see if I can find some samples of version 1 to see how they compare and then should be able to submit a PRONOM signature for them.
For the time being, adding “CAFEBEEF” to PRONOM will be a good addition. I wonder if there are any other “CAFE” formats out there, if you know of any, let me know!
UPDATE – There is another format, AnFX Movie, which uses the magic header “CAFEBEEF”. More research is needed to distinguish the two formats.