Picture It!

Most everyone has heard of Microsoft Office, the suite of applications used by millions everyday. Less people know about Microsoft Works, which was a lower cost alternative, but was quite popular as a home office suite of applications. One tool which often came with the Works suite was a digital image tool called Picture It!

Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.

Picture It! used a wizard type interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature a new file format was used to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to the FlashPix image format, which was supposed to be an image file format to solve many delivery issues, but didn’t seem to gain hold despite being created by Kodak, HP, and others. In fact many of the MIX files I found on Microsoft disks are actually FlashPix files.

The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.

7z l PictureIt1-s02.mix                 

--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....          328          384  [5]Data Object 000001
                    .....          396          448  [5]Transform 000004
                    .....          872          896  [5]Operation 000001
                    .....          320          320  [1]CompObj
                    .....          292          320  [5]Global Info
                    .....          872          896  [5]Operation 000002
                    .....          144          192  [5]Operation 000003
                    .....          684          704  [5]Transform 000008
                    .....         1028         1088  [5]Transform 000009
                    .....          328          384  [5]Data Object 000009
                    .....          324          384  [5]Data Object 000005
2023-12-27 11:04:39 D....                            Data Object Store 000001
                    .....          328          384  [5]Data Object 000010
                    .....        20932        20992  [5]SummaryInformation
                    .....          200          256  [5]Microsoft Embedding Info
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0001
                    .....         1400         1408  Data Object Store 000001/[5]Image Contents
                    .....          230          256  Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0000
                    .....           28           64  Data Object Store 000001/Resolution 0000/Subimage 0000 Data
                    .....           80          128  Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0002
                    .....           28           64  Data Object Store 000001/Resolution 0002/Subimage 0000 Data
                    .....          208          256  Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0004
                    .....           28           64  Data Object Store 000001/Resolution 0004/Subimage 0000 Data
                    .....         1792         1792  Data Object Store 000001/Resolution 0004/Subimage 0000 Header
                    .....          124          128  Data Object Store 000001/[5]SummaryInformation
                    .....           28           64  Data Object Store 000001/Resolution 0005/Subimage 0000 Data
                    .....         6976         7168  Data Object Store 000001/Resolution 0005/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0003/Subimage 0000 Data
                    .....          544          576  Data Object Store 000001/Resolution 0003/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0001/Subimage 0000 Data
                    .....          128          128  Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------  ------------------------
2023-12-27 11:04:39              38698        39872  29 files, 7 folders

This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:

sf PictureIt1-s02.mix 
---
siegfried   : 1.11.0
scandate    : 2023-12-27T11:06:32-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'
    warning : 

Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:

    <ContainerSignature Id="17015" ContainerType="OLE2">
      <Description>Microsoft Picture It! Image File</Description>
      <Files>
        <File>
          <Path>CompObj</Path>
          <BinarySignatures>
            <InternalSignatureCollection>
              <InternalSignature ID="17015">
                <ByteSequence Reference="BOFoffset">
                  <SubSequence Position="1" SubSeqMinOffset="32"
                               SubSeqMaxOffset="32">
                    <Sequence>'Microsoft Picture It! version 1 Picture'</Sequence>
                  </SubSequence>
                </ByteSequence>
              </InternalSignature>
            </InternalSignatureCollection>
          </BinarySignatures>
        </File>
      </Files>
    </ContainerSignature>

The container signature is looking into the OLE container for the “CompObj” file (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting at the 32nd byte. That is pretty specific. The sample file I am using as an example has the following string of bytes.

hexdump -C PictureIt1-s02/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 00 68 61 56  |.............haV|
00000010  54 c1 ce 11 85 53 00 aa  00 a1 f9 5b 1e 00 00 00  |T....S.....[....|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 50 69  63 74 75 72 65 00 27 00  |e It! Picture.'.|
00000040  00 00 7b 35 36 36 31 36  38 30 30 2d 43 31 35 34  |..{56616800-C154|
00000050  2d 31 31 43 45 2d 38 35  35 33 2d 30 30 41 41 30  |-11CE-8553-00AA0|
00000060  30 41 31 46 39 35 42 7d  00 13 00 00 00 50 69 63  |0A1F95B}.....Pic|
00000070  74 75 72 65 49 74 21 2e  50 69 63 74 75 72 65 00  |tureIt!.Picture.|

Ok, so this sample has a similar string but is missing the “version 1” text. It seems the samples used to created the PRONOM signature was working off samples which included the version 1 in the header of CompObj. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look a file from version 2 to compare:

hexdump -C PictureIt2-s01/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 50 28 72 2d  |............P(r-|
00000010  4b 8c d0 11 a9 6f 00 a0  c9 05 41 0d 28 00 00 00  |K....o....A.(...|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 76 65  72 73 69 6f 6e 20 32 20  |e It! version 2 |
00000040  50 69 63 74 75 72 65 00  27 00 00 00 7b 32 44 37  |Picture.'...{2D7|
00000050  32 32 38 35 30 2d 38 43  34 42 2d 31 31 44 30 2d  |22850-8C4B-11D0-|
00000060  41 39 36 46 2d 30 30 41  30 43 39 30 35 34 31 30  |A96F-00A0C905410|
00000070  44 7d 00 f4 39 b2 71 50  00 00 00 4d 00 69 00 63  |D}..9.qP...M.i.c|

Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?

Diving into this and gathering many samples has brought a lot of variants to surface. Let’s see if we can list all the CompObj header variants.

Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}

Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}

Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001), then the Picture It! software used a new format, PNG Plus to store the layered stacks. More on that in a future post! Although some later versions seems to be able to open the older MIX format. Version 4 of the MIX format seems to be the last as the 2001 software had only version 4 files on it. Probably safe to say only the 4 versions are needed for identification.

You may notice the additional unique identifier I included in each format. This is called a Class ID for the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This Unique ID could possibly be a better solution for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have a unique ID. With all the variations in the version 1 strings, the ID remains the same. For version 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1. Not to complicate things.

So it seems in order to get proper identification of these similar formats we need to:

  • Clean up version 1 identification for fmt/936
  • Add a signature for 2, 3, and 4
  • Add a version 2 signature for the PhotoDraw format
  • Add some additional signature variations for the FlashPix format.

The Class ID’s could be used to distinguish different versions and formats, but many of the ID’s are identical, this could mean they are the same format. But for now we can just add the additional variation strings and it should identify everything for now. The FlashPix format needs more research as there is so many different variations and it’s so close to the MIX format. Take a look at my GitHub submission, maybe you have some additional variations to add?

Digital Negatives

One of the important parts about Digital Preservation is to gather significant properties of the digital files we hope to preserve. This can allow us to base our risk assessments off of more data than just an extension. For example, a TIFF file is a mighty good preservation format. Well documented and adopted by the preservation community, and with hundreds if not thousands of software tools to render and make use of the format. But if a TIFF file uses compression like LZW, or if it happens to have multiple pages, those are good things to know about. Most formats might have a stable set of properties, but sometimes can have properties which adds more risk to the format becoming difficult to render or migrate.

A DNG or Digital Negative developed by Adobe was supposed to solve the issues with proprietary RAW digital camera formats. Rendering a PhaseOne IIQ file often times requires the full CaptureOne software which can be expensive. Adobe spends quite a bit of resources in adding support to its Camera RAW toolkit and adding the ability to take majority of these RAW formats and move them into a DNG. There is also more and more camera manufacturers who image directly to a DNG as their native RAW format. This is the case for Apple’s ProRAW format which uses the DNG specification.

Another manufacturer is the Insta360 camera’s. Their 360 camera’s can use two lenses to capture 180 degrees from each and then stitch into a 360 photo or video. They can capture compressed images and videos, but also in RAW. Because of the two lenses and sensors, their DNG’s can get quite large. For this reason I recently asked PRONOM to adjust their signatures to allow for a bigger offset of DNG information in the larger RAW images.

exiftool IMG_20230913_141939_00_039.dng 
ExifTool Version Number         : 12.70
File Name                       : IMG_20230913_141939_00_039.dng
File Size                       : 143 MB
File Type                       : DNG
File Type Extension             : dng
MIME Type                       : image/x-adobe-dng
Exif Byte Order                 : Little-endian (Intel, II)
Subfile Type                    : Full-resolution image
Image Width                     : 5984
Image Height                    : 11968
Bits Per Sample                 : 16
Compression                     : Uncompressed
Photometric Interpretation      : Color Filter Array
Make                            : Arashi Vision
Camera Model Name               : Insta360 X3

DNG files are actually based on the TIFF format, TIFF/EP to be precise, which means there is some good history behind the format and understanding of its structure. DNG does add many new tags and new features, so there is much more going on. Here is a TIFFInfo view of a DNG. Lots of new tags…..

tiffinfo IMG_20230913_141939_00_039.dng 
TIFFReadDirectory: Warning, Unknown field with tag 33421 (0x828d) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33422 (0x828e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50937 (0xc6f9) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50938 (0xc6fa) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50940 (0xc6fc) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51009 (0xc741) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51107 (0xc7a3) encountered.
=== TIFF directory 0 ===
TIFF Directory at offset 0x889946c (143234156)
  Subfile Type: (0 = 0x0)
  Image Width: 5984 Image Length: 11968
  Bits/Sample: 16
  Sample Format: unsigned integer
  Compression Scheme: None
  Photometric Interpretation: 32803 (0x8023)
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 11968
  Planar Configuration: single image plane
  Make: Arashi Vision
  Model: Insta360 X3
  Software: v1.0.69_build1
  DateTime: 2023:09:13 14:19:40
  Tag 33421: 2,2
  Tag 33422: 1,2,0,1
  EXIFIFDOffset: 0x8
  GPSIFDOffset: 0x3e6
  DNGVersion: 1,3,0,0
  DNGBackwardVersion: 1,3,0,0
  UniqueCameraModel: Insta360 X3

An IFD (Image File Directory) is the building block of a TIFF file. A TIFF file can have multiple IFD’s within a single file. But an IFD can also be a thumbnail, metadata or GPS info. For a DNG, they use the IFD structure as well, but often, the first IFD is a lower resolution of the full image.

 <File:FileType>DNG</File:FileType>
 <File:FileTypeExtension>dng</File:FileTypeExtension>
 <File:MIMEType>image/x-adobe-dng</File:MIMEType>
 <File:ExifByteOrder>Little-endian (Intel, II)</File:ExifByteOrder>
 <IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>256</IFD0:ImageWidth>
 <IFD0:ImageHeight>171</IFD0:ImageHeight>
 <IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
 <IFD0:Compression>Uncompressed</IFD0:Compression>
 <IFD0:PhotometricInterpretation>RGB</IFD0:PhotometricInterpretation>
 <IFD0:Make>Canon</IFD0:Make>
 <IFD0:Model>Canon EOS RP</IFD0:Model>
...
 <SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
 <SubIFD:ImageWidth>6384</SubIFD:ImageWidth>
 <SubIFD:ImageHeight>4224</SubIFD:ImageHeight>
 <SubIFD:BitsPerSample>16</SubIFD:BitsPerSample>
 <SubIFD:Compression>JPEG</SubIFD:Compression>

But not always the same way.

 <IFD0:SubfileType>Full-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>5984</IFD0:ImageWidth>
 <IFD0:ImageHeight>11968</IFD0:ImageHeight>
 <IFD0:BitsPerSample>16</IFD0:BitsPerSample>
 <IFD0:Compression>Uncompressed</IFD0:Compression>
 <IFD0:PhotometricInterpretation>Color Filter Array</IFD0:PhotometricInterpretation>
 <IFD0:Make>Arashi Vision</IFD0:Make>
 <IFD0:Model>Insta360 X3</IFD0:Model>

 <IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>4032</IFD0:ImageWidth>
 <IFD0:ImageHeight>3024</IFD0:ImageHeight>
 <IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
 <IFD0:Compression>JPEG</IFD0:Compression>
 <IFD0:PhotometricInterpretation>YCbCr</IFD0:PhotometricInterpretation>
 <IFD0:Make>Apple</IFD0:Make>
 <IFD0:Model>iPhone 13 Pro</IFD0:Model>
...
 <SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
 <SubIFD:ImageWidth>4032</SubIFD:ImageWidth>
 <SubIFD:ImageHeight>3024</SubIFD:ImageHeight>
 <SubIFD:BitsPerSample>12 12 12</SubIFD:BitsPerSample>
 <SubIFD:Compression>JPEG</SubIFD:Compression>

It can get confusing, especially for tools we use to extract metadata and significant properties from a DNG for preservation. Within Rosetta, the preservation system I use at work, there is no dedicated DNG extractor, so we use JHOVE, as it is the tool we use for our TIFF images. This presents a problem as the process only extracts properties for the first IFD assuming it is the main IFD, but in many cases it reports back the image is much smaller in pixel dimensions than it actually is. More work is needed to improve extracting correct significant properties for DNG and other RAW image formats.

Adobe released a new version of DNG this year. In June, DNG version 1.7.0.0 was finalized. The new version brought a few new features, two of which are including JPEG XL compression and a new HDR colorimetric value. In order to add JPEG XL compression DNG version 1.7 is required. Here is how one looks in exiftool, created with Adobe DNG Converter 16.1.

exiftool _MG_9375_1.dng 
ExifTool Version Number         : 12.70
File Name                       : _MG_9375_1.dng
File Size                       : 5.4 MB
File Type                       : DNG
File Type Extension             : dng
MIME Type                       : image/x-adobe-dng
Exif Byte Order                 : Little-endian (Intel, II)
Make                            : Canon
Camera Model Name               : Canon EOS DIGITAL REBEL XT
Preview Image Start             : 91884
Orientation                     : Rotate 270 CW
Rows Per Strip                  : 171
Preview Image Length            : 10305
Software                        : Adobe DNG Converter 16.1 (Macintosh)
Modify Date                     : 2023:12:18 11:45:06
Artist                          : unknown
Image Width                     : 3516
Image Height                    : 2328
Bits Per Sample                 : 16
Compression                     : JPEG XL
DNG Version                     : 1.7.1.0
DNG Backward Version            : 1.7.1.0

I had recently submitted a new signature for DNG 1.7 to PRONOM, but I found this new DNG version falls outside the signature I created. I had made the assumption all DNG’s report their version based on the last two values of 0.0, so I created the signature to look for 1.7.0.0. This is wrong now that I can see an example of version 1.7.1.0.

In order to fix the issue, I would need to change all the DNG signatures to remove the last two bytes so:

12C601000400000001070000 would change to 12C60100040000000107

This would allow for identification if some DNG files have a point version.

The pace at which manufacturers are producing camera’s with new features is much faster than the Digital Preservation community can keep up with. As new technologies get released, we play catch up trying to identify new formats and variations to existing ones. I guess that is job security?

Final Cut Pro

When it comes to Digital Preservation, the easiest types of file formats to preserve are often single self contained formats with lots of documentation. There are plenty of formats which break this norm, but a file format like a simple TIFF file is well understood and can stand on its own. The hardest file formats to preserve, I have found, are the complex under documented formats which often show up when you don’t expect them. There is a file format type which indeed makes things difficult. The project format.

There are many software tools out there which generate a “Project”, this is often proprietary and can only be used by the software which created it. Project files are also interdependent, meaning they require other files in known locations in order to be used. This interdependence is often links to images, audio, video, fonts, and other multimedia. The file format itself is just a reference to all the project settings and the paths to the files included in the project. This makes things very difficult to preserve and maintain the complex structure required. Any renaming, removing, or moving the files out of their original order can render the project useless. Many project formats are human readable in XML, or other human readable text, but others are not. I have made a recent attempt to document more Project formats on the File Format Wiki, including many Label and Optical disc project formats, along with updates to Adobe InDesign, QuarkXPress and other desktop publishing project formats. There is still plenty of work needed in other Video and Audio project formats.

Apple computers over the years has created some very powerful software for content creators to use, especially in Video editing. iMovie was used by many home movie editors and iDVD to burn those movies to DVD to share with family and friends, but Apple also sold a professional Video Editing suite which included Final Cut Pro.

Final Cut Pro started life as a Macromedia software tool called KeyGrip which never was released and later bought by Apple. Final Cut Pro was well used and loved by video editors and was given a major upgrade in 2011 to Final Cut Pro X, which was full re-written to be 64-bit. This change included a change to the Project file format. So for version 1 through version 7, Final Cut Pro used a project format with the extension .FCP. Lets take a closer look at the this project format.

hexdump -C Swing.fcp | head
00000000  a2 4b 65 79 47 0a 0d 0a  00 00 00 00 20 fc c5 5b  |.KeyG....... ..[|
00000010  00 de b3 11 d0 93 19 00  05 02 18 66 07 00 00 00  |...........f....|
00000020  03 00 00 00 00 00 00 00  00 01 00 00 00 00 01 00  |................|
00000030  00 00 11 07 73 75 62 74  79 70 65 00 00 00 01 01  |....subtype.....|
00000040  00 00 00 03 00 06 4e 4f  55 4e 44 4f 00 00 00 00  |......NOUNDO....|
00000050  01 01 00 00 00 00 00 00  00 00 00 00 00 07 52 55  |..............RU|
00000060  4e 54 49 4d 45 00 00 00  00 01 01 00 00 00 00 00  |NTIME...........|
00000070  00 00 00 01 07 76 69 65  77 65 72 73 00 00 00 00  |.....viewers....|
00000080  01 01 00 00 00 00 00 00  00 00 00 00 00 00 00 08  |................|
00000090  63 68 69 6c 64 72 65 6e  00 00 00 00 01 01 00 00  |children........|
*
00000e30  00 00 00 00 00 00 00 00  00 00 00 00 00 00 07 8c  |................|
00000e40  b3 2e 56 40 4d 6f 6f 56  54 56 4f 44 00 02 00 02  |..V@MooVTVOD....|
00000e50  00 00 00 11 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000e60  00 00 00 0b 44 61 6e 63  65 20 53 68 6f 74 73 00  |....Dance Shots.|
00000e70  00 01 00 08 00 00 07 8a  00 00 07 84 00 02 00 2f  |.............../|
00000e80  41 54 54 4f 20 52 41 49  44 30 20 47 72 6f 75 70  |ATTO RAID0 Group|
00000e90  3a 54 55 54 4f 52 49 41  4c 3a 44 61 6e 63 65 20  |:TUTORIAL:Dance |
00000ea0  53 68 6f 74 73 3a 49 6e  74 72 6f 2e 6d 6f 76 00  |Shots:Intro.mov.|
00000eb0  00 09 00 a8 00 a8 61 66  70 6d 00 00 00 00 00 03  |......afpm......|
00000ec0  00 18 00 39 00 59 00 75  00 95 00 9e 07 49 4c 31  |...9.Y.u.....IL1|
00000ed0  20 33 72 64 00 00 00 00  00 00 00 00 00 00 00 00  | 3rd............|
00000ee0  00 00 00 00 00 00 00 00  00 00 00 00 00 0f 77 61  |..............wa|
00000ef0  6c 74 d5 73 20 43 6f 6d  70 75 74 65 72 00 00 00  |lt.s Computer...|
00000f00  00 00 00 00 00 00 00 00  00 00 00 00 00 10 41 54  |..............AT|
00000f10  54 4f 20 52 41 49 44 30  20 47 72 6f 75 70 00 00  |TO RAID0 Group..|
00000f20  00 00 00 00 00 00 00 00  00 07 77 73 68 69 72 65  |..........wshire|
00000f30  73 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |s...............|
00000f40  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000f50  00 00 00 00 00 00 00 00  00 00 00 00 ff ff 00 00  |................|
00000f60  00 00 00 00 00 00 00 10  41 54 54 4f 20 52 41 49  |........ATTO RAI|
00000f70  44 30 20 47 72 6f 75 70  00 00 00 00 00 00 00 2b  |D0 Group.......+|
00000f80  00 00 00 01 00 00 00 03  00 00 00 03 54 55 54 4f  |............TUTO|
00000f90  52 49 41 4c 00 44 61 6e  63 65 20 53 68 6f 74 73  |RIAL.Dance Shots|
00000fa0  00 49 6e 74 72 6f 2e 6d  6f 76 00 00 00 00 00 00  |.Intro.mov......|

From the header we can see a remnant of the original KeyGrip software, but later in the file we find some references to files in the Mac HFS path format which includes a colon instead of a slash. These are the paths to the each of the MOV files used in the Project. This file is from the tutorial disk of Final Cut Pro version 1.2, so lets take a look at the last version released, version 7.

hexdump -C Lesson 1 Project.fcp | head
00000000  a2 4b 65 79 47 0a 0d 0a  01 de 00 00 00 20 08 92  |.KeyG........ ..|
00000010  66 c4 28 d7 11 8a e5 00  30 65 ec fe 98 03 00 00  |f.(.....0e......|
00000020  00 00 00 00 00 00 00 00  00 01 00 00 00 00 01 15  |................|
00000030  00 00 00 07 73 75 62 74  79 70 65 01 00 00 00 01  |....subtype.....|
00000040  03 00 00 00 00 06 4e 4f  55 4e 44 4f 00 00 00 00  |......NOUNDO....|
00000050  01 01 00 00 00 00 00 00  00 00 00 00 00 07 52 55  |..............RU|
00000060  4e 54 49 4d 45 00 00 00  00 01 01 00 00 00 00 00  |NTIME...........|
00000070  01 00 00 00 07 76 69 65  77 65 72 73 00 00 00 00  |.....viewers....|
00000080  01 01 00 00 00 00 00 00  00 00 00 00 00 00 00 08  |................|
00000090  63 68 69 6c 64 72 65 6e  00 00 00 00 01 01 01 00  |children........|

Almost identical to the first version, which is helpful for identification, but if we need to identify based on version, it might prove a little more difficult. It appears all the samples I have and have seen reference to all begin with the same 5 hex values, A24B657947, 0xA2 KeyG. It’s hard to know what other hex values might have something to do with versions of the file format. More samples could tell us, but from what I have the 20 bytes starting from offset 12 seems to be consistent among the different version samples. But for now the 5 bytes at the beginning of the file should suffice for identification.

When Final Cut Pro went through a complete re-write in 2011, the FCP format was abandoned. Not only made obsolete, but completely unsupported. The new Final Cut Pro X software was not able to support this now obsolete format. The new format followed the pattern of many other Apple formats of using a folder identified through an extension as a single file. Called a bundle format, Final Cut Pro X used the extension, .FCPBUNDLE. This bundle could include the media assets along with project settings/thumbnails and clips. Because of this “bundle” format, identification would have to be done at the individual file level inside the bundle. This would include formats with extensions such as .flexolibrary and .fcpevent, which appear to be SQLite databases. This complex format makes preservation of this type of object difficult with current methods and practices.

Luckily Apple didn’t leave Final Cut Pro users completely unable to migrate their content. Final Cut Pro could export the project as an XML file. This format is called Final Cut Pro XML Interchange Format and was well documented. The format was not made to bridge the gap from Final Cut Pro to Final Cut Pro X, but rather make the project file more useful outside of Final Cut Pro. Final Cut Pro X actually can’t open these files either, which is why a third party developer came in and developed 7toX (SendtoX) to allow for projects to be converted to a newer XML format.

Lets take a look at the basic Final Cut Pro XML Interchange Format which has a standard XML extension:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="Sequence 1 ">...</sequence>
</xmeml>

Standard XML with a Doctype/root of xmeml. Clever. A little ways into the XML we also see:

<appspecificdata>
	<appname>Final Cut Pro</appname>
	<appmanufacturer>Apple Inc.</appmanufacturer>
	<appversion>7.0</appversion>
</appspecificdata>

Final Cut Pro X also has an XML format which is different than XMEML and has an extension FCPXML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fcpxml>

<fcpxml version="1.8">
    <resources>
        <format id="r1" name="FFVideoFormatDV720x480i5994" frameDuration="2002/60000s" fieldOrder="lower first" width="720" height="480" paspH="10" paspV="11" colorSpace="6-1-6 (Rec. 601 (NTSC))"/>
    </resources>
    <library location="file:///Untitled.fcpbundle/">...</library>
</fcpxml>

A different Doctype/root and structure but should be easy to identify.

The preservation of projects files, according to some, is not necessary since they are not the finalized product. Preserving the finalized output would be preferable as it can be managed easier and represent the final render of a project. But identification of the Final Cut Pro project and all the assets gives the option to access a collection more accurately. I was able to create a signature for the FCP, XML, and FCPXML formats. Take a look on my GitHub for the signatures and some test files.

PianoSoft DOM-30

I often find myself at a thrift store looking through the well used Compact Discs. Often see the same ones over and over, but occasionally finding a gem. While looking through a set of discs, a few caught my eye. When I pulled one out to look at the cover I noticed it was not your typical CD. Opening the cover I was greeted with a 3.5 floppy inside the jewel case. That was a fun surprise.

The 3.5 inch floppy disk appears to be made specifically for the Yamaha Disklavier piano’s. The disk had the appearance of your typical double density floppy. Unfortunately, when I inserted the disk I was greeted with the error, “no mountable file systems”. I was however able to use ddrescue and make a disk image. Here is what the disk header looks like:

hexdump -C Yamaha_RazzleDazzle.img | head
00000000  e5 e5 e5 e5 e5 e5 e5 e5  e5 e5 e5 e5 e5 e5 e5 e5  |................|
*
00000200  f9 ff ff 03 40 00 05 60  00 07 80 00 47 00 00 0b  |....@..`....G...|
00000210  c0 00 0d e0 00 0f f0 ff  11 20 01 13 40 01 15 f0  |......... ..@...|
00000220  ff 17 80 01 19 a0 01 1b  c0 01 42 00 00 1f 00 02  |..........B.....|
00000230  21 20 02 23 40 02 25 60  02 27 80 02 29 f0 ff 2b  |! .#@.%`.'..)..+|
00000240  c0 02 2d e0 02 2f f0 ff  31 20 03 33 40 03 35 60  |..-../..1 .3@.5`|
00000250  03 37 80 03 39 f0 ff 3b  c0 03 3d e0 03 3f 00 04  |.7..9..;..=..?..|
00000260  41 f0 ff ff 4f 04 45 60  04 48 f0 ff 49 f0 ff 00  |A...O.E`.H..I...|
00000270  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The header actually has 512 bytes of the E5 hex values. Not a FAT12 file system for sure. A little farther into the disk image I can see what appears to be file listing.

00000e00  53 4f 4e 47 20 20 20 20  50 30 31 00 00 00 00 00  |SONG    P01.....|
00000e10  00 00 00 00 00 00 00 00  00 00 02 00 00 20 00 00  |............. ..|
00000e20  53 48 4f 52 54 20 20 20  50 30 32 20 00 00 00 00  |SHORT   P02 ....|
00000e30  00 00 00 00 00 00 fa ac  a2 16 0a 00 00 18 00 00  |................|
00000e40  54 4f 59 20 20 20 20 20  50 30 33 20 00 00 00 00  |TOY     P03 ....|
00000e50  00 00 00 00 00 00 05 ad  a2 16 10 00 00 18 00 00  |................|
00000e60  53 57 45 45 54 20 20 20  50 30 34 20 00 00 00 00  |SWEET   P04 ....|
00000e70  00 00 00 00 00 00 12 ad  a2 16 16 00 00 20 00 00  |............. ..|
00000e80  56 49 4f 4c 49 4e 20 20  50 30 35 20 00 00 00 00  |VIOLIN  P05 ....|
00000e90  00 00 00 00 00 00 40 ad  a2 16 1e 00 00 30 00 00  |......@......0..|
00000ea0  51 55 49 45 54 20 20 20  50 30 36 20 00 00 00 00  |QUIET   P06 ....|
00000eb0  00 00 00 00 00 00 50 ad  a2 16 2a 00 00 18 00 00  |......P...*.....|
00000ec0  52 41 5a 5a 4c 45 20 20  50 30 37 20 00 00 00 00  |RAZZLE  P07 ....|
00000ed0  00 00 00 00 00 00 83 ad  a2 16 30 00 00 28 00 00  |..........0..(..|
00000ee0  4c 45 54 53 20 20 20 20  50 30 38 20 00 00 00 00  |LETS    P08 ....|
00000ef0  00 00 00 00 00 00 9a ad  a2 16 3a 00 00 20 00 00  |..........:.. ..|
00000f00  50 49 41 4e 4f 44 49 52  46 49 4c 20 00 00 00 00  |PIANODIRFIL ....|
00000f10  00 00 00 00 00 00 a1 ad  a2 16 43 00 00 18 00 00  |..........C.....|
00000f20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001c00  fe 00 00 00 20 00 00 43  4f 4d 2d 45 53 45 51 51  |.... ..COM-ESEQQ|
00001c10  31 31 56 31 2e 30 30 80  00 00 00 d9 01 00 00 00  |11V1.00.........|
00001c20  20 00 00 01 58 00 00 20  20 20 20 20 20 20 20 20  | ...X..         |
00001c30  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

This gave me a few clues. A few google searches I came across a project on Hackaday about “Hacking Yamaha Disklavier Floppies“. I was eager to test out the python software and see if could export the files on the disk image. To my disappointment, the python script could not read my disk image. So I reached out to the author. After sharing a couple disk images with him, he was able to enable support for this different type of disklavier disk.

python3 disklav.py -t Yamaha_RazzleDazzle.img 
Loading file...OK
Format: PianoSoft DOM-30
Disk: PPC 1919       
Title: RAZZLE DAZZLE
--------------------------------------------------------------------
Track 01 - Song Without Words
Track 02 - Shortenin' Bread Boogie
Track 03 - Toy Bugle
Track 04 - Sweet Tooth
Track 05 - The Mysterious Violin
Track 06 - Quiet Moment
Track 07 - Razzle Dazzle
Track 08 - Let's Have A Party!

When I used the extract option with the software I was rewarded with eight files with the .FIL extention.

sf Yamaha_RazzleDazzle-track01.fil           
---
siegfried   : 1.10.1
scandate    : 2023-12-04T22:38:04-07:00
signature   : default.sig
created     : 2023-12-04T22:37:35-07:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'Yamaha_RazzleDazzle-track01.fil'
filesize : 6409
modified : 2023-12-04T22:28:34-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'

No surprise the file was not known to PRONOM. I doubt these files have made a big appearance in many archives.

hexdump -C Yamaha_RazzleDazzle-track01.fil | head
00000000  fe 00 00 00 20 00 00 43  4f 4d 2d 45 53 45 51 51  |.... ..COM-ESEQQ|
00000010  31 31 56 31 2e 30 30 80  00 00 00 d9 01 00 00 00  |11V1.00.........|
00000020  20 00 00 01 58 00 00 20  20 20 20 20 20 20 20 20  | ...X..         |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000120  20 20 20 20 20 20 20 00  76 04 02 00 1e ff 00 00  |       .v.......|
00000130  ff ff ff ff ff 00 ff 00  00 00 00 ff ff 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 02 02 02  02 02 02 02 02 01 01 01  |................|
00000160  01 01 01 01 01 01 01 01  01 01 01 01 01 00 00 00  |................|

The header of a .FIL file has an ascii string “COM-ESEQ”. A little investigation shed a little light on the format. Turns out there is a proprietary format Yamaha developed called “E-SEQ”. The E-SEQ format is compatible with all Disklaviers, Clavinova digital pianos, and a few other Yamaha products. I was curious if the format was something similar to a MIDI file, which was commonly used with early keyboard systems, but I was unable to find anything to suggest they are similar in any way. Yamaha does mention there are tools out there to convert an E-SEQ file to a SMF (Standard Midi File) which was used on other systems.

There is another tool called PPFBU which can be used to extract a disk image from a Disklavier floppy and the E-SEQ files. Along with a companion tool called MID2PianoCD claims to be able to convert a E-SEQ to a WAV or MP3, although I haven’t had much luck.

Another set of tools are available here, they allow for copying of a disk and converting back and forth from E-SEQ and Midi. A text file in DVUtils from the link has the following background about the disk format and the FIL files:

DISKLAVIER FILES AND DISCS


Yamaha Disklavier discs are always on Double Density (2DD) media, High Density (HD)discs, which are more common nowadays,  will not work. Furthermore, they are formatted to 720 Kbytes not the default of 1.2 Mbytes.  The original discs are copy protected.  This has been achieved by placing invalid data on the first sector.  As DOS and Windows always refer to this sector to check out a floppy, they will report that the discs are bad.  The Yamaha machinery ignores the first sector so it reads them normally.

The music files on a Disklavier disc have the extension .FIL . They are frequently identified with titles like PIANO001.FIL but sometimes they have names similar to DOS like MUSIC1.FIL.  In addition to the music files, there is an index file on the disc.  This contains a list of the active music files on the disc, their titles, and pointers to their position on the disc.  The index file is always called PIANODIR.FIL and always has a size of 6 Kbytes. In order to set up a Disklavier disc to function on a Disklavier, you must first copy the music files onto it in Disklavier format (ESEQ) and then run the ESEQ EXPLORER program to build the index file.

Although there are many Disklavier Piano’s still out there and quite expensive if you want to pick one up for yourself, the websites dedicated to the format as slowly disappearing. One archived website has plenty of sample files to download and write to a floppy for use if you happen to have a Disklavier.

You can check out the signature I put together for the E-SEQ on my Github. Might be good to explore the disk image format more and add that as well to PRONOM for identification as well.

Embroidery Formats

There are certain file formats which seem to be fairly mainstream and come up frequently from a variety of sources. Then you find one from a specialized niche industry. I recently came across a file with the HUS extension and it led me down a path of a family of formats I didn’t know existed. The world of embroidery formats has been around for quite some time and is quite confusing. There are many manufacturers of embroidery machines and over the years have merged with each other or upgraded to include new features all requiring changes to the file formats used by the systems.

Embroidery stitching designs are a unique file formats. They are images, probably vector, but with stitching information included in the file which determines the color of thread used and pattern used.

I took a data set of many different embroidery files from here and ran it through Siegfried.

That is pretty bleak. Not a single file identified except a few using an OLE container and a couple files which may be plain bitmaps. I have started to document where each of these formats come from on the File Format Wiki, but there are so many formats to research. The triple XXX format research has its other challenges…..

Today I wanted to focus on one brand, Husqvarna, a Swedish company with a long history.

Export Formats in Premier+ Software

Husqvarna has made sewing machines or quite some time. In the late 1970’s machines started to enter the market which were “computerized”. It was in the early 1990’s when the embroidery machines could take a memory card full of stitching patterns, then later could take a floppy disk or USB drive full of custom made designs. Let’s take a look at a few of the formats, starting with the extension .HUS.

First a sample with a 1994 last modified date:

hexdump -C Bow.hus | head
00000000  5b af c8 00 1a 09 00 00  01 00 00 00 98 01 67 01  |[.............g.|
00000010  6f fe 9b fe 2c 00 00 00  47 00 00 00 0e 07 00 00  |o...,...G.......|
00000020  00 00 00 00 00 00 00 00  00 00 07 00 00 10 38 84  |..............8.|
00000030  81 af f8 d9 6e bc 5e c1  b4 1d fc b2 00 00 00 0c  |....n.^.........|
00000040  93 03 49 98 00 e5 f0 07  41 77 75 c6 49 c6 ff e2  |..I.....Awu.I...|
00000050  80 aa aa 08 00 28 82 a8  b2 49 27 5e dd ae ba ee  |.....(...I'^....|
00000060  db 78 05 bc 4b ef 00 37  f0 da db b5 d6 ee 93 9e  |.x..K..7........|
00000070  55 15 01 00 44 00 05 51  01 3e 16 cf db ce 2c bc  |U...D..Q.>....,.|
00000080  ad db 48 97 cb e5 f2 fe  fc 63 79 e7 a3 a1 46 80  |..H......cy...F.|
00000090  5c 37 f0 10 41 43 a1 40  21 c7 a5 d2 6e 82 0a 20  |\7..AC.@!...n.. |

Then one from around 1996:

hexdump -C ROSEBUD.HUS | head
00000000  5b ff c8 00 5f 0b 00 00  04 00 00 00 ce 00 69 01  |[..._.........i.|
00000010  01 ff 17 fe 3a 00 00 00  66 00 00 00 97 09 00 00  |....:...f.......|
00000020  00 6b 6e 6c 6a 6b 00 00  00 00 0a 00 19 00 1a 00  |.knljk..........|
00000030  0e 00 1a 00 05 00 a4 a2  00 10 00 19 42 88 80 35  |............B..5|
00000040  ff 4d 96 0d c1 72 49 6e  09 00 c8 72 ee 90 76 93  |.M...rIn...r..v.|
00000050  47 ca 20 00 00 4a 32 b1  d4 d4 08 e1 e7 86 d3 50  |G. ..J2........P|
00000060  20 0f 2f 1a 63 e0 09 66  7e ed f0 85 b9 37 fd 12  | ./.c..f~....7..|
00000070  2c 89 09 01 02 00 30 33  33 00 44 96 4b 36 49 65  |,.....033.D.K6Ie|
00000080  bd d7 77 7e df bb cd f4  9b db e7 6f 5b 76 ed b1  |..w~.......o[v..|
00000090  2d b6 49 08 03 31 81 8c  00 02 44 91 22 24 b2 5f  |-.I..1....D."$._|

And another from 1998:

hexdump -C FLOWER.HUS | head
00000000  5d fc c8 00 de 23 00 00  04 00 00 00 79 01 e5 01  |]....#......y...|
00000010  87 fe 1b fe 32 00 00 00  e0 00 00 00 51 1b 00 00  |....2.......Q...|
00000020  00 00 00 00 00 00 00 00  00 00 0e 00 03 00 0c 00  |................|
00000030  05 00 00 69 53 6d 82 36  bf f0 d8 7a 55 c1 0b b6  |...iSm.6...zU...|
00000040  fd b9 52 ff da 61 20 20  14 6a 31 1d e8 1c b0 7f  |..R..a  .j1.....|
00000050  92 cc 48 0c 38 59 16 49  6d 71 11 39 00 00 1e 10  |..H.8Y.Imq.9....|
00000060  4c fc 18 00 00 00 00 00  00 00 0f 68 67 e0 be 11  |L..........hg...|
00000070  17 88 d5 2b 13 c4 5b 33  a2 98 f7 b9 6e 2d dc 62  |...+..[3....n-.b|
00000080  ba 5e 8f 50 bf 09 f9 28  13 38 29 2a de 47 f4 c1  |.^.P...(.8)*.G..|
00000090  9c 3e d6 37 bc 8c ad 95  f0 b3 c1 97 bc fb 1f b5  |.>.7............|

After 1998 Viking Husqvarna merged with the Pfaff brand and a new format with the extension .VIP was used. I found a couple variants of this format.

hexdump -C Magic Flower.vip | head
00000000  5d fc 90 01 0e 06 00 00  01 00 00 00 3e 01 19 01  |]...........>...|
00000010  c2 fe e7 fe 6e 00 00 00  80 00 00 00 c8 04 00 00  |....n...........|
00000020  00 00 00 00 00 00 00 00  00 00 36 00 00 00 89 bf  |..........6.....|
00000030  93 fc 01 00 00 00 1a 00  00 00 46 00 6c 00 6f 00  |..........F.l.o.|
00000040  72 00 61 00 6c 00 3b 00  20 00 41 00 62 00 73 00  |r.a.l.;. .A.b.s.|
00000050  74 00 72 00 61 00 63 00  74 00 3b 00 20 00 46 00  |t.r.a.c.t.;. .F.|
00000060  61 00 73 00 68 00 69 00  6f 00 6e 00 00 00 00 0b  |a.s.h.i.o.n.....|
00000070  38 68 61 2f f8 6c 9d 68  63 47 07 80 09 c0 6b e0  |8ha/.l.hcG....k.|
00000080  04 9f 6d eb 70 c5 4b 3f  e0 00 7d 52 4a 45 66 6f  |..m.p.K?..}RJEfo|
00000090  bf 1e f5 41 be f6 50 44  90 02 21 fd 6f 39 bd 71  |...A..PD..!.o9.q|

The three HUS files and the VIP files have a similar first 4 bytes. Should make an easy signature.

With the addition of the TruE mySewnet service and software, the format went through a change and started using the .VP3 extension.

hexdump -C Magic Flower.vp3 | head
00000000  25 76 73 6d 25 00 00 38  00 50 00 72 00 6f 00 64  |%vsm%..8.P.r.o.d|
00000010  00 75 00 63 00 65 00 64  00 20 00 62 00 79 00 20  |.u.c.e.d. .b.y. |
00000020  00 56 00 53 00 4d 00 20  00 53 00 6f 00 66 00 74  |.V.S.M. .S.o.f.t|
00000030  00 77 00 61 00 72 00 65  00 20 00 4c 00 74 00 64  |.w.a.r.e. .L.t.d|
00000040  00 02 00 00 00 0d 64 00  32 00 46 00 6c 00 6f 00  |......d.2.F.l.o.|
00000050  72 00 61 00 6c 00 3b 00  20 00 41 00 62 00 73 00  |r.a.l.;. .A.b.s.|
00000060  74 00 72 00 61 00 63 00  74 00 3b 00 20 00 46 00  |t.r.a.c.t.;. .F.|
00000070  61 00 73 00 68 00 69 00  6f 00 6e 00 00 7c 38 00  |a.s.h.i.o.n..|8.|
00000080  00 6d c4 ff ff 83 c8 ff  ff 92 3c 00 00 06 0e 00  |.m........<.....|
00000090  01 0c 00 01 00 03 00 00  00 0d 10 00 00 00 00 00  |................|

The current version of the Premier+ software produces a file with the extension .VP4.

hexdump -C Magic Flower.vp4 | head
00000000  25 56 70 34 25 01 00 00  00 4d 73 61 0a df 74 29  |%Vp4%....Msa..t)|
00000010  3c 87 6b 44 2c 84 2f 00  3c 7c f7 e7 a0 69 6e 66  |<.kD,./.<|...inf|
00000020  6f 00 00 00 00 45 00 00  00 6e 74 74 6e 00 00 00  |o....E...nttn...|
00000030  00 27 00 00 00 02 00 6e  74 65 73 19 00 46 6c 6f  |.'.....ntes..Flo|
00000040  72 61 6c 3b 20 41 62 73  74 72 61 63 74 3b 20 46  |ral; Abstract; F|
00000050  61 73 68 69 6f 6e 73 74  67 73 00 00 0c 06 00 00  |ashionstgs......|
00000060  01 00 00 00 01 00 c2 fe  e7 fe 3e 01 19 01 00 00  |..........>.....|
00000070  73 62 64 73 00 00 00 00  ab 0c 00 00 01 00 73 62  |sbds..........sb|
00000080  64 6e 00 00 00 00 9d 0c  00 00 00 00 00 00 00 00  |dn..............|
00000090  00 00 00 00 00 00 00 00  00 00 00 6e 74 74 6e 00  |...........nttn.|

There is a great python library which can read in many of these formats and can give us some confirmation of the format headers. It is called pyembroidery and has readers for HUS and VP3.

One other format associated with the Husqvarna Viking Designer 1 model which used a floppy disk is the SHV stitch file. This is a special format which could only be used on the embroidery machine if it was properly formatted. This would put into a structure that would look like this:

The SHV file is the design file with the MHV and PHV are used for display on the embroidery machine. This is what we find when we look at the SHV content:

hexdump -C My Designs/MENU_01/DES01_01.SHV | head
00000000  45 6d 62 72 6f 69 64 65  72 79 20 64 69 73 6b 20  |Embroidery disk |
00000010  63 72 65 61 74 65 64 20  75 73 69 6e 67 20 73 6f  |created using so|
00000020  66 74 77 61 72 65 20 6c  69 63 65 6e 73 65 64 20  |ftware licensed |
00000030  66 72 6f 6d 20 56 69 6b  69 6e 67 20 53 65 77 69  |from Viking Sewi|
00000040  6e 67 20 4d 61 63 68 69  6e 65 73 20 41 42 2c 20  |ng Machines AB, |
00000050  53 77 65 64 65 6e 08 55  6e 74 69 74 6c 65 64 8f  |Sweden.Untitled.|
00000060  a0 47 50 62 7c 00 00 00  00 00 00 00 00 00 00 00  |.GPb|...........|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 09 99  |................|

The MHV and PHV files have the same header so identification of just the SHV design file will be difficult.

I have only scratched the surface with these embroidery formats. The embroidery market was quite large with many different manufacturers. The user base seems to have been quite large, but its reach may be limited to personal collections. I have attempted a signature for the Husqvarna formats, you can find them in my GitHub repository. More to come hopefully!

RCA-VOC

I wonder sometimes what goes through a software/hardware developers mind when deciding a format to use for a new device. There are so many options our there for audio formats to choose from. I am sure there are pros and cons to using one technology over another but it seems a few decide to go ahead and make their own. I am sure there is some commercial advantage to developing a proprietary audio format, but with all the established choices it seems unnecessary.

Sony developed their own audio compression formats, which I explored in an earlier blog post. I came across a small goofy looking RCA voice recorder, model VR6320.

Many of these RCA VR series recorders can record in a WAV or a VOC file format. The WAV files are pretty run of the mill, but the VOC format is unique to RCA recorders.

The VOC format is not to be confused with another audio format with the same extension. The Creative Voice Format is a bit more well known. It was used with the Creative’s sound cards (Sound Blaster family) many folks had in their Windows computers in the 1990’s. But the RCA file format is different, and because of the same extension needs its own identification so they are not confused with each other.

sf REC00001.VOC 
---
siegfried   : 1.10.1
scandate    : 2023-11-19T23:33:47-07:00
signature   : default.sig
created     : 2023-05-12T09:10:13Z
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V112.xml; container-signature-20230510.xml'
---
filename : 'REC00001.VOC'
filesize : 47231
modified : 2015-01-09T20:51:10-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match; possibilities based on extension are fmt/1736'

The RCA VOC file format seems to be undocumented, there isn’t much available. You can always download a copy of the RCA Digital Voice Manager software, which may or may not run on your current system, and convert the VOC files to WAV or you can use a piece of software coded in 2008 called “devoc“. The developer used to have an online website you could upload the VOC to and it would convert it automatically, but is not longer available. The code can also be found here.

Let’s take a look at the header of a couple of the files I have:

hexdump -C REC00001.VOC | head
00000000  56 43 50 31 36 32 5f 56  4f 43 5f 46 69 6c 65 0c  |VCP162_VOC_File.|
00000010  0f 01 09 14 32 1c 00 00  0b 44 03 00 00 00 00 00  |....2....D......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001b0  00 10 00 00 00 00 00 00  00 00 00 10 00 00 00 00  |................|
000001c0  00 00 00 00 00 10 00 00  00 00 00 00 00 00 00 10  |................|
000001d0  00 00 00 00 00 00 00 ff  ff ff ff ff ff ff ff ff  |................|
000001e0  ef 11 14 d3 96 77 57 44  34 33 34 44 43 33 44 43  |.....wWD434DC3DC|
000001f0  43 34 44 34 43 43 34 44  43 43 33 35 43 33 43 34  |C4D4CC4DCC35C3C4|
00000200  34 43 43 24 34 43 43 33  44 51 33 42 14 44 32 43  |4CC$4CC3DQ3B.D2C|

hexdump -C A0000003.VOC | head
00000000  52 50 35 31 32 30 5f 56  4f 43 5f 46 69 6c 65 78  |RP5120_VOC_Filex|
00000010  08 06 16 0a 0f 20 00 04  17 01 03 00 00 00 00 00  |..... ..........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000180  00 03 b9 2f 00 07 62 af  00 0b 0c 2f 00 0e b5 af  |.../..b..../....|
00000190  00 12 5f 2f 00 00 00 00  00 00 00 00 00 00 00 00  |.._/............|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 e1  |................|
00000fb0  ea eb ea fe df ae 4e a1  1d cd 1c cf 9f de cf 3b  |......N........;|

Most of samples I have show “VCP162_VOC_File” in the header, but I have one sample with “RP5120_VOC_File“. I have heard of others, one being “V432_Voice_File“. There could be more variations. One could assume the header is somehow associated with the model number of the device, but that doesn’t appear to be the case. Although there is a device with the model number “RP 5120“. It might be that the older RP series get one header and the newer VR Series get VCP? I will need more samples to confirm, if you have any send them my way. Also, according to the manuals, there is a SP and LP mode to manage the bitrate of the file to squeeze more minutes on the built in memory of these devices. This doesn’t appear to affect identification, but might be good to differentiate in the future.

For now you can take a look at the signature on my GitHub page.

Adobe Acrobat Capture

During the recent PRONOM Research Week, I noticed a file format with no description and no signature.

x-fmt/217Adobe ACD

All I had to go on was it was an Adobe format and the acronym “ACD”. One of the first results that came up in a google search was a post in the Adobe forums with someone asking what to do with some old ACD and ACI files they found on a disc, circa 2000, labeled “Adobe Capture”. The only thing I remember about Adobe Capture was some scanning tools related to Adobe Acrobat, but I didn’t remember coming across any ACD files related to Acrobat.

Initially it wasn’t easy to find more information on this format. Eventually I was able to narrow it down to stand-alone software adobe released called “Adobe Acrobat Capture”. Originally released in 1995 it was eventually discontinued in 2010. The software was marketed under the ePaper name and connected to Acrobat through the creation of a PDF from scanned images. The software was compatible with many scanner models and would process the scanned images, run Optical Character recognition, and export to a searchable PDF. These tools are built into Adobe Acrobat today.

One of the reasons the software was being so elusive is the fact it was sold with a high price tag and required the use of a hardware key, or dongle, in order to process scans. The hardware key also managed the type of license you purchased which may limit the number of pages you are allowed to scan within a certain period of time. So the software is very difficult to run today, if you do happen to find a copy out there in Internet land.

In order to document these file formats for preservation purposes I needed to find some samples. I was excited to find a demonstration CD on the Internet Archive, but unfortunately it contained no examples of the ACD file format.

A little sleuthing on the Wayback Machine helped me find a few user guides and brochures. I was also able to find there was three versions of Adobe Acrobat Capture. In a Product Brochure, you can see a screenshot of the software with a document open with the ACD extension.

If you are OCD like me you might have noticed the window in this screenshot is typical of the older Windows 3.1 or Windows NT system. So this was indeed an older product released by Adobe.

The Adobe Acrobat Capture 3.0 Demonstration CD-ROM from the Internet Archive luckily has a UserGuide PDF on the disc and was able to help me understand the ACD format a little more.

Looks like the ACD format is an intermediate format used by the software to manage the process between scanning and export to PDF. ACD was also defined as an “Acrobat Capture Document” which makes sense. They were also mentioned as being “multipage files in Acrobat Capture Document (ACD)”. The UserGuide also mentioned an ACP format which it referenced as “one-page files are in Acrobat Capture Page (ACP) format.” So more research is needed.

Lets start with Adobe Acrobat Capture 2.0 as I managed to get a few samples from an installer I found. Here is a hexdump of an ACD file and its corresponding ACI file.

hexdump -C CONTRACT.ACD | head
00000000  02 04 47 47 c9 00 86 b5  01 00 b6 27 02 00 01 00  |..GG.......'....|
00000010  f5 00 5e 00 3b 96 02 00  01 6e 63 6a 00 00 88 68  |..^.;....ncj...h|
00000020  00 00 26 00 44 3a 5c 43  4f 44 45 5c 47 47 5c 50  |..&.D:\CODE\GG\P|
00000030  52 4f 44 55 43 54 2e 33  32 53 5c 49 4e 5c 63 6f  |RODUCT.32S\IN\co|
00000040  6e 74 72 61 63 74 2e 61  63 69 00 00 00 00 00 00  |ntract.aci......|
00000050  7c 33 c0 27 00 40 ff ff  ff 00 03 00 03 00 00 00  ||3.'.@..........|
00000060  00 00 00 00 00 00 40 00  00 00 00 00 00 03 00 00  |......@.........|
00000070  00 00 00 00 00 00 00 40  00 00 00 00 09 00 0a ab  |.......@........|
00000080  04 0b 14 b5 04 39 19 00  40 00 00 00 00 0c 14 b0  |.....9..@.......|
00000090  04 38 19 b0 04 08 00 0a  7f 06 d3 11 89 06 39 17  |.8............9.|

hexdump -C CONTRACT.ACI | head
00000000  49 49 2a 00 b3 0c 02 00  35 80 78 a0 80 35 c0 78  |II*.....5.x..5.x|
00000010  a4 80 35 40 3c 54 40 01  e2 b2 01 e2 b2 01 e2 b2  |..5@<T@.........|
00000020  01 e2 b2 01 e2 b2 01 e2  b2 01 e2 b2 01 e2 b2 01  |................|
00000030  e2 b2 01 e2 b2 01 e2 b2  01 e2 b2 01 e2 b2 01 e2  |................|
00000040  b2 01 e2 b2 01 e2 b2 01  e2 b2 01 e2 b2 01 e2 b2  |................|
00000050  01 e2 b2 01 e2 b2 01 e2  b2 01 e2 b2 01 e2 b2 01  |................|
00000060  e2 b2 01 e2 b2 01 e2 b2  01 e2 b2 01 e2 b2 01 e2  |................|
00000070  b2 01 e2 b2 01 e2 b2 01  e2 b2 01 e2 b2 01 e2 b2  |................|
00000080  01 e2 b2 01 e0 b0 01 e0  b0 01 e0 b0 01 e0 b0 01  |................|
00000090  e0 b0 01 e0 b0 01 e0 b0  01 e0 b0 01 e0 b0 01 e0  |................|

The ACD file is unique, PRONOM and even TrID was unaware of the format. But to the keen observer, the ACI format is very recognizable. You may have seen this header before:

Lets take a closer look at an ACI file to see if they are a true TIFF image or if there is any customization to the format.

tiffinfo CONTRACT.ACI 
=== TIFF directory 0 ===
TIFF Directory at offset 0x20cb3 (134323)
  Subfile Type: (0 = 0x0)
  Image Width: 2544 Image Length: 3295
  Resolution: 300, 300
  Bits/Sample: 1
  Compression Scheme: CCITT RLE
  Photometric Interpretation: min-is-white
  Samples/Pixel: 1
  Rows/Strip: 32
  Planar Configuration: single image plane
  Software: HALO Desktop Imager

exiftool -D CONTRACT.ACI 
    - ExifTool Version Number         : 12.60
    - File Name                       : CONTRACT.ACI
    - Directory                       : TUTORIAL/SAMPOUT
    - File Size                       : 134 kB
    - File Modification Date/Time     : 1995:07:10 16:02:08-06:00
    - File Access Date/Time           : 2023:11:14 15:41:02-07:00
    - File Inode Change Date/Time     : 2023:11:08 08:34:18-07:00
    - File Permissions                : -rwxrwxrwx
    - File Type                       : TIFF
    - File Type Extension             : tif
    - MIME Type                       : image/tiff
    - Exif Byte Order                 : Little-endian (Intel, II)
  254 Subfile Type                    : Full-resolution image
  256 Image Width                     : 2544
  257 Image Height                    : 3295
  258 Bits Per Sample                 : 1
  259 Compression                     : CCITT 1D
  262 Photometric Interpretation      : WhiteIsZero
  273 Strip Offsets                   : (Binary data 625 bytes, use -b option to extract)
  277 Samples Per Pixel               : 1
  278 Rows Per Strip                  : 32
  279 Strip Byte Counts               : (Binary data 448 bytes, use -b option to extract)
  282 X Resolution                    : 300
  283 Y Resolution                    : 300
  305 Software                        : HALO Desktop Imager
    - Image Size                      : 2544x3295
    - Megapixels                      : 8.4

Looks like a true TIFF image with no special tags or unique properties. They are 1-bit TIFF’s compressed with CCITT RLE. Not sure there would be any need to create a special signature for these ACI files.

Looking closer at the ACD file format, we can see they reference ACI files, so probably safe to assume the ACD file doesn’t contain the full raster data for each image:

hexdump -C Report.acd
00000000  02 04 47 47 c9 00 9a 8b  00 00 d4 ce 00 00 03 00  |..GG............|
00000010  f5 02 5f 00 00 61 01 00  01 6e 63 6a 01 00 30 5f  |.._..a...ncj..0_|
00000020  00 00 27 00 63 3a 5c 63  61 70 74 75 72 65 32 5c  |..'.c:\capture2\|
00000030  73 61 6d 70 6c 65 73 5c  6f 75 74 5c 52 65 70 6f  |samples\out\Repo|
00000040  72 74 5f 30 30 30 31 2e  61 63 69 00 00 01 00 00  |rt_0001.aci.....|
00000050  00 00 00 00 00 00 00 00  00 00 e8 03 00 00 01 00  |................|
00000060  01 00 00 00 00 00 00 00  00 00 08 00 52 65 70 6f  |............Repo|
00000070  72 74 30 31 00 00 00 00  70 33 d8 27 00 40 ff ff  |rt01....p3.'.@..|
*
00005f40  07 00 40 6f 00 09 00 40  01 6e 63 6a 02 00 52 2c  |..@o...@.ncj..R,|
00005f50  00 00 27 00 63 3a 5c 63  61 70 74 75 72 65 32 5c  |..'.c:\capture2\|
00005f60  73 61 6d 70 6c 65 73 5c  6f 75 74 5c 52 65 70 6f  |samples\out\Repo|
00005f70  72 74 5f 30 30 30 32 2e  61 63 69 00 00 00 00 00  |rt_0002.aci.....|
00005f80  00 00 00 00 4e 0c fe ff  ff ff e8 03 00 00 01 00  |....N...........|
00005f90  01 00 00 00 00 00 00 00  00 00 08 00 52 65 70 6f  |............Repo|
00005fa0  72 74 30 32 00 00 00 00  4c 31 f0 27 00 40 ff ff  |rt02....L1.'.@..|

From the limited sample set I have access, all the ACD files begin with the same Hex values, “02044747C900”. Along with the common header we can assume there should be at least one ACI file referenced in the first part of the file. Because it is referenced as a filepath, the ACI string would be variable in its offset.

Adobe Acrobat Capture 3.0 turns out to be a different format. But looks familiar………

hexdump -C Contract.acd | head
00000000  50 4b 03 04 14 00 00 00  08 00 3b ba 6e 57 23 9d  |PK........;.nW#.|
00000010  8e b8 3d 00 00 00 3e 00  00 00 09 00 40 00 46 49  |..=...>.....@.FI|
00000020  4c 45 53 2e 4c 53 54 0a  00 20 00 00 00 00 00 00  |LES.LST.. ......|
00000030  00 00 00 80 e6 e9 ca 50  17 da 01 80 e6 e9 ca 50  |.......P.......P|
00000040  17 da 01 80 e6 e9 ca 50  17 da 01 4e 55 18 00 4e  |.......P...NU..N|
00000050  55 43 58 09 00 46 00 49  00 4c 00 45 00 53 00 2e  |UCX..F.I.L.E.S..|
00000060  00 4c 00 53 00 54 00 8b  76 74 76 31 8c e5 e5 f2  |.L.S.T..vtv1....|
00000070  0c 76 f6 f7 0d f0 0f f6  0c 71 b5 0d 09 0a 75 e5  |.v.......q....u.|
00000080  e5 f2 0b f5 75 f3 f4 71  0d b6 35 e4 e5 02 31 fc  |....u..q..5...1.|
00000090  1c 7d 5d 0d 6d 9d f3 f3  4a 8a 12 93 4b f4 12 93  |.}].m...J...K...|

sf Contract.acd 
---
siegfried   : 1.10.1
scandate    : 2023-11-15T09:10:01-07:00
signature   : default.sig
created     : 2023-10-11T15:10:17-06:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V114.xml; container-signature-20230822.xml'
---
filename : 'Contract.acd'
filesize : 79002
modified : 2023-11-14T23:17:53-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    basis   : 'byte match at [[0 4] [78886 3] [78980 4]]'
    warning : 'extension mismatch'

Yep, its a zip container file. lets take a peek inside to see what it is composed of.

7z l Contract.acd 
--
Path = Contract.acd
Type = zip
Physical Size = 79002

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2023-11-14 23:17:54 ....A           62           61  FILES.LST
2023-11-14 23:17:54 ....A          410          226  Contract.acd
2023-11-14 23:17:52 ....A       150213        78093  Contract.acp
------------------- ----- ------------ ------------  ------------------------
2023-11-14 23:17:54             150685        78380  3 files

The the Contract ACD file is like a nesting doll, an ACD within an ACD. Lets see what the ACD and ACP is made of.

hexdump -C Contract.acd | head
00000000  00 01 00 00 00 02 04 47  47 2d 01 9a 01 00 00 02  |.......GG-......|
00000010  00 00 00 02 00 01 01 00  00 00 01 00 00 00 04 04  |................|
00000020  00 00 00 09 00 57 69 6e  67 64 69 6e 67 73 05 00  |.....Wingdings..|
00000030  41 72 69 61 6c 0b 00 43  6f 75 72 69 65 72 20 4e  |Arial..Courier N|
00000040  65 77 0f 00 54 69 6d 65  73 20 4e 65 77 20 52 6f  |ew..Times New Ro|
00000050  6d 61 6e 05 01 00 00 00  02 00 00 00 78 01 00 00  |man.........x...|
00000060  0f 00 54 69 6d 65 73 20  4e 65 77 20 52 6f 6d 61  |..Times New Roma|
00000070  6e 00 00 00 20 0b 00 00  c0 0a 00 00 00 00 00 00  |n... ...........|
00000080  00 06 00 00 00 0f 00 54  69 6d 65 73 20 4e 65 77  |.......Times New|
00000090  20 52 6f 6d 61 6e 00 00  00 20 0c 00 00 00 0c 00  | Roman... ......|

hexdump -C Contract.acp | head
00000000  25 50 44 46 2d 31 2e 33  0d 25 e2 e3 cf d3 0d 0a  |%PDF-1.3.%......|
00000010  31 20 30 20 6f 62 6a 0d  3c 3c 20 0d 2f 54 79 70  |1 0 obj.<< ./Typ|
00000020  65 20 2f 43 61 74 61 6c  6f 67 20 0d 2f 50 61 67  |e /Catalog ./Pag|
00000030  65 73 20 32 20 30 20 52  20 0d 2f 53 74 72 75 63  |es 2 0 R ./Struc|
00000040  74 54 72 65 65 52 6f 6f  74 20 34 20 30 20 52 20  |tTreeRoot 4 0 R |
00000050  0d 2f 43 41 50 54 5f 49  6e 66 6f 20 3c 3c 20 2f  |./CAPT_Info << /|
00000060  56 20 33 30 31 20 2f 46  53 20 5b 20 28 57 69 6e  |V 301 /FS [ (Win|
00000070  67 64 69 6e 67 73 29 28  41 72 69 61 6c 29 28 43  |gdings)(Arial)(C|
00000080  6f 75 72 69 65 72 20 4e  65 77 29 28 54 69 6d 65  |ourier New)(Time|
00000090  73 20 4e 65 77 20 52 6f  6d 61 6e 29 5d 20 2f 4c  |s New Roman)] /L|

The ACD has some of the same hex values as the previous version, but with some extra bytes at the beginning and it looks like the ACP is a straight up PDF. But may have some interesting tags, like “CAPT_info”.

The problem we will face when trying to write a signature for this version of ACD is the container signature needs a static file name to reference, and it appears the name of the container is also the name of the ACD file within the container. So every file will be different. I wish there was a way in the PRONOM signature syntax to reference an extension and ignore the filename, but currently there no method to do this. The only thing inside the container which seems to be consistent is the file “FILES.LST”. So lets take a peek inside if it.

hexdump -C FILES.LST | head
00000000  5b 41 43 44 31 5d 0d 0a  49 53 43 4f 4d 50 4f 53  |[ACD1]..ISCOMPOS|
00000010  49 54 45 3d 54 52 55 45  0d 0a 4e 55 4d 46 49 4c  |ITE=TRUE..NUMFIL|
00000020  45 53 3d 31 0d 0a 46 49  4c 45 4e 41 4d 45 31 3d  |ES=1..FILENAME1=|
00000030  43 6f 6e 74 72 61 63 74  2e 61 63 70 0d 0a        |Contract.acp..|

Ok, there seems to be some static information that is unique to the ACD format. I bet the string “[ACD1]” would be sufficient enough to make a solid signature.

This is a good format example of a limited amount of information on the file format used by a well known company which has become obsolete and disappeared. Take a look at my signatures, maybe you have some old ACD files you were unaware of!

Multiplan

This is a follow up post to the post “EARLY MICROSOFT EXCEL” earlier this year.

I have to admit, often when I am researching file formats I can get distracted by a shinier format I come across. I often go down rabbit holes and forget the reason I started down the path I am on. I try and focus on the current needs in my life as a Digital Preservation Manager, but can get easily sidetracked. I always look forward to November every year so I can celebrate World Digital Preservation Day which sometimes comes along with a PRONOM research week. This gives me a chance to look at formats that may need attention which are not normally on my radar.

This week I a taking a look again at Multiplan. There is a PRONOM PUID for version 4, but does not have a description nor does it have a binary signature. It is was also lacking a File Format Wiki entry. So I decided to dive in. I had already bumped into the format while doing some research on early Microsoft Excel formats. This includes the SYLK format which needed a little update.

Microsoft Multiplan was the parent of Microsoft Excel. Multiplan was built for many different types of computers in the 1980’s, but was never ported to Windows. So to use Multiplan you have to be comfortable with using DOS. If you want to take Multiplan for a spin, head over to PCjs Machines and load up one of the many emulated systems they have.

In the end, Multiplan had four versions, but the last one, version 4.2, had some big changes, especially to the file format. More on that in a minute.

Mutiplan Version 1 – DOS

hexdump -C MP1.MOD  | head
00000000  08 e7 00 00 58 09 01 00  08 00 01 00 00 00 0a 00  |....X...........|
00000010  40 00 00 00 2e f5 0a 80  27 07 94 00 12 00 01 00  |@.......'.......|
00000020  0a 00 01 00 0c 0a 08 00  27 00 0d 80 04 00 01 00  |........'.......|
00000030  54 00 00 00 27 00 10 00  54 52 41 4e 53 46 48 f5  |T...'...TRANSFH.|
00000040  00 80 84 0a 68 61 52 f5  58 f5 5a f5 4e f5 0c 0a  |....haR.X.Z.N...|
00000050  12 00 01 00 72 f5 72 f5  0a 80 4b 0b 0f 00 12 00  |....r.r...K.....|
00000060  0c 0a 01 00 0c 00 01 00  08 00 20 4e 40 00 09 00  |.......... N@...|
00000070  8a f5 0a 80 4a 07 30 00  48 00 01 00 20 4e 00 00  |....J.0.H... N..|
00000080  28 0c 18 00 04 00 0d 80  03 00 28 0c 04 00 00 00  |(.........(.....|
00000090  26 00 00 00 54 52 d0 01  00 00 a4 f5 0a 00 62 0b  |&...TR........b.|

Mutiplan Version 1 – Macintosh

hexdump -C Multiplan1  | head
00000000  11 ab 00 00 13 e8 00 00  00 00 00 00 00 02 02 8c  |................|
00000010  00 18 00 0e 02 a4 02 b2  00 0e 02 fe 00 03 00 0e  |................|
00000020  00 bd 01 e3 2f 0f 00 08  15 5e 19 d1 03 5e 19 dd  |..../....^...^..|
00000030  61 60 60 5e 16 90 00 67  60 60 60 8f 5f 03 e8 7a  |a``^...g```._..z|
00000040  30 61 60 60 13 5f 03 e8  7b 90 00 67 60 60 60 8f  |0a``._..{..g```.|
00000050  16 85 67 60 60 60 8f 16  6d 85 61 60 60 13 5e 10  |..g```..m.a``.^.|
00000060  7b 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 6e  |{..g```..z1.j..n|
00000070  85 14 77 60 16 6f 85 67  60 60 60 90 00 67 60 60  |..w`.o.g```..g``|
00000080  60 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 70  |`..g```..z1.j..p|
00000090  85 14 77 60 16 71 85 67  60 60 60 90 00 67 60 60  |..w`.q.g```..g``|

Mutiplan Version 2 – DOS

hexdump -C MP2.MOD | head
00000000  0c ec 00 00 08 ab 08 00  1f 00 1a 00 03 00 27 03  |..............'.|
00000010  4b 05 00 00 00 00 00 00  00 00 00 1d c8 14 03 00  |K...............|
00000020  00 00 2f 00 9a 2e b3 fc  46 02 34 04 f3 16 00 00  |../.....F.4.....|
00000030  00 00 00 00 08 00 10 22  00 00 0d 06 84 1d 08 1d  |......."........|
00000040  ff 03 83 0a c8 18 48 1a  02 19 00 00 00 00 15 1b  |......H.........|
00000050  98 15 85 15 03 00 2a 00  00 37 46 32 1c 00 18 00  |......*..7F2....|
00000060  00 00 01 00 01 00 a9 03  0f 80 e8 14 00 00 01 00  |................|
00000070  6a 1c 00 00 01 00 0d 00  0f 80 0a 15 00 00 77 20  |j.............w |
00000080  00 00 01 00 6e 61 6c 20  00 00 2a 00 00 00 04 00  |....nal ..*.....|
00000090  00 00 0d 00 14 19 00 00  d4 06 0e 80 24 15 00 00  |............$...|

The DOS files for Version 2 begin with 0CEC0000 08AB0800, but a file for the Xenix system starts with 0AEC0000 08AB0800. So it appears the first byte may be different depending on the system.

hexdump -C MP3.MOD | head                         
00000000  0c ed 00 00 08 ab 08 00  1f 00 1a 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000110  00 00 02 00 01 00 00 00  00 00 ff 0f ff 00 00 00  |................|
00000120  00 00 05 00 06 00 46 00  36 00 42 00 00 00 00 00  |......F.6.B.....|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 01 00 00  |................|
00000150  00 fe 0f 00 fe 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The DOS files for Version 3 begin with a similar hex pattern, 0CED0000 08AB0800. This would make sense as the documentation for Multiplan 4.2 states it supports opening of Version 2 & 3, but not Version 1.

There was also a companion product that went along with Multiplan, it was called Microsoft Chart. Here is a file from version 3:

hexdump -C EXAMPLE1.MC | head
00000000  90 01 00 00 08 ab 00 00  00 00 00 00 00 00 04 00  |................|
00000010  80 00 05 00 04 00 43 10  00 00 00 00 00 00 00 00  |......C.........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  e8 ff 04 00 00 22 24 36  a4 1f 00 00 11 00 24 00  |....."$6......$.|
00000090  03 00 64 00 00 00 cc 0c  cc 0c cc 0c cc 0c 00 00  |..d.............|
000000a0  00 00 00 00 ff 7f 00 00  01 f0 00 00 00 5f 00 00  |............._..|
000000b0  00 00 a2 ff a0 ff 00 00  01 f0 01 00 64 00 00 00  |............d...|
000000c0  00 00 01 f0 00 00 01 70  00 00 8e ff 8c ff 00 ff  |.......p........|
000000d0  0d 00 00 00 20 14 01 00  00 00 00 00 00 00 00 00  |.... ...........|

The Chart file format has a similar byte pattern with the 08AB pattern and looks similar to the BIFF format. We will have to make sure it doesn’t conflict with any signatures so it can be identified separately.

Version 4 of Multiplan was the first to use the BIFF (Binary Interchange File Format). Technically Version BIFF2, not much is know about BIFF1 or if it ever existed. BIFF2 is the exact same format as Excel 2.0 used, so there will be some problems if we want to identify them separately. They currently identify as fmt/55.

hexdump -C MP4.MOD | head
00000000  09 00 04 00 40 01 10 00  42 00 02 00 b5 01 66 00  |....@...B.....f.|
00000010  1b 00 00 00 00 00 00 00  00 00 ff ff 0f 01 00 01  |................|
00000020  00 01 00 01 00 00 00 00  00 00 00 00 00 0d 00 02  |................|
00000030  00 01 00 0e 00 02 00 01  00 0f 00 02 00 00 00 11  |................|
00000040  00 02 00 00 00 2a 00 02  00 00 00 6b 00 13 00 01  |.....*.....k....|
00000050  00 00 00 00 00 fe 0f 00  fe 40 02 e0 3d d0 2f 00  |.........@..=./.|
00000060  01 00 26 00 08 00 00 00  00 00 00 00 e0 3f 27 00  |..&..........?'.|
00000070  08 00 00 00 00 00 00 00  e0 3f 28 00 08 00 00 00  |.........?(.....|
00000080  00 00 00 00 f0 3f 29 00  08 00 00 00 00 00 00 00  |.....?).........|
00000090  f0 3f 70 00 0b 00 00 00  2e 00 02 04 f0 0a 00 f0  |.?p.............|

hexdump -C EXCEL2.XLS | head
00000000  09 00 04 00 02 00 10 00  0b 00 10 00 71 02 00 00  |............q...|
00000010  01 00 29 00 06 03 00 00  dc 0d 00 00 0c 00 02 00  |..).............|
00000020  64 00 0d 00 02 00 01 00  0e 00 02 00 01 00 0f 00  |d...............|
00000030  02 00 01 00 10 00 08 00  fc a9 f1 d2 4d 62 50 3f  |............MbP?|
00000040  11 00 02 00 00 00 22 00  02 00 00 00 40 00 02 00  |......".....@...|
00000050  00 00 2a 00 02 00 00 00  2b 00 02 00 00 00 25 00  |..*.....+.....%.|
00000060  02 00 2c 01 31 00 09 00  c8 00 00 00 04 48 65 6c  |..,.1........Hel|
00000070  76 32 00 0e 00 00 00 00  00 00 00 90 01 00 00 00  |v2..............|
00000080  00 00 8d 31 00 09 00 c8  00 01 00 04 48 65 6c 76  |...1........Helv|
00000090  32 00 0e 00 00 00 00 00  00 00 bc 02 00 00 00 00  |2...............|

You can see in the hex values above a difference of two bytes in the header. The reason the Multiplan file identifies as an Excel 2 file is the PRONOM signature ignores those two bytes and allows them to be anything. Some specifications say these aren’t used, but clearly there is a use for them. We could probably use the same signature for Multiplan, but include the two bytes, then set the priority to the Multiplan signature.

Multiplan 4.2 is very different.

hexdump -C MP42.MOD | head
00000000  0c ef 4d 50 a4 01 00 00  00 00 00 00 00 00 00 00  |..MP............|
00000010  00 00 00 00 00 00 80 02  00 00 00 00 00 00 00 2e  |................|
00000020  ff 0f ff 00 01 00 d0 02  d0 02 a0 05 a0 05 d0 2f  |.............../|
00000030  e0 3d 40 02 09 00 03 00  02 04 0a 00 00 00 fe 0f  |.=@.............|
00000040  00 fe 00 00 01 00 01 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  01 00 00 00 06 01 15 50  |...............P|
00000060  05 00 00 00 00 00 00 00  06 00 13 00 07 00 07 00  |................|
00000070  00 00 00 00 00 00 08 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The hex values for the first 4 bytes have a similar pattern. 0CEF, Which seems to be in sequence where Version 3 left off. Microsoft calls this new format, New or Normal Binary File Format. They claim it is “the fastest loading and fastest saving file format ever“! Exciting as the new format probably was, it didn’t last long. Multiplan was phased out so Excel could shine.

When I was younger I didn’t use DOS very often because the computer my father brought home in the mid 1980’s was a Macintosh. I use DOS more now in my research then I did when I was younger. Using the DOS interface is not easy. There are a lot of key commands you need to know intuitively just to navigate, but it is fascinating to see how far software has come. Early Excel, Multiplan, and Chart were all intertwined, but hopefully combing through all of these samples can bring some clarity. Take a look at the draft signature I made and all the samples that go with it on my GitHub page.

Composite File Management System

In honor of World Digital Preservation Day, I wanted to write a little about format headers, the magic that makes some files more easily identifiable than others.

When it comes to binary file formats, some developers decide to make the format clearly identifiable in a header and others choose to make it ambiguous. Others have a little fun with leaving little clues and references to popular culture.

A couple of my favorites based on their header.

A couple of my current least favorites:

Like I said some developers make it very obvious what software created the file format and others seem to make things difficult. I understand there is a need to optimize files to keep them from getting bloated and taking up too much space, but many of the size limits from the early days of computing are not an issue anymore. Can’t we be more clear when designing a file format?

Today I want to document one format which was very easy to identify as it spelled out its format very verbosely, but because of the lack of additional documentation makes it very hard to preserve.

Meet the Composite File Management System file format:

hexdump -C sample.br4
00000000  43 43 6d 46 20 2d 20 55  6e 69 76 65 72 73 61 6c  |CCmF - Universal|
00000010  20 2d 20 41 78 69 6f 6d  20 2d 20 41 47 50 20 2d  | - Axiom - AGP -|
00000020  20 43 6f 6d 70 6f 73 69  74 65 20 46 69 6c 65 20  | Composite File |
00000030  4d 61 6e 61 67 65 6d 65  6e 74 20 53 79 73 74 65  |Management Syste|
00000040  6d 20 28 55 6e 69 76 65  72 73 61 6c 29 20 2d 20  |m (Universal) - |
00000050  43 72 65 61 74 65 64 20  62 79 20 41 6e 64 72 65  |Created by Andre|
00000060  61 20 50 65 73 73 69 6e  6f 2c 20 44 65 63 65 6d  |a Pessino, Decem|
00000070  62 65 72 20 31 39 39 35  20 28 76 65 72 73 2e 20  |ber 1995 (vers. |
00000080  35 29 20 2d 20 43 6f 70  79 72 69 67 68 74 28 63  |5) - Copyright(c|
00000090  29 20 31 39 39 35 2d 39  36 20 62 79 20 4d 65 74  |) 1995-96 by Met|
000000a0  61 54 6f 6f 6c 73 2c 20  49 6e 63 2e 20 2d 20 50  |aTools, Inc. - P|
000000b0  72 6f 75 64 6c 79 20 6d  61 64 65 20 69 6e 20 74  |roudly made in t|
000000c0  68 65 20 55 53 41 2c 20  6c 61 6e 64 20 6f 66 20  |he USA, land of |
000000d0  74 68 65 20 66 72 65 65  2c 20 68 6f 6d 65 20 6f  |the free, home o|
000000e0  66 20 74 68 65 20 62 72  61 76 65 2e 00 00 00 00  |f the brave.....|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Where to start? First off, this is the Bryce 4 file format. Bryce was a 3D modeling, animation software developed by MetaTools, later MetaCreations. Metacreations was also the developer of popular software Ray Dream Studio/Infini DFractal Design Painter, and Kai’s Power Tools.

Secondly, this format refers to a Universal File Management System or CCmF, which I have found to be the file format for many other extensions, some of which are .goo, .brc, .br3, .br4, .br5, .sfp, .shp, .obp. It doesn’t always have the verbose header, some of them have the following:

hexdump -C Tutorial.obp | head
00000000  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000050  20 20 20 20 20 20 20 20  20 20 20 20 20 20 43 43  |              CC|
00000060  6d 46 69 6c 65 3a 3a 6b  49 64 65 6e 74 69 66 79  |mFile::kIdentify|
00000070  34 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |4               |
00000080  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

Different, but still contains the CCmF identification string. Others have the verbose header, but further down inside the file.

With this format being used with so many well known software titles, I assumed information on the format would we readily available. Alas, not so much. The format even had the name of the creator! “Created by Andrea Pessino, December 1995”. So I reached out. He was on Twitter and I asked about the file format and if there was any documentation available. Twitter (X) has since deleted his responses after he deleted his account, but he told me he wasn’t sure where the documentation might be. One other developer also commented and confirmed they didn’t know where any of the documentation went after they left.

MetaCreations sold Bryce to Corel in 2000, then in 2004 sold it to Daz3D, the current owners. It’s not actively developed anymore being that it was never made into a 64bit application. A blog post explains the format a little more, but concludes it is a secret known only to Daz.

It seems there is a community who would like to see Bryce more open, maybe even open-sourced. This thread discusses the format and the underlying Axiom format used.

The creator Andrea Pessino was able to track down some documentation on the CCmF file structure for me. He explained Axiom was an entire codebase for all MetaTools/Creations applications and plugins. So the CCmF system was more than a file format. The documentation included some information on versioning of a CCmF.

There seems to be a few versions of the CCmF file structure.

  • CCmFile::kIdentify which corresponds with December 1995 (vers. 5)
  • CCmFile::kIdentify2 which corresponds with March 1997 (vers. 7)
  • CCmFile::kIdentify3 which corresponds with October 1998 (vers. 9)
  • CCmFile::kDfFormat which is a Generic Composite File

The documentation given to me was up to date for 1998, but after Corel purchased Bryce there was some updates made as many material files have the identifier “CCmFile::kIdentify4“.

Bryce 6 & 7 were released by Daz3D and have a different file header. They have the extension .BR6 & .BR7 with the header:

hexdump -C Bryce7-s01.br7 | head  
00000000  42 72 79 63 65 5f 36 2e  30 5f 46 69 6c 65 00 00  |Bryce_6.0_File..|
00000010  11 00 00 00 d4 07 00 00  00 20 00 00 e5 07 00 00  |......... ......|
00000020  00 0a 00 00 00 10 00 00  00 08 78 9c 63 64 60 60  |..........x.cd``|
00000030  60 04 e2 8c cc f4 0c 85  e4 9c fc d2 14 85 92 d4  |`...............|
00000040  8a 92 d2 a2 54 86 11 05  18 a1 18 04 82 76 c8 b5  |....T........v..|
00000050  be 0e 7c 60 8f 4e 93 67  f2 07 32 f5 d1 0e 30 31  |..|`.N.g..2...01|
00000060  40 fc ca 0c c5 60 bf 33  a2 ab da e2 8c c0 70 e0  |@....`.3......p.|
00000070  00 22 58 a0 9c ff 2a 40  fc bf 16 88 ff c3 c3 2e  |."X...*@........|
00000080  13 64 20 83 82 13 50 29  50 ad 17 50 ef 3c 20 ce  |.d ...P)P..P.< .|
00000090  72 66 64 86 19 31 cd 09  42 57 b9 80 71 43 9d 0b  |rfd..1..BW..qC..|

I still need to gather more samples from the various extensions related to this format and the software related to them. More work to do understanding the different uses of the short CCmFile string and the more detailed header and the differences between objects, materials, and models. When I asked Andrea why he used such a verbose file header, his answer was basically, why not!

Apple Mail

There really is no “Macintosh Format”, but there sure are a lot of formats you only find on the MacOS. From Resource Forks and iWork formats to unique sound formats, MacOS has them all! Majority of cross-platform software vendors have done a much better job in recent years in making their file formats the same across platforms, but for Apple, they love to make things unique, just for their platform.

Take EMLX for example. Seems to be a trend to add “X” to the end of an older format to breath new life into it. The EML format, or Electronic Mail, has existed for a few decades now, but in 2005 Apple updated their Apple Mail application to use a new format, EMLX.

As far as I know, Apple hasn’t released any documentation on the EMLX format, but many folks out there have asked the question and have been able to “reverse engineer” the format. Lets take a look.

An EMLX file consists of three parts:

  • bytecount on first line;
  • email content in MIME format (headers, body, attachments);
  • Apple property list (plist) with metadata.

The bytecount is a variable number which consists of the total bytes starting from the start of the MIME format, including HTML, to the start of the XML property list. Lets look at a simple EMLX.

The byte count is on line 1 with the MIME email (EML) taking up the 556 bytes, then the XML plist at the end. You may ask, what is a plist? Well, it is another Apple (originally NextStep) invention which is embedded throughout the MacOS operating system. A Plist is usually an XML with keys but can also be in a binary format. The Plist can contain properties of the email within Apple Mail like special color flags, tagged as junk, date received and last reviewed.

If you do happen across an EMLX file or group of them, there are a few tools you can use to convert them to a plain old EML. There are python libraries or many other tools to do the job.

But first we need to be sure of identification beyond the extension. Adding this file format to PRONOM would help in identification for preservation purposes. If ran through PRONOM today we get:

filename : '9.emlx'
filesize : 18582
modified : 2023-10-26T22:16:25-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/950'
    format  : 'MIME Email'
    version : '1.0'
    mime    : 'message/rfc822'
    class   : 'Text (Structured)'
    basis   : 'byte match at [[31 17] [599 4] [339 6] [426 6] [90 14]]'
    warning : 'extension mismatch'

Because the format has a EML plain text format within its structure, it is assumed to be an EML file. While technically accurate, Identifying as a unique EMLX format would be beneficial in a preservation system so you can properly assign risk and choose the right tool to parse or migrate.

In looking at the three parts of an EMLX format, we know the EML file is not a good way to show the difference as they are the same structure. The byte count on the first line is variable, so there is no static byte sequence to use for identification. That leaves the Plist section at the end to distinguish the difference.

The PRONOM entry for a Plist looks for the typical XML strings present in most XML files, but then uses the root element “<plist version=”1.0″>” for identification. We could combine the existing EML signature and the Plist signature to identify an EMLX, or just take the existing EML signature and put in a small byte sequence for the closing of the </plist> tag near the EOF? There would be a need for a priority over EML, both would essentially accomplish the same thing.

Take a look at latter idea on my GitHub page and tell me which makes the most sense.