UFO

August 2, 2024 by Thor Leave a comment

Researching file formats isn’t for everyone. Others may find it boring or even odd. Trying to explain to others the nuances of a binary format versus a container format would bring many tears. Their reactions sometimes are similar to hearing someone explain their belief in aliens. Passionate, but a bit on the crazy side.

from Imgflip Meme Generator

So with aliens and containers on my mind, let’s take a look at a format with the extension UFO. It is not an unidentified flying object or a UAP, it may as well be an unidentified file object, but in this case it is a “Ulead File for Objects” format. It is the exclusive file format for use with the PhotoImpact software from Ulead Systems, a Taiwanese developer known for many popular software programs. First released in 1996 with version 3, the PhotoImpact software was marketed as “a fully object-based tool, which pioneered a number of important innovations“.

The reason it was a considered a full object-based tool was the UFO format is based on the, at the time, popular OLE Compound File Storage object format developed by Microsoft. So by using some OLE tools we can take a closer look at some of these Unidentified File Objects……..

oleid Sample.ufo 
oleid 0.60.1 - http://decalage.info/oletools
THIS IS WORK IN PROGRESS - Check updates regularly!
Please report any issue at https://github.com/decalage2/oletools/issues

Filename: Sample.ufo
--------------------+--------------------+----------+--------------------------
Indicator           |Value               |Risk      |Description               
--------------------+--------------------+----------+--------------------------
File format         |Generic OLE file /  |info      |Unrecognized OLE file.    
                    |Compound File       |          |Root CLSID:  - None       
                    |(unknown format)    |          |                          
--------------------+--------------------+----------+--------------------------
Container format    |OLE                 |info      |Container type            
--------------------+--------------------+----------+--------------------------
Encrypted           |False               |none      |The file is not encrypted 
--------------------+--------------------+----------+--------------------------
VBA Macros          |No                  |none      |This file does not contain
                    |                    |          |VBA macros.               
--------------------+--------------------+----------+--------------------------
XLM Macros          |No                  |none      |This file does not contain
                    |                    |          |Excel 4/XLM macros.       
--------------------+--------------------+----------+--------------------------
External            |0                   |none      |External relationships    
Relationships       |                    |          |such as remote templates, 
                    |                    |          |remote OLE objects, etc   
--------------------+--------------------+----------+--------------------------

Well, it is a OLE file, but is unrecognized/unidentified by the oletools software. It also appears to be missing the root entry and CLSID you commonly find in OLE files. Since this is an OLE container we can also just use 7zip to peek inside as well.

Path = Sample.ufo
Type = Compound
Physical Size = 937984
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1999-05-25 03:33:05 D....                            OS-3
1999-05-25 03:33:04 D....                            OS-1
1999-05-25 03:33:03 D....                            OS-0
                    .....        31122        31232  OS-0/ObjectImage
                    .....         1316         1344  OS-0/ObjectData
                    .....       137996       138240  OS-0/PathStream
                    .....        19591        19968  OS-0/ObjectMask0
1999-05-25 03:33:05 D....                            OS-2
                    .....        43405        43520  OS-2/ObjectImage
                    .....         1316         1344  OS-2/ObjectData
                    .....       176204       176640  OS-2/PathStream
                    .....        25524        25600  OS-2/ObjectMask0
                    .....        41588        41984  OS-1/ObjectImage
                    .....         1316         1344  OS-1/ObjectData
                    .....       170132       170496  OS-1/PathStream
                    .....        25221        25600  OS-1/ObjectMask0
                    .....        34505        34816  LtfMainImage
                    .....          656          704  LtfHeader
1999-05-25 03:33:06 D....                            OS-4
                    .....        19249        19456  OS-4/ObjectImage
                    .....         1316         1344  OS-4/ObjectData
                    .....         4842         5120  LtfPreviewImage
                    .....         1160         1216  LtfObjectList
                    .....        31753        32256  OS-3/ObjectImage
                    .....         1316         1344  OS-3/ObjectData
                    .....       131892       132096  OS-3/PathStream
                    .....        19439        19456  OS-3/ObjectMask0
------------------- ----- ------------ ------------  ------------------------
1999-05-25 03:33:06             920859       925120  22 files, 5 folders

In this sample file, we have a bunch of directories and objects, but none of what we expect to see in an OLE file, such as a “SummaryInformation” or “DocumentSummaryInformation” like we would see in a Word DOC file. By not having the standard contents of the container, it makes these files very specific to PhotoImpact software.

Path = PhotoImpactX3-s01.ufo
Type = Compound
Physical Size = 5120
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....           20           64  HotspotStream
                    .....          656          704  LtfHeader
                    .....           20           64  SliceInfoStream
                    .....          412          448  LtfPreviewImage
                    .....          714          768  WebPropStream
                    .....           20           64  ManualHotspotScriptInfoStream
                    .....           20           64  ObjectHotspotScriptInfoStream
------------------- ----- ------------ ------------  ------------------------
                                  1862         2176  7 files

Here is another UFO file from the last version of the software PhotoImpact X3 when it was owned by Corel, but phased out in 2009. This is the basic file structure with no objects added to the file. We can be fairly confident these are the base files in most every other UFO file. It doesn’t have any of the “OS” folders which contain the objects, so I think the LtfHeader file might be our best bet for a signature. Let’s take a look at the Hex values for a few of them.

hexdump -C PhotoImpactX3-s01/LtfHeader| head
00000000  90 02 00 00 4c 54 46 00  58 02 00 00 02 00 ba dc  |....LTF.X.......|
00000010  ee 02 00 00 26 02 00 00  80 fc 0a 00 80 fc 0a 00  |....&...........|
00000020  00 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C Sample/LtfHeader| head
00000000  90 02 00 00 4c 54 46 00  90 01 00 00 02 00 f7 bf  |....LTF.........|
00000010  90 01 00 00 90 01 00 00  80 fc 0a 00 80 fc 0a 00  |................|
00000020  00 00 00 00 06 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C v3/ANIMALS/LtfHeader| head
00000000  90 02 00 00 4c 54 46 00  64 00 00 00 02 00 6e 00  |....LTF.d.....n.|
00000010  40 01 00 00 c8 00 00 00  80 fc 0a 00 80 fc 0a 00  |@...............|
00000020  00 00 00 00 11 00 00 00  01 00 00 00 01 00 00 00  |................|
00000030  01 00 00 00 60 00 00 00  3c 00 00 00 60 00 00 00  |....`...<...`...|

Making a signature using the first 8 bytes of the LtfHeader file appears to have worked for all the 3,400+ sample files I have collected. Problem is it also worked for another extension found in the some of the later versions of PhotoImpact.

When you have successfully finished your template, make sure to save it in the Ulead File For Photo Project format (*.UFP). This allows you to open and use your template in the Photo Projects dialog box. In the Template tab, click Open Project and browse for the created file.

They appear to be a template version for the format so we should be fine just adding the extension to the same signature.

Well, this Unidentified File Object is no longer unidentifiable. Was it sent by aliens? Possibly, but at least we know where these UFO’s came from, PhotoImpact. Take a look at the samples and proposed signature in my GitHub.

Also be sure to join us at this years iPres conference and attend our workshop on container signatures in PRONOM!

SDIF

May 17, 2024 by Thor Leave a comment

I have used and have researched a lot of audio editing software. Some are very simple and straightforward, others are feature rich and take some time to learn. While looking in a format, I came across some Audio software which nothing like I have used before. At first I was confused, I figured it would be simple to open a certain file format and play the audio. Not so fast.

Max is software which proudly says it is an, “infinitely flexible space to create your own interactive software”. Created by Cycling ’74 software, Max has been around for awhile, being developed in the mid 1980’s. It allows the user to make “patches” stringing around components and effects to accomplish an infinite amount of options and outcomes.

The software produces simple project files and patch files, but hey are just JSON data, at least in the latest version. But when working with audio files the software can save to a number of formats.

One of the options is a format called “SDIF”, which stands for “Sound Description Interchange Format“. SDIF was jointly developed by IRCAM and CNMAT, with proposals starting back in the mid-1990’s. Originally written as a Spectral Description, it was later changed to refer to a Sound Description.

The Specification states the general idea was to “store information related to signal processing and specifically of sound, in files, according to a common format to all data types. Thus, it is possible to store results or parameters of analyses, syntheses…” So not exactly the same as a simple WAVE file you can open and edit, this format was meant to store signal data for analysis.

Each SDIF file consists of a header and then an overall a succession of frames, not unlike chunks in the IFF/AIFF/RIFF formats, ordered in time. Each frame matrix declares a “Type” which can be a combination of many options. Lets take a look at a SDIF file:

hexdump -C test.sdif | head
00000000  53 44 49 46 00 00 00 08  00 00 00 03 00 00 00 01  |SDIF............|
00000010  31 54 52 43 00 00 00 20  00 00 00 00 00 00 00 00  |1TRC... ........|
00000020  00 00 00 01 00 00 00 01  31 54 52 43 00 00 00 04  |........1TRC....|
00000030  00 00 00 00 00 00 00 04  31 54 52 43 00 00 00 c0  |........1TRC....|
00000040  3f 74 7a e1 40 00 00 00  00 00 00 01 00 00 00 01  |?tz.@...........|
00000050  31 54 52 43 00 00 00 04  00 00 00 0a 00 00 00 04  |1TRC............|
00000060  3f 80 00 00 45 95 35 c3  00 00 00 00 00 00 00 00  |?...E.5.........|
00000070  40 00 00 00 46 06 e2 14  00 00 00 00 00 00 00 00  |@...F...........|
00000080  40 40 00 00 45 3b 42 3d  00 00 00 00 00 00 00 00  |@@..E;B=........|
00000090  40 80 00 00 43 5d 94 7b  00 00 00 00 00 00 00 00  |@...C].{........|

This test file has the opening frame “SDIF“, to identify it as an SDIF, then a reference to the type “1TRC“. I would try and explain a Matrix 1TRC Sinusoidal Track, but I have no idea what it means. Something, something sine wave, etc. Someone much smarter than me can make use of this format. Here are a couple examples of SDIF with other frame types.

hexdump -C angry_cat.part.sdif| head
00000000  53 44 49 46 00 00 00 08  00 00 00 03 00 00 00 01  |SDIF............|
00000010  31 4e 56 54 00 00 00 88  ff ef ff ff ff ff ff ff  |1NVT............|
00000020  ff ff ff fd 00 00 00 01  31 4e 56 54 00 00 03 01  |........1NVT....|
00000030  00 00 00 61 00 00 00 01  53 74 72 65 61 6d 49 44  |...a....StreamID|
00000040  09 30 0a 44 61 74 65 09  54 68 75 5f 41 75 67 5f  |.0.Date.Thu_Aug_|
00000050  5f 33 5f 32 31 2e 33 32  2e 34 35 5f 32 30 30 30  |_3_21.32.45_2000|
00000060  5f 0a 54 61 62 6c 65 4e  61 6d 65 09 53 69 6e 75  |_.TableName.Sinu|
00000070  73 6f 69 64 61 6c 54 72  61 63 6b 73 0a 57 72 69  |soidalTracks.Wri|
00000080  74 74 65 6e 42 79 09 50  6d 5f 56 65 72 73 69 6f  |ttenBy.Pm_Versio|
00000090  6e 5f 31 2e 32 2e 32 0a  00 00 00 00 00 00 00 00  |n_1.2.2.........|

hexdump -C cymbalum-c4.res.sdif| head 
00000000  53 44 49 46 00 00 00 08  00 00 00 03 00 00 00 01  |SDIF............|
00000010  31 52 45 53 00 00 0d 20  00 00 00 00 00 00 00 00  |1RES... ........|
00000020  00 00 00 04 00 00 00 01  31 52 45 53 00 00 00 04  |........1RES....|
00000030  00 00 00 d0 00 00 00 04  42 49 27 7a 39 59 fc ab  |........BI'z9Y..|
00000040  3d 35 06 c9 00 00 00 00  42 6e 68 68 39 63 99 b1  |=5......Bnhh9c..|
00000050  3e 25 f7 c0 00 00 00 00  42 c6 02 bb 39 8c 31 79  |>%......B...9.1y|
00000060  3f bb 7e 6e 00 00 00 00  43 01 82 96 3a 1d 36 44  |?.~n....C...:.6D|
00000070  3e d9 21 12 00 00 00 00  43 07 35 f0 3a 20 6f 6e  |>.!.....C.5.: on|
00000080  3f 02 32 7f 00 00 00 00  43 30 84 0b 39 97 f9 1b  |?.2.....C0..9...|
00000090  3e c6 43 c7 00 00 00 00  43 4d e4 e4 39 88 14 90  |>.C.....CM..9...|

Unfortunately, the common tools I use to explore AV formats don’t seem to work on this format. MediaInfo, FFProbe, Exiftool, all give me unknown file warnings. So I had to compile the SDIF software in order to get some details.

querysdif angry_cat.part.sdif 
Header info of file angry_cat.part.sdif:

Format version: 3
Types version:  1

Ascii chunks of file angry_cat.part.sdif:

1NVT
{
StreamID	0;
Date	Thu_Aug__3_21.32.45_2000_;
TableName	SinusoidalTracks;
WrittenBy	Pm_Version_1.2.2;
}

Data in file angry_cat.part.sdif (9504872 bytes):
 1933 1TRC frames in stream 0 between time 0.000000 and 5.794875 containing
    1933 1TRC matrices with  45 --400 rows,   4 --  4 columns

An interesting thing is that a SDIF file can be in text form as well.

sdiftotext test.sdif 
SDIF


SDFC

1TRC	1	1	0
  1TRC	0x0004	0	4

1TRC	1	1	0.005
  1TRC	0x0004	10	4
	1	4774.72	0	0
	2	8632.52	0	0
	3	2996.14	0	0
	4	221.58	0	0
	5	1943.02	0	0
	6	123.951	0	0
	7	6705.04	0	0
	8	4304.97	0	0
	9	3554.29	0	0
	10	23.7822	0	0

1TRC	1	1	0.01
  1TRC	0x0004	10	4
	1	4774.72	0.0353114	2.06098
	2	8632.52	0.00442518	0.68795
	3	2996.14	0.0238517	-1.42295
	4	221.58	0.0089712	-2.44141
	5	1943.02	0.00768914	2.64629
	6	123.951	0.0397061	-0.17527
	7	6705.04	0.0245643	-0.168753
	8	4304.97	0.00894803	1.45553
	9	3554.29	0.0265175	2.57231
	10	23.7822	0.0419019	-2.17731

1TRC	1	1	0.2
  1TRC	0x0004	10	4
	1	2284.56	0.02781	2.47054
	2	4222.62	0.0151738	1.55309
	3	31.1554	0.00421461	-0.657285
	4	310.99	0.0122306	1.25794
	5	215.192	0.0174093	1.25468
	6	6253.69	0.000894192	2.21334
	7	8533.32	0.0296167	2.07209
	8	8044.77	0.0423002	2.54088
	9	6087.45	0.0264733	-2.05523
	10	7052.7	0.0287347	0.426339

1TRC	1	1	0.205
  1TRC	0x0004	10	4
	1	2284.56	0	0
	2	4222.62	0	0
	3	31.1554	0	0
	4	310.99	0	0
	5	215.192	0	0
	6	6253.69	0	0
	7	8533.32	0	0
	8	8044.77	0	0
	9	6087.45	0	0
	10	7052.7	0	0

1TRC	1	1	0.21
  1TRC	0x0004	0	4

ENDC
ENDF

An interesting format for sure. But wait, there is more!

My initial interest in this format was when I was given access to a set of MUBU files. I was unclear on how there were created at first and it took me down a long path of learning about SDIF and the Max software from Cycling ’74 and IRCAM. MUBU turns out to be a toolbox for Max which adds more analysis features.

MUBU stands for MUlti-BUffer, which helps overcome some limitations. It is actually a container using the SDIF standard. Lets take a look.

hexdump -C test.mubu | head
00000000  53 44 49 46 00 00 00 08  00 00 00 03 00 00 00 01  |SDIF............|
00000010  31 4e 56 54 00 00 00 78  ff ef ff ff ff ff ff ff  |1NVT...x........|
00000020  ff ff ff fd 00 00 00 01  31 4e 56 54 00 00 03 01  |........1NVT....|
00000030  00 00 00 53 00 00 00 01  4d 75 42 75 2e 43 6f 6e  |...S....MuBu.Con|
00000040  74 61 69 6e 65 72 2e 4e  75 6d 54 72 61 63 6b 73  |tainer.NumTracks|
00000050  09 31 0a 4d 75 42 75 2e  43 6f 6e 74 61 69 6e 65  |.1.MuBu.Containe|
00000060  72 2e 56 65 72 73 69 6f  6e 09 31 2e 35 0a 4d 75  |r.Version.1.5.Mu|
00000070  42 75 2e 43 6f 6e 74 61  69 6e 65 72 2e 4e 75 6d  |Bu.Container.Num|
00000080  42 75 66 66 65 72 73 09  31 0a 00 00 00 00 00 00  |Buffers.1.......|
00000090  31 4e 56 54 00 00 00 38  ff ef ff ff ff ff ff ff  |1NVT...8........|

A MUBU file has the same SDIF frame header, but also include a “1NVT” frame, which is a Name Value Table. This is where the MUBU container is referenced. The MuBu file has its own structure:

If I query the MuBu file like I did the SDIF, I get the following:

querysdif test.mubu
Header info of file test.mubu:

Format version: 3
Types version:  1

Ascii chunks of file test.mubu:

1NVT
{
MuBu.Container.NumTracks	1;
MuBu.Container.Version	1.5;
MuBu.Container.NumBuffers	1;
}
1NVT
{
MuBu.Buffer.Index	0;
}
1NVT
{
MuBu.Track.MxRows	2;
AudioFile	1;
MuBu.Track.NonNumType	0;
MuBu.Track.MaxSize	93515;
meta_ISFT	Lavf60.16.100;
MuBu.Track.Name	mytrack;
MuBu.Track.BufferIndex	0;
MuBu.Track.SampleRate	48000;
FileName	Wilhelm_Scream.wav;
MuBu.Track.MxVarRows	0;
MuBu.Track.MxCols	1;
meta_MetaDataSource	WAV;
MuBu.Track.EndTime	1623.5;
FilePath	/;
MuBu.Track.SampleOffset	0;
MuBu.Track.TimeTags	0;
MuBu.Track.Size	77929;
MuBu.Track.Index	0;
}

1TYP
{
  1MTD	M000	{unnamed}
  1FTD	M000
	{
	  M000	Track-0-MatrixData;
	}
}

Data in file test.mubu (3741392 bytes):
77929 M000 frames in stream 0 between time 0.000000 and 1.623500 containing
   77929 M000 matrices with   2 --  2 rows,   1 --  1 columns

The MuBu file contains one audio track and one buffer. This is a simple test file, but MuBu files can be quite large with multiple tracks.

Working with the Max software or OpenMusic is not something I found to be easy to understand. I am sure if I was more musically inclined and with a little practice I could make some of this work. For the time being, a signature to identify a SDIF and MUBU will have to do. Check out the GitHub for my proposed signature and a couple examples.

PROmotion

May 3, 2024 by Thor 1 Comment

The 1990’s was an amazing time for multimedia. Compared to what is possible today, the graphics were more simple but there were many software titles leading the charge in Animation. Macromedia Director, along with Flash, dominated the interactive multimedia market for quite some time. Eventually being picked up by Adobe and discontinued in 2013. Quite a few multimedia disc’s out there were built using Director.

Competing with Director, another company had a strong product. Motion Works International was an early pioneer in the multimedia CD-ROM scene. Rumor has it, Motion Works was started by a 12 year old. Motion Works had been making software for use with the highly successful HyperCard software since 1988. In 1992 they released the successor to their ADDmotion software, a path based animation tool called PROmotion.

PROmotion was used with with many Multimedia titles, some in cooperation with the Corel Home series. In addition to commercial titles PROmotion was a great tool for the creation of animation clips and other marketing material. I came across some stand-alone marketing files for old scriptwriting software called ScriptWare. When I unarchived the HQX file and Installed the Demo, I was presented with a set of files with the .MW extension.

ls -l@
total 10232
-rw-r--r--@ 1 tyler  staff  1392 May  1 23:17 Read me first!
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	 452 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 begin_here.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	158901 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 characters.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	387029 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 cinovation.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	189509 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 cut paste.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	608405 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 formats.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	289698 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 modify formats.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	486730 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 notes.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	319250 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 overview.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	376854 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 scene shuffle.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	359746 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 script elements.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	279052 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 sw_menu.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	421836 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 title page.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	236614 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 transitions.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	471462 
-rw-r--r--@ 1 tyler  staff     0 May  1 23:17 try it.MW
	com.apple.FinderInfo	  32 
	com.apple.ResourceFork	622312 

getfileinfo sw_menu.MW 
file: "sw_menu.MW"
type: "APPL"
creator: "AMvw"

Looking at the files in the directory with their extended attributes I can see all the .MW files have no data fork (0 bytes), only a resource fork. This is common for any Application on the MacOS systems prior to MacOS X. At first the MW extension made me thing of MacWrite, but launching one of these MW files brought up an interactive menu. The type being APPL, which is Application.

What I thought would be a demo of the application Scriptware was actually interactive animations demonstrating the software. By dumping the resource fork of one of the MW files I found some information which helped me know what software created these interactive demos.

derez Scriptware\ Demo\ folder/sw_menu.MW

data 'vers' (1) {
	$"0103 8000 0000 0531 2E30 2E33 2941 4D20"            /* ..?....1.0.3)AM  */
	$"5669 6577 6572 2031 2E30 2E33 0DA9 2031"            /* Viewer 1.0.3.? 1 */
	$"3939 3320 4D6F 7469 6F6E 2057 6F72 6B73"            /* 993 Motion Works */
	$"2049 6E74 6C2E"                                     /*  Intl. */
};

data 'vers' (2) {
	$"0103 8000 0000 0531 2E30 2E33 1E50 6C61"            /* ..?....1.0.3.Pla */
	$"7962 6163 6B20 6279 204D 6F74 696F 6E20"            /* yback by Motion  */
	$"576F 726B 7320 496E 746C 2E"                        /* Works Intl. */
};

data 'STR#' (1250, "ADDmotion HC strings") {
	$"000A 1641 4444 6D6F 7469 6F6E 5F65 7870"            /* ...ADDmotion_exp */
	$"6F72 745F 6672 616D 650E 4144 446D 6F74"            /* ort_frame.ADDmot */
	$"696F 6E5F 696E 666F 1141 4444 6D6F 7469"            /* ion_info.ADDmoti */
	$"6F6E 5F73 7573 7065 6E64 1041 4444 6D6F"            /* on_suspend.ADDmo */
	$"7469 6F6E 5F72 6573 756D 650E 4144 446D"            /* tion_resume.ADDm */
	$"6F74 696F 6E5F 7175 6974 0E41 4444 6D6F"            /* otion_quit.ADDmo */
	$"7469 6F6E 5F70 6C61 790E 4144 446D 6F74"            /* tion_play.ADDmot */
	$"696F 6E5F 7374 6F70 0F41 4444 6D6F 7469"            /* ion_stop.ADDmoti */
	$"6F6E 5F70 6175 7365 0000"                           /* on_pause.. */
};

Makes sense, MW stood for “Motion Works”. ADDmotion was another software title developed by Motion Works, most will remember it as an add-on for Hypercard for adding animation to stacks. These MW files are created using PROmotion and exporting them as a stand-alone animation which includes the “AM Viewer” built in. A regular PROmotion file, however, did not include a viewer and requires the software in order to open and run.

-rwx------@ 1 tyler  staff      0 Apr 25 15:51 Example Animation
	com.apple.FinderInfo	   32 
	com.apple.ResourceFork	495272

The PROmotion file format also is Resource Fork only, making them difficult to manage outside of a Macintosh.

getfileinfo Example\ Animation
file: "Example Animation"
type: "ADDm"
creator: "ADDm"

The files do have a Type/Creator code of “ADDm”, but with no data fork, identification through standard means is not possible. They also do not have the “vers” string to help identify them within the Resource Fork. Since standard methods of identification are impossible, I hope in the future there will be more tools available to read the Type/Creator codes while on the Mac, or in a disk image, or within a container and return back the Software which created the file and the file type.

The products from Motion Works where significantly cheaper than animation tools such as Director, but were still pretty powerful for its day. I was surprised when I found the company didn’t last much longer than 1998 before disappearing. There are probably many stories like PROmotion, coming onto the scene with new and exciting features before thought impossible only to die out as other tools dominate the market.

If you are interested in looking at the files yourself, here is a link to some original files, and the same files encoded in MacBinary.

MAGIX

March 22, 2024 by Thor 1 Comment

There are probably many reasons why a software developer might want to create a proprietary format to store their files in. The software may require special features that don’t fit into an existing format. I would hope a developer would try to use existing formats, or even better open formats, but for many reasons, which probably include profits, they choose to re-invent the wheel often.

MAGIX is a German company which started making software in 1994. In 2001 they developed their first video editing software which was called Movie Edit Pro. The software seems to be well received and is still in use today.

Like most video editing software, project files are used to store all the edits and links to video files. These are usually smaller text based, with many using XML as the project format. Not MAGIX, they decided to go with a different yet known format for their project files.

hexdump -C MAGIX15-s01.MVP | head
00000000  52 49 46 46 6c 37 01 00  53 45 4b 44 4d 56 50 48  |RIFFl7..SEKDMVPH|
00000010  08 00 00 00 00 00 00 00  00 00 00 00 4c 49 53 54  |............LIST|
00000020  0c 16 01 00 4d 56 50 4c  4c 49 53 54 00 16 01 00  |....MVPLLIST....|
00000030  56 49 50 4c 53 56 49 50  0c 07 00 00 00 dc 05 00  |VIPLSVIP........|
00000040  00 00 00 00 20 00 00 00  0c 00 00 00 80 bb 00 00  |.... ...........|
00000050  10 00 00 00 29 6b 55 e2  53 f8 3d 40 00 00 f0 42  |....)kU.S.=@...B|
00000060  01 00 00 00 bd 04 ef fe  00 00 01 00 06 00 08 00  |................|
00000070  00 00 01 00 06 00 08 00  00 00 01 00 3f 00 00 00  |............?...|
00000080  28 00 00 00 04 00 04 00  01 00 00 00 00 00 00 00  |(...............|
00000090  00 00 00 00 00 00 00 00  bd 8f 32 01 d0 02 00 00  |..........2.....|

Yes, they used the RIFF container format for their projects. Seems an odd choice, especially for video production although it is well suited for it. AVI is another video format which uses the RIFF container. The MVP project file uses the ID SEKD with the format MVPH. Earlier versions of Movie Edit Pro used a different extension.

hexdump -C MAGIXv11-s01.MVD | head
00000000  52 49 46 46 38 57 00 00  53 45 4b 44 53 56 49 50  |RIFF8W..SEKDSVIP|
00000010  70 00 00 00 00 dc 05 00  00 00 00 00 04 00 00 00  |p...............|
00000020  02 00 00 00 80 bb 00 00  10 00 00 00 8e 23 d6 e2  |.............#..|
00000030  53 f8 3d 40 00 00 f0 42  01 00 00 00 bd 04 ef fe  |S.=@...B........|
00000040  00 00 01 00 00 00 06 00  00 00 04 00 00 00 06 00  |................|
00000050  00 00 04 00 3f 00 00 00  28 00 00 00 04 00 04 00  |....?...(.......|
00000060  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  c8 1b 32 01 d0 02 00 00  e0 01 00 00 52 d7 da fb  |..2.........R...|
00000080  54 55 f5 3f 4c 49 53 54  04 00 00 00 70 68 79 73  |TU.?LIST....phys|
00000090  4c 49 53 54 d0 3d 00 00  74 72 6b 73 4c 49 53 54  |LIST.=..trksLIST|

The MVD format used on an earlier version of Movie Edit Pro is also a RIFF, and with the ID of SEKD, but has a format of SVIP.

RIFFpad can break down the chunks we see in an MVP file. Each of the LIST chunks has their own subchunks as well. I assume this his how the editing software stores each video/audio track references, etc. So I give it to MAGIX for at least using an understandable format to store their projects.

MAGIX has also used RIFF in many of its supporting formats. So far I have found mfx, afx, ifx, cfx, ctf, tfx, ufx, mmt, mmm, hdp, each having their own format:

hexdump -C 101_Loud.mfx | head
00000000  52 49 46 46 a8 6f 00 00  53 45 4b 44 4d 41 46 58  |RIFF.o..SEKDMAFX|
00000010  00 00 00 00 4c 49 53 54  94 6f 00 00 41 55 46 58  |....LIST.o..AUFX|
00000020  4c 49 53 54 88 6f 00 00  41 46 58 45 46 58 48 44  |LIST.o..AFXEFXHD|
00000030  20 00 00 00 00 00 25 0d  00 00 00 00 02 00 00 00  | .....%.........|
00000040  01 00 00 00 00 00 00 00  03 18 00 00 00 00 00 00  |................|
00000050  00 00 00 00 4c 49 53 54  54 6f 00 00 41 46 58 44  |....LISTTo..AFXD|
00000060  4c 49 53 54 50 6a 00 00  41 46 58 45 46 58 48 44  |LISTPj..AFXEFXHD|
00000070  20 00 00 00 00 00 25 0d  00 00 00 00 05 00 00 00  | .....%.........|
00000080  01 00 00 00 00 00 00 00  03 18 00 00 00 00 00 00  |................|
00000090  00 00 00 00 4c 49 53 54  1c 6a 00 00 41 46 58 44  |....LIST.j..AFXD|

Not sure the best way to manage all of these in terms of identification, as I am not sure what what is the purpose of each format. Maybe for now I’ll make a generic to catch them all as a MAGIX File.

Extension	ID	FORMAT
AFX	SEKD	SAFX
CFX	SEKD	SCFX
CTF	SEKD	SVIP
HDP	SEKD	SHDP
IFX	SEKD	SIFX
MFX	SEKD	MAFX
MMM	SEKD	SVIP
MMT	SEKD	SVIP
MVD	SEKD	SVIP
MVP	SEKD	MVPH
MXM	MXMD	mxmi
TFX	SEKD	STFX
UFX	SEKD	SVIP

But, when it comes to their proprietary MAGIX Video format, I think they may have pushed things a little too far. Meet the MXV format:

hexdump -C MAGIXv11-s01.mxv | head
00000000  4d 58 52 49 46 46 36 34  9a cb 2b 00 00 00 00 00  |MXRIFF64..+.....|
00000010  4d 58 4a 56 49 44 36 34  4d 58 4a 56 48 32 36 34  |MXJVID64MXJVH264|
00000020  70 00 00 00 00 00 00 00  70 00 00 00 03 00 00 00  |p.......p.......|
00000030  42 93 2b 00 00 00 00 00  f0 00 00 00 00 00 00 00  |B.+.............|
00000040  7b 2e 00 00 4b 00 00 00  01 00 00 00 00 00 00 00  |{...K...........|
00000050  8e 23 d6 e2 53 f8 3d 40  80 02 00 00 e0 01 00 00  |.#..S.=@........|
00000060  80 02 00 00 e0 01 00 00  04 00 00 00 43 15 00 00  |............C...|
00000070  f0 00 00 00 00 00 00 00  28 19 00 00 00 00 00 00  |........(.......|
00000080  55 55 55 55 55 55 f5 3f  00 00 00 00 00 00 00 00  |UUUUUU.?........|
00000090  7f dd 05 00 00 00 00 00  4d 58 4a 56 48 44 36 34  |........MXJVHD64|

I am not sure what I am looking at, is it a RIFF? Is it a RIFF variant like RF64? MAGIX claims the format is:

This is the MAGIX video format for quicker processing with MAGIX products. It offers very low loss of quality, but it cannot be played via conventional DVD players.
MAGIX Video Pro X6

A look around the internet doesn’t bring much up in reference to this format. Just my recent page on the format wiki. A search for MXRIFF64 bring up nothing. But a closer look at other strings within the MXV file reveal we are probably looking at some sort of MPEG format.

I was able to locate a project on GitHub which claims to be able to demux the MXV format. The software is written in GO and appears to indicate this format is chunked based and has most of the chunks figured out. So if you find yourself stuck with some MXV files and don’t want to use the latest from MAGIX, this might be the tool for you.

This demuxer also has an interesting file you can download. It is called a “GRAMMAR” file and can be loaded into hex viewers like Synalyze It! can show the parts of a file you load. Its a great way to explore a format!

None of these formats are found in PRONOM, project files are not usually kept in archives, but if would be good to know about the RIFF files if they do turn up. The video format is for sure something the archival world should know about. MediaInfo is currently not aware of this format, but seems like it might be an easy task.

As usual, you can see some samples and my proposal signatures on my GitHub.

Designer

March 15, 2024 by Thor Leave a comment

Micrografx / Corel Designer

Many software titles we have all used began life under a different brand or even title. Larger software companies gobble up smaller developers, some brands merge, and others change names for whatever reason. Adobe has bought many smaller companies over the years, sometimes developing the acquired software and other times burying the software to avoid competition. Pagemaker was bought to give InDesign life, many Macromedia titles were incorporated or shelved. Such is life in the software world.

In understanding a file format, often times you need to follow this trail backwards to understand when file formats changed and compatibility is dropped. Often times the formats remained the same, but the extension is changed. Or the software name changes and formats are updated, but the extension remains the same. There can also be multiple titles which all use a common format, further complicating the identification of the formats.

Let’s look closer at the a title which changed names and file formats a few times over the years. Micrografx was founded in 1982 and were pretty well known for their innovation in computer graphics. They have released many titles over the years, but one of the first was In*A*Vision graphic software for Windows 1.0 in 1986. This software used a format with the .PIC extension. A couple years later version 2, was renamed to Micrografx Designer and used the .DRW extension. This extension was also used by Micrografx Draw, another similar program.

Micrografx Designer continued to be released until version 9 which is when it was purchased by Corel who continued to release new versions, although it is said the software was just a variation of CorelDraw, and now Designer is part of the CorelDraw Technical Suite. Other Micrografx software such as Picture Publisher was discontinued and customers were encouraged to use Corel’s PaintShop Pro instead. Somewhere in the middle of all this, Micrografx spun off a separate business unit called iGrafx, which Designer was marketed under for a short time.

Let’s break down the names, extensions used, and format type.

In*A*Vision & Draw, binary format, PIC extension
Micrografx Designer & Draw, binary format, DRW extension
Micrografx Designer version 4, RIFF format, DS4 & MGX extension
Micrografx Designer versions 6-9, OLE Container format, DSF extension
Micrografx/Corel Designer versions 10-12, RIFF format, DES extension
Corel Designer version X4-Current, ZIP/XML format, DES extension

According to the 2021 Corel DesignerUser Guide:

Corel DESIGNER (DES, DSF, DS4, or DRW)

You can import Corel DESIGNER files. Files from version 10 and later have the filename extension .des. Files from Micrografx versions 6 to 9 have the filename extension .dsf. Version 4 files have the filename extension .ds4. The .drw filename extension is used for a Micrografx 2.x or 3.x file. Micrografx template files (DST) are also supported.

The PRONOM registry has a few of these formats with signatures and documented, but not all, let’s see where the gaps are.

PUID	Format Name	Format Version	Extension
x-fmt/151	Micrografx Designer		dsf
x-fmt/296	Micrografx Designer	3.1	drw
x-fmt/47	Micrografx Draw	1-2	drw
x-fmt/294	Micrografx Draw	3	drw
x-fmt/295	Micrografx Draw	4	drw, drt
fmt/1907	Micrografx Icon File		icn
fmt/1481	Micrografx In-A-Vision Drawing		pic

So from the PRONOM list, it appears we have good identification on the original PIC and DRW formats. Then the Designer DSF OLE container is taken care of as well. That leaves us with DS4 and DES formats.

hexdump -C DS41-S01.DS4 | head
00000000  52 49 46 46 6e 07 00 00  4d 47 58 20 69 74 70 64  |RIFFn...MGX itpd|
00000010  04 00 00 00 00 02 00 80  70 72 6f 70 23 00 00 00  |........prop#...|
00000020  1f 00 00 30 02 00 00 00  08 00 2c 40 44 00 11 20  |...0......,@D.. |
00000030  20 00 01 10 80 e0 00 00  91 08 21 e0 5c 82 90 72  | .........!.\..r|
00000040  05 ff c0 00 4c 49 53 54  10 04 00 00 64 69 74 6e  |....LIST....ditn|
00000050  74 68 6e 6c 03 04 00 00  57 01 00 30 00 00 08 00  |thnl....W..0....|
00000060  08 00 00 41 04 00 01 20  a4 00 82 10 72 14 40 48  |...A... ....r.@H|
00000070  00 58 20 84 04 32 10 40  00 12 c8 98 18 22 63 90  |.X ..2.@....."c.|
00000080  2b 91 32 36 47 08 20 c0  23 e4 80 90 92 22 46 49  |+.26G. .#...."FI|
00000090  09 29 26 24 e4 a0 94 92  a2 56 4b 09 69 2e 25 e4  |.)&$.....VK.i.%.|

Micrografx Designer 4 apparently uses the RIFF container format. The RIFF format is used with many different types of formats. The most common is the WAV format. CorelDRAW also uses the RIFF format so it makes sense they would use it as they took over from Micrografx.

Each RIFF format has a four byte identifier type after the first eight bytes which identify the RIFF. The DS4 file uses the code “MGX ” to identify itself. Which also appears to be used with their clipart format, MGX. We can use the same identification method we use for other RIFF’s to identify this format.

hexdump -C Corel-DES10Sample.des | head
00000000  52 49 46 46 8a 57 00 00  44 45 53 41 76 72 73 6e  |RIFF.W..DESAvrsn|
00000010  02 00 00 00 7e 04 4c 49  53 54 54 0c 00 00 69 63  |....~.LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Starting with version 10 of Corel Designer, the RIFF format is used again and has a different type. With Version 10 using “DESA”, then for version 10.5:

hexdump -C Corel-DES10.5Sample.des | head 
00000000  52 49 46 46 cc 57 00 00  44 45 53 42 76 72 73 6e  |RIFF.W..DESBvrsn|
00000010  02 00 00 00 b0 04 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The next version after 10.5 is version 12 and it shows a type:

hexdump -C Corel-DES12-Sample.des | head 
00000000  52 49 46 46 ce 57 00 00  44 45 53 43 76 72 73 6e  |RIFF.W..DESCvrsn|
00000010  02 00 00 00 e2 04 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

After version 12, Corel started using numbering consistent with their other products. The first being X4.

hexdump -C Corel-DESX4-Sample.des | head
00000000  50 4b 03 04 14 00 00 08  00 00 f8 bb c9 4e c3 4b  |PK...........N.K|
00000010  9c d1 2d 00 00 00 2d 00  00 00 08 00 00 00 6d 69  |..-...-.......mi|
00000020  6d 65 74 79 70 65 61 70  70 6c 69 63 61 74 69 6f  |metypeapplicatio|
00000030  6e 2f 78 2d 76 6e 64 2e  63 6f 72 65 6c 2e 64 65  |n/x-vnd.corel.de|
00000040  73 69 67 6e 65 72 2e 64  6f 63 75 6d 65 6e 74 2b  |signer.document+|
00000050  7a 69 70 50 4b 03 04 14  00 00 08 00 00 f8 bb c9  |zipPK...........|
00000060  4e 6f 38 b6 64 98 13 00  00 98 13 00 00 14 00 00  |No8.d...........|
00000070  00 63 6f 6e 74 65 6e 74  2f 72 69 66 66 44 61 74  |.content/riffDat|
00000080  61 2e 63 64 72 52 49 46  46 90 13 00 00 44 45 53  |a.cdrRIFF....DES|
00000090  45 76 72 73 6e 02 00 00  00 82 05 4c 49 53 54 54  |Evrsn......LISTT|

Well it looks like things changed, starting with X4 the format changed to a ZIP container. Let’s take a peak inside.

Path = Corel-DESX4-Sample.des
Type = zip
Physical Size = 8714

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:47 .....           45           45  mimetype
2019-06-09 22:31:47 .....         5016         5016  content/riffData.cdr
2019-06-09 22:31:47 .....       196662          239  metadata/thumbnails/thumbnail.bmp
2019-06-09 22:31:47 .....       151606          698  metadata/thumbnails/page1.bmp
2019-06-09 22:31:47 .....          596          259  metadata/textinfo.xml
2019-06-09 22:31:47 .....         4977         1314  metadata/metadata.xml
2019-06-09 22:31:47 .....           53           55  links.xml
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:47             358955         7626  7 files

Looks like the container holds a RIFF inside along with some thumbnails, metadata, and other things. The mimetype file simple holds “application/x-vnd.corel.designer.document+zip”. The riffData.cdr however looks like this:

hexdump -C Corel-DESX4-Sample/content/riffData.cdr | head
00000000  52 49 46 46 90 13 00 00  44 45 53 45 76 72 73 6e  |RIFF....DESEvrsn|
00000010  02 00 00 00 82 05 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Another RIFF, and seems to be in the same sequence, but going from version 12 to X4 we seemed to have skipped “DESD”. Maybe there was a developer version in between as they transitioned. Version X5 looks similar and has the RIFF sequence “DESF”. When we get to X6 the structure changes.

Path = Corel-DESX6-Sample.des
Type = zip
Physical Size = 8568

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:21 .....           45           45  mimetype
2019-06-09 22:31:21 .....        12153         1098  content/data/data1.dat
2019-06-09 22:31:21 .....          439          224  content/data/masterPage.dat
2019-06-09 22:31:21 .....          613          265  content/data/page1.dat
2019-06-09 22:31:21 .....           34           28  content/dataFileList.dat
2019-06-09 22:31:21 .....          960          279  content/root.dat
2019-06-09 22:31:21 .....       196662          239  metadata/thumbnails/thumbnail.bmp
2019-06-09 22:31:21 .....       151606          698  metadata/thumbnails/page1.bmp
2019-06-09 22:31:21 .....          427          208  color/color.xml
2019-06-09 22:31:21 .....          596          259  metadata/textinfo.xml
2019-06-09 22:31:21 .....          103          100  color/docPalette.xml
2019-06-09 22:31:21 .....        14920         1444  styles/document.cdss
2019-06-09 22:31:21 .....         5500         1462  metadata/metadata.xml
2019-06-09 22:31:21 .....           53           55  links.xml
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:21             384111         6404  14 files

The mimetype remains the same, but we see additional files within the structure. Also the riffData.cdr file is missing. Looking at each file we can see the root.dat file is a RIFF and follows the same sequence.

hexdump -C Corel-DESX6-Sample/content/root.dat | head
00000000  52 49 46 46 b8 03 00 00  44 45 53 47 66 76 65 72  |RIFF....DESGfver|
00000010  10 00 00 00 ff ff ff ff  08 00 00 00 5e 06 02 00  |............^...|
00000020  00 00 10 00 76 72 73 6e  10 00 00 00 ff ff ff ff  |....vrsn........|
00000030  02 00 00 00 5e 06 00 00  00 00 00 00 4c 49 53 54  |....^.......LIST|
00000040  7c 00 00 00 64 6f 63 20  6d 63 66 67 10 00 00 00  ||...doc mcfg....|
00000050  00 00 00 00 83 20 00 00  00 00 00 00 00 00 00 00  |..... ..........|
00000060  70 72 65 66 10 00 00 00  00 00 00 00 e6 0e 00 00  |pref............|
00000070  83 20 00 00 00 00 00 00  70 74 72 74 10 00 00 00  |. ......ptrt....|
00000080  00 00 00 00 10 00 00 00  69 2f 00 00 00 00 00 00  |........i/......|
00000090  4c 49 53 54 04 00 00 00  66 69 6c 74 4c 49 53 54  |LIST....filtLIST|

As we get to a more recent version. We can see the pattern continues.

hexdump -C Designer2022-s01/content/root.dat | head
00000000  52 49 46 46 88 06 00 00  44 45 53 4e 66 76 65 72  |RIFF....DESNfver|
00000010  10 00 00 00 ff ff ff ff  08 00 00 00 60 09 02 00  |............`...|
00000020  00 00 18 00 76 72 73 6e  10 00 00 00 ff ff ff ff  |....vrsn........|
00000030  02 00 00 00 60 09 00 00  00 00 00 00 4c 49 53 54  |....`.......LIST|
00000040  30 01 00 00 64 6f 63 20  6d 63 66 67 10 00 00 00  |0...doc mcfg....|
00000050  00 00 00 00 08 1f 00 00  00 00 00 00 00 00 00 00  |................|
00000060  70 72 65 66 10 00 00 00  00 00 00 00 ae 07 00 00  |pref............|
00000070  08 1f 00 00 00 00 00 00  70 74 72 74 10 00 00 00  |........ptrt....|
00000080  00 00 00 00 10 00 00 00  b6 26 00 00 00 00 00 00  |.........&......|
00000090  4c 49 53 54 4c 00 00 00  66 6e 74 74 66 6f 6e 74  |LISTL...fnttfont|

The last sample I have is for Corel Designer 2022, but there could be more. I created new signatures for all the samples I have, you can see them in my Github as usual. I decided to group some of the versions together to simplify things a bit, but if anyone thinks they should be broken out into individual versions, let me know.

Melco

March 1, 2024 by Thor Leave a comment

I came across another CD-ROM the other day with some fun embroidery formats. It includes the HUS format I recently posted on, plus a few more.

Like I mentioned before, this is a format genre which is not normally seen in the archival world, but is fun to take a peek into the world of embroidery formats. The HUS format from Husqvarna was a unique proprietary format, but looking at another in this set, we see a common container format.

filename : 'CH1604.ofm'
filesize : 25600
modified : 2002-04-29T05:58:26-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'

First, what is an OFM file? It is the native format for Melco branded embroidery machines. They have been around for a few years. Melco has been around since 1972, but i’m sure the format is much newer. The fact that it is in an OLE container would indicate it was created in the mid 1990’s.

Looking inside the OLE container:

Path = CH1604.ofm
Type = Compound
Physical Size = 25600
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....        19171        19456  EdsIV Object
                    .....         2502         2560  Design Icon
                    .....          130          192  Design Status
------------------- ----- ------------ ------------  ------------------------
                                 21803        22208  3 files

The EdsIV Object seems specific. Looking back at the web archive it looks like EDS IV was software available for the Melco products. In a user manual there are three formats associated with the software:

.CND – Condensed Format
.EXP – Expanded Format
.OFM – Project (Layout format)

The EdsIV Object file is unique and will work well for identification. There also seems to be some common patterns within the file that can further the correct identification.

hexdump -C EdsIV Object | head
00000000  03 00 00 00 03 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000010  0b 00 0c 00 43 50 72 6a  44 65 66 61 75 6c 74 73  |....CPrjDefaults|
00000020  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 f0 3f  28 00 00 00 01 00 00 00  |.......?(.......|
00000040  7f 00 00 00 00 00 00 00  00 00 39 40 00 00 00 00  |..........9@....|
00000050  00 00 10 40 00 00 00 00  00 00 00 00 00 00 00 00  |...@............|
00000060  00 00 00 00 00 00 00 00  00 00 59 40 04 00 00 00  |..........Y@....|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 80 51 40  |..............Q@|
00000080  00 00 00 00 00 00 3e 40  00 00 00 00 00 00 2e 40  |......>@.......@|
00000090  00 00 00 00 00 80 56 40  00 00 00 00 00 80 51 40  |......V@......Q@|

The CND and EXP formats are a different matter. I ran Tridscan across all the CND samples and it could not detect one common pattern among them all.

python tridscan.py *.csd

TrIDScan/Py v2.02 - (C) 2015-2016 By M.Pontello

File(s) to scan found: 60
Scanning for patterns...
Checking file 1/60 './Cf0103.csd'
Checking file 2/60 './Cr0005.csd'
  Pattern(s) found: 11
Checking file 3/60 './Fd0106.csd'
tridscan.py: Error: no patterns found!

Being a condensed format, I gather it might have some compression which makes for a difficult binary file to identify.

The EXP format on the other hand has a short pattern at the beginning:

hexdump -C CF0103.EXP | head
00000000  80 02 00 00 80 02 18 e7  80 02 19 e6 80 02 19 e6  |................|
00000010  80 02 19 e7 80 02 19 e6  80 02 19 e6 80 02 19 e6  |................|
00000020  80 02 19 e7 80 02 19 e6  80 02 19 e6 80 02 18 e7  |................|
00000030  00 00 fc 00 04 00 fc 00  04 ff fc 01 ed 00 ec 00  |................|
00000040  21 21 df de da 01 15 14  15 15 15 15 eb eb eb eb  |!!..............|
00000050  eb eb da 00 17 17 17 17  17 18 17 17 ea e9 e9 e9  |................|
00000060  e9 e8 e9 e9 ed 00 ec 00  18 18 19 19 18 19 19 19  |................|
00000070  18 18 e8 e8 e8 e7 e7 e7  e8 e7 e8 e8 fa 01 20 00  |.............. .|
00000080  21 00 20 01 21 00 20 00  f8 1e f7 1e f7 1f f7 1e  |!. .!. .........|
00000090  da 00 e6 e5 e5 e5 e5 e4  e5 e5 1a 1b 1b 1b 1b 1c  |................|

Currently Melco distributes a different software for use with their embroidery machines. Their DesignShop software also works with the OFM format. Downloading a copy of version 11 and using the trial version I get access to a few OFM sample files. Let’s see if they are the same.

hexdump -C BUBBLEBOY1.ofm | head
00000000  52 49 46 46 86 e5 01 00  4f 46 4d 38 76 72 73 6e  |RIFF....OFM8vrsn|
00000010  08 00 00 00 39 00 2e 00  30 00 30 00 6e 6f 74 65  |....9...0.0.note|
00000020  a8 00 00 00 ff fe ff 52  44 00 69 00 67 00 69 00  |.......RD.i.g.i.|
00000030  74 00 69 00 7a 00 65 00  72 00 20 00 3a 00 20 00  |t.i.z.e.r. .:. .|
00000040  41 00 45 00 30 00 38 00  33 00 0d 00 0a 00 46 00  |A.E.0.8.3.....F.|
00000050  61 00 62 00 72 00 69 00  63 00 20 00 3a 00 20 00  |a.b.r.i.c. .:. .|
00000060  54 00 77 00 69 00 6c 00  6c 00 20 00 0d 00 0a 00  |T.w.i.l.l. .....|
00000070  4d 00 45 00 4c 00 43 00  4f 00 20 00 2d 00 20 00  |M.E.L.C.O. .-. .|
00000080  41 00 43 00 54 00 49 00  4f 00 4e 00 20 00 49 00  |A.C.T.I.O.N. .I.|
00000090  4c 00 4c 00 55 00 53 00  54 00 52 00 41 00 54 00  |L.L.U.S.T.R.A.T.|

Well that is very different than the earlier example. We can see right away this is a different type of file, in fact the first few bytes tells us this another container format. The Resource Interchange File Format, is used in many various file formats, the most popular are WAVE, AVI, and CorelDRAW. It is a chunk based format and there are a few tools we can use to look closer.

Riffpad can open the file, but claims there is some extra data at the end. It does see four chunks and it gives us the code “OFM8”, which is what identifies this particular RIFF type.

I was also able to get some samples of version 10 of DesignShop and found they are the same OLE container. Also has the same “EdsIV Object” within the container. There is a small paragraph in the EdsIV user manual that indicates there are some versioning within the OFM format.

If you open an EDS III .OFM file and save it, it will be converted into an EDS IV .OFM file, which is no longer readable in EDS III.
Files saved in this version of EDS IV cannot be read by previous versions of EDS IV.

This version of EDS IV is capable of producing two types of OFM files. Files saved as “Melco Project File (.ofm)” can only be read with this version or higher versions of EDS IV. Files saved as “Melco Version 2.00 (.ofm)” can be read by any EDS IV user that has version 2.00.006 or higher software.

It never ceases to amaze me how many formats use the Compound Object Container format. Seems like more and more are documented often. For now, I made a signature to identify the OLE and RIFF version of OFM. I’ll keep my eye out for the older EDS III and other related formats. As always, you can find my signatures and a sample file on my GitHub.

PowerBI

February 23, 2024 by Thor Leave a comment

I think when most of us have some data to sort or make sense of, we tend to gravitate toward a spreadsheet. Using Excel or LibreOffice, or if you really like to party, OpenRefine. There are plenty of meme’s out there representing the frustration people have with bugs, features and limitations of Excel specifically.

Optimist: The glass is ½ full.
Pessimist: The glass is ½ empty.
Excel: The glass is January 2nd.
— jxf@mastodon.social (@jxxf) May 7, 2022

There are more tools out there for making sense of data, one some people have access to is Microsoft’s more advanced PowerBI tool. Marketed as a Data Visualization tool it is accessible to many with a Office 365 subscription. It offers expanded features than excel and isn’t as limited in row maximums.

PowerBi was recently the topic of a Code4Lib editorial issue. The writer of an article for their journal posted two PowerBI datasets which a reader later noticed had private data. After some miscommunications and misunderstandings an open letter was drafted and received some support. Code4Lib did release a statement and lessons were learned.

One statement from the Code4Lib staff caught my eye. “The released files were in a proprietary file format, Microsoft Power BI, with which none of the editors have experience.”

We all use tools for our jobs we are most familiar or available to us. No one can be an expert in all file formats. Some us try, but things change so fast it is impossible. But, we can do more in documenting and making formats identifiable through the tools we use for digital preservation. The File Format Wiki and PRONOM ~~have~~ had no mention of Power BI, so let’s change that.

Microsoft Power BI was released in 2011 and has been part of the Microsoft Power Platform. Power BI can gather data from many sources. The software can be accessed in the Office 365 cloud, but also using a Desktop application. In the desktop application, all the data sources and connections are stored in a single file with the extension PBIX. But there are other related formats.

filename : 'PowerBI-Test.pbix'
filesize : 401951
modified : 2024-02-22T11:29:41-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    class   : 'Aggregate'
    basis   : 'byte match at [[0 4] [401867 3] [401929 4]]'
    warning : 'extension mismatch'

Path = PowerBI-Test.pbix
Type = zip
Physical Size = 401951

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40 .....            8           10  Version
2024-02-22 18:29:40 .....          488          230  [Content_Types].xml
2024-02-22 18:29:40 .....       397312       397312  DataModel
2024-02-22 18:29:40 .....         2848          882  Report/Layout
2024-02-22 18:29:40 .....          328          161  Settings
2024-02-22 18:29:40 .....          136          120  Connections
2024-02-22 18:29:40 .....        18972         1733  Report/StaticResources/SharedResources/BaseThemes/CY24SU02.json
2024-02-22 18:29:40 .....          358          357  SecurityBindings
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40             420450       400805  8 files

Just like many modern Microsoft formats it is a ZIP container with a mixture of XML and JSON. There is also a DataModel file along with Settings and Connections. A quick peek at some of the contents shows us:

hexdump -C PowerBI-Test/Version | head
00000000  31 00 2e 00 32 00 38 00                           |1...2.8.|

hexdump -C PowerBI-Test/DataModel | head
00000000  ff fe 53 00 54 00 52 00  45 00 41 00 4d 00 5f 00  |..S.T.R.E.A.M._.|
00000010  53 00 54 00 4f 00 52 00  41 00 47 00 45 00 5f 00  |S.T.O.R.A.G.E._.|
00000020  53 00 49 00 47 00 4e 00  41 00 54 00 55 00 52 00  |S.I.G.N.A.T.U.R.|
00000030  45 00 5f 00 29 00 21 00  40 00 23 00 24 00 25 00  |E._.).!.@.#.$.%.|
00000040  5e 00 26 00 2a 00 28 00  3c 00 42 00 61 00 63 00  |^.&.*.(.<.B.a.c.|
00000050  6b 00 75 00 70 00 4c 00  6f 00 67 00 3e 00 3c 00  |k.u.p.L.o.g.>.<.|
00000060  42 00 61 00 63 00 6b 00  75 00 70 00 52 00 65 00  |B.a.c.k.u.p.R.e.|
00000070  73 00 74 00 6f 00 72 00  65 00 53 00 79 00 6e 00  |s.t.o.r.e.S.y.n.|
00000080  63 00 56 00 65 00 72 00  73 00 69 00 6f 00 6e 00  |c.V.e.r.s.i.o.n.|
00000090  3e 00 31 00 34 00 30 00  3c 00 2f 00 42 00 61 00  |>.1.4.0.<./.B.a.|

hexdump -C PowerBI-Test/\[Content_Types\].xml | head
00000000  ef bb bf 3c 3f 78 6d 6c  20 76 65 72 73 69 6f 6e  |...<?xml version|
00000010  3d 22 31 2e 30 22 20 65  6e 63 6f 64 69 6e 67 3d  |="1.0" encoding=|
00000020  22 75 74 66 2d 38 22 3f  3e 3c 54 79 70 65 73 20  |"utf-8"?><Types |
00000030  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 73 63  |xmlns="http://sc|
00000040  68 65 6d 61 73 2e 6f 70  65 6e 78 6d 6c 66 6f 72  |hemas.openxmlfor|
00000050  6d 61 74 73 2e 6f 72 67  2f 70 61 63 6b 61 67 65  |mats.org/package|
00000060  2f 32 30 30 36 2f 63 6f  6e 74 65 6e 74 2d 74 79  |/2006/content-ty|
00000070  70 65 73 22 3e 3c 44 65  66 61 75 6c 74 20 45 78  |pes"><Default Ex|
00000080  74 65 6e 73 69 6f 6e 3d  22 6a 73 6f 6e 22 20 43  |tension="json" C|
00000090  6f 6e 74 65 6e 74 54 79  70 65 3d 22 22 20 2f 3e  |ontentType="" />|

So it looks like the ZIP structure follows the standard for OpenXML packages as it contains a “[Content_Types].xml” file. So using this XML alone would clash with too many other formats. From what I could find the “DataModel” file is what stores the data is more unique to this format, even though the name is pretty generic. Using a string within the file would probably help be more accurate. The “DataModel” file does have unicode double byte strings we can use. “STREAM_STORAGE_SIGNATURE” seems like a unique enough string to use, but it looks like it may not be unique to PBIX. Looks like the “DataModel” file is a Microsoft “MS-XLDM” file format and is a “Spreadsheet Data Model File Format“.

There is a variation to the DataModel file and I am not sure when the standard is used verses this variation, “This backup was created using XPress9 compression”. Not sure if it is versioning or how the file is saved, but they both seem to function correctly.

hexdump -C DataModel | head
00000000  54 00 68 00 69 00 73 00  20 00 62 00 61 00 63 00  |T.h.i.s. .b.a.c.|
00000010  6b 00 75 00 70 00 20 00  77 00 61 00 73 00 20 00  |k.u.p. .w.a.s. .|
00000020  63 00 72 00 65 00 61 00  74 00 65 00 64 00 20 00  |c.r.e.a.t.e.d. .|
00000030  75 00 73 00 69 00 6e 00  67 00 20 00 58 00 50 00  |u.s.i.n.g. .X.P.|
00000040  72 00 65 00 73 00 73 00  39 00 20 00 63 00 6f 00  |r.e.s.s.9. .c.o.|
00000050  6d 00 70 00 72 00 65 00  73 00 73 00 69 00 6f 00  |m.p.r.e.s.s.i.o.|
00000060  6e 00 2e 00 00 00 00 b0  07 00 76 75 00 00 2a d7  |n.........vu..*.|
00000070  86 4e 00 b0 07 00 ad ab  03 00 2c cb 06 00 00 00  |.N........,.....|
00000080  00 00 f8 6c 86 7f 00 00  00 00 68 01 56 6e 00 00  |...l......h.Vn..|
00000090  20 82 67 49 52 06 00 f6  ab fc fc fe 2d f6 da 8b  | .gIR.......-...|

After a bit of digging it seems like the MS-XLDM format can be found within an XSLX file. I found an example with these datasets. Within an XSLX there can be a found a file “xl/model/item.data” and it has the same structure as DataModel within a PBIX.

hexdump -C Customer Profitability Sample-no-PV/xl/model/item.data | head
00000000  ff fe 53 00 54 00 52 00  45 00 41 00 4d 00 5f 00  |..S.T.R.E.A.M._.|
00000010  53 00 54 00 4f 00 52 00  41 00 47 00 45 00 5f 00  |S.T.O.R.A.G.E._.|
00000020  53 00 49 00 47 00 4e 00  41 00 54 00 55 00 52 00  |S.I.G.N.A.T.U.R.|
00000030  45 00 5f 00 29 00 21 00  40 00 23 00 24 00 25 00  |E._.).!.@.#.$.%.|
00000040  5e 00 26 00 2a 00 28 00  3c 00 42 00 61 00 63 00  |^.&.*.(.<.B.a.c.|
00000050  6b 00 75 00 70 00 4c 00  6f 00 67 00 3e 00 3c 00  |k.u.p.L.o.g.>.<.|
00000060  42 00 61 00 63 00 6b 00  75 00 70 00 52 00 65 00  |B.a.c.k.u.p.R.e.|
00000070  73 00 74 00 6f 00 72 00  65 00 53 00 79 00 6e 00  |s.t.o.r.e.S.y.n.|
00000080  63 00 56 00 65 00 72 00  73 00 69 00 6f 00 6e 00  |c.V.e.r.s.i.o.n.|
00000090  3e 00 31 00 35 00 30 00  3c 00 2f 00 42 00 61 00  |>.1.5.0.<./.B.a.|

Because this file has a different filename and is in a different path, using “DataModel” should keep identification specific to a PBIX file.

The Power BI Report has a template option. This format uses the .PBIT extension and doesn’t contain any data only a template to use with other data. The structure is roughly the same, but doesn’t contain the “DataModel” file, but “DataModelSchema”, which appears to be a JSON file.

hexdump -C DataModelSchema | head
00000000  7b 00 0d 00 0a 00 20 00  20 00 22 00 6e 00 61 00  |{..... . .".n.a.|
00000010  6d 00 65 00 22 00 3a 00  20 00 22 00 38 00 36 00  |m.e.".:. .".8.6.|
00000020  65 00 34 00 32 00 62 00  33 00 30 00 2d 00 30 00  |e.4.2.b.3.0.-.0.|
00000030  34 00 34 00 33 00 2d 00  34 00 36 00 30 00 63 00  |4.4.3.-.4.6.0.c.|
00000040  2d 00 61 00 36 00 66 00  36 00 2d 00 36 00 66 00  |-.a.6.f.6.-.6.f.|
00000050  34 00 35 00 35 00 66 00  64 00 64 00 31 00 61 00  |4.5.5.f.d.d.1.a.|
00000060  35 00 36 00 22 00 2c 00  0d 00 0a 00 20 00 20 00  |5.6.".,..... . .|
00000070  22 00 63 00 6f 00 6d 00  70 00 61 00 74 00 69 00  |".c.o.m.p.a.t.i.|
00000080  62 00 69 00 6c 00 69 00  74 00 79 00 4c 00 65 00  |b.i.l.i.t.y.L.e.|
00000090  76 00 65 00 6c 00 22 00  3a 00 20 00 31 00 35 00  |v.e.l.".:. .1.5.|

The DataModelSchema JSON has some plain text strings which could be used for identification. Later in the file there is a string, “defaultPowerBIDataSourceVersion“.

000001c0  20 00 20 00 20 00 7d 00  2c 00 0d 00 0a 00 20 00  | . . .}.,..... .|
000001d0  20 00 20 00 20 00 22 00  64 00 65 00 66 00 61 00  | . . .".d.e.f.a.|
000001e0  75 00 6c 00 74 00 50 00  6f 00 77 00 65 00 72 00  |u.l.t.P.o.w.e.r.|
000001f0  42 00 49 00 44 00 61 00  74 00 61 00 53 00 6f 00  |B.I.D.a.t.a.S.o.|
00000200  75 00 72 00 63 00 65 00  56 00 65 00 72 00 73 00  |u.r.c.e.V.e.r.s.|
00000210  69 00 6f 00 6e 00 22 00  3a 00 20 00 22 00 70 00  |i.o.n.".:. .".p.|
00000220  6f 00 77 00 65 00 72 00  42 00 49 00 5f 00 56 00  |o.w.e.r.B.I._.V.|
00000230  33 00 22 00 2c 00 0d 00  0a 00 20 00 20 00 20 00  |3.".,..... . . .|

Seems like the best identification of the template format.

As usual you can find my signature proposal on my GitHub along with a couple “safe” samples.

Compact Pro

February 16, 2024 by Thor 2 Comments

In the Classic Macintosh world back in the day it was important to use compression tools to keep files small and also allow you to send Macintosh files through the internet. Floppy disks could only hold a small amount of data so utilizing compression was a way to use the space effectively. I have already made posts on BINHEX and DiskDoubler which where also used for similar purposes. The most popular compression software for Macintosh is Stuffit, which used .SIT and .SEA extensions. One of the other often used tools was called Compact Pro.

Compact Pro, originally know as Compactor, developed by Bill Goodman in the early 1990’s and was quite popular. It was generally faster in its ability to compress and decompress files on the Macintosh. By 1995 the last version was released and by 2002 the software was officially discontinued.

Also, Macintosh files often contain a Resource Fork to go along with the data. Archiving files within a Compact Pro archive could contain both forks along with creation, modification dates and the finder Type/Creator codes. Then an archive could be transferred through the internet or on a non Macintosh file system without loosing these key bits of information.

You can see from the image below, the compression of a PICT file retained the resource fork and finder data with an impressive 60% savings in size.

Compact Pro could also segment an archive into multiple parts. This was advantageous when needing to copy a larger file on to a set of floppy disks, or for transferring smaller files through the internet and combined later. Segments would be extracted by opening the final segment.

The other nifty feature of Compact Pro is it could create a Self-Extracting Archive. Archiving as an SEA, would compress the file into an archive, but contained within an application which could extract the archive without the use of the the full Compact Pro application. This was used mainly for use on distributed Macintosh file system disks as the application could only be run on a Mac OS system.

Let’s look at the actual Compact Pro file format.

hexdump -C CompactProTest.cpt | head
00000000  01 01 6f 07 00 00 00 cb  80 35 04 56 00 60 50 50  |..o......5.V.`PP|
00000010  00 50 50 00 60 05 60 50  00 00 00 00 00 00 00 00  |.PP.`.`P........|
00000020  00 00 60 00 00 00 00 00  00 00 00 00 00 00 00 00  |..`.............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 30  |...............0|
00000040  00 00 04 60 00 05 00 06  00 55 40 00 00 00 00 00  |...`.....U@.....|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 60 00 00 00  |............`...|
00000070  00 00 00 00 00 40 00 00  00 00 00 00 00 00 00 00  |.....@..........|
00000080  00 00 00 00 00 00 00 00  05 08 00 01 20 00 00 00  |............ ...|
00000090  00 20 01 10 88 c1 04 f6  05 41 3e 47 56 e4 09 5f  |. .......A>GV.._|

hexdump -C CP-s01.cpt | head    
00000000  01 01 90 69 00 00 10 55  80 46 78 67 77 67 78 67  |...i...U.Fxgwgxg|
00000010  86 88 09 89 9a 70 8b 90  ba 97 0a a7 90 87 a6 bb  |.....p..........|
00000020  90 8a a0 90 ab b7 aa a0  a0 80 a8 a0 98 89 00 9a  |................|
00000030  99 80 98 99 69 a9 60 0a  79 ab 86 0a b7 98 a7 90  |....i.`.y.......|
00000040  98 a0 97 7a 90 00 09 00  07 77 80 00 aa 9b 00 ba  |...z.....w......|
00000050  99 a0 90 00 08 08 a0 8a  08 a0 00 00 b9 b0 09 7a  |...............z|
00000060  08 0a aa 90 0a aa 00 00  98 60 90 b9 9b 9a 9a 57  |.........`.....W|
00000070  a8 88 bb aa aa 00 00 77  89 7a 09 b9 89 79 9b 78  |.......w.z...y.x|
00000080  86 80 8a 96 65 55 56 66  65 17 00 02 24 35 46 47  |....eUVfe...$5FG|
00000090  57 67 67 78 88 8a 70 80  80 90 00 a0 90 a0 00 00  |Wggx..p.........|

The file format is not recognized by PRONOM, and as you can see from the headers above, identification is not easy as there are no magic bytes. Using Unarchiver they identify as Compact Pro.

lsar CP-s01.cpt 
CP-s01.cpt: Compact Pro
CP.PICT

The only bytes which seem to be consistent is the first two, but “01 01” is not a signature which is unique to Compact Pro. The Unarchiver uses a more complicated calculation of file size and the CRC for identification, from what I can tell.

hexdump -C CP-s01.sea | head
00000000  01 01 8a 89 00 00 10 55  80 46 78 67 77 67 78 67  |.......U.Fxgwgxg|
00000010  86 88 09 89 9a 70 8b 90  ba 97 0a a7 90 87 a6 bb  |.....p..........|
00000020  90 8a a0 90 ab b7 aa a0  a0 80 a8 a0 98 89 00 9a  |................|
00000030  99 80 98 99 69 a9 60 0a  79 ab 86 0a b7 98 a7 90  |....i.`.y.......|
00000040  98 a0 97 7a 90 00 09 00  07 77 80 00 aa 9b 00 ba  |...z.....w......|
00000050  99 a0 90 00 08 08 a0 8a  08 a0 00 00 b9 b0 09 7a  |...............z|
00000060  08 0a aa 90 0a aa 00 00  98 60 90 b9 9b 9a 9a 57  |.........`.....W|
00000070  a8 88 bb aa aa 00 00 77  89 7a 09 b9 89 79 9b 78  |.......w.z...y.x|
00000080  86 80 8a 96 65 55 56 66  65 17 00 02 24 35 46 47  |....eUVfe...$5FG|
00000090  57 67 67 78 88 8a 70 80  80 90 00 a0 90 a0 00 00  |Wggx..p.........|

The self extracting archive has the same basic structure. I have also noticed on all the archive samples I have, the byte at offset 8 is always “80”. This could be significant.

Another thing to note, when looking at a segmented archive, the first two bytes are in sequence, 0101 for the first, 0102 for the second and so on.

CompactPro could use some further investigation. You can find quite a few on site such as: https://websites.umich.edu/~archive/mac

For now, it would be good to add the CPT extension to PRONOM with the name CompactPro Archive.

SolidWorks

February 2, 2024 by Thor 1 Comment

The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.

Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.

The SolidWorks 2000 format added additional files to the container which can help.

Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders

The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation      
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
00000070  20 4e 61 6d 65 00 00 00                           | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
*
000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|

Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

hexdump -C Bracket.SLDPRT | head
00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|

The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.

More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.

FlashPix

January 19, 2024 by Thor Leave a comment

Is there a perfect raster image format? TIFF has been around quite some time and is generally accepted as a preferred preservation format. There have been a few attempts to have a single file contain multiple resolutions with the purpose of providing resolutions for different uses, lower-resolution for web and higher-resolution for print. Even the semi popular JPEG2000 added multiple resolutions to improve the JPEG format. Kodak came up with a few ideas to do this as well. The Kodak PCD, PhotoCD or Image PAC files was one that was used for awhile before it was abandoned. Another was FlashPix.

I briefly mentioned FlashPix on an earlier post about the Microsoft Picture It! format. They are extremely similar. Both. have the same basic structure in a Compound Object format. Some of the FlashPix files generated by Picture It! even have the same identifiers in the CompObj header.

FlashPix was supposed to be the answer to all the problems with storing bitmap image data and how we view the web. Kodak partnered with some big names, Microsoft Corporation, Hewlett-Packard Company and Live Picture, Inc, were among them. Kodak marketed the format and even included it as a native file format to some of its new digital cameras. The format was made official in June of 1996, with a Whitepaper explaining all the benefits and architecture. There was a lot of hype, some even calling it, “Not your Grandma’s format“. Many graphics software started to include support for the new format, including Adobe Photoshop. So what happened, why didn’t the format catch on? Some say it was the size of storing multiple resolutions in one file, others believe it was the complicated Compound Object structure that lead to its demise. Either way, the format had a lot of hype in the late 1990’s, but by the year 2000, it had gone silent and all the websites went away.

FlashPix did have a big impact, and there were many software and hardware devices which were made compatible. There are a few stories left behind of those who scanned all their photos to the FlashPix format only to find a few years later it was unsupported on more modern computers. There was also a few early digital camera’s which could capture directly to the format. Take my Kodak DC260 zoom camera, circa 1998. Changing the Capture Preferences, I can switch between a JPG and FPX.

Using exiftool we can take a look at one of the images from the camera:

exiftool P0004795.FPX
ExifTool Version Number         : 12.73
File Name                       : P0004795.FPX
Directory                       : GitHub/digicam_corpus/Kodak/DC260/DC260_01
File Size                       : 251 kB
File Modification Date/Time     : 2024:01:06 12:54:20-07:00
File Access Date/Time           : 2024:01:06 13:20:46-07:00
File Inode Change Date/Time     : 2024:01:06 13:04:34-07:00
File Permissions                : -rwxrwxrwx
File Type                       : FPX
File Type Extension             : fpx
MIME Type                       : image/vnd.fpx
Code Page                       : Unicode UTF-16, little endian
Data Object ID                  : 13BC5A58-6B90-1B6B-12C9-0800201177F8
Data Object Status              : Exists, Not Purgeable
Creating Transform              : Source Image
Using Transforms                : 
Cached Image Height             : 1024
Cached Image Width              : 1536
Comp Obj User Type Len          : 16
Comp Obj User Type              : FlashPix_Object
Visible Outputs                 : 1
Maximum Image Index             : 1
Maximum Transform Index         : 0
Maximum Operation Index         : 0
Thumbnail Clip                  : (Binary data 18480 bytes, use -b option to extract)
Revision Number                 : 1
Create Date                     : 2024:01:06 12:53:29
Modify Date                     : 2024:01:06 12:53:29
Software                        : KODAK DIGITAL SCIENCE DC260
Image Width                     : 1536
Image Height                    : 1024
Subimage Width                  : 1536
Subimage Height                 : 1024
Subimage Color                  : RGB
Subimage Numerical Format       : 8-bit, Unsigned
Decimation Method               : None (Full-sized Image)
JPEG Tables                     : (Binary data 558 bytes, use -b option to extract)
Number Of Resolutions           : 1
Max JPEG Table Index            : 1
Scene Type                      : Original Scene
Software Release                : KODAK DIGITAL SCIENCE DC260
Make                            : Eastman Kodak Company
Camera Model Name               : KODAK DIGITAL SCIENCE DC260
Serial Number                   : 7577
Exposure Time                   : 1/180
F Number                        : 4.7
Exposure Program                : Program AE
Exposure Compensation           : 0
Subject Distance                : 0.520 m
Metering Mode                   : Center-weighted average
Light Source                    : Unknown
Focal Length                    : 24.0 mm
Max Aperture Value              : 4.6
Flash                           : No Flash
Exposure Index                  : 90
Sharpness Approximation         : 0
File Source                     : Digital Camera
Sensing Method                  : One-chip color area
Extension Create Date           : 2024:01:06 12:53:29
Extension Modify Date           : 2024:01:06 12:53:29
Creating Application            : Picoss
Extension Name                  : ijuhsimasa
Extension Persistence           : Always Valid
Extension Description           : Data Object Store 000001
Storage-Stream Pathname         : /Data Object Store 000001
Extension Class ID              : 56616000-C154-11CE-8553-00AA00A1F95B
Used Extension Numbers          : 1
Screen Nail                     : (Binary data 4304 bytes, use -b option to extract)
Subimage Tile Count             : 384
Subimage Tile Width             : 64
Subimage Tile Height            : 64
Num Channels                    : 3
Audio Stream                    : (Binary data 30780 bytes, use -b option to extract)
Aperture                        : 4.7
Image Size                      : 1536x1024
Megapixels                      : 1.6
Shutter Speed                   : 1/180
Preview Image                   : (Binary data 4164 bytes, use -b option to extract)
Focal Length                    : 24.0 mm

The file also does identify in PRONOM:

sf P0004795.FPX 
---
siegfried   : 1.11.0
scandate    : 2024-01-17T23:13:59-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'P0004795.FPX'
filesize : 250880
modified : 2024-01-06T12:54:20-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/56'
    format  : 'Kodak FlashPix Image'
    version : 
    mime    : 'image/vnd.fpx'
    class   : 'Image (Raster)'
    basis   : 'extension match fpx; container name CompObj with byte match at 53, 36 (signature 2/2)'
    warning :

If you notice, PRONOM has two signatures for the FlashPix format, this image was identified with signature #2. The first signature looks for the string “FlashPix Object”, but the second looks for the CLSID which is unique to each compound object format. FlashPix has the CLSID: {56616700-c154-11ce-8553-00aa00a1f95b}. Looking at many of the other samples I have there is much variation on the use of the string and CLSID.

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

The images from the Kodak Camera use “FlashPix_Object” string so with the underscore it doesn’t match the first signature, but others I made using Picture It! software used a couple variations. Many don’t use the string at all. Others use a sightly different CLSID in both uppercase and lowercase. We will have to suggest adjustments to the current signature to identify them all.

Looking at the contents of the OLE container we can see some interesting things.

Path = P0004795.FPX
Type = Compound
Physical Size = 250880
Extension = compound
Cluster Size = 512
Sector Size = 64

Size         Compressed     Name
------------ ------------  ------------------------
188          192           [5]Data Object 000001
272          320           [1]CompObj
388          448           [5]Extension List
144          192           [5]Global Info
                           Data Object Store 000001
18704        18944         [5]SummaryInformation
816          832           Data Object Store 000001/[5]Image Contents
272          320           Data Object Store 000001/[1]CompObj
988          1024          Data Object Store 000001/[5]Extension List
1624         1664          Data Object Store 000001/[5]Image Info
4332         4608          Data Object Store 000001/[5]Screen Nail_bd0100609719a180
                           Data Object Store 000001/Resolution 0005
                           Data Object Store 000001/Audio_bd0100609719a180
1112         1152          Data Object Store 000001/[5]KDC_bd0100609719a180
72           128           Data Object Store 000001/[5]SummaryInformation
108          128           Data Object Store 000001/Audio_bd0100609719a180/[5]Audio Info
30808        31232         Data Object Store 000001/Audio_bd0100609719a180/Audio Stream 000000
6208         6656          Data Object Store 000001/Resolution 0005/Subimage 0000 Header
176378       176640        Data Object Store 000001/Resolution 0005/Subimage 0000 Data
------------ ------------  ------------------------
242414       244480        16 files, 3 folders

The main CompObj is where we find the identification information, but the Data Object Store 000001 directory is where all the image data is stored. In a multiple resolution image we might see additional Resolution directories. You may also notice a mention of an Audio directory. Yes, this image was captured and then audio was recorded with it. Not a video, but an audio clip associated with the image. FlashPix can contain audio streams. This isn’t the first time we have seen this, HP camera’s also have this function which as it turns out is stored in a FlashPix exif extension within a JPEG.

The FlashPix native format may have disappeared, but the format lives on as an extension to Exif data, allowing you to embed audio and other media within a JPEG file. The code for FlashPix was given to ImageMagick and is maintained by them.