SolidWorks

February 2, 2024 by Thor 1 Comment

The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.

Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.

The SolidWorks 2000 format added additional files to the container which can help.

Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders

The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation      
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
00000070  20 4e 61 6d 65 00 00 00                           | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
*
000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|

Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

hexdump -C Bracket.SLDPRT | head
00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|

The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.

More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.

AskSam

January 26, 2024 by Thor Leave a comment

I was recently asked to look at a set of files with the extension of .ASK. A quick little search led me to find they belong to AskSam which was a free-form database software often used by researchers and libraries as early as 1985. The first few versions of Access Stored Knowledge via Symbolic Access Method were released for DOS and later Windows. The company askSam Systems disappeared around 2015.

The AskSam software competed with other personal information managers with unstructured data storage and retrieval. It was used to keep track of e-mail, special collections, letters, articles, web sites, etc. It could index all the contents and make searching and retrieval easy. By setting up fields the data could be exported to delimitated text. The software also appears to have been localized in German, but file format is the same.

AskSam had many import filters which included:

Microsoft Word
WordPerfect
Text (ASCII files)
HTML Files (from the Internet)
RTF Files (Rich Text Format)
Eudora E-Mail
Microsoft Outlook
Microsoft Outlook Express
Text delimited files – Comma Separated Values, Fixed position, etc.
dBASE
FoxPro
Paradox
Microsoft Access
Microsoft Excel

AskSam has its own proprietary format to store the database using the .ASK extension. They appear to have a 256 byte header. All the DOS versions of the software use the simple BOF string of “askSam”.

hexdump -C TEST.ASK       
00000000  61 73 6b 53 61 6d 00 00  00 00 00 07 0f 01 00 00  |askSam..........|
00000010  01 00 00 00 00 01 00 05  00 37 00 02 00 00 00 01  |.........7......|
00000020  33 00 32 00 00 00 00 00  50 00 00 00 00 00 00 00  |3.2.....P.......|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000d0  00 14 00 01 00 00 01 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 01 00 00  00 00 03 1d 42 00 01 00  |............B...|
000000f0  00 13 01 00 00 00 00 01  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 f6 00 00 00  00 54 65 73 74 01 01 01  |.........Test...|
00000110  01 01 00

When the first Windows version came out in 1993, the header changed to the logical string:

hexdump -C DOS-WIN.ASK | head
00000000  61 73 6b 77 69 6e 00 00  00 00 00 07 0f 01 00 04  |askwin..........|
00000010  01 00 00 00 01 01 00 05  01 37 03 00 00 00 00 01  |.........7......|
00000020  64 00 32 2e 01 4e 00 00  a0 00 00 00 00 00 00 00  |d.2..N..........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 76 43 00  |.............vC.|
00000050  00 8c 00 00 00 00 00 00  00 00 00 00 00 01 00 00  |................|
00000060  00 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 41 72 69 61 6c 00 72  20 4e 65 77 00 00 00 00  |.Arial.r New....|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 5b 3a 00 10 10 10  |..........[:....|

With Version 2 for Windows we start seeing a slightly different header:

hexdump -C AS2W-S01.ASK 
00000000  61 73 6b 57 69 53 00 00  00 00 00 07 0f 01 00 04  |askWiS..........|
00000010  01 00 00 00 01 01 00 05  00 37 03 00 00 00 00 01  |.........7......|
00000020  c8 00 32 2f 02 4c 00 00  a0 00 00 00 00 00 00 00  |..2/.L..........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000070  00 43 6f 75 72 69 65 72  20 4e 65 77 00 00 00 00  |.Courier New....|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 5b 3a 00 10 10 14  |..........[:....|
000000a0  14 02 00 00 0a 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 60 00 00 00 00  |...........`....|
000000d0  05 00 00 00 00 00 01 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 01 00 00  00 00 00 00 00 00 01 00  |................|
000000f0  00 1d 01 00 00 00 00 01  00 00 00 00 00 00 0a 00  |................|
00000100  00 00 00 00 f6 00 00 00  0a 54 65 73 74 69 6e 67  |.........Testing|
00000110  20 20 00 0a 01 09 10 c0  14 14 42 07 01           |  ........B..|

Then all samples from version 4 to the final version 7 all have the same header, although I know there is some features in the later versions that make them incompatible, there isn’t a easy way to identify the different versions after version 4.

hexdump -C Asksam4-s01.ask | head
00000000  61 73 6b 77 34 30 00 00  00 00 25 00 00 00 00 00  |askw40....%.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000f0  00 00 00 00 00 00 00 00  00 00 02 00 00 00 e5 38  |...............8|
00000100  0c 3a 67 31 4d 38 dd b5  9c 65 00 00 00 00 90 01  |.:g1M8...e......|
00000110  00 00 01 01 0c 43 00 00  00 00 00 00 be 00 00 00  |.....C..........|
00000120  24 14 00 00 00 00 00 00  10 14 00 00 00 00 00 00  |$...............|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000150  7b 4c 00 00 00 00 00 00  af 4f 00 00 00 00 00 00  |{L.......O......|

hexdump -C AskSam6-s01.ask | head
00000000  61 73 6b 77 34 30 00 00  00 00 38 00 00 00 00 00  |askw40....8.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000f0  00 00 00 00 00 00 00 00  00 00 02 00 00 00 21 f1  |..............!.|
00000100  ad 41 61 9f c0 39 cd 4a  af 65 00 00 00 00 58 02  |.Aa..9.J.e....X.|
00000110  00 00 01 01 84 2e 00 00  00 00 00 00 be 00 00 00  |................|
00000120  24 14 00 00 00 00 00 00  50 13 00 00 00 00 00 00  |$.......P.......|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 c6 5b 00 00  |.............[..|
00000150  ba 33 00 00 00 00 00 00  53 33 00 00 00 00 00 00  |.3......S3......|

hexdump -C AskSam7-s01.ask | head
00000000  61 73 6b 77 34 30 00 00  00 00 87 04 00 00 00 00  |askw40..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000f0  00 00 00 00 00 00 00 00  00 00 02 00 00 00 b2 fd  |................|
00000100  b5 47 61 9f c0 39 5c 4b  af 65 00 00 00 00 bc 02  |.Ga..9\K.e......|
00000110  00 00 01 01 db 34 00 00  00 00 00 00 be 00 00 00  |.....4..........|
00000120  24 14 00 00 00 00 00 00  50 13 00 00 00 00 00 00  |$.......P.......|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000150  aa 39 00 00 00 00 00 00  de 3c 00 00 00 00 00 00  |.9.......<......|

Even though everything after version 4 for Windows has the same header, files created in version 7 will not open in version 6. There must be some additional byte sequences which identify the files with the version which created the file. I have been unable to located the free askSam 7 viewer, but here is a link to the version 6 free viewer. It runs in the latest Windows OS. If you open an older version it will ask you to upgrade your file, so be sure to keep a copy of your original.

Once you have your ASK Database opened, you can export to a few formats, an RTF or a delimitated text file based on fields you have entered in the form. Word of warning, if you entered a password to protect modifying of your data in an earlier version, you have to re-enter the password in order to open/upgrade the file, but the viewer will not open password protected files, you will need the full version.

Here are two files created in AskSam 5.11 DOS, one without a password one with. You can see the 16 byte hex values from offset 41 to 57 are zeros in the file with no password and full of values in the protected file. I’m sure someone with more skills could figure out the encryption.

hexdump -C AS5-OPEN.ASK 
00000000  61 73 6b 53 61 6d 00 00  00 00 00 07 0f 01 00 00  |askSam..........|
00000010  01 00 00 00 00 01 00 05  00 37 00 02 00 00 00 01  |.........7......|
00000020  33 00 32 00 00 00 00 00  50 00 00 00 00 00 00 00  |3.2.....P.......|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000d0  00 14 00 01 00 00 01 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 01 00 00  00 00 03 1d 42 00 01 00  |............B...|
000000f0  00 13 01 00 00 00 00 01  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 f6 00 00 00  00 54 65 73 74 69 6e 67  |.........Testing|
00000110  01 01 00                                          |...|

hexdump -C AS5-PASS.ASK 
00000000  61 73 6b 53 61 6d 00 00  01 00 00 07 0f 01 00 00  |askSam..........|
00000010  01 00 00 00 00 01 00 05  00 37 00 02 00 00 00 01  |.........7......|
00000020  33 00 32 00 00 00 00 00  50 66 5f 14 66 42 53 40  |3.2.....Pf_.fBS@|
00000030  42 71 29 59 6a 61 62 60  6e 00 00 00 00 00 00 00  |Bq)Yjab`n.......|
*
000000d0  00 14 00 01 00 00 01 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 01 00 00  00 00 03 1d 42 00 01 00  |............B...|
000000f0  00 13 01 00 00 00 00 01  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 f6 00 00 00  00 54 65 73 74 69 6e 67  |.........Testing|
00000110  01 01 00                                          |...|

You can check out my samples and my recommendation to PRONOM on my Github page.

FlashPix

January 19, 2024 by Thor Leave a comment

Is there a perfect raster image format? TIFF has been around quite some time and is generally accepted as a preferred preservation format. There have been a few attempts to have a single file contain multiple resolutions with the purpose of providing resolutions for different uses, lower-resolution for web and higher-resolution for print. Even the semi popular JPEG2000 added multiple resolutions to improve the JPEG format. Kodak came up with a few ideas to do this as well. The Kodak PCD, PhotoCD or Image PAC files was one that was used for awhile before it was abandoned. Another was FlashPix.

I briefly mentioned FlashPix on an earlier post about the Microsoft Picture It! format. They are extremely similar. Both. have the same basic structure in a Compound Object format. Some of the FlashPix files generated by Picture It! even have the same identifiers in the CompObj header.

FlashPix was supposed to be the answer to all the problems with storing bitmap image data and how we view the web. Kodak partnered with some big names, Microsoft Corporation, Hewlett-Packard Company and Live Picture, Inc, were among them. Kodak marketed the format and even included it as a native file format to some of its new digital cameras. The format was made official in June of 1996, with a Whitepaper explaining all the benefits and architecture. There was a lot of hype, some even calling it, “Not your Grandma’s format“. Many graphics software started to include support for the new format, including Adobe Photoshop. So what happened, why didn’t the format catch on? Some say it was the size of storing multiple resolutions in one file, others believe it was the complicated Compound Object structure that lead to its demise. Either way, the format had a lot of hype in the late 1990’s, but by the year 2000, it had gone silent and all the websites went away.

FlashPix did have a big impact, and there were many software and hardware devices which were made compatible. There are a few stories left behind of those who scanned all their photos to the FlashPix format only to find a few years later it was unsupported on more modern computers. There was also a few early digital camera’s which could capture directly to the format. Take my Kodak DC260 zoom camera, circa 1998. Changing the Capture Preferences, I can switch between a JPG and FPX.

Using exiftool we can take a look at one of the images from the camera:

exiftool P0004795.FPX
ExifTool Version Number         : 12.73
File Name                       : P0004795.FPX
Directory                       : GitHub/digicam_corpus/Kodak/DC260/DC260_01
File Size                       : 251 kB
File Modification Date/Time     : 2024:01:06 12:54:20-07:00
File Access Date/Time           : 2024:01:06 13:20:46-07:00
File Inode Change Date/Time     : 2024:01:06 13:04:34-07:00
File Permissions                : -rwxrwxrwx
File Type                       : FPX
File Type Extension             : fpx
MIME Type                       : image/vnd.fpx
Code Page                       : Unicode UTF-16, little endian
Data Object ID                  : 13BC5A58-6B90-1B6B-12C9-0800201177F8
Data Object Status              : Exists, Not Purgeable
Creating Transform              : Source Image
Using Transforms                : 
Cached Image Height             : 1024
Cached Image Width              : 1536
Comp Obj User Type Len          : 16
Comp Obj User Type              : FlashPix_Object
Visible Outputs                 : 1
Maximum Image Index             : 1
Maximum Transform Index         : 0
Maximum Operation Index         : 0
Thumbnail Clip                  : (Binary data 18480 bytes, use -b option to extract)
Revision Number                 : 1
Create Date                     : 2024:01:06 12:53:29
Modify Date                     : 2024:01:06 12:53:29
Software                        : KODAK DIGITAL SCIENCE DC260
Image Width                     : 1536
Image Height                    : 1024
Subimage Width                  : 1536
Subimage Height                 : 1024
Subimage Color                  : RGB
Subimage Numerical Format       : 8-bit, Unsigned
Decimation Method               : None (Full-sized Image)
JPEG Tables                     : (Binary data 558 bytes, use -b option to extract)
Number Of Resolutions           : 1
Max JPEG Table Index            : 1
Scene Type                      : Original Scene
Software Release                : KODAK DIGITAL SCIENCE DC260
Make                            : Eastman Kodak Company
Camera Model Name               : KODAK DIGITAL SCIENCE DC260
Serial Number                   : 7577
Exposure Time                   : 1/180
F Number                        : 4.7
Exposure Program                : Program AE
Exposure Compensation           : 0
Subject Distance                : 0.520 m
Metering Mode                   : Center-weighted average
Light Source                    : Unknown
Focal Length                    : 24.0 mm
Max Aperture Value              : 4.6
Flash                           : No Flash
Exposure Index                  : 90
Sharpness Approximation         : 0
File Source                     : Digital Camera
Sensing Method                  : One-chip color area
Extension Create Date           : 2024:01:06 12:53:29
Extension Modify Date           : 2024:01:06 12:53:29
Creating Application            : Picoss
Extension Name                  : ijuhsimasa
Extension Persistence           : Always Valid
Extension Description           : Data Object Store 000001
Storage-Stream Pathname         : /Data Object Store 000001
Extension Class ID              : 56616000-C154-11CE-8553-00AA00A1F95B
Used Extension Numbers          : 1
Screen Nail                     : (Binary data 4304 bytes, use -b option to extract)
Subimage Tile Count             : 384
Subimage Tile Width             : 64
Subimage Tile Height            : 64
Num Channels                    : 3
Audio Stream                    : (Binary data 30780 bytes, use -b option to extract)
Aperture                        : 4.7
Image Size                      : 1536x1024
Megapixels                      : 1.6
Shutter Speed                   : 1/180
Preview Image                   : (Binary data 4164 bytes, use -b option to extract)
Focal Length                    : 24.0 mm

The file also does identify in PRONOM:

sf P0004795.FPX 
---
siegfried   : 1.11.0
scandate    : 2024-01-17T23:13:59-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'P0004795.FPX'
filesize : 250880
modified : 2024-01-06T12:54:20-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/56'
    format  : 'Kodak FlashPix Image'
    version : 
    mime    : 'image/vnd.fpx'
    class   : 'Image (Raster)'
    basis   : 'extension match fpx; container name CompObj with byte match at 53, 36 (signature 2/2)'
    warning :

If you notice, PRONOM has two signatures for the FlashPix format, this image was identified with signature #2. The first signature looks for the string “FlashPix Object”, but the second looks for the CLSID which is unique to each compound object format. FlashPix has the CLSID: {56616700-c154-11ce-8553-00aa00a1f95b}. Looking at many of the other samples I have there is much variation on the use of the string and CLSID.

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

The images from the Kodak Camera use “FlashPix_Object” string so with the underscore it doesn’t match the first signature, but others I made using Picture It! software used a couple variations. Many don’t use the string at all. Others use a sightly different CLSID in both uppercase and lowercase. We will have to suggest adjustments to the current signature to identify them all.

Looking at the contents of the OLE container we can see some interesting things.

Path = P0004795.FPX
Type = Compound
Physical Size = 250880
Extension = compound
Cluster Size = 512
Sector Size = 64

Size         Compressed     Name
------------ ------------  ------------------------
188          192           [5]Data Object 000001
272          320           [1]CompObj
388          448           [5]Extension List
144          192           [5]Global Info
                           Data Object Store 000001
18704        18944         [5]SummaryInformation
816          832           Data Object Store 000001/[5]Image Contents
272          320           Data Object Store 000001/[1]CompObj
988          1024          Data Object Store 000001/[5]Extension List
1624         1664          Data Object Store 000001/[5]Image Info
4332         4608          Data Object Store 000001/[5]Screen Nail_bd0100609719a180
                           Data Object Store 000001/Resolution 0005
                           Data Object Store 000001/Audio_bd0100609719a180
1112         1152          Data Object Store 000001/[5]KDC_bd0100609719a180
72           128           Data Object Store 000001/[5]SummaryInformation
108          128           Data Object Store 000001/Audio_bd0100609719a180/[5]Audio Info
30808        31232         Data Object Store 000001/Audio_bd0100609719a180/Audio Stream 000000
6208         6656          Data Object Store 000001/Resolution 0005/Subimage 0000 Header
176378       176640        Data Object Store 000001/Resolution 0005/Subimage 0000 Data
------------ ------------  ------------------------
242414       244480        16 files, 3 folders

The main CompObj is where we find the identification information, but the Data Object Store 000001 directory is where all the image data is stored. In a multiple resolution image we might see additional Resolution directories. You may also notice a mention of an Audio directory. Yes, this image was captured and then audio was recorded with it. Not a video, but an audio clip associated with the image. FlashPix can contain audio streams. This isn’t the first time we have seen this, HP camera’s also have this function which as it turns out is stored in a FlashPix exif extension within a JPEG.

The FlashPix native format may have disappeared, but the format lives on as an extension to Exif data, allowing you to embed audio and other media within a JPEG file. The code for FlashPix was given to ImageMagick and is maintained by them.

Presto!

January 12, 2024 by Thor Leave a comment

Working in preservation and archiving for the last few years has caused me to change a habit most people use everyday. The double-click. I am usually opening a file in a hex editor or control clicking on a file to open it in a different software application than is default. Maybe it’s just me, but having control over opening a file is essential. The thought of double-clicking on a file and the uncertainty of what is actually happening scares me a little.

Of course opening an application executable requires a double-click or a right-click/open process and from there you can open the file of your choosing. Executables are run-able files because they have the required pieces for the operating system and cpu to interpret and well; run. We need executables in order to make sense of the files we preserve. Without something to interpret our the data in our files they are just a bunch of one’s & zero’s.

Take a PDF for example. By itself, it is hard to make sense of the file. You need Acrobat Reader, or any number of other executable software programs to open and render the PDF.

But what if you could take a file and wrap it in an executable so it is all self contained, the file format and an executable in one file! No separate software needed! On the surface this seems like a great idea, which is why a few software companies had this as an option. An early competitor of PDF, Common Ground had the option to embed the DP file into a self contained viewer. Many archive software tools have the ability to make “self-extracting” executables as well. One obvious downside is being unable to execute on a different platform or a later operating system. But at the time they were very convenient.

One software in particular added the option to export a few different formats into a special wrapper making them viewable on any Windows machine.

New Soft Technology Corporation Presto! PageManager is document management software which can view many different file types. The software helps manage document and photo scanning and keep everything organized. The software often came bundled with home consumer scanners, such as the UMAX Astra scanner I bought years ago. With the Windows version of the software you can take one or more photos and “wrap” them into a Presto! Wrapper.

Once exported to a Presto! Wrapper the files within have a portable viewer wrapped up with them. One double-click and Presto!, you can view, rotate, export, and print your images. The wrapper has a your typical .EXE extension and identifies as such.

sf Presto6-s02.EXE
---
siegfried   : 1.11.0
scandate    : 2024-01-09T23:39:36-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'Presto6-s02.EXE'
filesize : 818301
modified : 2024-01-07T23:48:01-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/899'
    format  : 'Windows Portable Executable'
    version : '32 bit'
    mime    : 'application/vnd.microsoft.portable-executable'
    class   : 
    basis   : 'extension match exe; byte match at [[0 2] [232 94]]'

hexdump -C Presto6-s02.EXE | head
00000000  4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  |MZ..............|
00000010  b8 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 e8 00 00 00  |................|
00000040  0e 1f ba 0e 00 b4 09 cd  21 b8 01 4c cd 21 54 68  |........!..L.!Th|
00000050  69 73 20 70 72 6f 67 72  61 6d 20 63 61 6e 6e 6f  |is program canno|
00000060  74 20 62 65 20 72 75 6e  20 69 6e 20 44 4f 53 20  |t be run in DOS |
00000070  6d 6f 64 65 2e 0d 0d 0a  24 00 00 00 00 00 00 00  |mode....$.......|
00000080  99 72 8f bf dd 13 e1 ec  dd 13 e1 ec dd 13 e1 ec  |.r..............|
00000090  5e 0f ef ec dc 13 e1 ec  b2 0c eb ec d6 13 e1 ec  |^...............|

The preservation of executables is, in my opinion, complicated. Running a 32 bit executable on a computer today might not even work. Then we have to get into the license of using the software and wether the license allows us to use it freely in perpetuity. So as much as this is an executable, knowing it is also a wrapper for regular images is important to know as an option for preservation. The files wrapped inside can be exported and preserved as a solution. So what makes this executable unique. Let’s look a little closer.

00005000  00 00 00 00 11 2e 40 00  00 10 40 00 80 1f 40 00  |......@...@...@.|
00005010  c0 24 40 00 00 00 00 00  00 00 00 00 00 00 00 00  |.$@.............|
00005020  50 6d 76 69 65 77 20 69  73 20 63 6c 6f 73 65 2e  |Pmview is close.|
00005030  00 00 00 00 5c 00 00 00  74 6d 70 00 5c 54 45 4d  |....\...tmp.\TEM|
00005040  50 00 00 00 20 4e 65 77  53 6f 66 74 20 56 69 65  |P... NewSoft Vie|
00005050  77 65 72 00 34 31 36 44  37 30 36 43 36 31 37 39  |wer.416D706C6179|
00005060  36 35 37 32 00 00 00 00  41 6d 70 6c 61 79 65 72  |6572....Amplayer|
00005070  00 00 00 00 70 6d 76 69  65 77 2e 65 78 65 00 00  |....pmview.exe..|
00005080  41 6d 70 6c 61 79 65 72  2e 65 78 65 20 67 72 65  |Amplayer.exe gre|
00005090  65 74 2e 69 64 20 56 00  41 6d 70 6c 61 79 65 72  |et.id V.Amplayer|
000050a0  2e 65 78 65 00 00 00 00  2e 2e 00 00 2e 00 00 00  |.exe............|
000050b0  5c 2a 2e 2a 00 00 00 00  4c 6f 63 61 6c 20 41 70  |\*.*....Local Ap|
000050c0  70 57 69 7a 61 72 64 2d  47 65 6e 65 72 61 74 65  |pWizard-Generate|
000050d0  64 20 41 70 70 6c 69 63  61 74 69 6f 6e 73 00 00  |d Applications..|
000050e0  57 72 61 70 70 65 72 00  43 45 78 70 76 77 44 6f  |Wrapper.CExpvwDo|
000050f0  63 00 00 00 43 45 78 70  76 77 56 69 65 77 00 00  |c...CExpvwView..|

It is indeed a wrapper, the header looks like any other EXE file, but a little further into the file we can see some specifics to the viewer. In all my samples I can see the string “NewsSoft Viewer“. That might be enough to distinguish it from other executables. See some samples here.

I guess part of the question is wether identifying specific software executables is needed in preservation. Arn’t they all executables and should be treated similar? This isn’t the first type of executables I have seen like this. awhile back I came across another home software which allowed you to make a slideshow, complete with audio and wrap it into an executable to put on a disk so playback was easy for the user and nothing additional was needed. The software is called Family Album Creator, use at your own risk.

PNG Plus

January 5, 2024 by Thor Leave a comment

Usually in the software world file formats are fairly efficient, the structure is meant to provide a way to store the data of the software being used. There isn’t much need to add additional unnecessary additions. This isn’t always true, but in the early days, disk space was expensive so compression and efficiency ruled. There also wasn’t much need to hide anything or complicate things. That is unless it is intended. This makes me think of two things, Polyglots and Steganography.

Steganography is the art of embedding data within an image. With digital images you can hide another image within the main image by using the most and least significant bits. Fun use of technology, but not something you normally would find in your regular desktop software.

Ange is the master at polyglots. If you haven’t watched his presentation on funky file formats, you are missing out.

.@Gynvael’s png/zip polyglot visualized (cf his article in the latest @pagedout_zine) pic.twitter.com/5BR6GLoB98
— Ange (@angealbertini) December 18, 2023

Imagine my surprise when I was researching the Picture It! software and the MIX file format only to discover Microsoft decided to make their own polyglot of sorts for their PNG Plus format which replaced the MIX format, then both obsolete when Digital Image was discontinued in 2007. The PNG Plus format was the native format for the Microsoft Picture It! and Digital Image software often found with the Microsoft Works or Digital Imaging suite of software.

According to the help within Digital Image:

The PNG Plus format uses the standard PNG extension but provides saving of layers and pages within the PNG format. Since the PNG format cannot do this natively, how did Microsoft accomplish this? Well, by throwing an OLE container into the middle of the file of course!

PNG Plus files are your regular PNG format and will identify as such. But they are just a low resolution thumbnail of the full image. Let’s take a look:

exiftool PictureIt7-s02.png 
ExifTool Version Number         : 12.70
File Name                       : PictureIt7-s02.png
File Size                       : 26 kB
File Modification Date/Time     : 2023:12:26 22:01:58-07:00
File Access Date/Time           : 2024:01:01 12:31:07-07:00
File Inode Change Date/Time     : 2023:12:26 22:01:58-07:00
File Permissions                : -rwx------
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 500
Image Height                    : 333
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
SRGB Rendering                  : Perceptual
Gamma                           : 2.2
White Point X                   : 0.3127
White Point Y                   : 0.329
Red X                           : 0.64
Red Y                           : 0.33
Green X                         : 0.3
Green Y                         : 0.6
Blue X                          : 0.15
Blue Y                          : 0.06
Warning                  : [minor] Text/EXIF chunk(s) found after PNG IDAT (may be ignored by some readers)
Title                           : PictureIt7-s02
Image Size                      : 500x333
Megapixels                      : 0.167

Looks like there is some additional data after the IDAT chunk.

hexdump -C PictureIt7-s02.png | head
00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 01 f4 00 00 01 4d  08 06 00 00 00 f6 13 9d  |.......M........|
00000020  37 00 00 00 01 73 52 47  42 00 ae ce 1c e9 00 00  |7....sRGB.......|
00000030  00 04 67 41 4d 41 00 00  b1 8f 0b fc 61 05 00 00  |..gAMA......a...|
00000040  00 20 63 48 52 4d 00 00  7a 26 00 00 80 84 00 00  |. cHRM..z&......|
00000050  fa 00 00 00 80 e8 00 00  75 30 00 00 ea 60 00 00  |........u0...`..|
00000060  3a 98 00 00 17 70 9c ba  51 3c 00 00 24 f4 49 44  |:....p..Q<..$.ID|
00000070  41 54 78 5e ed dd 4d a8  15 57 be 28 f0 1e 08 1e  |ATx^..M..W.(....|
00000080  e3 47 8e 49 ab c7 d8 81  03 09 41 9c 28 38 e8 80  |.G.I......A.(8..|
00000090  d0 9c 0e 08 0e 1a 11 c2  15 07 5e 5a 07 4d c7 2b  |..........^Z.M.+|

The header looks the same as any PNG file, so lets look a little further:

00002560  ff 1f fa 5f 90 66 c9 e6  ad 88 00 00 00 00 63 6d  |..._.f........cm|
00002570  4f 44 4e 88 09 c1 00 00  40 00 63 70 49 70 d0 cf  |ODN.....@.cpIp..|
00002580  11 e0 a1 b1 1a e1 00 00  00 00 00 00 00 00 00 00  |................|
00002590  00 00 00 00 00 00 3e 00  03 00 fe ff 09 00 06 00  |......>.........|
000025a0  00 00 00 00 00 00 00 00  00 00 01 00 00 00 01 00  |................|
000025b0  00 00 00 00 00 00 00 10  00 00 02 00 00 00 01 00  |................|
*
00002970  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff 52 00  |..............R.|
00002980  6f 00 6f 00 74 00 20 00  45 00 6e 00 74 00 72 00  |o.o.t. .E.n.t.r.|
00002990  79 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |y...............|
000029a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000029b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 16 00  |................|
000029c0  05 00 ff ff ff ff ff ff  ff ff 01 00 00 00 7e 7f  |..............~.|
000029d0  3f b5 a5 f6 86 43 a1 a1  a3 02 24 d2 88 ef 00 00  |?....C....$.....|
000029e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000029f0  00 00 03 00 00 00 40 12  00 00 00 00 00 00 44 00  |......@.......D.|
00002a00  61 00 74 00 61 00 53 00  74 00 6f 00 72 00 65 00  |a.t.a.S.t.o.r.e.|
00002a10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003930  00 00 00 00 00 00 00 00  00 00 00 00 00 00 43 48  |..............CH|
00003940  4e 4b 49 4e 4b 20 04 00  07 00 0c 00 00 03 00 02  |NKINK ..........|
00003950  00 00 00 0a 00 00 f8 01  0c 00 ff ff ff ff 18 00  |................|
00003960  54 45 58 54 00 00 01 00  00 00 54 45 58 54 00 02  |TEXT......TEXT..|
00003970  00 00 22 00 00 00 18 00  46 44 50 50 00 00 43 00  |..".....FDPP..C.|
00003980  4f 00 4e 00 54 00 45 00  4e 00 54 00 53 00 00 00  |O.N.T.E.N.T.S...|
00003990  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000039f0  00 00 1f 00 00 00 00 0a  00 00 00 00 00 00 01 00  |................|
00003a00  43 00 6f 00 6d 00 70 00  4f 00 62 00 6a 00 00 00  |C.o.m.p.O.b.j...|
00003a10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00004530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  |................|
00004540  fe ff 03 0a 00 00 ff ff  ff ff 00 00 00 00 00 00  |................|
00004550  00 00 00 00 00 00 00 00  00 00 1a 00 00 00 51 75  |..............Qu|
00004560  69 6c 6c 39 36 20 53 74  6f 72 79 20 47 72 6f 75  |ill96 Story Grou|
00004570  70 20 43 6c 61 73 73 00  ff ff ff ff 01 00 00 00  |p Class.........|
00004580  00 00 00 00 f4 39 b2 71  00 00 00 00 00 00 00 00  |.....9.q........|
00004590  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00006570  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ba 84  |................|
00006580  43 51 00 00 00 18 69 54  58 74 54 69 74 6c 65 00  |CQ....iTXtTitle.|
00006590  00 00 00 00 50 69 63 74  75 72 65 49 74 37 2d 73  |....PictureIt7-s|
000065a0  30 32 3a 70 9c 00 00 00  00 14 74 45 58 74 54 69  |02:p......tEXtTi|
000065b0  74 6c 65 00 50 69 63 74  75 72 65 49 74 37 2d 73  |tle.PictureIt7-s|
000065c0  30 32 f2 8f d5 89 00 00  00 00 49 45 4e 44 ae 42  |02........IEND.B|
000065d0  60 82                                             |`.|

What what do we have here? Near the end of the file before the IEND chunk is an OLE file with the very recognizable hex values of “D0CF11E0“. Let’s strip out the OLE file and take a look.

Path = PictureIt7-s02-ole
Type = Compound
WARNINGS:
There are data after the end of archive
Physical Size = 8704
Tail Size = 7764
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2023-12-26 22:01:58 D....                            DataStore
2023-12-26 22:01:58 D....                            Text
                    .....         2560         2560  Text/CONTENTS
                    .....           86          128  Text/[1]CompObj
                    .....           96          128  DataStore/3
                    .....            4           64  DataStore/1
                    .....          121          128  DataStore/0
                    .....           57           64  DataStore/2
                    .....           98          128  DataStore/5
                    .....            4           64  DataStore/4
                    .....         1254         1280  DataStore/7
                    .....            4           64  DataStore/6
                    .....            4           64  DataStore/8
------------------- ----- ------------ ------------  ------------------------
2023-12-26 22:01:58               4288         4672  11 files, 2 folders

Interesting, I don’t think I have come across a standard format with a container embedded within. I have come across many OLE and ZIP containers which contain other common formats within, but this format is definitely unique. Others have added features in the IDAT chunk, such as a web shell. I am sure there are others out there. The CompObj file found within the Text directory is very similar to the Microsoft Works and Publisher format. Although trying to open the file in Publisher doesn’t work!

hexdump -C PictureIt7-s02-ole/Text/\[1\]CompObj | head
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 1a 00 00 00  |................|
00000020  51 75 69 6c 6c 39 36 20  53 74 6f 72 79 20 47 72  |Quill96 Story Gr|
00000030  6f 75 70 20 43 6c 61 73  73 00 ff ff ff ff 01 00  |oup Class.......|
00000040  00 00 00 00 00 00 f4 39  b2 71 00 00 00 00 00 00  |.......9.q......|
00000050  00 00 00 00 00 00                                 |......|

PRONOM uses binary and container signatures to identify file formats. Even though this file format contains a valid OLE container, because it is within a regular binary file format, I don’t believe a container signature would work. The difficulty will be to clearly identify this new format without falsely identifying a regular PNG instead. The OLE file format header is not in a consistent location to use a specific offset. Making the string a variable location can causes some undo processing, so lets look to see if there is anything else we can use to make a positive ID.

The PNG file format is based on chunks, you have to have IHDR, then an IDAT and the IEND chunk. If we take a look at a regular PNG file using a libpng tool pngcheck, we see this:

pngcheck -cvt rgb-8.png 
File: rgb-8.png (759 bytes)
  chunk IHDR at offset 0x0000c, length 13
    256 x 256 image, 24-bit RGB, non-interlaced
  chunk tEXt at offset 0x00025, length 44, keyword: Copyright
    ? 2013,2015 John Cunningham Bowler
  chunk iTXt at offset 0x0005d, length 116, keyword: Licensing
    compressed, language tag = en
    no translated keyword, 101 bytes of UTF-8 text
  chunk IDAT at offset 0x000dd, length 518
    zlib: deflated, 32K window, maximum compression
  chunk IEND at offset 0x002ef, length 0
No errors detected in rgb-8.png (5 chunks, 99.6% compression).

The required chunk are there, but a couple extra, the tEXt and iTXt, which are textual metadata you can add. Now lets look at a PNG Plus file:

pngcheck -cvt PictureIt7-s02.png         
File: PictureIt7-s02.png (26066 bytes)
  chunk IHDR at offset 0x0000c, length 13
    500 x 333 image, 32-bit RGB+alpha, non-interlaced
  chunk sRGB at offset 0x00025, length 1
    rendering intent = perceptual
  chunk gAMA at offset 0x00032, length 4: 0.45455
  chunk cHRM at offset 0x00042, length 32
    White x = 0.3127 y = 0.329,  Red x = 0.64 y = 0.33
    Green x = 0.3 y = 0.6,  Blue x = 0.15 y = 0.06
  chunk IDAT at offset 0x0006e, length 9460
    zlib: deflated, 32K window, fast compression
  chunk cmOD at offset 0x0256e, length 0
    Microsoft Picture It private, ancillary, unsafe-to-copy chunk
  chunk cpIp at offset 0x0257a, length 16384
    Microsoft Picture It private, ancillary, safe-to-copy chunk
  chunk iTXt at offset 0x06586, length 24, keyword: Title
    uncompressed, no language tag
    no translated keyword, 15 bytes of UTF-8 text
  chunk tEXt at offset 0x065aa, length 20, keyword: Title
    PictureIt7-s02
  chunk IEND at offset 0x065ca, length 0
No errors detected in PictureIt7-s02.png (10 chunks, 96.1% compression).

It looks like we have the required chunks and some textual chunks but also a couple chunks which pngcheck describes as private and identify’s them as Microsoft Picture It chunks. The cpIp chunk is the one which contains the OLE container. This is the chunk we need to identify in a signature. The problem is the offset for the cpIp chunk is not the same each time. Here is one from Digital Image 10 Pro.

  chunk cpIp at offset 0x737a7, length 245760
    Microsoft Picture It private, ancillary, safe-to-copy chunk

Significantly further in the file that the other example. These samples currently identify as PNG 1.2 files. PRONOM fmt/13 so we can use the signature and add to it, but it currently doesn’t look for IDAT only the iTXt chunk, which is probably not optimal. For PNG Plus, lets get the header which includes IHDR, IDAT, then the cpIp chunk then an end of file sequence for IEND. Take a look at my signature and samples, I am curious how many PNG Plus files are out there hidden to the world.

Turns out there is another PNG flavor which has been enhanced to allow for layers and pages. Adobe Fireworks uses a PNG format as their native format. They also use private chunks, but not within an OLE container. They use additional chunks, but before the IDAT chunk:

  chunk prVW at offset 0x00092, length 1700
    Macromedia Fireworks preview chunk (private, ancillary, unsafe to copy)
  chunk mkBF at offset 0x00742, length 72
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkTS at offset 0x00796, length 36716
    Macromedia Fireworks(?) private, ancillary, unsafe-to-copy chunk
  chunk mkBS at offset 0x0970e, length 190
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x097d8, length 1251
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x09cc7, length 1358
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0a221, length 1145
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0a6a6, length 339
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0a805, length 695
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0aac8, length 3799
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0b9ab, length 7733
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0d7ec, length 2741
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0e2ad, length 5153
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
  chunk mkBT at offset 0x0f6da, length 10775
    Macromedia Fireworks private, ancillary, unsafe-to-copy chunk

It’s hard to know which each of the chunks are for and if they are all required for the Fireworks PNG format. From the book on PNG.

In addition to supporting PNG as an output format, Fireworks actually uses PNG as its native file format for day-to-day intermediate saves. This is possible thanks to PNG’s extensible “chunk-based” design, which allows programs to incorporate application-specific data in a well-defined way. Macromedia has embraced this capability, defining at least four custom chunk types that hold various things pertinent to the editor. Unfortunately, one of them (pRVW) violates the PNG naming rules by claiming to be an officially registered, public chunk type, but this was an oversight and should be fixed in version 2.0.

Picture It!

December 29, 2023 by Thor 1 Comment

Most everyone has heard of Microsoft Office, the suite of applications used by millions everyday. Less people know about Microsoft Works, which was a lower cost alternative, but was quite popular as a home office suite of applications. One tool which often came with the Works suite was a digital image tool called Picture It!

Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.

Picture It! used a wizard type interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature a new file format was used to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to the FlashPix image format, which was supposed to be an image file format to solve many delivery issues, but didn’t seem to gain hold despite being created by Kodak, HP, and others. In fact many of the MIX files I found on Microsoft disks are actually FlashPix files.

The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.

7z l PictureIt1-s02.mix                 

--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....          328          384  [5]Data Object 000001
                    .....          396          448  [5]Transform 000004
                    .....          872          896  [5]Operation 000001
                    .....          320          320  [1]CompObj
                    .....          292          320  [5]Global Info
                    .....          872          896  [5]Operation 000002
                    .....          144          192  [5]Operation 000003
                    .....          684          704  [5]Transform 000008
                    .....         1028         1088  [5]Transform 000009
                    .....          328          384  [5]Data Object 000009
                    .....          324          384  [5]Data Object 000005
2023-12-27 11:04:39 D....                            Data Object Store 000001
                    .....          328          384  [5]Data Object 000010
                    .....        20932        20992  [5]SummaryInformation
                    .....          200          256  [5]Microsoft Embedding Info
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0001
                    .....         1400         1408  Data Object Store 000001/[5]Image Contents
                    .....          230          256  Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0000
                    .....           28           64  Data Object Store 000001/Resolution 0000/Subimage 0000 Data
                    .....           80          128  Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0002
                    .....           28           64  Data Object Store 000001/Resolution 0002/Subimage 0000 Data
                    .....          208          256  Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0004
                    .....           28           64  Data Object Store 000001/Resolution 0004/Subimage 0000 Data
                    .....         1792         1792  Data Object Store 000001/Resolution 0004/Subimage 0000 Header
                    .....          124          128  Data Object Store 000001/[5]SummaryInformation
                    .....           28           64  Data Object Store 000001/Resolution 0005/Subimage 0000 Data
                    .....         6976         7168  Data Object Store 000001/Resolution 0005/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0003/Subimage 0000 Data
                    .....          544          576  Data Object Store 000001/Resolution 0003/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0001/Subimage 0000 Data
                    .....          128          128  Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------  ------------------------
2023-12-27 11:04:39              38698        39872  29 files, 7 folders

This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:

sf PictureIt1-s02.mix 
---
siegfried   : 1.11.0
scandate    : 2023-12-27T11:06:32-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'
    warning :

Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:

    <ContainerSignature Id="17015" ContainerType="OLE2">
      <Description>Microsoft Picture It! Image File</Description>
      <Files>
        <File>
          <Path>CompObj</Path>
          <BinarySignatures>
            <InternalSignatureCollection>
              <InternalSignature ID="17015">
                <ByteSequence Reference="BOFoffset">
                  <SubSequence Position="1" SubSeqMinOffset="32"
                               SubSeqMaxOffset="32">
                    <Sequence>'Microsoft Picture It! version 1 Picture'</Sequence>
                  </SubSequence>
                </ByteSequence>
              </InternalSignature>
            </InternalSignatureCollection>
          </BinarySignatures>
        </File>
      </Files>
    </ContainerSignature>

The container signature is looking into the OLE container for the “CompObj” file (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting at the 32nd byte. That is pretty specific. The sample file I am using as an example has the following string of bytes.

hexdump -C PictureIt1-s02/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 00 68 61 56  |.............haV|
00000010  54 c1 ce 11 85 53 00 aa  00 a1 f9 5b 1e 00 00 00  |T....S.....[....|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 50 69  63 74 75 72 65 00 27 00  |e It! Picture.'.|
00000040  00 00 7b 35 36 36 31 36  38 30 30 2d 43 31 35 34  |..{56616800-C154|
00000050  2d 31 31 43 45 2d 38 35  35 33 2d 30 30 41 41 30  |-11CE-8553-00AA0|
00000060  30 41 31 46 39 35 42 7d  00 13 00 00 00 50 69 63  |0A1F95B}.....Pic|
00000070  74 75 72 65 49 74 21 2e  50 69 63 74 75 72 65 00  |tureIt!.Picture.|

Ok, so this sample has a similar string but is missing the “version 1” text. It seems the samples used to created the PRONOM signature was working off samples which included the version 1 in the header of CompObj. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look a file from version 2 to compare:

hexdump -C PictureIt2-s01/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 50 28 72 2d  |............P(r-|
00000010  4b 8c d0 11 a9 6f 00 a0  c9 05 41 0d 28 00 00 00  |K....o....A.(...|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 76 65  72 73 69 6f 6e 20 32 20  |e It! version 2 |
00000040  50 69 63 74 75 72 65 00  27 00 00 00 7b 32 44 37  |Picture.'...{2D7|
00000050  32 32 38 35 30 2d 38 43  34 42 2d 31 31 44 30 2d  |22850-8C4B-11D0-|
00000060  41 39 36 46 2d 30 30 41  30 43 39 30 35 34 31 30  |A96F-00A0C905410|
00000070  44 7d 00 f4 39 b2 71 50  00 00 00 4d 00 69 00 63  |D}..9.qP...M.i.c|

Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?

Diving into this and gathering many samples has brought a lot of variants to surface. Let’s see if we can list all the CompObj header variants.

Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}

Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}

Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001), then the Picture It! software used a new format, PNG Plus to store the layered stacks. More on that in a future post! Although some later versions seems to be able to open the older MIX format. Version 4 of the MIX format seems to be the last as the 2001 software had only version 4 files on it. Probably safe to say only the 4 versions are needed for identification.

You may notice the additional unique identifier I included in each format. This is called a Class ID for the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This Unique ID could possibly be a better solution for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have a unique ID. With all the variations in the version 1 strings, the ID remains the same. For version 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1. Not to complicate things.

So it seems in order to get proper identification of these similar formats we need to:

Clean up version 1 identification for fmt/936
Add a signature for 2, 3, and 4
Add a version 2 signature for the PhotoDraw format
Add some additional signature variations for the FlashPix format.

The Class ID’s could be used to distinguish different versions and formats, but many of the ID’s are identical, this could mean they are the same format. But for now we can just add the additional variation strings and it should identify everything for now. The FlashPix format needs more research as there is so many different variations and it’s so close to the MIX format. Take a look at my GitHub submission, maybe you have some additional variations to add?

Digital Negatives

December 22, 2023 by Thor 1 Comment

One of the important parts about Digital Preservation is to gather significant properties of the digital files we hope to preserve. This can allow us to base our risk assessments off of more data than just an extension. For example, a TIFF file is a mighty good preservation format. Well documented and adopted by the preservation community, and with hundreds if not thousands of software tools to render and make use of the format. But if a TIFF file uses compression like LZW, or if it happens to have multiple pages, those are good things to know about. Most formats might have a stable set of properties, but sometimes can have properties which adds more risk to the format becoming difficult to render or migrate.

A DNG or Digital Negative developed by Adobe was supposed to solve the issues with proprietary RAW digital camera formats. Rendering a PhaseOne IIQ file often times requires the full CaptureOne software which can be expensive. Adobe spends quite a bit of resources in adding support to its Camera RAW toolkit and adding the ability to take majority of these RAW formats and move them into a DNG. There is also more and more camera manufacturers who image directly to a DNG as their native RAW format. This is the case for Apple’s ProRAW format which uses the DNG specification.

Another manufacturer is the Insta360 camera’s. Their 360 camera’s can use two lenses to capture 180 degrees from each and then stitch into a 360 photo or video. They can capture compressed images and videos, but also in RAW. Because of the two lenses and sensors, their DNG’s can get quite large. For this reason I recently asked PRONOM to adjust their signatures to allow for a bigger offset of DNG information in the larger RAW images.

exiftool IMG_20230913_141939_00_039.dng 
ExifTool Version Number         : 12.70
File Name                       : IMG_20230913_141939_00_039.dng
File Size                       : 143 MB
File Type                       : DNG
File Type Extension             : dng
MIME Type                       : image/x-adobe-dng
Exif Byte Order                 : Little-endian (Intel, II)
Subfile Type                    : Full-resolution image
Image Width                     : 5984
Image Height                    : 11968
Bits Per Sample                 : 16
Compression                     : Uncompressed
Photometric Interpretation      : Color Filter Array
Make                            : Arashi Vision
Camera Model Name               : Insta360 X3

DNG files are actually based on the TIFF format, TIFF/EP to be precise, which means there is some good history behind the format and understanding of its structure. DNG does add many new tags and new features, so there is much more going on. Here is a TIFFInfo view of a DNG. Lots of new tags…..

tiffinfo IMG_20230913_141939_00_039.dng 
TIFFReadDirectory: Warning, Unknown field with tag 33421 (0x828d) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33422 (0x828e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50937 (0xc6f9) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50938 (0xc6fa) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50940 (0xc6fc) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51009 (0xc741) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51107 (0xc7a3) encountered.
=== TIFF directory 0 ===
TIFF Directory at offset 0x889946c (143234156)
  Subfile Type: (0 = 0x0)
  Image Width: 5984 Image Length: 11968
  Bits/Sample: 16
  Sample Format: unsigned integer
  Compression Scheme: None
  Photometric Interpretation: 32803 (0x8023)
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 11968
  Planar Configuration: single image plane
  Make: Arashi Vision
  Model: Insta360 X3
  Software: v1.0.69_build1
  DateTime: 2023:09:13 14:19:40
  Tag 33421: 2,2
  Tag 33422: 1,2,0,1
  EXIFIFDOffset: 0x8
  GPSIFDOffset: 0x3e6
  DNGVersion: 1,3,0,0
  DNGBackwardVersion: 1,3,0,0
  UniqueCameraModel: Insta360 X3

An IFD (Image File Directory) is the building block of a TIFF file. A TIFF file can have multiple IFD’s within a single file. But an IFD can also be a thumbnail, metadata or GPS info. For a DNG, they use the IFD structure as well, but often, the first IFD is a lower resolution of the full image.

 <File:FileType>DNG</File:FileType>
 <File:FileTypeExtension>dng</File:FileTypeExtension>
 <File:MIMEType>image/x-adobe-dng</File:MIMEType>
 <File:ExifByteOrder>Little-endian (Intel, II)</File:ExifByteOrder>
 <IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>256</IFD0:ImageWidth>
 <IFD0:ImageHeight>171</IFD0:ImageHeight>
 <IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
 <IFD0:Compression>Uncompressed</IFD0:Compression>
 <IFD0:PhotometricInterpretation>RGB</IFD0:PhotometricInterpretation>
 <IFD0:Make>Canon</IFD0:Make>
 <IFD0:Model>Canon EOS RP</IFD0:Model>
...
 <SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
 <SubIFD:ImageWidth>6384</SubIFD:ImageWidth>
 <SubIFD:ImageHeight>4224</SubIFD:ImageHeight>
 <SubIFD:BitsPerSample>16</SubIFD:BitsPerSample>
 <SubIFD:Compression>JPEG</SubIFD:Compression>

But not always the same way.

 <IFD0:SubfileType>Full-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>5984</IFD0:ImageWidth>
 <IFD0:ImageHeight>11968</IFD0:ImageHeight>
 <IFD0:BitsPerSample>16</IFD0:BitsPerSample>
 <IFD0:Compression>Uncompressed</IFD0:Compression>
 <IFD0:PhotometricInterpretation>Color Filter Array</IFD0:PhotometricInterpretation>
 <IFD0:Make>Arashi Vision</IFD0:Make>
 <IFD0:Model>Insta360 X3</IFD0:Model>

 <IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>4032</IFD0:ImageWidth>
 <IFD0:ImageHeight>3024</IFD0:ImageHeight>
 <IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
 <IFD0:Compression>JPEG</IFD0:Compression>
 <IFD0:PhotometricInterpretation>YCbCr</IFD0:PhotometricInterpretation>
 <IFD0:Make>Apple</IFD0:Make>
 <IFD0:Model>iPhone 13 Pro</IFD0:Model>
...
 <SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
 <SubIFD:ImageWidth>4032</SubIFD:ImageWidth>
 <SubIFD:ImageHeight>3024</SubIFD:ImageHeight>
 <SubIFD:BitsPerSample>12 12 12</SubIFD:BitsPerSample>
 <SubIFD:Compression>JPEG</SubIFD:Compression>

It can get confusing, especially for tools we use to extract metadata and significant properties from a DNG for preservation. Within Rosetta, the preservation system I use at work, there is no dedicated DNG extractor, so we use JHOVE, as it is the tool we use for our TIFF images. This presents a problem as the process only extracts properties for the first IFD assuming it is the main IFD, but in many cases it reports back the image is much smaller in pixel dimensions than it actually is. More work is needed to improve extracting correct significant properties for DNG and other RAW image formats.

Adobe released a new version of DNG this year. In June, DNG version 1.7.0.0 was finalized. The new version brought a few new features, two of which are including JPEG XL compression and a new HDR colorimetric value. In order to add JPEG XL compression DNG version 1.7 is required. Here is how one looks in exiftool, created with Adobe DNG Converter 16.1.

exiftool _MG_9375_1.dng 
ExifTool Version Number         : 12.70
File Name                       : _MG_9375_1.dng
File Size                       : 5.4 MB
File Type                       : DNG
File Type Extension             : dng
MIME Type                       : image/x-adobe-dng
Exif Byte Order                 : Little-endian (Intel, II)
Make                            : Canon
Camera Model Name               : Canon EOS DIGITAL REBEL XT
Preview Image Start             : 91884
Orientation                     : Rotate 270 CW
Rows Per Strip                  : 171
Preview Image Length            : 10305
Software                        : Adobe DNG Converter 16.1 (Macintosh)
Modify Date                     : 2023:12:18 11:45:06
Artist                          : unknown
Image Width                     : 3516
Image Height                    : 2328
Bits Per Sample                 : 16
Compression                     : JPEG XL
DNG Version                     : 1.7.1.0
DNG Backward Version            : 1.7.1.0

I had recently submitted a new signature for DNG 1.7 to PRONOM, but I found this new DNG version falls outside the signature I created. I had made the assumption all DNG’s report their version based on the last two values of 0.0, so I created the signature to look for 1.7.0.0. This is wrong now that I can see an example of version 1.7.1.0.

In order to fix the issue, I would need to change all the DNG signatures to remove the last two bytes so:

12C601000400000001070000 would change to 12C60100040000000107

This would allow for identification if some DNG files have a point version.

The pace at which manufacturers are producing camera’s with new features is much faster than the Digital Preservation community can keep up with. As new technologies get released, we play catch up trying to identify new formats and variations to existing ones. I guess that is job security?

Final Cut Pro

December 15, 2023 by Thor Leave a comment

When it comes to Digital Preservation, the easiest types of file formats to preserve are often single self contained formats with lots of documentation. There are plenty of formats which break this norm, but a file format like a simple TIFF file is well understood and can stand on its own. The hardest file formats to preserve, I have found, are the complex under documented formats which often show up when you don’t expect them. There is a file format type which indeed makes things difficult. The project format.

There are many software tools out there which generate a “Project”, this is often proprietary and can only be used by the software which created it. Project files are also interdependent, meaning they require other files in known locations in order to be used. This interdependence is often links to images, audio, video, fonts, and other multimedia. The file format itself is just a reference to all the project settings and the paths to the files included in the project. This makes things very difficult to preserve and maintain the complex structure required. Any renaming, removing, or moving the files out of their original order can render the project useless. Many project formats are human readable in XML, or other human readable text, but others are not. I have made a recent attempt to document more Project formats on the File Format Wiki, including many Label and Optical disc project formats, along with updates to Adobe InDesign, QuarkXPress and other desktop publishing project formats. There is still plenty of work needed in other Video and Audio project formats.

Apple computers over the years has created some very powerful software for content creators to use, especially in Video editing. iMovie was used by many home movie editors and iDVD to burn those movies to DVD to share with family and friends, but Apple also sold a professional Video Editing suite which included Final Cut Pro.

Final Cut Pro started life as a Macromedia software tool called KeyGrip which never was released and later bought by Apple. Final Cut Pro was well used and loved by video editors and was given a major upgrade in 2011 to Final Cut Pro X, which was full re-written to be 64-bit. This change included a change to the Project file format. So for version 1 through version 7, Final Cut Pro used a project format with the extension .FCP. Lets take a closer look at the this project format.

hexdump -C Swing.fcp | head
00000000  a2 4b 65 79 47 0a 0d 0a  00 00 00 00 20 fc c5 5b  |.KeyG....... ..[|
00000010  00 de b3 11 d0 93 19 00  05 02 18 66 07 00 00 00  |...........f....|
00000020  03 00 00 00 00 00 00 00  00 01 00 00 00 00 01 00  |................|
00000030  00 00 11 07 73 75 62 74  79 70 65 00 00 00 01 01  |....subtype.....|
00000040  00 00 00 03 00 06 4e 4f  55 4e 44 4f 00 00 00 00  |......NOUNDO....|
00000050  01 01 00 00 00 00 00 00  00 00 00 00 00 07 52 55  |..............RU|
00000060  4e 54 49 4d 45 00 00 00  00 01 01 00 00 00 00 00  |NTIME...........|
00000070  00 00 00 01 07 76 69 65  77 65 72 73 00 00 00 00  |.....viewers....|
00000080  01 01 00 00 00 00 00 00  00 00 00 00 00 00 00 08  |................|
00000090  63 68 69 6c 64 72 65 6e  00 00 00 00 01 01 00 00  |children........|
*
00000e30  00 00 00 00 00 00 00 00  00 00 00 00 00 00 07 8c  |................|
00000e40  b3 2e 56 40 4d 6f 6f 56  54 56 4f 44 00 02 00 02  |..V@MooVTVOD....|
00000e50  00 00 00 11 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000e60  00 00 00 0b 44 61 6e 63  65 20 53 68 6f 74 73 00  |....Dance Shots.|
00000e70  00 01 00 08 00 00 07 8a  00 00 07 84 00 02 00 2f  |.............../|
00000e80  41 54 54 4f 20 52 41 49  44 30 20 47 72 6f 75 70  |ATTO RAID0 Group|
00000e90  3a 54 55 54 4f 52 49 41  4c 3a 44 61 6e 63 65 20  |:TUTORIAL:Dance |
00000ea0  53 68 6f 74 73 3a 49 6e  74 72 6f 2e 6d 6f 76 00  |Shots:Intro.mov.|
00000eb0  00 09 00 a8 00 a8 61 66  70 6d 00 00 00 00 00 03  |......afpm......|
00000ec0  00 18 00 39 00 59 00 75  00 95 00 9e 07 49 4c 31  |...9.Y.u.....IL1|
00000ed0  20 33 72 64 00 00 00 00  00 00 00 00 00 00 00 00  | 3rd............|
00000ee0  00 00 00 00 00 00 00 00  00 00 00 00 00 0f 77 61  |..............wa|
00000ef0  6c 74 d5 73 20 43 6f 6d  70 75 74 65 72 00 00 00  |lt.s Computer...|
00000f00  00 00 00 00 00 00 00 00  00 00 00 00 00 10 41 54  |..............AT|
00000f10  54 4f 20 52 41 49 44 30  20 47 72 6f 75 70 00 00  |TO RAID0 Group..|
00000f20  00 00 00 00 00 00 00 00  00 07 77 73 68 69 72 65  |..........wshire|
00000f30  73 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |s...............|
00000f40  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000f50  00 00 00 00 00 00 00 00  00 00 00 00 ff ff 00 00  |................|
00000f60  00 00 00 00 00 00 00 10  41 54 54 4f 20 52 41 49  |........ATTO RAI|
00000f70  44 30 20 47 72 6f 75 70  00 00 00 00 00 00 00 2b  |D0 Group.......+|
00000f80  00 00 00 01 00 00 00 03  00 00 00 03 54 55 54 4f  |............TUTO|
00000f90  52 49 41 4c 00 44 61 6e  63 65 20 53 68 6f 74 73  |RIAL.Dance Shots|
00000fa0  00 49 6e 74 72 6f 2e 6d  6f 76 00 00 00 00 00 00  |.Intro.mov......|

From the header we can see a remnant of the original KeyGrip software, but later in the file we find some references to files in the Mac HFS path format which includes a colon instead of a slash. These are the paths to the each of the MOV files used in the Project. This file is from the tutorial disk of Final Cut Pro version 1.2, so lets take a look at the last version released, version 7.

hexdump -C Lesson 1 Project.fcp | head
00000000  a2 4b 65 79 47 0a 0d 0a  01 de 00 00 00 20 08 92  |.KeyG........ ..|
00000010  66 c4 28 d7 11 8a e5 00  30 65 ec fe 98 03 00 00  |f.(.....0e......|
00000020  00 00 00 00 00 00 00 00  00 01 00 00 00 00 01 15  |................|
00000030  00 00 00 07 73 75 62 74  79 70 65 01 00 00 00 01  |....subtype.....|
00000040  03 00 00 00 00 06 4e 4f  55 4e 44 4f 00 00 00 00  |......NOUNDO....|
00000050  01 01 00 00 00 00 00 00  00 00 00 00 00 07 52 55  |..............RU|
00000060  4e 54 49 4d 45 00 00 00  00 01 01 00 00 00 00 00  |NTIME...........|
00000070  01 00 00 00 07 76 69 65  77 65 72 73 00 00 00 00  |.....viewers....|
00000080  01 01 00 00 00 00 00 00  00 00 00 00 00 00 00 08  |................|
00000090  63 68 69 6c 64 72 65 6e  00 00 00 00 01 01 01 00  |children........|

Almost identical to the first version, which is helpful for identification, but if we need to identify based on version, it might prove a little more difficult. It appears all the samples I have and have seen reference to all begin with the same 5 hex values, A24B657947, 0xA2 KeyG. It’s hard to know what other hex values might have something to do with versions of the file format. More samples could tell us, but from what I have the 20 bytes starting from offset 12 seems to be consistent among the different version samples. But for now the 5 bytes at the beginning of the file should suffice for identification.

When Final Cut Pro went through a complete re-write in 2011, the FCP format was abandoned. Not only made obsolete, but completely unsupported. The new Final Cut Pro X software was not able to support this now obsolete format. The new format followed the pattern of many other Apple formats of using a folder identified through an extension as a single file. Called a bundle format, Final Cut Pro X used the extension, .FCPBUNDLE. This bundle could include the media assets along with project settings/thumbnails and clips. Because of this “bundle” format, identification would have to be done at the individual file level inside the bundle. This would include formats with extensions such as .flexolibrary and .fcpevent, which appear to be SQLite databases. This complex format makes preservation of this type of object difficult with current methods and practices.

Luckily Apple didn’t leave Final Cut Pro users completely unable to migrate their content. Final Cut Pro could export the project as an XML file. This format is called Final Cut Pro XML Interchange Format and was well documented. The format was not made to bridge the gap from Final Cut Pro to Final Cut Pro X, but rather make the project file more useful outside of Final Cut Pro. Final Cut Pro X actually can’t open these files either, which is why a third party developer came in and developed 7toX (SendtoX) to allow for projects to be converted to a newer XML format.

Lets take a look at the basic Final Cut Pro XML Interchange Format which has a standard XML extension:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="Sequence 1 ">...</sequence>
</xmeml>

Standard XML with a Doctype/root of xmeml. Clever. A little ways into the XML we also see:

<appspecificdata>
	<appname>Final Cut Pro</appname>
	<appmanufacturer>Apple Inc.</appmanufacturer>
	<appversion>7.0</appversion>
</appspecificdata>

Final Cut Pro X also has an XML format which is different than XMEML and has an extension FCPXML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fcpxml>

<fcpxml version="1.8">
    <resources>
        <format id="r1" name="FFVideoFormatDV720x480i5994" frameDuration="2002/60000s" fieldOrder="lower first" width="720" height="480" paspH="10" paspV="11" colorSpace="6-1-6 (Rec. 601 (NTSC))"/>
    </resources>
    <library location="file:///Untitled.fcpbundle/">...</library>
</fcpxml>

A different Doctype/root and structure but should be easy to identify.

The preservation of projects files, according to some, is not necessary since they are not the finalized product. Preserving the finalized output would be preferable as it can be managed easier and represent the final render of a project. But identification of the Final Cut Pro project and all the assets gives the option to access a collection more accurately. I was able to create a signature for the FCP, XML, and FCPXML formats. Take a look on my GitHub for the signatures and some test files.

PianoSoft DOM-30

December 8, 2023 by Thor Leave a comment

I often find myself at a thrift store looking through the well used Compact Discs. Often see the same ones over and over, but occasionally finding a gem. While looking through a set of discs, a few caught my eye. When I pulled one out to look at the cover I noticed it was not your typical CD. Opening the cover I was greeted with a 3.5 floppy inside the jewel case. That was a fun surprise.

The 3.5 inch floppy disk appears to be made specifically for the Yamaha Disklavier piano’s. The disk had the appearance of your typical double density floppy. Unfortunately, when I inserted the disk I was greeted with the error, “no mountable file systems”. I was however able to use ddrescue and make a disk image. Here is what the disk header looks like:

hexdump -C Yamaha_RazzleDazzle.img | head
00000000  e5 e5 e5 e5 e5 e5 e5 e5  e5 e5 e5 e5 e5 e5 e5 e5  |................|
*
00000200  f9 ff ff 03 40 00 05 60  00 07 80 00 47 00 00 0b  |....@..`....G...|
00000210  c0 00 0d e0 00 0f f0 ff  11 20 01 13 40 01 15 f0  |......... ..@...|
00000220  ff 17 80 01 19 a0 01 1b  c0 01 42 00 00 1f 00 02  |..........B.....|
00000230  21 20 02 23 40 02 25 60  02 27 80 02 29 f0 ff 2b  |! .#@.%`.'..)..+|
00000240  c0 02 2d e0 02 2f f0 ff  31 20 03 33 40 03 35 60  |..-../..1 .3@.5`|
00000250  03 37 80 03 39 f0 ff 3b  c0 03 3d e0 03 3f 00 04  |.7..9..;..=..?..|
00000260  41 f0 ff ff 4f 04 45 60  04 48 f0 ff 49 f0 ff 00  |A...O.E`.H..I...|
00000270  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The header actually has 512 bytes of the E5 hex values. Not a FAT12 file system for sure. A little farther into the disk image I can see what appears to be file listing.

00000e00  53 4f 4e 47 20 20 20 20  50 30 31 00 00 00 00 00  |SONG    P01.....|
00000e10  00 00 00 00 00 00 00 00  00 00 02 00 00 20 00 00  |............. ..|
00000e20  53 48 4f 52 54 20 20 20  50 30 32 20 00 00 00 00  |SHORT   P02 ....|
00000e30  00 00 00 00 00 00 fa ac  a2 16 0a 00 00 18 00 00  |................|
00000e40  54 4f 59 20 20 20 20 20  50 30 33 20 00 00 00 00  |TOY     P03 ....|
00000e50  00 00 00 00 00 00 05 ad  a2 16 10 00 00 18 00 00  |................|
00000e60  53 57 45 45 54 20 20 20  50 30 34 20 00 00 00 00  |SWEET   P04 ....|
00000e70  00 00 00 00 00 00 12 ad  a2 16 16 00 00 20 00 00  |............. ..|
00000e80  56 49 4f 4c 49 4e 20 20  50 30 35 20 00 00 00 00  |VIOLIN  P05 ....|
00000e90  00 00 00 00 00 00 40 ad  a2 16 1e 00 00 30 00 00  |......@......0..|
00000ea0  51 55 49 45 54 20 20 20  50 30 36 20 00 00 00 00  |QUIET   P06 ....|
00000eb0  00 00 00 00 00 00 50 ad  a2 16 2a 00 00 18 00 00  |......P...*.....|
00000ec0  52 41 5a 5a 4c 45 20 20  50 30 37 20 00 00 00 00  |RAZZLE  P07 ....|
00000ed0  00 00 00 00 00 00 83 ad  a2 16 30 00 00 28 00 00  |..........0..(..|
00000ee0  4c 45 54 53 20 20 20 20  50 30 38 20 00 00 00 00  |LETS    P08 ....|
00000ef0  00 00 00 00 00 00 9a ad  a2 16 3a 00 00 20 00 00  |..........:.. ..|
00000f00  50 49 41 4e 4f 44 49 52  46 49 4c 20 00 00 00 00  |PIANODIRFIL ....|
00000f10  00 00 00 00 00 00 a1 ad  a2 16 43 00 00 18 00 00  |..........C.....|
00000f20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001c00  fe 00 00 00 20 00 00 43  4f 4d 2d 45 53 45 51 51  |.... ..COM-ESEQQ|
00001c10  31 31 56 31 2e 30 30 80  00 00 00 d9 01 00 00 00  |11V1.00.........|
00001c20  20 00 00 01 58 00 00 20  20 20 20 20 20 20 20 20  | ...X..         |
00001c30  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

This gave me a few clues. A few google searches I came across a project on Hackaday about “Hacking Yamaha Disklavier Floppies“. I was eager to test out the python software and see if could export the files on the disk image. To my disappointment, the python script could not read my disk image. So I reached out to the author. After sharing a couple disk images with him, he was able to enable support for this different type of disklavier disk.

python3 disklav.py -t Yamaha_RazzleDazzle.img 
Loading file...OK
Format: PianoSoft DOM-30
Disk: PPC 1919       
Title: RAZZLE DAZZLE
--------------------------------------------------------------------
Track 01 - Song Without Words
Track 02 - Shortenin' Bread Boogie
Track 03 - Toy Bugle
Track 04 - Sweet Tooth
Track 05 - The Mysterious Violin
Track 06 - Quiet Moment
Track 07 - Razzle Dazzle
Track 08 - Let's Have A Party!

When I used the extract option with the software I was rewarded with eight files with the .FIL extention.

sf Yamaha_RazzleDazzle-track01.fil           
---
siegfried   : 1.10.1
scandate    : 2023-12-04T22:38:04-07:00
signature   : default.sig
created     : 2023-12-04T22:37:35-07:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'Yamaha_RazzleDazzle-track01.fil'
filesize : 6409
modified : 2023-12-04T22:28:34-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'

No surprise the file was not known to PRONOM. I doubt these files have made a big appearance in many archives.

hexdump -C Yamaha_RazzleDazzle-track01.fil | head
00000000  fe 00 00 00 20 00 00 43  4f 4d 2d 45 53 45 51 51  |.... ..COM-ESEQQ|
00000010  31 31 56 31 2e 30 30 80  00 00 00 d9 01 00 00 00  |11V1.00.........|
00000020  20 00 00 01 58 00 00 20  20 20 20 20 20 20 20 20  | ...X..         |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000120  20 20 20 20 20 20 20 00  76 04 02 00 1e ff 00 00  |       .v.......|
00000130  ff ff ff ff ff 00 ff 00  00 00 00 ff ff 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 02 02 02  02 02 02 02 02 01 01 01  |................|
00000160  01 01 01 01 01 01 01 01  01 01 01 01 01 00 00 00  |................|

The header of a .FIL file has an ascii string “COM-ESEQ”. A little investigation shed a little light on the format. Turns out there is a proprietary format Yamaha developed called “E-SEQ”. The E-SEQ format is compatible with all Disklaviers, Clavinova digital pianos, and a few other Yamaha products. I was curious if the format was something similar to a MIDI file, which was commonly used with early keyboard systems, but I was unable to find anything to suggest they are similar in any way. Yamaha does mention there are tools out there to convert an E-SEQ file to a SMF (Standard Midi File) which was used on other systems.

There is another tool called PPFBU which can be used to extract a disk image from a Disklavier floppy and the E-SEQ files. Along with a companion tool called MID2PianoCD claims to be able to convert a E-SEQ to a WAV or MP3, although I haven’t had much luck.

Another set of tools are available here, they allow for copying of a disk and converting back and forth from E-SEQ and Midi. A text file in DVUtils from the link has the following background about the disk format and the FIL files:

DISKLAVIER FILES AND DISCS


Yamaha Disklavier discs are always on Double Density (2DD) media, High Density (HD)discs, which are more common nowadays,  will not work. Furthermore, they are formatted to 720 Kbytes not the default of 1.2 Mbytes.  The original discs are copy protected.  This has been achieved by placing invalid data on the first sector.  As DOS and Windows always refer to this sector to check out a floppy, they will report that the discs are bad.  The Yamaha machinery ignores the first sector so it reads them normally.

The music files on a Disklavier disc have the extension .FIL . They are frequently identified with titles like PIANO001.FIL but sometimes they have names similar to DOS like MUSIC1.FIL.  In addition to the music files, there is an index file on the disc.  This contains a list of the active music files on the disc, their titles, and pointers to their position on the disc.  The index file is always called PIANODIR.FIL and always has a size of 6 Kbytes. In order to set up a Disklavier disc to function on a Disklavier, you must first copy the music files onto it in Disklavier format (ESEQ) and then run the ESEQ EXPLORER program to build the index file.

Although there are many Disklavier Piano’s still out there and quite expensive if you want to pick one up for yourself, the websites dedicated to the format as slowly disappearing. One archived website has plenty of sample files to download and write to a floppy for use if you happen to have a Disklavier.

You can check out the signature I put together for the E-SEQ on my Github. Might be good to explore the disk image format more and add that as well to PRONOM for identification as well.

Embroidery Formats

December 1, 2023 by Thor 1 Comment

There are certain file formats which seem to be fairly mainstream and come up frequently from a variety of sources. Then you find one from a specialized niche industry. I recently came across a file with the HUS extension and it led me down a path of a family of formats I didn’t know existed. The world of embroidery formats has been around for quite some time and is quite confusing. There are many manufacturers of embroidery machines and over the years have merged with each other or upgraded to include new features all requiring changes to the file formats used by the systems.

Embroidery stitching designs are a unique file formats. They are images, probably vector, but with stitching information included in the file which determines the color of thread used and pattern used.

I took a data set of many different embroidery files from here and ran it through Siegfried.

That is pretty bleak. Not a single file identified except a few using an OLE container and a couple files which may be plain bitmaps. I have started to document where each of these formats come from on the File Format Wiki, but there are so many formats to research. The triple XXX format research has its other challenges…..

Today I wanted to focus on one brand, Husqvarna, a Swedish company with a long history.

Husqvarna has made sewing machines or quite some time. In the late 1970’s machines started to enter the market which were “computerized”. It was in the early 1990’s when the embroidery machines could take a memory card full of stitching patterns, then later could take a floppy disk or USB drive full of custom made designs. Let’s take a look at a few of the formats, starting with the extension .HUS.

First a sample with a 1994 last modified date:

hexdump -C Bow.hus | head
00000000  5b af c8 00 1a 09 00 00  01 00 00 00 98 01 67 01  |[.............g.|
00000010  6f fe 9b fe 2c 00 00 00  47 00 00 00 0e 07 00 00  |o...,...G.......|
00000020  00 00 00 00 00 00 00 00  00 00 07 00 00 10 38 84  |..............8.|
00000030  81 af f8 d9 6e bc 5e c1  b4 1d fc b2 00 00 00 0c  |....n.^.........|
00000040  93 03 49 98 00 e5 f0 07  41 77 75 c6 49 c6 ff e2  |..I.....Awu.I...|
00000050  80 aa aa 08 00 28 82 a8  b2 49 27 5e dd ae ba ee  |.....(...I'^....|
00000060  db 78 05 bc 4b ef 00 37  f0 da db b5 d6 ee 93 9e  |.x..K..7........|
00000070  55 15 01 00 44 00 05 51  01 3e 16 cf db ce 2c bc  |U...D..Q.>....,.|
00000080  ad db 48 97 cb e5 f2 fe  fc 63 79 e7 a3 a1 46 80  |..H......cy...F.|
00000090  5c 37 f0 10 41 43 a1 40  21 c7 a5 d2 6e 82 0a 20  |\7..AC.@!...n.. |

Then one from around 1996:

hexdump -C ROSEBUD.HUS | head
00000000  5b ff c8 00 5f 0b 00 00  04 00 00 00 ce 00 69 01  |[..._.........i.|
00000010  01 ff 17 fe 3a 00 00 00  66 00 00 00 97 09 00 00  |....:...f.......|
00000020  00 6b 6e 6c 6a 6b 00 00  00 00 0a 00 19 00 1a 00  |.knljk..........|
00000030  0e 00 1a 00 05 00 a4 a2  00 10 00 19 42 88 80 35  |............B..5|
00000040  ff 4d 96 0d c1 72 49 6e  09 00 c8 72 ee 90 76 93  |.M...rIn...r..v.|
00000050  47 ca 20 00 00 4a 32 b1  d4 d4 08 e1 e7 86 d3 50  |G. ..J2........P|
00000060  20 0f 2f 1a 63 e0 09 66  7e ed f0 85 b9 37 fd 12  | ./.c..f~....7..|
00000070  2c 89 09 01 02 00 30 33  33 00 44 96 4b 36 49 65  |,.....033.D.K6Ie|
00000080  bd d7 77 7e df bb cd f4  9b db e7 6f 5b 76 ed b1  |..w~.......o[v..|
00000090  2d b6 49 08 03 31 81 8c  00 02 44 91 22 24 b2 5f  |-.I..1....D."$._|

And another from 1998:

hexdump -C FLOWER.HUS | head
00000000  5d fc c8 00 de 23 00 00  04 00 00 00 79 01 e5 01  |]....#......y...|
00000010  87 fe 1b fe 32 00 00 00  e0 00 00 00 51 1b 00 00  |....2.......Q...|
00000020  00 00 00 00 00 00 00 00  00 00 0e 00 03 00 0c 00  |................|
00000030  05 00 00 69 53 6d 82 36  bf f0 d8 7a 55 c1 0b b6  |...iSm.6...zU...|
00000040  fd b9 52 ff da 61 20 20  14 6a 31 1d e8 1c b0 7f  |..R..a  .j1.....|
00000050  92 cc 48 0c 38 59 16 49  6d 71 11 39 00 00 1e 10  |..H.8Y.Imq.9....|
00000060  4c fc 18 00 00 00 00 00  00 00 0f 68 67 e0 be 11  |L..........hg...|
00000070  17 88 d5 2b 13 c4 5b 33  a2 98 f7 b9 6e 2d dc 62  |...+..[3....n-.b|
00000080  ba 5e 8f 50 bf 09 f9 28  13 38 29 2a de 47 f4 c1  |.^.P...(.8)*.G..|
00000090  9c 3e d6 37 bc 8c ad 95  f0 b3 c1 97 bc fb 1f b5  |.>.7............|

After 1998 Viking Husqvarna merged with the Pfaff brand and a new format with the extension .VIP was used. I found a couple variants of this format.

hexdump -C Magic Flower.vip | head
00000000  5d fc 90 01 0e 06 00 00  01 00 00 00 3e 01 19 01  |]...........>...|
00000010  c2 fe e7 fe 6e 00 00 00  80 00 00 00 c8 04 00 00  |....n...........|
00000020  00 00 00 00 00 00 00 00  00 00 36 00 00 00 89 bf  |..........6.....|
00000030  93 fc 01 00 00 00 1a 00  00 00 46 00 6c 00 6f 00  |..........F.l.o.|
00000040  72 00 61 00 6c 00 3b 00  20 00 41 00 62 00 73 00  |r.a.l.;. .A.b.s.|
00000050  74 00 72 00 61 00 63 00  74 00 3b 00 20 00 46 00  |t.r.a.c.t.;. .F.|
00000060  61 00 73 00 68 00 69 00  6f 00 6e 00 00 00 00 0b  |a.s.h.i.o.n.....|
00000070  38 68 61 2f f8 6c 9d 68  63 47 07 80 09 c0 6b e0  |8ha/.l.hcG....k.|
00000080  04 9f 6d eb 70 c5 4b 3f  e0 00 7d 52 4a 45 66 6f  |..m.p.K?..}RJEfo|
00000090  bf 1e f5 41 be f6 50 44  90 02 21 fd 6f 39 bd 71  |...A..PD..!.o9.q|

The three HUS files and the VIP files have a similar first 4 bytes. Should make an easy signature.

With the addition of the TruE mySewnet service and software, the format went through a change and started using the .VP3 extension.

hexdump -C Magic Flower.vp3 | head
00000000  25 76 73 6d 25 00 00 38  00 50 00 72 00 6f 00 64  |%vsm%..8.P.r.o.d|
00000010  00 75 00 63 00 65 00 64  00 20 00 62 00 79 00 20  |.u.c.e.d. .b.y. |
00000020  00 56 00 53 00 4d 00 20  00 53 00 6f 00 66 00 74  |.V.S.M. .S.o.f.t|
00000030  00 77 00 61 00 72 00 65  00 20 00 4c 00 74 00 64  |.w.a.r.e. .L.t.d|
00000040  00 02 00 00 00 0d 64 00  32 00 46 00 6c 00 6f 00  |......d.2.F.l.o.|
00000050  72 00 61 00 6c 00 3b 00  20 00 41 00 62 00 73 00  |r.a.l.;. .A.b.s.|
00000060  74 00 72 00 61 00 63 00  74 00 3b 00 20 00 46 00  |t.r.a.c.t.;. .F.|
00000070  61 00 73 00 68 00 69 00  6f 00 6e 00 00 7c 38 00  |a.s.h.i.o.n..|8.|
00000080  00 6d c4 ff ff 83 c8 ff  ff 92 3c 00 00 06 0e 00  |.m........<.....|
00000090  01 0c 00 01 00 03 00 00  00 0d 10 00 00 00 00 00  |................|

The current version of the Premier+ software produces a file with the extension .VP4.

hexdump -C Magic Flower.vp4 | head
00000000  25 56 70 34 25 01 00 00  00 4d 73 61 0a df 74 29  |%Vp4%....Msa..t)|
00000010  3c 87 6b 44 2c 84 2f 00  3c 7c f7 e7 a0 69 6e 66  |<.kD,./.<|...inf|
00000020  6f 00 00 00 00 45 00 00  00 6e 74 74 6e 00 00 00  |o....E...nttn...|
00000030  00 27 00 00 00 02 00 6e  74 65 73 19 00 46 6c 6f  |.'.....ntes..Flo|
00000040  72 61 6c 3b 20 41 62 73  74 72 61 63 74 3b 20 46  |ral; Abstract; F|
00000050  61 73 68 69 6f 6e 73 74  67 73 00 00 0c 06 00 00  |ashionstgs......|
00000060  01 00 00 00 01 00 c2 fe  e7 fe 3e 01 19 01 00 00  |..........>.....|
00000070  73 62 64 73 00 00 00 00  ab 0c 00 00 01 00 73 62  |sbds..........sb|
00000080  64 6e 00 00 00 00 9d 0c  00 00 00 00 00 00 00 00  |dn..............|
00000090  00 00 00 00 00 00 00 00  00 00 00 6e 74 74 6e 00  |...........nttn.|

There is a great python library which can read in many of these formats and can give us some confirmation of the format headers. It is called pyembroidery and has readers for HUS and VP3.

One other format associated with the Husqvarna Viking Designer 1 model which used a floppy disk is the SHV stitch file. This is a special format which could only be used on the embroidery machine if it was properly formatted. This would put into a structure that would look like this:

The SHV file is the design file with the MHV and PHV are used for display on the embroidery machine. This is what we find when we look at the SHV content:

hexdump -C My Designs/MENU_01/DES01_01.SHV | head
00000000  45 6d 62 72 6f 69 64 65  72 79 20 64 69 73 6b 20  |Embroidery disk |
00000010  63 72 65 61 74 65 64 20  75 73 69 6e 67 20 73 6f  |created using so|
00000020  66 74 77 61 72 65 20 6c  69 63 65 6e 73 65 64 20  |ftware licensed |
00000030  66 72 6f 6d 20 56 69 6b  69 6e 67 20 53 65 77 69  |from Viking Sewi|
00000040  6e 67 20 4d 61 63 68 69  6e 65 73 20 41 42 2c 20  |ng Machines AB, |
00000050  53 77 65 64 65 6e 08 55  6e 74 69 74 6c 65 64 8f  |Sweden.Untitled.|
00000060  a0 47 50 62 7c 00 00 00  00 00 00 00 00 00 00 00  |.GPb|...........|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 09 99  |................|

The MHV and PHV files have the same header so identification of just the SHV design file will be difficult.

I have only scratched the surface with these embroidery formats. The embroidery market was quite large with many different manufacturers. The user base seems to have been quite large, but its reach may be limited to personal collections. I have attempted a signature for the Husqvarna formats, you can find them in my GitHub repository. More to come hopefully!