Canvas

Design software has come and gone over the years. Some titles launched with a lot of hype, and others disappeared not long after release. Few lasted long enough to avoid being gobbled up by big names such as Adobe. One of those is Canvas by Deneba Systems.

First released in 1987, it is still available today from Canvas GFX. It’s amazing it was never bought by one of the big names (Adobe, Corel, Aldus, etc.). It remained under Deneba Systems until 2003, when it was bought by ACD Systems, and kept the name Deneba Canvas for a time. The later versions were not popular with everyone, and Mac support was dropped, but the software continued. A while back I was looking through a few of my old ZIP disks and found some software my father used in the mid-1980s. He had a copy of Canvas version 2 for Macintosh. At that time I was more interested in playing games on our family’s Macintosh 128k than using design software.

Over the years I have come across many Canvas documents. With each release, both the file format used to store drawings and artwork and the file extension tended to change. Some versions are easily identifiable and others have confusing structures. Let’s look into it.

Version                Platform       Extension  Description
---------------------  -------------  ---------  ---------------------
Canvas 1-3 & artWORKS  Macintosh      none       no strong pattern
Canvas 3.5             Mac & Windows  CVS        Similar to v1-3
Canvas 5               Mac & Windows  CV5        CANVAS5 string
Canvas 6-8             Mac & Windows  CNV        CANVAS6 string
Canvas 9-X             Mac & Windows  CVX        Similar to 6-8
Canvas Draw            Mac            CVD        Different than others
Canvas Image File                     CVI        DAD5PROX

The first three versions of Canvas were Macintosh-only, and in those early days files had no extension, just Type and Creator codes telling the Finder how to open them. Deneba Systems used the Creator codes DAD2, DAD5, and so on through DADX.

The first versions are quite frustrating. I have gathered samples from versions 2, 3, and 3.5, and from artWORKS version 1. Even with numerous samples, I can’t discern a consistent pattern in them. I even reached out to the current CanvasX technical support for answers. They wanted to be helpful, but their reply didn’t get me much further:

With “CVS” or ‘drw2’ for mac, the header contains ranges inside a structure, and other data like if it was compressed. When we see if it’s a valid file we check the ranges. There is no easy way to determine what hex values would be written because of flipping, Intel vs (PPC or 68K). Unfortunately, the research needed to identify the Hex value will require the original code for version 3.5 which we do not have access to easily. Canvas 3.5 code is 16 bit… this would also be an issue.
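
The “flipping” mentioned above is byte order: 68K and PowerPC Macs store multi-byte numbers big-endian, while Intel stores them little-endian. Here is a minimal Python sketch of the idea, using the first 16 bytes of two of the samples dumped below. It assumes the four bytes at offset 8 are the same 32-bit field on both platforms (the value 0x2a shows up there in these dumps). That is my reading of a handful of samples, not a documented fact, and `guess_byte_order` is just a name I made up.

```python
import struct

# First 16 bytes of Canvas2.1-Sample (Mac) and C3-5-S01.CVS (Windows),
# copied from the hexdumps in this post.
MAC_HEAD = bytes.fromhex("00 00 03 06 00 00 3d 9c 00 00 00 2a 00 00 00 0a")
WIN_HEAD = bytes.fromhex("78 11 00 00 10 00 00 00 2a 00 00 00 0a 00 00 00")


def guess_byte_order(head: bytes, offset: int = 8, expected: int = 0x2A) -> str:
    """Read one 32-bit value both ways and report which order gives `expected`.

    The offset and expected value are assumptions taken from the samples,
    not from any format documentation.
    """
    if len(head) < offset + 4:
        return "unknown"
    big = struct.unpack_from(">I", head, offset)[0]
    little = struct.unpack_from("<I", head, offset)[0]
    if big == expected:
        return "big"
    if little == expected:
        return "little"
    return "unknown"


print(guess_byte_order(MAC_HEAD))  # big
print(guess_byte_order(WIN_HEAD))  # little
```

A fixed byte signature cannot express “this value, in either order” in one sequence, which is why one pattern per platform ends up being needed.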

Let’s take a look at a few samples:

hexdump -C Canvas2.1-Sample | head
00000000  00 00 03 06 00 00 3d 9c  00 00 00 2a 00 00 00 0a  |......=....*....|
00000010  00 00 00 76 00 00 00 36  00 00 00 2e 00 00 00 1e  |...v...6........|
00000020  00 00 00 12 00 00 00 42  00 00 00 1a 00 00 00 82  |.......B........|
00000030  00 00 00 3c 00 66 00 01  00 00 3d 9c 00 48 00 00  |...<.f....=..H..|
00000040  40 02 90 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000050  00 01 00 00 01 00 00 00  00 20 00 40 00 60 00 80  |......... .@.`..|
00000060  00 c0 01 40 01 80 01 c0  02 40 02 80 00 00 00 00  |...@.....@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 05  |................|
00000080  00 00 00 00 00 01 00 10  00 00 00 01 00 03 3f fc  |..............?.|
00000090  80 00 00 00 00 00 00 00  00 07 00 01 00 01 00 0b  |................|

hexdump -C Canvas2-s02 | head
00000000  00 00 03 b2 00 00 07 ec  00 00 00 2a 00 00 00 0a  |...........*....|
00000010  00 00 00 76 00 00 00 36  00 00 00 2e 00 00 00 1e  |...v...6........|
00000020  00 00 00 12 00 00 00 42  00 00 00 1a 00 00 00 82  |.......B........|
00000030  00 00 00 3c 00 66 00 01  00 00 07 ec 00 48 00 00  |...<.f.......H..|
00000040  40 02 90 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000050  00 01 01 00 01 00 00 00  00 20 00 40 00 60 00 80  |......... .@.`..|
00000060  00 c0 01 40 01 80 01 c0  02 40 02 80 00 00 00 00  |...@.....@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 05  |................|
00000080  00 00 00 00 00 01 00 10  00 00 00 01 00 03 3f fc  |..............?.|
00000090  80 00 00 00 00 00 00 00  00 07 00 01 00 01 00 0b  |................|

hexdump -C Canvas3.04 | head
00000000  00 00 02 5a 00 00 00 1c  00 00 00 2a 00 00 00 0a  |...Z.......*....|
00000010  00 00 00 76 00 00 00 36  00 00 00 2e 00 00 00 1e  |...v...6........|
00000020  00 00 00 12 00 00 00 42  00 00 00 1a 00 00 00 82  |.......B........|
00000030  00 00 00 3c 00 68 00 02  00 00 00 1c 00 48 00 00  |...<.h.......H..|
00000040  40 02 90 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000050  00 01 01 00 01 03 00 00  00 20 00 40 00 60 00 80  |......... .@.`..|
00000060  00 c0 01 40 01 80 01 c0  02 40 02 80 00 00 00 00  |...@.....@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 01 00 00 00 01 00 10  00 00 00 01 00 03 3f fc  |..............?.|
00000090  80 00 00 00 00 00 00 00  00 07 00 01 00 01 00 0b  |................|

hexdump -C Canvas5-3.5-Sample1.CVS | head
00000000  00 00 01 58 00 00 01 30  00 00 00 2a 00 00 00 00  |...X...0...*....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000030  00 00 00 00 00 69 00 02  00 00 01 30 00 48 00 00  |.....i.....0.H..|
00000040  40 02 90 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000050  00 01 01 01 00 00 00 00  00 20 00 40 00 60 00 80  |......... .@.`..|
00000060  00 c0 01 40 01 80 01 c0  02 40 02 80 00 00 00 00  |...@.....@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 01 00 00 00 01 00 10  00 00 00 01 00 03 3f fc  |..............?.|
00000090  80 00 00 00 00 00 00 00  00 07 00 01 00 01 00 01  |................|

hexdump -C C3-5-S01.CVS | head
00000000  78 11 00 00 10 00 00 00  2a 00 00 00 0a 00 00 00  |x.......*.......|
00000010  26 00 00 00 26 00 00 00  26 00 00 00 26 00 00 00  |&...&...&...&...|
00000020  96 00 00 00 2a 00 00 00  2e 00 00 00 32 00 00 00  |....*.......2...|
00000030  00 00 00 00 01 6b 01 00  50 14 00 00 28 00 00 00  |.....k..P...(...|
00000040  6e 00 00 00 5b 00 00 00  01 00 04 00 00 00 00 00  |n...[...........|
00000050  e8 13 00 00 12 0b 00 00  12 0b 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 80 00 00 80 00 00  |................|
00000070  00 80 80 00 80 00 00 00  80 00 80 00 80 80 00 00  |................|
00000080  c0 c0 c0 00 80 80 80 00  00 00 ff 00 00 ff 00 00  |................|
00000090  00 ff ff 00 ff 00 00 00  ff 00 ff 00 ff ff 00 00  |................|

In the version 2 and 3 samples you can see some patterns, which I thought would allow for proper identification, but after looking at more samples I found differences. One pattern I hoped would be consistent was the hex sequence “002000400060008000C00140018001C002400280”, but some files don’t match it. If the files are truly compressed, it will be hard to know which values stay consistent across all of them. I have over 8,000 samples and a signature that only misses around 20, so it will have to do for now.
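
As a rough illustration of that check, here is a Python sketch that tests for the 20-byte sequence. The offset 0x58 is an assumption taken from the dumps above, and as noted, a few files will fail this test. It is not the signature I ended up with, and the function names are hypothetical.

```python
# The 20-byte run seen in the version 2 and 3 dumps.
PATTERN = bytes.fromhex("002000400060008000C00140018001C002400280")
PATTERN_OFFSET = 0x58  # assumption: where the run starts in the samples shown


def has_pattern(data: bytes) -> bool:
    """True if the 20-byte run sits at the assumed offset."""
    end = PATTERN_OFFSET + len(PATTERN)
    return data[PATTERN_OFFSET:end] == PATTERN


def check_file(path: str) -> bool:
    """Read just enough of a file to test for the run."""
    with open(path, "rb") as handle:
        return has_pattern(handle.read(PATTERN_OFFSET + len(PATTERN)))
```

Running `check_file` over a folder of samples is a quick way to count how many files break the pattern before committing to it.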

Starting with version 5 we get more identifiable headers, although some samples are still odd. With an ASCII string like “CANVAS5”, it should be easy, right? Not so fast. In version 5 you can compress the file structure, which removes the easily identifiable “CANVAS5” string. Some compressed files have a short string at the tail end, but others do not.

hexdump -C Canvas5-Sample1.CV5 | head
00000000  02 00 00 80 00 00 00 00  00 00 00 4e 96 00 00 4e  |...........N...N|
00000010  96 18 02 00 00 00 0e a8  da 43 41 4e 56 41 53 35  |.........CANVAS5|
00000020  00 01 00 00 00 00 00 05  03 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 21 00 00  00 21 00 00 00 79 00 00  |.....!...!...y..|
00000040  00 03 00 00 01 6b 00 00  00 03 00 00 00 01 ff ff  |.....k..........|
00000050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

hexdump -C Canvas5-Sample3-cmp.CV5 | head
00000000  02 00 00 80 00 00 00 00  08 00 00 80 00 00 00 03  |................|
00000010  5c ff ff ff ff 00 00 40  22 00 00 03 50 10 00 89  |\......@"...P...|
00000020  07 60 bd 0f f0 00 00 10  03 04 10 56 00 20 05 00  |.`.........V. ..|
00000030  e0 18 02 10 35 04 30 4e  05 30 72 07 f0 a8 0d a1  |....5.0N.0r.....|
00000040  17 11 81 19 05 50 5c 00  60 0f 00 10 80 02 90 80  |.....P\.`.......|
00000050  03 f0 56 05 50 55 05 b0  75 12 51 29 05 e0 55 05  |..V.PU..u.Q)..U.|

hexdump -C Canvas5-Sample3-cmp.CV5 | tail
00001ff0  00 00 00 01 08 a5 ab c0  00 00 00 00 3f 89 2c 58  |............?.,X|
00002000  00 00 00 00 08 a5 ab 80  00 00 00 00 ff d4 11 e4  |................|
00002010  00 00 00 00 08 a5 ab 90  00 02 3e d8 ff d3 12 cc  |..........>.....|
00002020  00 00 00 00 00 00 00 00  00 02 3e d8 00 01 00 09  |..........>.....|
00002030  00 00 00 00 00 00 00 00  00 00 00 00 08 a5 ab f8  |................|
00002040  00 00 00 00 43 4e 56 35                           |....CNV5|
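
For the easy half of this (the uncompressed header), a minimal Python sketch could look for “CANVAS” followed by a digit near the start of the file. The 64-byte search window is my assumption based on the dumps in this post, and `find_canvas_tag` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# "CANVAS" followed by a single ASCII digit, e.g. CANVAS5.
CANVAS_TAG = re.compile(rb"CANVAS(\d)")


def find_canvas_tag(head: bytes, window: int = 64) -> Optional[Tuple[int, int]]:
    """Return (offset, digit) of the first CANVASn tag in the window, or None."""
    match = CANVAS_TAG.search(head[:window])
    if match is None:
        return None
    return match.start(), int(match.group(1))


# First 32 bytes of Canvas5-Sample1.CV5 and Canvas5-Sample3-cmp.CV5.
plain = bytes.fromhex(
    "02 00 00 80 00 00 00 00 00 00 00 4e 96 00 00 4e "
    "96 18 02 00 00 00 0e a8 da 43 41 4e 56 41 53 35"
)
packed = bytes.fromhex(
    "02 00 00 80 00 00 00 00 08 00 00 80 00 00 00 03 "
    "5c ff ff ff ff 00 00 40 22 00 00 03 50 10 00 89"
)

print(find_canvas_tag(plain))   # (25, 5), i.e. offset 0x19
print(find_canvas_tag(packed))  # None
```

The compressed sample returns `None`, which is exactly the gap the trailing string has to fill when it is present.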

Canvas 6 uses a new extension but keeps a similar file structure, again with compression as an option. Some of the compressed files from Windows have the trailing string reversed, “5VNC”. Many compressed Canvas 5 files look identical to compressed Canvas 6 files, which complicates identification.

hexdump -C Canvas6-Sample.CNV | head
00000000  01 00 80 00 00 90 07 cd  07 00 80 00 00 00 80 00  |................|
00000010  00 17 01 00 00 59 f5 0e  00 43 41 4e 56 41 53 36  |.....Y...CANVAS6|
00000020  00 01 00 00 00 00 06 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 21 7a 00  00 00 7a 00 00 00 03 00  |.....!z...z.....|
00000040  00 00 6e 01 00 00 03 00  00 00 01 00 00 00 ff ff  |..n.............|
00000050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

hexdump -C Canvas6-Sample1-c.CNV | head
00000000  01 00 80 00 00 58 ea 2b  00 c2 1d 00 00 d0 09 00  |.....X.+........|
00000010  00 00 00 0f 2e 00 00 0b  07 00 00 09 c4 10 00 01  |................|
00000020  00 00 03 00 20 04 00 70  ff 00 80 05 00 c0 06 06  |.... ..p........|
00000030  50 20 03 00 0f 06 10 6b  00 a0 12 01 00 48 07 20  |P .....k.....H. |
00000040  6d 07 30 40 06 40 11 06  00 0b 05 00 10 00 10 71  |m.0@.@.........q|
00000050  01 40 21 00 00 59 01 00  0f 05 10 00 00 e1 14 00  |.@!..Y..........|

hexdump -C Canvas6-Sample1-c.CNV | tail
000016a0  00 00 00 12 f6 00 00 c0  f0 12 00 3c d0 80 7c 58  |...........<..|X|
000016b0  2f 14 00 00 00 00 00 bc  f4 8d 00 0f 00 00 00 00  |/...............|
000016c0  f1 12 00 7f 00 00 00 f8  2e 14 00 bc f4 8d 00 1c  |................|
000016d0  f2 12 00 04 f3 12 00 fc  d1 80 7c 09 04 00 00 00  |..........|.....|
000016e0  00 00 40 00 f2 12 00 ff  ff ff ff 00 f1 12 00 1c  |..@.............|
000016f0  f1 12 00 bc f4 8d 00 00  00 00 40 35 56 4e 43     |..........@5VNC|

While most have the “CANVAS6” string near the beginning, quite a few are missing the CNV5/5VNC string at the end. Instead, many have the string “%SI-0200” near the end, which I use in my signature suggestion. This structure remained the same from version 6 to 8.

hexdump -C Canvas8-S01.CNV | head
00000000  02 00 00 80 00 00 12 b8  80 00 00 11 19 00 00 11  |................|
00000010  19 18 02 00 00 00 0e f5  59 43 41 4e 56 41 53 36  |........YCANVAS6|
00000020  00 01 00 00 00 00 00 08  01 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 21 00 00  00 00 00 00 00 00 00 00  |.....!..........|
00000040  00 03 00 00 00 00 00 00  00 03 00 00 00 01 00 00  |................|
00000050  00 01 ff ff ff ff 00 00  00 02 00 00 00 02 00 00  |................|

But there are plenty without these strings, with just the “%SI-0200” near the end.

hexdump -C TELEGRPH.CNV | head
00000000  02 00 00 80 00 00 00 00  08 00 00 80 00 00 00 3d  |...............=|
00000010  f2 ff ff ff ff 00 00 75  76 00 00 3d e6 10 00 ff  |.......uv..=....|
00000020  00 00 b3 0d 90 a9 03 b0  8a 07 f0 98 07 60 80 08  |.............`..|
00000030  d0 35 01 c0 58 01 e0 59  04 80 b8 03 90 38 02 f0  |.5..X..Y.....8..|
00000040  e2 00 20 0b 03 70 1d 03  20 36 0f 30 00 01 80 09  |.. ..p.. 6.0....|

hexdump -C TELEGRPH.CNV | tail
00006850  2b 2c f9 ae 30 00 00 00  20 00 00 00 01 00 00 00  |+,..0... .......|
00006860  0f 00 00 00 10 00 00 00  1e 00 00 00 07 00 00 00  |................|
00006870  64 65 6e 65 62 61 00 00  00 00 01 4c 25 53 49 2d  |deneba.....L%SI-|
00006880  30 32 30 30 6d 61 63 00  00 00 00 00 00 00 00 00  |0200mac.........|
00006890  00 00 00 00                                       |....|
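
Here is a small Python sketch of the end-of-file checks described above. It reports which of the markers show up: “CNV5” or “5VNC” as the last four bytes, and “%SI-0200” somewhere in the last 64 bytes. The 64-byte window is an assumption on my part (in TELEGRPH.CNV the string starts 24 bytes before the end), and `tail_markers` is a hypothetical name.

```python
from typing import List

TRAILERS = (b"CNV5", b"5VNC")  # trailing tag, in either byte order
SI_MARKER = b"%SI-0200"


def tail_markers(data: bytes, window: int = 64) -> List[str]:
    """List the end-of-file markers found in `data`."""
    found = []
    if data[-4:] in TRAILERS:
        found.append(data[-4:].decode("ascii"))
    if SI_MARKER in data[-window:]:
        found.append(SI_MARKER.decode("ascii"))
    return found


# Last bytes of Canvas6-Sample1-c.CNV and TELEGRPH.CNV from the dumps above.
canvas6_tail = bytes.fromhex("f1 12 00 bc f4 8d 00 00 00 00 40 35 56 4e 43")
telegrph_tail = bytes.fromhex(
    "64 65 6e 65 62 61 00 00 00 00 01 4c 25 53 49 2d "
    "30 32 30 30 6d 61 63 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00"
)

print(tail_markers(canvas6_tail))   # ['5VNC']
print(tail_markers(telegrph_tail))  # ['%SI-0200']
```

Returning a list rather than a single answer keeps the ambiguous cases visible, since a file can carry one marker, both, or neither.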

From version 9 onward the extension changes to CVX. The format is similar and still carries the “CANVAS6” string, but at a slightly different offset. It is still used by the current version of Canvas X.

hexdump -C Canvas9-Sample1.cvx | head
00000000  00 00 00 00 00 00 00 00  00 00 02 00 00 80 00 07  |................|
00000010  d1 84 d0 00 00 80 00 00  00 80 00 18 02 00 00 00  |................|
00000020  0f b7 ef 43 41 4e 56 41  53 36 00 01 00 00 00 00  |...CANVAS6......|
00000030  00 09 00 00 00 03 34 00  00 00 04 00 00 00 00 00  |......4.........|
00000040  00 00 00 3c 42 45 47 49  4e 5f 50 52 45 56 49 45  |...<BEGIN_PREVIE|
00000050  57 5f 54 41 47 3e 21 00  00 00 75 00 00 00 79 00  |W_TAG>!...u...y.|
00000060  00 00 03 00 00 01 6b 00  00 00 03 00 00 00 01 ff  |......k.........|
00000070  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

hexdump -C Canvas9-Sample1-compressed.cvx | tail
00004090  00 00 e0 20 00 57 80 00  00 00 00 00 0a 13 00 09  |... .W..........|
000040a0  00 00 04 00 00 00 00 01  00 00 00 00 bf ff e0 80  |................|
000040b0  bf ff e0 40 01 8c 5e 00  02 4a 22 d0 00 00 01 60  |...@..^..J"....`|
000040c0  bf ff e0 40 00 5c 08 18  00 00 00 00 00 0d 84 80  |...@.\..........|
000040d0  43 61 6e 76 61 73 39 2d  53 61 6d 70 6c 65 31 2d  |Canvas9-Sample1-|
000040e0  63 6f 6d 70 72 65 73 73  65 64 2e 63 76 78 00 18  |compressed.cvx..|
000040f0  bf ff e0 70 0a 12 6a a0  02 43 22 b4 00 0c aa 9c  |...p..j..C".....|
00004100  bf ff e0 80 00 00 00 01  00 00 00 00 00 0d 84 80  |................|
00004110  bf ff e0 b0 43 4e 56 35                           |....CNV5|

hexdump -C CanvasX2019-S01.cvx | head
00000000  00 00 00 00 00 00 00 00  00 00 01 00 80 00 00 00  |................|
00000010  6e ab 03 00 80 00 00 00  80 00 00 17 01 00 00 ef  |n...............|
00000020  b7 0f 00 43 41 4e 56 41  53 36 00 01 00 00 00 00  |...CANVAS6......|
00000030  09 00 00 4d 01 00 00 eb  4c 00 00 41 00 00 00 31  |...M....L..A...1|
00000040  52 45 56 03 00 00 00 01  00 00 00 00 00 00 00 00  |REV.............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
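
One thing that stands out in the two uncompressed CVX dumps above is that both begin with ten zero bytes, and “CANVAS6” sits at 0x23 instead of the 0x19 seen in the CNV samples, a difference of exactly ten. The Python sketch below just demonstrates that arithmetic. Treating the ten bytes as padding in front of an otherwise CNV-style header is my guess from two samples, and `strip_cvx_padding` is a hypothetical helper.

```python
CVX_PAD = 10  # assumption: leading zero bytes seen in the two CVX samples


def strip_cvx_padding(head: bytes) -> bytes:
    """Drop the leading zero bytes if they are all present, else return as-is."""
    if len(head) >= CVX_PAD and head[:CVX_PAD] == bytes(CVX_PAD):
        return head[CVX_PAD:]
    return head


# First 48 bytes of Canvas9-Sample1.cvx.
cvx_head = bytes.fromhex(
    "00 00 00 00 00 00 00 00 00 00 02 00 00 80 00 07 "
    "d1 84 d0 00 00 80 00 00 00 80 00 18 02 00 00 00 "
    "0f b7 ef 43 41 4e 56 41 53 36 00 01 00 00 00 00"
)

print(hex(cvx_head.find(b"CANVAS6")))                     # 0x23
print(hex(strip_cvx_padding(cvx_head).find(b"CANVAS6")))  # 0x19
```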

This collection of file formats is very hard to make sense of. There are some really consistent patterns across many samples, along with lots of exceptions. Super confusing. This software has had a long run, with the later years staying pretty stagnant in terms of new development. It is worth defining signatures for the consistent patterns now; we can dial in the variants over time.

The signatures I have built miss about 23 files from versions 1-3 out of the ~9,000 samples I have, and for Canvas 5 only some of the compressed files are not yet identified. So far all my CNV and CVX files identify correctly, so this is probably good for now.

CanvasX dropped support for the Macintosh, but did release an entirely different product called Canvas X Draw, which does support the Macintosh. Here is what a CVD file looks like:

hexdump -C CanvasXDraw7-Sample1.cvd | head
00000000  25 43 61 6e 76 61 73 43  56 44 09 31 2e 30 25 bb  |%CanvasCVD.1.0%.|
00000010  54 48 65 61 64 65 72 00  00 00 00 00 00 00 00 00  |THeader.........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 bb 52 4d 61 63 4f 53  56 65 72 73 69 6f 6e 20  |..RMacOSVersion |
00000040  31 30 2e 31 33 2e 36 20  28 42 75 69 6c 64 20 31  |10.13.6 (Build 1|
00000050  37 47 31 34 30 34 32 29  31 30 2e 32 33 30 34 08  |7G14042)10.2304.|
00000060  00 00 00 70 6c 61 74 66  6f 72 6d 0a 73 00 00 00  |...platform.s...|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 05 00 00  00 02 00 00 00 00 00 00  |................|
00000090  00 08 00 00 00 6f 73 0a  73 00 00 00 00 00 00 00  |.....os.s.......|
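
This one starts with readable text, so a check is simple. The Python sketch below matches the “%CanvasCVD” prefix and pulls out what looks like a version number between the tab and the next “%”. Reading “1.0” as a format version is my assumption from this single sample, and `cvd_version` is a hypothetical name.

```python
import re
from typing import Optional

# "%CanvasCVD", a tab, a dotted number, then "%", anchored at the file start.
CVD_HEADER = re.compile(rb"%CanvasCVD\t([0-9.]+)%")


def cvd_version(head: bytes) -> Optional[str]:
    """Return the dotted number after the magic, or None if it is not a CVD."""
    match = CVD_HEADER.match(head)
    if match is None:
        return None
    return match.group(1).decode("ascii")


# First 16 bytes of CanvasXDraw7-Sample1.cvd.
cvd_head = bytes.fromhex("25 43 61 6e 76 61 73 43 56 44 09 31 2e 30 25 bb")
print(cvd_version(cvd_head))  # 1.0
```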

There is also the matter of Canvas Image files, which the User Guide calls proxy images. They are raster images placed within Canvas documents, and they should be easy to identify.

hexdump -C Canvas5-Sample1.CVI | head
00000000  00 00 00 01 44 41 44 35  50 52 4f 58 00 00 09 99  |....DAD5PROX....|
00000010  00 00 00 11 00 00 00 2d  00 00 00 03 00 00 00 08  |.......-........|
00000020  00 48 00 00 00 00 00 06  00 03 00 08 00 00 00 11  |.H..............|
00000030  00 00 00 2d 00 03 00 03  00 48 00 00 00 48 00 00  |...-.....H...H..|
00000040  00 00 00 00 00 00 00 00  00 00 00 11 00 00 00 2d  |...............-|
00000050  00 00 00 02 00 00 00 08  00 00 00 01 00 00 00 11  |................|
00000060  00 00 00 2d ff ff ff ff  ff ff ff ff ff ff ff ff  |...-............|
00000070  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
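
A minimal Python check for this one, assuming the “DAD5PROX” string always sits at offset 4 as it does in the sample above (`is_canvas_image` is a hypothetical name):

```python
CVI_MAGIC = b"DAD5PROX"
CVI_OFFSET = 4  # assumption: position seen in Canvas5-Sample1.CVI


def is_canvas_image(head: bytes) -> bool:
    """True if the proxy-image tag is at the assumed offset."""
    return head[CVI_OFFSET:CVI_OFFSET + len(CVI_MAGIC)] == CVI_MAGIC


# First 16 bytes of Canvas5-Sample1.CVI.
cvi_head = bytes.fromhex("00 00 00 01 44 41 44 35 50 52 4f 58 00 00 09 99")
print(is_canvas_image(cvi_head))  # True
```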

Phew, if you held on for this whole post you must really like confusing file format structures. This format has been on my mind on and off for about 6 years. Hopefully these signatures will work for the vast majority of the Canvas files found in archives and personal systems. As always here is my GitHub with the signatures I am proposing and a few samples to get you confused.

Picture It!

Almost everyone has heard of Microsoft Office, the suite of applications used by millions every day. Fewer people know about Microsoft Works, a lower-cost alternative that was quite popular as a home office suite. One tool which often came with the Works suite was a digital imaging tool called Picture It!

Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.

Picture It! used a wizard-style interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature, a new file format was needed to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to FlashPix, an image format that was meant to solve many delivery issues but never really took hold despite being created by Kodak, HP, and others. In fact, many of the MIX files I found on Microsoft disks are actually FlashPix files.

The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.

7z l PictureIt1-s02.mix                 

--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....          328          384  [5]Data Object 000001
                    .....          396          448  [5]Transform 000004
                    .....          872          896  [5]Operation 000001
                    .....          320          320  [1]CompObj
                    .....          292          320  [5]Global Info
                    .....          872          896  [5]Operation 000002
                    .....          144          192  [5]Operation 000003
                    .....          684          704  [5]Transform 000008
                    .....         1028         1088  [5]Transform 000009
                    .....          328          384  [5]Data Object 000009
                    .....          324          384  [5]Data Object 000005
2023-12-27 11:04:39 D....                            Data Object Store 000001
                    .....          328          384  [5]Data Object 000010
                    .....        20932        20992  [5]SummaryInformation
                    .....          200          256  [5]Microsoft Embedding Info
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0001
                    .....         1400         1408  Data Object Store 000001/[5]Image Contents
                    .....          230          256  Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0000
                    .....           28           64  Data Object Store 000001/Resolution 0000/Subimage 0000 Data
                    .....           80          128  Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0002
                    .....           28           64  Data Object Store 000001/Resolution 0002/Subimage 0000 Data
                    .....          208          256  Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0004
                    .....           28           64  Data Object Store 000001/Resolution 0004/Subimage 0000 Data
                    .....         1792         1792  Data Object Store 000001/Resolution 0004/Subimage 0000 Header
                    .....          124          128  Data Object Store 000001/[5]SummaryInformation
                    .....           28           64  Data Object Store 000001/Resolution 0005/Subimage 0000 Data
                    .....         6976         7168  Data Object Store 000001/Resolution 0005/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0003/Subimage 0000 Data
                    .....          544          576  Data Object Store 000001/Resolution 0003/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0001/Subimage 0000 Data
                    .....          128          128  Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------  ------------------------
2023-12-27 11:04:39              38698        39872  29 files, 7 folders

This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:

sf PictureIt1-s02.mix 
---
siegfried   : 1.11.0
scandate    : 2023-12-27T11:06:32-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'
    warning : 

Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:

    <ContainerSignature Id="17015" ContainerType="OLE2">
      <Description>Microsoft Picture It! Image File</Description>
      <Files>
        <File>
          <Path>CompObj</Path>
          <BinarySignatures>
            <InternalSignatureCollection>
              <InternalSignature ID="17015">
                <ByteSequence Reference="BOFoffset">
                  <SubSequence Position="1" SubSeqMinOffset="32"
                               SubSeqMaxOffset="32">
                    <Sequence>'Microsoft Picture It! version 1 Picture'</Sequence>
                  </SubSequence>
                </ByteSequence>
              </InternalSignature>
            </InternalSignatureCollection>
          </BinarySignatures>
        </File>
      </Files>
    </ContainerSignature>

The container signature looks inside the OLE container for the “CompObj” stream (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting exactly at byte offset 32. That is pretty specific. The CompObj stream in my sample file contains the following bytes:

hexdump -C PictureIt1-s02/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 00 68 61 56  |.............haV|
00000010  54 c1 ce 11 85 53 00 aa  00 a1 f9 5b 1e 00 00 00  |T....S.....[....|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 50 69  63 74 75 72 65 00 27 00  |e It! Picture.'.|
00000040  00 00 7b 35 36 36 31 36  38 30 30 2d 43 31 35 34  |..{56616800-C154|
00000050  2d 31 31 43 45 2d 38 35  35 33 2d 30 30 41 41 30  |-11CE-8553-00AA0|
00000060  30 41 31 46 39 35 42 7d  00 13 00 00 00 50 69 63  |0A1F95B}.....Pic|
00000070  74 75 72 65 49 74 21 2e  50 69 63 74 75 72 65 00  |tureIt!.Picture.|
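
To make the mismatch concrete, here is a minimal Python sketch of what the fmt/936 signature asks for: the exact string at offset 32 (minimum and maximum offset are both 32). This only mimics the one byte sequence from the XML above; it is not how DROID or Siegfried are implemented, and `matches_fmt936` is a hypothetical name.

```python
SIG_OFFSET = 32  # SubSeqMinOffset and SubSeqMaxOffset in the signature
SIG_TEXT = b"Microsoft Picture It! version 1 Picture"


def matches_fmt936(compobj: bytes) -> bool:
    """True if the signature string starts exactly at offset 32."""
    return compobj[SIG_OFFSET:SIG_OFFSET + len(SIG_TEXT)] == SIG_TEXT


# Start of the CompObj stream from PictureIt1-s02.mix, as dumped above.
sample = (
    bytes.fromhex("01 00 fe ff 03 0a 00 00 ff ff ff ff 00 68 61 56")
    + bytes.fromhex("54 c1 ce 11 85 53 00 aa 00 a1 f9 5b 1e 00 00 00")
    + b"Microsoft Picture It! Picture\x00"
)

print(matches_fmt936(sample))  # False
```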

Ok, so this sample has a similar string but is missing the “version 1” text. It seems the PRONOM signature was created from samples which included “version 1” in the CompObj header. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look at a file from version 2 to compare:

hexdump -C PictureIt2-s01/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 50 28 72 2d  |............P(r-|
00000010  4b 8c d0 11 a9 6f 00 a0  c9 05 41 0d 28 00 00 00  |K....o....A.(...|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 76 65  72 73 69 6f 6e 20 32 20  |e It! version 2 |
00000040  50 69 63 74 75 72 65 00  27 00 00 00 7b 32 44 37  |Picture.'...{2D7|
00000050  32 32 38 35 30 2d 38 43  34 42 2d 31 31 44 30 2d  |22850-8C4B-11D0-|
00000060  41 39 36 46 2d 30 30 41  30 43 39 30 35 34 31 30  |A96F-00A0C905410|
00000070  44 7d 00 f4 39 b2 71 50  00 00 00 4d 00 69 00 63  |D}..9.qP...M.i.c|

Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the Wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?

Diving into this and gathering many samples has brought a lot of variants to surface. Let’s see if we can list all the CompObj header variants.

Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}

Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}

Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001); after that the software used a new format, PNG Plus, to store the layered stacks. More on that in a future post! Some later versions do seem to be able to open the older MIX format. Version 4 of the MIX format seems to be the last, as the 2001 software had only version 4 files on it. It is probably safe to say only the four versions are needed for identification.

You may notice the additional unique identifier I included with each string. This is the Class ID used by the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This ID could possibly be a better basis for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have its own ID. Across all the variations in the version 1 strings, the ID remains the same. For versions 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1, just to complicate things.
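
In the two CompObj dumps above, the same ID also appears in binary form in the 16 bytes starting at offset 12, stored in the usual little-endian GUID layout. Here is a minimal Python sketch that decodes it with the standard `uuid` module. The fixed offset of 12 is an assumption based on those two dumps, and `compobj_clsid` is a hypothetical name.

```python
import uuid

CLSID_OFFSET = 12  # assumption: position of the binary ID in the dumps above


def compobj_clsid(compobj: bytes) -> str:
    """Decode the 16-byte little-endian GUID into the usual {…} text form."""
    raw = compobj[CLSID_OFFSET:CLSID_OFFSET + 16]
    if len(raw) != 16:
        raise ValueError("stream too short to hold a Class ID")
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"


# First 28 bytes of the version 1 and version 2 CompObj streams.
v1 = bytes.fromhex(
    "01 00 fe ff 03 0a 00 00 ff ff ff ff 00 68 61 56 "
    "54 c1 ce 11 85 53 00 aa 00 a1 f9 5b"
)
v2 = bytes.fromhex(
    "01 00 fe ff 03 0a 00 00 ff ff ff ff 50 28 72 2d "
    "4b 8c d0 11 a9 6f 00 a0 c9 05 41 0d"
)

print(compobj_clsid(v1))  # {56616800-C154-11CE-8553-00AA00A1F95B}
print(compobj_clsid(v2))  # {2D722850-8C4B-11D0-A96F-00A0C905410D}
```

Both values agree with the text form of the ID listed for those versions, so either copy could feed a comparison.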

So it seems in order to get proper identification of these similar formats we need to:

  • Clean up version 1 identification for fmt/936
  • Add signatures for versions 2, 3, and 4
  • Add a version 2 signature for the PhotoDraw format
  • Add some additional signature variations for the FlashPix format
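
Most of that list comes down to matching more string variants. As a rough Python sketch, the strings listed earlier can go into a lookup table keyed on the first string in CompObj. It assumes the layout seen in the two dumps above: a 28-byte header, a 4-byte little-endian length, then a NUL-terminated string. The labels on the right are my own shorthand, not PRONOM names, and the FlashPix variants are left out since they need more research.

```python
import struct

# String variants collected from my samples, mapped to shorthand labels.
LABELS = {
    "Picture It! Picture": "Picture It! 1",
    "Microsoft Picture It! Picture": "Picture It! 1",
    "Microsoft Picture It! version 1 Picture": "Picture It! 1",
    "Picture It! Collage": "Picture It! 1",
    "Microsoft Picture It! version 2 Picture": "Picture It! 2",
    "Microsoft Picture It! version 3 Picture": "Picture It! 3",
    "Microsoft Picture It! version 4 Picture": "Picture It! 4",
    "Microsoft PhotoDraw version 1 Picture": "PhotoDraw 1",
    "Microsoft PhotoDraw version 2 Picture": "PhotoDraw 2",
}

HEADER_SIZE = 28  # assumption: bytes before the length-prefixed string


def user_type(compobj: bytes) -> str:
    """Read the first length-prefixed string after the header."""
    if len(compobj) < HEADER_SIZE + 4:
        return ""
    (length,) = struct.unpack_from("<I", compobj, HEADER_SIZE)
    start = HEADER_SIZE + 4
    return compobj[start:start + length].rstrip(b"\x00").decode("latin-1")


def classify(compobj: bytes) -> str:
    """Map a CompObj stream to a label, or 'unknown' if the string is new."""
    return LABELS.get(user_type(compobj), "unknown")
```

Anything that comes back as `unknown` is a candidate for another variant to add to the table.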

The Class IDs could be used to distinguish different versions and formats, but many of the IDs are identical, which could mean they are the same format. For now we can just add the additional variation strings, and that should identify everything. The FlashPix format needs more research, as there are so many variations and it is so close to the MIX format. Take a look at my GitHub submission; maybe you have some additional variations to add?