Camtasia

February 14, 2025 by Thor

Not to be confused with Fantasia, Camtasia is a magical screen recording tool that has been around for many years. Books have been written on using this software to teach and demonstrate other software and ideas.

Unlike Fantasia, the screen recording software Camtasia was not made by Disney, but does contain some proprietary data. Camtasia is a screen recording software by the developer TechSmith. First released in 2002, it was available first for Windows and much later, Macintosh.

The first versions of Camtasia would encode screen recordings in an AVI container, using the TSCC codec. The TSCC codec, aka TechSmith Screen Capture Codec, was developed by TechSmith and the codec was distributed freely. Let’s see what MediaInfo knows about it.

mediainfo Camtasia1-s01.avi 
General
Complete name                            : Camtasia1-s01.avi
Format                                   : AVI
Format/Info                              : Audio Video Interleave
Format settings                          : BitmapInfoHeader
File size                                : 1.66 MiB
Duration                                 : 2 s 333 ms
Overall bit rate                         : 5 966 kb/s
Frame rate                               : 15.000 FPS

Video
ID                                       : 0
Format                                   : TechSmith
Codec ID                                 : tscc
Codec ID/Info                            : TechSmith Screen Capture
Duration                                 : 2 s 333 ms
Bit rate                                 : 87.3 kb/s
Width                                    : 320 pixels
Height                                   : 240 pixels
Display aspect ratio                     : 4:3
Frame rate                               : 15.000 FPS
Bit depth                                : 8 bits
Bits/(Pixel*Frame)                       : 0.076
Stream size                              : 24.9 KiB (1%)

The AVI video format was the default recording format for the first couple of versions. In version 3, the default format changed to the proprietary CAMREC format.

CAMREC video files are a proprietary TechSmith file format used to store multiple files and information in a single package. Overall, .camrec files store your screen and camera recording plus some metadata about the various streams.
However, it is important to note that you cannot view or play .camrec files outside of Camtasia Studio.

The CAMREC video format isn’t entirely proprietary and uses a common container.

hexdump -C Camtasia3-s01.camrec | head
00000000  d0 cf 11 e0 a1 b1 1a e1  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  3e 00 04 00 fe ff 0c 00  |........>.......|
00000020  06 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00000030  01 00 00 00 00 00 00 00  00 10 00 00 02 00 00 00  |................|
00000040  01 00 00 00 fe ff ff ff  00 00 00 00 00 00 00 00  |................|
00000050  fc 03 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000060  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

file Camtasia3-s01.camrec 
Camtasia3-s01.camrec: Composite Document File V2 Document, Cannot read section info
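
The first eight bytes of the hexdump above (d0 cf 11 e0 a1 b1 1a e1) are the Compound File magic number, which is what `file` reports as a Composite Document File. Here is a minimal Python sketch of the same check. The constant and function names are my own for illustration and are not part of any identification tool.

```python
# Magic number at offset 0 of the CAMREC hexdump shown above.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def has_ole2_magic(head: bytes) -> bool:
    """Return True if the buffer starts with the Compound File magic number."""
    return head.startswith(OLE2_MAGIC)


# Typical use (the path is just an example):
# with open("Camtasia3-s01.camrec", "rb") as handle:
#     print(has_ole2_magic(handle.read(8)))
```

This only confirms the outer container. It says nothing about what is stored inside, which is why the listing of the internal streams matters.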

7z l Camtasia3-s01.camrec  

Scanning the drive for archives:
1 file, 4696576 bytes (4587 KiB)                

Path = Camtasia3-s01.camrec
Type = Compound
ERRORS:
Unexpected end of archive
Physical Size = 4698112
Extension = compound
Cluster Size = 4096
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....         3912         3968  manifest.camxml
                    .....      4672000      4673536  Screen_Stream.avi
------------------- ----- ------------ ------------  ------------------------
                               4675912      4677504  2 files

The CAMREC file might be unknown to most video players, but the AVI within the compound object is the same as in the versions before it. Camtasia even has a built-in extractor if you really need to pull the AVI out of the format.

7z l Camtasia8-s01.camrec
Path = Camtasia8-s01.camrec
Type = Compound
Physical Size = 33849344
Extension = compound
Cluster Size = 4096
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....         4798         8192  manifest.camxml
                    .....         4286         8192  cursor-1.ico
                    .....          766          768  cursor-0.ico
                    .....         9565        12288  Events.dat
                    .....           36           64  Keyboard.dat
                    .....     33764198     33767424  Screen_Stream.avi
------------------- ----- ------------ ------------  ------------------------
                              33783649     33796928  6 files

Each CAMREC file contains a manifest.camxml. They seem to be UTF-16 XML files, with and without the XML declaration. The Screen_Stream.avi file seems to be in all my samples, but it is not clear if there can be a variant without an AVI file.
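
Since the manifests appear to be UTF-16 and may or may not carry an XML declaration, a reader has to cope with both. The sketch below assumes the stream has already been extracted to bytes, and it assumes little-endian byte order when no byte order mark is present, which I have not verified against every sample. The function name is hypothetical.

```python
import codecs


def decode_manifest(raw: bytes) -> tuple[str, bool]:
    """Decode manifest.camxml bytes and report whether an XML declaration is present.

    Assumption: the payload is UTF-16, little-endian when there is no BOM.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # this codec reads and strips the BOM
    else:
        text = raw.decode("utf-16-le")
    has_declaration = text.lstrip().startswith("<?xml")
    return text, has_declaration
```

The boolean is only there to record which of the two variants a sample belongs to.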

This CAMREC container was used in the Camtasia Studio software until version 8.4, when the default was changed to a new format with the TREC extension, which uses an MPEG-4 container and a new codec.

mediainfo capture-1.trec 
General
Complete name                            : capture-1.trec
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42 (mp42/isom)
File size                                : 277 KiB
Duration                                 : 3 s 41 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 746 kb/s
Frame rate                               : 19.091 FPS
Encoded date                             : 2025-02-11 03:48:25 UTC
Tagged date                              : 2025-02-11 03:48:34 UTC
FileExtension_Invalid                    : braw mov mp4 m4v m4a m4b m4p m4r 3ga 3gpa 3gpp 3gp 3gpp2 3g2 k3g jpm jpx mqv ismv isma ismt f4a f4b f4v

Video
ID                                       : 1
Format                                   : tsc2-D0
Codec ID                                 : tsc2-D0
Duration                                 : 2 s 933 ms
Bit rate                                 : 495 kb/s
Width                                    : 924 pixels
Height                                   : 696 pixels
Display aspect ratio                     : 4:3
Frame rate mode                          : Variable
Frame rate                               : 19.091 FPS
Minimum frame rate                       : 10.000 FPS
Maximum frame rate                       : 30.000 FPS
Bits/(Pixel*Frame)                       : 0.040
Stream size                              : 177 KiB (64%)
Title                                    : 100
Encoded date                             : 2025-02-11 03:48:25 UTC
Tagged date                              : 2025-02-11 03:48:34 UTC

TechSmith Recording File (TREC) files will identify as MP4 in most identification tools; you will need MediaInfo or another tool to understand the codec used. If we look at the header of the MP4 TREC file:

hexdump -C Camtasia8.4-s01.trec | head
00000000  00 00 00 18 66 74 79 70  6d 70 34 32 00 00 00 00  |....ftypmp42....|
00000010  6d 70 34 32 69 73 6f 6d  00 00 00 88 66 72 65 65  |mp42isom....free|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000a0  00 00 00 01 6d 64 61 74  00 00 00 00 01 1f c4 b9  |....mdat........|
000000b0  01 02 02 01 40 00 00 00  7f 7f 7f 7f 7f 7f 7f 7f  |....@...........|
000000c0  7f 7f 7f 7f 7f 7f 7f 7f  7f 7f 7f 7f 7f 7f 7f 7f  |................|
*
000000f0  7f 7f 7f 7f 7f 7f 7f 63  da 11 00 00 d6 46 18 e0  |.......c.....F..|
00000100  77 ff 43 ff aa e4 eb 9c  dc 8f 9a 56 7a 30 71 ea  |w.C........Vz0q.|

We see the standard header for an MP4 file. The codec specific to the Camtasia software is identified later in the file, but identification using a PRONOM signature might be challenging. In looking at the hex of the file, near the end, you can find embedded PNGs and other data. VLC and FFmpeg can read the codec, but players like QuickTime struggle.
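
To make the header easier to read, here is a small Python sketch that walks the top-level boxes of an MP4-style file. It follows the general box layout (a 4-byte big-endian size, a 4-byte type, and a 64-bit size when the 32-bit size is 1) and is only meant for poking at samples; the function name is mine.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (offset, box_type, size) for each top-level box in the buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the box type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box, stop rather than loop forever
        yield offset, box_type.decode("latin-1"), size
        offset += size
```

Run against the bytes in the hexdump above, this reports `ftyp` at offset 0 (24 bytes), `free` at offset 24 (136 bytes), and `mdat` at offset 160 with a 64-bit size of 18859193 bytes. Nothing at this level distinguishes a TREC from any other MP4.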

A promising section near the end shows the name and version of Camtasia Studio. More data needed.

hexdump -C Camtasia8.4-s02.trec
0569a3b0  00 00 00 00 00 00 00 00  00 00 01 54 53 43 52 00  |...........TSCR.|
0569a3c0  00 00 00 00 00 00 50 01  00 00 00 00 00 00 00 43  |......P........C|
0569a3d0  61 6d 74 61 73 69 61 20  53 74 75 64 69 6f 00 00  |amtasia Studio..|
0569a3e0  00 00 00 38 2e 34 00 00  00 00 00 00 00 00 00 00  |...8.4..........|
0569a3f0  00 00 00 57 69 6e 00 60  bb c4 00 00 00 00 00 00  |...Win.`........|
0569a400  00 00 00 00 00 00 00 00  00 00 01 54 53 43 4d 00  |...........TSCM.|
0569a410  00 00 00 00 00 95 2c 00  00 00 01 44 41 54 41 00  |......,....DATA.|
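
Until the layout of this trailing section is better understood, the readable strings can at least be pulled out for comparison across samples. The sketch below looks for the "TSCR" marker seen in the dump and collects printable ASCII runs in the bytes that follow. The 64-byte window and the assumption that the strings sit right after the marker are guesses based on this one sample, not a documented structure.

```python
import re


def strings_after_marker(data: bytes, marker: bytes = b"TSCR",
                         window: int = 64, min_len: int = 3) -> list[str]:
    """Return printable ASCII runs found within `window` bytes after `marker`."""
    pos = data.find(marker)
    if pos < 0:
        return []
    start = pos + len(marker)
    chunk = data[start:start + window]
    # Runs of printable ASCII at least min_len bytes long.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(chunk)]
```

On the bytes shown above this returns the application name, the version, and the platform: "Camtasia Studio", "8.4", and "Win".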

Camtasia also uses a lot of project files to manage the video editing process of your screen recordings. The project files can vary between the Windows and Macintosh versions.

The older versions of Camtasia for Windows, up until version 8.4, used the CAMPROJ extension for their projects. These are in XML and simply use “<Project_Data>” for the root element, with version 8 having a later element, “<CSMLData>”, to manage the assets. Other projects also have a File element that begins with either “tscrec4://” or “TSCRec://”. But it may be best to identify the older versions with the “<ClipBin_Array>” element.
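
As a rough illustration of that idea, the Python sketch below checks already-decoded project text for the root element and the “<ClipBin_Array>” element. It is a plain substring test rather than a real XML parse, the function name is hypothetical, and the text encoding of CAMPROJ files is left to the caller because I have not confirmed it across versions.

```python
def looks_like_camproj(text: str) -> bool:
    """Return True if the text has both markers discussed above.

    Assumption: the caller has already decoded the file to a string.
    """
    return "<Project_Data" in text and "<ClipBin_Array" in text
```

A PRONOM signature would express the same thing as byte sequences, but a quick check like this is handy for sorting a pile of samples first.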

For Mac version 2, they used CMPROJ for the project, but it was also an Apple Bundle/Package file. It also used a recording file with the extension CMREC, which is also an Apple Bundle/Package file and contains MOV and DAT files.

The most recent versions of Camtasia for Mac and Windows use the TSCPROJ extension. They are plain text files that resemble JSON.

{
  "title" : "",
  "description" : "",
  "author" : "",
  "width" : 854.0,
  "height" : 480.0,
  "version" : "0.5",
  "editRate" : 30,
  "authoringClientName" :  {
    "name" : "Camtasia",
    "platform" : "Mac",
    "version" : "3.1.7"
  }
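
If a given TSCPROJ file does parse as strict JSON, the authoring details shown above are easy to read out. This Python sketch assumes exactly that; since these files only resemble JSON, a real file may raise `json.JSONDecodeError`. The function name is mine, and the key name comes from the excerpt above.

```python
import json


def authoring_client(tscproj_text: str) -> dict:
    """Return the authoringClientName object from TSCPROJ text, or {} if absent.

    Assumption: the text is valid JSON; json.JSONDecodeError is raised otherwise.
    """
    project = json.loads(tscproj_text)
    return project.get("authoringClientName", {})
```

The platform and version values could help tell Mac and Windows projects apart without relying on the extension.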

There are a few formats related to Camtasia, but the CAMREC format is the one that shows up the most in my work. So today I am only proposing signatures for the CAMREC and CAMPROJ formats. We will have to have some discussion on the TREC format to determine if standard MPEG-4 identification is fine or if the format needs its own PUID. You can find some examples and my proposed signature on my GitHub page.

Binder

August 9, 2024 by Thor

Microsoft is never in short supply of file formats. They have made many changes over the years and introduced lots of products, some lasting longer than others. The list is quite long.

One such product was called Office Binder. Introduced with Office 95, it was a companion application for combining a number of OLE objects together in one “Binder”. It was meant to be the digital version of the office binder one often uses for presentations or proposals.

You could add sections and include Word documents, images, PowerPoint presentations, Excel spreadsheets, basically any OLE object. Of course a Binder file itself was an OLE compound object. They had the extension OBD, and templates used OBT. The PRONOM registry has PUIDs for the different Binder versions, but there are some issues.

PUID	Format Name	Format Version	Extension
fmt/237	Microsoft Office Binder File for Windows	95	obd
fmt/240	Microsoft Office Binder File for Windows	97-2000	obd
fmt/238	Microsoft Office Binder Template for Windows	95	obt
fmt/241	Microsoft Office Binder Template for Windows	97-2000	obt
fmt/239	Microsoft Office Binder Wizard for Windows	95	obz
fmt/242	Microsoft Office Binder Wizard for Windows	97-2000	obz

filename : 'Binder95-s01.obd'
filesize : 5120
modified : 2024-08-08T21:24:34-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/240'
    format  : 'Microsoft Office Binder File for Windows'
    version : '97-2000'
    mime    : 
    class   : 
    basis   : 'extension match obd; container name Binder with name only'

Turns out only one of the PRONOM PUIDs has an actual signature; the others are placeholders. So when I run Siegfried on an Office Binder 95 file, it comes back as fmt/240, which points to an Office Binder 97-2000 file. It’s a simple signature, looking for an internal file named “Binder”, which is common to all the Binder file types.

    <ContainerSignature Id="5500" ContainerType="OLE2">
      <Description>Microsoft Office Binder File for Windows 97-2000</Description>
      <Files>
        <File>
          <Path>Binder</Path>
        </File>
      </Files>
    </ContainerSignature>

Taking a look inside the Office 95 Binder file, we can see the “Binder” file.

Path = Binder95-s01.obd
Type = Compound
Physical Size = 5120
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....          316          320  [5]SummaryInformation
                    .....          144          192  Binder
                    .....          280          320  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
                                   740          832  3 files

hexdump -C Binder95-s01/Binder 
00000000  90 00 00 00 05 00 00 00  00 00 00 00 05 00 00 00  |................|
00000010  00 00 00 00 a1 6a 8a 8e  cc 55 ef 11 ab 06 00 0c  |.....j...U......|
00000020  29 b1 b4 d0 00 00 00 00  00 00 00 00 00 00 00 00  |)...............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 40 86 61 a6  |............@.a.|
00000040  0b ea da 01 00 00 00 00  00 00 00 00 40 86 61 a6  |............@.a.|
00000050  0b ea da 01 09 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 2c 00 00 00  00 00 00 00 01 00 00 00  |....,...........|
00000070  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000080  2c 00 00 00 2c 00 00 00  13 03 00 00 44 02 00 00  |,...,.......D...|

The bytes within a “Binder” file have some patterns, but nothing decipherable.

Microsoft Office Binder was only included in three versions of Office: Office 95, 97, and 2000. Let’s look at the other two versions.

Path = Binder97-s04.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....           28           64  HdrFtr
                    .....          144          192  Binder
                    .....          260          320  [5]SummaryInformation
                    .....          404          448  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
                                   836         1024  4 files

Path = Binder2K-S01.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....           28           64  HdrFtr
                    .....          144          192  Binder
                    .....          260          320  [5]SummaryInformation
                    .....          232          256  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
                                   664          832  4 files

It looks like versions 97 and 2000 have an extra file. The “HdrFtr” file seems to reference a header and footer, which according to documentation was a feature added in Office 97.

What’s new in Office Binder 97

Office Binder makes it possible for you to group all your documents, workbooks, and presentations for a project in one place. To get started with Office Binder 97, add a new or existing document to your binder. Use the new Office 97 features while you work in a binder……. Print headers and footers for a binder

We can use the “HdrFtr” file within the container to differentiate between the 95 and 97-2000 formats. A closer look at the DocumentSummaryInformation file might help with a more precise identification later. There doesn’t seem to be anything to distinguish an OBD file from an OBT template file, so those PUIDs may not be needed. The other format related to the Binder software has the OBZ extension. It is called a Wizard template file in some documentation, but I have been unable to find any type of “Wizard” functionality in the Office Binder apps to generate a file. The OBZ format seems to have something to do with macros in Visual Basic. Luckily there are a few examples available on Office install discs.

Path = CLIENT.OBZ
Type = Compound
Physical Size = 364032
Extension = doc
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1995-07-05 17:25:15 D....                            7
1995-07-05 17:25:14 D....                            5
1995-07-05 17:25:13 D....                            4
                    .....          106          128  4/[1]CompObj
                    .....           20           64  4/[1]Ole
                    .....         8880         9216  4/WordDocument
                    .....           32           64  4/[3]View000
                    .....          492          512  4/[5]SummaryInformation
                    .....          236          256  4/[5]DocumentSummaryInformation
1995-07-05 17:25:14 D....                            6
                    .....        17760        17920  6/Book
                    .....           20           64  6/[1]Ole
                    .....            0            0  6/[3]View000
                    .....          102          128  6/[1]CompObj
                    .....         3260         3264  6/[5]SummaryInformation
                    .....          192          192  6/[5]DocumentSummaryInformation
                    .....          106          128  5/[1]CompObj
                    .....           20           64  5/[1]Ole
                    .....         8055         8192  5/WordDocument
                    .....           32           64  5/[3]View000
                    .....         7280         7680  5/[5]SummaryInformation
                    .....          220          256  5/[5]DocumentSummaryInformation
1995-07-05 17:25:16 D....                            9
1995-07-05 17:25:15 D....                            8
                    .....        13857        14336  8/Book
                    .....           20           64  8/[1]Ole
                    .....            0            0  8/[3]View000
                    .....          102          128  8/[1]CompObj
                    .....          188          192  8/[5]SummaryInformation
                    .....          196          256  8/[5]DocumentSummaryInformation
                    .....          854          896  Binder
1995-07-05 17:25:19 D....                            10
                    .....        80382        80384  10/Book
                    .....           20           64  10/[1]Ole
                    .....            0            0  10/[3]View000
                    .....          102          128  10/[1]CompObj
                    .....         4044         4096  10/[5]SummaryInformation
1995-07-05 17:25:19 D....                            10/_VBA_PROJECT
                    .....         9425         9728  10/_VBA_PROJECT/812f9922c6
                    .....        12302        12800  10/_VBA_PROJECT/7b2f9922a4
                    .....        36937        37376  10/_VBA_PROJECT/dir
                    .....         6609         6656  10/_VBA_PROJECT/7e2f9922b5
                    .....        23014        23040  10/_VBA_PROJECT/872f9922e8
                    .....         7995         8192  10/_VBA_PROJECT/842f9922d9
                    .....         5338         5632  10/_VBA_PROJECT/902f992333
                    .....        36119        36352  10/_VBA_PROJECT/8d2f99231e
                    .....        18129        18432  10/_VBA_PROJECT/932f992342
                    .....        13055        13312  10/_VBA_PROJECT/b42fbcaa59
                    .....          208          256  10/[5]DocumentSummaryInformation
                    .....         4228         4608  [5]SummaryInformation
                    .....          956          960  [5]DocumentSummaryInformation
                    .....          106          128  9/[1]CompObj
                    .....           20           64  9/[1]Ole
                    .....         5914         6144  9/WordDocument
                    .....            0            0  9/[3]View000
                    .....         1520         1536  9/[5]SummaryInformation
                    .....          220          256  9/[5]DocumentSummaryInformation
                    .....        16141        16384  7/Book
                    .....           20           64  7/[1]Ole
                    .....            0            0  7/[3]View000
                    .....          102          128  7/[1]CompObj
                    .....          188          192  7/[5]SummaryInformation
                    .....          192          192  7/[5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1995-07-05 17:25:19             345316       351168  55 files, 8 folders

Sure enough, the OBZ file has a Visual Basic macro (_VBA_PROJECT). Unfortunately, it appears to be nested in an additional folder within the container, with a numeric name which is likely to change from file to file. That fact will make identification in PRONOM much more difficult, as the signatures are not designed for variable names. Possibly something we can investigate later.
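
Outside of PRONOM, a variable folder name is easy to handle with a pattern. The Python sketch below takes a list of internal paths, like the 7z listing above, and checks for a _VBA_PROJECT storage nested under a numeric folder. The assumption that the parent folder is always all digits comes from this one sample, and the names are my own.

```python
import re

# A numeric folder name followed by the _VBA_PROJECT storage, e.g. "10/_VBA_PROJECT/dir".
VBA_UNDER_NUMBERED_FOLDER = re.compile(r"^\d+/_VBA_PROJECT/")


def has_nested_vba(paths) -> bool:
    """Return True if any path sits inside <number>/_VBA_PROJECT/."""
    return any(VBA_UNDER_NUMBERED_FOLDER.match(path) for path in paths)
```

This is only useful for triaging samples; it does not solve the signature problem described above.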

Microsoft Binder was only released in Office 95, 97, and 2000, but was supported in Office XP and 2003 through an UNBIND.EXE application which would simply separate all the different objects back out to the individual files.

The Microsoft Office Binder is not included in Office 2003. However, if a Binder file created in a previous version of Office contains information you want to access, you can use the Unbind tool to pull out the information and save it in the formats of the appropriate programs. In order to do this procedure, the Unbind tool must be installed.

As always, you can look at some sample files and my proposal for updated signatures on my GitHub page.

PowerBI

February 23, 2024 by Thor

I think when most of us have some data to sort or make sense of, we tend to gravitate toward a spreadsheet, using Excel or LibreOffice, or if you really like to party, OpenRefine. There are plenty of memes out there representing the frustration people have with the bugs, features, and limitations of Excel specifically.

Optimist: The glass is ½ full.
Pessimist: The glass is ½ empty.
Excel: The glass is January 2nd.
— jxf@mastodon.social (@jxxf) May 7, 2022

There are more tools out there for making sense of data; one that some people have access to is Microsoft’s more advanced Power BI tool. Marketed as a data visualization tool, it is accessible to many with an Office 365 subscription. It offers more features than Excel and isn’t as limited in row maximums.

Power BI was recently the topic of a Code4Lib editorial issue. The writer of an article for their journal posted two Power BI datasets, which a reader later noticed contained private data. After some miscommunications and misunderstandings, an open letter was drafted and received some support. Code4Lib did release a statement and lessons were learned.

One statement from the Code4Lib staff caught my eye. “The released files were in a proprietary file format, Microsoft Power BI, with which none of the editors have experience.”

We all use the tools for our jobs that we are most familiar with or that are available to us. No one can be an expert in all file formats. Some of us try, but things change so fast it is impossible. But we can do more in documenting and making formats identifiable through the tools we use for digital preservation. The File Format Wiki and PRONOM had no mention of Power BI, so let’s change that.

Microsoft Power BI was released in 2011 and has been part of the Microsoft Power Platform. Power BI can gather data from many sources. The software can be accessed in the Office 365 cloud, but also using a Desktop application. In the desktop application, all the data sources and connections are stored in a single file with the extension PBIX. But there are other related formats.

filename : 'PowerBI-Test.pbix'
filesize : 401951
modified : 2024-02-22T11:29:41-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    class   : 'Aggregate'
    basis   : 'byte match at [[0 4] [401867 3] [401929 4]]'
    warning : 'extension mismatch'

Path = PowerBI-Test.pbix
Type = zip
Physical Size = 401951

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40 .....            8           10  Version
2024-02-22 18:29:40 .....          488          230  [Content_Types].xml
2024-02-22 18:29:40 .....       397312       397312  DataModel
2024-02-22 18:29:40 .....         2848          882  Report/Layout
2024-02-22 18:29:40 .....          328          161  Settings
2024-02-22 18:29:40 .....          136          120  Connections
2024-02-22 18:29:40 .....        18972         1733  Report/StaticResources/SharedResources/BaseThemes/CY24SU02.json
2024-02-22 18:29:40 .....          358          357  SecurityBindings
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40             420450       400805  8 files

Just like many modern Microsoft formats it is a ZIP container with a mixture of XML and JSON. There is also a DataModel file along with Settings and Connections. A quick peek at some of the contents shows us:

hexdump -C PowerBI-Test/Version | head
00000000  31 00 2e 00 32 00 38 00                           |1...2.8.|

hexdump -C PowerBI-Test/DataModel | head
00000000  ff fe 53 00 54 00 52 00  45 00 41 00 4d 00 5f 00  |..S.T.R.E.A.M._.|
00000010  53 00 54 00 4f 00 52 00  41 00 47 00 45 00 5f 00  |S.T.O.R.A.G.E._.|
00000020  53 00 49 00 47 00 4e 00  41 00 54 00 55 00 52 00  |S.I.G.N.A.T.U.R.|
00000030  45 00 5f 00 29 00 21 00  40 00 23 00 24 00 25 00  |E._.).!.@.#.$.%.|
00000040  5e 00 26 00 2a 00 28 00  3c 00 42 00 61 00 63 00  |^.&.*.(.<.B.a.c.|
00000050  6b 00 75 00 70 00 4c 00  6f 00 67 00 3e 00 3c 00  |k.u.p.L.o.g.>.<.|
00000060  42 00 61 00 63 00 6b 00  75 00 70 00 52 00 65 00  |B.a.c.k.u.p.R.e.|
00000070  73 00 74 00 6f 00 72 00  65 00 53 00 79 00 6e 00  |s.t.o.r.e.S.y.n.|
00000080  63 00 56 00 65 00 72 00  73 00 69 00 6f 00 6e 00  |c.V.e.r.s.i.o.n.|
00000090  3e 00 31 00 34 00 30 00  3c 00 2f 00 42 00 61 00  |>.1.4.0.<./.B.a.|

hexdump -C PowerBI-Test/\[Content_Types\].xml | head
00000000  ef bb bf 3c 3f 78 6d 6c  20 76 65 72 73 69 6f 6e  |...<?xml version|
00000010  3d 22 31 2e 30 22 20 65  6e 63 6f 64 69 6e 67 3d  |="1.0" encoding=|
00000020  22 75 74 66 2d 38 22 3f  3e 3c 54 79 70 65 73 20  |"utf-8"?><Types |
00000030  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 73 63  |xmlns="http://sc|
00000040  68 65 6d 61 73 2e 6f 70  65 6e 78 6d 6c 66 6f 72  |hemas.openxmlfor|
00000050  6d 61 74 73 2e 6f 72 67  2f 70 61 63 6b 61 67 65  |mats.org/package|
00000060  2f 32 30 30 36 2f 63 6f  6e 74 65 6e 74 2d 74 79  |/2006/content-ty|
00000070  70 65 73 22 3e 3c 44 65  66 61 75 6c 74 20 45 78  |pes"><Default Ex|
00000080  74 65 6e 73 69 6f 6e 3d  22 6a 73 6f 6e 22 20 43  |tension="json" C|
00000090  6f 6e 74 65 6e 74 54 79  70 65 3d 22 22 20 2f 3e  |ontentType="" />|
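
The Version dump above is just a short UTF-16 little-endian string with no byte order mark. A minimal Python sketch for reading it, assuming the entry has already been extracted from the ZIP (the function name is mine):

```python
def read_pbix_version(raw: bytes) -> str:
    """Decode the contents of the PBIX "Version" entry.

    Assumption: UTF-16 little-endian without a BOM, as in the sample above.
    """
    return raw.decode("utf-16-le")
```

For the eight bytes shown in the dump, this gives the string "1.28".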

It looks like the ZIP structure follows the standard for OpenXML packages, as it contains a “[Content_Types].xml” file, so using this XML alone would clash with too many other formats. From what I could find, the “DataModel” file is what stores the data and is more unique to this format, even though the name is pretty generic. Using a string within the file would probably make identification more accurate. The “DataModel” file does have UTF-16 (double-byte Unicode) strings we can use. “STREAM_STORAGE_SIGNATURE” seems like a unique enough string to use, but it looks like it may not be unique to PBIX: the “DataModel” file is in the Microsoft “MS-XLDM” format, the “Spreadsheet Data Model File Format”.

There is a variation of the DataModel file, and I am not sure when the standard form is used versus this variation, which begins “This backup was created using XPress9 compression”. I am not sure if it is a matter of versioning or of how the file is saved, but both seem to function correctly.

hexdump -C DataModel | head
00000000  54 00 68 00 69 00 73 00  20 00 62 00 61 00 63 00  |T.h.i.s. .b.a.c.|
00000010  6b 00 75 00 70 00 20 00  77 00 61 00 73 00 20 00  |k.u.p. .w.a.s. .|
00000020  63 00 72 00 65 00 61 00  74 00 65 00 64 00 20 00  |c.r.e.a.t.e.d. .|
00000030  75 00 73 00 69 00 6e 00  67 00 20 00 58 00 50 00  |u.s.i.n.g. .X.P.|
00000040  72 00 65 00 73 00 73 00  39 00 20 00 63 00 6f 00  |r.e.s.s.9. .c.o.|
00000050  6d 00 70 00 72 00 65 00  73 00 73 00 69 00 6f 00  |m.p.r.e.s.s.i.o.|
00000060  6e 00 2e 00 00 00 00 b0  07 00 76 75 00 00 2a d7  |n.........vu..*.|
00000070  86 4e 00 b0 07 00 ad ab  03 00 2c cb 06 00 00 00  |.N........,.....|
00000080  00 00 f8 6c 86 7f 00 00  00 00 68 01 56 6e 00 00  |...l......h.Vn..|
00000090  20 82 67 49 52 06 00 f6  ab fc fc fe 2d f6 da 8b  | .gIR.......-...|

After a bit of digging, it seems the MS-XLDM format can also be found within an XLSX file. I found an example with these datasets. Within an XLSX there can be a file named “xl/model/item.data”, and it has the same structure as the DataModel within a PBIX.

hexdump -C Customer Profitability Sample-no-PV/xl/model/item.data | head
00000000  ff fe 53 00 54 00 52 00  45 00 41 00 4d 00 5f 00  |..S.T.R.E.A.M._.|
00000010  53 00 54 00 4f 00 52 00  41 00 47 00 45 00 5f 00  |S.T.O.R.A.G.E._.|
00000020  53 00 49 00 47 00 4e 00  41 00 54 00 55 00 52 00  |S.I.G.N.A.T.U.R.|
00000030  45 00 5f 00 29 00 21 00  40 00 23 00 24 00 25 00  |E._.).!.@.#.$.%.|
00000040  5e 00 26 00 2a 00 28 00  3c 00 42 00 61 00 63 00  |^.&.*.(.<.B.a.c.|
00000050  6b 00 75 00 70 00 4c 00  6f 00 67 00 3e 00 3c 00  |k.u.p.L.o.g.>.<.|
00000060  42 00 61 00 63 00 6b 00  75 00 70 00 52 00 65 00  |B.a.c.k.u.p.R.e.|
00000070  73 00 74 00 6f 00 72 00  65 00 53 00 79 00 6e 00  |s.t.o.r.e.S.y.n.|
00000080  63 00 56 00 65 00 72 00  73 00 69 00 6f 00 6e 00  |c.V.e.r.s.i.o.n.|
00000090  3e 00 31 00 35 00 30 00  3c 00 2f 00 42 00 61 00  |>.1.5.0.<./.B.a.|

Because this file has a different filename and is in a different path, using “DataModel” should keep identification specific to a PBIX file.

The Power BI Report has a template option. This format uses the .PBIT extension and doesn’t contain any data, only a template to use with other data. The structure is roughly the same, but instead of the “DataModel” file it contains “DataModelSchema”, which appears to be a JSON file.

hexdump -C DataModelSchema | head
00000000  7b 00 0d 00 0a 00 20 00  20 00 22 00 6e 00 61 00  |{..... . .".n.a.|
00000010  6d 00 65 00 22 00 3a 00  20 00 22 00 38 00 36 00  |m.e.".:. .".8.6.|
00000020  65 00 34 00 32 00 62 00  33 00 30 00 2d 00 30 00  |e.4.2.b.3.0.-.0.|
00000030  34 00 34 00 33 00 2d 00  34 00 36 00 30 00 63 00  |4.4.3.-.4.6.0.c.|
00000040  2d 00 61 00 36 00 66 00  36 00 2d 00 36 00 66 00  |-.a.6.f.6.-.6.f.|
00000050  34 00 35 00 35 00 66 00  64 00 64 00 31 00 61 00  |4.5.5.f.d.d.1.a.|
00000060  35 00 36 00 22 00 2c 00  0d 00 0a 00 20 00 20 00  |5.6.".,..... . .|
00000070  22 00 63 00 6f 00 6d 00  70 00 61 00 74 00 69 00  |".c.o.m.p.a.t.i.|
00000080  62 00 69 00 6c 00 69 00  74 00 79 00 4c 00 65 00  |b.i.l.i.t.y.L.e.|
00000090  76 00 65 00 6c 00 22 00  3a 00 20 00 31 00 35 00  |v.e.l.".:. .1.5.|

The DataModelSchema JSON has some plain text strings which could be used for identification. Later in the file there is a string, “defaultPowerBIDataSourceVersion”.

000001c0  20 00 20 00 20 00 7d 00  2c 00 0d 00 0a 00 20 00  | . . .}.,..... .|
000001d0  20 00 20 00 20 00 22 00  64 00 65 00 66 00 61 00  | . . .".d.e.f.a.|
000001e0  75 00 6c 00 74 00 50 00  6f 00 77 00 65 00 72 00  |u.l.t.P.o.w.e.r.|
000001f0  42 00 49 00 44 00 61 00  74 00 61 00 53 00 6f 00  |B.I.D.a.t.a.S.o.|
00000200  75 00 72 00 63 00 65 00  56 00 65 00 72 00 73 00  |u.r.c.e.V.e.r.s.|
00000210  69 00 6f 00 6e 00 22 00  3a 00 20 00 22 00 70 00  |i.o.n.".:. .".p.|
00000220  6f 00 77 00 65 00 72 00  42 00 49 00 5f 00 56 00  |o.w.e.r.B.I._.V.|
00000230  33 00 22 00 2c 00 0d 00  0a 00 20 00 20 00 20 00  |3.".,..... . . .|

This seems like the best string for identifying the template format.
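A small Python sketch of that check, assuming the schema is always UTF-16LE without a byte order mark, as in the dump above. The function name and the test string are hypothetical and only there to exercise the search.

```python
# The marker string from the dump, encoded the same way the file stores it.
MARKER = "defaultPowerBIDataSourceVersion".encode("utf-16-le")


def looks_like_data_model_schema(data: bytes) -> bool:
    """True if the data starts like UTF-16LE JSON and contains the marker."""
    starts_with_brace = data.startswith(b"{\x00")
    return starts_with_brace and MARKER in data


# Made-up miniature schema, just to show the call.
sample = '{\r\n  "defaultPowerBIDataSourceVersion": "powerBI_V3"\r\n}'
print(looks_like_data_model_schema(sample.encode("utf-16-le")))
```

Because the marker sits at a variable distance from the start of the file, a signature built on it would need a variable offset rather than a fixed one.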

As usual you can find my signature proposal on my GitHub along with a couple “safe” samples.

Picture It!

December 29, 2023 by Thor 1 Comment

Most everyone has heard of Microsoft Office, the suite of applications used by millions every day. Fewer people know about Microsoft Works, which was a lower-cost alternative but was quite popular as a home office suite of applications. One tool which often came with the Works suite was a digital image tool called Picture It!

Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.

Picture It! used a wizard-type interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature a new file format was used to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to the FlashPix image format, which was meant to solve many image delivery issues but never seemed to take hold, despite being created by Kodak, HP, and others. In fact many of the MIX files I found on Microsoft disks are actually FlashPix files.

The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.

7z l PictureIt1-s02.mix                 

--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....          328          384  [5]Data Object 000001
                    .....          396          448  [5]Transform 000004
                    .....          872          896  [5]Operation 000001
                    .....          320          320  [1]CompObj
                    .....          292          320  [5]Global Info
                    .....          872          896  [5]Operation 000002
                    .....          144          192  [5]Operation 000003
                    .....          684          704  [5]Transform 000008
                    .....         1028         1088  [5]Transform 000009
                    .....          328          384  [5]Data Object 000009
                    .....          324          384  [5]Data Object 000005
2023-12-27 11:04:39 D....                            Data Object Store 000001
                    .....          328          384  [5]Data Object 000010
                    .....        20932        20992  [5]SummaryInformation
                    .....          200          256  [5]Microsoft Embedding Info
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0001
                    .....         1400         1408  Data Object Store 000001/[5]Image Contents
                    .....          230          256  Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0000
                    .....           28           64  Data Object Store 000001/Resolution 0000/Subimage 0000 Data
                    .....           80          128  Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0002
                    .....           28           64  Data Object Store 000001/Resolution 0002/Subimage 0000 Data
                    .....          208          256  Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D....                            Data Object Store 000001/Resolution 0004
                    .....           28           64  Data Object Store 000001/Resolution 0004/Subimage 0000 Data
                    .....         1792         1792  Data Object Store 000001/Resolution 0004/Subimage 0000 Header
                    .....          124          128  Data Object Store 000001/[5]SummaryInformation
                    .....           28           64  Data Object Store 000001/Resolution 0005/Subimage 0000 Data
                    .....         6976         7168  Data Object Store 000001/Resolution 0005/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0003/Subimage 0000 Data
                    .....          544          576  Data Object Store 000001/Resolution 0003/Subimage 0000 Header
                    .....           28           64  Data Object Store 000001/Resolution 0001/Subimage 0000 Data
                    .....          128          128  Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------  ------------------------
2023-12-27 11:04:39              38698        39872  29 files, 7 folders

This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:

sf PictureIt1-s02.mix 
---
siegfried   : 1.11.0
scandate    : 2023-12-27T11:06:32-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'
    warning :

Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:

    <ContainerSignature Id="17015" ContainerType="OLE2">
      <Description>Microsoft Picture It! Image File</Description>
      <Files>
        <File>
          <Path>CompObj</Path>
          <BinarySignatures>
            <InternalSignatureCollection>
              <InternalSignature ID="17015">
                <ByteSequence Reference="BOFoffset">
                  <SubSequence Position="1" SubSeqMinOffset="32"
                               SubSeqMaxOffset="32">
                    <Sequence>'Microsoft Picture It! version 1 Picture'</Sequence>
                  </SubSequence>
                </ByteSequence>
              </InternalSignature>
            </InternalSignatureCollection>
          </BinarySignatures>
        </File>
      </Files>
    </ContainerSignature>

The container signature is looking into the OLE container for the “CompObj” file (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting at byte offset 32. That is pretty specific. The sample file I am using as an example has the following string of bytes.

hexdump -C PictureIt1-s02/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 00 68 61 56  |.............haV|
00000010  54 c1 ce 11 85 53 00 aa  00 a1 f9 5b 1e 00 00 00  |T....S.....[....|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 50 69  63 74 75 72 65 00 27 00  |e It! Picture.'.|
00000040  00 00 7b 35 36 36 31 36  38 30 30 2d 43 31 35 34  |..{56616800-C154|
00000050  2d 31 31 43 45 2d 38 35  35 33 2d 30 30 41 41 30  |-11CE-8553-00AA0|
00000060  30 41 31 46 39 35 42 7d  00 13 00 00 00 50 69 63  |0A1F95B}.....Pic|
00000070  74 75 72 65 49 74 21 2e  50 69 63 74 75 72 65 00  |tureIt!.Picture.|

Ok, so this sample has a similar string but is missing the “version 1” text. It seems the samples used to create the PRONOM signature included “version 1” in the header of CompObj. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look at a file from version 2 to compare:

hexdump -C PictureIt2-s01/\[1\]CompObj 
00000000  01 00 fe ff 03 0a 00 00  ff ff ff ff 50 28 72 2d  |............P(r-|
00000010  4b 8c d0 11 a9 6f 00 a0  c9 05 41 0d 28 00 00 00  |K....o....A.(...|
00000020  4d 69 63 72 6f 73 6f 66  74 20 50 69 63 74 75 72  |Microsoft Pictur|
00000030  65 20 49 74 21 20 76 65  72 73 69 6f 6e 20 32 20  |e It! version 2 |
00000040  50 69 63 74 75 72 65 00  27 00 00 00 7b 32 44 37  |Picture.'...{2D7|
00000050  32 32 38 35 30 2d 38 43  34 42 2d 31 31 44 30 2d  |22850-8C4B-11D0-|
00000060  41 39 36 46 2d 30 30 41  30 43 39 30 35 34 31 30  |A96F-00A0C905410|
00000070  44 7d 00 f4 39 b2 71 50  00 00 00 4d 00 69 00 63  |D}..9.qP...M.i.c|

Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the Wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?
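To make the miss concrete, here is a minimal Python sketch of a fixed-offset match like the one the fmt/936 container signature describes. It is not how DROID or siegfried are implemented, just the same comparison written out. The bytes are the first four rows of the version 1 CompObj dump above, and the function name is my own.

```python
SIGNATURE = b"Microsoft Picture It! version 1 Picture"
OFFSET = 32  # SubSeqMinOffset and SubSeqMaxOffset are both 32


def matches_fmt_936(compobj: bytes) -> bool:
    """Compare the bytes at the fixed offset against the signature string."""
    return compobj[OFFSET:OFFSET + len(SIGNATURE)] == SIGNATURE


# First 64 bytes of the CompObj stream from PictureIt1-s02.mix.
V1_SAMPLE = bytes.fromhex(
    "0100feff030a0000ffffffff00686156"
    "54c1ce11855300aa00a1f95b1e000000"
    "4d6963726f736f667420506963747572"
    "65204974212050696374757265002700"
)

print(matches_fmt_936(V1_SAMPLE))
```

The sample carries “Microsoft Picture It! Picture” at offset 32, so the comparison fails at the word “version” and the file falls back to the generic OLE2 match.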

Diving into this and gathering many samples has brought a lot of variants to surface. Let’s see if we can list all the CompObj header variants.

Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}

Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}

Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001); after that the Picture It! software used a new format, PNG Plus, to store the layered stacks. More on that in a future post! Some later versions do seem to be able to open the older MIX format. Version 4 of the MIX format seems to be the last, as the 2001 software had only version 4 files on it. It is probably safe to say only the 4 versions are needed for identification.

You may notice the additional unique identifier I included in each format. This is called a Class ID for the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This Unique ID could possibly be a better solution for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have a unique ID. With all the variations in the version 1 strings, the ID remains the same. For version 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1. Not to complicate things.
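Both values can be pulled out of a CompObj stream with a few lines of Python. This is a sketch based only on the layout visible in the dumps in this post: a 16-byte Class ID at offset 12, a 4-byte little-endian length at offset 28, and a null-terminated string at offset 32. The function name is hypothetical, and the bytes are the first five rows of the version 2 dump.

```python
import struct
import uuid
from typing import Tuple


def parse_compobj(data: bytes) -> Tuple[str, str]:
    """Return (class_id, user_type_string) from the start of a CompObj stream."""
    # OLE stores the first three GUID fields little-endian, hence bytes_le.
    class_id = str(uuid.UUID(bytes_le=data[12:28])).upper()
    (length,) = struct.unpack_from("<I", data, 28)
    raw = data[32:32 + length]
    user_type = raw.rstrip(b"\x00").decode("latin-1")
    return class_id, user_type


# First 80 bytes of the CompObj stream from PictureIt2-s01.mix.
V2_SAMPLE = bytes.fromhex(
    "0100feff030a0000ffffffff5028722d"
    "4b8cd011a96f00a0c905410d28000000"
    "4d6963726f736f667420506963747572"
    "65204974212076657273696f6e203220"
    "5069637475726500270000007b324437"
)

class_id, user_type = parse_compobj(V2_SAMPLE)
print(class_id)
print(user_type)
```

Reading the binary Class ID at offset 12 avoids depending on the text string at all, which is handy when the string has as many variants as version 1 does.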

So it seems in order to get proper identification of these similar formats we need to:

Clean up version 1 identification for fmt/936
Add a signature for 2, 3, and 4
Add a version 2 signature for the PhotoDraw format
Add some additional signature variations for the FlashPix format.
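One way to picture the end result is a simple lookup from the CompObj string to a format label. The Python sketch below only contains the strings gathered from my samples above, and the labels and function name are placeholders of mine, not PRONOM names. The FlashPix sample with an empty string is left out because an empty key would not identify anything.

```python
from typing import Optional

# CompObj user type strings seen in the samples, mapped to a working label.
USER_TYPE_LABELS = {
    "Picture It! Picture": "Picture It! 1",
    "Microsoft Picture It! Picture": "Picture It! 1",
    "Microsoft Picture It! version 1 Picture": "Picture It! 1",
    "Picture It! Collage": "Picture It! 1",
    "Microsoft Picture It! version 2 Picture": "Picture It! 2",
    "Microsoft Picture It! version 3 Picture": "Picture It! 3",
    "Microsoft Picture It! version 4 Picture": "Picture It! 4",
    "Microsoft PhotoDraw version 1 Picture": "PhotoDraw 1",
    "Microsoft PhotoDraw version 2 Picture": "PhotoDraw 2",
    "FlashPix Object": "FlashPix",
    "Picture It! FlashPix": "FlashPix",
    "LPI FlashPix": "FlashPix",
    "FlashPix_Object": "FlashPix",
    "Picture It!": "FlashPix",
    "Flashpix Toolkit Application": "FlashPix",
}


def label_for_user_type(user_type: str) -> Optional[str]:
    """Exact-match lookup; returns None for strings not seen in the samples."""
    return USER_TYPE_LABELS.get(user_type)


print(label_for_user_type("Microsoft Picture It! Picture"))
print(label_for_user_type("Microsoft PhotoDraw version 2 Picture"))
```

An exact match matters here, since “Picture It!” on its own turned up in a FlashPix sample while “Picture It! Picture” is a version 1 MIX file.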

The Class IDs could be used to distinguish different versions and formats, but many of the IDs are identical, which could mean they are the same format. For now we can just add the additional variation strings and that should identify everything. The FlashPix format needs more research as there are so many different variations and it’s so close to the MIX format. Take a look at my GitHub submission, maybe you have some additional variations to add?

Multiplan

November 10, 2023 by Thor Leave a comment

This is a follow-up to the post “EARLY MICROSOFT EXCEL” from earlier this year.

I have to admit, often when I am researching file formats I can get distracted by a shinier format I come across. I often go down rabbit holes and forget the reason I started down the path I am on. I try and focus on the current needs in my life as a Digital Preservation Manager, but can get easily sidetracked. I always look forward to November every year so I can celebrate World Digital Preservation Day which sometimes comes along with a PRONOM research week. This gives me a chance to look at formats that may need attention which are not normally on my radar.

This week I am taking a look again at Multiplan. There is a PRONOM PUID for version 4, but it does not have a description, nor does it have a binary signature. It was also lacking a File Format Wiki entry. So I decided to dive in. I had already bumped into the format while doing some research on early Microsoft Excel formats. This includes the SYLK format, which needed a little update.

Microsoft Multiplan was the parent of Microsoft Excel. Multiplan was built for many different types of computers in the 1980s, but was never ported to Windows. So to use Multiplan you have to be comfortable with using DOS. If you want to take Multiplan for a spin, head over to PCjs Machines and load up one of the many emulated systems they have.

In the end, Multiplan had four versions, but the last one, version 4.2, had some big changes, especially to the file format. More on that in a minute.

Multiplan Version 1 – DOS

hexdump -C MP1.MOD  | head
00000000  08 e7 00 00 58 09 01 00  08 00 01 00 00 00 0a 00  |....X...........|
00000010  40 00 00 00 2e f5 0a 80  27 07 94 00 12 00 01 00  |@.......'.......|
00000020  0a 00 01 00 0c 0a 08 00  27 00 0d 80 04 00 01 00  |........'.......|
00000030  54 00 00 00 27 00 10 00  54 52 41 4e 53 46 48 f5  |T...'...TRANSFH.|
00000040  00 80 84 0a 68 61 52 f5  58 f5 5a f5 4e f5 0c 0a  |....haR.X.Z.N...|
00000050  12 00 01 00 72 f5 72 f5  0a 80 4b 0b 0f 00 12 00  |....r.r...K.....|
00000060  0c 0a 01 00 0c 00 01 00  08 00 20 4e 40 00 09 00  |.......... N@...|
00000070  8a f5 0a 80 4a 07 30 00  48 00 01 00 20 4e 00 00  |....J.0.H... N..|
00000080  28 0c 18 00 04 00 0d 80  03 00 28 0c 04 00 00 00  |(.........(.....|
00000090  26 00 00 00 54 52 d0 01  00 00 a4 f5 0a 00 62 0b  |&...TR........b.|

Multiplan Version 1 – Macintosh

hexdump -C Multiplan1  | head
00000000  11 ab 00 00 13 e8 00 00  00 00 00 00 00 02 02 8c  |................|
00000010  00 18 00 0e 02 a4 02 b2  00 0e 02 fe 00 03 00 0e  |................|
00000020  00 bd 01 e3 2f 0f 00 08  15 5e 19 d1 03 5e 19 dd  |..../....^...^..|
00000030  61 60 60 5e 16 90 00 67  60 60 60 8f 5f 03 e8 7a  |a``^...g```._..z|
00000040  30 61 60 60 13 5f 03 e8  7b 90 00 67 60 60 60 8f  |0a``._..{..g```.|
00000050  16 85 67 60 60 60 8f 16  6d 85 61 60 60 13 5e 10  |..g```..m.a``.^.|
00000060  7b 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 6e  |{..g```..z1.j..n|
00000070  85 14 77 60 16 6f 85 67  60 60 60 90 00 67 60 60  |..w`.o.g```..g``|
00000080  60 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 70  |`..g```..z1.j..p|
00000090  85 14 77 60 16 71 85 67  60 60 60 90 00 67 60 60  |..w`.q.g```..g``|

Multiplan Version 2 – DOS

hexdump -C MP2.MOD | head
00000000  0c ec 00 00 08 ab 08 00  1f 00 1a 00 03 00 27 03  |..............'.|
00000010  4b 05 00 00 00 00 00 00  00 00 00 1d c8 14 03 00  |K...............|
00000020  00 00 2f 00 9a 2e b3 fc  46 02 34 04 f3 16 00 00  |../.....F.4.....|
00000030  00 00 00 00 08 00 10 22  00 00 0d 06 84 1d 08 1d  |......."........|
00000040  ff 03 83 0a c8 18 48 1a  02 19 00 00 00 00 15 1b  |......H.........|
00000050  98 15 85 15 03 00 2a 00  00 37 46 32 1c 00 18 00  |......*..7F2....|
00000060  00 00 01 00 01 00 a9 03  0f 80 e8 14 00 00 01 00  |................|
00000070  6a 1c 00 00 01 00 0d 00  0f 80 0a 15 00 00 77 20  |j.............w |
00000080  00 00 01 00 6e 61 6c 20  00 00 2a 00 00 00 04 00  |....nal ..*.....|
00000090  00 00 0d 00 14 19 00 00  d4 06 0e 80 24 15 00 00  |............$...|

The DOS files for Version 2 begin with 0CEC0000 08AB0800, but a file for the Xenix system starts with 0AEC0000 08AB0800. So it appears the first byte may be different depending on the system.

Multiplan Version 3 – DOS

hexdump -C MP3.MOD | head                         
00000000  0c ed 00 00 08 ab 08 00  1f 00 1a 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000110  00 00 02 00 01 00 00 00  00 00 ff 0f ff 00 00 00  |................|
00000120  00 00 05 00 06 00 46 00  36 00 42 00 00 00 00 00  |......F.6.B.....|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 01 00 00  |................|
00000150  00 fe 0f 00 fe 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The DOS files for Version 3 begin with a similar hex pattern, 0CED0000 08AB0800. This would make sense as the documentation for Multiplan 4.2 states it supports opening of Version 2 & 3, but not Version 1.
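Put together, the version 2 and 3 observations suggest a check that ignores the first byte and reads the version from the second. Here is a minimal Python sketch, assuming the handful of samples shown here are representative; the function name is hypothetical and this is not the signature I submitted.

```python
from typing import Optional

# Bytes 2-7 are identical in the version 2 and version 3 samples.
SHARED = bytes.fromhex("000008ab0800")
VERSION_BYTE = {0xEC: "2", 0xED: "3"}


def guess_multiplan_version(header: bytes) -> Optional[str]:
    """Return '2' or '3' for headers like the samples above, else None."""
    if len(header) < 8 or header[2:8] != SHARED:
        return None
    # header[0] is skipped on purpose: 0C on DOS, 0A on the Xenix sample.
    return VERSION_BYTE.get(header[1])


print(guess_multiplan_version(bytes.fromhex("0cec000008ab0800")))  # DOS v2
print(guess_multiplan_version(bytes.fromhex("0aec000008ab0800")))  # Xenix v2
print(guess_multiplan_version(bytes.fromhex("0ced000008ab0800")))  # DOS v3
```

Leaving the first byte as a wildcard covers both systems with one pattern, at the cost of being a little less specific.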

There was also a companion product that went along with Multiplan, it was called Microsoft Chart. Here is a file from version 3:

hexdump -C EXAMPLE1.MC | head
00000000  90 01 00 00 08 ab 00 00  00 00 00 00 00 00 04 00  |................|
00000010  80 00 05 00 04 00 43 10  00 00 00 00 00 00 00 00  |......C.........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  e8 ff 04 00 00 22 24 36  a4 1f 00 00 11 00 24 00  |....."$6......$.|
00000090  03 00 64 00 00 00 cc 0c  cc 0c cc 0c cc 0c 00 00  |..d.............|
000000a0  00 00 00 00 ff 7f 00 00  01 f0 00 00 00 5f 00 00  |............._..|
000000b0  00 00 a2 ff a0 ff 00 00  01 f0 01 00 64 00 00 00  |............d...|
000000c0  00 00 01 f0 00 00 01 70  00 00 8e ff 8c ff 00 ff  |.......p........|
000000d0  0d 00 00 00 20 14 01 00  00 00 00 00 00 00 00 00  |.... ...........|

The Chart file format has a similar byte pattern with the 08AB pattern and looks similar to the BIFF format. We will have to make sure it doesn’t conflict with any signatures so it can be identified separately.

Version 4 of Multiplan was the first to use BIFF (Binary Interchange File Format). Technically it is BIFF2; not much is known about BIFF1 or whether it ever existed. BIFF2 is the exact same format Excel 2.0 used, so there will be some problems if we want to identify them separately. They currently identify as fmt/55.

hexdump -C MP4.MOD | head
00000000  09 00 04 00 40 01 10 00  42 00 02 00 b5 01 66 00  |....@...B.....f.|
00000010  1b 00 00 00 00 00 00 00  00 00 ff ff 0f 01 00 01  |................|
00000020  00 01 00 01 00 00 00 00  00 00 00 00 00 0d 00 02  |................|
00000030  00 01 00 0e 00 02 00 01  00 0f 00 02 00 00 00 11  |................|
00000040  00 02 00 00 00 2a 00 02  00 00 00 6b 00 13 00 01  |.....*.....k....|
00000050  00 00 00 00 00 fe 0f 00  fe 40 02 e0 3d d0 2f 00  |.........@..=./.|
00000060  01 00 26 00 08 00 00 00  00 00 00 00 e0 3f 27 00  |..&..........?'.|
00000070  08 00 00 00 00 00 00 00  e0 3f 28 00 08 00 00 00  |.........?(.....|
00000080  00 00 00 00 f0 3f 29 00  08 00 00 00 00 00 00 00  |.....?).........|
00000090  f0 3f 70 00 0b 00 00 00  2e 00 02 04 f0 0a 00 f0  |.?p.............|

hexdump -C EXCEL2.XLS | head
00000000  09 00 04 00 02 00 10 00  0b 00 10 00 71 02 00 00  |............q...|
00000010  01 00 29 00 06 03 00 00  dc 0d 00 00 0c 00 02 00  |..).............|
00000020  64 00 0d 00 02 00 01 00  0e 00 02 00 01 00 0f 00  |d...............|
00000030  02 00 01 00 10 00 08 00  fc a9 f1 d2 4d 62 50 3f  |............MbP?|
00000040  11 00 02 00 00 00 22 00  02 00 00 00 40 00 02 00  |......".....@...|
00000050  00 00 2a 00 02 00 00 00  2b 00 02 00 00 00 25 00  |..*.....+.....%.|
00000060  02 00 2c 01 31 00 09 00  c8 00 00 00 04 48 65 6c  |..,.1........Hel|
00000070  76 32 00 0e 00 00 00 00  00 00 00 90 01 00 00 00  |v2..............|
00000080  00 00 8d 31 00 09 00 c8  00 01 00 04 48 65 6c 76  |...1........Helv|
00000090  32 00 0e 00 00 00 00 00  00 00 bc 02 00 00 00 00  |2...............|

You can see in the hex values above a difference of two bytes in the header: at offset 4 the Multiplan file has 40 01 where the Excel file has 02 00. The reason the Multiplan file identifies as an Excel 2 file is the PRONOM signature ignores those two bytes and allows them to be anything. Some specifications say these aren’t used, but clearly there is a use for them. We could probably use the same signature for Multiplan, but include the two bytes, then give the Multiplan signature priority.
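Here is a short Python sketch that splits the first eight bytes of both files into fields, assuming the usual BIFF record layout of a 2-byte record id and a 2-byte length followed by the record body, all little-endian. The field names are my own labels, and the two headers are copied from the dumps above.

```python
import struct
from typing import Tuple


def parse_bof(header: bytes) -> Tuple[int, int]:
    """Return (first_field, second_field) of a 4-byte BOF record body."""
    record_id, length, first, second = struct.unpack_from("<4H", header, 0)
    if record_id != 0x0009 or length != 4:
        raise ValueError("not a BOF record like the ones shown above")
    return first, second


MP4_HEADER = bytes.fromhex("0900040040011000")     # MP4.MOD
EXCEL2_HEADER = bytes.fromhex("0900040002001000")  # EXCEL2.XLS

for name, header in (("Multiplan 4", MP4_HEADER), ("Excel 2", EXCEL2_HEADER)):
    first, second = parse_bof(header)
    print(f"{name}: first=0x{first:04X} second=0x{second:04X}")
```

The first field is the pair of bytes the current signature skips. It reads 0x0140 in the Multiplan sample and 0x0002 in the Excel sample, while the second field is 0x0010 in both.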

Multiplan 4.2 is very different.

hexdump -C MP42.MOD | head
00000000  0c ef 4d 50 a4 01 00 00  00 00 00 00 00 00 00 00  |..MP............|
00000010  00 00 00 00 00 00 80 02  00 00 00 00 00 00 00 2e  |................|
00000020  ff 0f ff 00 01 00 d0 02  d0 02 a0 05 a0 05 d0 2f  |.............../|
00000030  e0 3d 40 02 09 00 03 00  02 04 0a 00 00 00 fe 0f  |.=@.............|
00000040  00 fe 00 00 01 00 01 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  01 00 00 00 06 01 15 50  |...............P|
00000060  05 00 00 00 00 00 00 00  06 00 13 00 07 00 07 00  |................|
00000070  00 00 00 00 00 00 08 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The first two bytes, 0CEF, follow a similar pattern and seem to continue the sequence where Version 3 left off; they are followed by the ASCII characters “MP”. Microsoft calls this new format the New or Normal Binary File Format. They claim it is “the fastest loading and fastest saving file format ever”! Exciting as the new format probably was, it didn’t last long. Multiplan was phased out so Excel could shine.

When I was younger I didn’t use DOS very often because the computer my father brought home in the mid 1980s was a Macintosh. I use DOS more now in my research than I did when I was younger. Using the DOS interface is not easy. There are a lot of key commands you need to know intuitively just to navigate, but it is fascinating to see how far software has come. Early Excel, Multiplan, and Chart were all intertwined, but hopefully combing through all of these samples can bring some clarity. Take a look at the draft signature I made and all the samples that go with it on my GitHub page.

Early Microsoft Excel

April 14, 2023 by Thor 1 Comment

The first version of Microsoft Excel was released on Macintosh in 1985. Before that there was MultiPlan.

The ancestor of Excel is Multiplan 1981-1988.
Until v4, it used blitted files.
From v4 on, it’s using Biff1, like Excel 1: Starts with 09 00 BoF, Ends with 0A 00 EoF.

Easter egg? Plan lives forever
— Ange (@angealbertini) March 12, 2023

MultiPlan version 4 and Excel version 2 used the well-known and documented BIFF format. Before BIFF2 the formats are a bit of a mystery. AFAIK, Microsoft never released any documentation on the file format used for Excel version 1 and MultiPlan 1-3; they emphasized using the SYLK format for interchange. To make matters worse, there were upwards of 100 different versions of the early MultiPlan, ported for dozens of different systems. Some of them are discussed on the TRS-80 website.

Or you can take MultiPlan 1.06 for a spin over at PCjs!

Needless to say, documenting the early versions of MultiPlan and Excel 1 and finding a pattern which could be used to identify them is difficult. These versions are missing from the PRONOM registry, but hopefully with enough samples, some patterns can be found to confidently identify formats from the early days of spreadsheets!

Marco Pontello’s TrID identifier software has signatures for the early Multiplan and Excel formats. His software scans for patterns in samples and finds commonalities between them. So the more samples he can scan the more accurate the identification can be.
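The general idea of looking for commonalities can be sketched in a few lines of Python. This is not TrID's actual algorithm, only a simplified illustration: keep every offset where all samples share the same byte. The function name is hypothetical, and the two inputs are the opening bytes of the Multiplan version 2 DOS and Xenix patterns that appear in this post.

```python
from typing import Dict, List


def common_bytes(samples: List[bytes], limit: int = 16) -> Dict[int, int]:
    """Map offset -> byte value for offsets where every sample agrees."""
    if not samples:
        return {}
    length = min(limit, min(len(s) for s in samples))
    first = samples[0]
    return {
        i: first[i]
        for i in range(length)
        if all(s[i] == first[i] for s in samples)
    }


dos = bytes.fromhex("0CEC000008AB08001F001A000300")
xenix = bytes.fromhex("0AEC000008AB08001F001A000300")

shared = common_bytes([dos, xenix])
print(sorted(shared))
```

Offset 0 drops out because the two systems differ there, which is exactly the kind of variation that only shows up once you have samples from more than one source.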

Currently the signatures are as follows.

Microsoft Excel for Mac Spreadsheet (v1.x)
		<Pattern>
			<Bytes>532700</Bytes>
			<ASCII> S '</ASCII>
			<Pos>0</Pos>
		</Pattern>
		<Pattern>
			<Bytes>AB27000000000000000203</Bytes>
			<ASCII> . '</ASCII>
			<Pos>4</Pos>
		</Pattern>
Multiplan for Mac spreadsheet (v1.x)
		<Pattern>
			<Bytes>11AB000013E8000000000000</Bytes>
			<ASCII> . . . . . . . . . . . .</ASCII>
			<Pos>0</Pos>
		</Pattern>
Multiplan spreadsheet (v1.x)
		<Pattern>
			<Bytes>0CE9000008AB08001F0016000200</Bytes>
			<Pos>0</Pos>
		</Pattern>
Multiplan spreadsheet (v1.0x)
		<Pattern>
			<Bytes>08E700</Bytes>
			<Pos>0</Pos>
		</Pattern>
		<Pattern>
			<Bytes>0100</Bytes>
			<Pos>6</Pos>
		</Pattern>
		<Pattern>
			<Bytes>000000</Bytes>
			<Pos>11</Pos>
		</Pattern>
Multiplan spreadsheet (v2.x)
		<Pattern>
			<Bytes>0CEC000008AB08001F001A000300</Bytes>
			<Pos>0</Pos>
		</Pattern>
Multiplan for Xenix spreadsheet (v2.x)
		<Pattern>
			<Bytes>0AEC000008AB08001F001A000300</Bytes>
			<Pos>0</Pos>
		</Pattern>
Multiplan spreadsheet (v3.x)
		<Pattern>
			<Bytes>0CED000008AB08001F001A000000</Bytes>
			<Pos>0</Pos>
		</Pattern>

There seem to be some patterns between versions, but also some major differences. Without a specification or an understanding of the system the samples were created on, it is hard to identify these formats with certainty. There could be hex values which are the same for the samples we have but different for others. Headers can often have values indicating dates or the length of the file, so finding variations in files is key to a good signature.
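For anyone who wants to try these patterns against their own files, here is a minimal Python sketch of a multi-part match, where every (position, bytes) pair has to line up. It mirrors the Bytes and Pos elements in the listing above but is not TrID's code; the function and constant names are mine. The test header is the first row of the MP1.MOD dump from my Multiplan post.

```python
from typing import List, Tuple

Pattern = List[Tuple[int, bytes]]

# "Multiplan spreadsheet (v1.0x)" from the listing above.
MULTIPLAN_V10X: Pattern = [
    (0, bytes.fromhex("08E700")),
    (6, bytes.fromhex("0100")),
    (11, bytes.fromhex("000000")),
]


def matches_all(data: bytes, pattern: Pattern) -> bool:
    """True only if every byte run is found at its stated position."""
    return all(data[pos:pos + len(run)] == run for pos, run in pattern)


mp1_header = bytes.fromhex("08e70000580901000800010000000a00")
print(matches_all(mp1_header, MULTIPLAN_V10X))
```

The gaps between the three runs are where the samples disagreed, so those bytes are simply left unchecked.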

Keep an eye on my GitHub PRONOM Research folder as I add more samples and prepare a signature for PRONOM.