Binder

Microsoft is never in short supply of file formats. They have made many changes over the years. Introduced lots of products, some lasting longer than others. The list is quite long.

One such software was called Office Binder. Introduced with Office 95, it was a companion application to combine a number of OLE objects together in one “Binder”. Meant to be the digital version of an Office Binder one often uses for presentations or proposals.

You could add sections and include Word documents, Images, Powerpoint, Excel spreadsheets, basically any OLE object. Of course a Binder file itself was an OLE compound object. They had the extension OBD, and templates used OBT. The PRONOM registry has PUID’s for the different Binder versions, but there are some issues.

PUIDFormat NameFormat VersionExtension
fmt/237Microsoft Office Binder File for Windows95obd
fmt/240Microsoft Office Binder File for Windows97-2000obd
fmt/238Microsoft Office Binder Template for Windows95obt
fmt/241Microsoft Office Binder Template for Windows97-2000obt
fmt/239Microsoft Office Binder Wizard for Windows95obz
fmt/242Microsoft Office Binder Wizard for Windows97-2000obz
filename : 'Binder95-s01.obd'
filesize : 5120
modified : 2024-08-08T21:24:34-06:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/240'
format : 'Microsoft Office Binder File for Windows'
version : '97-2000'
mime :
class :
basis : 'extension match obd; container name Binder with name only'

Turns out only one of the PRONOM PUID’s has an actual signature, the others are placeholders. So when I run Siegfried on an Office Binder 95 file, it comes back as fmt/240 which points to an Office Binder 97-2000 file. It’s a simple signature, looking for an internal file named “Binder”, which is inherent of all the Binder file types.

    <ContainerSignature Id="5500" ContainerType="OLE2">
<Description>Microsoft Office Binder File for Windows 97-2000</Description>
<Files>
<File>
<Path>Binder</Path>
</File>
</Files>
</ContainerSignature>

Taking a look inside the Office 95 Binder file, we can see the “Binder” file.

Path = Binder95-s01.obd
Type = Compound
Physical Size = 5120
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 316 320 [5]SummaryInformation
..... 144 192 Binder
..... 280 320 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
740 832 3 files

hexdump -C Binder95-s01/Binder
00000000 90 00 00 00 05 00 00 00 00 00 00 00 05 00 00 00 |................|
00000010 00 00 00 00 a1 6a 8a 8e cc 55 ef 11 ab 06 00 0c |.....j...U......|
00000020 29 b1 b4 d0 00 00 00 00 00 00 00 00 00 00 00 00 |)...............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 40 86 61 a6 |............@.a.|
00000040 0b ea da 01 00 00 00 00 00 00 00 00 40 86 61 a6 |............@.a.|
00000050 0b ea da 01 09 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 2c 00 00 00 00 00 00 00 01 00 00 00 |....,...........|
00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000080 2c 00 00 00 2c 00 00 00 13 03 00 00 44 02 00 00 |,...,.......D...|

The bytes within a “Binder” file has some patterns, but nothing decipherable.

Microsoft Office Binder was only included in three versions of Office. Office 95, 97, and 2000. Let’s look at the other two versions.

Path = Binder97-s04.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 28 64 HdrFtr
..... 144 192 Binder
..... 260 320 [5]SummaryInformation
..... 404 448 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
836 1024 4 files

Path = Binder2K-S01.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 28 64 HdrFtr
..... 144 192 Binder
..... 260 320 [5]SummaryInformation
..... 232 256 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
664 832 4 files

It looks like version 97 and 2000 have an extra file. The “HdrFtr” file seems to reference a Header and Footer, which according to documentation was a feature added in Office 97.

What’s new in Office Binder 97

Office Binder makes it possible for you to group all your documents, workbooks, and presentations for a project in one place. To get started with Office Binder 97, add a new or existing document to your binder. Use the new Office 97 features while you work in a binder……. Print headers and footers for a binder

We can use the “HdrFtr” file within the container to differentiate between the 95 version and 97-2000 formats. Perhaps, a closer look at the DocumentSummaryInformation file in the future, might help with a more precise identification later. There doesn’t seem to be anything to distinguish an OBD file from a OBT template file, so those PUID’s may not be needed. The other format related to the Binder software has the OBZ extension. It is called a Wizard template file in some documentation, but I have been unable to find any type of “Wizard” functionality in the Office Binder Apps to generate a file. The OBZ format seems to have something to do with macros in Visual Basic. Luckily there are a few examples available on Office install disc‘s.

Path = CLIENT.OBZ
Type = Compound
Physical Size = 364032
Extension = doc
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
1995-07-05 17:25:15 D.... 7
1995-07-05 17:25:14 D.... 5
1995-07-05 17:25:13 D.... 4
..... 106 128 4/[1]CompObj
..... 20 64 4/[1]Ole
..... 8880 9216 4/WordDocument
..... 32 64 4/[3]View000
..... 492 512 4/[5]SummaryInformation
..... 236 256 4/[5]DocumentSummaryInformation
1995-07-05 17:25:14 D.... 6
..... 17760 17920 6/Book
..... 20 64 6/[1]Ole
..... 0 0 6/[3]View000
..... 102 128 6/[1]CompObj
..... 3260 3264 6/[5]SummaryInformation
..... 192 192 6/[5]DocumentSummaryInformation
..... 106 128 5/[1]CompObj
..... 20 64 5/[1]Ole
..... 8055 8192 5/WordDocument
..... 32 64 5/[3]View000
..... 7280 7680 5/[5]SummaryInformation
..... 220 256 5/[5]DocumentSummaryInformation
1995-07-05 17:25:16 D.... 9
1995-07-05 17:25:15 D.... 8
..... 13857 14336 8/Book
..... 20 64 8/[1]Ole
..... 0 0 8/[3]View000
..... 102 128 8/[1]CompObj
..... 188 192 8/[5]SummaryInformation
..... 196 256 8/[5]DocumentSummaryInformation
..... 854 896 Binder
1995-07-05 17:25:19 D.... 10
..... 80382 80384 10/Book
..... 20 64 10/[1]Ole
..... 0 0 10/[3]View000
..... 102 128 10/[1]CompObj
..... 4044 4096 10/[5]SummaryInformation
1995-07-05 17:25:19 D.... 10/_VBA_PROJECT
..... 9425 9728 10/_VBA_PROJECT/812f9922c6
..... 12302 12800 10/_VBA_PROJECT/7b2f9922a4
..... 36937 37376 10/_VBA_PROJECT/dir
..... 6609 6656 10/_VBA_PROJECT/7e2f9922b5
..... 23014 23040 10/_VBA_PROJECT/872f9922e8
..... 7995 8192 10/_VBA_PROJECT/842f9922d9
..... 5338 5632 10/_VBA_PROJECT/902f992333
..... 36119 36352 10/_VBA_PROJECT/8d2f99231e
..... 18129 18432 10/_VBA_PROJECT/932f992342
..... 13055 13312 10/_VBA_PROJECT/b42fbcaa59

..... 208 256 10/[5]DocumentSummaryInformation
..... 4228 4608 [5]SummaryInformation
..... 956 960 [5]DocumentSummaryInformation
..... 106 128 9/[1]CompObj
..... 20 64 9/[1]Ole
..... 5914 6144 9/WordDocument
..... 0 0 9/[3]View000
..... 1520 1536 9/[5]SummaryInformation
..... 220 256 9/[5]DocumentSummaryInformation
..... 16141 16384 7/Book
..... 20 64 7/[1]Ole
..... 0 0 7/[3]View000
..... 102 128 7/[1]CompObj
..... 188 192 7/[5]SummaryInformation
..... 192 192 7/[5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
1995-07-05 17:25:19 345316 351168 55 files, 8 folders

Sure enough, the OBZ file has a Visual Basic macro (VBA_Project). Unfortunately, it appears to be nested in an additional folder within the container, with a variable number number which is likely to change from file to file. That fact will make identification in PRONOM much more difficult, as the signatures are not designed for variable names. Possibly something we can investigate later.

Microsoft Binder was only released in Office 95, 97, and 2000, but was supported in Office XP and 2003 through an UNBIND.EXE application which would simply separate all the different objects back out to the individual files.

The Microsoft Office Binder is not included in Office 2003. However, if a Binder file created in a previous version of Office contains information you want to access, you can use the Unbind tool to pull out the information and save it in the formats of the appropriate programs. In order to do this procedure, the Unbind tool must be installed.

    As always, you can look at some sample files and my proposal for updated signatures on my GitHub page.

    UFO

    Researching file formats isn’t for everyone. Others may find it boring or even odd. Trying to explain to others the nuances of a binary format versus a container format would bring many tears. Their reactions sometimes are similar to hearing someone explain their belief in aliens. Passionate, but a bit on the crazy side.

    So with aliens and containers on my mind, let’s take a look at a format with the extension UFO. It is not an unidentified flying object or a UAP, it may as well be an unidentified file object, but in this case it is a “Ulead File for Objects” format. It is the exclusive file format for use with the PhotoImpact software from Ulead Systems, a Taiwanese developer known for many popular software programs. First released in 1996 with version 3, the PhotoImpact software was marketed as “a fully object-based tool, which pioneered a number of important innovations“.

    The reason it was a considered a full object-based tool was the UFO format is based on the, at the time, popular OLE Compound File Storage object format developed by Microsoft. So by using some OLE tools we can take a closer look at some of these Unidentified File Objects……..

    oleid Sample.ufo 
    oleid 0.60.1 - http://decalage.info/oletools
    THIS IS WORK IN PROGRESS - Check updates regularly!
    Please report any issue at https://github.com/decalage2/oletools/issues

    Filename: Sample.ufo
    --------------------+--------------------+----------+--------------------------
    Indicator |Value |Risk |Description
    --------------------+--------------------+----------+--------------------------
    File format |Generic OLE file / |info |Unrecognized OLE file.
    |Compound File | |Root CLSID: - None
    |(unknown format) | |
    --------------------+--------------------+----------+--------------------------
    Container format |OLE |info |Container type
    --------------------+--------------------+----------+--------------------------
    Encrypted |False |none |The file is not encrypted
    --------------------+--------------------+----------+--------------------------
    VBA Macros |No |none |This file does not contain
    | | |VBA macros.
    --------------------+--------------------+----------+--------------------------
    XLM Macros |No |none |This file does not contain
    | | |Excel 4/XLM macros.
    --------------------+--------------------+----------+--------------------------
    External |0 |none |External relationships
    Relationships | | |such as remote templates,
    | | |remote OLE objects, etc
    --------------------+--------------------+----------+--------------------------

    Well, it is a OLE file, but is unrecognized/unidentified by the oletools software. It also appears to be missing the root entry and CLSID you commonly find in OLE files. Since this is an OLE container we can also just use 7zip to peek inside as well.

    Path = Sample.ufo
    Type = Compound
    Physical Size = 937984
    Extension = compound
    Cluster Size = 512
    Sector Size = 64

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    1999-05-25 03:33:05 D.... OS-3
    1999-05-25 03:33:04 D.... OS-1
    1999-05-25 03:33:03 D.... OS-0
    ..... 31122 31232 OS-0/ObjectImage
    ..... 1316 1344 OS-0/ObjectData
    ..... 137996 138240 OS-0/PathStream
    ..... 19591 19968 OS-0/ObjectMask0
    1999-05-25 03:33:05 D.... OS-2
    ..... 43405 43520 OS-2/ObjectImage
    ..... 1316 1344 OS-2/ObjectData
    ..... 176204 176640 OS-2/PathStream
    ..... 25524 25600 OS-2/ObjectMask0
    ..... 41588 41984 OS-1/ObjectImage
    ..... 1316 1344 OS-1/ObjectData
    ..... 170132 170496 OS-1/PathStream
    ..... 25221 25600 OS-1/ObjectMask0
    ..... 34505 34816 LtfMainImage
    ..... 656 704 LtfHeader
    1999-05-25 03:33:06 D.... OS-4
    ..... 19249 19456 OS-4/ObjectImage
    ..... 1316 1344 OS-4/ObjectData
    ..... 4842 5120 LtfPreviewImage
    ..... 1160 1216 LtfObjectList
    ..... 31753 32256 OS-3/ObjectImage
    ..... 1316 1344 OS-3/ObjectData
    ..... 131892 132096 OS-3/PathStream
    ..... 19439 19456 OS-3/ObjectMask0
    ------------------- ----- ------------ ------------ ------------------------
    1999-05-25 03:33:06 920859 925120 22 files, 5 folders

    In this sample file, we have a bunch of directories and objects, but none of what we expect to see in an OLE file, such as a “SummaryInformation” or “DocumentSummaryInformation” like we would see in a Word DOC file. By not having the standard contents of the container, it makes these files very specific to PhotoImpact software.

    Path = PhotoImpactX3-s01.ufo
    Type = Compound
    Physical Size = 5120
    Extension = compound
    Cluster Size = 512
    Sector Size = 64

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    ..... 20 64 HotspotStream
    ..... 656 704 LtfHeader
    ..... 20 64 SliceInfoStream
    ..... 412 448 LtfPreviewImage
    ..... 714 768 WebPropStream
    ..... 20 64 ManualHotspotScriptInfoStream
    ..... 20 64 ObjectHotspotScriptInfoStream
    ------------------- ----- ------------ ------------ ------------------------
    1862 2176 7 files

    Here is another UFO file from the last version of the software PhotoImpact X3 when it was owned by Corel, but phased out in 2009. This is the basic file structure with no objects added to the file. We can be fairly confident these are the base files in most every other UFO file. It doesn’t have any of the “OS” folders which contain the objects, so I think the LtfHeader file might be our best bet for a signature. Let’s take a look at the Hex values for a few of them.

    hexdump -C PhotoImpactX3-s01/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 58 02 00 00 02 00 ba dc |....LTF.X.......|
    00000010 ee 02 00 00 26 02 00 00 80 fc 0a 00 80 fc 0a 00 |....&...........|
    00000020 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    hexdump -C Sample/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 90 01 00 00 02 00 f7 bf |....LTF.........|
    00000010 90 01 00 00 90 01 00 00 80 fc 0a 00 80 fc 0a 00 |................|
    00000020 00 00 00 00 06 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    hexdump -C v3/ANIMALS/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 64 00 00 00 02 00 6e 00 |....LTF.d.....n.|
    00000010 40 01 00 00 c8 00 00 00 80 fc 0a 00 80 fc 0a 00 |@...............|
    00000020 00 00 00 00 11 00 00 00 01 00 00 00 01 00 00 00 |................|
    00000030 01 00 00 00 60 00 00 00 3c 00 00 00 60 00 00 00 |....`...<...`...|

    Making a signature using the first 8 bytes of the LtfHeader file appears to have worked for all the 3,400+ sample files I have collected. Problem is it also worked for another extension found in the some of the later versions of PhotoImpact.

    When you have successfully finished your template, make sure to save it in the Ulead File For Photo Project format (*.UFP). This allows you to open and use your template in the Photo Projects dialog box. In the Template tab, click Open Project and browse for the created file.

    They appear to be a template version for the format so we should be fine just adding the extension to the same signature.

    Well, this Unidentified File Object is no longer unidentifiable. Was it sent by aliens? Possibly, but at least we know where these UFO’s came from, PhotoImpact. Take a look at the samples and proposed signature in my GitHub.

    Also be sure to join us at this years iPres conference and attend our workshop on container signatures in PRONOM!

    SolidWorks

    The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

    I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

    SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

    There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.

    Path = flatann.sldprt
    Type = Compound
    Physical Size = 5851648
    Extension = compound
    Cluster Size = 512
    Sector Size = 64
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    1997-08-05 08:34:21 D....                            Contents
                        .....        60844        60928  Header
                        .....        45022        45056  Preview
    1997-08-05 08:34:06 D....                            ThirdPty
                        .....          237          256  [5]SummaryInformation
    1997-08-05 08:34:18 D....                            _MO_VERSION_629
                        .....          157          192  _MO_VERSION_629/History
                        .....          126          128  [5]DocumentSummaryInformation
                        .....       996343       996352  Contents/Definition
                        .....      1003198      1003520  Contents/Default
                        .....       781536       781824  Contents/DisplayLists
    ------------------- ----- ------------ ------------  ------------------------
    1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

    We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

    Path = dispenser.sldasm
    Type = Compound
    Physical Size = 2143232
    Extension = compound
    Cluster Size = 512
    Sector Size = 64
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    1997-03-19 17:29:16 D....                            ThirdPty
                        .....        16812        16896  Preview
                        .....         4655         5120  Header
    1997-09-04 15:30:48 D....                            Contents
                        .....      1009461      1009664  Contents/DisplayLists
                        .....        23931        24064  Contents/Definition
                        .....          237          256  [5]SummaryInformation
    1997-09-04 15:35:39 D....                            _MO_VERSION_629
                        .....          107          128  _MO_VERSION_629/History
                        .....          126          128  [5]DocumentSummaryInformation
    ------------------- ----- ------------ ------------  ------------------------
    1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

    Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.

    The SolidWorks 2000 format added additional files to the container which can help.

    Path = SW2000-s01.SLDPRT
    Type = Compound
    Physical Size = 20992
    Extension = compound
    Cluster Size = 512
    Sector Size = 64
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                        .....         5300         5632  Preview
                        .....          481          512  Header
    2024-01-16 20:00:51 D....                            Contents
    2024-01-16 20:00:51 D....                            ThirdPty
                        .....            4           64  Contents/OleItems
                        .....           69          128  Contents/CMgrHdr
                        .....          343          384  Contents/CMgr
                        .....         5456         5632  Contents/Config-0
                        .....          592          640  Contents/DisplayLists__Zip
                        .....          957          960  Contents/Definition
                        .....          252          256  [5]SummaryInformation
    2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                        .....          840          896  _MO_VERSION_1500/Biography
                        .....           98          128  _MO_VERSION_1500/History
                        .....          148          192  [5]DocumentSummaryInformation
                        .....          120          128  ISolidWorksInformation
                        .....            6           64  _DL_VERSION_1500/DLUpdateStamp
    ------------------- ----- ------------ ------------  ------------------------
    2024-01-16 20:00:51              14666        15616  14 files, 4 folders

    The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

    hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation      
    00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
    00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
    00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
    00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
    00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
    00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
    00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
    00000070  20 4e 61 6d 65 00 00 00                           | Name...|
    
    hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
    00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
    00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
    00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
    00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
    00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
    00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
    00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
    00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
    00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
    00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|
    
    hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
    00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
    00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
    00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
    00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
    00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
    00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
    *
    000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
    000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
    000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
    00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
    00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
    00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
    00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
    00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
    00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
    00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
    00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
    00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
    00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
    000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
    000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
    000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
    000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
    000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|

    Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

    hexdump -C Bracket.SLDPRT | head
    00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
    00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
    00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
    00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
    00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
    00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
    00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
    00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
    00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
    00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|
    
    hexdump -C SW2023-s01.SLDPRT | head
    00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
    00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
    00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
    00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
    00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
    00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
    00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
    00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
    00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
    00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|

    The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.

    More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.

    No bad deed….

    I had access to my first Macintosh computer around 1987. My father brought it home and I spent hours on it playing games and occasionally writing reports for school. The Macintosh Plus computer had one floppy drive and no hard drive. I remember playing the game Orbiter which had two floppy disks and right in the middle of game play it would pause and ask me to insert disk 2, then quickly ask for disk 1 again. The struggle was real. I spent years using many different Macintosh computers and now own more than I wish to admit. I’m preserving them!

    The wild world of digital preservation has been a little lacking on the Macintosh side of things as I have come to realize. There still not a great way to manage Resource Forks in many preservation systems and the identification tools are mainly focused on the data bytetreams and not any system specific attributes Macintosh used often.

    The PRONOM registry has either referenced early Macintosh specific formats or missed them entirely so I have been slowly working on a few to close that gap.

    Interestingly enough, many Microsoft programs initially made their GUI debuts on the early Macintosh before making their way to Windows. Excel is one I am working on, as Version 1 is not identifiable in PRONOM, it was Macintosh only at the time.

    Another is PowerPoint, I recently submitted two new signatures to PRONOM.

    fmt/1747: Microsoft PowerPoint Presentation v2.x. Full entry added.
    fmt/1748: Microsoft PowerPoint Presentation v3.x. Full entry added.
    fmt/1866: Microsoft Powerpoint for Macintosh v.2. Full entry added.
    fmt/1867: Microsoft Powerpoint for Macintosh v.3. Full entry added.

    PowerPoint was initially released in 1987 on the Macintosh platform. It was developed by a company called ForeThought. Version 1.0 on the Macintosh was under this name, until it was bought by Microsoft only three months after being released. The history of PowerPoint can be discovered at Robert Gaskins, one of the original developers, website and book he wrote. The available information provided by Microsoft is only for the OLE format, covering versions 4.0 until 2003.

    So, lets take a look at the Powerpoint original file format, before OLE.

       Type/Creator      RF      DF  Date         Filename
    f  SLDS/PPNT         0       932 Oct 10 19:10 PowerPoint-v1

    Luckily the early PowerPoint files did not have a Resource Fork. The Data Fork, if you haven’t noticed, has an interesting set of hex values at the beginning of the file. 0BADDEED is the first 4 bytes. If we look at a PowerPoint version 2 file from Windows.

    The file format is the same, but because of the weird world of endianness, the first few bytes are in reverse order, EDDEAD0B.

    Obviously we need to discuss this magic number and the meaning behind “Bad Deed”. This question was asked previously by the digital preservation community. I have a previous blog post about the use of words for the magic number CAFEBEEF as it was used with with JAVA class files and Express Publisher in the 1990’s. BADDEED looks like another clever use of the hex values that formed words. But was there a story behind the words? Joe Carrano asked if this string might be hexspeak. I wanted to know more so I asked some one who might know.

    Robert Gaskins was kind enough to chat with me for a bit about the early days of PowerPoint.

    I had a theory on the possible meaning behind BADDEED, so I asked him what the feeling was like between Apple and Microsoft at the time. I had heard for years that PowerPoint was originally created for the Macintosh, but Robert informed me:

      In fact, PowerPoint was designed first for Microsoft Windows, 

    and its first spec shows that: “All the screen shots, menus, and 

    dialogs were set up to look like Microsoft Windows, not like 

    Macintosh.”  (Gaskins, Sweating Bullets, p. 92)  You can see that 

    spec here.

    A year later, we concluded that we would be forced to ship 

    on Mac first, although we still thought that Windows was the 

    big opportunity and thought that Mac was risky.  “We just didn’t think 

    we could successfully ship a product for Windows, yet, though we planned 

    to later. (Gaskins, Sweating Bullets, p. 105)  The considerations are 

    summarized in my June 1986 product marketing document.

    Of course, we turned out to have been right all along.  PowerPoint on 

    Mac was much loved, but sales remained poor because Mac sales were 

    so poor.  It was only after we shipped on Windows that PowerPoint gained 

    the dominant market share which has characterized it ever since, and 

    Windows PPT outsold Mac PPT very quickly. (Gaskins, Sweating Bullets, p. 403)

    So my original thought was that there was some bad feelings around this Apple, Microsoft battle which has been the sentiment for quite some time. So when I asked if any of that influenced the use of BADDEED, I was told:

    So, far from being disgruntled by expanding PowerPoint to Windows, 

    that had been our goal all along, and its achievement was the most 

    important success we had.

    I judge that you are fully aware of all that, and that 

    your question is more, “was there any bad deed signified 

    by the Mac hex value chosen?”  No, it was just the poverty 

    of choice when you only have six letters.

    So there you have it. The use of the hex values 0x0BADDEED, was simply chosen from a limited set of values when looking at words hexadecimal could spell. I guess I should never let the truth get in the way of a good story.

    I continued to have a wonderful conversation with Robert and also asked him for some details on the rest of the PowerPoint file format. I was hoping there might be some documentation out there explaining the early format before Microsoft took over. Robert said:

     I don’t know of any such documentation apart from the official 

    Microsoft support files available online.  I don’t have any such 

    information.  I know that Dennis Austin deposited some of our 

    working files at the Computer History Museum (not online):

    https://archive.computerhistory.org/resources/access/text/finding-aids/102733943-Austin/102733943-Austin.pdf

    and it’s likely that some information is there–if nothing 

    else, it claims to contain a source code listing for PPT 1.0 

    which would contain the code to read the file format.

    So there might be some information in at the Computer History Museum worth looking into.

    As far as I could tell from the available online information, there is a few differences between Version 1.0 and Version 2.0, the biggest being the fact that 1.0 did not have an option to print in color, amount a few other minor things. Here is a screenshot of a page from the Microsoft PowerPoint 2.0 documentation on archive.org.

    I suppose with the signature additions of the Macintosh and Windows versions 2.0 and 3.0 of the PowerPoint file format in PRONOM, that should cover most needs. Currently my PowerPoint 1.0 files identify at 2.0 files, so I may need to have them adjust the PUID to include both versions 1.0 and 2.0 as they are so similar. If I am able to find a difference or get my hands on the original source code I may find a better solution.