SolidWorks

The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.

Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.

The SolidWorks 2000 format added additional files to the container which can help.

Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders

The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation      
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
00000070  20 4e 61 6d 65 00 00 00                           | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
*
000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|

Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

hexdump -C Bracket.SLDPRT | head
00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|

The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.

More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.

No bad deed….

I had access to my first Macintosh computer around 1987. My father brought it home and I spent hours on it playing games and occasionally writing reports for school. The Macintosh Plus computer had one floppy drive and no hard drive. I remember playing the game Orbiter which had two floppy disks and right in the middle of game play it would pause and ask me to insert disk 2, then quickly ask for disk 1 again. The struggle was real. I spent years using many different Macintosh computers and now own more than I wish to admit. I’m preserving them!

The wild world of digital preservation has been a little lacking on the Macintosh side of things as I have come to realize. There still not a great way to manage Resource Forks in many preservation systems and the identification tools are mainly focused on the data bytetreams and not any system specific attributes Macintosh used often.

The PRONOM registry has either referenced early Macintosh specific formats or missed them entirely so I have been slowly working on a few to close that gap.

Interestingly enough, many Microsoft programs initially made their GUI debuts on the early Macintosh before making their way to Windows. Excel is one I am working on, as Version 1 is not identifiable in PRONOM, it was Macintosh only at the time.

Another is PowerPoint, I recently submitted two new signatures to PRONOM.

fmt/1747: Microsoft PowerPoint Presentation v2.x. Full entry added.
fmt/1748: Microsoft PowerPoint Presentation v3.x. Full entry added.
fmt/1866: Microsoft Powerpoint for Macintosh v.2. Full entry added.
fmt/1867: Microsoft Powerpoint for Macintosh v.3. Full entry added.

PowerPoint was initially released in 1987 on the Macintosh platform. It was developed by a company called ForeThought. Version 1.0 on the Macintosh was under this name, until it was bought by Microsoft only three months after being released. The history of PowerPoint can be discovered at Robert Gaskins, one of the original developers, website and book he wrote. The available information provided by Microsoft is only for the OLE format, covering versions 4.0 until 2003.

So, lets take a look at the Powerpoint original file format, before OLE.

   Type/Creator      RF      DF  Date         Filename
f  SLDS/PPNT         0       932 Oct 10 19:10 PowerPoint-v1

Luckily the early PowerPoint files did not have a Resource Fork. The Data Fork, if you haven’t noticed, has an interesting set of hex values at the beginning of the file. 0BADDEED is the first 4 bytes. If we look at a PowerPoint version 2 file from Windows.

The file format is the same, but because of the weird world of endianness, the first few bytes are in reverse order, EDDEAD0B.

Obviously we need to discuss this magic number and the meaning behind “Bad Deed”. This question was asked previously by the digital preservation community. I have a previous blog post about the use of words for the magic number CAFEBEEF as it was used with with JAVA class files and Express Publisher in the 1990’s. BADDEED looks like another clever use of the hex values that formed words. But was there a story behind the words? Joe Carrano asked if this string might be hexspeak. I wanted to know more so I asked some one who might know.

Robert Gaskins was kind enough to chat with me for a bit about the early days of PowerPoint.

I had a theory on the possible meaning behind BADDEED, so I asked him what the feeling was like between Apple and Microsoft at the time. I had heard for years that PowerPoint was originally created for the Macintosh, but Robert informed me:

  In fact, PowerPoint was designed first for Microsoft Windows, 

and its first spec shows that: “All the screen shots, menus, and 

dialogs were set up to look like Microsoft Windows, not like 

Macintosh.”  (Gaskins, Sweating Bullets, p. 92)  You can see that 

spec here.

A year later, we concluded that we would be forced to ship 

on Mac first, although we still thought that Windows was the 

big opportunity and thought that Mac was risky.  “We just didn’t think 

we could successfully ship a product for Windows, yet, though we planned 

to later. (Gaskins, Sweating Bullets, p. 105)  The considerations are 

summarized in my June 1986 product marketing document.

Of course, we turned out to have been right all along.  PowerPoint on 

Mac was much loved, but sales remained poor because Mac sales were 

so poor.  It was only after we shipped on Windows that PowerPoint gained 

the dominant market share which has characterized it ever since, and 

Windows PPT outsold Mac PPT very quickly. (Gaskins, Sweating Bullets, p. 403)

So my original thought was that there was some bad feelings around this Apple, Microsoft battle which has been the sentiment for quite some time. So when I asked if any of that influenced the use of BADDEED, I was told:

So, far from being disgruntled by expanding PowerPoint to Windows, 

that had been our goal all along, and its achievement was the most 

important success we had.

I judge that you are fully aware of all that, and that 

your question is more, “was there any bad deed signified 

by the Mac hex value chosen?”  No, it was just the poverty 

of choice when you only have six letters.

So there you have it. The use of the hex values 0x0BADDEED, was simply chosen from a limited set of values when looking at words hexadecimal could spell. I guess I should never let the truth get in the way of a good story.

I continued to have a wonderful conversation with Robert and also asked him for some details on the rest of the PowerPoint file format. I was hoping there might be some documentation out there explaining the early format before Microsoft took over. Robert said:

 I don’t know of any such documentation apart from the official 

Microsoft support files available online.  I don’t have any such 

information.  I know that Dennis Austin deposited some of our 

working files at the Computer History Museum (not online):

https://archive.computerhistory.org/resources/access/text/finding-aids/102733943-Austin/102733943-Austin.pdf

and it’s likely that some information is there–if nothing 

else, it claims to contain a source code listing for PPT 1.0 

which would contain the code to read the file format.

So there might be some information in at the Computer History Museum worth looking into.

As far as I could tell from the available online information, there is a few differences between Version 1.0 and Version 2.0, the biggest being the fact that 1.0 did not have an option to print in color, amount a few other minor things. Here is a screenshot of a page from the Microsoft PowerPoint 2.0 documentation on archive.org.

I suppose with the signature additions of the Macintosh and Windows versions 2.0 and 3.0 of the PowerPoint file format in PRONOM, that should cover most needs. Currently my PowerPoint 1.0 files identify at 2.0 files, so I may need to have them adjust the PUID to include both versions 1.0 and 2.0 as they are so similar. If I am able to find a difference or get my hands on the original source code I may find a better solution.