The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to earlier reports on CAD, Construction, and others. One of the many difficult areas in digital preservation is understanding GIS, CAD, and 3D modeling software and the file formats that belong to the software titles in this space. Not only are the file formats plentiful, but the software is extensive and expensive, and documentation of the file formats associated with each software title is lacking. These tech watch reports are super useful, but more is needed to improve the tools we use to identify, validate, and transform these formats in order to preserve them long term.
I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.
SolidWorks is mechanical design software for building 3D models, which can be individual parts, combined into larger assemblies, or placed in drawings, giving engineers access to 3D design on their desktops. SolidWorks was bought in 1997 by Dassault Systèmes, the makers of the CATIA CAD software. Since 1995 a new version has been released almost every year, adding new features and improvements to the format. The early versions used the Microsoft OLE compound file container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.
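The OLE-versus-binary split is easy to check in code, because every OLE compound file (Microsoft's Compound File Binary format) begins with the same 8-byte signature, D0 CF 11 E0 A1 B1 1A E1. Below is a minimal Python sketch, assuming only what is described above (pre-2015 SolidWorks files are OLE containers). The function names are made up for illustration, and a positive result only means "OLE container", not "SolidWorks file", since many other formats use the same container.

```python
# Every OLE / Compound File Binary (CFB) file starts with these 8 bytes.
CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def container_kind(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper).

    "ole"   -> an OLE compound file, the container SolidWorks used before 2015
    "other" -> anything else; for a file with a SolidWorks extension this may
               be the post-2015 binary format, but that needs further checks
    """
    return "ole" if header[:8] == CFB_SIGNATURE else "other"


def container_kind_of_file(path: str) -> str:
    """Read only the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return container_kind(f.read(8))


if __name__ == "__main__":
    # First 8 bytes of Bracket.SLDPRT, copied from the hexdump later in this post.
    print(container_kind(bytes.fromhex("9fe4189f00000004")))  # other
```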
There are three types of SolidWorks files: the part (sldprt), the assembly (sldasm), and the drawing (slddrw). The first versions of SolidWorks used the extensions prt, asm, and drw, but the “sld” prefix was soon added to avoid confusion with other CAD tools.
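Because the older extensions are shared with other CAD tools, an extension can at best suggest a SolidWorks file type. A small Python sketch of that idea follows; the table and function names are hypothetical, and a real identification still has to look inside the file.

```python
from pathlib import Path
from typing import Tuple

# Extensions used since the "sld" prefix was added.
SLD_EXTENSIONS = {".sldprt": "part", ".sldasm": "assembly", ".slddrw": "drawing"}

# Extensions of the earliest versions. Other CAD tools use them too, so an
# extension match alone says little about whether the file is SolidWorks.
LEGACY_EXTENSIONS = {".prt": "part", ".asm": "assembly", ".drw": "drawing"}


def type_from_extension(filename: str) -> Tuple[str, bool]:
    """Return (document type, extension_is_solidworks_specific)."""
    ext = Path(filename).suffix.lower()
    if ext in SLD_EXTENSIONS:
        return SLD_EXTENSIONS[ext], True
    if ext in LEGACY_EXTENSIONS:
        return LEGACY_EXTENSIONS[ext], False
    return "unknown", False
```

For example, `type_from_extension("flatann.sldprt")` gives `("part", True)`, while an old `.asm` file gives `("assembly", False)`: the guess is the same, but only the first is specific to SolidWorks.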
Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders
We can see this file is a compound (OLE) container. Conveniently, the container holds a directory whose name includes a version number (_MO_VERSION_629). Using the version chart on the file format wiki, we can see this file was last modified by SolidWorks 97 Plus. The problem comes when we compare it with an assembly file.
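Reading the version number can be automated once you have the entry names from the container, for example from a 7-Zip listing like the one above, or from the third-party olefile library, whose OleFileIO.listdir() returns each stream path as a list of name components that can be joined with "/". Below is a minimal Python sketch under that assumption. The function names are hypothetical, and the lookup table holds only the single mapping stated in this post; the rest would have to come from the chart on the file format wiki.

```python
import re
from typing import Iterable, Optional

# Matches the version directory seen in the listings, e.g. "_MO_VERSION_629".
MO_VERSION = re.compile(r"^_MO_VERSION_(\d+)(?:/|$)")

# Only the mapping stated in this post; extend it from the wiki's version chart.
KNOWN_VERSIONS = {629: "SolidWorks 97 Plus"}


def mo_version(entry_paths: Iterable[str]) -> Optional[int]:
    """Return the number from the first _MO_VERSION_<n> entry, or None."""
    for path in entry_paths:
        match = MO_VERSION.match(path)
        if match:
            return int(match.group(1))
    return None


def describe_version(entry_paths: Iterable[str]) -> str:
    """Map the version number to a release name when it is known."""
    number = mo_version(entry_paths)
    if number is None:
        return "no _MO_VERSION_ entry found"
    return KNOWN_VERSIONS.get(number, f"version {number}: look it up in the chart")


if __name__ == "__main__":
    # A few entry names from the flatann.sldprt listing above.
    listing = ["Contents", "Header", "Preview", "ThirdPty",
               "_MO_VERSION_629", "_MO_VERSION_629/History", "Contents/Default"]
    print(describe_version(listing))  # SolidWorks 97 Plus
```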
Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders
Almost the same contents, and the same version directory. The only difference in content is that the part file has a Default file in the Contents directory (Contents/Default), which the assembly lacks. It is hard to know, though, whether every part and assembly differs in the same way. We will have to look more closely at the individual files to find what sets the formats apart.
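With more samples, the comparison above could be repeated mechanically: collect the stream names from many part listings and many assembly listings, then keep the names found in every part and in no assembly. Below is a minimal Python sketch of that set logic. The names are hypothetical, and with only one file of each type, as here, the result is a hint rather than a rule.

```python
from typing import Iterable, List, Set

# Stream names from the flatann.sldprt and dispenser.sldasm listings above.
# "[5]" is how 7-Zip displays the \x05 character that starts the
# summary-information stream names.
FLATANN_PART = {
    "Header", "Preview", "[5]SummaryInformation", "[5]DocumentSummaryInformation",
    "_MO_VERSION_629/History", "Contents/Definition", "Contents/Default",
    "Contents/DisplayLists",
}
DISPENSER_ASSEMBLY = {
    "Header", "Preview", "[5]SummaryInformation", "[5]DocumentSummaryInformation",
    "_MO_VERSION_629/History", "Contents/Definition", "Contents/DisplayLists",
}


def distinguishing_streams(listings_a: Iterable[Set[str]],
                           listings_b: Iterable[Set[str]]) -> List[str]:
    """Stream names present in every listing of group A and in none of group B."""
    sets_a = [set(listing) for listing in listings_a]
    if not sets_a:
        return []
    in_every_a = set.intersection(*sets_a)
    in_any_b = set().union(*listings_b)
    return sorted(in_every_a - in_any_b)
```

Here `distinguishing_streams([FLATANN_PART], [DISPENSER_ASSEMBLY])` returns `["Contents/Default"]`. Running it over a larger collection would show whether that stream reliably marks parts.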
The SolidWorks 2000 format added more files to the container, which can help.
Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders
The new “ISolidWorksInformation” file helps positively identify the SolidWorks format.
hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
00000070  20 4e 61 6d 65 00 00 00                           | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
*
000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|
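These three dumps also hint at a way to tell the SolidWorks 2000 formats apart. The stream has the layout of an OLE property set: it starts with the byte-order mark FE FF, and at offset 0x1C it carries the user-defined-properties format ID {D5CDD505-2E9C-101B-9397-08002B2CF9AE}, stored little-endian as 05 d5 cd d5 .... The property names differ by type: the part only names "SW-File Name", the assembly adds "Assembly type", and the drawing adds sheet properties such as "SW-Sheet Name". Below is a minimal Python sketch, assuming the stream has already been extracted from the container. It pulls out printable ASCII runs rather than properly parsing the property-set dictionary. The function names are hypothetical, and the part/assembly/drawing rules rest on these three samples only.

```python
import re
from typing import List

# Runs of 4 or more printable ASCII characters. This is a crude stand-in for
# parsing the property-set dictionary, which is where the names are stored.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def property_names(stream: bytes) -> List[str]:
    """Printable ASCII strings found in an ISolidWorksInformation stream."""
    return [run.decode("ascii") for run in PRINTABLE_RUN.findall(stream)]


def guess_sw2000_type(stream: bytes) -> str:
    """Guess part / assembly / drawing from the property names.

    The rules come from only three SolidWorks 2000 samples and are unverified.
    """
    names = property_names(stream)
    if any(name.startswith("SW-Sheet") for name in names):
        return "drawing"
    if "Assembly type" in names:
        return "assembly"
    if "SW-File Name" in names:
        return "part"
    return "unknown"
```

To try it on an extracted stream, read the file in binary mode (for example `SW2000-s01.SLDPRT/ISolidWorksInformation`, as in the hexdump commands above) and pass the bytes to `guess_sw2000_type`. A proper property-set parser would be more robust than the string scan.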
Starting in 2015, the format changed from an OLE container to a binary file. Here is what the first few bytes look like in a 2015 file and in a later 2023 file:
hexdump -C Bracket.SLDPRT | head
00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|
The newer version of the format is very different: it is a proprietary binary format with no published specification, which makes it much harder to know which parts of the file can be used for identification. All of these newer files have the hex values “00 00 00 04” at byte offsets 4 through 7 (counting from zero), which is not very distinctive on its own. Another byte sequence, “34 f6 e6 47 56 e6 47 37 f2”, seems to appear in every sample so far, though its location varies. The byte that follows it (the sequence's 10th byte) is often 34, but across samples it can be 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW, show the same pattern, which might give us enough to build a reasonable signature. For now we may not be able to tell the three formats apart, but perhaps we can in the future.
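Below is a minimal Python sketch of the tentative heuristic just described. It checks byte offsets 4 to 7 for 00 00 00 04, searches the start of the file for the recurring sequence, and reports the byte that follows it. The 4096-byte search window and the function names are assumptions for illustration, not part of the published signature. As an aside, offered only as an observation on the bytes shown here: swapping the two hex digits of every byte turns 34 f6 e6 47 56 e6 47 37 f2 into the ASCII text "Contents/", and turns the varying next byte (34, B4, 44, 64, 33) into C, K, D, F, or 3. That would fit the stream names seen in the older OLE containers (Contents/Config-0, Contents/Definition, and so on), and might explain why both the position and the following byte vary, but it is an untested hypothesis.

```python
from typing import Optional

# Byte values reported above for SolidWorks 2015+ files (tentative).
FIXED_BYTES = bytes.fromhex("00000004")            # at byte offsets 4-7
RECURRING = bytes.fromhex("34f6e64756e64737f2")    # location varies
SEEN_NEXT_BYTES = {0x34, 0xB4, 0x44, 0x64, 0x33}   # byte after RECURRING so far

# First 48 bytes of Bracket.SLDPRT, copied from the hexdump above.
SAMPLE_BRACKET_HEAD = bytes.fromhex(
    "9f e4 18 9f 00 00 00 04 26 00 42 15 14 00 06 00 "
    "08 00 06 00 40 a5 c3 a7 0e 51 5b 03 00 00 91 07 "
    "00 00 0d 00 00 00 34 f6 e6 47 56 e6 47 37 f2 34"
)


def nibble_swap(data: bytes) -> bytes:
    """Swap the high and low 4 bits of every byte, e.g. 0x34 -> 0x43 ("C")."""
    return bytes(((b << 4) & 0xF0) | (b >> 4) for b in data)


def check_sw_binary(data: bytes, search_limit: int = 4096) -> dict:
    """Apply the tentative 2015+ heuristic to the start of a file.

    search_limit is an arbitrary window, not part of any published signature.
    """
    fixed_ok = data[4:8] == FIXED_BYTES
    pos = data.find(RECURRING, 0, search_limit)
    offset: Optional[int] = pos if pos >= 0 else None
    after = pos + len(RECURRING)
    next_byte = data[after] if offset is not None and after < len(data) else None
    return {
        "fixed_bytes_ok": fixed_ok,
        "sequence_offset": offset,
        "next_byte": next_byte,
        "next_byte_seen_before": next_byte in SEEN_NEXT_BYTES,
        "likely_sw_binary": fixed_ok and offset is not None,
        # Observation only: the nibble-swapped bytes read as ASCII text.
        "swapped_text": (nibble_swap(data[pos:after + 1]).decode("latin-1")
                         if offset is not None else None),
    }
```

On `SAMPLE_BRACKET_HEAD` this finds the sequence at offset 0x26, followed by 0x34, and the swapped text reads "Contents/C". On an OLE file the fixed bytes do not match and no sequence is found.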
More work is needed to develop signatures that can definitively identify each SolidWorks format. My initial assumptions were not completely correct, and there are a few exceptions to the patterns I thought were good enough. One open question is how to properly identify the formats from SolidWorks 95 through 99; more samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in touch if you have additional samples or ideas for better identification.