NCH Software

Recently I came across a piece of software which used dozens of extensions for a single file format.

This T-Shirt Factory Deluxe files are a bit of an extreme, probably a prank against all of us doing file format identification. If you know who made this decision, I would like to have a chat.

This is not first time I have come across a format which seems to have been used for more than one software title. Awhile back I tried to find more information on a file format used with many tools created by MetaCreations. It was called “Composite File Management System“, and was used with Kai’s Power tools, Bryce3D, Ray Dream, Poser, and others. I did a previous post about the format.

I came across another recently with a similar issue. They are also many different software titles with the same native format.

NCH Software is an Australian software company who produce a massive number of software titles covering many different needs. From Audio Editing to Business charts and from Accounting tools to a 3D model converter, they have it all. Their audio editing software WavePad is quite popular. My initial entry into their software world was for the specialized Dictation/Scribe software which produced a slightly proprietary audio format with the extension DCT. This format does not use the format many of the other titles use.

With the number of different titles, it probably makes sense they use the same file structure to make processing/programming more efficient. They appear to be mostly proprietary binary files.

hexdump -C Wavepad/Untitled2.wpp | head
00000000 6c 73 64 66 01 00 1a 00 00 00 07 00 00 00 00 00 |lsdf............|
00000010 ca 84 20 00 00 00 00 00 e9 03 00 00 a5 84 20 00 |.. ........... .|
00000020 00 00 00 00 d0 07 00 00 99 84 20 00 00 00 00 00 |.......... .....|
00000030 d1 07 06 00 24 00 00 00 00 00 00 00 2f 55 73 65 |....$......./Use|
00000040 72 73 2f 74 79 6c 65 72 2f 44 65 73 6b 74 6f 70 |rs/tyler/Desktop|
00000050 2f 55 6e 74 69 74 6c 65 64 5f 30 2e 77 61 76 00 |/Untitled_0.wav.|
00000060 dc 07 02 00 04 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 d2 07 03 00 08 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 d3 07 03 00 08 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 d4 07 03 00 08 00 00 00 |................|

hexdump -C Crescendo/examples/Grooving.cdo | head
00000000 6c 73 64 66 01 00 05 00 00 00 03 00 00 00 00 00 |lsdf............|
00000010 8a b5 00 00 00 00 00 00 00 10 00 00 65 05 00 00 |............e...|
00000020 00 00 00 00 01 11 04 00 04 00 00 00 00 00 00 00 |................|
00000030 00 00 00 41 02 11 02 00 04 00 00 00 00 00 00 00 |...A............|
00000040 05 00 00 00 03 11 04 00 04 00 00 00 00 00 00 00 |................|
00000050 00 00 52 43 04 11 04 00 04 00 00 00 00 00 00 00 |..RC............|
00000060 00 80 94 43 05 11 04 00 04 00 00 00 00 00 00 00 |...C............|
00000070 00 00 a0 41 06 11 02 00 04 00 00 00 00 00 00 00 |...A............|
00000080 01 00 00 00 07 11 04 00 04 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 08 11 04 00 04 00 00 00 00 00 00 00 |................|

hexdump -C Spin3D/bunny.3dp | head
00000000 6c 73 64 66 01 00 20 00 00 00 01 00 00 00 00 00 |lsdf.. .........|
00000010 ec bc 65 00 00 00 00 00 00 10 00 00 e0 bc 65 00 |..e...........e.|
00000020 00 00 00 00 00 12 00 00 38 bc 65 00 00 00 00 00 |........8.e.....|
00000030 01 12 07 00 8c 26 26 00 00 00 00 00 cc d1 27 3f |.....&&.......'?|
00000040 1c b5 80 3f 3c f4 bd 3d d9 79 27 3f de af 80 3f |...?<..=.y'?...?|
00000050 bf 81 a9 3d ad fa 28 3f 10 e7 7d 3f 05 a8 a9 3d |...=..(?..}?...=|
00000060 ec a4 1a 3f 56 29 49 3f ab d0 c0 3d 3e 3c 1f 3f |...?V)I?...=><.?|
00000070 5f ed 4c 3f 5a 48 c0 3d 04 59 1b 3f 48 53 49 3f |_.L?ZH.=.Y.?HSI?|
00000080 42 e9 ab 3d 74 5d 1c 3f 05 6c 3b 3f f7 03 5e 3d |B..=t].?.l;?..^=|
00000090 46 d2 1a 3f f6 d4 3e 3f ef ac 5d 3d 94 db 1a 3f |F..?..>?..]=...?|

hexdump -C Voxal/Geek.voxal | head
00000000 6c 73 64 66 01 00 0c 00 00 00 01 00 00 00 00 00 |lsdf............|
00000010 ea 01 00 00 00 00 00 00 ec 03 01 00 01 00 00 00 |................|
00000020 00 00 00 00 01 e8 03 00 00 a9 01 00 00 00 00 00 |................|
00000030 00 00 20 02 00 04 00 00 00 00 00 00 00 13 00 00 |.. .............|
00000040 00 00 10 00 00 39 00 00 00 00 00 00 00 00 10 00 |.....9..........|
00000050 00 0d 00 00 00 00 00 00 00 00 20 01 00 01 00 00 |.......... .....|
00000060 00 00 00 00 00 00 01 20 04 00 04 00 00 00 00 00 |....... ........|
00000070 00 00 c3 f5 40 41 02 20 02 00 04 00 00 00 00 00 |....@A. ........|
00000080 00 00 22 00 00 00 00 20 02 00 04 00 00 00 00 00 |..".... ........|
00000090 00 00 0e 00 00 00 00 10 00 00 29 00 00 00 00 00 |..........).....|

hexdump -C PhotoPad/test.ppp | head
00000000 6c 73 64 66 01 00 02 00 00 00 00 00 00 00 00 00 |lsdf............|
00000010 ee 3c 00 00 00 00 00 00 c9 00 01 00 01 00 00 00 |.<..............|
00000020 00 00 00 00 00 04 00 00 00 d5 3c 00 00 00 00 00 |..........<.....|
00000030 00 02 00 00 00 c9 3c 00 00 00 00 00 00 03 00 06 |......<.........|
00000040 00 0f 00 00 00 00 00 00 00 6f 72 69 67 69 6e 61 |.........origina|
00000050 6c 5f 69 6d 61 67 65 00 01 00 00 00 85 3c 00 00 |l_image......<..|
00000060 00 00 00 00 07 00 07 00 79 3c 00 00 00 00 00 00 |........y<......|
00000070 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
00000080 00 00 04 00 00 00 03 00 08 06 00 00 00 ba ba 15 |................|
00000090 0d 00 00 00 01 73 52 47 42 00 ae ce 1c e9 00 00 |.....sRGB.......|

Above are just a few of the titles which use the same structure. The LSDF string is the first 4 bytes and always the last 4 bytes. The next two bytes, 0100, seem consistent for all samples, but the two bytes after that seem to be unique to the software. So far I have found the following titles use the format.

Software TitleNameExtensionPattern
WavePadWavePad Audio Editor Project FileWPP6C736466 01001A00
CrescendoCrescendo Score FileCDO6C736466 01000500
Spin3DNCH Software model format3DP6C736466 01002000
VoxalVoxal Voices FileVOXAL6C736466 01000C00
PhotoPadPhotoPad Project FilePPP6C736466 01000200
MixPadMixPad ProjectMPDP6C736466 01000400
DisketchDisketch ProjectDEPROJ6C736466 01000700
ClickChartsClickCharts DiagramCCD6C736466 01000A00
DreamPlanDreamPlan FileDDP6C736466 01001300
DrawPadDrawPad FileDRP6C736466 01001500

Without downloading and installing their vast library of software it’s hard to know all the different titles which use the format. The rest of the file for each sample seems to be proprietary in a binary format, except a few with a PNG image mixed in.

The simplest sample I could find was a preset file for the Zulu DJ Software which uses the ECF extension. The ECF extension is common with a few of the titles, like effect chains for WavePad and MixPad.

hexdump -C Zulu/Untitled.ecf
00000000 6c 73 64 66 01 00 0c 00 00 00 01 00 00 00 00 00 |lsdf............|
00000010 6b 00 00 00 00 00 00 00 00 10 00 00 1a 00 00 00 |k...............|
00000020 00 00 00 00 00 00 01 00 01 00 00 00 00 00 00 00 |................|
00000030 00 01 00 01 00 01 00 00 00 00 00 00 00 00 00 30 |...............0|
00000040 02 00 04 00 00 00 00 00 00 00 01 00 00 00 00 20 |............... |
00000050 00 00 29 00 00 00 00 00 00 00 00 10 00 00 0d 00 |..).............|
00000060 00 00 00 00 00 00 00 20 01 00 01 00 00 00 00 00 |....... ........|
00000070 00 00 00 00 20 02 00 04 00 00 00 00 00 00 00 00 |.... ...........|
00000080 00 00 01 6c 73 64 66 |...lsdf|

This header is identical to the header for the VOXAL format, so not sure if the second set of 4 bytes is directly connected to the software title. Or if there purpose is something else.

The question that needs to be answered is how we might represent these formats in PRONOM if needed. We could create a unique signature for each title based on the magic header and footer and the second set of 4 bytes which may indicate the software. Or create a single generic signature to identify the basic format using the magic header and footer and adding all the extensions to the list, which would be lengthy. This would be the easiest and catch all formats related to NCH Software using this file format, but then additional characterization would need to happen to identify the specific software title needed to render the file.

The NCH Software company seems to churn out new software and versions quite frequently and a search for reviews of their software turns up some questionable results. Many might enjoy their software as they are easy to use and are free for home use. I had lots of trouble with a few of them as they wanted to mount network locations and disk images I had used recently, which seems sketchy. I would love to know if anyone uses their software and has any need to preserve these formats. I currently don’t, but found the common use of a file format intriguing. I also found no reference to the magic bytes they use, except for a few TrID entries. Marco always is a step ahead!

SolidWorks

The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.

Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.

The SolidWorks 2000 format added additional files to the container which can help.

Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders

The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation      
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  48 00 00 00 02 00 00 00  02 00 00 00 18 00 00 00  |H...............|
00000040  00 00 00 00 24 00 00 00  1e 00 00 00 01 00 00 00  |....$...........|
00000050  00 00 00 00 02 00 00 00  00 00 00 00 01 00 00 00  |................|
00000060  00 02 00 00 00 0d 00 00  00 53 57 2d 46 69 6c 65  |.........SW-File|
00000070  20 4e 61 6d 65 00 00 00                           | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  6c 00 00 00 03 00 00 00  02 00 00 00 20 00 00 00  |l........... ...|
00000040  03 00 00 00 2c 00 00 00  00 00 00 00 34 00 00 00  |....,.......4...|
00000050  1e 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000060  00 00 00 00 03 00 00 00  00 00 00 00 01 00 00 00  |................|
00000070  00 03 00 00 00 0e 00 00  00 41 73 73 65 6d 62 6c  |.........Assembl|
00000080  79 20 74 79 70 65 00 02  00 00 00 0d 00 00 00 53  |y type.........S|
00000090  57 2d 46 69 6c 65 20 4e  61 6d 65 00              |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000  fe ff 00 00 04 0a 02 00  02 d5 cd d5 9c 2e 1b 10  |................|
00000010  93 97 08 00 2b 2c f9 ae  01 00 00 00 05 d5 cd d5  |....+,..........|
00000020  9c 2e 1b 10 93 97 08 00  2b 2c f9 ae 30 00 00 00  |........+,..0...|
00000030  bc 01 00 00 0a 00 00 00  02 00 00 00 58 00 00 00  |............X...|
00000040  03 00 00 00 64 00 00 00  04 00 00 00 70 00 00 00  |....d.......p...|
00000050  05 00 00 00 7c 00 00 00  06 00 00 00 88 00 00 00  |....|...........|
*
000000d0  05 00 00 00 52 27 a0 89  b0 e1 d1 3f 05 00 00 00  |....R'.....?....|
000000e0  51 6b 9a 77 9c a2 cb 3f  03 00 00 00 00 00 00 00  |Qk.w...?........|
000000f0  0a 00 00 00 00 00 00 00  01 00 00 00 00 04 00 00  |................|
00000100  00 15 00 00 00 53 57 2d  53 68 65 65 74 20 46 6f  |.....SW-Sheet Fo|
00000110  72 6d 61 74 20 53 69 7a  65 00 05 00 00 00 11 00  |rmat Size.......|
00000120  00 00 53 57 2d 43 75 72  72 65 6e 74 20 53 68 65  |..SW-Current She|
00000130  65 74 00 08 00 00 00 19  00 00 00 41 63 74 69 76  |et.........Activ|
00000140  65 20 73 68 65 65 74 20  70 61 70 65 72 20 77 69  |e sheet paper wi|
00000150  64 74 68 00 02 00 00 00  0d 00 00 00 53 57 2d 46  |dth.........SW-F|
00000160  69 6c 65 20 4e 61 6d 65  00 09 00 00 00 14 00 00  |ile Name........|
00000170  00 41 63 74 69 76 65 20  73 68 65 65 74 20 48 65  |.Active sheet He|
00000180  69 67 68 74 00 07 00 00  00 0e 00 00 00 53 57 2d  |ight.........SW-|
00000190  53 68 65 65 74 20 4e 61  6d 65 00 0a 00 00 00 18  |Sheet Name......|
000001a0  00 00 00 41 63 74 69 76  65 20 73 68 65 65 74 20  |...Active sheet |
000001b0  70 61 70 65 72 20 73 69  7a 65 00 03 00 00 00 0f  |paper size......|
000001c0  00 00 00 53 57 2d 53 68  65 65 74 20 53 63 61 6c  |...SW-Sheet Scal|
000001d0  65 00 06 00 00 00 10 00  00 00 53 57 2d 54 6f 74  |e.........SW-Tot|
000001e0  61 6c 20 53 68 65 65 74  73 00 00 00              |al Sheets...|

Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

hexdump -C Bracket.SLDPRT | head
00000000  9f e4 18 9f 00 00 00 04  26 00 42 15 14 00 06 00  |........&.B.....|
00000010  08 00 06 00 40 a5 c3 a7  0e 51 5b 03 00 00 91 07  |....@....Q[.....|
00000020  00 00 0d 00 00 00 34 f6  e6 47 56 e6 47 37 f2 34  |......4..GV.G7.4|
00000030  d4 76 27 b5 55 5d 48 14  51 14 3e ab 2e f6 63 65  |.v'.U]H.Q.>...ce|
00000040  8b be 55 2e 42 0f 45 89  05 16 68 3a 93 eb f6 03  |..U.B.E...h:....|
00000050  ab 2e ae 89 d4 c2 3a ee  ce ae 53 bb 3b cb cc 2e  |......:...S.;...|
00000060  18 42 0d f8 16 41 3d 95  42 94 24 41 b0 3d 54 14  |.B...A=.B.$A.=T.|
00000070  fd 68 ad 52 0f 45 54 06  61 84 d1 0f 52 3e 44 20  |.h.R.ET.a...R>D |
00000080  48 af 6e e7 cc cc dd 3f  5d ea a5 3b dc 3d df f9  |H.n....?]..;.=..|
00000090  b9 e7 9c 7b ef b9 67 d3  69 0b 54 41 44 76 c8 d1  |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000  f4 e9 02 fc 00 00 00 04  51 3f 60 ad 6a 35 f9 b3  |........Q?`.j5..|
00000010  14 00 06 00 08 00 a8 8c  60 c0 d0 05 00 00 74 01  |........`.....t.|
00000020  00 00 e8 02 00 00 07 00  00 00 05 27 56 67 96 56  |...........'Vg.V|
00000030  77 d6 df ea 07 e7 cf ed  c6 8e 6c a1 48 70 d6 76  |w.........l.Hp.v|
00000040  cd 16 7f e9 6b 95 3a 4e  bb 6e 95 cc d2 b3 69 a9  |....k.:N.n....i.|
00000050  72 6b af c7 82 38 95 6f  bc 37 d2 4e a6 28 36 bd  |rk...8.o.7.N.(6.|
00000060  c3 cf 85 46 0a 85 63 97  83 56 88 a1 38 02 64 14  |...F..c..V..8.d.|
00000070  00 06 00 08 00 a8 8c 60  c0 44 07 00 00 d1 01 00  |.......`.D......|
00000080  00 a2 03 00 00 22 00 00  00 34 f6 e6 47 56 e6 47  |....."...4..GV.G|
00000090  37 f2 34 f6 e6 66 96 76  d2 03 d2 25 56 37 f6 c6  |7.4..f.v...%V7..|

The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.

More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.

Universal Scene Description

A few years ago I became obsessed with creating 3D models from physical objects. There was an app on my iPhone called 123D Catch by AutoDesk and it allowed you to take a series of photos with your iPhone camera, then combine them to create a 3D Model. This lead me down a path to eventually take a course on Photogrammetry and develop a process for capturing objects in our Museum.

Autodesk eventually discontinued the app and built the technology into their paid products. This is when we started seeing lidar introduced with handheld devices. The first one I tried was using my XBOX360 kinect sensor with the skanect software. The quality was horrible, but was fun to learn about depth sensors and structure from motion. When the iPhone finally came out with lidar sensor it was like Apple had read my mind. I love having the ability to capture objects I find into 3D models. The quality is pretty good, not as good as taking the time to capture image sets for photogrammetry and use tools like Aigisoft metashape, but apps like Scaniverse do a fantastic job. You can check out some of the models I have captured on my Sketchfab page.

With any new technology comes new file formats, and 3D formats are definitely no exception. It seems every software developer has to come up with their own proprietary format leaving the digital preservation folks scrambling to keep up. The DPC and Archivematica published a report a couple years ago and state:

“There are many challenges in preserving 3D data. As well as the complexity of the data itself, there is
a lack of interoperability between the different (often proprietary) systems that are used to create
and manipulate 3D models. Relationships to other data, software and hardware also need to be
captured and managed effectively.”

https://www.dpconline.org/docs/technology-watch-reports/2479-preserving-3d/file

With my new iPhone in hand I found myself with new file format I was unfamiliar with. Universal Scene Description is a framework to exchanging 3D data between different software developed by Pixar. The relationship between Apple and Pixar goes way back so it was no surprise the Apple iPhone has support built in for this new format and I found myself capturing and sending 3D models to others with an iPhone. The USDZ format is a ZIP package format for containing a USD 3D model and is perfect for sharing and preserving.

There is no current PRONOM signatures for identifying USD formats, so I wanted to look into creating one. This is where I ran into a problem. The current PRONOM signature syntax has no way of properly identifying the USDZ format. Let me explain.

When DROID or Siegfried is used to identify a container format such as USDZ. It will first identify the format as a ZIP file, which technically it is. This triggers the software to then refer to the container signature to see if any patterns from the files internal to the ZIP match to a known format. This is done by pointing to a specific file and a hex pattern or ascii string within the file. In the case of a USDZ the internal structure may look like this:

Listing archive: scaniverse-20210928-113055.usdz

--
Path = scaniverse-20210928-113055.usdz
Type = zip
Physical Size = 5702256

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-09-28 11:47:36 .....       297999       297999  scaniverse-20210928-113055.usdc
2021-09-28 11:47:36 .....      5403849      5403849  0/texgen_0.jpg
------------------- ----- ------------ ------------  ------------------------
2021-09-28 11:47:36            5701848      5701848  2 files

In this sample file the name of the USDZ is the same name as the internal USDC file. So the name of the USDC is variable and DROID needs a static name and path to look for patterns. The USDZ specification is clear that the only required file inside a USDZ is a USD model, anything else is ancillary and is not always going to be included. Currently the only format used is USDC, but in the future may allow a simple USD or USDA format. In addition, some of the other sample files show a very nested USDC file, making identification even more difficult.

Listing archive: Scan.usdz

--
Path = Scan.usdz
Type = zip
Physical Size = 19155195
Characteristics = Minor_Extra_ERROR

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-03-09 09:22:36 .....     19154773     19154773  /private/var/mobile/Containers/Data/Application/EFD09E66-32FB-4B08-8BED-B7E3D78FE1A8/tmp/Scan.usdc
------------------- ----- ------------ ------------  ------------------------
2021-03-09 09:22:36           19154773     19154773  1 files

The USDZ format is not the only file format which makes identification difficult through variable names and non-static patterns. An issue on GitHub has been raised to address this problem. One potential fix is to use glob patterns as suggested by the amazing Richard Lehane, creator of Siegfried. This way we could use wildcard to ignore the variable names and find any file with an extension of .USDC for example. The USDC file format has a nice 8 byte header “PXR-USDC” which is perfectly suited for identification so our container signature might look like this:

<ContainerSignature Id="1000" ContainerType="ZIP">
  <Description>USDZ 3D Package</Description>
    <Files>
	<File>
          <Path>*.usdc</Path>
              <BinarySignatures>
                  <InternalSignatureCollection>                    
	             <InternalSignature ID="300">
	                 <ByteSequence Reference="BOFoffset">
	                     <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0">
	                         <Sequence>50 58 52 2D 55 53 44 43</Sequence>
	                     </SubSequence>
	                 </ByteSequence>
	             </InternalSignature>
	          </InternalSignatureCollection>
             </BinarySignatures>
        </File>           
    </Files>
</ContainerSignature>

Update: I was able to get a beta version of Siegfried working with my test signature.

siegfried   : 1.11.0
scandate    : 2023-06-02T08:54:27-06:00
signature   : default.sig
created     : 2023-06-02T08:52:33-06:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V112.xml; container-signature-20230510.xml; extensions: usdz-signature-file-v1.xml; container extensions: usdz-dev1-signaturefile-20230601.xml'
---
filename : 'scaniverse-20210928-113055.usdz'
filesize : 5702256
modified : 2021-09-28T11:47:37-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'BYUdev/1'
    format  : 'USDZ 3D Package'
    version : 
    mime    : 'model/vnd.usdz+zip'
    class   : 
    basis   : 'extension match usdz; container name scaniverse-20210928-113055.usdc with byte match at 0, 8 (signature 1/2)'
    warning : 

I am still in the process of testing some beta versions of Siegfried in hopes of getting the glob matching to work, but still have more to do. Stay tuned!