Pro Tools Sessions

One of the most important software titles related to professional audio recording and mixing is Pro Tools. The Digital Audio Workstation by Digidesign, now Avid, has been around since 1991 and was born from the very popular Sound Designer software first released in 1985. When Sound Designer II was released a few years later, the audio format used became the standard file format for audio recordings. Pro Tools progressed from there to become the industry standard for professional audio production, even winning a Technical Grammy, Emmy, and Oscar.

Pro Tools helped produce amazing music for artists such as No Doubt, Maroon 5, Ricky Martin, and many others. Obviously the best part is the final mixed audio used to make the music we love, but the work that goes into creating the audio mixes is saved in a Pro Tools session. The session is where all the magic happens. A Pro Tools session is actually a project file within a folder where all the supporting files are located.

tree PT Sample/
├── Audio Files
│   ├── GTR 1_02.wav
│   ├── GTR 1_03.wav
│   └── GTR 1_04.wav
└── Test.ptx

These session “folders” can get pretty complex as more audio and effects are added to the session, gaining folders such as Fade Files, Rendered Files, and Plug-in Settings. The current version of Pro Tools uses a project session file with the extension PTX, but that wasn’t always the case. The current version of Pro Tools can be run on Macintosh and Windows, but that also was not always the case. Because the software was originally written for Macintosh hardware, the session files were only compatible with the Macintosh file system as well.

Let’s start by looking at a session from Pro Tools version 1.1 from 1991.

ls -l@ Demo Disk 1 
total 1504
-rw-r--r--@ 1 thorsted Domain Users 45056 Sep 13 1991 Backward Kick
com.apple.FinderInfo 32
com.apple.ResourceFork 1354
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 0 Sep 16 1991 Demo Session
com.apple.FinderInfo 32
com.apple.ResourceFork 13671
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 0 Sep 16 1991 Desktop
com.apple.FinderInfo 32
com.apple.ResourceFork 3081
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 339456 Sep 13 1991 Solo 1
com.apple.FinderInfo 32
com.apple.ResourceFork 2040
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 350390 Sep 13 1991 Solo 2
com.apple.FinderInfo 32
com.apple.ResourceFork 2006
com.apple.provenance 11

You might notice the “Demo Session” file is Zero Bytes, but the Resource Fork is 13671 bytes in size.

From the beginning until version 5, Pro Tools sessions used this method of storing the session data: all of it in the Resource Fork. Because the session data was in the resource fork and the supporting audio files were in the Sound Designer II format, which also stored important information in the resource fork, sessions were impossible to use on anything but a Macintosh file system.
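The split between the two forks can also be checked from a script. The sketch below assumes macOS, where a file’s resource fork is exposed at the special path "<file>/..namedfork/rsrc"; on other systems that path does not exist, so the function reports None instead. The function name fork_sizes is made up for this illustration.

```python
import os


def fork_sizes(path):
    """Return (data_fork_bytes, resource_fork_bytes or None)."""
    data_size = os.path.getsize(path)
    try:
        # Assumption: macOS, on a volume that supports forks. The resource
        # fork is then readable at this special sub-path of the file.
        rsrc_size = os.path.getsize(os.path.join(path, "..namedfork", "rsrc"))
    except OSError:
        # Not macOS, or the file system has no notion of a resource fork.
        rsrc_size = None
    return data_size, rsrc_size
```

Run on a Mac against the “Demo Session” file above, this should report a data fork of 0 bytes alongside the 13671-byte resource fork that ls showed.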

Version 10 of Pro Tools allows you to export the full session back into older versions of the software, as far back as version 3.2. When you choose version 5 on a Mac, it also forces you to convert the audio to SD2 files. For versions 1 & 2 of Pro Tools, there was no official extension for the session files, but starting with version 3, you will often find the extension PT3, then PT4, and PT5. With version 4, there was also a P24 extension, used when Pro Tools version 4 made the leap to 24-bit. But for each of these versions, identification is not possible with current preservation tools like PRONOM. You could encode the session as MacBinary to retain everything for modern systems, which is identifiable, but you could also use my proposal for a lookup in the TCDB python tool located here.

python3 TC-lookup-draft-uni.py "PT Session 02-41.pt4"
Type Code: PT4S
Creator Code: PTul
Size of Data Fork: 0 bytes
Size of Resource Fork: 14003 bytes
Rows with Type Code b'PT4S' and Creator Code b'PTul':
Row index: 32813
File Name: Pro Tools 4
Type: PT4S
Creator: PTul
Extension: pt4
Data by Ilan Szekely, Jerusalem: nan

Extension  Version             Type  Creator
(none)     Pro Tools 1.1       mtSF  TLin
(none)     Pro Tools 2         PSes  PTul
PT3        Pro Tools 3.2       PSes  PTul
PT4        Pro Tools 4 16bit   PT4S  PTul
PT24       Pro Tools 4 24bit   PT24  PTul
PT5        Pro Tools 5         PT5S  PTul
PTS        Pro Tools 5.1-6.9   PTS   PTul
PTF        Pro Tools 7-9       PTF   PTul
PTX        Pro Tools 10+       PTX   PTul

(The PTS, PTF, and PTX type codes are padded to four characters with a trailing space.)
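
The Type and Creator codes listed above sit in the first eight bytes of the 32-byte com.apple.FinderInfo attribute seen in the earlier directory listing: four bytes of type, then four bytes of creator. As a minimal sketch of the lookup idea (this is not the TCDB tool itself; the dictionary and function names are made up for illustration, and the pairs are simply transcribed from the list above):

```python
# Type/Creator pairs transcribed from the list above (illustrative only).
KNOWN_SESSIONS = {
    (b"mtSF", b"TLin"): "Pro Tools 1.1",
    (b"PSes", b"PTul"): "Pro Tools 2 or 3.2",
    (b"PT4S", b"PTul"): "Pro Tools 4 16bit",
    (b"PT24", b"PTul"): "Pro Tools 4 24bit",
    (b"PT5S", b"PTul"): "Pro Tools 5",
    (b"PTS ", b"PTul"): "Pro Tools 5.1-6.9",
    (b"PTF ", b"PTul"): "Pro Tools 7-9",
    (b"PTX ", b"PTul"): "Pro Tools 10+",
}


def session_version(finder_info):
    """Map a raw FinderInfo value to a version label, or None if unknown."""
    if len(finder_info) < 8:
        return None
    type_code = finder_info[0:4]      # bytes 0-3: file type
    creator_code = finder_info[4:8]   # bytes 4-7: creator
    return KNOWN_SESSIONS.get((type_code, creator_code))


# A FinderInfo value as it might look for a version 4 session.
print(session_version(b"PT4SPTul" + bytes(24)))
```

Because versions 2 and 3.2 share the same pair, a lookup like this can only narrow those two down, not separate them.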

There isn’t a lot of information about when Pro Tools was made for Windows. I found some references to a Windows NT version of the 16-bit and 24-bit releases of version 4. I did also find a copy of the free Pro Tools version 5.0.1 for Windows 98. In the Read Me it states:

Cross–platform File Exchange is not supported in this version of Pro Tools FREE

File interchange between Mac and PC versions of Pro Tools FREE is not possible in this 5.0.1 release. We hope to include this functionality in a future release of Pro Tools FREE. You can exchange files with Pro Tools LE and TDM users who use the same platform (Mac or Win98/Me) as you, but remember, Pro Tools FREE is limited to 8 audio and 48 MIDI tracks.

Running the software confirms the session file for this version has the extension PT5 and not the later PTS for version 5.1. This version of Pro Tools also allows you to save back to the P24 and PT4 versions, which are probably the first Windows versions. But they are entirely different file formats from the Macintosh versions.

hexdump -C PT5-Win-s03.pt5 | head
00000000 00 00 01 00 00 00 45 ae 00 00 44 ae 00 00 03 98 |......E...D.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 00 5e 50 53 56 45 00 01 04 31 00 52 05 00 |...^PSVE...1.R..|
00000110 45 44 05 00 45 44 19 99 03 26 0c 50 72 6f 54 6f |ED..ED...&.ProTo|
00000120 6f 6c 73 20 35 2e 30 fc c5 00 d7 12 00 78 5e 00 |ols 5.0......x^.|
00000130 00 00 0e 32 00 78 5e 00 00 00 00 00 00 00 00 00 |...2.x^.........|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C PT24-Win-s03.p24 | head
00000000 00 00 01 00 00 00 3f d3 00 00 3e d3 00 00 02 f1 |......?...>.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 01 0a 50 61 74 68 00 01 02 b4 37 0e 2e 48 |....Path....7..H|
00000110 43 3a 5c 57 49 4e 44 4f 57 53 5c 44 65 73 6b 74 |C:\WINDOWS\Deskt|
00000120 6f 70 5c 50 54 5c 50 54 35 2d 57 69 6e 2d 73 30 |op\PT\PT5-Win-s0|
00000130 33 5c 41 75 64 69 6f 20 46 69 6c 65 73 00 00 00 |3\Audio Files...|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C PT4-Win-s03-16.pt4 | head
00000000 00 00 01 00 00 00 3f d9 00 00 3e d9 00 00 02 f1 |......?...>.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 01 0a 50 61 74 68 00 01 02 b4 37 0e 2e 48 |....Path....7..H|
00000110 43 3a 5c 57 49 4e 44 4f 57 53 5c 44 65 73 6b 74 |C:\WINDOWS\Deskt|
00000120 6f 70 5c 50 54 5c 50 54 35 2d 57 69 6e 2d 73 30 |op\PT\PT5-Win-s0|
00000130 33 5c 41 75 64 69 6f 20 46 69 6c 65 73 00 00 00 |3\Audio Files...|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
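
One thing all three dumps have in common is the shape of the first 16 bytes. As a tentative observation only: if those bytes are read as four big-endian 32-bit integers, in the order used by the classic Macintosh resource fork header (data offset, map offset, data length, map length), the numbers are self-consistent for every sample shown. That interpretation is an assumption made for this sketch, not something confirmed, and three files from one source are not much to go on.

```python
import struct


def parse_fork_style_header(first16):
    """Unpack 16 bytes as four big-endian uint32 values (assumed layout)."""
    data_off, map_off, data_len, map_len = struct.unpack(">IIII", first16[:16])
    return {
        "data_offset": data_off,
        "map_offset": map_off,
        "data_length": data_len,
        "map_length": map_len,
        # In a resource fork the map normally starts where the data ends.
        "consistent": data_off + data_len == map_off,
    }


# First rows of the three hexdumps above.
samples = {
    "pt5": "00 00 01 00 00 00 45 ae 00 00 44 ae 00 00 03 98",
    "p24": "00 00 01 00 00 00 3f d3 00 00 3e d3 00 00 02 f1",
    "pt4": "00 00 01 00 00 00 3f d9 00 00 3e d9 00 00 02 f1",
}
for name, row in samples.items():
    print(name, parse_fork_style_header(bytes.fromhex(row)))
```

Whether this is more than a coincidence would need more samples to say.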

Starting with Pro Tools 5.1 in 2001, things began to change. Pro Tools has always been closely tied to both hardware and system software, so Apple’s launch of Mac OS X gave Digidesign/Avid an opportunity to revamp their hardware and software for better compatibility, including a cross-platform session format.

Pro Tools 5.1 used a new session format which used the extension PTS. Let’s take a look at a sample.

hexdump -C PT Session 02-51.pts | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 01 3d 6e 1c 06 eb d8 c1 aa 16 fd 65 4e 6d |1..=n........eNm|
00000020 23 09 96 db c4 ad 95 7f 68 5d 3a 23 0c a5 ac a8 |#.......h]:#....|
00000030 90 cd ed 04 38 4e 06 47 bc e2 ca b3 9c 8f 6e 57 |....8N.G......nW|
00000040 40 2a 12 fb e4 c4 b6 9f 88 77 5a 43 2c 24 ce c9 |@*.......wZC,$..|
00000050 e3 97 9b 8a 73 5d 46 2f 4a 64 86 b6 dd d6 eb 77 |....s]F/Jd.....w|
00000060 76 49 32 1b 54 9f b9 9f fc fe 15 0f 3f 15 4d 62 |vI2.T.......?.Mb|
00000070 83 aa ab c4 fa 5d 20 26 54 44 0b f3 d9 c5 ae 97 |.....] &TD......|
00000080 cd 08 31 74 77 0d f6 df c8 b5 c0 8b 6c 7c 3f 27 |..1tw.......l|?'|
00000090 10 9e c2 cb b4 9d 86 45 58 41 2a ad e1 78 2d b4 |.......EXA*..x-.|

The session is a new proprietary binary format with an interesting header. There is one byte (0x03) and then a sequence of 16 ASCII characters in the form of a binary string: 0010111100101011. What it means is unknown to me. Read as a binary number it is 12075 in decimal or 2F2B in hex, and those two bytes as text are “/+”. Regardless of what it means, this header was used from version 5.1 through 9. The extension changed to PTF for versions 7-9, but the header is the same. This is why PRONOM PUID fmt/1951 refers to both extensions, covering 5.1-9.
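
The conversions are easy to reproduce. This short sketch takes the first 17 bytes from the hexdump above, treats everything after the leading byte as a string of ASCII ones and zeros, and converts it; the variable names are only for illustration.

```python
# First 17 bytes of the PTS sample: one lead byte, then 16 ASCII digits.
header = bytes.fromhex(
    "03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 31"
)

bits = header[1:].decode("ascii")                     # the "binary" text
value = int(bits, 2)                                  # read it as base 2
as_text = value.to_bytes(2, "big").decode("ascii")    # the two bytes as ASCII

print(bits, value, hex(value), as_text)
# prints: 0010111100101011 12075 0x2f2b /+
```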

hexdump -C PT Session 02-7.ptf | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 01 4c 6a cd 68 00 a0 3c d8 d2 c1 ac 48 be |1..Lj.h..<....H.|
00000020 85 1c 25 54 f0 8c 31 e1 61 fc 98 34 d0 6c 08 a4 |..%T..1.a..4.l..|
00000030 40 dc 79 14 b0 4c eb 84 21 bc 58 f4 90 2c cc 64 |@.y..L..!.X..,.d|
00000040 00 9c 0e a7 15 6f a9 44 e0 7c 18 b4 7a ec 88 24 |.....o.D.|..z..$|
00000050 c6 42 65 77 5d b8 f2 80 a1 3c d8 2e 12 ac 6b e4 |.Bew]....<....k.|
00000060 80 1c a2 71 f0 8c 2c c4 60 fc ae 47 b5 0f 09 a4 |...q..,.`..G....|
00000070 40 dc 78 14 9a 4c e8 84 26 a2 c5 17 fd 58 52 e0 |@.x..L..&....XR.|
00000080 01 9c 38 d4 70 0d a8 44 e0 26 1a b4 73 ec 88 24 |..8.p..D.&..s..$|
00000090 da 79 f8 94 34 cc 68 04 96 4f bd 17 11 ac 48 e4 |.y..4.h..O....H.|

It might be possible to look closer at the two extensions and find something which can distinguish between them, but because they are in a proprietary binary format, there isn’t much to go on. There have been a few attempts at reverse engineering the formats, but even they chose to lump the two extensions together.

The other important byte in this header is the second byte after the odd binary ASCII sequence, at offset 0x12 in the dumps above. Its value of 0x01 is important because in the next version, PTX, it changes to 0x05.

Pro Tools version 10 was a big release. It added new features and started to phase out the HD hardware. With this release we see a new session format which is still used by the current version of Pro Tools.

hexdump -C PT Session 02-10.ptx | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 05 13 5a 01 00 04 00 00 00 49 a4 00 00 5a |1...Z......I...Z|
00000020 03 00 64 00 00 00 03 00 00 0c 00 00 00 50 72 6f |..d..........Pro|
00000030 20 54 6f 6f 6c 73 20 48 44 03 00 00 00 0a 00 00 | Tools HD.......|
00000040 00 03 00 00 00 09 00 00 00 06 00 00 00 31 30 2e |.............10.|
00000050 33 2e 39 01 07 00 00 00 52 65 6c 65 61 73 65 00 |3.9.....Release.|
00000060 16 00 00 00 50 72 6f 20 54 6f 6f 6c 73 20 53 65 |....Pro Tools Se|
00000070 73 73 69 6f 6e 20 46 69 6c 65 06 00 05 00 00 00 |ssion File......|
00000080 4d 61 63 4f 53 00 00 00 00 05 5a 08 00 eb 00 00 |MacOS.....Z.....|
00000090 00 67 20 00 00 00 00 2a 00 00 00 be 1d 9d e3 03 |.g ....*........|

This new session format has the same binary ASCII string, but a lot more plain text in the header and throughout the file. This gives us more to explore and understand, even listing the linked audio files and their paths. PRONOM has this new format assigned to PUID fmt/1727. The signature for these files is the same sequence as the previous version, followed by the 0x05 byte, plus a few additional bytes, 5A010004, that come shortly after the main sequence. I am not sure of the bytes’ significance, but they are in all the samples I have, even from the current version.

Pro Tools has some other formats which go along with their sessions. One I’ll highlight is the Groove template format, whose files end with the extension GRV. You can see some samples here. They also have the odd binary ASCII header, but with 0x00 for the second byte after the main header (offset 0x12 in the dump below).

hexdump -C DiskoKonga.grv | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 01 00 5a 00 01 00 00 00 04 00 00 15 f8 5a 00 |1..Z..........Z.|
00000020 01 00 00 15 d3 10 42 04 04 00 64 00 64 00 64 00 |......B...d.d.d.|
00000030 01 00 01 00 01 00 00 00 00 01 d4 c0 00 00 00 00 |................|
00000040 00 00 00 00 81 00 00 00 00 00 00 00 81 5a 00 01 |.............Z..|
00000050 00 00 00 24 10 43 00 00 00 00 00 00 00 00 00 00 |...$.C..........|
00000060 00 00 00 01 d4 c0 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 01 d4 c0 00 49 5a 00 01 00 00 00 24 10 |.......IZ.....$.|
00000080 43 00 00 00 00 00 01 d4 c0 00 00 00 00 00 05 7e |C..............~|
00000090 40 00 00 00 00 00 04 8e e0 00 00 00 00 00 01 d4 |@...............|

Other extensions associated with Pro Tools which use the same format are: PIO, PIM, PTT, PTXT, RGRP.
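
Putting the three header variants side by side, the byte at offset 0x12 is what separates them in the samples shown. Here is a rough sketch of that check. It is not the PRONOM signature, the function name and labels are invented for illustration, and it is based only on the handful of samples in this post.

```python
# Lead byte plus the 16-character ASCII "binary" string (offsets 0x00-0x10).
MAGIC = b"\x03" + b"0010111100101011"


def classify_pro_tools(head):
    """Guess the Pro Tools file family from the first bytes of a file."""
    if len(head) < 0x13 or not head.startswith(MAGIC):
        return None
    marker = head[0x12]  # second byte after the ASCII sequence
    if marker == 0x01:
        return "session, Pro Tools 5.1-9 (PTS/PTF)"
    if marker == 0x05:
        return "session, Pro Tools 10+ (PTX)"
    if marker == 0x00:
        return "groove template or related format (GRV and friends)"
    return "Pro Tools header, unrecognized marker"


# Bytes 0x11-0x13 copied from the PTS, PTX, and GRV dumps above.
print(classify_pro_tools(MAGIC + bytes.fromhex("00 01 3d")))
print(classify_pro_tools(MAGIC + bytes.fromhex("00 05 13")))
print(classify_pro_tools(MAGIC + bytes.fromhex("01 00 5a")))
```

A check this small cannot tell PTS from PTF, for the same reason the PRONOM entry covers both.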

Pro Tools has always been software directly tied to audio hardware and system software. In addition, it used dongles to control software licensing, and the licenses were not cheap. Because of this, trying to use older versions is very difficult. Finding samples for each version is also difficult, as each version allows for a variety of features that may not be available in another version. Luckily, there are some older “Free” versions out there with limited features, from which we can get some idea of the session format.

PRONOM has working identification for the two major formats and until PRONOM can incorporate Macintosh Resource Fork identification it will have to do. The PC version 4 and 5 formats could use more research as I only have one source. The groove and other formats all seem to have the same header so they will need more research as well. Until then, enjoy some sample files and also a disk image of some older Macintosh Pro Tools 3 sessions.

PAR

Some file formats have a unique extension. Some formats use three-character extensions which are well known, so it’s not common for them to be used with other software. Take the extension PDF for example; pretty sure no one else will use it as it is so well known. Other extensions often get reused by a few different software titles. There are plenty of titles which use the DOC extension.

Part of defining a file format I come across is also defining other formats which use the same extension or the same basic patterns within the format. I want the format I am researching to be identified correctly, but I also don’t want other formats to be falsely identified as it.

When using the DROID tool, if a file can’t be identified using a signature, the tool will then look to see if the extension matches any formats within the PRONOM registry. If it finds one, it will identify the file as that format, with the identification method listed as “Extension”. This can be confusing and dangerous.

The topic of a format came up recently in reference to the extension PAR. Let’s take a look at what we know about files with the extension PAR. Using the handy tool at digipres.org, we can see there are many formats using the PAR extension.

Apparently many people like to use the extension with their software. One might think their files with the PAR extension have to be in this list, and they would be wrong in that assumption. The PRONOM registry has no records of any format using the PAR extension. Hopefully we can add a few to help with proper identification instead of using the extension only.

A PArchive or Parity Volume Set is a group of file formats used in error correction and data integrity. Only the first version used the PAR extension. It is now obsolete, with version 2 being the last stable version.

hexdump -C archive.par | head
00000000 50 41 52 00 00 00 00 00 00 00 01 00 00 09 00 02 |PAR.............|
00000010 8f d0 ce 2e 21 db 3b e5 41 d5 18 be d3 0e 52 f0 |....!.;.A.....R.|
00000020 de b6 b3 9f 53 09 ff ba 16 6b ca d2 48 a6 ca 45 |....S....k..H..E|
00000030 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000040 60 00 00 00 00 00 00 00 4e 00 00 00 00 00 00 00 |`.......N.......|
00000050 ae 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 4e 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |N...............|
00000070 45 16 01 00 00 00 00 00 76 da 44 2b 43 5f b5 bd |E.......v.D+C_..|
00000080 08 7b d2 b0 2e 16 7d 86 46 75 7b 79 f0 36 75 3b |.{....}.Fu{y.6u;|
00000090 a1 14 22 f3 0c 77 85 3c 70 00 61 00 72 00 2d 00 |.."..w.<p.a.r.-.|

hexdump -C Testing.docx.par2 | head
00000000 50 41 52 32 00 50 4b 54 84 00 00 00 00 00 00 00 |PAR2.PKT........|
00000010 76 1f e0 a4 5a 32 e0 84 d9 e9 32 32 06 9f 03 ff |v...Z2....22....|
00000020 71 48 73 d5 59 c6 ae 7c c7 21 3d ba 8d e5 ea 04 |qHs.Y..|.!=.....|
00000030 50 41 52 20 32 2e 30 00 46 69 6c 65 44 65 73 63 |PAR 2.0.FileDesc|
00000040 5d 74 b5 3d 64 ae 1f d8 ae 41 f1 8c 2f 7a cc c1 |]t.=d....A../z..|
00000050 27 9b bc 61 46 21 4d 37 a3 c7 f2 07 b4 b8 df 81 |'..aF!M7........|

Pretty straightforward. The only thing that would have made it easier is if the first version used “PAR1”, but be glad they didn’t as that signature is used by another!

hexdump -C null_list.parquet | head
00000000 50 41 52 31 15 00 15 18 15 18 2c 15 02 15 00 15 |PAR1......,.....|
00000010 06 15 06 00 00 02 00 00 00 02 00 02 00 00 00 02 |................|
00000020 01 26 42 1c 15 02 19 25 00 06 19 38 09 65 6d 70 |.&B....%...8.emp|
00000030 74 79 6c 69 73 74 04 6c 69 73 74 04 69 74 65 6d |tylist.list.item|
00000040 15 00 16 02 16 3a 16 3a 26 08 3c 36 02 00 00 00 |.....:.:&.<6....|
00000050 15 02 19 4c 48 0c 61 72 72 6f 77 5f 73 63 68 65 |...LH.arrow_sche|
00000060 6d 61 15 02 00 35 02 18 09 65 6d 70 74 79 6c 69 |ma...5...emptyli|
00000070 73 74 15 02 15 06 4c 3c 00 00 00 35 04 18 04 6c |st....L<...5...l|
00000080 69 73 74 15 02 00 15 02 25 02 18 04 69 74 65 6d |ist.....%...item|
00000090 6c bc 00 00 00 16 02 19 1c 19 1c 26 42 1c 15 02 |l..........&B...|

Apache Parquet is a more modern format used to store column-oriented data. At least they used a unique file extension!

Another common bit of software which uses the PAR extension is Solid Edge by Siemens. They use the PAR extension to encode their 3D parts format. For some reason this format still uses the OLE compound object container.

7z l tinyscrew.par 

Path = tinyscrew.par
Type = Compound
Physical Size = 86528
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 31964 32256 PSMcluster0
..... 12 64 Versions
2001-12-19 15:44:14 D.... Display
2001-12-19 15:44:14 D.... ACIS
..... 8462 8704 ACIS/Solid1.sab
..... 238 256 PSMroots
2001-12-19 15:44:14 D.... Display/Cache0
2001-12-19 15:44:14 D.... Display/Styles
..... 1725 1728 Display/Styles/Library0
..... 12 64 Display/Styles/DefaultStyles
..... 88 128 Display/Cache0/Info
..... 4248 4608 Display/Cache0/L1-T1
..... 8 64 JSitesList
2001-12-19 15:44:14 D.... PARASOLID
..... 3389 3392 PARASOLID/STREAM434.D_B
..... 10402 10752 PARASOLID/STREAM434.P_B
..... 4 64 DocVersion2
..... 199 256 PSMclustertable
..... 8 64 PSMuserroots
..... 512 512 JVisibleData
2001-12-19 15:44:14 D.... PSMspacemap
..... 66 128 PSMspacemap/0x00002000
..... 6090 6144 PSMspacemap/0x00000000
..... 174 192 PSMspacemap/0x00004000
..... 4716 5120 PSMtypetable
..... 8 64 FamilyMembers
..... 8 64 BuildVersions
..... 150 192 PartsLiteData
..... 596 640 [5]C3teagxwOttdbfkuIaamtae3Ie
..... 476 512 [5]SummaryInformation
..... 12 64 PSMsegmenttable
..... 96 128 MSConvertedPropertyset
..... 148 192 [5]K4teagxwOttdbfkuIaamtae3Ie
..... 280 320 [5]DocumentSummaryInformation
..... 116 128 [5]SszbwomgY1udb2whAaq5u2jwCg
..... 264 320 [5]Rfunnyd1AvtdbfkuIaamtae3Ie
..... 140 192 Dynamic Attributes Metadata
..... 458 512 Unclustered Dynamic Attributes
------------------- ----- ------------ ------------ ------------------------
2001-12-19 15:44:14 75069 77824 32 files, 6 folders

We will have to use a container signature to correctly identify this format. ASM and DFT are also Solid Edge formats which use the same OLE container. Hopefully there are some unique features we can use to identify them.

One other file format which uses the PAR extension is not listed in any of the registries. Not in PRONOM, TrID, Wikidata, or others. I came across it while researching another format, DVD Studio Pro. On a Macintosh computer running the now discontinued DVD Studio Pro, one could save their DVD mastering project as a “file” which used the DSPPROJ extension. I use the term file loosely here as it wasn’t actually a file; it was a folder with an extension which MacOS would interpret as a single file. These are the package formats Apple used and still uses quite frequently. Moving this folder to another system results in a folder of content.

tree sample.dspproj 
/sample.dspproj
└── Contents
    ├── PkgInfo
    └── Resources
        ├── Audio
        ├── MPEG
        ├── Menu
        ├── ModuleDataB
        ├── ObjectDataB
        ├── Openers.plist
        ├── Overlay
        ├── Picture
        ├── Render Data
        │   ├── C4272B0100797459.M2V
        │   └── PAR
        │       └── C4272B0100797459.M2V.par
        ├── Styles
        ├── Temp
        ├── Templates
        └── Thumbnails

14 directories, 6 files

This PAR extension is explained in the DVD Studio Pro manual:

About the Parse Files
To use an asset in a project, DVD Studio Pro needs to know some general information about it, such as its length, type, and integrity. Video assets encoded within DVD Studio Pro can include this information in the encoded files, or can create separate files for it. Assets encoded by Compressor outside of DVD Studio Pro can include this information if you select the “Add DVD Studio Pro meta-data” option in the Extras pane of the Encoder settings.
Assets encoded with other encoders, or with the “Add DVD Studio Pro meta-data” option disabled when using Compressor, must be parsed before DVD Studio Pro can use them. Parsing creates a small file, with the same name as the video asset and a “.par” extension that contains the required information. The parse file can take from several seconds to several minutes to create, depending on the size of the asset file.

hexdump -C E4712E541A60E300.M2V.par | head
00000000 56 50 41 52 00 00 00 20 00 00 00 00 00 01 e2 40 |VPAR... .......@|
00000010 00 00 00 00 00 c6 19 7c 2f 55 73 65 72 73 2f 74 |.......|/Users/t|
00000020 79 6c 65 72 2f 44 6f 63 75 6d 65 6e 74 73 2f 46 |yler/Documents/F|
00000030 69 6e 61 6c 20 52 65 6e 64 65 72 20 66 6f 72 20 |inal Render for |
00000040 44 56 44 20 56 51 42 2f 56 61 72 73 69 74 79 51 |DVD VQB/VarsityQ|
00000050 42 20 44 56 44 2f 56 61 72 73 69 74 79 51 42 2d |B DVD/VarsityQB-|
00000060 44 69 73 63 32 2e 64 73 70 70 72 6f 6a 2f 43 6f |Disc2.dspproj/Co|
00000070 6e 74 65 6e 74 73 2f 52 65 73 6f 75 72 63 65 73 |ntents/Resources|
00000080 2f 52 65 6e 64 65 72 20 44 61 74 61 2f 45 34 37 |/Render Data/E47|
00000090 31 32 45 35 34 31 41 36 30 45 33 30 30 2e 4d 32 |12E541A60E300.M2|

Parity, Parts, and Parse files, oh my.

If you thought we were done, you would be wrong! Let’s look at yet another PAR format.

hexdump -C MESSROH.PAR | head
00000000 08 69 64 73 32 30 30 30 30 d0 4e 01 51 46 42 00 |.ids20000.N.QFB.|
00000010 98 d0 4e 01 80 01 58 01 b6 b9 f7 bf 82 30 00 00 |..N...X......0..|
00000020 dc 08 00 00 60 51 f2 bf 82 30 01 59 ff ff ff ff |....`Q...0.Y....|
00000030 a4 d0 4e 01 28 3e f2 bf 78 63 a4 01 dc 08 00 0b |..N.(>..xc......|
00000040 5a 45 52 4f 2d 4f 46 46 53 45 54 01 18 0e ac 01 |ZERO-OFFSET.....|
00000050 d4 d0 4e 01 00 ac 43 00 18 0e ac 01 d4 d0 4e 01 |..N...C.......N.|
00000060 51 46 42 00 ec d0 4e 01 d4 00 4e 01 b6 b9 f7 bf |QFB...N...N.....|
00000070 5c 4c 75 81 5c 81 00 00 45 07 41 00 c0 0a 00 01 |\Lu.\...E.A.....|
00000080 cd d0 41 00 d5 d0 41 00 5c 81 00 00 dc 0a a4 01 |..A...A.\.......|
00000090 5b 5d 42 00 cc d0 4e 01 72 5d 42 00 7a 5d 42 00 |[]B...N.r]B.z]B.|

hexdump -C DUMMYDAT.PAR | head
00000000 08 73 65 69 73 6d 69 63 31 00 00 00 00 00 00 00 |.seismic1.......|
00000010 00 00 00 00 00 01 58 00 00 00 00 00 00 00 00 00 |......X.........|
00000020 00 00 00 00 00 00 00 00 00 00 01 59 00 00 00 00 |...........Y....|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |................|
00000040 41 4b 55 53 54 49 4b 4c 4f 47 00 00 00 00 00 00 |AKUSTIKLOG......|
00000050 00 00 00 00 02 2f 2f 00 08 41 47 43 2d 47 41 49 |.....//..AGC-GAI|
00000060 4e 00 00 00 00 00 00 00 00 00 00 00 00 32 00 00 |N............2..|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This PAR format is called “Reflexw data-format”. This is a RAW format header that is always paired with a DAT file, together used to store geophysical wave data from devices such as GPR. Reflexw is software made by Sandmeier geophysical research.

The PAR file samples I have don’t seem to have a consistent header, as each has a unique set of bytes, but all of them have some similar bytes later in the file at around the 0x1D8 (472) offset:

000001d0  00 00 a0 3d 00 00 a0 41  00 00 00 00 00 00 00 00  |...=...A........|
000001e0 0a d7 23 3c 00 00 80 3f 00 00 00 00 00 00 00 00 |..#<...?........|
000001f0 00 00 00 00 cc cc dc 40 00 00 00 00 00 00 00 00 |.......@........|
00000200 00 00 80 3f 00 00 00 00 00 00 00 00 00 00 00 00 |...?............|
00000210 00 00 00 00 00 00 00 00 17 b7 d1 38 00 00 00 00 |...........8....|

It seems this sequence of bytes is the only consistent one among all my samples. I have no idea what they mean or reference. The specification does indicate some bytes which should lead to proper identification, but the integer used for the “HeaderMarker” is a 4-byte “00 00 00 01”, which won’t be enough to cleanly identify the format. I’d love to hear what others can see from the spec. You can find some sample files here.
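
One possible reading, which has not been checked against the specification: the non-zero groups in that region decode to round-looking numbers when each group of four bytes is treated as a little-endian IEEE 754 32-bit float. The sketch below just does the unpacking for three rows of the dump; the interpretation itself is an assumption.

```python
import struct

# Three rows copied from the dump above, keyed by their file offset.
rows = {
    0x1D0: "00 00 a0 3d 00 00 a0 41 00 00 00 00 00 00 00 00",
    0x1E0: "0a d7 23 3c 00 00 80 3f 00 00 00 00 00 00 00 00",
    0x1F0: "00 00 00 00 cc cc dc 40 00 00 00 00 00 00 00 00",
}


def as_floats(hex_row):
    """Interpret 16 bytes as four little-endian 32-bit floats (assumed)."""
    return struct.unpack("<4f", bytes.fromhex(hex_row))


for offset, hex_row in rows.items():
    print(hex(offset), as_floats(hex_row))
```

Values such as 20.0, 1.0, and something very close to 0.01 come out, which at least looks more like a block of numeric parameters than a signature. That would also explain why the region is similar, but not necessarily identical, between samples.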

So we have some Parity files, Parts files, Parse files, Parquet files, and a Header file. I am sure others will be found and added to this lot. Hopefully the PAR files you run across will match one of these patterns! I am still working on a signature proposal. Stay tuned!
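
To make the difference between a signature match and an extension match concrete, here is a small sketch built from the first bytes of the samples above. It is not DROID’s logic and not a PRONOM signature proposal; the names and labels are invented for illustration. The OLE check only recognizes the generic compound file magic, so it cannot tell a Solid Edge part from any other OLE document, and the Reflexw files have no usable magic at all.

```python
# Leading bytes taken from the hexdumps in this post, plus the generic
# OLE compound file magic for the Solid Edge container.
SIGNATURES = [
    (b"PAR2\x00PKT", "Parity Volume Set, version 2"),
    (b"PAR1", "Apache Parquet"),
    (b"PAR\x00", "Parity Volume Set, version 1"),
    (b"VPAR", "DVD Studio Pro parse file"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE compound file (Solid Edge candidate)"),
]


def identify_par(head, extension=""):
    """Return (format_name, method) for the first bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name, "signature"
    if extension.lower() == "par":
        # Extension-only fallback: we know almost nothing at this point.
        return "one of many PAR formats", "extension"
    return None, None


print(identify_par(bytes.fromhex("50 41 52 00 00 00 00 00"), "par"))
print(identify_par(bytes.fromhex("56 50 41 52 00 00 00 20"), "par"))
print(identify_par(b"\x08ids20000", "PAR"))
```

The last call is the Reflexw case: nothing matches, and the only thing left to report is the extension.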

No bad deed….

I had access to my first Macintosh computer around 1987. My father brought it home and I spent hours on it playing games and occasionally writing reports for school. The Macintosh Plus computer had one floppy drive and no hard drive. I remember playing the game Orbiter which had two floppy disks and right in the middle of game play it would pause and ask me to insert disk 2, then quickly ask for disk 1 again. The struggle was real. I spent years using many different Macintosh computers and now own more than I wish to admit. I’m preserving them!

The wild world of digital preservation has been a little lacking on the Macintosh side of things, as I have come to realize. There is still not a great way to manage Resource Forks in many preservation systems, and the identification tools are mainly focused on the data bytestreams and not on the system-specific attributes Macintosh used so often.

The PRONOM registry has either referenced early Macintosh specific formats or missed them entirely so I have been slowly working on a few to close that gap.

Interestingly enough, many Microsoft programs initially made their GUI debuts on the early Macintosh before making their way to Windows. Excel is one I am working on, as Version 1 is not identifiable in PRONOM; it was Macintosh only at the time.

Another is PowerPoint. I recently submitted two new signatures to PRONOM.

fmt/1747: Microsoft PowerPoint Presentation v2.x. Full entry added.
fmt/1748: Microsoft PowerPoint Presentation v3.x. Full entry added.
fmt/1866: Microsoft Powerpoint for Macintosh v.2. Full entry added.
fmt/1867: Microsoft Powerpoint for Macintosh v.3. Full entry added.

PowerPoint was initially released in 1987 on the Macintosh platform. It was developed by a company called Forethought. Version 1.0 on the Macintosh was released under this name, and the company was bought by Microsoft only three months later. The history of PowerPoint can be found on the website of Robert Gaskins, one of the original developers, and in the book he wrote. The available information provided by Microsoft is only for the OLE format, covering versions 4.0 until 2003.

So, let’s take a look at the original PowerPoint file format, before OLE.

   Type/Creator      RF      DF  Date         Filename
f  SLDS/PPNT         0       932 Oct 10 19:10 PowerPoint-v1

Luckily the early PowerPoint files did not have a Resource Fork. The Data Fork, if you haven’t noticed, has an interesting set of hex values at the beginning of the file. 0BADDEED is the first 4 bytes.

If we look at a PowerPoint version 2 file from Windows, the file format is the same, but because of the weird world of endianness, the first few bytes are in reverse order, EDDEAD0B.
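
The two byte sequences are the same 32-bit number written in opposite byte orders. A quick sketch, assuming the Macintosh file stores the value big-endian and the Windows file stores it little-endian:

```python
import struct

mac_bytes = bytes.fromhex("0BADDEED")   # as found in the Macintosh file
win_bytes = bytes.fromhex("EDDEAD0B")   # as found in the Windows file

mac_value = struct.unpack(">I", mac_bytes)[0]   # read as big-endian
win_value = struct.unpack("<I", win_bytes)[0]   # read as little-endian

print(hex(mac_value), hex(win_value), mac_value == win_value)
# prints: 0xbaddeed 0xbaddeed True
```

This is also why a signature for the format needs one byte sequence per platform, even though the magic number is the same.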

Obviously we need to discuss this magic number and the meaning behind “Bad Deed”. This question was asked previously by the digital preservation community. I have a previous blog post about the use of words for the magic number CAFEBEEF as it was used with Java class files and Express Publisher in the 1990’s. BADDEED looks like another clever use of hex values that form words. But was there a story behind the words? Joe Carrano asked if this string might be hexspeak. I wanted to know more, so I asked someone who might know.

Robert Gaskins was kind enough to chat with me for a bit about the early days of PowerPoint.

I had a theory on the possible meaning behind BADDEED, so I asked him what the feeling was like between Apple and Microsoft at the time. I had heard for years that PowerPoint was originally created for the Macintosh, but Robert informed me:

In fact, PowerPoint was designed first for Microsoft Windows, and its first spec shows that: “All the screen shots, menus, and dialogs were set up to look like Microsoft Windows, not like Macintosh.” (Gaskins, Sweating Bullets, p. 92) You can see that spec here.

A year later, we concluded that we would be forced to ship on Mac first, although we still thought that Windows was the big opportunity and thought that Mac was risky. “We just didn’t think we could successfully ship a product for Windows, yet, though we planned to later.” (Gaskins, Sweating Bullets, p. 105) The considerations are summarized in my June 1986 product marketing document.

Of course, we turned out to have been right all along. PowerPoint on Mac was much loved, but sales remained poor because Mac sales were so poor. It was only after we shipped on Windows that PowerPoint gained the dominant market share which has characterized it ever since, and Windows PPT outsold Mac PPT very quickly. (Gaskins, Sweating Bullets, p. 403)

So my original thought was that there were some bad feelings around the Apple and Microsoft battle, which has been the sentiment for quite some time. So when I asked if any of that influenced the use of BADDEED, I was told:

So, far from being disgruntled by expanding PowerPoint to Windows, that had been our goal all along, and its achievement was the most important success we had.

I judge that you are fully aware of all that, and that your question is more, “was there any bad deed signified by the Mac hex value chosen?” No, it was just the poverty of choice when you only have six letters.

So there you have it. The hex value 0x0BADDEED was simply chosen from the limited set of words hexadecimal can spell. I guess I should never let the truth get in the way of a good story.

I continued to have a wonderful conversation with Robert and also asked him for some details on the rest of the PowerPoint file format. I was hoping there might be some documentation out there explaining the early format before Microsoft took over. Robert said:

I don’t know of any such documentation apart from the official Microsoft support files available online. I don’t have any such information. I know that Dennis Austin deposited some of our working files at the Computer History Museum (not online):

https://archive.computerhistory.org/resources/access/text/finding-aids/102733943-Austin/102733943-Austin.pdf

and it’s likely that some information is there–if nothing else, it claims to contain a source code listing for PPT 1.0 which would contain the code to read the file format.

So there might be some information at the Computer History Museum worth looking into.

As far as I could tell from the information available online, there are a few differences between Version 1.0 and Version 2.0, the biggest being that 1.0 did not have an option to print in color, among a few other minor things. Here is a screenshot of a page from the Microsoft PowerPoint 2.0 documentation on archive.org.

I suppose that with the addition of signatures for the Macintosh and Windows versions 2.0 and 3.0 of the PowerPoint file format in PRONOM, most needs should be covered. Currently my PowerPoint 1.0 files identify as 2.0 files, so I may need to have them adjust the PUID to include both versions 1.0 and 2.0, as they are so similar. If I am able to find a difference or get my hands on the original source code, I may find a better solution.

Student Writing Center

When it comes to difficult file formats, one of the more difficult groups is word processing text files. They are difficult for many reasons: one is the sheer number of them, another is their lack of identifiable headers. Just when you think you have seen them all, another pops up to add to the mix.

In a batch of other known word processing formats I came across a few files with no extension and with the following header:

The rest of the file was binary, so the only thing I had to go on was the strings “TLC” and “FF”. A few searches across the interwebs didn’t reveal much; it seems it wasn’t a well-documented format. The names of the files, and the fact that they sat with other word processing formats, led me to assume they were also some sort of document format. The date stamps were still intact, and I could see they were from the mid-1990s. It took a few creative searches before I wondered if the “TLC” might have something to do with “The Learning Company”. If it did, I still had quite a bit of work ahead, as the software developer had produced quite a few titles over the years. You probably remember the “Reader Rabbit” series of educational games.

After a bit of time I narrowed it down to a few titles and started looking for samples of each. The software was hard to find as well. I tried opening the file in a few different programs until I finally came to one called “Student Writing Center”. That may sound familiar to some of you, but there were some variations on this name out there, including:

  • Student Writing Center
  • Student Writing & Publishing Center 
  • The Children’s Writing & Publishing Center
  • The Writing Center
  • Ultimate Writing & Creativity Center

There were probably others, considering the budget software company started in 1980 and made titles for a few computer platforms starting with the Apple II. The story behind the company is a fun read.

The Student Writing Center was a simple word processor aimed at students 10 years old and older. It was found in many schools right alongside Kid Pix, another very popular graphics program for kids. The software had a few different document types to help students get started writing their book reports or journal entries.

The Student Writing Center ran on both Macintosh and Windows, which helped make it one of the more popular writing tools for the younger crowd.

Each document type had its own interface and save menu, and on Windows each type saved with its own extension: .RP, .NL, .JN, .LT, and .SG. Each type also had a slightly different header.

Reports:        1A544C43 01464600 0000
Newsletters:    1A544C43 00464600 0300
Journals:       1A544C43 00464600 0100
Letters:        1A544C43 00464600 0400
Signs:          1A544C43 00464600 0200

The signatures submitted to PRONOM take endianness into account for Windows and Macintosh, with the last two byte locations being swapped. Also, every document had the values “46461A” “FF” at the end of the file.
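As a rough illustration of how those headers could be checked in code, here is a minimal Python sketch. It is not the PRONOM signature itself, and the function name identify_swc and the type-code table are my own labels built from the header list above. It assumes the two-byte type code may appear in either byte order and does not try to say which platform wrote which. It leaves the fifth byte unchecked because it differs between Reports and the other types in the samples, and it skips the end-of-file values entirely.

```python
# A sketch only: names and the code table are assumptions derived from the
# header list above, not an official description of the format.
MAGIC = bytes.fromhex("1A544C43")  # 0x1A followed by ASCII "TLC"
FF_MARK = bytes.fromhex("4646")    # ASCII "FF" at offset 5

# Type codes as they appear in the last two header bytes of the samples.
DOC_TYPES = {
    0: "Report",
    1: "Journal",
    2: "Sign",
    3: "Newsletter",
    4: "Letter",
}


def identify_swc(header: bytes):
    """Return the document type name, or None if the header does not match."""
    if len(header) < 10:
        return None
    if header[:4] != MAGIC or header[5:7] != FF_MARK:
        return None
    first, second = header[8], header[9]
    # Accept either byte order for the two-byte type code (assumption).
    if second == 0:
        code = first
    elif first == 0:
        code = second
    else:
        return None
    return DOC_TYPES.get(code)


print(identify_swc(bytes.fromhex("1A544C43004646000300")))  # Newsletter
print(identify_swc(bytes.fromhex("1A544C43004646000003")))  # Newsletter, bytes swapped
```

In practice you would read the first ten bytes of a file and pass them in. A real identification tool such as DROID relies on the PRONOM signatures rather than a hand-rolled check like this.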

But wait! Just when you thought you had it figured out...

This file may look similar, but these are two different formats and are not compatible with each other. The little brother to the Student Writing Center was called “Ultimate Writing & Creativity Center” and was made for younger kids, ages 6-10. It had more of a cartoon interface and a cute little fountain pen teacher to walk you through the writing process.

When you saved your file in UWCC, you could choose between formats and I guess move your documents up to the more advanced program once you turned 10! If you would like to experience or re-live the opening sequence, enjoy.

I’m not done yet………

To complicate things even more, The Learning Company also released another word processor called “The Writing Center”. It is frequently confused with Student Writing Center.

But unlike the two others, this format is very different.

We’ll have to save this format for another day.

The list of word processor formats seems never-ending. But if you used a school computer back in the early 1990s and still have your floppy disk from back then, hopefully you can now open that report you wrote on Abraham Lincoln.