ACE

September 12, 2025 by Thor Leave a comment

Without divulging any youthful indiscretions, I recently was going back through some of my personal archives and came across a disc I burned around 2002 with some music stored on it. Normally I would find MP3 files, but in this case the file had a ACE extension. I remembered the format as an alternative to the common RAR or ZIP format often used to compress content for transporting (sharing) around the internet. I did what I normally do when something is compressed and reached for 7zip. But to my surprise, it threw an error.

% 7z l sample.ace 

Scanning the drive for archives:
1 file, 12501419 bytes (12 MiB)    

Listing archive: sample.ace


ERROR: sample.ace : Can not open the file as archive

7zip usually can handle most common archives but a part of me remembered there was two versions of WinACE back in the day. Version 1 which was a free version and Version 2 which was for paid users of WinACE. How do I know which version I have is the question I frequently find myself asking. First was to check the PRONOM registry.

% sf sample.ace 
---
siegfried   : 1.11.2
scandate    : 2025-09-11T09:01:25-06:00
signature   : default.sig
created     : 2025-03-01T15:28:08+11:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V120.xml; container-signature-20240715.xml'
---
filename : 'sample.ace'
filesize : 12501419
modified : 2025-09-11T09:04:36-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'

Nope, this format is not known to PRONOM. Lets try another tool.

% file sample.ace 
sample.ace: ACE archive data version 20, from Win/32, version 20 to extract, solid

Ok, so the file tool knows it is a version 2 ACE file and requires version 2 to extract. Good info from a file identification tool. Now lets see what we can find to extract this file on MacOS. The website Winace.com is long gone as this compression tool lost popularity and the final release was over 14 years ago. Looking at the website in the WaybackMachine we can see some downloads available. One being UnACE for Mac OS X, which upon further review, only works for the older PowerPC Mac’s. There is an open source version of unace for Linux, but it only supports version 1, the free version of the format.

Below is a screenshot of the DOS version of the ACE software. Created by Marcel Lemke.

It might be good to mention that WinRAR used to support the ACE format, but with WinACE support ending years ago and with some new vulnerabilities and folks using it for malware, support was dropped in 2019.

Luckily, I still have my PowerMac G5 lying around waiting for this very situation. After a quick install, unace was able to unarchive my music and I was able to listen to some of my favorite songs from 23 years ago. I still wanted to find a modern solution and later discovered there is a python project which can read and extract bother versions. Acefile is a pure python, no-dependencies implementation of the UnACE format. I had a little issue installing on an older Catalina laptop, but worked well on later MacOS versions. Acefile has a few features that are helpful in not only extracting, but testing and dumping the headers of an ACE file. I did install WinACE in a Windows XP Virtual Machine to make a few samples, here is one of them.

% acefile-unace --test sample.ace 
success  test.tif
total 1 tested, 1 ok, 0 failed

The test feature works well to ensure the file is complete and can be extracted, but doesn’t give me much to go on for knowing the version. Lets try dumping the header.

% acefile-unace --header sample.ace 
volume
    filename    sample.ace
    filesize    12501419
    headers     MAIN:1 FILE:1 RECOVERY:0 others:0
header
    hdr_crc     0x4900
    hdr_size    44
    hdr_type    0x00        MAIN
    hdr_flags   0x8100      V20FORMAT|SOLID
    magic       b'**ACE**'
    eversion    20          2.0
    cversion    20          2.0
    host        0x02        Win32
    volume      0
    datetime    0x5b2aae37  2025-09-10 21:49:46
    reserved1   c8 51 62 e3 5b 80 00 00
    advert      b''
    comment     b''
    reserved2   b'\x00e\x9c\xb1\xd8\x00\x03\n\x00\x00@\x08\x00test.'
header
    hdr_crc     0x3626
    hdr_size    39
    hdr_type    0x01        FILE32
    hdr_flags   0x8001      ADDSIZE|SOLID
    packsize    12501328
    origsize    25264236
    datetime    0x5b2aadcd  2025-09-10 21:46:26
    attribs     0x00000080  NORMAL
    crc32       0x9290955a
    comptype    0x02        blocked
    compqual    0x03        normal
    params      0x000a
    reserved1   0x4000
    filename    b'test.tif'
    comment     b''
    ntsecurity  b''
    reserved2   b''

This is very helpful. We can see the output shows the magic bytes, but also the e(xtraction)version and c(creating)version. We can also find this information in the open source unace technical documentation.

       2      HEAD_CRC      CRC16 over block up from HEAD_TYPE
       2      HEAD_SIZE     size of the block from HEAD_TYPE
                              up to the last byte of this block

       1      HEAD_TYPE     archive header type is 0
       2      HEAD_FLAGS    contains most important information about the
                            archive

                               bit  discription

                                0   0  (no ADDSIZE field)
                                1   presence of a main comment

                                9   SFX-archive
                                10  dictionary size limited to 256K
                                    (because of a junior SFX)
                                11  archive consists of multiple volumes
                                12  main header contains AV-string
                                13  recovery record present
                                14  archive is locked
                                15  archive is solid

       7      ACESIGN       fixed string: '**ACE**' serves to find the
                              archive header

       1      VER_EXTRACT   version needed to extract archive
       1      VER_CREATED   version used to create the archive

I think we have enough to go on to create a signature, we just need to see what the 1 byte versions number look like in an actual file.

% hexdump -C sample.ace | head
00000000  00 49 2c 00 00 00 81 2a  2a 41 43 45 2a 2a 14 14  |.I,....**ACE**..|
00000010  02 00 37 ae 2a 5b c8 51  62 e3 5b 80 00 00 00 65  |..7.*[.Qb.[....e|
00000020  9c b1 d8 00 03 0a 00 00  40 08 00 74 65 73 74 2e  |........@..test.|
00000030  26 36 27 00 01 01 80 50  c1 be 00 6c 80 81 01 cd  |&6'....P...l....|
00000040  ad 2a 5b 80 00 00 00 5a  95 90 92 02 03 0a 00 00  |.*[....Z........|
00000050  40 08 00 74 65 73 74 2e  74 69 66 28 25 a4 89 04  |@..test.tif(%...|
00000060  fa 43 b1 05 49 0c a3 76  8e 16 a9 2c 92 44 34 8c  |.C..I..v...,.D4.|
00000070  2c 12 e7 28 67 68 49 69  a7 92 4a 10 07 da 10 16  |,..(ghIi..J.....|
00000080  9c 16 4a 10 07 2b 9c ae  30 a9 50 c4 0a 69 51 a6  |..J..+..0.P..iQ.|
00000090  c9 64 a7 24 09 93 3d 81  26 31 a9 c2 68 32 c1 33  |.d.$..=.&1..h2.3|

As you can see above, we have our magic bytes **ACE** starting at the seventh byte and taking up seven bytes. Then two bytes after it both with the hex value 14. If we convert that hex value to decimal we get “20”. Let’s look at another:

% hexdump -C sample2.ace | head
00000000  61 67 31 00 00 00 90 2a  2a 41 43 45 2a 2a 0a 0c  |ag1....**ACE**..|
00000010  02 00 50 7c 31 26 d7 2b  c0 48 af 83 ce d9 16 2a  |..P|1&.+.H.....*|
00000020  55 4e 52 45 47 49 53 54  45 52 45 44 20 56 45 52  |UNREGISTERED VER|
00000030  53 49 4f 4e 2a 34 5f 24  00 01 01 80 00 00 00 00  |SION*4_$........|
00000040  35 00 00 00 3c 7c 31 26  10 00 00 00 ff ff ff ff  |5...<|1&........|
00000050  01 05 0a 00 2a 55 05 00  61 75 64 69 6f 45 72 23  |....*U..audioEr#|
00000060  00 01 01 80 00 00 00 00  35 00 00 00 3c 7c 31 26  |........5...<|1&|
00000070  10 00 00 00 ff ff ff ff  01 05 0a 00 2a 55 04 00  |............*U..|
00000080  42 49 54 53 98 14 24 00  01 01 80 00 00 00 00 35  |BITS..$........5|
00000090  00 00 00 3c 7c 31 26 10  00 00 00 ff ff ff ff 01  |...<|1&.........|

Hmm, now we have two different values. “0A” converts to decimal “10” and “0C” converts to decimal “12”. So we can infer this ACE file was created in version 1.2 and requires at least version 1.0 to extract. Let’s try another:

% hexdump -C sample3.ace | head   
00000000  c0 3f 2c 00 00 00 81 2a  2a 41 43 45 2a 2a 0a 14  |.?,....**ACE**..|
00000010  02 00 dc ad 2a 5b 23 52  89 e0 5b 80 00 00 00 65  |....*[#R..[....e|
00000020  9c b1 d8 00 03 0a 00 00  40 08 00 74 65 73 74 2e  |........@..test.|
00000030  92 f3 27 00 01 01 80 54  c3 be 00 6c 80 81 01 cd  |..'....T...l....|
00000040  ad 2a 5b 80 00 00 00 5a  95 90 92 01 03 0a 00 00  |.*[....Z........|
00000050  40 08 00 74 65 73 74 2e  74 69 66 28 25 a4 89 04  |@..test.tif(%...|
00000060  fa 43 b1 05 49 0c a3 76  8e 16 a9 2c 92 44 34 8c  |.C..I..v...,.D4.|
00000070  2c 12 e7 28 67 68 49 69  a7 92 4a 10 07 da 10 16  |,..(ghIi..J.....|
00000080  9c 16 4a 10 07 2b 9c ae  30 a9 50 c4 0a 69 51 a6  |..J..+..0.P..iQ.|
00000090  c9 64 a7 24 09 93 3d 81  26 31 a9 c2 68 32 c1 33  |.d.$..=.&1..h2.3|

Again we have “0A” which converts to decimal “10” and hex 14, which converts to decimal “20”. So made with version 2.0 of the software, but made compatible with version 1.0 for extraction. One more:

% hexdump -C sample4.ace | head
00000000  8b d6 31 00 00 00 90 2a  2a 41 43 45 2a 2a 0b 0b  |..1....**ACE**..|
00000010  02 00 cd b4 3e 26 4a e3  a1 80 32 4b c1 d9 16 2a  |....>&J...2K...*|
00000020  55 4e 52 45 47 49 53 54  45 52 45 44 20 56 45 52  |UNREGISTERED VER|
00000030  53 49 4f 4e 2a aa 08 24  00 01 01 00 00 00 00 00  |SION*..$........|
00000040  00 00 00 00 83 b2 3e 26  10 00 00 00 ff ff ff ff  |......>&........|
00000050  01 05 0a 00 2a 55 05 00  4d 75 73 69 63 77 73 27  |....*U..Musicws'|
00000060  00 01 01 00 00 00 00 00  00 00 00 00 83 b2 3e 26  |..............>&|
00000070  10 00 00 00 ff ff ff ff  01 05 0a 00 2a 55 08 00  |............*U..|
00000080  52 65 73 6f 75 72 63 65  93 75 25 00 01 01 00 00  |Resource.u%.....|
00000090  00 00 00 00 00 00 00 83  b2 3e 26 10 00 00 00 ff  |.........>&.....|

Both extraction and creation version are hex “0B” which converts to decimal “11”. I would have assumed any version 1.0 version could extract anything created with later 1.x versions, but I guess that might not be true. I am not clear on all the versions released, so I am not sure how many versions I should include in a signature. I did look through some of the captured pages on the WayBackMachine and feel the last 1.x version was version 1.32.

When building these signatures, it should be easy to create two signatures based on their extraction version. But should the creation version be a factor? Version 1.0 could look like this:

2A2A4143452A2A(0A|0B|0C|0D)(0A|0B|0C|0D|14)

This accounts for the versions 1.0 through 1.3 for extract version and 1.0 through 2.0 for creation version. Version 2.0 doesn’t seem to indicate minor versions with all 2.0 versions using decimal 14. So a signature could be:

2A2A4143452A2A1414

Both would start from offset 7 from the beginning of the file. Is there a better solution?

I will warn you, there are a couple of ACE formats out there which you may come across. One being an image/texture format for Microsoft Train Simulator. That might be for another day. There is another use of the ACE archive which is worth discussing. The Comic Book Archive file with the extension CBA will use the ACE archive for storing a series of images used in some Comic Book Readers. They are indeed ACE archive files, only having the different extension and a specific purpose. Maybe adding the CBA extension to the signature would be sufficient?

I am sure there are some other properties, seen above, of the ACE format we could discuss, encryption, the differences between Solid and SFX, and dictionary headers, but I think for now, identification of the format and the main version difference is sufficient. For now, check out my Github page for my signature proposal and a few samples I made.

Page Perfect

August 22, 2025 by Thor Leave a comment

PagePerfect: the Promise of Desktop Publishing Realized

Now, PagePerfect has arrived. And suddenly PC desktop publishing is a lot
simpler and less expensive, because PagePerfect integrates desktop
publishing, word processing, and graphics editing all in one package.

The 1980’s was a time of growth in personal computing and one industry was progressing rapidly. Previously in order to get printed more than just words, you had to use a complex arrangement of type, masking, screening; all done by hand. Now with a personal computer you could design and print well designed layouts. There were many software applications who came on the scene in these early days. My personal favorite was QuarkXPress, I used the software in the early 1990’s and spent the next few years working in a commercial printshop using the software. What once took a team of skilled workers to set copy, mask, blueline, etc took only one person with the right software.

I recently came across a set of floppy disks for some software called PagePerfect, by a well known software company IMSI.

This article in a 1988 PC Magazine announces this new revolutionary software. This was early on in the days of computer desktop publishing and even on a DOS system the software was powerful. It didn’t always get the best reviews in terms of ease of use, but it was well built. The company behind this powerful software wasn’t IMSI as you might expect, it was programed by a different company, Beyond Words, started by three former MicroPro employees, the makers of WordStar. Beyond Words liked to “leave sales to others” which included IMSI and a big contract with Canon called their Desktop Publishing System.

IMSI was able to market the software well and was well priced. The name PagePerfect didn’t last long and soon after they renamed the software IMSI Publisher in 1989. I’m not 100% sure, but it might have to do with WordPerfect asserting some copyright to the name around that same time. By 1990, the software was not seen much anymore, but another name pops up, Beyond Words Composer 2.0.

All three versions of the software have a very similar interface.

But the one thing they all have in common is their file formats. Unfortunately they used the same extensions many word processing software used during this time and after. .DOC and also .STY which was used frequently by Microsoft Word as well. It makes sense, a Document is shortened to DOC and a Stylesheet is shortened to STY. So if you have any DOC files which don’t open in Word, you might look here. The other problem is the file format used is not plain text and is in a binary proprietary format.

hexdump -C TEST.DOC | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 3c af  13 5b 1e 00 00 00 95 63  |......<..[.....c|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  68 01 00 00 0a 00 de 01  00 00 00 00 00 00 00 00  |h...............|
00000040  de 01 00 00 8b 60 00 00  1e 00 69 62 00 00 2c 01  |.....`....ib..,.|
00000050  00 00 1e 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..............[B|
00000060  57 44 4f 43 5d 00 00 00  00 32 2e 30 39 00 00 00  |WDOC]....2.09...|
00000070  00 00 00 00 0a 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6c 00 00 00  00 00 00 00 00 00 00 00  |....l...........|

The one positive is the very obvious strings of text in the header. [BWDB] and [BWDOC], which one could infer as Beyond Words DB and Beyond Words Document. A later Beyond Words Composer document has the same header but a higher version number.

hexdump -C WELCOME.DOC | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 aa 14  56 16 29 00 00 00 30 84  |........V.)...0.|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  b0 01 00 00 0c 00 26 02  00 00 00 00 00 00 00 00  |......&.........|
00000040  26 02 00 00 70 80 00 00  29 00 96 82 00 00 9a 01  |&...p...).......|
00000050  00 00 29 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..)...........[B|
00000060  57 44 4f 43 5d 00 00 00  00 33 2e 30 31 00 00 00  |WDOC]....3.01...|
00000070  00 00 00 00 0c 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6e 00 00 00  00 00 00 00 00 00 00 00  |....n...........|

If we look at the Stylesheets we see the same patterns.

hexdump -C SAMPLE.STY | head   
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 51 10  76 10 09 00 00 00 da 2c  |......Q.v......,|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  68 01 00 00 0a 00 de 01  00 00 00 00 00 00 00 00  |h...............|
00000040  de 01 00 00 a2 2a 00 00  09 00 80 2c 00 00 5a 00  |.....*.....,..Z.|
00000050  00 00 09 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..............[B|
00000060  57 44 4f 43 5d 00 00 00  00 32 2e 30 39 00 00 00  |WDOC]....2.09...|
00000070  00 00 00 00 0a 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6c 00 00 00  00 00 00 00 00 00 00 00  |....l...........|

I haven’t been able to find any specific bytes which differentiate the Stylesheets from the Documents. They may be the same format, but for now we will consider them the same. These stylesheets seem to function as a template which are often the same format.

Apart from the document layout, the software can also create and use databases. Which appear to be a similar format but with different offsets.

hexdump -C DOCUMENT.TBL | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 6b 10  36 00 00 00 18 00 00 00  |......k.6.......|
00000020  01 00 4e 00 00 00 68 01  00 00 0a 00 b6 01 00 00  |..N...h.........|
00000030  00 00 00 00 00 00 5b 42  57 44 4f 43 5d 00 00 00  |......[BWDOC]...|
00000040  00 32 2e 30 39 00 00 00  00 00 00 00 0a 00 00 00  |.2.09...........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 6c 00 00 00  |............l...|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Prior to me diving into this format, the only tool which had some information on this format was TrID, which identified all the DOC and STY files as Beyond Words Composer style. Which is mostly true. Hopefully with this background you can be aware of the different software names this format was used with and with some luck convert the files to something less proprietary.

Some disks that came with my PagePerfect install disks do have some personal documents created with the software, but I wonder how much this software really was used in the late 1980’s and early 1990’s, because after that point, you don’t hear about the software anymore. There is some references to the software getting absorbed into another software, IBM DisplayWrite 5/2. I would be curious if others have come across this file format.

More Student Writing Center

August 8, 2025 by Thor Leave a comment

Most of what you will find on this blog is file format identification. I see this as the first step in a longer process of preservation and ultimately access. Hopefully the analysis of some file formats can help make better decisions when needing to render the file in an emulator or migrate to another format. I don’t spend much time trying to parse the files I look at to understand the actual content, just enough to properly identify and differentiate between important versions of the format.

One area I sometimes touch on, but often skim over is encryption. Many file formats are binary, meaning they use a sequence of bytes to encode data which is more efficient than human readable text and is often compressed. The bytes used to store data is designed by the developer of the software, they can encode the data however they choose, which is often unreadable by anyone else and is proprietary. A file can also be further encrypted by a password to limit use, even with the right software.

I recently had one of the numerous fans of this blog reach out and ask about the post I made on the software Student Writing Center. They had a bunch of journal files from their youth and couldn’t find a way to read these older files. I offered my help as I still have the software and a nice emulator to run the old software.

As I was going through and converting the journal entries into a PDF. I came across a few which asked for a password to open. You can see below the explanation from the help menu confirms the file format is a proprietary format only readable by their software and the password feature is to further protect the content.

Finding a few of the journal documents password protected was frustrating at first. I was converting some documents that are over 26 years years old, I doubted the password would be remembered. When I asked, they gave me a couple passwords to try, but nothing worked. But I don’t give up that easily!

My first thought was to take all the text from the other journal entries and make a dictionary and then use it to try and brute-force the password. There are some great tools to do this like hashcat. With tools like this, you need to retrieve a hash of the password. This is an encrypted sequence of the password stored in the file. So the first step was to find where the password was stored in the file. Since I have the software and can make new password protected files using a password of my choice this proved a simple task. Create two identical files, add a password to each but different. Then compare the two files in a hex editor to find the difference.

There it is. The password field in the software only let me put in 10 characters and these 10 bytes lit up when I ran a difference between the two files. I went to check the files given to me which also had password protection and found they also had a similar pattern. In fact I noticed from a few checks that the passwords I used also had a pattern in the file.

For this file I used the number “1” ten times. In that same location it repeated the same byte value”85″, 10 times. After a couple more tests I could see this wasn’t an algorithm I need to crack, but a simple replacement. I created a few more files using all the letters in the alphabet and all the numbers and came up with a substitution cypher.

Obviously the passwords used in the documents I was trying to open didn’t all use the full 10 characters, but the password was always preceded by the values “00” and had the values “1A46461A” after the password. The byte prior to the “00” indicates the length of the password. From there I just needed to decode the bytes between those two offsets.

So for this file with an 8 byte sequence “90D54F4FA3FBBA94” decodes to: password. How cool is that? To make things even easier, the passwords used in Student Writing Center are not case sensitive. There are additional values for symbols. You can see the entire substitution list here.

One other thing related to identification. Would it be important to identify a password protected file differently than a regular file? At offset 0xDA there seems to be a indicator that the file is password protected. “00” if not “01” if protected.

What do you think? Should this property be identified as a separate file format from a regular file or is this property something that should be gathered using additional tools that can gather additional properties from a file like this?

Speaking of additional tools. There is a pretty cool project called the Import library for legacy Mac documents or libmwaw which claims to have support for Student Center Writing documents and a lot more. It indeed does, but not the journal format, only the main letter format. I bet it wouldn’t take much to add the journal format to the library, something I will look into.

Microstation

July 11, 2025 by Thor Leave a comment

I recently was able to image a few Bernoulli Disks for a collection using a SCSI device I have found quite useful. The disks had been sitting around for quite some time waiting for the right tools and resources to extract the contents. I mentioned the accomplishment to a few coworkers and one asked me if I would extract the contents from their old disk they used for school back in the 1990’s. They had spent a whopping $99 at the local bookstore for a disk which held a total of 150MB. Not GB’s like we are used to now, but megabytes. I have some camera’s which takes RAW photos larger than then would fit on one disk. Once I had the data extracted from their disk, I took a look at the contents. There was a few file formats on the disk I was unfamiliar with. A quick scan with DROID revealed some matches and a few problems.

Turns out the data were files written by an old version of Bentley Microstation. The files dated from late 1995 and the disk was formatted for FAT16 which leans more to being used in a DOS system, but could have been used with the newly released Windows 95. The Bentley Microstation 95 software wasn’t released until November of 1995, so my guess is these Microstation files where created with the Microstation version 5 for DOS.

disktype HD6_imaged-004.hda 

Regular file, size 144.0 MiB (150998016 bytes)
No type and creator code
DOS/MBR partition map
Partition 4: 144.0 MiB (150978560 bytes, 294880 sectors from 32, bootable)
  Type 0x06 (FAT16)
  FAT16 file system (hints score 5 of 5)
    Volume size 143.8 MiB (150810624 bytes, 36819 clusters of 4 KiB)
    Volume name "ode 009 - I"

PRONOM has a few entries for the Microstation software:

PUID	Format Name	Format Name	Extension
x-fmt/346	Microstation CAD Drawing	95	DGN
fmt/502	Bentley V8 DGN		DGN
fmt/1626	MicroStation Symbology Resource File		RSC
fmt/1549	Bentley Microstation Hidden Line File		HLN
fmt/1358	MicroStation Base File		BSE
fmt/1183	MicroStation Material Palette		PAL
fmt/1177	MicroStation Material Library		MAT

The files found on this old Bernoulli disk gave varied results in identification. Most of the DGN files give me this multiple Identifications in DROID.

A little digging and we can learn a bit about the major formats. Integraph and Bentley used a Binary version of their drawing format, DGN, from versions 2 until 7, spanning 1987 to 2001, with the release of version 8, they made a major change to the format. Version 8 use the Microsoft OLE2 container to enhance the format allowing it to hold multiple drawings and more information about the model. With this change, the format became proprietary. Sure, they started an OpenDGN program to make the format more compatible with other systems, but required you to sign an NDA in order to get a copy of the format specifications. You had to request access and sign an NDA, which doesn’t sound “open” to me. You can read another file format researchers thoughts on this on her blog.

So I know many of these files are not Version 8 of the DGN format as they are not OLE2 containers, but the other issue is that x-fmt/346 for the Microstation CAD drawing 95 is an outline record. It has no signature. So DROID is guessing based on extension only. We need to dig deeper.

I noticed than many of the DGN files in my sample set also identified as a “Microstation Hidden Line File”, but instead of a HLN extension, they use DGN.

sf samp15.dgn 

filename : 'samp15.dgn'
filesize : 359424
modified : 1998-09-01T12:31:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [359422 2]]'
    warning : 'extension mismatch'

hexdump -C samp15.dgn | head
00000000  08 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 c8 45  |............ ..E|
00000020  00 00 00 00 00 00 00 00  40 06 0c 00 01 05 dc a0  |........@.......|
00000030  ff ff ff ff ff ff ff ff  b5 8b 9f 63 b9 88 85 a7  |...........c....|
00000040  00 00 00 00 19 00 b4 86  13 00 fe be 00 00 00 00  |................|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|

hexdump -C samp7.dgn | head
00000000  c8 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 04 7a 45  |..............zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 01 05 fc b0  |................|
00000030  ff ff ff ff ff ff ff ff  0d 00 9d b5 0c 00 74 93  |..............t.|
00000040  ff ff a6 fd 09 00 40 11  05 00 50 aa 00 00 e5 f8  |......@...P.....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Looking at a couple files in the same sample set, some use the header “08 09 fe 02 01 08 00 00” while another uses “c8 09 fe 02 01 08 00 00”. This is why samp15.dgn identifies as an HLN files as the signature matches, while samp7.dgn uses “C8” instead of “08” making it not identify as an HLN file. What is the difference and what is an HLN file?

First let’s define an HLN file. The name of the format is “Hidden Line File”, although most references refer to it as a “Visible Edges File“. Confusing, but the definition is: “a 2D or 3D DGN file that contains the edges visible in a 3D view (that is, with those edges that would be hidden, removed).”

Looking at a couple HLN files, we can see the format is the same as DGN files:

hexdump -C test-2d.hln | head
00000000  08 09 fe 02 08 01 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  00 00 00 00 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C test-3d.hln | head
00000000  c8 09 fe 02 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  ff ff 0c fe 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|

Same difference between the two previous files. These two files also explain the difference between the “08” and the “c8” values. Microstation uses the first to indicate it is a 2D file and the latter to indicate a 3D file. The DGN format has been documented in libdgn and this distinction is referenced.

This presents a problem with the current PRONOM identification.

filename : 'MS95-2D.dgn'
filesize : 12288
modified : 2025-06-05T21:13:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [12286 2]]'
    warning : 'extension mismatch'

filename : 'MS95-3D.dgn'
filesize : 12800
modified : 2025-06-05T21:14:00-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/346'
    format  : 'Microstation CAD Drawing'
    version : '95'
    mime    : 
    class   : 
    basis   : 'extension match dgn'
    warning : 'match on extension only'

The 2D files mis-identify as Hidden Line Files and the 3D files are identified through extension only. We learned from a previous test that Hidden Line Files can be both 2D and 3D and are the same format as DGN, so a separate identification PUID is unnecessary, but the x-fmt/346 identification doesn’t have a signatures, so a few things need to change.

The other issue is a Hidden Line File is also available in version 8+.

filename : 'Microstationv8-s01.hln'
filesize : 7168
modified : 2025-06-05T19:48:09-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/502'
    format  : 'Bentley V8 DGN'
    version : 
    mime    : 
    class   : 'Image (Vector)'
    basis   : 'container name Dgn~H with name only'
    warning : 'extension mismatch'

They also identify as Bentley V8 DGN files, but with an extension mismatch. This should be easy to remedy with the addition of the extension HLN to the signature. The container signature seems to work well, no need to change anything.

My suggestions to fix these issues would be:

Depreciate x-fmt/346
Change name of fmt/1549 from “Bentley Microstation Hidden Line File” to “Microstation CAD Drawing” and use the version 2-7 to distinguish from v8
Change the signature for fmt/1549 from “0809FE” to “(08|C8)09FE02” no EOF of “FFFF”

The other option would be to make fmt/1549 the 2D drawing format and x-fmt/346 could be used for the 3D drawing format. What do you think?

I have uploaded a few samples to my GitHub page. Curious if your examples of DGN files match what I am seeing. There are a few other related formats that will need to be explored, but this should help for now.

SCP

June 27, 2025 by Thor Leave a comment

If you have been following previous posts about Floppy disk flux captures, you may have read about the HFE or A2R flux image formats. Both very useful in the preservation, archiving and emulation of old software and games stored on decaying and copy-protected floppy disks. I also built a Fluxengine which has come in handy more than once. It captures flux data in its own FLUX format. At work I also have access to a Kryoflux board which captures in separate RAW tracks.

Today we are looking at the SCP format. I recently purchased a Greaseweazle for personal use and the main format used while capturing raw flux data is SCP. It works a little better on my older MacBook Pro than the fluxengine and I wanted to have another option for capturing flux data. So far it has worked really well. Of course I wanted to know everything I could about the SCP format so the first thing I did was run Siegfried against a file.

filename : 'unknown.scp'
filesize : 47017278
modified : 2025-06-14T19:09:58-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'
  - ns        : 'wikidata'
    id        : 'Q29000565'
    format    : 'SuperCard Pro dump'
    URI       : 'http://www.wikidata.org/entity/Q29000565'
    permalink : 'https://www.wikidata.org/w/index.php?oldid=1866792367&title=Q29000565'
    mime      : 'application/octet-stream'
    basis     : 'extension match scp; byte match at 0, 3 (Wikidata reference is empty)'

Looks like Wikidata has a signature pattern, but PRONOM does not. Lets take a look and see how difficult it might be.

hexdump -C unknown.scp | head
00000000  53 43 50 00 80 03 00 a3  23 00 00 00 d2 0f 26 99  |SCP.....#.....&.|
00000010  b0 02 00 00 14 43 04 00  c6 96 08 00 64 78 0d 00  |.....C......dx..|
00000020  ea bb 12 00 de 37 16 00  a2 b3 19 00 26 68 1e 00  |.....7......&h..|
00000030  42 b7 23 00 2a 33 27 00  c8 ae 2a 00 a8 54 2f 00  |B.#.*3'...*..T/.|
00000040  fc 94 34 00 e2 10 38 00  a8 8c 3b 00 98 68 40 00  |..4...8...;..h@.|
00000050  1c b6 45 00 14 32 49 00  cc ad 4c 00 9e 9b 51 00  |..E..2I...L...Q.|
00000060  0e d3 56 00 de 4e 5a 00  74 ca 5d 00 be 7b 62 00  |..V..NZ.t.]..{b.|
00000070  b4 b3 67 00 a8 2f 6b 00  68 ab 6e 00 50 88 73 00  |..g../k.h.n.P.s.|
00000080  0c ce 78 00 02 4a 7c 00  ae c5 7f 00 96 bd 84 00  |..x..J|.........|
00000090  8a 2d 8a 00 8a a9 8d 00  56 25 91 00 b6 a3 95 00  |.-......V%......|

Well, probably not hard at all. I love easy well understood headers. But only three bytes can have issues, lets look a little closer at the published specification. Before we dive into the spec, it might be good to note a few things. The SCP image format was developed for another hobby board. A Supercard Pro, is a custom board to connect a floppy drive through USB to software which can also capture flux data and help interpret the data to a image format which can be used to write back to a floppy or used in an emulator. The software is Windows only so those on Linux or MacOS can’t use it, but since the specification was made public, many other boards and tools can read and write to the format. Even though it is open, I worry about preserving the spec. When you try and ensure it is saved in the WayBackMachine you get this fun page.

This sorry page is usually found when the owner of a URL has asked specifically for their domain to be excluded from the web archive. This worries me as I have found many specifications have been lost to time. I would love to know why the owner has chosen to do this, but it is available now, so lets dive in. The versions appear to have started in 2014, but the page is copyright 2012, so I assume the format was created around this time. It was last updated in February of 2024, so is pretty up-to-date. One important update was made in 2021:

v2.3 - 06/03/21

*  Added additional FLAG bit (bit 7) to identify a 3rd party flux creator.  PLEASE
   SET THIS BIT IF YOU ARE A 3RD PARTY DEVELOPER USING THE SCP FORMAT!

This update to version 2.3 added a bit to indicate the 3rd party flux creator. This means a board like the Greaseweazle will indicate its software as the creator instead of a SCP created by SuperCard Pro.

The header of an SCP file is comprised of a few bytes, not just the ASCII “SCP”.

All offsets are the start of the file (byte 0) unless otherwise stated.  The .scp image
consists of a disk definition header, the track data header offset table, and the flux
data for each track (preceeded by Track Data Header).  The image file format is described
below:

BYTES 0x00-0x02 contains the ASCII of "SCP" as the first 3 bytes. If this is not found,
then the file is not ours.

With Byte 0x03, we will see the version of the software which created the SCP. In my sample, created by my Greaseweazle, did not add a number here, only “00”. Byte 0x04 is the disk type, there is some set definitions in the spec for this byte. My test sample uses “80”, but not sure what that represents. Bytes 5-7 are used for other disk information, but byte 8 is where we find the flags which include a bit for flux creator. My sample has the value “23”, but since we are looking at the individual bit level, the value will be a combination of all the bits in the flag area. The individual bits are, “00100011”, so since the seventh bit is set, then the SCP was created by 3rd party which is correct.

So the only reliable static data in the header will be those first 3 bytes. There is some bytes later in the file which should be static. That is the start of the Tracks, which include a Track Data Header. We can see from the spec, the last byte in the main header is 0x2AF, which makes the main header 687 bytes long. Starting on the 688 byte, or 0x2B0 is the ASCII string TRK. Adding these 3 bytes should make for a nice signature.

000002b0  54 52 4b 00 a9 86 65 00  5e b5 00 00 28 00 00 00  |TRK...e.^...(...|
000002c0  ab 86 65 00 60 b5 00 00  e4 6a 01 00 56 87 65 00  |..e.`....j..V.e.|
000002d0  60 b5 00 00 a4 d5 02 00  00 39 00 7e 00 7c 00 ce  |`........9.~.|..|
000002e0  00 c7 00 c7 00 cd 00 7e  00 7c 00 eb 00 4f 00 60  |.......~.|...O.`|
000002f0  00 39 00 77 00 cd 00 7c  00 7f 00 ce 00 c7 00 c6  |.9.w...|........|
00000300  00 ce 00 7a 00 80 00 cd  00 c8 00 c6 00 ce 00 7b  |...z...........{|

We could use the TRK string for identification, but looking further into the spec, we can also see the SCP format may contain a footer.

; ------------------------------------------------------------------
; EXTENSION FOOTER FORMAT
; ------------------------------------------------------------------
;
; 0000           DRIVE MANUFACTURER STRING OFFSET            - 4 bytes
; 0004           DRIVE MODEL STRING OFFSET                   - 4 bytes
; 0008           DRIVE SERIAL NUMBER STRING OFFSET           - 4 bytes
; 000C           CREATOR STRING OFFSET                       - 4 bytes
; 0010           APPLICATION NAME STRING OFFSET              - 4 bytes
; 0014           COMMENTS STRING OFFSET                      - 4 bytes
; 0018           IMAGE CREATION TIMESTAMP                    - 8 bytes
; 0020           IMAGE MODIFICATION TIMESTAMP                - 8 bytes
; 0028           APPLICATION VERSION (nibbles major/minor)   - 1 byte
; 0029           SCP HARDWARE VERSION (nibbles major/minor)  - 1 byte
; 002A           SCP FIRMWARE VERSION (nibbles major/minor)  - 1 byte
; 002B           IMAGE FORMAT REVISION (nibbles major/minor) - 1 byte
; 002C           'FPCS' (ASCII CHARS)                        - 4 bytes

Here is the tail of my sample file, you can see it contains the ASCII characters listed here for the last four bytes. It also contains an application string, indicating the Greaseweazle software used to create the file. All every helpful information. We can also see on the 5th to last byte the value “24”, this indicates the file format version being used. Version 2.4 being used in this file but we know 2.5 is the latest. I wonder if it would be valuable to have separate identification for version 1 and 2 of the format? Could also consider assigning version 2.3 and 2.4 as unique as they will have the additional 3rd party information.

hexdump -C unknown.scp | tail
02cd6cb0  00 85 00 5a 00 39 00 90  00 75 00 8e 00 42 00 3c  |...Z.9...u...B.<|
02cd6cc0  00 78 00 2e 00 42 00 3a  00 47 00 78 00 42 00 46  |.x...B.:.G.x.B.F|
02cd6cd0  00 33 00 52 00 29 00 3a  00 55 00 5d 00 5b 00 54  |.3.R.).:.U.].[.T|
02cd6ce0  00 35 00 e0 00 48 00 91  00 75 00 3a 00 36 00 33  |.5...H...u.:.6.3|
02cd6cf0  00 55 02 03 01 d3 00 33  00 58 11 00 47 72 65 61  |.U.....3.X..Grea|
02cd6d00  73 65 77 65 61 7a 6c 65  20 31 2e 32 32 00 00 00  |seweazle 1.22...|
02cd6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 fa 6c  |...............l|
02cd6d20  cd 02 00 00 00 00 66 1d  4e 68 00 00 00 00 66 1d  |......f.Nh....f.|
02cd6d30  4e 68 00 00 00 00 00 00  00 24 46 50 43 53        |Nh.......$FPCS|

So maybe we don’t need the TRK header in our signature, just the first 3 bytes and last 4 bytes. I believe this should allow for proper identification, while avoiding false positives.

I have a proposal for a PRONOM signature and a sample file on my Github page. Other samples files can be found all over the interwebs, with many on archive.org.

DaVinci Resolve

May 2, 2025 by Thor Leave a comment

A previous post was about LUTs, the little files needed to color grade your photo’s and video’s. One of the best systems for color grading video in use by professionals today is DaVinci Resolve. The system originally was all hardware based, but in the 2004 as computers were able to process higher quality video, da Vinci Systems released new digital systems.

Like most professional multimedia editing software, projects are used to manage work and DaVinci Resolve is no different. Projects are generally where all the settings for the project are stored, but don’t generally store the actual media used in the project. Project files are often XML with unique schema’s, but other pack a little more into the project file.

hexdump -C project.drp | head
00000000  50 4b 03 04 14 00 08 00  08 00 f2 54 90 5a ef 18  |PK.........T.Z..|
00000010  b0 25 47 0c 00 00 db 1b  00 00 0b 00 00 00 70 72  |.%G...........pr|
00000020  6f 6a 65 63 74 2e 78 6d  6c 9d 58 d9 72 5b 37 12  |oject.xml.X.r[7.|
00000030  7d cf 57 68 f4 7e 4d ec  4b 8a 51 ca b1 92 89 aa  |}.Wh.~M.K.Q.....|
00000040  2c db 65 29 79 9d 6a 00  0d 85 09 45 aa 48 4a 71  |,.e)y.j....E.HJq|
00000050  fe 7e 0e ee 42 51 94 9c  68 c6 29 85 17 0d a0 d1  |.~..BQ..h.).....|
00000060  e8 3e bd 61 fe fd 97 db  e5 c9 03 6f b6 8b f5 ea  |.>.a.......o....|
00000070  bb 53 f9 46 9c 9e f0 2a  af cb 62 75 f3 dd e9 2f  |.S.F...*..bu.../|
00000080  d7 3f 75 e1 f4 fb b3 6f  e6 ff ea ba f3 f4 f6 ee  |.?u....o........|
00000090  ee 57 de 60 55 7c 23 df  98 37 42 48 79 7a 72 9e  |.W.`U|#..7BHyzr.|

DaVinci Resolve keeps all projects in a database, but you can export them to a project file. A DaVinci Resolve Project file uses a ZIP container to store all the project settings in one file. Let’s see what also might be inside.

Path = project.drp
Type = zip
Physical Size = 543860

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08 .....      1010030       287793  project.xml
2018-02-27 20:25:08 .....        21173         6856  MediaPool/Master/000_Timelines/MpFolder.xml
2018-02-27 20:25:08 .....       492690        28067  MediaPool/Master/001_Audio/MpFolder.xml
2018-02-27 20:25:08 .....        20177         3588  MediaPool/Master/002_gfx/MpFolder.xml
2018-02-27 20:25:08 .....        11025         2611  MediaPool/Master/003_VO/MpFolder.xml
2018-02-27 20:25:08 .....        98309         7042  MediaPool/Master/004_ScreenCaptures Consolidated/MpFolder.xml
2018-02-27 20:25:08 .....      1278493        66424  MediaPool/Master/005_Video H264/MpFolder.xml
2018-02-27 20:25:08 .....         1995          748  MediaPool/Master/MpFolder.xml
2018-02-27 20:25:08 .....      1638204       137086  SeqContainer/909a0a2c-4183-4310-9f78-6e15c3c59cb4.xml
2018-02-27 20:25:08 .....         8806         1169  Gallery.xml
2018-02-27 20:25:08 .....        12697          696  media.dat
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08            4593599       542080  11 files

Looks like a lot of XML! The consistent XML in all the DRP files is the apply named “project.xml” along with “Gallery.xml”.

cat project.xml | head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="19.1.4.0011" DbPrjVer="14"-->
<SM_Project DbId="db65f2ee-2bff-41cd-b478-f96c26e9609f">
 <FieldsBlob>000000010000000700000026005400650078007400520065006e006400650072004900740065006d005600650063004200410000000c00ffffffff0000002400520065006e0064006500720043006100630068006500560065007200730069006f006e0000000200000000010000001e00500072006f006a00650063007400460065006100740075007200650073000000050000000000000000010000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000030000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800330033003400320034003300380036002d0034006400330030002d0034003600610035002d0061006100340033002d006100330035003200620066006500370038003200640063000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>86f03abc-9354-47d9-9006-a55b6b1d49cf</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>6CB133A11B81</SysId>
 <ProjectId>0</ProjectId>

It appears the version of DaVinci Resolve is pretty important. If you try and open a DRP file without using the most up-to-date software you might run into problems. From what I can see, every time a new major version is released, the updates to the XML cause the project error when imported. So knowing the version of the DRP file can be a critical piece of metadata needed in understanding the format. There are some helpful apps created by DaVinci Resolve users you can try, or you can try a little python script to report back the version used in a DRP or whole folder of DRP files.

There is one other file used by the DaVinci Resolve software. It uses the DRT extension and is for exporting and importing single timelines to the software. Like a DRP it is a simple project file that only points to the media used in the project and only stores the settings needed.

Path = timeline.drt
Type = zip
Physical Size = 215159

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42 .....        45726         8888  project.xml
2021-04-21 21:16:42 .....       670306       198698  MediaPool/Master/MpFolder.xml
2021-04-21 21:16:42 .....        98268         7089  SeqContainer/7eb849f3-41cb-4e3f-baa8-d5b134b57aa7.xml
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42             814300       214675  3 files

This DRT file also has a project.xml file, but doesn’t have the Gallery.xml file we normally find in a DRP file. We can use this to distinguish the difference. The project.xml is the same as the DRP, so this distinction is important.

cat project.xml |head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="17.1.1.0009" DbPrjVer="10"-->
<SM_Project DbId="ec6cb2e2-0b3c-43b8-8f90-a5fcb973af3b">
 <FieldsBlob>00000001000000040000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000020000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800660030003800380038003300390038002d0066006400620037002d0034006300320036002d0061003700310032002d003300360038006200300036003300300065003400330031000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>04d71873-a504-40c6-bde5-41709691a2c9</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>94F6D6F3F60F</SysId>
 <ProjectId>0</ProjectId>

In both formats they use the XML root tag of “SM_Project”, this can also be used to define a signature for the two formats as “project.xml” could be used with a different format and we don’t want there to be a false identification.

I was able to trace back the use of the DRP format back to DaVinci Resolve version 9. In version 8, it appears projects are exported using the name and extension, “Default Project.resolve.zip”. From what I could find, DaVinci Resolve version 9 was a big re-write and so it makes sense to settle on more useful extension. The project.xml file in a version 8 format is slightly different.

cat project.xml | head
<SM_Project DbId="9ba0c4dc-d99c-4b7f-b0da-d254d91e34e2" DbAppVer="8.2 (#153)">
 <LockId></LockId>
 <User>159415b8-7515-43bf-b5f5-00d98949434b</User>
 <UserId>-1</UserId>
 <SysId>7cd1c388ea29</SysId>
 <ProjectId>0</ProjectId>
 <RevivalTaskSetID>-1</RevivalTaskSetID>
 <PlayHeadsSplitDisplay>false</PlayHeadsSplitDisplay>
 <pGallery>
  <Gallery::GyGallery DbId="9884d8ff-096e-4df0-b833-0e75e6e07e15">

Still uses the “SM_Project” root tag, but displays the DbAppVer information differently. It would be good to find more examples of the version 8 and earlier to see how this format has evolved over time. For now, I have created a signature you can test if you happen to have any DRP files in your archive.

Scrivener

March 21, 2025 by Thor Leave a comment

Word Processors are everywhere and have some of the most recognizable file formats. Some are very simple in that they just contain plain text, others are more complex. There are formats which allow for images and others which can handle different languages and writing directions.

A writing platform I recently learned about is called Scrivener. It was first released in 2007 by a company called Literature & Latte Ltd, and has a Macintosh and Windows version. The software is marketed toward writers as there is some features that help with note taking, research and much more. It also allows for adding multimedia and even full webpages.

This is accomplished by a file format which uses a non-traditional method for storing all the data needed to render the format.

tree Scrivener3-s01.scriv
Scrivener3-s01.scriv
├── Files
│   ├── Data
│   │   ├── 921B4A08-54C0-4B69-94FD-428F56FDAB89
│   │   │   └── content.rtf
│   │   └── docs.checksum
│   ├── binder.autosave
│   ├── binder.backup
│   ├── search.indexes
│   ├── styles.xml
│   ├── version.txt
│   └── writing.history
├── Scrivener3-s01.scrivx
└── Settings
    ├── recents.txt
    ├── ui-common.xml
    └── ui.ini

Scrivener uses a folder structure to store all the data used in the format. The folder has an extension, .scriv. The format includes some rich text, backups, indexes, version history and more. One unique format within the folder is an XML file with the extension .scrivx. This makes the format proprietary and can only be rendered using the Scrivener software.

cat Scrivener3-s01.scrivx | head
<?xml version="1.0" encoding="UTF-8"?>
<ScrivenerProject Template="No" Version="2.0" Identifier="DF5DA7F0-27DB-4815-A050-B4D6F23CABA7" Creator="SCRWIN-3.1.5.1" Device="DESKTOP-JMM4K7M" Modified="2025-03-14 22:15:28 -0600" ModID="B4A944C3-FF79-49F6-A737-158BEB4E58BB">
    <Binder>
        <BinderItem UUID="17807D28-117A-409E-B12D-B34922B6CC6F" Type="DraftFolder" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:17 -0600">
            <Title>Draft</Title>
            <MetaData>
                <IncludeInCompile>Yes</IncludeInCompile>
            </MetaData>
            <Children>
                <BinderItem UUID="921B4A08-54C0-4B69-94FD-428F56FDAB89" Type="Text" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:23 -0600">

The XML has enough to be able to identify them apart from other XML files. The signature would be straight forward. Earlier versions of Scrivener sometimes have the SCRIVX file but also sometimes has a
.scrivproj extension. This file on a Macintosh is in a Binary plist format, which is different than earlier Windows versions. Seems they may have unified them under version 2 or 3, where version 1 & 2 for Windows uses Project version 1 and version 3 uses project version 2.

hexdump -C Scrivener1-s01.scriv/binder.scrivproj | head
00000000  62 70 6c 69 73 74 30 30  d4 00 01 00 02 00 03 00  |bplist00........|
00000010  04 00 05 00 1d 01 d8 01  d9 54 24 74 6f 70 58 24  |.........T$topX$|
00000020  6f 62 6a 65 63 74 73 58  24 76 65 72 73 69 6f 6e  |objectsX$version|
00000030  59 24 61 72 63 68 69 76  65 72 dc 00 06 00 07 00  |Y$archiver......|
00000040  08 00 09 00 0a 00 0b 00  0c 00 0d 00 0e 00 0f 00  |................|
00000050  10 00 11 00 12 00 13 00  14 00 15 00 16 00 17 00  |................|
00000060  18 00 19 00 1a 00 15 00  1b 00 1c 5a 4c 61 62 65  |...........ZLabe|
00000070  6c 54 69 74 6c 65 59 4c  61 62 65 6c 4c 69 73 74  |lTitleYLabelList|
00000080  5e 42 69 6e 64 65 72 43  6f 6e 74 65 6e 74 73 5f  |^BinderContents_|
00000090  10 0f 44 65 66 61 75 6c  74 4c 61 62 65 6c 54 61  |..DefaultLabelTa|

Since the developers of Scrivener decided to make the SCRIV format simply a folder with different content within, something special happens on the MacOS. The Scrivener software registers all the extensions is uses with the MacOS launch services. This process then changes the way the SCRIV folder is displayed in the MacOS Finder. They now appears as a single file and given a file type. This is called a Document Package format.

By right-clicking on the “file” you can then browse the package contents. There is nothing in the folder itself or hidden in any attributes which causes this to happen, it is all controlled by what extensions have been registered with the launch services database. We can however ask the MacOS to give us some extended metadata details about the package, as long as the file is on a Apple filesystem like HFS or APFS.

mdls Scrivener3-s01.scriv 
_kMDItemDisplayNameWithExtensions      = "Scrivener3-s01.scriv"
kMDItemContentCreationDate             = 2025-03-15 04:15:17 +0000
kMDItemContentCreationDate_Ranking     = 2025-03-15 00:00:00 +0000
kMDItemContentModificationDate         = 2025-03-15 04:15:18 +0000
kMDItemContentModificationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentType                     = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree                 = (
    "com.literatureandlatte.scrivener3.scriv",
    "public.directory",
    "public.item",
    "com.apple.package",
    "public.content",
    "public.composite-content"
)
kMDItemDateAdded                       = 2025-03-21 04:38:48 +0000
kMDItemDateAdded_Ranking               = 2025-03-21 00:00:00 +0000
kMDItemDisplayName                     = "Scrivener3-s01.scriv"
kMDItemDocumentIdentifier              = 0
kMDItemFSContentChangeDate             = 2025-03-15 04:15:18 +0000
kMDItemFSCreationDate                  = 2025-03-15 04:15:17 +0000
kMDItemFSCreatorCode                   = ""
kMDItemFSFinderFlags                   = 0
kMDItemFSHasCustomIcon                 = (null)
kMDItemFSInvisible                     = 0
kMDItemFSIsExtensionHidden             = 0
kMDItemFSIsStationery                  = (null)
kMDItemFSLabel                         = 0
kMDItemFSName                          = "Scrivener3-s01.scriv"
kMDItemFSNodeCount                     = 3
kMDItemFSOwnerGroupID                  = 20
kMDItemFSOwnerUserID                   = 501
kMDItemFSSize                          = 31155
kMDItemFSTypeCode                      = ""
kMDItemInterestingDate_Ranking         = 2025-03-15 00:00:00 +0000
kMDItemKind                            = "Scrivener Project"
kMDItemLogicalSize                     = 31155
kMDItemPhysicalSize                    = 69632

There is a lot of additional details available using the MDLS command, this includes the content type of “com.apple.package“. This tools works with any files in MacOS and can be a very useful tool in getting all the information you may need for preservation needs.

Until the tools we use for format identification can recognize package formats, tools like this may be needed to gather the neccessary metadata for preservation. But in the meantime, identification of the package content is the best we can hope for. Creating a signature for the XML based SCRIVX format is the first step.

Stay tuned for more on the package format as I will be bring it up more in the Digital Preservation community.

LUTS

February 28, 2025 by Thor 1 Comment

If you are looking for LUTs, you’re in luck. There is a website for sharing your FreshLUTs. Even though they are fresh, they are probably not as exciting as one might think.

LUTs are short for Look-Up Tables, which doesn’t sound as exciting as you were probably hoping. They are a pretty interesting process for dealing with color in high end Image and Video processing applications. Often called 3D Look-up Tables, they are used for color grading, an essential step in film production and restoration to map from one color space to another. LUTs are not to be confused with ICC profiles which aim for color accuracy, while LUTs are looking for more color quality and aesthetics.

There are a lot of LUT formats out there, it seems. In looking into this format, I have found dozens of others to investigate, but today lets look at the four available as an export from Photoshop.

Above you can see a simple screenshot for the export of different formats from Adobe Photoshop. Adobe is one of the biggest developer and supporter of the formats used in LUTs, but there are many other graphics tools which create and support LUTs. In this Photoshop export we can see four formats included in the export. Lets take a look at each of these.

ICC Profiles are well documented and available for identification in PRONOM.

filename : 'LUTs-Export-s01.icc'
filesize : 197024
modified : 2025-02-25T09:37:24-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1975'
    format  : 'ICC Profile'
    version : '2'
    mime    : 'application/vnd.iccprofile'
    class   : 'Dataset'
    basis   : 'extension match icc; byte match at 8, 32'

But the other three are plain text files and still identify as such. Let us start with the CUBE format.

filename : 'LUTs-Export-s01.cube'
filesize : 884963
modified : 2025-02-25T09:37:24-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/111'
    format  : 'Plain Text File'
    version : 
    mime    : 'text/plain'
    class   : 
    basis   : 'text match ASCII'
    warning : 'match on text only; extension mismatch'

cat LUTs-Export-s01.cube 
#Created by: Adobe Photoshop Export Color Lookup Plugin
#Copyright: (C) Copyright 2025 ObsoleteThor
TITLE "LUT-export-s01"

#LUT size
LUT_3D_SIZE 32

#data domain
DOMAIN_MIN 0.0 0.0 0.0
DOMAIN_MAX 1.0 1.0 1.0

#LUT data points
0.000000 0.000000 0.000000

The CUBE format was first developed by IRIDAS in 2003 as a answer to ensure interoperability with other software. Adobe acquired IRIDAS in 2011 in a effort to be a leader in the color grading and enhancement market. They have published the CUBE specifications for version 1.0 in 2013.

A Cube file is a text file that defines a look-up table in the Cube format.
The Cube look-up tables store RGB values.
Advantages of the Cube format include:

The Cube format can describe look-up tables for a wide range of purposes, from simple gamma adjustments for display output to complex HDR image processing.

The format is well suited for professional digital cinema applications and for both normal range and High-Dynamic Range image processing.

As Cube files are text files, they are easily edited or reviewed using a text editor.

A Cube file can include three 1-dimensional tables or one 3-dimensional table.

The tables can be in a wide range of sizes.

Cube files are trivial to write and read.

All values are human-readable as they are in decimal form, and can be of high precision.

The input domain and output range are not limited to the range 0.0 to 1.0.

According to the specifications, a CUBE file can be a One-Dimensional Cube file or a Three-Dimensional Cube file. From the example above you can see the file is a Three-Dimensional file with the required line “LUT_3D_SIZE“. But in a One-Dimensional file, the required line is “LUT_1D_SIZE“.

cat Demo.cube
TITLE "Demo"
LUT_1D_SIZE 3
DOMAIN_MIN 0 0 0
DOMAIN_MAX 1 2 3
0 0 0
# Comments can go anywhere
0.5 1 1.5
1 1 1

Each CUBE file has one or the other and should be an easy string to look for. It is in a variable position as there can be comments before the required line and also may have a TITLE line. The TITLE and DOMAIN lines are common to every file but not required.

Now, the CUBE format is a bit different depending on the source. They all seem to have the same header, but different elements. It seems the IRIDAS Cube format is the most interoperable. The Truelight Cube format generally has the CUB extension, and the Cinespace Cube has the CSP extension, which will look at next/ You can read more about the differences on this format comparison table. This LUTCalc web site has many different types of Cube’s it can output, so there are some differences.

The other file format available in the export is a CSP. The CSP is also a plain text file, often called a cineSpace LUT file. This format come from the cineSpace software, a color management software for the film and television industry.

cat LUTS-s01.csp 
CSPLUTV100
3D

BEGIN METADATA
#Created by: Adobe Photoshop Export Color Lookup Plugin
TITLE "LUTS"
END METADATA

2
0.0 1.0
0.0 1.0
2
0.0 1.0
0.0 1.0
2
0.0 1.0
0.0 1.0

32 32 32
0.000000 0.000000 0.000000

The CSP File Format specifications outlines header and the other two sections.:

The cineSpace LUT format contains three main sections.
Header
This section contains the LUT identifier and the LUT type, 3D or 1D.
It is made up of the first two (2) valid lines in the file. See Notes below for the definition of a valid line.

Examples
• (3D LUT) header:
CSPLUTV100
3D
• (1D LUT) header:
CSPLUTV100
1D

So there is a pretty obvious header to work with in identification. “CSPLUTV100” can be used to identify both 1D and 3D CSP files.

The other format available to export from Photoshop is 3DL. They seem to be connected to the Assimilate Inc. company and software. A specification has been posted, and it looks like there is only ASCII and not much in the way of a header.

cat LUTS-s01.3dl 
#Created by: Adobe Photoshop Export Color Lookup Plugin
#Description: LUTS
0 33 66 99 132 165 198 231 264 297 330 363 396 429 462 495 528 561 594 627 660 693 726 759 792 825 858 891 924 957 990 1023

It does not appear there is any headers or static strings to use for identification. The specification calls the format, 3DL ASCII format and that “All lines starting with ‘#’ are treated as comments.” Because of this, I don’t think positive identification can happen at this time.

For now I am just proposing 2 new file formats to PRONOM, The CUBE format And the CSP Format. Click on my GitHub submission page to see the signatures and enjoy some samples!

Pro Tools Sessions

January 10, 2025 by Thor Leave a comment

One of the most important software titles related to professional audio recording and mixing is Pro Tools. The Digital Audio Workstation by Digidesign, now Avid, has been around since 1991 and was born from the very popular Sound Designer software first released in 1985. When Sound Designer II was released a few years later, the audio format used became the standard file format for audio recordings. Pro Tools progressed from there to become the industry standard for professional audio production, even winning a Technical Grammy, Emmy, and Oscar.

Pro Tools helped produce amazing music for artists such as No Doubt, Maroon 5, Ricky Martin, and many others. Obviously the best part is the final mixed audio used to make the music we love, but the work that goes into creating the audio mixes is saved in a Pro Tools session. The session is where all the magic happens. A Pro Tools session is actually a project file within a folder where all the supporting files are located.

tree PT Sample/
├── Audio Files
│   ├── GTR 1_02.wav
│   ├── GTR 1_03.wav
│   └── GTR 1_04.wav
└── Test.ptx

These Session “Folders” can get pretty complex as more audio and effects are added to the session, adding folders such as Fade Files, Rendered Files, and Plug-in settings. The current version of Pro Tools uses a project session file with the extension PTX, but that wasn’t always the case. The current version of Pro Tools can be run on Macintosh and Windows, but that also was not always the case. Because the software was originally written for Macintosh hardware, the session files were only compatible on the Macintosh file system as well.

Lets start by looking at a session from Pro Tools version 1.1 from 1991.

ls -l@ Demo Disk 1 
total 1504
-rw-r--r--@ 1 thorsted  Domain Users   45056 Sep 13  1991 Backward Kick
	com.apple.FinderInfo	    32 
	com.apple.ResourceFork	  1354 
	com.apple.provenance	    11 
-rw-r--r--@ 1 thorsted  Domain Users       0 Sep 16  1991 Demo Session
	com.apple.FinderInfo	    32 
	com.apple.ResourceFork	 13671 
	com.apple.provenance	    11 
-rw-r--r--@ 1 thorsted  Domain Users       0 Sep 16  1991 Desktop
	com.apple.FinderInfo	    32 
	com.apple.ResourceFork	  3081 
	com.apple.provenance	    11 
-rw-r--r--@ 1 thorsted  Domain Users  339456 Sep 13  1991 Solo 1
	com.apple.FinderInfo	    32 
	com.apple.ResourceFork	  2040 
	com.apple.provenance	    11 
-rw-r--r--@ 1 thorsted  Domain Users  350390 Sep 13  1991 Solo 2
	com.apple.FinderInfo	    32 
	com.apple.ResourceFork	  2006 
	com.apple.provenance	    11

You might notice the “Demo Session” file is Zero Bytes, but the Resource Fork is 13671 bytes in size.

The Pro Tools Sessions from the beginning until version 5 used this method of storing the session data. ALL in the Resource Fork. Because the session data was in the resource fork and the supporting audio files were in the Sound Designer II format, which also stored important information in the resource fork, this made it impossible to use on anything but a Macintosh file system.

Version 10 of Pro Tools allows you to export the full session back into older versions of the software to version 3.2. When you choose version 5 on a Mac, it forces you to also convert the audio formats to SD2 files as well. For versions 1 & 2 of Pro Tools, there was no official extension for the session files, but starting with version 3, you might often find the extension PT3, then PT4, and PT5. With version 4, there was also a version P24 extension used when Pro Tools version 4 made the leap to 24bit. But for each of these versions identification is not possible with current preservation tools like PRONOM. You could encode the session as a MacBinary to retain everything for modern systems, which is identifiable, but you could also use my proposal for a lookup in the TCDB python tool located here.

python3 TC-lookup-draft-uni.py "PT Session 02-41.pt4"
Type Code: PT4S
Creator Code: PTul
Size of Data Fork: 0 bytes
Size of Resource Fork: 14003 bytes
Rows with Type Code b'PT4S' and Creator Code b'PTul': 
Row index: 32813
File Name: Pro Tools 4
Type: PT4S
Creator: PTul
Extension: pt4
Data by Ilan Szekely, Jerusalem: nan

Extension	Version	Type	Creator
	Pro Tools 1.1	mtSF	TLin
	Pro Tools 2	PSes	PTul
PT3	Pro Tools 3.2	PSes	PTul
PT4	Pro Tools 4 16bit	PT4S	PTul
PT24	Pro Tools 4 24bit	PT24	PTul
PT5	Pro Tools 5	PT5S	PTul
PTS	Pro Tools 5.1-6.9	PTS	PTul
PTF	Pro Tools 7-9	PTF	PTul
PTX	Pro Tools 10+	PTX	PTul

There isn’t a lot of information about when Pro Tools was made for Windows. I found some references to a Windows NT version of the 16bit and 24bit version 4. I did also find a copy of the free Pro Tools version 5.01 for Windows 98. In the Read Me it states:

Cross–platform File Exchange is not supported in this version of Pro Tools FREE

File interchange between Mac and PC versions of Pro Tools FREE is not possible in this 5.0.1 release. We hope to include this functionality in a future release of Pro Tools FREE.You can exchange files with Pro Tools LE and TDM users who use the same platform (Mac or Win98/Me) as you, but remember, Pro Tools FREE is limited to 8 audio and 48 MIDI tracks.

Running the software confirms the session file for this version has the extension PT5 and not the later PTS for version 5.1. This version of Pro Tools also allows you to save back to the P24 and PT4 versions, which are probably the first Windows versions. But they are entirely different file formats from the Macintosh versions.

hexdump -C PT5-Win-s03.pt5 | head
00000000  00 00 01 00 00 00 45 ae  00 00 44 ae 00 00 03 98  |......E...D.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 00 5e 50 53 56 45  00 01 04 31 00 52 05 00  |...^PSVE...1.R..|
00000110  45 44 05 00 45 44 19 99  03 26 0c 50 72 6f 54 6f  |ED..ED...&.ProTo|
00000120  6f 6c 73 20 35 2e 30 fc  c5 00 d7 12 00 78 5e 00  |ols 5.0......x^.|
00000130  00 00 0e 32 00 78 5e 00  00 00 00 00 00 00 00 00  |...2.x^.........|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C PT24-Win-s03.p24 | head
00000000  00 00 01 00 00 00 3f d3  00 00 3e d3 00 00 02 f1  |......?...>.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 01 0a 50 61 74 68  00 01 02 b4 37 0e 2e 48  |....Path....7..H|
00000110  43 3a 5c 57 49 4e 44 4f  57 53 5c 44 65 73 6b 74  |C:\WINDOWS\Deskt|
00000120  6f 70 5c 50 54 5c 50 54  35 2d 57 69 6e 2d 73 30  |op\PT\PT5-Win-s0|
00000130  33 5c 41 75 64 69 6f 20  46 69 6c 65 73 00 00 00  |3\Audio Files...|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C PT4-Win-s03-16.pt4 | head
00000000  00 00 01 00 00 00 3f d9  00 00 3e d9 00 00 02 f1  |......?...>.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 01 0a 50 61 74 68  00 01 02 b4 37 0e 2e 48  |....Path....7..H|
00000110  43 3a 5c 57 49 4e 44 4f  57 53 5c 44 65 73 6b 74  |C:\WINDOWS\Deskt|
00000120  6f 70 5c 50 54 5c 50 54  35 2d 57 69 6e 2d 73 30  |op\PT\PT5-Win-s0|
00000130  33 5c 41 75 64 69 6f 20  46 69 6c 65 73 00 00 00  |3\Audio Files...|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Starting with Pro Tools 5.1 in 2001 things began to change. Pro Tools has always been tied very closely with hardware and software so with Apple launching Mac OS X, this provided an opportunity for DigiDesign/Avid to revamp their hardware and software for better compatibility and this included a cross-platform session format.

Pro Tools 5.1 used a new session format which used the extension PTS. Let’s take a look at a sample.

hexdump -C PT Session 02-51.pts | head
00000000  03 30 30 31 30 31 31 31  31 30 30 31 30 31 30 31  |.001011110010101|
00000010  31 00 01 3d 6e 1c 06 eb  d8 c1 aa 16 fd 65 4e 6d  |1..=n........eNm|
00000020  23 09 96 db c4 ad 95 7f  68 5d 3a 23 0c a5 ac a8  |#.......h]:#....|
00000030  90 cd ed 04 38 4e 06 47  bc e2 ca b3 9c 8f 6e 57  |....8N.G......nW|
00000040  40 2a 12 fb e4 c4 b6 9f  88 77 5a 43 2c 24 ce c9  |@*.......wZC,$..|
00000050  e3 97 9b 8a 73 5d 46 2f  4a 64 86 b6 dd d6 eb 77  |....s]F/Jd.....w|
00000060  76 49 32 1b 54 9f b9 9f  fc fe 15 0f 3f 15 4d 62  |vI2.T.......?.Mb|
00000070  83 aa ab c4 fa 5d 20 26  54 44 0b f3 d9 c5 ae 97  |.....] &TD......|
00000080  cd 08 31 74 77 0d f6 df  c8 b5 c0 8b 6c 7c 3f 27  |..1tw.......l|?'|
00000090  10 9e c2 cb b4 9d 86 45  58 41 2a ad e1 78 2d b4  |.......EXA*..x-.|

The session is a new proprietary binary format with an interesting header. There is one byte and then a sequence of ASCII characters in the form of a binary string. 0010111100101011 What it means is unknown to me. In Decimal, the binary reads “12075”, or hex values “2F2B” or in text “/+”. Regardless of what it means, this header was used from versions 5.1 through 9. The extension changed to PTF with version 7-9, but the header is the same. This is why PRONOM PUID fmt/1951 refers to both extensions covering 5.1-9.

hexdump -C PT Session 02-7.ptf | head
00000000  03 30 30 31 30 31 31 31  31 30 30 31 30 31 30 31  |.001011110010101|
00000010  31 00 01 4c 6a cd 68 00  a0 3c d8 d2 c1 ac 48 be  |1..Lj.h..<....H.|
00000020  85 1c 25 54 f0 8c 31 e1  61 fc 98 34 d0 6c 08 a4  |..%T..1.a..4.l..|
00000030  40 dc 79 14 b0 4c eb 84  21 bc 58 f4 90 2c cc 64  |@.y..L..!.X..,.d|
00000040  00 9c 0e a7 15 6f a9 44  e0 7c 18 b4 7a ec 88 24  |.....o.D.|..z..$|
00000050  c6 42 65 77 5d b8 f2 80  a1 3c d8 2e 12 ac 6b e4  |.Bew]....<....k.|
00000060  80 1c a2 71 f0 8c 2c c4  60 fc ae 47 b5 0f 09 a4  |...q..,.`..G....|
00000070  40 dc 78 14 9a 4c e8 84  26 a2 c5 17 fd 58 52 e0  |@.x..L..&....XR.|
00000080  01 9c 38 d4 70 0d a8 44  e0 26 1a b4 73 ec 88 24  |..8.p..D.&..s..$|
00000090  da 79 f8 94 34 cc 68 04  96 4f bd 17 11 ac 48 e4  |.y..4.h..O....H.|

It might be possible to look closer at the two extensions and find something which can distinguish between them, but because they are in a proprietary binary format, there isn’t much to go on. There has been a few attempts at reverse engineering the formats, but they even choose to lump the two extensions together.

The other import byte in this header is the second byte after the odd binary ASCII sequence. Above highlighted in purple. 0x01 is important because in the next version PTX, this changes to 0x05, highlighted below in purple.

Pro Tools version 10 was a big release, it added new features and started to phase out the HD hardware. With this release we see a new session format which is still used by the current version of Pro Tools.

hexdump -C PT Session 02-10.ptx | head
00000000  03 30 30 31 30 31 31 31  31 30 30 31 30 31 30 31  |.001011110010101|
00000010  31 00 05 13 5a 01 00 04  00 00 00 49 a4 00 00 5a  |1...Z......I...Z|
00000020  03 00 64 00 00 00 03 00  00 0c 00 00 00 50 72 6f  |..d..........Pro|
00000030  20 54 6f 6f 6c 73 20 48  44 03 00 00 00 0a 00 00  | Tools HD.......|
00000040  00 03 00 00 00 09 00 00  00 06 00 00 00 31 30 2e  |.............10.|
00000050  33 2e 39 01 07 00 00 00  52 65 6c 65 61 73 65 00  |3.9.....Release.|
00000060  16 00 00 00 50 72 6f 20  54 6f 6f 6c 73 20 53 65  |....Pro Tools Se|
00000070  73 73 69 6f 6e 20 46 69  6c 65 06 00 05 00 00 00  |ssion File......|
00000080  4d 61 63 4f 53 00 00 00  00 05 5a 08 00 eb 00 00  |MacOS.....Z.....|
00000090  00 67 20 00 00 00 00 2a  00 00 00 be 1d 9d e3 03  |.g ....*........|

This new session format has the same binary ASCII string, but a lot more plain text in the header and throughout the file. This gives us more to explore and understand with even listing the linked Audio files and their paths. PRONOM has this new format assigned to PUID fmt/1727. The signature for these files is the same sequence as the previous version, also the 0x05 byte, but with a couple additional bytes, 5A010004, after the main sequence. I am not sure of the bytes significance, but they are in all the samples I have, even from the current version.

Pro Tools has some other formats which go along with their sessions. One I’ll highlight is the Groove template format. They end with the extension GRV. You can see some samples here. They also have the odd binary ASCII header, but with 0x00 for the second byte after the main header. Highlighted in purple below.

hexdump -C DiskoKonga.grv| head
00000000  03 30 30 31 30 31 31 31  31 30 30 31 30 31 30 31  |.001011110010101|
00000010  31 01 00 5a 00 01 00 00  00 04 00 00 15 f8 5a 00  |1..Z..........Z.|
00000020  01 00 00 15 d3 10 42 04  04 00 64 00 64 00 64 00  |......B...d.d.d.|
00000030  01 00 01 00 01 00 00 00  00 01 d4 c0 00 00 00 00  |................|
00000040  00 00 00 00 81 00 00 00  00 00 00 00 81 5a 00 01  |.............Z..|
00000050  00 00 00 24 10 43 00 00  00 00 00 00 00 00 00 00  |...$.C..........|
00000060  00 00 00 01 d4 c0 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 01 d4 c0 00 49  5a 00 01 00 00 00 24 10  |.......IZ.....$.|
00000080  43 00 00 00 00 00 01 d4  c0 00 00 00 00 00 05 7e  |C..............~|
00000090  40 00 00 00 00 00 04 8e  e0 00 00 00 00 00 01 d4  |@...............|

Other extensions associated with Pro Tools which use the same format are: PIO, PIM, PTT, PTXT, RGRP.

Pro Tools has always been software directly tied to audio hardware and system software. In addition they also used software dongles to control software licensing and the licenses were not cheap. Because of this, trying to use older versions is very difficult. Finding samples for each version is difficult as each version allows for a variety of features that may not be available in another version. Luckily, there are some older “Free” versions out there with limited features we can get some ideas of the session format.

PRONOM has working identification for the two major formats and until PRONOM can incorporate Macintosh Resource Fork identification it will have to do. The PC version 4 and 5 formats could use more research as I only have one source. The groove and other formats all seem to have the same header so they will need more research as well. Until then, enjoy some sample files and also a disk image of some older Macintosh Pro Tools 3 sessions.

Script Writing

December 20, 2024 by Thor Leave a comment

A few of you may remember a couple years ago reading in a Vice article about Eric Roth and his use of an old DOS only software program for writing all his Hollywood scripts. The Vice article was based on some earlier reporting in 2014 about his writing process. You can watch the full interview of Eric Roth on YouTube.

I remember seeing a link to the Vice article a couple years ago and finding the screenwriters use of an old DOS program, Movie Master, funny and interesting. He says in his interview that out of half superstition and half fear of change he prefers to use this very old software to write his screenplays. It’s so old and obsolete, he can’t even email the files to Hollywood. He has to print them out and have the studio scan them into modern software for use. The interview shows the screen of his old Windows computer and you can see the software he is using.

Of course because I love researching obsolete software and formats so much, I wanted to know if the scripts generated by “Movie Master”, version 3.09, are in a format that needed to be documented. I was a little surprised that this version of Movie Master was no where to be found. It was on none of the old abandoned software sites. Not on Internet Archive, no where it seemed. I did find a later version of Movie Master, version 5, but found this software was not the same thing.

The original programmer of Movie Master was Adam Greissman, which you can clearly see in the screenshot above. The software was copyright Comprehensive Video Supply in the 1980’s, but the Movie Master version 5 was developed by Ballistic Software, Inc, which was also known as “Comprehensive Cinema Software” or “Hollywood Cinema Software” later in the 1990’s.

According to a very in depth article by Daniel Plagens, Reinventing the Typewriter, mentions Adam Greissman not wanting to move the software from DOS to Windows as he didn’t feel there was enough of a market at the time. As it turns out the founder of Comprehensive Video Supply, Jules Leni, got a lot of pressure from users of Movie Master after Greissman, who left the company in 1991, to develop a Windows and Macintosh version of the software. They released this new version in October of 1996.

Let’s take a look at a couple of example files from version 5.

hexdump -C Scene.scr | head
00000000  11 0d 0a 32 2e 20 20 20  20 15 0d 0a 15 0d 0a 15  |...2.    .......|
00000010  0d 0a 15 0d 0a 11 0d 0a  10 0d 0a 15 0d 0a 15 0d  |................|
00000020  0a 15 0d 0a 10 0d 0a 46  41 44 45 20 49 4e 3a 15  |.......FADE IN:.|
00000030  0d 0a 54 68 65 20 66 6f  6c 6c 6f 77 69 6e 67 20  |..The following |
00000040  22 73 63 72 69 70 74 6c  65 74 22 20 64 65 6d 6f  |"scriptlet" demo|
00000050  6e 73 74 72 61 74 65 73  20 68 6f 77 20 4d 6f 76  |nstrates how Mov|
00000060  69 65 20 4d 61 73 74 65  72 20 0d 0a 63 61 6e 20  |ie Master ..can |
00000070  62 65 20 75 73 65 64 20  74 6f 20 6f 75 74 6c 69  |be used to outli|
00000080  6e 65 20 73 63 65 6e 65  73 2e 20 20 4f 6e 63 65  |ne scenes.  Once|
00000090  20 79 6f 75 20 68 61 76  65 20 66 69 6e 69 73 68  | you have finish|

hexdump -C MM5-s01.scr | head 
00000000  11 0d 0a 31 2e 20 20 20  20 15 0d 0a 15 0d 0a 15  |...1.    .......|
00000010  0d 0a 15 0d 0a 11 0d 0a  10 0d 0a 15 0d 0a 15 0d  |................|
00000020  0a 15 0d 0a 10 0d 0a 54  45 53 54 49 4e 47 15 0d  |.......TESTING..|
00000030  0a 7e 60 21 40 23 24 25  5e 26 2a 28 29 2d 2b 7c  |.~`!@#$%^&*()-+||
00000040  3d 2d 54 65 43 66 4d 74  0d 0a 01 00 00 07 00 02  |=-TeCfMt........|
00000050  00 00 00 00 00 00 01 00  00 01 00 00 01 00 00 01  |................|
00000060  00 00 01 00 00 01 00 00  01 00 00 01 00 00 01 00  |................|
00000070  00 01 00 00 bf 03 00 00  0c 00 43 6f 75 72 69 65  |..........Courie|
00000080  72 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |r...............|
00000090  00 00 00 00 00 00 00 00  00 30 00 00 00 00 00 00  |.........0......|

Version 5 of Movie Master uses the extension SCR, which one could assume is short for “Script”. There does appear to be a header before any readable text starts, so that will be helpful in identification. Currently there is only one PUID, x-fmt/100, in PRONOM with the extension SCR, which happens to be for an AutoCAD script and has no signature, so anything you ask DROID or Siegfried to identify with the SCR extension will default to an AutoCAD script, which is frustrating. According to the File Format Wiki, there are quite a few formats with the SCR extension. More work to be done there for sure.

So I tried for a few weeks to find a copy of Movie Master version 3.09, I even put in a eBay favorite search for the name so it would alert me to a copy being sold, but no such luck. I gave up for awhile, then recently someone posted a link to a large collection of early warez. Warez is the name given to software that has been illegally copied. When I followed the link and searched though the vast amount of software titles, I got excited to see a couple matches to “Movie Master”. After a little wrangling of some downloads, I spun up a copy of DOSBox and low and behold, Movie Master 3.09!

Welcome to Movie Master V3.09 about screen

A lot of people have compared the old DOS scriptwriting tools to early word processors like Word, Perfect Writer, WordStar, etc. They did much of the same thing, but with special controls for helping with scenes, characters, indents, and everything writers needed to make some of the best Hollywood films out there. As Daniel Plagens noted:

The program proved popular for many years. Greissman estimates they sold over 10,000 units—“saturating the market,” as he put it—and recalls seeing help wanted ads in Hollywood Reporter and Variety where knowledge of Movie Master was a hiring requirement. He visited the sets of Days of Thunder and Hunt for Red October to help their writers and production teams acclimate to Movie Master.

Makes me wonder where all the old scripts from Hollywood movies are located in their electronic form? I am sure Eric Roth probably has quite the collection of different scripts he has written. I sure hope he backs them up and donates them to a library in the future.

Well, let’s take a look at a couple sample files from Movie Master version 3 and version 4. Version 4.04 was also in the collection uploaded to Internet Archive.

hexdump -C TEST3.SCR | head 
00000000  33 2e 30 39 0a 00 00 00  00 31 00 00 00 00 00 00  |3.09.....1......|
00000010  31 00 00 00 00 00 00 0a  00 4e 41 4d 45 20 3f 0a  |1........NAME ?.|
00000020  ff 53 43 52 45 45 4e 0a  2a 42 01 19 3c 01 1e 37  |.SCREEN.*B..<..7|
00000030  01 1c 2f 01 14 25 01 18  24 01 39 4c 01 31 42 01  |../..%..$.9L.1B.|
00000040  35 41 01 0a 46 01 0a 46  01 3d 4b 01 02 00 01 0a  |5A..F..F.=K.....|
00000050  03 00 54 65 73 74 69 6e  67 20 4d 6f 76 69 65 20  |..Testing Movie |
00000060  4d 61 73 74 65 72 20 76  65 72 73 69 6f 6e 20 33  |Master version 3|
00000070  2e 30 39 11 11 31 11 31  0a                       |.09..1.1.|

hexdump -C TEST.SCR          
00000000  34 2e 30 34 0a 00 00 00  00 31 00 00 00 00 00 00  |4.04.....1......|
00000010  31 00 00 00 00 00 00 00  00 00 00 00 00 00 00 0a  |1...............|
00000020  ff 0a 2a 42 01 00 19 3c  01 00 1e 37 01 00 1c 2f  |..*B...<...7.../|
00000030  01 00 14 25 01 00 18 24  01 00 39 4c 01 00 31 42  |...%...$..9L..1B|
00000040  01 00 35 41 01 00 0a 46  01 00 0a 46 01 00 3d 4b  |..5A...F...F..=K|
00000050  01 00 0a 18 01 00 0a 46  01 00 02 00 00 54 68 69  |.......F.....Thi|
00000060  73 20 69 73 20 61 20 74  65 73 74 20 6f 66 20 4d  |s is a test of M|
00000070  6f 76 69 65 20 4d 61 73  74 65 72 20 53 63 72 69  |ovie Master Scri|
00000080  70 74 20 77 72 69 74 69  6e 67 20 73 6f 66 74 77  |pt writing softw|
00000090  61 72 65 2e 0a 01 03 00  00 31 0a 01 00 00 00 00  |are......1......|
000000a0  0a 03 01 0a                                       |....|

hexdump -C COVER.SCR | head 
00000000  33 2e 30 35 0a 01 00 00  00 31 00 00 00 00 00 00  |3.05.....1......|
00000010  31 00 00 00 00 00 00 0a  ff 43 4f 56 45 52 0a 2a  |1........COVER.*|
00000020  42 01 19 3c 01 1e 37 01  1c 2f 01 14 25 01 18 24  |B..<..7../..%..$|
00000030  01 39 4c 01 31 42 01 35  41 01 0a 46 01 0a 46 01  |.9L.1B.5A..F..F.|
00000040  3d 4b 01 06 00 00 0a 03  01 31 0a 01 03 00 00 11  |=K.......1......|
00000050  11 11 11 11 11 11 11 11  11 11 11 11 11 11 11 11  |................|
00000060  11 11 11 11 11 11 11 11  20 20 20 20 20 20 20 20  |........        |
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 22 4d 65  |             "Me|
00000080  65 74 20 74 68 65 20 44  72 61 63 75 6c 61 73 22  |et the Draculas"|
00000090  11 11 11 11 11 20 20 20  20 20 20 20 20 20 20 20  |.....

hexdump -C DRAC2.SCR | head 
00000000  34 2e 30 30 0a 01 00 2b  00 36 00 00 00 00 00 00  |4.00...+.6......|
00000010  35 00 00 00 00 00 00 00  00 00 00 00 00 00 00 0a  |5...............|
00000020  00 42 4f 42 0a 01 54 45  44 0a 02 43 41 52 4f 4c  |.BOB..TED..CAROL|
00000030  0a 03 41 4c 49 43 45 0a  04 49 47 4f 52 0a 05 44  |..ALICE..IGOR..D|
00000040  45 4e 4e 49 53 0a 06 4d  55 46 46 49 4e 0a ff 53  |ENNIS..MUFFIN..S|
00000050  43 52 45 45 4e 0a 2a 42  01 00 19 3c 01 00 1e 37  |CREEN.*B...<...7|
00000060  01 00 1c 2f 01 00 14 25  01 00 18 24 01 00 39 4c  |.../...%...$..9L|
00000070  01 00 31 42 01 00 35 41  01 00 0a 46 01 00 0a 46  |..1B..5A...F...F|
00000080  01 00 3d 4b 01 00 0a 18  01 00 0a 46 01 00 02 01  |..=K.......F....|
00000090  01 35 0a 03 00 45 58 54  20 54 45 44 20 44 52 41  |.5...EXT TED DRA|

The first thing to notice is they all start with the version number of the software which wrote the file. Really nice to have, but a terrible magic header. The files also all begin (after the version number) and end with the Hex value “0A”. Which happens to be a line feed control character. So super common, but could be helpful. Another pattern is that on the 9th byte it is “31” on most of the samples and “36” on one of them. “31” is the start of the ASCII number sequence, so could be the sequence number for the script as each SCR file could only store what was in memory.

I fear the rest of the format will have the same issue most word processors had at the time which is not having a header, but lots of formatting codes which may or may not be in every file, making programatic identification difficult. Might take awhile to identify all the formatting codes, but could lead to better identification and possibly an import module for tools like LibeOffice or Final Draft.

Screenshot of Movie Master 4.04 start screen

I didn’t find much different with Movie Master 4, seemed to have the same restrictions to 16 files in a script. The files from version 4 also seem to follow the same patterns from version 3. But both versions are different from the the Windows version of Movie Master, version 5. Click here for Movie Master 5 help menu on “Introduction for Movie Master DOS Users“.

There was another elusive script writing software title which adds to the confusion. Scriptware was another screenwriting software tool which seems to have had a large following. They produced a Windows and Macintosh version. It also started out for DOS and also used the SCR extension. The website is still active for the software, but hasn’t updated in 24 years. I wrote a little about in my post on PROmotion. All the demo versions out there are not useable demos, but animation demos. In this nice batch of old software on the Internet Archive I was able to find an early copy. Wasn’t able to get it to run, but the folder did have some samples.

hexdump -C SAMPLE1.SCR | head
00000000  32 5f 01 00 00 00 00 00  00 00 00 39 01 4a 5f 00  |2_.........9.J_.|
00000010  ff ff 2c 01 00 00 00 00  00 00 95 80 01 00 11 53  |..,............S|
00000020  63 72 69 70 74 77 61 72  65 20 53 63 72 69 70 74  |criptware Script|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000b0  00 00 00 00 00 00 00 00  11 00 02 00 02 00 14 00  |................|
000000c0  12 00 6f 02 f9 04 0b 00  7b 04 01 00 05 00 00 02  |..o.....{.......|
000000d0  00 11 00 00 00 0c 00 00  00 06 00 ed 01 05 06 00  |................|
000000e0  00 00 00 00 08 00 0b 00  00 00 04 00 00 00 04 00  |................|
000000f0  82 00 01 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C SAMPLE2.SCR | head
00000000  0b 53 63 72 69 70 74 77  61 72 65 1a 95 80 04 80  |.Scriptware.....|
00000010  1e 53 63 72 69 70 74 77  61 72 65 20 53 63 72 69  |.Scriptware Scri|
00000020  70 74 20 32 2e 32 33 3a  34 3b 37 30 32 32 31 00  |pt 2.23:4;70221.|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000a0  00 00 00 00 00 00 00 00  00 00 11 00 02 00 02 00  |................|
000000b0  11 00 00 00 34 02 01 05  0b 00 89 04 01 00 05 00  |....4...........|
000000c0  00 02 00 11 00 00 00 0b  00 00 00 06 00 aa 01 05  |................|
000000d0  06 00 00 00 00 00 08 00  0c 00 00 00 05 00 00 00  |................|
000000e0  05 00 8a 00 01 01 00 00  00 00 00 00 00 00 00 00  |................|

Luckily, they make it quite easy to identify these SCR files. ScriptWare was very popular and continued on with Windows and Macintosh versions. Later on, the format was changed along with the extension, which changed to SW3.

The SCR extension has been used often. On my desktop they default as a Paintbrush document. Apparently SCR is sometimes used as an extension for the ZSoft Paintbrush (PCX) format. It is also used on older postscript fonts on the Macintosh as a Type 1 screen font. Can also be a screensaver on Windows, but watch out, they can hide malicious code. You get the idea, SCR is a very common extension, identifying it up front can help avoid problems later!

Moral of the story is to never give up searching for old software and even though illegal copying of software should be avoided, I am grateful to those who help save abandoned software. Without them many titles would be lost.

I don’t have a good signature for these formats yet, but you can find a few samples on my GitHub page.