Page Perfect

PagePerfect: the Promise of Desktop Publishing Realized

Now, PagePerfect has arrived. And suddenly PC desktop publishing is a lot
simpler and less expensive, because PagePerfect integrates desktop
publishing, word processing, and graphics editing all in one package.

The 1980s were a time of growth in personal computing, and desktop publishing was one industry progressing rapidly. Previously, in order to print more than just words, you had to use a complex arrangement of type, masking, and screening, all done by hand. Now, with a personal computer, you could design and print well-designed layouts. Many software applications came on the scene in these early days. My personal favorite was QuarkXPress. I first used it in the early 1990s and spent the next few years working with it in a commercial print shop. What once took a team of skilled workers to set copy, mask, blueline, etc. took only one person with the right software.

I recently came across a set of floppy disks for some software called PagePerfect, by IMSI, a well-known software company.

This article in a 1988 PC Magazine announces this new revolutionary software. This was early on in the days of computer desktop publishing, and even on a DOS system the software was powerful. It didn’t always get the best reviews in terms of ease of use, but it was well built. The company behind this powerful software wasn’t IMSI, as you might expect. It was programmed by a different company, Beyond Words, started by three former employees of MicroPro, the makers of WordStar. Beyond Words liked to “leave sales to others”, which included IMSI and a big contract with Canon, which called it their Desktop Publishing System.

IMSI marketed the software well, and it was well priced. The name PagePerfect didn’t last long, and in 1989 they renamed the software IMSI Publisher. I’m not 100% sure, but it might have to do with WordPerfect asserting some claim to the name around that same time. By 1990 the software was not seen much anymore, but another name pops up: Beyond Words Composer 2.0.

All three versions of the software have a very similar interface.

But the one thing they all have in common is their file formats. Unfortunately, they used the same extensions that many word processors used during this time and after: .DOC, and also .STY, which was used frequently by Microsoft Word as well. It makes sense: a Document is shortened to DOC and a Stylesheet is shortened to STY. So if you have any DOC files which don’t open in Word, you might look here. The other problem is that the file format is not plain text but a proprietary binary format.

hexdump -C TEST.DOC | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 3c af 13 5b 1e 00 00 00 95 63 |......<..[.....c|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 68 01 00 00 0a 00 de 01 00 00 00 00 00 00 00 00 |h...............|
00000040 de 01 00 00 8b 60 00 00 1e 00 69 62 00 00 2c 01 |.....`....ib..,.|
00000050 00 00 1e 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..............[B|
00000060 57 44 4f 43 5d 00 00 00 00 32 2e 30 39 00 00 00 |WDOC]....2.09...|
00000070 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6c 00 00 00 00 00 00 00 00 00 00 00 |....l...........|

The one positive is the very obvious strings of text in the header, [BWDB] and [BWDOC], which one could infer stand for Beyond Words DB and Beyond Words Document. A later Beyond Words Composer document has the same header but a higher version number.

hexdump -C WELCOME.DOC | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 aa 14 56 16 29 00 00 00 30 84 |........V.)...0.|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 b0 01 00 00 0c 00 26 02 00 00 00 00 00 00 00 00 |......&.........|
00000040 26 02 00 00 70 80 00 00 29 00 96 82 00 00 9a 01 |&...p...).......|
00000050 00 00 29 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..)...........[B|
00000060 57 44 4f 43 5d 00 00 00 00 33 2e 30 31 00 00 00 |WDOC]....3.01...|
00000070 00 00 00 00 0c 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6e 00 00 00 00 00 00 00 00 00 00 00 |....n...........|

If we look at the Stylesheets we see the same patterns.

hexdump -C SAMPLE.STY | head   
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 51 10 76 10 09 00 00 00 da 2c |......Q.v......,|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 68 01 00 00 0a 00 de 01 00 00 00 00 00 00 00 00 |h...............|
00000040 de 01 00 00 a2 2a 00 00 09 00 80 2c 00 00 5a 00 |.....*.....,..Z.|
00000050 00 00 09 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..............[B|
00000060 57 44 4f 43 5d 00 00 00 00 32 2e 30 39 00 00 00 |WDOC]....2.09...|
00000070 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6c 00 00 00 00 00 00 00 00 00 00 00 |....l...........|

I haven’t been able to find any specific bytes which differentiate the Stylesheets from the Documents. They may well be the same format, so for now we will treat them as one. These stylesheets seem to function as templates, and templates often share the format of the documents they produce.

Apart from the document layout, the software can also create and use databases, which appear to be a similar format but with different offsets.

hexdump -C DOCUMENT.TBL | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 6b 10 36 00 00 00 18 00 00 00 |......k.6.......|
00000020 01 00 4e 00 00 00 68 01 00 00 0a 00 b6 01 00 00 |..N...h.........|
00000030 00 00 00 00 00 00 5b 42 57 44 4f 43 5d 00 00 00 |......[BWDOC]...|
00000040 00 32 2e 30 39 00 00 00 00 00 00 00 0a 00 00 00 |.2.09...........|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 6c 00 00 00 |............l...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
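The marker strings visible in these dumps are easy to pull out with a few lines of code. Here is a minimal sketch in Python, assuming only what the samples show: [BWDB] sits at offset 0, [BWDOC] appears somewhere after it (at 0x5E in the DOC and STY samples, 0x36 in the TBL sample), and each is followed by NUL padding and an ASCII version string. The function name and the loose padding match are my own choices, not anything documented.

```python
import re

# Patterns for the two marker strings seen in the hexdumps. The NUL padding
# between marker and version is matched loosely because only a handful of
# samples were examined.
BWDB_RE = re.compile(rb"\[BWDB\]\x00+(\d+\.\d+)")
BWDOC_RE = re.compile(rb"\[BWDOC\]\x00+(\d+\.\d+)")


def sniff_beyond_words(data: bytes):
    """Return marker details for a Beyond Words style file, or None."""
    db = BWDB_RE.match(data)  # [BWDB] must sit at offset 0
    if db is None:
        return None
    doc = BWDOC_RE.search(data, db.end())  # [BWDOC] offset varies by file type
    return {
        "db_version": db.group(1).decode("ascii"),
        "doc_version": doc.group(1).decode("ascii") if doc else None,
        "doc_offset": doc.start() if doc else None,
    }


# First 0x70 bytes of TEST.DOC, copied from the hexdump earlier in this post.
TEST_DOC_HEAD = bytes.fromhex(
    "5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 "
    "00 00 00 00 00 00 3c af 13 5b 1e 00 00 00 95 63 "
    "00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 "
    "68 01 00 00 0a 00 de 01 00 00 00 00 00 00 00 00 "
    "de 01 00 00 8b 60 00 00 1e 00 69 62 00 00 2c 01 "
    "00 00 1e 00 00 00 00 00 00 00 00 00 00 00 5b 42 "
    "57 44 4f 43 5d 00 00 00 00 32 2e 30 39 00 00 00"
)

if __name__ == "__main__":
    print(sniff_beyond_words(TEST_DOC_HEAD))
```

Run against a folder of mystery DOC files, something like this would separate Beyond Words documents from Word files and report which [BWDOC] version each one carries.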

Prior to me diving into this format, the only tool which had some information on it was TrID, which identified all the DOC and STY files as Beyond Words Composer style, which is mostly true. Hopefully with this background you can be aware of the different software names this format was used with and, with some luck, convert the files to something less proprietary.

Some disks that came with my PagePerfect install disks do have some personal documents created with the software, but I wonder how much this software really was used in the late 1980s and early 1990s, because after that point you don’t hear about it anymore. There are some references to the software getting absorbed into another product, IBM DisplayWrite 5/2. I would be curious if others have come across this file format.

More Student Writing Center

Most of what you will find on this blog is file format identification. I see this as the first step in a longer process of preservation and ultimately access. Hopefully the analysis of some file formats can help make better decisions when needing to render the file in an emulator or migrate to another format. I don’t spend much time trying to parse the files I look at to understand the actual content, just enough to properly identify and differentiate between important versions of the format.

One area I sometimes touch on, but often skim over, is encryption. Many file formats are binary, meaning they use a sequence of bytes to encode data, which is more efficient than human-readable text and is often compressed. The byte layout is designed by the developer of the software, who can encode the data however they choose, so it is often proprietary and unreadable by anyone else. A file can also be further encrypted with a password to limit use, even with the right software.

I recently had one of the numerous fans of this blog reach out and ask about the post I made on the software Student Writing Center. They had a bunch of journal files from their youth and couldn’t find a way to read these older files. I offered my help as I still have the software and a nice emulator to run the old software.

As I was going through and converting the journal entries into PDFs, I came across a few which asked for a password to open. You can see below that the explanation from the help menu confirms the file format is a proprietary format only readable by their software, and the password feature is there to further protect the content.

Finding a few of the journal documents password protected was frustrating at first. I was converting some documents that are over 26 years old, so I doubted the password would be remembered. When I asked, they gave me a couple passwords to try, but nothing worked. But I don’t give up that easily!

My first thought was to take all the text from the other journal entries, make a dictionary, and then use it to try to brute-force the password. There are some great tools to do this, like hashcat. With tools like this, you need to retrieve a hash of the password, which is a scrambled form of the password stored in the file. So the first step was to find where the password was stored in the file. Since I have the software and can make new password-protected files using a password of my choice, this proved a simple task: create two identical files, add a different password to each, then compare the two files in a hex editor to find the difference.

There it is. The password field in the software only let me put in 10 characters and these 10 bytes lit up when I ran a difference between the two files. I went to check the files given to me which also had password protection and found they also had a similar pattern. In fact I noticed from a few checks that the passwords I used also had a pattern in the file.
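If you don’t have a hex editor with a compare feature handy, the same comparison can be done with a short script. This is a rough sketch in Python. The function name and the sample bytes are made up for illustration and are not taken from real Student Writing Center files.

```python
def differing_runs(a: bytes, b: bytes):
    """Return (start_offset, length) for each run of bytes that differ.

    Only the overlapping part of the two inputs is compared.
    """
    runs = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # a run that reaches the end of the data
        runs.append((start, common - start))
    return runs


if __name__ == "__main__":
    # Two synthetic "files" that differ only in a 10-byte field at offset 16.
    file_a = bytes(16) + b"\x85" * 10 + bytes(4)
    file_b = bytes(16) + b"\x90" * 10 + bytes(4)
    for start, length in differing_runs(file_a, file_b):
        print(f"offset 0x{start:X}, {length} bytes differ")
```

One thing to watch for: if the two test passwords share a character in the same position, that byte will not show up as a difference, so it helps to pick passwords with nothing in common.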

For this file I used the number “1” ten times. In that same location the file repeated the same byte value, “85”, ten times. After a couple more tests I could see this wasn’t an algorithm I needed to crack, but a simple replacement. I created a few more files using all the letters in the alphabet and all the numbers and came up with a substitution cipher.

Obviously the passwords used in the documents I was trying to open didn’t all use the full 10 characters, but the password was always preceded by the values “00” and had the values “1A46461A” after the password. The byte prior to the “00” indicates the length of the password. From there I just needed to decode the bytes between those two offsets.

So for this file, the 8-byte sequence “90D54F4FA3FBBA94” decodes to: password. How cool is that? To make things even easier, the passwords used in Student Writing Center are not case sensitive. There are additional values for symbols. You can see the entire substitution list here.
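Putting the pieces together, a decoder only needs to find the password bytes and look each one up. The sketch below is in Python and makes a few assumptions. The table holds only the eight byte values that appear in this post (“1” and the letters of “password”), so anything else prints as “?” until the full substitution list is filled in. The layout of a length byte, a “00”, the password, then “1A46461A” is taken from the description above, and the function names are mine.

```python
# Partial table built only from the two examples in this post:
# "1" -> 0x85 and "password" -> 90 D5 4F 4F A3 FB BA 94.
KNOWN = {
    0x85: "1", 0x90: "p", 0xD5: "a", 0x4F: "s",
    0xA3: "w", 0xFB: "o", 0xBA: "r", 0x94: "d",
}
TERMINATOR = bytes.fromhex("1A46461A")
MAX_LEN = 10  # the password field only accepts 10 characters


def find_password_bytes(data: bytes):
    """Locate [length][00][password][1A 46 46 1A] and return the password bytes."""
    end = data.find(TERMINATOR)
    while end != -1:
        for n in range(1, MAX_LEN + 1):
            start = end - n
            # The byte before the password is 00, and the one before that
            # holds the password length.
            if start >= 2 and data[start - 1] == 0x00 and data[start - 2] == n:
                return data[start:end]
        end = data.find(TERMINATOR, end + 1)
    return None


def decode_password(raw: bytes) -> str:
    """Apply the substitution table; unknown bytes become '?'."""
    return "".join(KNOWN.get(value, "?") for value in raw)


if __name__ == "__main__":
    # Synthetic surroundings; only the middle section follows the real layout.
    blob = (b"\xff" * 8 + bytes([8, 0])
            + bytes.fromhex("90D54F4FA3FBBA94") + TERMINATOR + b"\xff" * 8)
    print(decode_password(find_password_bytes(blob)))
```

Since the passwords are not case sensitive, the table only needs one entry per letter.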

One other thing related to identification: would it be important to identify a password-protected file differently than a regular file? At offset 0xDA there seems to be an indicator that the file is password protected: “00” if not, “01” if protected.
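A tool gathering extra properties could read that one byte. Here is a tiny sketch in Python, assuming the 0xDA offset holds for journal files in general. I have only checked it against my own samples, and the function name is made up.

```python
PASSWORD_FLAG_OFFSET = 0xDA  # observed in my samples, not from any spec


def is_password_protected(data: bytes):
    """Return True/False from the flag byte, or None if it is missing or unexpected."""
    if len(data) <= PASSWORD_FLAG_OFFSET:
        return None
    flag = data[PASSWORD_FLAG_OFFSET]
    if flag == 0x01:
        return True
    if flag == 0x00:
        return False
    return None  # a value other than 00 or 01 needs a closer look


if __name__ == "__main__":
    sample = bytearray(0x100)          # synthetic stand-in for a journal file
    sample[PASSWORD_FLAG_OFFSET] = 0x01
    print(is_password_protected(bytes(sample)))
```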

What do you think? Should this property be identified as a separate file format from a regular file or is this property something that should be gathered using additional tools that can gather additional properties from a file like this?

Speaking of additional tools, there is a pretty cool project called the Import library for legacy Mac documents, or libmwaw, which claims to have support for Student Writing Center documents and a lot more. It indeed does, but not the journal format, only the main letter format. I bet it wouldn’t take much to add the journal format to the library, something I will look into.

Microstation

I recently was able to image a few Bernoulli disks for a collection using a SCSI device I have found quite useful. The disks had been sitting around for quite some time waiting for the right tools and resources to extract the contents. I mentioned the accomplishment to a few coworkers, and one asked me if I would extract the contents from an old disk they used for school back in the 1990s. They had spent a whopping $99 at the local bookstore for a disk which held a total of 150MB. Not GBs like we are used to now, but megabytes. I have some cameras which take RAW photos larger than would fit on one disk. Once I had the data extracted from their disk, I took a look at the contents. There were a few file formats on the disk I was unfamiliar with. A quick scan with DROID revealed some matches and a few problems.

Turns out the data were files written by an old version of Bentley Microstation. The files dated from late 1995 and the disk was formatted as FAT16, which leans more toward use on a DOS system but could have been used with the newly released Windows 95. The Bentley Microstation 95 software wasn’t released until November of 1995, so my guess is these Microstation files were created with Microstation version 5 for DOS.

disktype HD6_imaged-004.hda 

Regular file, size 144.0 MiB (150998016 bytes)
No type and creator code
DOS/MBR partition map
Partition 4: 144.0 MiB (150978560 bytes, 294880 sectors from 32, bootable)
Type 0x06 (FAT16)
FAT16 file system (hints score 5 of 5)
Volume size 143.8 MiB (150810624 bytes, 36819 clusters of 4 KiB)
Volume name "ode 009 - I"

PRONOM has a few entries for the Microstation software:

PUID         Format Name                              Version   Extension
x-fmt/346    Microstation CAD Drawing                 95        DGN
fmt/502      Bentley V8 DGN                                     DGN
fmt/1626     MicroStation Symbology Resource File               RSC
fmt/1549     Bentley Microstation Hidden Line File              HLN
fmt/1358     MicroStation Base File                             BSE
fmt/1183     MicroStation Material Palette                      PAL
fmt/1177     MicroStation Material Library                      MAT

The files found on this old Bernoulli disk gave varied results in identification. Most of the DGN files give me these multiple identifications in DROID.

A little digging and we can learn a bit about the major formats. Intergraph and Bentley used a binary version of their drawing format, DGN, from versions 2 until 7, spanning 1987 to 2001. With the release of version 8, they made a major change to the format. Version 8 uses the Microsoft OLE2 container to enhance the format, allowing it to hold multiple drawings and more information about the model. With this change, the format became proprietary. Sure, they started an OpenDGN program to make the format more compatible with other systems, but you had to request access and sign an NDA in order to get a copy of the format specifications, which doesn’t sound “open” to me. You can read another file format researcher’s thoughts on this on her blog.

So I know many of these files are not Version 8 of the DGN format as they are not OLE2 containers, but the other issue is that x-fmt/346 for the Microstation CAD drawing 95 is an outline record. It has no signature. So DROID is guessing based on extension only. We need to dig deeper.

I noticed that many of the DGN files in my sample set also identified as a “Microstation Hidden Line File”, but instead of an HLN extension, they use DGN.

sf samp15.dgn 

filename : 'samp15.dgn'
filesize : 359424
modified : 1998-09-01T12:31:52-06:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/1549'
format : 'Bentley Microstation Hidden Line File'
version :
mime :
class : 'Model'
basis : 'byte match at [[0 3] [359422 2]]'
warning : 'extension mismatch'

hexdump -C samp15.dgn | head
00000000 08 09 fe 02 01 08 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 20 00 c8 45 |............ ..E|
00000020 00 00 00 00 00 00 00 00 40 06 0c 00 01 05 dc a0 |........@.......|
00000030 ff ff ff ff ff ff ff ff b5 8b 9f 63 b9 88 85 a7 |...........c....|
00000040 00 00 00 00 19 00 b4 86 13 00 fe be 00 00 00 00 |................|
00000050 80 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|
00000060 00 00 00 00 00 00 00 00 80 40 00 00 00 00 00 00 |.........@......|

hexdump -C samp7.dgn | head
00000000 c8 09 fe 02 01 08 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 04 7a 45 |..............zE|
00000020 00 00 00 00 00 00 00 00 e8 03 0a 00 01 05 fc b0 |................|
00000030 ff ff ff ff ff ff ff ff 0d 00 9d b5 0c 00 74 93 |..............t.|
00000040 ff ff a6 fd 09 00 40 11 05 00 50 aa 00 00 e5 f8 |......@...P.....|
00000050 80 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

Looking at a couple of files in the same sample set, some use the header “08 09 fe 02 01 08 00 00” while others use “c8 09 fe 02 01 08 00 00”. This is why samp15.dgn identifies as an HLN file, as the signature matches, while samp7.dgn uses “C8” instead of “08”, so it does not identify as an HLN file. What is the difference and what is an HLN file?

First let’s define an HLN file. The name of the format is “Hidden Line File”, although most references refer to it as a “Visible Edges File”. Confusing, but the definition is: “a 2D or 3D DGN file that contains the edges visible in a 3D view (that is, with those edges that would be hidden, removed).”

Looking at a couple HLN files, we can see the format is the same as DGN files:

hexdump -C test-2d.hln | head
00000000 08 09 fe 02 08 01 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 20 00 7a 45 |............ .zE|
00000020 00 00 00 00 00 00 00 00 e8 03 0a 00 00 05 fc b2 |................|
00000030 ff ff ff ff ff ff ff ff ff ff 5b f5 ff ff fe f9 |..........[.....|
00000040 00 00 00 00 01 00 d3 cb 01 00 36 2a 00 00 e8 03 |..........6*....|
00000050 80 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|
00000060 00 00 00 00 00 00 00 00 80 40 00 00 00 00 00 00 |.........@......|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C test-3d.hln | head
00000000 c8 09 fe 02 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 20 00 7a 45 |............ .zE|
00000020 00 00 00 00 00 00 00 00 e8 03 0a 00 00 05 fc b2 |................|
00000030 ff ff ff ff ff ff ff ff ff ff 5b f5 ff ff fe f9 |..........[.....|
00000040 ff ff 0c fe 01 00 d3 cb 01 00 36 2a 00 00 e8 03 |..........6*....|
00000050 80 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 80 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|

The same difference appears here as between the two previous files. These two files also explain the difference between the “08” and the “c8” values: Microstation uses the first to indicate a 2D file and the latter to indicate a 3D file. The DGN format has been documented in libdgn and this distinction is referenced.
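That first-byte distinction is simple enough to check in code. Below is a small sketch in Python, assuming the pre-version-8 header always continues with “09 fe 02” after the first byte, as it does in all four of my samples. The function name and the lookup table are mine.

```python
from typing import Optional

# First byte of the header, as seen in the samples in this post.
DIMENSION_BY_FIRST_BYTE = {0x08: "2D", 0xC8: "3D"}


def dgn_dimension(header: bytes) -> Optional[str]:
    """Return '2D' or '3D' for a pre-V8 DGN/HLN header, else None."""
    if len(header) < 4 or header[1:4] != b"\x09\xfe\x02":
        return None
    return DIMENSION_BY_FIRST_BYTE.get(header[0])


# First eight bytes of each sample, copied from the hexdumps in this post.
SAMPLES = {
    "samp15.dgn": "08 09 fe 02 01 08 00 00",
    "samp7.dgn": "c8 09 fe 02 01 08 00 00",
    "test-2d.hln": "08 09 fe 02 08 01 00 00",
    "test-3d.hln": "c8 09 fe 02 00 00 00 00",
}

if __name__ == "__main__":
    for name, hexbytes in SAMPLES.items():
        print(name, dgn_dimension(bytes.fromhex(hexbytes)))
```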

This presents a problem with the current PRONOM identification.

filename : 'MS95-2D.dgn'
filesize : 12288
modified : 2025-06-05T21:13:52-06:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/1549'
format : 'Bentley Microstation Hidden Line File'
version :
mime :
class : 'Model'
basis : 'byte match at [[0 3] [12286 2]]'
warning : 'extension mismatch'

filename : 'MS95-3D.dgn'
filesize : 12800
modified : 2025-06-05T21:14:00-06:00
errors :
matches :
- ns : 'pronom'
id : 'x-fmt/346'
format : 'Microstation CAD Drawing'
version : '95'
mime :
class :
basis : 'extension match dgn'
warning : 'match on extension only'

The 2D files misidentify as Hidden Line Files and the 3D files are identified through extension only. We learned from a previous test that Hidden Line Files can be both 2D and 3D and are the same format as DGN, so a separate identification PUID is unnecessary. The x-fmt/346 identification, however, doesn’t have a signature, so a few things need to change.

The other issue is a Hidden Line File is also available in version 8+.

filename : 'Microstationv8-s01.hln'
filesize : 7168
modified : 2025-06-05T19:48:09-06:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/502'
format : 'Bentley V8 DGN'
version :
mime :
class : 'Image (Vector)'
basis : 'container name Dgn~H with name only'
warning : 'extension mismatch'

They also identify as Bentley V8 DGN files, but with an extension mismatch. This should be easy to remedy with the addition of the extension HLN to the signature. The container signature seems to work well, no need to change anything.

My suggestions to fix these issues would be:

  • Deprecate x-fmt/346
  • Change name of fmt/1549 from “Bentley Microstation Hidden Line File” to “Microstation CAD Drawing” and use the version 2-7 to distinguish from v8
  • Change the signature for fmt/1549 from “0809FE” to “(08|C8)09FE02”, with no EOF sequence of “FFFF”
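To see what the signature change would do, here is a rough comparison in Python. It is only an approximation of how a PRONOM signature behaves: the “current” check assumes the existing fmt/1549 signature is the three bytes 0809FE at the start plus FFFF at the end, which is how I read the Siegfried basis lines, and the test files are synthetic stand-ins with made-up filler bytes.

```python
import re

# Proposed beginning-of-file pattern: (08|C8)09FE02, with no end-of-file part.
PROPOSED_BOF = re.compile(rb"[\x08\xc8]\x09\xfe\x02")


def matches_current(data: bytes) -> bool:
    """Approximation of the existing signature: BOF 0809FE and EOF FFFF."""
    return data.startswith(b"\x08\x09\xfe") and data.endswith(b"\xff\xff")


def matches_proposed(data: bytes) -> bool:
    """Proposed signature: either first byte, then 09 FE 02, anchored at offset 0."""
    return PROPOSED_BOF.match(data) is not None


if __name__ == "__main__":
    # Synthetic stand-ins: real header bytes, zero filler, FFFF at the end.
    file_2d = bytes.fromhex("0809fe02") + bytes(8) + b"\xff\xff"
    file_3d = bytes.fromhex("c809fe02") + bytes(8) + b"\xff\xff"
    for name, data in (("2D", file_2d), ("3D", file_3d)):
        print(name, "current:", matches_current(data),
              "proposed:", matches_proposed(data))
```

With these stand-ins the current check only catches the 2D file, while the proposed pattern catches both.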

The other option would be to make fmt/1549 the 2D drawing format and x-fmt/346 could be used for the 3D drawing format. What do you think?

I have uploaded a few samples to my GitHub page. Curious if your examples of DGN files match what I am seeing. There are a few other related formats that will need to be explored, but this should help for now.

SCP

If you have been following previous posts about floppy disk flux captures, you may have read about the HFE or A2R flux image formats. Both are very useful in the preservation, archiving, and emulation of old software and games stored on decaying and copy-protected floppy disks. I also built a Fluxengine, which has come in handy more than once. It captures flux data in its own FLUX format. At work I also have access to a Kryoflux board which captures in separate RAW tracks.

Today we are looking at the SCP format. I recently purchased a Greaseweazle for personal use and the main format used while capturing raw flux data is SCP. It works a little better on my older MacBook Pro than the fluxengine and I wanted to have another option for capturing flux data. So far it has worked really well. Of course I wanted to know everything I could about the SCP format so the first thing I did was run Siegfried against a file.

filename : 'unknown.scp'
filesize : 47017278
modified : 2025-06-14T19:09:58-06:00
errors :
matches :
- ns : 'pronom'
id : 'UNKNOWN'
format :
version :
mime :
class :
basis :
warning : 'no match'
- ns : 'wikidata'
id : 'Q29000565'
format : 'SuperCard Pro dump'
URI : 'http://www.wikidata.org/entity/Q29000565'
permalink : 'https://www.wikidata.org/w/index.php?oldid=1866792367&title=Q29000565'
mime : 'application/octet-stream'
basis : 'extension match scp; byte match at 0, 3 (Wikidata reference is empty)'

Looks like Wikidata has a signature pattern, but PRONOM does not. Let’s take a look and see how difficult it might be.

hexdump -C unknown.scp | head
00000000 53 43 50 00 80 03 00 a3 23 00 00 00 d2 0f 26 99 |SCP.....#.....&.|
00000010 b0 02 00 00 14 43 04 00 c6 96 08 00 64 78 0d 00 |.....C......dx..|
00000020 ea bb 12 00 de 37 16 00 a2 b3 19 00 26 68 1e 00 |.....7......&h..|
00000030 42 b7 23 00 2a 33 27 00 c8 ae 2a 00 a8 54 2f 00 |B.#.*3'...*..T/.|
00000040 fc 94 34 00 e2 10 38 00 a8 8c 3b 00 98 68 40 00 |..4...8...;..h@.|
00000050 1c b6 45 00 14 32 49 00 cc ad 4c 00 9e 9b 51 00 |..E..2I...L...Q.|
00000060 0e d3 56 00 de 4e 5a 00 74 ca 5d 00 be 7b 62 00 |..V..NZ.t.]..{b.|
00000070 b4 b3 67 00 a8 2f 6b 00 68 ab 6e 00 50 88 73 00 |..g../k.h.n.P.s.|
00000080 0c ce 78 00 02 4a 7c 00 ae c5 7f 00 96 bd 84 00 |..x..J|.........|
00000090 8a 2d 8a 00 8a a9 8d 00 56 25 91 00 b6 a3 95 00 |.-......V%......|

Well, probably not hard at all. I love easy, well-understood headers. But a signature of only three bytes can have issues, so let’s look a little closer at the published specification. Before we dive into the spec, it might be good to note a few things. The SCP image format was developed for another hobby board. The SuperCard Pro is a custom board that connects a floppy drive through USB to software which can capture flux data and help interpret the data into an image format which can be written back to a floppy or used in an emulator. The software is Windows only, so those on Linux or macOS can’t use it, but since the specification was made public, many other boards and tools can read and write the format. Even though it is open, I worry about preserving the spec. When you try to ensure it is saved in the Wayback Machine you get this fun page.

This sorry page is usually found when the owner of a URL has asked specifically for their domain to be excluded from the web archive. This worries me, as I have found many specifications have been lost to time. I would love to know why the owner has chosen to do this, but it is available now, so let’s dive in. The version history appears to start in 2014, but the page is copyright 2012, so I assume the format was created around that time. It was last updated in February of 2024, so it is pretty up-to-date. One important update was made in 2021:

v2.3 - 06/03/21

* Added additional FLAG bit (bit 7) to identify a 3rd party flux creator. PLEASE
SET THIS BIT IF YOU ARE A 3RD PARTY DEVELOPER USING THE SCP FORMAT!

This update to version 2.3 added a bit to indicate a 3rd party flux creator. This means a board like the Greaseweazle can mark its images as coming from its own software instead of from a SuperCard Pro.

The header of an SCP file is made up of a few bytes, not just the ASCII “SCP”.

All offsets are the start of the file (byte 0) unless otherwise stated.  The .scp image
consists of a disk definition header, the track data header offset table, and the flux
data for each track (preceeded by Track Data Header). The image file format is described
below:

BYTES 0x00-0x02 contains the ASCII of "SCP" as the first 3 bytes. If this is not found,
then the file is not ours.

At byte 0x03 we would see the version of the software which created the SCP. My sample, created by my Greaseweazle, did not add a number here, only “00”. Byte 0x04 is the disk type, and there are some set definitions in the spec for this byte. My test sample uses “80”, but I am not sure what that represents. Bytes 5-7 are used for other disk information, but byte 8 is where we find the flags, which include a bit for flux creator. My sample has the value “23”, but since we are looking at the individual bit level, the value is a combination of all the bits in the flag area. The individual bits are “00100011”. Counting from bit 0 on the right, bits 0, 1, and 5 are set and bit 7, the leftmost, is clear. So going by this flag alone, my sample does not announce a 3rd party creator, even though it was made by a Greaseweazle.
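Picking the flag byte apart by hand is error prone, so here is a short sketch in Python that does it for the first nine bytes of the header. It assumes the spec numbers its bits with bit 0 as the least significant, so the flux creator flag (bit 7) is the value 0x80. The field names are my own labels, and bytes 5-7 are kept as raw hex since I have not worked through them here. Read that way, the sample’s “23” has bits 0, 1, and 5 set and bit 7 clear.

```python
import struct

# magic (3 bytes), version, disk type, bytes 5-7 kept raw, flags
HEADER = struct.Struct("<3sBB3sB")
FLUX_CREATOR_BIT = 0x80  # bit 7, added in v2.3 of the spec


def parse_scp_header(data: bytes) -> dict:
    """Decode the first nine bytes of an SCP image."""
    if len(data) < HEADER.size:
        raise ValueError("too short to be an SCP header")
    magic, version, disk_type, other, flags = HEADER.unpack_from(data)
    if magic != b"SCP":
        raise ValueError("missing SCP magic")
    return {
        "version": version,
        "disk_type": disk_type,
        "bytes_5_to_7": other.hex(),
        "flags": flags,
        "set_bits": [bit for bit in range(8) if flags & (1 << bit)],
        "third_party": bool(flags & FLUX_CREATOR_BIT),
    }


# First 16 bytes of unknown.scp, copied from the hexdump in this post.
SAMPLE = bytes.fromhex("53 43 50 00 80 03 00 a3 23 00 00 00 d2 0f 26 99")

if __name__ == "__main__":
    for key, value in parse_scp_header(SAMPLE).items():
        print(f"{key}: {value}")
```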

So the only reliable static data in the header will be those first 3 bytes. There are some bytes later in the file which should be static: the start of the tracks, which include a Track Data Header. We can see from the spec that the last byte in the main header is 0x2AF, which makes the main header 688 bytes long (offsets 0x000 through 0x2AF). Starting at offset 0x2B0, decimal 688, is the ASCII string TRK. Adding these 3 bytes should make for a nice signature.

000002b0 54 52 4b 00 a9 86 65 00 5e b5 00 00 28 00 00 00 |TRK...e.^...(...|
000002c0 ab 86 65 00 60 b5 00 00 e4 6a 01 00 56 87 65 00 |..e.`....j..V.e.|
000002d0 60 b5 00 00 a4 d5 02 00 00 39 00 7e 00 7c 00 ce |`........9.~.|..|
000002e0 00 c7 00 c7 00 cd 00 7e 00 7c 00 eb 00 4f 00 60 |.......~.|...O.`|
000002f0 00 39 00 77 00 cd 00 7c 00 7f 00 ce 00 c7 00 c6 |.9.w...|........|
00000300 00 ce 00 7a 00 80 00 cd 00 c8 00 c6 00 ce 00 7b |...z...........{|

We could use the TRK string for identification, but looking further into the spec, we can also see the SCP format may contain a footer.

; ------------------------------------------------------------------
; EXTENSION FOOTER FORMAT
; ------------------------------------------------------------------
;
; 0000 DRIVE MANUFACTURER STRING OFFSET - 4 bytes
; 0004 DRIVE MODEL STRING OFFSET - 4 bytes
; 0008 DRIVE SERIAL NUMBER STRING OFFSET - 4 bytes
; 000C CREATOR STRING OFFSET - 4 bytes
; 0010 APPLICATION NAME STRING OFFSET - 4 bytes
; 0014 COMMENTS STRING OFFSET - 4 bytes
; 0018 IMAGE CREATION TIMESTAMP - 8 bytes
; 0020 IMAGE MODIFICATION TIMESTAMP - 8 bytes
; 0028 APPLICATION VERSION (nibbles major/minor) - 1 byte
; 0029 SCP HARDWARE VERSION (nibbles major/minor) - 1 byte
; 002A SCP FIRMWARE VERSION (nibbles major/minor) - 1 byte
; 002B IMAGE FORMAT REVISION (nibbles major/minor) - 1 byte
; 002C 'FPCS' (ASCII CHARS) - 4 bytes

Here is the tail of my sample file. You can see it contains the ASCII characters listed here as the last four bytes. It also contains an application string, indicating the Greaseweazle software used to create the file. All very helpful information. We can also see in the 5th to last byte the value “24”, which indicates the file format version being used: version 2.4 in this file, though we know 2.5 is the latest. I wonder if it would be valuable to have separate identification for version 1 and 2 of the format? We could also consider treating versions 2.3 and 2.4 as unique, as they will have the additional 3rd party information.

hexdump -C unknown.scp | tail
02cd6cb0 00 85 00 5a 00 39 00 90 00 75 00 8e 00 42 00 3c |...Z.9...u...B.<|
02cd6cc0 00 78 00 2e 00 42 00 3a 00 47 00 78 00 42 00 46 |.x...B.:.G.x.B.F|
02cd6cd0 00 33 00 52 00 29 00 3a 00 55 00 5d 00 5b 00 54 |.3.R.).:.U.].[.T|
02cd6ce0 00 35 00 e0 00 48 00 91 00 75 00 3a 00 36 00 33 |.5...H...u.:.6.3|
02cd6cf0 00 55 02 03 01 d3 00 33 00 58 11 00 47 72 65 61 |.U.....3.X..Grea|
02cd6d00 73 65 77 65 61 7a 6c 65 20 31 2e 32 32 00 00 00 |seweazle 1.22...|
02cd6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 6c |...............l|
02cd6d20 cd 02 00 00 00 00 66 1d 4e 68 00 00 00 00 66 1d |......f.Nh....f.|
02cd6d30 4e 68 00 00 00 00 00 00 00 24 46 50 43 53 |Nh.......$FPCS|
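The footer layout from the spec maps neatly onto a fixed 48-byte structure, so the tail above can be decoded with a short script. This is a sketch in Python under a few assumptions: all fields are little-endian, and the two timestamps are Unix seconds. The spec excerpt here does not say so, but the value in my sample lines up with the modified time Siegfried reported for the file. The dictionary keys are my own names.

```python
import struct
from datetime import datetime, timezone

# Six 4-byte string offsets, two 8-byte timestamps, four version bytes, 'FPCS'.
FOOTER = struct.Struct("<6I2q4B4s")  # 48 bytes in total


def nibbles(value: int) -> str:
    """Turn a major/minor nibble pair such as 0x24 into '2.4'."""
    return f"{value >> 4}.{value & 0x0F}"


def parse_scp_footer(data: bytes) -> dict:
    """Decode the extension footer from the last 48 bytes of an SCP image."""
    if len(data) < FOOTER.size:
        raise ValueError("too short to hold a footer")
    (mfr, model, serial, creator, app, comments,
     created, modified, app_ver, hw_ver, fw_ver, fmt_rev,
     magic) = FOOTER.unpack(data[-FOOTER.size:])
    if magic != b"FPCS":
        raise ValueError("no FPCS footer present")
    return {
        "application_name_offset": app,
        "created": datetime.fromtimestamp(created, tz=timezone.utc),
        "modified": datetime.fromtimestamp(modified, tz=timezone.utc),
        "application_version": nibbles(app_ver),
        "format_revision": nibbles(fmt_rev),
    }


# Last 48 bytes of unknown.scp, copied from the hexdump above.
SAMPLE_TAIL = bytes.fromhex(
    "00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 6c "
    "cd 02 00 00 00 00 66 1d 4e 68 00 00 00 00 66 1d "
    "4e 68 00 00 00 00 00 00 00 24 46 50 43 53"
)

if __name__ == "__main__":
    for key, value in parse_scp_footer(SAMPLE_TAIL).items():
        print(f"{key}: {value}")
```

The application name offset in the sample, 0x02CD6CFA, points at the two bytes “11 00” just before the “Greaseweazle 1.22” text in the dump. That looks like a length prefix, since the string is 17 characters long, but I have not confirmed it against the spec.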

So maybe we don’t need the TRK header in our signature, just the first 3 bytes and last 4 bytes. I believe this should allow for proper identification, while avoiding false positives.
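As a quick way to try the idea on a pile of images, the check below reads only the first 3 and last 4 bytes of a file. It is a sketch in Python, not the PRONOM signature itself, and the function name is made up. Since the spec says the footer “may” be present, files written without one would only hit the first half of the test.

```python
import io
import os


def scp_signature_hits(fh) -> dict:
    """Check for 'SCP' at the start and 'FPCS' at the end of a binary file object."""
    fh.seek(0)
    bof = fh.read(3) == b"SCP"
    size = fh.seek(0, os.SEEK_END)   # seek returns the new position, i.e. the size
    eof = False
    if size >= 7:                    # room for both markers without overlap
        fh.seek(size - 4)
        eof = fh.read(4) == b"FPCS"
    return {"bof": bof, "eof": eof}


if __name__ == "__main__":
    # Synthetic stand-ins; a real file would be opened with open(path, "rb").
    with_footer = io.BytesIO(b"SCP" + bytes(32) + b"FPCS")
    without_footer = io.BytesIO(b"SCP" + bytes(32))
    print(scp_signature_hits(with_footer))
    print(scp_signature_hits(without_footer))
```

Reading just the two ends keeps this fast even on captures as large as my 47MB sample.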

I have a proposal for a PRONOM signature and a sample file on my GitHub page. Other sample files can be found all over the interwebs, with many on archive.org.

DaVinci Resolve

A previous post was about LUTs, the little files needed to color grade your photos and videos. One of the best systems for color grading video in use by professionals today is DaVinci Resolve. The system originally was all hardware based, but in 2004, as computers became able to process higher quality video, da Vinci Systems released new digital systems.

Like most professional multimedia editing software, DaVinci Resolve uses projects to manage work. Projects are generally where all the settings are stored, but they don’t generally store the actual media used in the project. Project files are often XML with unique schemas, but others pack a little more into the project file.

hexdump -C project.drp | head
00000000 50 4b 03 04 14 00 08 00 08 00 f2 54 90 5a ef 18 |PK.........T.Z..|
00000010 b0 25 47 0c 00 00 db 1b 00 00 0b 00 00 00 70 72 |.%G...........pr|
00000020 6f 6a 65 63 74 2e 78 6d 6c 9d 58 d9 72 5b 37 12 |oject.xml.X.r[7.|
00000030 7d cf 57 68 f4 7e 4d ec 4b 8a 51 ca b1 92 89 aa |}.Wh.~M.K.Q.....|
00000040 2c db 65 29 79 9d 6a 00 0d 85 09 45 aa 48 4a 71 |,.e)y.j....E.HJq|
00000050 fe 7e 0e ee 42 51 94 9c 68 c6 29 85 17 0d a0 d1 |.~..BQ..h.).....|
00000060 e8 3e bd 61 fe fd 97 db e5 c9 03 6f b6 8b f5 ea |.>.a.......o....|
00000070 bb 53 f9 46 9c 9e f0 2a af cb 62 75 f3 dd e9 2f |.S.F...*..bu.../|
00000080 d7 3f 75 e1 f4 fb b3 6f e6 ff ea ba f3 f4 f6 ee |.?u....o........|
00000090 ee 57 de 60 55 7c 23 df 98 37 42 48 79 7a 72 9e |.W.`U|#..7BHyzr.|

DaVinci Resolve keeps all projects in a database, but you can export them to a project file. A DaVinci Resolve Project file uses a ZIP container to store all the project settings in one file. Let’s see what else might be inside.

Path = project.drp
Type = zip
Physical Size = 543860

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2018-02-27 20:25:08 ..... 1010030 287793 project.xml
2018-02-27 20:25:08 ..... 21173 6856 MediaPool/Master/000_Timelines/MpFolder.xml
2018-02-27 20:25:08 ..... 492690 28067 MediaPool/Master/001_Audio/MpFolder.xml
2018-02-27 20:25:08 ..... 20177 3588 MediaPool/Master/002_gfx/MpFolder.xml
2018-02-27 20:25:08 ..... 11025 2611 MediaPool/Master/003_VO/MpFolder.xml
2018-02-27 20:25:08 ..... 98309 7042 MediaPool/Master/004_ScreenCaptures Consolidated/MpFolder.xml
2018-02-27 20:25:08 ..... 1278493 66424 MediaPool/Master/005_Video H264/MpFolder.xml
2018-02-27 20:25:08 ..... 1995 748 MediaPool/Master/MpFolder.xml
2018-02-27 20:25:08 ..... 1638204 137086 SeqContainer/909a0a2c-4183-4310-9f78-6e15c3c59cb4.xml
2018-02-27 20:25:08 ..... 8806 1169 Gallery.xml
2018-02-27 20:25:08 ..... 12697 696 media.dat
------------------- ----- ------------ ------------ ------------------------
2018-02-27 20:25:08 4593599 542080 11 files

Looks like a lot of XML! The consistent XML in all the DRP files is the aptly named “project.xml” along with “Gallery.xml”.

cat project.xml | head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="19.1.4.0011" DbPrjVer="14"-->
<SM_Project DbId="db65f2ee-2bff-41cd-b478-f96c26e9609f">
<FieldsBlob>000000010000000700000026005400650078007400520065006e006400650072004900740065006d005600650063004200410000000c00ffffffff0000002400520065006e0064006500720043006100630068006500560065007200730069006f006e0000000200000000010000001e00500072006f006a00650063007400460065006100740075007200650073000000050000000000000000010000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000030000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800330033003400320034003300380036002d0034006400330030002d0034003600610035002d0061006100340033002d006100330035003200620066006500370038003200640063000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
<LockId/>
<User>86f03abc-9354-47d9-9006-a55b6b1d49cf</User>
<Folder/>
<UserId>-1</UserId>
<SysId>6CB133A11B81</SysId>
<ProjectId>0</ProjectId>

It appears the version of DaVinci Resolve is pretty important. If you try to open a DRP file without using the most up-to-date software, you might run into problems. From what I can see, every time a new major version is released, the updates to the XML cause the project to error when imported. So knowing the version of the DRP file can be a critical piece of metadata needed in understanding the format. There are some helpful apps created by DaVinci Resolve users you can try, or you can try a little Python script to report back the version used in a DRP or a whole folder of DRP files.
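
To illustrate the idea, here is a minimal sketch of such a version report. It is not the script mentioned above; the function name resolve_version is hypothetical, and it assumes the DbAppVer value sits near the top of project.xml, as it does in the samples shown in this post.

```python
import re
import zipfile

# Matches DbAppVer="..." whether it appears in a leading XML comment (as in
# the samples above) or as an attribute on the SM_Project tag.
APP_VER = re.compile(rb'DbAppVer="([^"]+)"')


def resolve_version(path_or_file):
    """Return the DbAppVer string from project.xml, or None if not found."""
    with zipfile.ZipFile(path_or_file) as archive:
        if "project.xml" not in archive.namelist():
            return None
        with archive.open("project.xml") as xml:
            # Assumption: the version is within the first few kilobytes.
            head = xml.read(4096)
    match = APP_VER.search(head)
    return match.group(1).decode("utf-8", "replace") if match else None
```

Calling resolve_version on each .drp file in a folder would give a quick inventory of which Resolve versions produced them.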

There is one other file used by the DaVinci Resolve software. It uses the DRT extension and is for exporting and importing single timelines to the software. Like a DRP it is a simple project file that only points to the media used in the project and only stores the settings needed.

Path = timeline.drt
Type = zip
Physical Size = 215159

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2021-04-21 21:16:42 ..... 45726 8888 project.xml
2021-04-21 21:16:42 ..... 670306 198698 MediaPool/Master/MpFolder.xml
2021-04-21 21:16:42 ..... 98268 7089 SeqContainer/7eb849f3-41cb-4e3f-baa8-d5b134b57aa7.xml
------------------- ----- ------------ ------------ ------------------------
2021-04-21 21:16:42 814300 214675 3 files

This DRT file also has a project.xml file, but doesn’t have the Gallery.xml file we normally find in a DRP file. We can use this to distinguish between the two. The project.xml is the same as in the DRP, so this distinction is important.

cat project.xml |head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="17.1.1.0009" DbPrjVer="10"-->
<SM_Project DbId="ec6cb2e2-0b3c-43b8-8f90-a5fcb973af3b">
<FieldsBlob>00000001000000040000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000020000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800660030003800380038003300390038002d0066006400620037002d0034006300320036002d0061003700310032002d003300360038006200300036003300300065003400330031000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
<LockId/>
<User>04d71873-a504-40c6-bde5-41709691a2c9</User>
<Folder/>
<UserId>-1</UserId>
<SysId>94F6D6F3F60F</SysId>
<ProjectId>0</ProjectId>

Both formats use the XML root tag “SM_Project”. This can also be used to define a signature for the two formats, as a file named “project.xml” could appear in a different format and we don’t want a false identification.

I was able to trace the use of the DRP format back to DaVinci Resolve version 9. In version 8, it appears projects were exported using the name and extension “Default Project.resolve.zip”. From what I could find, DaVinci Resolve version 9 was a big rewrite, so it makes sense that they settled on a more useful extension. The project.xml file in a version 8 export is slightly different.
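
The two checks described so far (the SM_Project root tag and the presence of Gallery.xml) can be sketched in a few lines. This is only an illustration based on the samples shown here, not a PRONOM signature; the function name classify_resolve_export is hypothetical, and the Gallery.xml rule is an assumption drawn from these version 17 and 19 samples.

```python
import zipfile


def classify_resolve_export(path_or_file):
    """Guess "DRP" or "DRT" for a Resolve export, or None if it does not match."""
    try:
        with zipfile.ZipFile(path_or_file) as archive:
            names = set(archive.namelist())
            if "project.xml" not in names:
                return None
            with archive.open("project.xml") as xml:
                head = xml.read(4096)
    except zipfile.BadZipFile:
        return None
    # project.xml alone is too generic a name, so require the root tag too.
    if b"<SM_Project" not in head:
        return None
    # Assumption from the samples: only project exports carry Gallery.xml.
    return "DRP" if "Gallery.xml" in names else "DRT"
```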

cat project.xml | head
<SM_Project DbId="9ba0c4dc-d99c-4b7f-b0da-d254d91e34e2" DbAppVer="8.2 (#153)">
<LockId></LockId>
<User>159415b8-7515-43bf-b5f5-00d98949434b</User>
<UserId>-1</UserId>
<SysId>7cd1c388ea29</SysId>
<ProjectId>0</ProjectId>
<RevivalTaskSetID>-1</RevivalTaskSetID>
<PlayHeadsSplitDisplay>false</PlayHeadsSplitDisplay>
<pGallery>
<Gallery::GyGallery DbId="9884d8ff-096e-4df0-b833-0e75e6e07e15">

It still uses the “SM_Project” root tag, but displays the DbAppVer information differently. It would be good to find more examples of version 8 and earlier to see how this format has evolved over time. For now, I have created a signature you can test if you happen to have any DRP files in your archive.

Scrivener

Word Processors are everywhere and have some of the most recognizable file formats. Some are very simple in that they just contain plain text, others are more complex. There are formats which allow for images and others which can handle different languages and writing directions.

A writing platform I recently learned about is called Scrivener. It was first released in 2007 by a company called Literature & Latte Ltd, and has a Macintosh and Windows version. The software is marketed toward writers, as there are some features that help with note taking, research and much more. It also allows for adding multimedia and even full webpages.

This is accomplished by a file format which uses a non-traditional method for storing all the data needed to render the format.

tree Scrivener3-s01.scriv
Scrivener3-s01.scriv
├── Files
│   ├── Data
│   │   ├── 921B4A08-54C0-4B69-94FD-428F56FDAB89
│   │   │   └── content.rtf
│   │   └── docs.checksum
│   ├── binder.autosave
│   ├── binder.backup
│   ├── search.indexes
│   ├── styles.xml
│   ├── version.txt
│   └── writing.history
├── Scrivener3-s01.scrivx
└── Settings
    ├── recents.txt
    ├── ui-common.xml
    └── ui.ini

Scrivener uses a folder structure to store all the data used in the format. The folder has an extension, .scriv. The format includes some rich text, backups, indexes, version history and more. One unique format within the folder is an XML file with the extension .scrivx. This file is proprietary, and the project can only be rendered using the Scrivener software.

cat Scrivener3-s01.scrivx | head
<?xml version="1.0" encoding="UTF-8"?>
<ScrivenerProject Template="No" Version="2.0" Identifier="DF5DA7F0-27DB-4815-A050-B4D6F23CABA7" Creator="SCRWIN-3.1.5.1" Device="DESKTOP-JMM4K7M" Modified="2025-03-14 22:15:28 -0600" ModID="B4A944C3-FF79-49F6-A737-158BEB4E58BB">
<Binder>
<BinderItem UUID="17807D28-117A-409E-B12D-B34922B6CC6F" Type="DraftFolder" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:17 -0600">
<Title>Draft</Title>
<MetaData>
<IncludeInCompile>Yes</IncludeInCompile>
</MetaData>
<Children>
<BinderItem UUID="921B4A08-54C0-4B69-94FD-428F56FDAB89" Type="Text" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:23 -0600">

The XML has enough to tell these files apart from other XML files, so the signature would be straightforward. Earlier versions of Scrivener sometimes have the SCRIVX file, but sometimes have a file with a .scrivproj extension instead. On a Macintosh this file is in binary plist format, which is different from earlier Windows versions. It seems they may have unified them under version 2 or 3, where versions 1 and 2 for Windows use project version 1 and version 3 uses project version 2.

hexdump -C Scrivener1-s01.scriv/binder.scrivproj | head
00000000 62 70 6c 69 73 74 30 30 d4 00 01 00 02 00 03 00 |bplist00........|
00000010 04 00 05 00 1d 01 d8 01 d9 54 24 74 6f 70 58 24 |.........T$topX$|
00000020 6f 62 6a 65 63 74 73 58 24 76 65 72 73 69 6f 6e |objectsX$version|
00000030 59 24 61 72 63 68 69 76 65 72 dc 00 06 00 07 00 |Y$archiver......|
00000040 08 00 09 00 0a 00 0b 00 0c 00 0d 00 0e 00 0f 00 |................|
00000050 10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00 |................|
00000060 18 00 19 00 1a 00 15 00 1b 00 1c 5a 4c 61 62 65 |...........ZLabe|
00000070 6c 54 69 74 6c 65 59 4c 61 62 65 6c 4c 69 73 74 |lTitleYLabelList|
00000080 5e 42 69 6e 64 65 72 43 6f 6e 74 65 6e 74 73 5f |^BinderContents_|
00000090 10 0f 44 65 66 61 75 6c 74 4c 61 62 65 6c 54 61 |..DefaultLabelTa|
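
As a rough sketch of how the two project file flavors could be told apart, the check below looks for the “bplist00” magic and the ScrivenerProject root tag seen in the samples. The function name is hypothetical, and note that “bplist00” marks any binary plist, not only Scrivener’s, so this alone is not a positive identification.

```python
def sniff_scrivener_project(data: bytes):
    """Classify the first bytes of a Scrivener project file, or return None."""
    # Binary plists start with this magic; it is not specific to Scrivener.
    if data.startswith(b"bplist00"):
        return "binary plist (.scrivproj)"
    # The XML flavor carries a ScrivenerProject root tag near the top.
    if b"<ScrivenerProject" in data[:1024]:
        return "XML (.scrivx)"
    return None
```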

Since the developers of Scrivener decided to make the SCRIV format simply a folder with different content within, something special happens on macOS. The Scrivener software registers all the extensions it uses with the macOS Launch Services. This process then changes the way the SCRIV folder is displayed in the macOS Finder. It now appears as a single file and is given a file type. This is called a Document Package format.

By right-clicking on the “file” you can then browse the package contents. There is nothing in the folder itself or hidden in any attributes which causes this to happen; it is all controlled by what extensions have been registered with the Launch Services database. We can, however, ask macOS to give us some extended metadata details about the package, as long as the file is on an Apple filesystem like HFS or APFS.

mdls Scrivener3-s01.scriv 
_kMDItemDisplayNameWithExtensions = "Scrivener3-s01.scriv"
kMDItemContentCreationDate = 2025-03-15 04:15:17 +0000
kMDItemContentCreationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentModificationDate = 2025-03-15 04:15:18 +0000
kMDItemContentModificationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentType = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree = (
"com.literatureandlatte.scrivener3.scriv",
"public.directory",
"public.item",
"com.apple.package",
"public.content",
"public.composite-content"
)
kMDItemDateAdded = 2025-03-21 04:38:48 +0000
kMDItemDateAdded_Ranking = 2025-03-21 00:00:00 +0000
kMDItemDisplayName = "Scrivener3-s01.scriv"
kMDItemDocumentIdentifier = 0
kMDItemFSContentChangeDate = 2025-03-15 04:15:18 +0000
kMDItemFSCreationDate = 2025-03-15 04:15:17 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = (null)
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = (null)
kMDItemFSLabel = 0
kMDItemFSName = "Scrivener3-s01.scriv"
kMDItemFSNodeCount = 3
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 31155
kMDItemFSTypeCode = ""
kMDItemInterestingDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemKind = "Scrivener Project"
kMDItemLogicalSize = 31155
kMDItemPhysicalSize = 69632

There are a lot of additional details available using the mdls command, including the content type of “com.apple.package”. This tool works with any file in macOS and can be very useful for getting all the information you may need for preservation.

Until the tools we use for format identification can recognize package formats, tools like this may be needed to gather the necessary metadata for preservation. But in the meantime, identification of the package content is the best we can hope for. Creating a signature for the XML based SCRIVX format is the first step.

Stay tuned for more on the package format, as I will be bringing it up more in the Digital Preservation community.
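
If you want to pull the package information out of that listing programmatically, a small parser over the text output is enough. This is a sketch that assumes the output layout shown above; the function names content_type_tree and is_package are made up for illustration.

```python
import re


def content_type_tree(mdls_output: str):
    """Return the quoted entries of kMDItemContentTypeTree from mdls text output."""
    match = re.search(r"kMDItemContentTypeTree\s*=\s*\((.*?)\)", mdls_output, re.S)
    if not match:
        return []
    return re.findall(r'"([^"]+)"', match.group(1))


def is_package(mdls_output: str) -> bool:
    """True when the content type tree includes com.apple.package."""
    return "com.apple.package" in content_type_tree(mdls_output)
```

On a Mac, the text could come from running mdls through subprocess.run with capture_output=True and text=True.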

LUTS

If you are looking for LUTs, you’re in luck. There is a website for sharing your FreshLUTs. Even though they are fresh, they are probably not as exciting as one might think.

LUTs are short for Look-Up Tables, which doesn’t sound as exciting as you were probably hoping. They are a pretty interesting process for dealing with color in high end Image and Video processing applications. Often called 3D Look-up Tables, they are used for color grading, an essential step in film production and restoration to map from one color space to another. LUTs are not to be confused with ICC profiles which aim for color accuracy, while LUTs are looking for more color quality and aesthetics.

There are a lot of LUT formats out there, it seems. In looking into this format, I have found dozens of others to investigate, but today let’s look at the four available as an export from Photoshop.

Above you can see a simple screenshot for the export of different formats from Adobe Photoshop. Adobe is one of the biggest developers and supporters of the formats used in LUTs, but there are many other graphics tools which create and support LUTs. In this Photoshop export we can see four formats included in the export. Let’s take a look at each of these.

ICC Profiles are well documented and available for identification in PRONOM.

filename : 'LUTs-Export-s01.icc'
filesize : 197024
modified : 2025-02-25T09:37:24-07:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/1975'
format : 'ICC Profile'
version : '2'
mime : 'application/vnd.iccprofile'
class : 'Dataset'
basis : 'extension match icc; byte match at 8, 32'

But the other three are plain text files and still identify as such. Let us start with the CUBE format.

filename : 'LUTs-Export-s01.cube'
filesize : 884963
modified : 2025-02-25T09:37:24-07:00
errors :
matches :
- ns : 'pronom'
id : 'x-fmt/111'
format : 'Plain Text File'
version :
mime : 'text/plain'
class :
basis : 'text match ASCII'
warning : 'match on text only; extension mismatch'

cat LUTs-Export-s01.cube
#Created by: Adobe Photoshop Export Color Lookup Plugin
#Copyright: (C) Copyright 2025 ObsoleteThor
TITLE "LUT-export-s01"

#LUT size
LUT_3D_SIZE 32

#data domain
DOMAIN_MIN 0.0 0.0 0.0
DOMAIN_MAX 1.0 1.0 1.0

#LUT data points
0.000000 0.000000 0.000000

The CUBE format was first developed by IRIDAS in 2003 as an answer to ensure interoperability with other software. Adobe acquired IRIDAS in 2011 in an effort to be a leader in the color grading and enhancement market. Adobe published the CUBE specification for version 1.0 in 2013.

A Cube file is a text file that defines a look-up table in the Cube format.
The Cube look-up tables store RGB values.
Advantages of the Cube format include:
  • The Cube format can describe look-up tables for a wide range of purposes, from simple gamma adjustments for display output to complex HDR image processing.
  • The format is well suited for professional digital cinema applications and for both normal range and High-Dynamic Range image processing.
  • As Cube files are text files, they are easily edited or reviewed using a text editor.
  • A Cube file can include three 1-dimensional tables or one 3-dimensional table.
  • The tables can be in a wide range of sizes.
  • Cube files are trivial to write and read.
  • All values are human-readable as they are in decimal form, and can be of high precision.
  • The input domain and output range are not limited to the range 0.0 to 1.0.

According to the specifications, a CUBE file can be a One-Dimensional Cube file or a Three-Dimensional Cube file. From the example above you can see the file is a Three-Dimensional file with the required line “LUT_3D_SIZE”. But in a One-Dimensional file, the required line is “LUT_1D_SIZE”.

cat Demo.cube
TITLE "Demo"
LUT_1D_SIZE 3
DOMAIN_MIN 0 0 0
DOMAIN_MAX 1 2 3
0 0 0
# Comments can go anywhere
0.5 1 1.5
1 1 1

Each CUBE file has one or the other and should be an easy string to look for. It is in a variable position as there can be comments before the required line and also may have a TITLE line. The TITLE and DOMAIN lines are common to every file but not required.
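
A quick way to test that idea is to scan the text for the size keyword while skipping comments and blank lines. The sketch below is only an illustration of the check, not the submitted signature; cube_kind is a hypothetical name.

```python
import re

# The one keyword every CUBE file is required to have, per the text above.
CUBE_SIZE = re.compile(r"^LUT_(1D|3D)_SIZE\s+(\d+)$")


def cube_kind(text: str):
    """Return ("1D" or "3D", size) for CUBE text, or None if no size line exists."""
    for line in text.splitlines():
        stripped = line.strip()
        # Comments and blank lines may come before the required line.
        if not stripped or stripped.startswith("#"):
            continue
        match = CUBE_SIZE.match(stripped)
        if match:
            return match.group(1), int(match.group(2))
    return None
```

Because TITLE and DOMAIN lines are optional, the check deliberately ignores them and relies only on the size keyword.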

Now, the CUBE format is a bit different depending on the source. They all seem to have the same header, but different elements. It seems the IRIDAS Cube format is the most interoperable. The Truelight Cube format generally has the CUB extension, and the Cinespace Cube has the CSP extension, which we will look at next. You can read more about the differences on this format comparison table. This LUTCalc web site has many different types of Cubes it can output, so there are some differences.

The other file format available in the export is CSP. The CSP is also a plain text file, often called a cineSpace LUT file. This format comes from the cineSpace software, a color management package for the film and television industry.

cat LUTS-s01.csp 
CSPLUTV100
3D

BEGIN METADATA
#Created by: Adobe Photoshop Export Color Lookup Plugin
TITLE "LUTS"
END METADATA

2
0.0 1.0
0.0 1.0
2
0.0 1.0
0.0 1.0
2
0.0 1.0
0.0 1.0

32 32 32
0.000000 0.000000 0.000000

The CSP File Format specification outlines the header and the other two sections:

The cineSpace LUT format contains three main sections.
Header
This section contains the LUT identifier and the LUT type, 3D or 1D.
It is made up of the first two (2) valid lines in the file. See Notes below for the definition of a valid line.

Examples
• (3D LUT) header:
CSPLUTV100
3D
• (1D LUT) header:
CSPLUTV100
1D

So there is a pretty obvious header to work with in identification. “CSPLUTV100” can be used to identify both 1D and 3D CSP files.
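
The header check can be sketched as follows. The specification’s definition of a “valid line” is in its Notes section, which is not quoted here, so this sketch assumes a valid line is simply any non-blank line; csp_kind is a hypothetical name.

```python
def csp_kind(text: str):
    """Return "1D" or "3D" for cineSpace LUT text, or None if the header differs."""
    # Assumption: a "valid line" is any line that is not blank.
    valid = [line.strip() for line in text.splitlines() if line.strip()]
    if len(valid) < 2 or valid[0] != "CSPLUTV100":
        return None
    return valid[1] if valid[1] in ("1D", "3D") else None
```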

The other format available to export from Photoshop is 3DL. They seem to be connected to the Assimilate Inc. company and software. A specification has been posted, and it looks like there is only ASCII and not much in the way of a header.

cat LUTS-s01.3dl 
#Created by: Adobe Photoshop Export Color Lookup Plugin
#Description: LUTS
0 33 66 99 132 165 198 231 264 297 330 363 396 429 462 495 528 561 594 627 660 693 726 759 792 825 858 891 924 957 990 1023

It does not appear there are any headers or static strings to use for identification. The specification calls the format “3DL ASCII format” and states that “All lines starting with ‘#’ are treated as comments.” Because of this, I don’t think positive identification can happen at this time.

For now I am just proposing two new file formats to PRONOM, the CUBE format and the CSP format. Click on my GitHub submission page to see the signatures and enjoy some samples!

Pro Tools Sessions

One of the most important software titles related to professional audio recording and mixing is Pro Tools. The Digital Audio Workstation by Digidesign, now Avid, has been around since 1991 and was born from the very popular Sound Designer software first released in 1985. When Sound Designer II was released a few years later, the audio format used became the standard file format for audio recordings. Pro Tools progressed from there to become the industry standard for professional audio production, even winning a Technical Grammy, Emmy, and Oscar.

Pro Tools helped produce amazing music for artists such as No Doubt, Maroon 5, Ricky Martin, and many others. Obviously the best part is the final mixed audio used to make the music we love, but the work that goes into creating the audio mixes is saved in a Pro Tools session. The session is where all the magic happens. A Pro Tools session is actually a project file within a folder where all the supporting files are located.

tree PT Sample/
├── Audio Files
│   ├── GTR 1_02.wav
│   ├── GTR 1_03.wav
│   └── GTR 1_04.wav
└── Test.ptx

These Session “Folders” can get pretty complex as more audio and effects are added to the session, adding folders such as Fade Files, Rendered Files, and Plug-in settings. The current version of Pro Tools uses a project session file with the extension PTX, but that wasn’t always the case. The current version of Pro Tools can be run on Macintosh and Windows, but that also was not always the case. Because the software was originally written for Macintosh hardware, the session files were only compatible on the Macintosh file system as well.

Let’s start by looking at a session from Pro Tools version 1.1 from 1991.

ls -l@ Demo Disk 1 
total 1504
-rw-r--r--@ 1 thorsted Domain Users 45056 Sep 13 1991 Backward Kick
com.apple.FinderInfo 32
com.apple.ResourceFork 1354
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 0 Sep 16 1991 Demo Session
com.apple.FinderInfo 32
com.apple.ResourceFork 13671
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 0 Sep 16 1991 Desktop
com.apple.FinderInfo 32
com.apple.ResourceFork 3081
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 339456 Sep 13 1991 Solo 1
com.apple.FinderInfo 32
com.apple.ResourceFork 2040
com.apple.provenance 11
-rw-r--r--@ 1 thorsted Domain Users 350390 Sep 13 1991 Solo 2
com.apple.FinderInfo 32
com.apple.ResourceFork 2006
com.apple.provenance 11

You might notice the “Demo Session” file is Zero Bytes, but the Resource Fork is 13671 bytes in size.

The Pro Tools Sessions from the beginning until version 5 used this method of storing the session data. ALL in the Resource Fork. Because the session data was in the resource fork and the supporting audio files were in the Sound Designer II format, which also stored important information in the resource fork, this made it impossible to use on anything but a Macintosh file system.

Version 10 of Pro Tools allows you to export the full session back into older versions of the software to version 3.2. When you choose version 5 on a Mac, it forces you to also convert the audio formats to SD2 files as well. For versions 1 & 2 of Pro Tools, there was no official extension for the session files, but starting with version 3, you might often find the extension PT3, then PT4, and PT5. With version 4, there was also a version P24 extension used when Pro Tools version 4 made the leap to 24bit. But for each of these versions identification is not possible with current preservation tools like PRONOM. You could encode the session as a MacBinary to retain everything for modern systems, which is identifiable, but you could also use my proposal for a lookup in the TCDB python tool located here.

python3 TC-lookup-draft-uni.py "PT Session 02-41.pt4"
Type Code: PT4S
Creator Code: PTul
Size of Data Fork: 0 bytes
Size of Resource Fork: 14003 bytes
Rows with Type Code b'PT4S' and Creator Code b'PTul':
Row index: 32813
File Name: Pro Tools 4
Type: PT4S
Creator: PTul
Extension: pt4
Data by Ilan Szekely, Jerusalem: nan

Extension  Version             Type  Creator
(none)     Pro Tools 1.1       mtSF  TLin
(none)     Pro Tools 2         PSes  PTul
PT3        Pro Tools 3.2       PSes  PTul
PT4        Pro Tools 4 16bit   PT4S  PTul
PT24       Pro Tools 4 24bit   PT24  PTul
PT5        Pro Tools 5         PT5S  PTul
PTS        Pro Tools 5.1-6.9   PTS   PTul
PTF        Pro Tools 7-9       PTF   PTul
PTX        Pro Tools 10+       PTX   PTul
There isn’t a lot of information about when Pro Tools was made for Windows. I found some references to a Windows NT version of the 16bit and 24bit version 4. I did also find a copy of the free Pro Tools version 5.01 for Windows 98. In the Read Me it states:

Cross–platform File Exchange is not supported in this version of Pro Tools FREE

File interchange between Mac and PC versions of Pro Tools FREE is not possible in this 5.0.1 release. We hope to include this functionality in a future release of Pro Tools FREE. You can exchange files with Pro Tools LE and TDM users who use the same platform (Mac or Win98/Me) as you, but remember, Pro Tools FREE is limited to 8 audio and 48 MIDI tracks.

Running the software confirms the session file for this version has the extension PT5 and not the later PTS for version 5.1. This version of Pro Tools also allows you to save back to the P24 and PT4 versions, which are probably the first Windows versions. But they are entirely different file formats from the Macintosh versions.

hexdump -C PT5-Win-s03.pt5 | head
00000000 00 00 01 00 00 00 45 ae 00 00 44 ae 00 00 03 98 |......E...D.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 00 5e 50 53 56 45 00 01 04 31 00 52 05 00 |...^PSVE...1.R..|
00000110 45 44 05 00 45 44 19 99 03 26 0c 50 72 6f 54 6f |ED..ED...&.ProTo|
00000120 6f 6c 73 20 35 2e 30 fc c5 00 d7 12 00 78 5e 00 |ols 5.0......x^.|
00000130 00 00 0e 32 00 78 5e 00 00 00 00 00 00 00 00 00 |...2.x^.........|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C PT24-Win-s03.p24 | head
00000000 00 00 01 00 00 00 3f d3 00 00 3e d3 00 00 02 f1 |......?...>.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 01 0a 50 61 74 68 00 01 02 b4 37 0e 2e 48 |....Path....7..H|
00000110 43 3a 5c 57 49 4e 44 4f 57 53 5c 44 65 73 6b 74 |C:\WINDOWS\Deskt|
00000120 6f 70 5c 50 54 5c 50 54 35 2d 57 69 6e 2d 73 30 |op\PT\PT5-Win-s0|
00000130 33 5c 41 75 64 69 6f 20 46 69 6c 65 73 00 00 00 |3\Audio Files...|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C PT4-Win-s03-16.pt4 | head
00000000 00 00 01 00 00 00 3f d9 00 00 3e d9 00 00 02 f1 |......?...>.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 01 0a 50 61 74 68 00 01 02 b4 37 0e 2e 48 |....Path....7..H|
00000110 43 3a 5c 57 49 4e 44 4f 57 53 5c 44 65 73 6b 74 |C:\WINDOWS\Deskt|
00000120 6f 70 5c 50 54 5c 50 54 35 2d 57 69 6e 2d 73 30 |op\PT\PT5-Win-s0|
00000130 33 5c 41 75 64 69 6f 20 46 69 6c 65 73 00 00 00 |3\Audio Files...|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

Starting with Pro Tools 5.1 in 2001, things began to change. Pro Tools has always been tied very closely to hardware and software, so Apple launching Mac OS X provided an opportunity for Digidesign/Avid to revamp their hardware and software for better compatibility, and this included a cross-platform session format.

Pro Tools 5.1 used a new session format which used the extension PTS. Let’s take a look at a sample.

hexdump -C PT Session 02-51.pts | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 01 3d 6e 1c 06 eb d8 c1 aa 16 fd 65 4e 6d |1..=n........eNm|
00000020 23 09 96 db c4 ad 95 7f 68 5d 3a 23 0c a5 ac a8 |#.......h]:#....|
00000030 90 cd ed 04 38 4e 06 47 bc e2 ca b3 9c 8f 6e 57 |....8N.G......nW|
00000040 40 2a 12 fb e4 c4 b6 9f 88 77 5a 43 2c 24 ce c9 |@*.......wZC,$..|
00000050 e3 97 9b 8a 73 5d 46 2f 4a 64 86 b6 dd d6 eb 77 |....s]F/Jd.....w|
00000060 76 49 32 1b 54 9f b9 9f fc fe 15 0f 3f 15 4d 62 |vI2.T.......?.Mb|
00000070 83 aa ab c4 fa 5d 20 26 54 44 0b f3 d9 c5 ae 97 |.....] &TD......|
00000080 cd 08 31 74 77 0d f6 df c8 b5 c0 8b 6c 7c 3f 27 |..1tw.......l|?'|
00000090 10 9e c2 cb b4 9d 86 45 58 41 2a ad e1 78 2d b4 |.......EXA*..x-.|

The session is a new proprietary binary format with an interesting header. There is one byte and then a sequence of ASCII characters in the form of a binary string, 0010111100101011. What it means is unknown to me. In decimal, the binary reads “12075”, in hex “2F2B”, or in text “/+”. Regardless of what it means, this header was used from versions 5.1 through 9. The extension changed to PTF with versions 7-9, but the header is the same. This is why PRONOM PUID fmt/1951 refers to both extensions, covering 5.1-9.

hexdump -C PT Session 02-7.ptf | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 01 4c 6a cd 68 00 a0 3c d8 d2 c1 ac 48 be |1..Lj.h..<....H.|
00000020 85 1c 25 54 f0 8c 31 e1 61 fc 98 34 d0 6c 08 a4 |..%T..1.a..4.l..|
00000030 40 dc 79 14 b0 4c eb 84 21 bc 58 f4 90 2c cc 64 |@.y..L..!.X..,.d|
00000040 00 9c 0e a7 15 6f a9 44 e0 7c 18 b4 7a ec 88 24 |.....o.D.|..z..$|
00000050 c6 42 65 77 5d b8 f2 80 a1 3c d8 2e 12 ac 6b e4 |.Bew]....<....k.|
00000060 80 1c a2 71 f0 8c 2c c4 60 fc ae 47 b5 0f 09 a4 |...q..,.`..G....|
00000070 40 dc 78 14 9a 4c e8 84 26 a2 c5 17 fd 58 52 e0 |@.x..L..&....XR.|
00000080 01 9c 38 d4 70 0d a8 44 e0 26 1a b4 73 ec 88 24 |..8.p..D.&..s..$|
00000090 da 79 f8 94 34 cc 68 04 96 4f bd 17 11 ac 48 e4 |.y..4.h..O....H.|
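
The conversions of that ASCII binary string are easy to double-check. This short snippet just redoes the arithmetic and says nothing about what the value actually means to Pro Tools.

```python
# The 16 ASCII characters that follow the first 0x03 byte of the header.
bits = "0010111100101011"

value = int(bits, 2)                 # interpret the characters as base 2
as_bytes = value.to_bytes(2, "big")  # the same value as two raw bytes

print(value)                     # 12075
print(hex(value))                # 0x2f2b
print(as_bytes.decode("ascii"))  # /+
```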

It might be possible to look closer at the two extensions and find something which can distinguish between them, but because they are in a proprietary binary format, there isn’t much to go on. There have been a few attempts at reverse engineering the formats, but even they choose to lump the two extensions together.

The other important byte in this header is the second byte after the odd binary ASCII sequence, at offset 0x12 (highlighted in purple above). Its value of 0x01 is important because in the next version, PTX, it changes to 0x05 (highlighted in purple below).

Pro Tools version 10 was a big release, it added new features and started to phase out the HD hardware. With this release we see a new session format which is still used by the current version of Pro Tools.

hexdump -C PT Session 02-10.ptx | head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 00 05 13 5a 01 00 04 00 00 00 49 a4 00 00 5a |1...Z......I...Z|
00000020 03 00 64 00 00 00 03 00 00 0c 00 00 00 50 72 6f |..d..........Pro|
00000030 20 54 6f 6f 6c 73 20 48 44 03 00 00 00 0a 00 00 | Tools HD.......|
00000040 00 03 00 00 00 09 00 00 00 06 00 00 00 31 30 2e |.............10.|
00000050 33 2e 39 01 07 00 00 00 52 65 6c 65 61 73 65 00 |3.9.....Release.|
00000060 16 00 00 00 50 72 6f 20 54 6f 6f 6c 73 20 53 65 |....Pro Tools Se|
00000070 73 73 69 6f 6e 20 46 69 6c 65 06 00 05 00 00 00 |ssion File......|
00000080 4d 61 63 4f 53 00 00 00 00 05 5a 08 00 eb 00 00 |MacOS.....Z.....|
00000090 00 67 20 00 00 00 00 2a 00 00 00 be 1d 9d e3 03 |.g ....*........|

This new session format has the same binary ASCII string, but a lot more plain text in the header and throughout the file. This gives us more to explore and understand, even listing the linked audio files and their paths. PRONOM has this new format assigned to PUID fmt/1727. The signature for these files is the same sequence as the previous version, along with the 0x05 byte, but with a few additional bytes, 5A010004, after the main sequence. I am not sure of the bytes’ significance, but they are in all the samples I have, even from the current version.

Pro Tools has some other formats which go along with their sessions. One I’ll highlight is the Groove template format. They end with the extension GRV. You can see some samples here. They also have the odd binary ASCII header, but with 0x00 for the second byte after the main header. Highlighted in purple below.

hexdump -C DiskoKonga.grv| head
00000000 03 30 30 31 30 31 31 31 31 30 30 31 30 31 30 31 |.001011110010101|
00000010 31 01 00 5a 00 01 00 00 00 04 00 00 15 f8 5a 00 |1..Z..........Z.|
00000020 01 00 00 15 d3 10 42 04 04 00 64 00 64 00 64 00 |......B...d.d.d.|
00000030 01 00 01 00 01 00 00 00 00 01 d4 c0 00 00 00 00 |................|
00000040 00 00 00 00 81 00 00 00 00 00 00 00 81 5a 00 01 |.............Z..|
00000050 00 00 00 24 10 43 00 00 00 00 00 00 00 00 00 00 |...$.C..........|
00000060 00 00 00 01 d4 c0 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 01 d4 c0 00 49 5a 00 01 00 00 00 24 10 |.......IZ.....$.|
00000080 43 00 00 00 00 00 01 d4 c0 00 00 00 00 00 05 7e |C..............~|
00000090 40 00 00 00 00 00 04 8e e0 00 00 00 00 00 01 d4 |@...............|

Other extensions associated with Pro Tools which use the same format are: PIO, PIM, PTT, PTXT, RGRP.
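
Putting the header observations together, a rough check on the shared header and the byte at offset 0x12 might look like the sketch below. It only mirrors what is visible in the hexdumps in this post and is not a replacement for the PRONOM signatures; the function name and the labels are made up, and the other extensions listed above have not been tested against it.

```python
# One 0x03 byte followed by the 16-character ASCII binary string.
MAGIC = b"\x03" + b"0010111100101011"

# Values seen at offset 0x12 in the samples shown in this post.
KINDS = {
    0x01: "session, Pro Tools 5.1-9 (PTS/PTF)",
    0x05: "session, Pro Tools 10+ (PTX)",
    0x00: "groove template (GRV)",
}


def protools_kind(header: bytes):
    """Label a file by its first bytes, or return None if the header is absent."""
    if len(header) < 0x13 or not header.startswith(MAGIC):
        return None
    return KINDS.get(header[0x12], "unknown Pro Tools file")
```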

Pro Tools has always been software directly tied to audio hardware and system software. In addition, they used dongles to control software licensing, and the licenses were not cheap. Because of this, trying to use older versions is very difficult. Finding samples for each version is also difficult, as each version allows for a variety of features that may not be available in another version. Luckily, there are some older “Free” versions out there with limited features, from which we can get some idea of the session format.

PRONOM has working identification for the two major formats and until PRONOM can incorporate Macintosh Resource Fork identification it will have to do. The PC version 4 and 5 formats could use more research as I only have one source. The groove and other formats all seem to have the same header so they will need more research as well. Until then, enjoy some sample files and also a disk image of some older Macintosh Pro Tools 3 sessions.

Script Writing

A few of you may remember a couple years ago reading in a Vice article about Eric Roth and his use of an old DOS only software program for writing all his Hollywood scripts. The Vice article was based on some earlier reporting in 2014 about his writing process. You can watch the full interview of Eric Roth on YouTube.

I remember seeing a link to the Vice article a couple years ago and finding the screenwriter’s use of an old DOS program, Movie Master, funny and interesting. He says in his interview that out of half superstition and half fear of change he prefers to use this very old software to write his screenplays. It’s so old and obsolete, he can’t even email the files to Hollywood. He has to print them out and have the studio scan them into modern software for use. The interview shows the screen of his old Windows computer and you can see the software he is using.

Of course, because I love researching obsolete software and formats so much, I wanted to know if the scripts generated by “Movie Master”, version 3.09, are in a format that needed to be documented. I was a little surprised that this version of Movie Master was nowhere to be found. It was on none of the old abandoned software sites. Not on Internet Archive, nowhere it seemed. I did find a later version of Movie Master, version 5, but found this software was not the same thing.

The original programmer of Movie Master was Adam Greissman, which you can clearly see in the screenshot above. The software was copyright Comprehensive Video Supply in the 1980’s, but the Movie Master version 5 was developed by Ballistic Software, Inc, which was also known as “Comprehensive Cinema Software” or “Hollywood Cinema Software” later in the 1990’s.

A very in-depth article by Daniel Plagens, Reinventing the Typewriter, mentions that Adam Greissman did not want to move the software from DOS to Windows, as he didn’t feel there was enough of a market at the time. As it turns out, after Greissman left the company in 1991, the founder of Comprehensive Video Supply, Jules Leni, got a lot of pressure from users of Movie Master to develop a Windows and Macintosh version of the software. They released this new version in October of 1996.

Let’s take a look at a couple of example files from version 5.

hexdump -C Scene.scr | head
00000000 11 0d 0a 32 2e 20 20 20 20 15 0d 0a 15 0d 0a 15 |...2.    .......|
00000010 0d 0a 15 0d 0a 11 0d 0a 10 0d 0a 15 0d 0a 15 0d |................|
00000020 0a 15 0d 0a 10 0d 0a 46 41 44 45 20 49 4e 3a 15 |.......FADE IN:.|
00000030 0d 0a 54 68 65 20 66 6f 6c 6c 6f 77 69 6e 67 20 |..The following |
00000040 22 73 63 72 69 70 74 6c 65 74 22 20 64 65 6d 6f |"scriptlet" demo|
00000050 6e 73 74 72 61 74 65 73 20 68 6f 77 20 4d 6f 76 |nstrates how Mov|
00000060 69 65 20 4d 61 73 74 65 72 20 0d 0a 63 61 6e 20 |ie Master ..can |
00000070 62 65 20 75 73 65 64 20 74 6f 20 6f 75 74 6c 69 |be used to outli|
00000080 6e 65 20 73 63 65 6e 65 73 2e 20 20 4f 6e 63 65 |ne scenes.  Once|
00000090 20 79 6f 75 20 68 61 76 65 20 66 69 6e 69 73 68 | you have finish|

hexdump -C MM5-s01.scr | head
00000000 11 0d 0a 31 2e 20 20 20 20 15 0d 0a 15 0d 0a 15 |...1.    .......|
00000010 0d 0a 15 0d 0a 11 0d 0a 10 0d 0a 15 0d 0a 15 0d |................|
00000020 0a 15 0d 0a 10 0d 0a 54 45 53 54 49 4e 47 15 0d |.......TESTING..|
00000030 0a 7e 60 21 40 23 24 25 5e 26 2a 28 29 2d 2b 7c |.~`!@#$%^&*()-+||
00000040 3d 2d 54 65 43 66 4d 74 0d 0a 01 00 00 07 00 02 |=-TeCfMt........|
00000050 00 00 00 00 00 00 01 00 00 01 00 00 01 00 00 01 |................|
00000060 00 00 01 00 00 01 00 00 01 00 00 01 00 00 01 00 |................|
00000070 00 01 00 00 bf 03 00 00 0c 00 43 6f 75 72 69 65 |..........Courie|
00000080 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............|
00000090 00 00 00 00 00 00 00 00 00 30 00 00 00 00 00 00 |.........0......|

Version 5 of Movie Master uses the extension SCR, which one could assume is short for “Script”. There does appear to be a header before any readable text starts, so that will be helpful in identification. Currently there is only one PUID in PRONOM with the extension SCR, x-fmt/100, which happens to be for an AutoCAD script and has no signature. So anything you ask DROID or Siegfried to identify with the SCR extension will default to an AutoCAD script, which is frustrating. According to the File Format Wiki, there are quite a few formats with the SCR extension. More work to be done there for sure.
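To make the idea of a header check concrete, here is a small Python sketch built only from the two version 5 samples above. Both begin with 0x11, CR LF, a number, a period, four spaces, then 0x15 CR LF. The pattern and the function name are my own assumptions drawn from those two dumps, so treat it as a starting point for more samples rather than a finished signature.

```python
import re

# Pattern observed in both version 5 samples:
# 0x11 CR LF, digits, ".", four spaces, 0x15 CR LF
MM5_PREFIX = re.compile(rb"\x11\r\n\d+\. {4}\x15\r\n")


def looks_like_movie_master5(data):
    """True if the data starts with the byte pattern seen in the MM5 samples."""
    return MM5_PREFIX.match(data) is not None
```

With only two files to go on, the number after the first line break may well vary in ways this pattern does not cover.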

So I tried for a few weeks to find a copy of Movie Master version 3.09. I even set up an eBay saved search for the name so it would alert me to a copy being sold, but no such luck. I gave up for a while, then recently someone posted a link to a large collection of early warez. Warez is the name given to software that has been illegally copied. When I followed the link and searched through the vast amount of software titles, I got excited to see a couple matches to “Movie Master”. After a little wrangling of some downloads, I spun up a copy of DOSBox and lo and behold, Movie Master 3.09!

Welcome to Movie Master V3.09 about screen

A lot of people have compared the old DOS scriptwriting tools to early word processors like Word, Perfect Writer, WordStar, etc. They did much of the same thing, but with special controls for helping with scenes, characters, indents, and everything writers needed to make some of the best Hollywood films out there. As Daniel Plagens noted:

The program proved popular for many years. Greissman estimates they sold over 10,000 units—“saturating the market,” as he put it—and recalls seeing help wanted ads in Hollywood Reporter and Variety where knowledge of Movie Master was a hiring requirement. He visited the sets of Days of Thunder and Hunt for Red October to help their writers and production teams acclimate to Movie Master.

Makes me wonder where all the old scripts from Hollywood movies are located in their electronic form? I am sure Eric Roth probably has quite the collection of different scripts he has written. I sure hope he backs them up and donates them to a library in the future.

Well, let’s take a look at a couple sample files from Movie Master version 3 and version 4. Version 4.04 was also in the collection uploaded to Internet Archive.

hexdump -C TEST3.SCR | head 
00000000 33 2e 30 39 0a 00 00 00 00 31 00 00 00 00 00 00 |3.09.....1......|
00000010 31 00 00 00 00 00 00 0a 00 4e 41 4d 45 20 3f 0a |1........NAME ?.|
00000020 ff 53 43 52 45 45 4e 0a 2a 42 01 19 3c 01 1e 37 |.SCREEN.*B..<..7|
00000030 01 1c 2f 01 14 25 01 18 24 01 39 4c 01 31 42 01 |../..%..$.9L.1B.|
00000040 35 41 01 0a 46 01 0a 46 01 3d 4b 01 02 00 01 0a |5A..F..F.=K.....|
00000050 03 00 54 65 73 74 69 6e 67 20 4d 6f 76 69 65 20 |..Testing Movie |
00000060 4d 61 73 74 65 72 20 76 65 72 73 69 6f 6e 20 33 |Master version 3|
00000070 2e 30 39 11 11 31 11 31 0a |.09..1.1.|

hexdump -C TEST.SCR
00000000 34 2e 30 34 0a 00 00 00 00 31 00 00 00 00 00 00 |4.04.....1......|
00000010 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |1...............|
00000020 ff 0a 2a 42 01 00 19 3c 01 00 1e 37 01 00 1c 2f |..*B...<...7.../|
00000030 01 00 14 25 01 00 18 24 01 00 39 4c 01 00 31 42 |...%...$..9L..1B|
00000040 01 00 35 41 01 00 0a 46 01 00 0a 46 01 00 3d 4b |..5A...F...F..=K|
00000050 01 00 0a 18 01 00 0a 46 01 00 02 00 00 54 68 69 |.......F.....Thi|
00000060 73 20 69 73 20 61 20 74 65 73 74 20 6f 66 20 4d |s is a test of M|
00000070 6f 76 69 65 20 4d 61 73 74 65 72 20 53 63 72 69 |ovie Master Scri|
00000080 70 74 20 77 72 69 74 69 6e 67 20 73 6f 66 74 77 |pt writing softw|
00000090 61 72 65 2e 0a 01 03 00 00 31 0a 01 00 00 00 00 |are......1......|
000000a0 0a 03 01 0a |....|

hexdump -C COVER.SCR | head
00000000 33 2e 30 35 0a 01 00 00 00 31 00 00 00 00 00 00 |3.05.....1......|
00000010 31 00 00 00 00 00 00 0a ff 43 4f 56 45 52 0a 2a |1........COVER.*|
00000020 42 01 19 3c 01 1e 37 01 1c 2f 01 14 25 01 18 24 |B..<..7../..%..$|
00000030 01 39 4c 01 31 42 01 35 41 01 0a 46 01 0a 46 01 |.9L.1B.5A..F..F.|
00000040 3d 4b 01 06 00 00 0a 03 01 31 0a 01 03 00 00 11 |=K.......1......|
00000050 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 |................|
00000060 11 11 11 11 11 11 11 11 20 20 20 20 20 20 20 20 |........        |
00000070 20 20 20 20 20 20 20 20 20 20 20 20 20 22 4d 65 |             "Me|
00000080 65 74 20 74 68 65 20 44 72 61 63 75 6c 61 73 22 |et the Draculas"|
00000090 11 11 11 11 11 20 20 20 20 20 20 20 20 20 20 20 |.....           |

hexdump -C DRAC2.SCR | head
00000000 34 2e 30 30 0a 01 00 2b 00 36 00 00 00 00 00 00 |4.00...+.6......|
00000010 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |5...............|
00000020 00 42 4f 42 0a 01 54 45 44 0a 02 43 41 52 4f 4c |.BOB..TED..CAROL|
00000030 0a 03 41 4c 49 43 45 0a 04 49 47 4f 52 0a 05 44 |..ALICE..IGOR..D|
00000040 45 4e 4e 49 53 0a 06 4d 55 46 46 49 4e 0a ff 53 |ENNIS..MUFFIN..S|
00000050 43 52 45 45 4e 0a 2a 42 01 00 19 3c 01 00 1e 37 |CREEN.*B...<...7|
00000060 01 00 1c 2f 01 00 14 25 01 00 18 24 01 00 39 4c |.../...%...$..9L|
00000070 01 00 31 42 01 00 35 41 01 00 0a 46 01 00 0a 46 |..1B..5A...F...F|
00000080 01 00 3d 4b 01 00 0a 18 01 00 0a 46 01 00 02 01 |..=K.......F....|
00000090 01 35 0a 03 00 45 58 54 20 54 45 44 20 44 52 41 |.5...EXT TED DRA|

The first thing to notice is that they all start with the version number of the software which wrote the file. Really nice to have, but a terrible magic header. The files also all begin (after the version number) and end with the hex value “0A”, which happens to be the line feed control character. So super common, but it could be helpful. Another pattern is the byte at offset 9 (the tenth byte), which is “31” in most of the samples and “36” in one of them. “31” is the ASCII digit “1”, so it could be a sequence number for the script, as each SCR file could only store what was in memory.
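Those observations can be pulled out of a file with a few lines of Python. This is only a sketch of the patterns visible in the four dumps above (a version string like 3.09 followed by a line feed, the byte at offset 9, and a trailing line feed). The function name and the returned keys are hypothetical, and none of this is strong enough to be a real signature.

```python
import re

# Version string such as "3.09" or "4.04", followed by a line feed (0x0A).
VERSION_LINE = re.compile(rb"(\d\.\d\d)\n")


def describe_movie_master_dos(data):
    """Report the header patterns seen in the DOS samples, or None if absent."""
    match = VERSION_LINE.match(data)
    if match is None or len(data) < 10:
        return None
    return {
        "version": match.group(1).decode("ascii"),
        "byte_at_9": data[9],                 # 0x31 or 0x36 in the samples
        "ends_with_lf": data.endswith(b"\n"),
    }
```

Because any text file starting with something like "1.00" and a line feed would pass, a check like this is better at sorting known Movie Master files by version than at identifying unknown ones.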

I fear the rest of the format will have the same issue most word processors had at the time, which is not having a header, but lots of formatting codes which may or may not be in every file, making programmatic identification difficult. It might take a while to identify all the formatting codes, but it could lead to better identification and possibly an import module for tools like LibreOffice or Final Draft.

Screenshot of Movie Master 4.04 start screen

I didn’t find much different with Movie Master 4; it seemed to have the same restriction of 16 files in a script. The files from version 4 also seem to follow the same patterns as version 3. But both versions are different from the Windows version of Movie Master, version 5. Click here for the Movie Master 5 help menu on “Introduction for Movie Master DOS Users”.

There was another elusive script writing software title which adds to the confusion. Scriptware was another screenwriting software tool which seems to have had a large following. It started out on DOS, also used the SCR extension, and later came in Windows and Macintosh versions. The website for the software is still active, but hasn’t been updated in 24 years. I wrote a little about it in my post on PROmotion. All the demo versions out there are not usable demos, but animation demos. In this nice batch of old software on the Internet Archive I was able to find an early copy. I wasn’t able to get it to run, but the folder did have some samples.

hexdump -C SAMPLE1.SCR | head
00000000 32 5f 01 00 00 00 00 00 00 00 00 39 01 4a 5f 00 |2_.........9.J_.|
00000010 ff ff 2c 01 00 00 00 00 00 00 95 80 01 00 11 53 |..,............S|
00000020 63 72 69 70 74 77 61 72 65 20 53 63 72 69 70 74 |criptware Script|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000b0 00 00 00 00 00 00 00 00 11 00 02 00 02 00 14 00 |................|
000000c0 12 00 6f 02 f9 04 0b 00 7b 04 01 00 05 00 00 02 |..o.....{.......|
000000d0 00 11 00 00 00 0c 00 00 00 06 00 ed 01 05 06 00 |................|
000000e0 00 00 00 00 08 00 0b 00 00 00 04 00 00 00 04 00 |................|
000000f0 82 00 01 01 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C SAMPLE2.SCR | head
00000000 0b 53 63 72 69 70 74 77 61 72 65 1a 95 80 04 80 |.Scriptware.....|
00000010 1e 53 63 72 69 70 74 77 61 72 65 20 53 63 72 69 |.Scriptware Scri|
00000020 70 74 20 32 2e 32 33 3a 34 3b 37 30 32 32 31 00 |pt 2.23:4;70221.|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000a0 00 00 00 00 00 00 00 00 00 00 11 00 02 00 02 00 |................|
000000b0 11 00 00 00 34 02 01 05 0b 00 89 04 01 00 05 00 |....4...........|
000000c0 00 02 00 11 00 00 00 0b 00 00 00 06 00 aa 01 05 |................|
000000d0 06 00 00 00 00 00 08 00 0c 00 00 00 05 00 00 00 |................|
000000e0 05 00 8a 00 01 01 00 00 00 00 00 00 00 00 00 00 |................|

Luckily, they make it quite easy to identify these SCR files. Scriptware was very popular and continued on with Windows and Macintosh versions. Later on, the format was changed along with the extension, which changed to SW3.
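What makes these easy is the plain text “Scriptware Script” near the top of both samples. In both dumps the byte just before it looks like a length prefix (0x11 is 17, the length of “Scriptware Script”, and 0x1e is 30, the length of the longer string with the version in the second sample). Here is a short Python sketch that looks for the marker and returns where it was found along with that preceding byte. The 64 byte search window and the function name are my own choices for the example.

```python
def find_scriptware_marker(data, window=64):
    """Return (offset, preceding_byte) for the Scriptware marker, else None.

    The preceding byte appears to be a length prefix in the two samples,
    but that is an observation from the dumps, not documented behavior.
    """
    marker = b"Scriptware Script"
    idx = data[:window].find(marker)
    if idx <= 0:
        return None
    return idx, data[idx - 1]
```

The marker sits at a different offset in each sample (0x1f and 0x11), so a signature would likely need a variable offset rather than a fixed one.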

The SCR extension has been used often. On my desktop these files default to opening as a Paintbrush document. Apparently SCR is sometimes used as an extension for the ZSoft Paintbrush (PCX) format. It is also used on older PostScript fonts on the Macintosh as a Type 1 screen font. It can also be a screensaver on Windows, but watch out, they can hide malicious code. You get the idea: SCR is a very common extension, and identifying it up front can help avoid problems later!

Moral of the story is to never give up searching for old software and even though illegal copying of software should be avoided, I am grateful to those who help save abandoned software. Without them many titles would be lost.

I don’t have a good signature for these formats yet, but you can find a few samples on my GitHub page.

CD Architect

Receiving electronic media from an outside source can be an adventure. Often times you find yourself sorting the valuable files and separating them from the chaff. There can be hidden files, cache files, application files, drivers, and everything in between. Determining what formats are important can sometimes be difficult, especially if you don’t know the file format of some of the files.

I was recently working on a collection of files which had been produced through some audio software. When working with audio, a WAVE file is what is usually kept as they contain the actual audio data. With these files they came with a couple other formats. One of those formats was a bunch of SFK peak files. These files are meant to be temporary as they are generated from the WAVE file to make opening of audio data faster. They are important, but can easily be regenerated. One could argue they have historical value, but also they don’t contain anything that can be used by itself, so alone they don’t have much value.

The other format found with the WAVE files has a CDP extension. These came up as unknown when using DROID. It is not a common extension, so finding the name of the software which created the files wasn’t too hard. Let’s take a look at one of them.

hexdump -C tutor1.cdp | head
00000000 52 49 46 46 79 03 00 00 53 46 50 4a 66 6d 74 20 |RIFFy...SFPJfmt |
00000010 18 00 00 00 00 00 01 00 02 00 00 00 10 00 00 00 |................|
00000020 44 ac 00 00 03 00 00 00 01 00 00 00 4c 49 53 54 |D...........LIST|
00000030 88 00 00 00 66 6c 73 74 66 69 6c 65 23 00 00 00 |....flstfile#...|
00000040 44 3a 5c 53 6f 75 6e 64 73 5c 4e 65 77 20 54 75 |D:\Sounds\New Tu|
00000050 74 6f 72 20 66 69 6c 65 73 5c 53 6f 6e 67 33 2e |tor files\Song3.|
00000060 77 61 76 00 66 69 6c 65 23 00 00 00 44 3a 5c 53 |wav.file#...D:\S|
00000070 6f 75 6e 64 73 5c 4e 65 77 20 54 75 74 6f 72 20 |ounds\New Tutor |
00000080 66 69 6c 65 73 5c 53 6f 6e 67 32 2e 77 61 76 00 |files\Song2.wav.|
00000090 66 69 6c 65 23 00 00 00 44 3a 5c 53 6f 75 6e 64 |file#...D:\Sound|

Huh, this is a RIFF file. RIFF is most commonly used as the container for WAVE and AVI files. You can read more about the RIFF format in a previous post. The RIFF container format can be used for all sorts of things. Looking at the internals we can see a few unique list chunks.

Lots of references to other files, specifically WAVE files. But not a lot of actual data. That is because this format turns out to be just a project format for some software called “CD Architect”. Sonic Foundry was an audio software developer for a few years before they sold their catalog to Sony in 2003. In looking at the manual for CD Architect version 5.2, it explains the CDP Project format.

CD Architect software handles the organization of your CD using a small project file (CDP) that saves information about source file locations, edits, cuts, and insertion points. This project file is not a multimedia file, but is instead used to create the CD when editing is finished.
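Since the version 4 sample is ordinary RIFF, the source file references can be pulled out with a generic chunk walker. The Python sketch below assumes the layout visible in the dump: a RIFF form of SFPJ, a LIST chunk of type flst, and file sub-chunks each holding one path. The function names are hypothetical and the path encoding is a guess (Latin-1), so take it as an illustration of the structure and not a full parser.

```python
import struct


def iter_riff_chunks(data, start, end):
    """Yield (chunk_id, payload_offset, size) for chunks in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield chunk_id, pos + 8, size
        pos += 8 + size + (size & 1)  # odd-sized chunks carry one pad byte


def list_cdp4_source_files(data):
    """Return the paths stored in 'file' chunks of the 'flst' LIST."""
    if data[:4] != b"RIFF" or data[8:12] != b"SFPJ":
        raise ValueError("not a RIFF/SFPJ file")
    paths = []
    for chunk_id, off, size in iter_riff_chunks(data, 12, len(data)):
        if chunk_id == b"LIST" and data[off:off + 4] == b"flst":
            for sub_id, sub_off, sub_size in iter_riff_chunks(data, off + 4, off + size):
                if sub_id == b"file":
                    raw = data[sub_off:sub_off + sub_size]
                    paths.append(raw.rstrip(b"\x00").decode("latin-1"))
    return paths
```

The pad byte handling matters here: each path in the dump is 0x23 (35) bytes long, an odd number, so the 00 after each path is RIFF padding and the next file chunk starts on an even offset.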

Looking at another CDP file from the collection, I noticed something different.

hexdump -C CDArch50a-s01.cdp | head
00000000 72 69 66 66 2e 91 cf 11 a5 d6 28 db 04 c1 00 00 |riff......(.....|
00000010 20 0a 00 00 00 00 00 00 84 38 15 b3 da 08 85 44 | ........8.....D|
00000020 b2 2a 5b 70 a1 32 15 ff 5a 2d 8f b2 0f 23 d2 11 |.*[p.2..Z-...#..|
00000030 86 af 00 c0 4f 8e db 8a 00 02 00 00 00 00 00 00 |....O...........|
00000040 78 00 00 00 00 00 04 00 11 00 00 00 44 ac 00 00 |x...........D...|
00000050 00 00 00 00 00 c0 52 40 00 00 00 00 00 00 5e 40 |......R@......^@|
00000060 00 00 00 00 00 00 00 00 04 00 04 00 40 00 00 00 |............@...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 7c 00 00 00 |............|...|
00000080 50 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

That’s odd. The RIFF identifier is always uppercase ASCII, and this one is lowercase. Also the important RIFF form, which was “SFPJ” in the other sample, is missing. This is not a valid RIFF format.

But further down in the file I can see the same list chunks. Did they take the RIFF format and make a proprietary version of their own? I think they may have. It seems the first example was from CD Architect version 4 and these other files are from CD Architect version 5. That complicates things. Sony stopped developing CD Architect after version 5.2d and maintained it for a few years before selling many of their titles to MAGIX Software. As far as I know there were never any new versions released. The software was very popular, as it had some really nice audio mastering features and was easy to use. Many were upset when the software was abandoned.
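Going back to the version 5 header, the first 16 bytes read more like a GUID than a four-character code, and the 8 bytes after them look like a 64-bit little-endian size, followed by another 16-byte identifier where the RIFF form would be. That layout resembles the one Sonic Foundry used for its Wave64 audio files, whose opening bytes are the same. Here is a hedged Python sketch that reads the header under that assumption. The field interpretation is my reading of the dump, not something taken from CD Architect documentation, and the names are made up for the example.

```python
import struct
import uuid

# First 16 bytes of the version 5 sample: lowercase "riff" plus 12 more bytes.
RIFF_GUID_BYTES = bytes.fromhex("726966662e91cf11a5d628db04c10000")


def parse_cdp5_header(data):
    """Return (size_field, form_guid) or None if the 16-byte prefix differs.

    Assumes: 16-byte identifier, 64-bit little-endian size, 16-byte form GUID.
    """
    if len(data) < 40 or data[:16] != RIFF_GUID_BYTES:
        return None
    (size_field,) = struct.unpack_from("<Q", data, 16)
    form_guid = uuid.UUID(bytes_le=data[24:40])
    return size_field, form_guid
```

For the sample above this gives a size field of 0xA20 and a form identifier of b3153884-08da-4485-b22a-5b70a13215ff, which could stand in for “SFPJ” in a version 5 signature if it holds across more samples.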

Creating a signature for both version 4 and version 5 CDP files will be pretty straightforward. I feel knowing what you have in a collection you are processing is the first step in making informed decisions. Whether or not you keep the project files is up for debate. Some may only want the final audio created from a CD Architect project, while others may want to see the way the audio was put together and mixed. Either way, the more you know...

One more thing. CD Architect would default to saving a CDP project file, but could also save a “CD Image file”. This process actually would save the project to a full WAVE file with some extras baked in.

An image file is essentially a wave file with volume, crossfades, effects, mixes, and track information embedded. Burning an image file will reduce the risk of buffer underruns (especially if you have a complex project or are using a slow computer) since no audio processing is required. 

Interesting. Normally when working with track information in a single WAVE file you would need a companion CUE sheet in order to reference the track layout of the audio CD. So I am curious how they do all of this. Let’s take a look at a “CD Image”.

mediainfo CDArch52d-s02.wav
General
Complete name : CDArch52d-s02.wav
Format : Wave
Format settings : PcmWaveformat
File size : 5.05 MiB
Duration : 30 s 0 ms
Overall bit rate mode : Constant
Overall bit rate : 1 411 kb/s
Conformance errors : 2
RIFF : Yes
General compliance : File size 5292434 is less than expected size 5292823 (offset 0x8)
WAVE : Yes
General compliance : Element size 5292811 is more than maximal permitted size 5292422 (offset 0xC)

Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 30 s 0 ms
Bit rate mode : Constant
Bit rate : 1 411.2 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 5.05 MiB (100%)

Already seeing some issues with the format, but all the important bits are there. JHOVE doesn’t like them much either.

JhoveView (Rel. 1.32.0, 2024-09-12)
Date: 2024-12-11 16:01:08 MST
RepresentationInformation: CDArch52d-s02.wav
ReportingModule: WAVE-hul, Rel. 1.8.3 (2024-03-05)
LastModified: 2024-12-11 15:58:02 MST
Size: 5292434
Format: WAVE
Status: Not well-formed
SignatureMatches:
WAVE-hul
InfoMessage: Ignored unrecognized list type: "pqls"
ID: WAVE-HUL-15
Offset: 5292044
ErrorMessage: Unexpected end of file: Bytes missing = 389
ID: WAVE-HUL-3
Offset: 5292434
MIMEtype: audio/vnd.wave; codec=1
Profile: PCMWAVEFORMAT

JHOVE is giving me two issues. The major error is that the file appears truncated, according to both MediaInfo and JHOVE. The InfoMessage is less of an issue and more of a heads-up that the WAVE file has an extra LIST type, “pqls”, which was also in the CDP RIFF file we looked at earlier. So it seems that making a “CD Image” of a project embeds the project chunk data into the WAVE container. Identification is not an issue, as these WAVE files follow the standard pattern and therefore identify correctly, but one might want to be aware, through further characterization, that these WAVE files have some not-so-obvious extra data.
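Both findings can be reproduced with a quick check. MediaInfo reports an actual size of 5292434 against an expected 5292823, and 5292823 - 5292434 = 389, the same number of missing bytes JHOVE reports. The Python sketch below compares the size declared in the RIFF header with the real length and does a crude byte search for a LIST chunk of type pqls. The function name is hypothetical, and the search is a plain pattern scan rather than a proper chunk walk, so in theory it could match inside the audio data.

```python
import re
import struct


def inspect_cd_image(data):
    """Return (missing_bytes, offset_of_pqls_list_or_None) for a RIFF/WAVE."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    (declared,) = struct.unpack_from("<I", data, 4)
    # The RIFF size field excludes the 8-byte "RIFF" + size header itself.
    missing = declared + 8 - len(data)
    match = re.search(rb"LIST.{4}pqls", data, re.DOTALL)
    return missing, (match.start() if match else None)
```

A positive first value means the header promises more bytes than the file holds, which is the condition both tools are complaining about.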

My attempts to find any samples from version 3 of CD Architect have failed. Until then, my proposal is to add versions 4 and 5 to PRONOM with the signature on my GitHub page. There you will find a few samples as well.