Script Writing

A few of you may remember a couple years ago reading in a Vice article about Eric Roth and his use of an old DOS only software program for writing all his Hollywood scripts. The Vice article was based on some earlier reporting in 2014 about his writing process. You can watch the full interview of Eric Roth on YouTube.

I remember seeing a link to the Vice article a couple years ago and finding the screenwriters use of an old DOS program, Movie Master, funny and interesting. He says in his interview that out of half superstition and half fear of change he prefers to use this very old software to write his screenplays. It’s so old and obsolete, he can’t even email the files to Hollywood. He has to print them out and have the studio scan them into modern software for use. The interview shows the screen of his old Windows computer and you can see the software he is using.

Of course because I love researching obsolete software and formats so much, I wanted to know if the scripts generated by “Movie Master”, version 3.09, are in a format that needed to be documented. I was a little surprised that this version of Movie Master was no where to be found. It was on none of the old abandoned software sites. Not on Internet Archive, no where it seemed. I did find a later version of Movie Master, version 5, but found this software was not the same thing.

The original programmer of Movie Master was Adam Greissman, which you can clearly see in the screenshot above. The software was copyright Comprehensive Video Supply in the 1980’s, but the Movie Master version 5 was developed by Ballistic Software, Inc, which was also known as “Comprehensive Cinema Software” or “Hollywood Cinema Software” later in the 1990’s.

According to a very in depth article by Daniel Plagens, Reinventing the Typewriter, mentions Adam Greissman not wanting to move the software from DOS to Windows as he didn’t feel there was enough of a market at the time. As it turns out the founder of Comprehensive Video Supply, Jules Leni, got a lot of pressure from users of Movie Master after Greissman, who left the company in 1991, to develop a Windows and Macintosh version of the software. They released this new version in October of 1996.

Let’s take a look at a couple of example files from version 5.

hexdump -C Scene.scr | head
00000000 11 0d 0a 32 2e 20 20 20 20 15 0d 0a 15 0d 0a 15 |...2. .......|
00000010 0d 0a 15 0d 0a 11 0d 0a 10 0d 0a 15 0d 0a 15 0d |................|
00000020 0a 15 0d 0a 10 0d 0a 46 41 44 45 20 49 4e 3a 15 |.......FADE IN:.|
00000030 0d 0a 54 68 65 20 66 6f 6c 6c 6f 77 69 6e 67 20 |..The following |
00000040 22 73 63 72 69 70 74 6c 65 74 22 20 64 65 6d 6f |"scriptlet" demo|
00000050 6e 73 74 72 61 74 65 73 20 68 6f 77 20 4d 6f 76 |nstrates how Mov|
00000060 69 65 20 4d 61 73 74 65 72 20 0d 0a 63 61 6e 20 |ie Master ..can |
00000070 62 65 20 75 73 65 64 20 74 6f 20 6f 75 74 6c 69 |be used to outli|
00000080 6e 65 20 73 63 65 6e 65 73 2e 20 20 4f 6e 63 65 |ne scenes. Once|
00000090 20 79 6f 75 20 68 61 76 65 20 66 69 6e 69 73 68 | you have finish|

hexdump -C MM5-s01.scr | head
00000000 11 0d 0a 31 2e 20 20 20 20 15 0d 0a 15 0d 0a 15 |...1. .......|
00000010 0d 0a 15 0d 0a 11 0d 0a 10 0d 0a 15 0d 0a 15 0d |................|
00000020 0a 15 0d 0a 10 0d 0a 54 45 53 54 49 4e 47 15 0d |.......TESTING..|
00000030 0a 7e 60 21 40 23 24 25 5e 26 2a 28 29 2d 2b 7c |.~`!@#$%^&*()-+||
00000040 3d 2d 54 65 43 66 4d 74 0d 0a 01 00 00 07 00 02 |=-TeCfMt........|
00000050 00 00 00 00 00 00 01 00 00 01 00 00 01 00 00 01 |................|
00000060 00 00 01 00 00 01 00 00 01 00 00 01 00 00 01 00 |................|
00000070 00 01 00 00 bf 03 00 00 0c 00 43 6f 75 72 69 65 |..........Courie|
00000080 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............|
00000090 00 00 00 00 00 00 00 00 00 30 00 00 00 00 00 00 |.........0......|

Version 5 of Movie Master uses the extension SCR, which one could assume is short for “Script”. There does appear to be a header before any readable text starts, so that will be helpful in identification. Currently there is only one PUID, x-fmt/100, in PRONOM with the extension SCR, which happens to be for an AutoCAD script and has no signature, so anything you ask DROID or Siegfried to identify with the SCR extension will default to an AutoCAD script, which is frustrating. According to the File Format Wiki, there are quite a few formats with the SCR extension. More work to be done there for sure.

So I tried for a few weeks to find a copy of Movie Master version 3.09, I even put in a eBay favorite search for the name so it would alert me to a copy being sold, but no such luck. I gave up for awhile, then recently someone posted a link to a large collection of early warez. Warez is the name given to software that has been illegally copied. When I followed the link and searched though the vast amount of software titles, I got excited to see a couple matches to “Movie Master”. After a little wrangling of some downloads, I spun up a copy of DOSBox and low and behold, Movie Master 3.09!

Welcome to Movie Master V3.09 about screen

A lot of people have compared the old DOS scriptwriting tools to early word processors like Word, Perfect Writer, WordStar, etc. They did much of the same thing, but with special controls for helping with scenes, characters, indents, and everything writers needed to make some of the best Hollywood films out there. As Daniel Plagens noted:

The program proved popular for many years. Greissman estimates they sold over 10,000 units—“saturating the market,” as he put it—and recalls seeing help wanted ads in Hollywood Reporter and Variety where knowledge of Movie Master was a hiring requirement. He visited the sets of Days of Thunder and Hunt for Red October to help their writers and production teams acclimate to Movie Master.

Makes me wonder where all the old scripts from Hollywood movies are located in their electronic form? I am sure Eric Roth probably has quite the collection of different scripts he has written. I sure hope he backs them up and donates them to a library in the future.

Well, let’s take a look at a couple sample files from Movie Master version 3 and version 4. Version 4.04 was also in the collection uploaded to Internet Archive.

hexdump -C TEST3.SCR | head 
00000000 33 2e 30 39 0a 00 00 00 00 31 00 00 00 00 00 00 |3.09.....1......|
00000010 31 00 00 00 00 00 00 0a 00 4e 41 4d 45 20 3f 0a |1........NAME ?.|
00000020 ff 53 43 52 45 45 4e 0a 2a 42 01 19 3c 01 1e 37 |.SCREEN.*B..<..7|
00000030 01 1c 2f 01 14 25 01 18 24 01 39 4c 01 31 42 01 |../..%..$.9L.1B.|
00000040 35 41 01 0a 46 01 0a 46 01 3d 4b 01 02 00 01 0a |5A..F..F.=K.....|
00000050 03 00 54 65 73 74 69 6e 67 20 4d 6f 76 69 65 20 |..Testing Movie |
00000060 4d 61 73 74 65 72 20 76 65 72 73 69 6f 6e 20 33 |Master version 3|
00000070 2e 30 39 11 11 31 11 31 0a |.09..1.1.|

hexdump -C TEST.SCR
00000000 34 2e 30 34 0a 00 00 00 00 31 00 00 00 00 00 00 |4.04.....1......|
00000010 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |1...............|
00000020 ff 0a 2a 42 01 00 19 3c 01 00 1e 37 01 00 1c 2f |..*B...<...7.../|
00000030 01 00 14 25 01 00 18 24 01 00 39 4c 01 00 31 42 |...%...$..9L..1B|
00000040 01 00 35 41 01 00 0a 46 01 00 0a 46 01 00 3d 4b |..5A...F...F..=K|
00000050 01 00 0a 18 01 00 0a 46 01 00 02 00 00 54 68 69 |.......F.....Thi|
00000060 73 20 69 73 20 61 20 74 65 73 74 20 6f 66 20 4d |s is a test of M|
00000070 6f 76 69 65 20 4d 61 73 74 65 72 20 53 63 72 69 |ovie Master Scri|
00000080 70 74 20 77 72 69 74 69 6e 67 20 73 6f 66 74 77 |pt writing softw|
00000090 61 72 65 2e 0a 01 03 00 00 31 0a 01 00 00 00 00 |are......1......|
000000a0 0a 03 01 0a |....|

hexdump -C COVER.SCR | head
00000000 33 2e 30 35 0a 01 00 00 00 31 00 00 00 00 00 00 |3.05.....1......|
00000010 31 00 00 00 00 00 00 0a ff 43 4f 56 45 52 0a 2a |1........COVER.*|
00000020 42 01 19 3c 01 1e 37 01 1c 2f 01 14 25 01 18 24 |B..<..7../..%..$|
00000030 01 39 4c 01 31 42 01 35 41 01 0a 46 01 0a 46 01 |.9L.1B.5A..F..F.|
00000040 3d 4b 01 06 00 00 0a 03 01 31 0a 01 03 00 00 11 |=K.......1......|
00000050 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 |................|
00000060 11 11 11 11 11 11 11 11 20 20 20 20 20 20 20 20 |........ |
00000070 20 20 20 20 20 20 20 20 20 20 20 20 20 22 4d 65 | "Me|
00000080 65 74 20 74 68 65 20 44 72 61 63 75 6c 61 73 22 |et the Draculas"|
00000090 11 11 11 11 11 20 20 20 20 20 20 20 20 20 20 20 |.....

hexdump -C DRAC2.SCR | head
00000000 34 2e 30 30 0a 01 00 2b 00 36 00 00 00 00 00 00 |4.00...+.6......|
00000010 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |5...............|
00000020 00 42 4f 42 0a 01 54 45 44 0a 02 43 41 52 4f 4c |.BOB..TED..CAROL|
00000030 0a 03 41 4c 49 43 45 0a 04 49 47 4f 52 0a 05 44 |..ALICE..IGOR..D|
00000040 45 4e 4e 49 53 0a 06 4d 55 46 46 49 4e 0a ff 53 |ENNIS..MUFFIN..S|
00000050 43 52 45 45 4e 0a 2a 42 01 00 19 3c 01 00 1e 37 |CREEN.*B...<...7|
00000060 01 00 1c 2f 01 00 14 25 01 00 18 24 01 00 39 4c |.../...%...$..9L|
00000070 01 00 31 42 01 00 35 41 01 00 0a 46 01 00 0a 46 |..1B..5A...F...F|
00000080 01 00 3d 4b 01 00 0a 18 01 00 0a 46 01 00 02 01 |..=K.......F....|
00000090 01 35 0a 03 00 45 58 54 20 54 45 44 20 44 52 41 |.5...EXT TED DRA|

The first thing to notice is they all start with the version number of the software which wrote the file. Really nice to have, but a terrible magic header. The files also all begin (after the version number) and end with the Hex value “0A”. Which happens to be a line feed control character. So super common, but could be helpful. Another pattern is that on the 9th byte it is “31” on most of the samples and “36” on one of them. “31” is the start of the ASCII number sequence, so could be the sequence number for the script as each SCR file could only store what was in memory.

I fear the rest of the format will have the same issue most word processors had at the time which is not having a header, but lots of formatting codes which may or may not be in every file, making programatic identification difficult. Might take awhile to identify all the formatting codes, but could lead to better identification and possibly an import module for tools like LibeOffice or Final Draft.

Screenshot of Movie Master 4.04 start screen

I didn’t find much different with Movie Master 4, seemed to have the same restrictions to 16 files in a script. The files from version 4 also seem to follow the same patterns from version 3. But both versions are different from the the Windows version of Movie Master, version 5. Click here for Movie Master 5 help menu on “Introduction for Movie Master DOS Users“.

There was another elusive script writing software title which adds to the confusion. Scriptware was another screenwriting software tool which seems to have had a large following. They produced a Windows and Macintosh version. It also started out for DOS and also used the SCR extension. The website is still active for the software, but hasn’t updated in 24 years. I wrote a little about in my post on PROmotion. All the demo versions out there are not useable demos, but animation demos. In this nice batch of old software on the Internet Archive I was able to find an early copy. Wasn’t able to get it to run, but the folder did have some samples.

hexdump -C SAMPLE1.SCR | head
00000000 32 5f 01 00 00 00 00 00 00 00 00 39 01 4a 5f 00 |2_.........9.J_.|
00000010 ff ff 2c 01 00 00 00 00 00 00 95 80 01 00 11 53 |..,............S|
00000020 63 72 69 70 74 77 61 72 65 20 53 63 72 69 70 74 |criptware Script|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000b0 00 00 00 00 00 00 00 00 11 00 02 00 02 00 14 00 |................|
000000c0 12 00 6f 02 f9 04 0b 00 7b 04 01 00 05 00 00 02 |..o.....{.......|
000000d0 00 11 00 00 00 0c 00 00 00 06 00 ed 01 05 06 00 |................|
000000e0 00 00 00 00 08 00 0b 00 00 00 04 00 00 00 04 00 |................|
000000f0 82 00 01 01 00 00 00 00 00 00 00 00 00 00 00 00 |................|

hexdump -C SAMPLE2.SCR | head
00000000 0b 53 63 72 69 70 74 77 61 72 65 1a 95 80 04 80 |.Scriptware.....|
00000010 1e 53 63 72 69 70 74 77 61 72 65 20 53 63 72 69 |.Scriptware Scri|
00000020 70 74 20 32 2e 32 33 3a 34 3b 37 30 32 32 31 00 |pt 2.23:4;70221.|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000a0 00 00 00 00 00 00 00 00 00 00 11 00 02 00 02 00 |................|
000000b0 11 00 00 00 34 02 01 05 0b 00 89 04 01 00 05 00 |....4...........|
000000c0 00 02 00 11 00 00 00 0b 00 00 00 06 00 aa 01 05 |................|
000000d0 06 00 00 00 00 00 08 00 0c 00 00 00 05 00 00 00 |................|
000000e0 05 00 8a 00 01 01 00 00 00 00 00 00 00 00 00 00 |................|

Luckily, they make it quite easy to identify these SCR files. ScriptWare was very popular and continued on with Windows and Macintosh versions. Later on, the format was changed along with the extension, which changed to SW3.

The SCR extension has been used often. On my desktop they default as a Paintbrush document. Apparently SCR is sometimes used as an extension for the ZSoft Paintbrush (PCX) format. It is also used on older postscript fonts on the Macintosh as a Type 1 screen font. Can also be a screensaver on Windows, but watch out, they can hide malicious code. You get the idea, SCR is a very common extension, identifying it up front can help avoid problems later!

Moral of the story is to never give up searching for old software and even though illegal copying of software should be avoided, I am grateful to those who help save abandoned software. Without them many titles would be lost.

I don’t have a good signature for these formats yet, but you can find a few samples on my GitHub page.

CD Architect

Receiving electronic media from an outside source can be an adventure. Often times you find yourself sorting the valuable files and separating them from the chaff. There can be hidden files, cache files, application files, drivers, and everything in between. Determining what formats are important can sometimes be difficult, especially if you don’t know the file format of some of the files.

I was recently working on a collection of files which had been produced through some audio software. When working with audio, a WAVE file is what is usually kept as they contain the actual audio data. With these files they came with a couple other formats. One of those formats was a bunch of SFK peak files. These files are meant to be temporary as they are generated from the WAVE file to make opening of audio data faster. They are important, but can easily be regenerated. One could argue they have historical value, but also they don’t contain anything that can be used by itself, so alone they don’t have much value.

The other format found with the WAVE files have a CDP extension. These came up as unknown when using DROID. It is not a common extension so finding the name of the software which created the files wasn’t too hard. Let’s take a look at one of them.

hexdump -C tutor1.cdp | head
00000000 52 49 46 46 79 03 00 00 53 46 50 4a 66 6d 74 20 |RIFFy...SFPJfmt |
00000010 18 00 00 00 00 00 01 00 02 00 00 00 10 00 00 00 |................|
00000020 44 ac 00 00 03 00 00 00 01 00 00 00 4c 49 53 54 |D...........LIST|
00000030 88 00 00 00 66 6c 73 74 66 69 6c 65 23 00 00 00 |....flstfile#...|
00000040 44 3a 5c 53 6f 75 6e 64 73 5c 4e 65 77 20 54 75 |D:\Sounds\New Tu|
00000050 74 6f 72 20 66 69 6c 65 73 5c 53 6f 6e 67 33 2e |tor files\Song3.|
00000060 77 61 76 00 66 69 6c 65 23 00 00 00 44 3a 5c 53 |wav.file#...D:\S|
00000070 6f 75 6e 64 73 5c 4e 65 77 20 54 75 74 6f 72 20 |ounds\New Tutor |
00000080 66 69 6c 65 73 5c 53 6f 6e 67 32 2e 77 61 76 00 |files\Song2.wav.|
00000090 66 69 6c 65 23 00 00 00 44 3a 5c 53 6f 75 6e 64 |file#...D:\Sound|

Huh, this is a RIFF file. RIFF is most commonly used as the container used for WAVE and AVI files. You can read more about the RIFF format on a previous post. The RIFF container format can be used for all sorts of things. Looking at the internals we can see a few unique list chunk’s.

Lots of references to other files, specifically WAVE files. But not a lot of actual data. That is because this format turns out to be just a project format for some software called “CD Architect“. Sonic Foundry was an audio software developer for a few years before they sold their catalog to Sony in 2003. In looking at the manual for CD Architect version 5.2, it explains the CDP Project format.

CD Architect software handles the organization of your CD using a small project file (CDP) that saves information about source file locations, edits, cuts, and insertion points. This project file is not a multimedia file, but is instead used to create the CD when editing is finished.

Looking at another CDP file from the collection, I noticed something different.

hexdump -C CDArch50a-s01.cdp | head
00000000 72 69 66 66 2e 91 cf 11 a5 d6 28 db 04 c1 00 00 |riff......(.....|
00000010 20 0a 00 00 00 00 00 00 84 38 15 b3 da 08 85 44 | ........8.....D|
00000020 b2 2a 5b 70 a1 32 15 ff 5a 2d 8f b2 0f 23 d2 11 |.*[p.2..Z-...#..|
00000030 86 af 00 c0 4f 8e db 8a 00 02 00 00 00 00 00 00 |....O...........|
00000040 78 00 00 00 00 00 04 00 11 00 00 00 44 ac 00 00 |x...........D...|
00000050 00 00 00 00 00 c0 52 40 00 00 00 00 00 00 5e 40 |......R@......^@|
00000060 00 00 00 00 00 00 00 00 04 00 04 00 40 00 00 00 |............@...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 7c 00 00 00 |............|...|
00000080 50 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

That’s odd, the RIFF format is always uppercase ASCII, this is lowercase. Also the important RIFF form, which was “SFPJ” in the other sample, is missing. This is not a valid RIFF format.

But further down in the file I can see the same list chunks. Did they take RIFF format and make a proprietary version of their own? I think they may have. It seems the first example was from CD Architect version 4 and these other files are from CD Architect version 5. That complicates things. Sony stopped developing CD Architect after version 5.2d and maintained it for a few years before selling many of their titles to MAGIX Software. As far as I know there was never any new versions released. The software was very popular, as it had some really nice audio mastering features and was easy to use. Many were upset when the software was abandoned.

Creating a signature for both version 4 and version 5 CDP files will be pretty straightforward. I feel knowing what you have in a collection you are processing is the first step in making informed decisions. Wether or not you keep the project files are up for debate. Some may only want the final audio created from a CD Architect project, while others may want to see the way the audio was put together and mixed. Either way, the more you know…..

One more thing. CD Architect would default to saving a CDP project file, but could also save a “CD Image file”. This process actually would save the project to a full WAVE file with some extras baked in.

An image file is essentially a wave file with volume, crossfades, effects, mixes, and track information embedded. Burning an image file will reduce the risk of buffer underruns (especially if you have a complex project or are using a slow computer) since no audio processing is required. 

Interesting, normally when working with track information in a single WAVE file you would need a companion CUE Sheet in order to reference the track layout of the Audio CD. So I am curious how they do all of this. Lets take a look at a “CD Image”.

mediainfo CDArch52d-s02.wav
General
Complete name : CDArch52d-s02.wav
Format : Wave
Format settings : PcmWaveformat
File size : 5.05 MiB
Duration : 30 s 0 ms
Overall bit rate mode : Constant
Overall bit rate : 1 411 kb/s
Conformance errors : 2
RIFF : Yes
General compliance : File size 5292434 is less than expected size 5292823 (offset 0x8)
WAVE : Yes
General compliance : Element size 5292811 is more than maximal permitted size 5292422 (offset 0xC)

Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 30 s 0 ms
Bit rate mode : Constant
Bit rate : 1 411.2 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 5.05 MiB (100%)

Already seeing some issues with the format, but all the important bits are there. JHOVE doesn’t like them much either.

JhoveView (Rel. 1.32.0, 2024-09-12)
Date: 2024-12-11 16:01:08 MST
RepresentationInformation: CDArch52d-s02.wav
ReportingModule: WAVE-hul, Rel. 1.8.3 (2024-03-05)
LastModified: 2024-12-11 15:58:02 MST
Size: 5292434
Format: WAVE
Status: Not well-formed
SignatureMatches:
WAVE-hul
InfoMessage: Ignored unrecognized list type: "pqls"
ID: WAVE-HUL-15
Offset: 5292044
ErrorMessage: Unexpected end of file: Bytes missing = 389
ID: WAVE-HUL-3
Offset: 5292434
MIMEtype: audio/vnd.wave; codec=1
Profile: PCMWAVEFORMAT

JHOVE is giving me two issues. The major error is the file appears truncated according to both MediaInfo and JHOVE. The InfoMessage which is less of an issue but more of a heads up that the WAVE file has an extra LIST type. “PQLS”, which was also in the CPD RIFF file we looked at earlier. So it seems by making a “CD Image” of a project embeds the project chunk data into the WAVE container. Identification is not an issue as these WAVE’s follow the standard pattern and therefore identify correctly, but one might want to be aware through further characterization these WAVE’s have some not so obvious extra data.

My attempts to find any samples from version 3 of CD Architect have failed. Until then, my proposal is to add version 4 & 5 to PRONOM with the signature on my Github page. There you will find a few samples as well.

KODAK TIFF

Years ago I bought my first digital camera. It was an Epson PhotoPC 3100z and I bought it because it could capture a digital image directly to a TIFF file. I don’t think most people would care about such a feature, but I thought it was awesome. Granted it filled up the small 32MB compact flash card pretty quick, I had to upgrade to a 512MB card, that set me back.

TIFF images are pretty universal, they have a well known structure and have been around for a very long time. I have written about TIFF’s before, so I wont go into too much about the format. The format is well respected in the preservation community, although one of the best websites, Aware Systems, documenting the various TIFF tags has gone dark in the this year, here is an archived version.

Many of the digital camera’s from the beginning to now use the TIFF format to store RAW sensor data. Most use their own extension and follow well established methods for storing the sensor data in an IFD with lots of common and custom tags. The DNG format is an open RAW format which uses the TIFF format to store sensor data, although many use SubIFD’s and can be incompatible with some software.

The first Digital Camera was invented by a Kodak employee, Steve Sasson in 1975, well, he was the first to use a CCD sensor in a self contained unit. This led Kodak to push the technology forward and in 1991 released the Kodak DCS digital system which used Nikon cameras equipped with a digital sensor. These early digital cameras were quite expensive, they used early CF cards and SCSI connections. Kodak released a few models of the DCS series, first on Nikon bodies, then on some Canon bodies. These early cameras used the TIFF format to store the RAW sensor data. For some reason, they decided to use a proprietary method and compression while still using the TIF extension.

Kodak was responsible for many new image file formats. Not sure why they decided to use a common format like TIFF and still use the TIF extension, but make it proprietary. The RAW file created by the DCS series of camera’s had to be opened with special plugins or software, if you tried to open the TIFF’s with anything else, you would only see the small thumbnail image located at IFD0 instead of the full size image hidden in a SubIFD1.

Finding samples of this format is particularly hard as they have the common TIF extension. The camera’s are also pretty rare and finding one is difficult, especially in working condition. I was only aware of a couple samples on the rawsamples.ch site, but that wasn’t enough to understand the format as the two files had a different structure.

hexdump -C RAW_KODAK_DCS460D_FILEVERSION_3.TIF | head
00000000 49 49 2a 00 00 03 00 00 7c 01 00 00 00 00 00 00 |II*.....|.......|
00000010 4b 4f 44 41 4b 20 20 20 20 20 20 20 20 20 20 20 |KODAK |
00000020 44 43 53 34 36 30 44 20 20 20 20 20 20 20 20 20 |DCS460D |
00000030 46 49 4c 45 20 56 45 52 53 49 4f 4e 20 33 20 20 |FILE VERSION 3 |
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 30 35 31 39 39 38 20 20 20 20 20 20 20 20 20 20 |051998 |
00000060 34 36 30 2d 32 39 35 30 00 00 00 00 00 00 00 00 |460-2950........|
00000070 31 39 39 30 3a 30 31 3a 30 31 20 31 32 3a 30 32 |1990:01:01 12:02|
00000080 3a 30 37 00 5b 20 32 5d 0d 49 53 4f 3a 20 20 20 |:07.[ 2].ISO: |
00000090 20 20 20 20 20 38 30 20 20 0d 41 70 65 72 74 75 | 80 .Apertu|

hexdump -C RAW_KODAK_DCS560C.TIF | head
00000000 4d 4d 00 2a 00 00 11 76 00 04 f7 50 00 00 00 00 |MM.*...v...P....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000040 54 68 69 73 20 69 6d 61 67 65 20 66 69 6c 65 20 |This image file |
00000050 77 61 73 20 63 72 65 61 74 65 64 20 62 79 20 61 |was created by a|
00000060 20 4b 6f 64 61 6b 20 44 43 53 35 36 30 43 20 64 | Kodak DCS560C d|
00000070 69 67 69 74 61 6c 20 63 61 6d 65 72 61 2e 20 28 |igital camera. (|
00000080 6e 75 6c 6c 29 20 20 00 00 00 00 00 00 00 00 00 |null) .........|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

There is/was a website called https://raw.pixls.us/, but it has been offline since last June, the regular site still works, but the raw sub-domain is unreachable. Luckily the wayback machine had archived a few samples.

I also found a reference on an older website referring to a sample set maintained by Kodak for developers using the SDK, but also no longer available. You can find the old website also on the wayback machine.

With a few more samples to refer to, it makes it easier to understand the headers and put together a signature. There was an SDK, but seems to be difficult to locate today, but the manual does give us a little more info on the different models and their format.

So from the SDK statement, the samples I have in TIF, and others I have in the more recent DCR format, I can conclude the custom TIF format was used with the DCS 3xx, 4xx, 5xx, 6xx models and from 7xx on the DCR format was used as the camera RAW. Looking closer at the samples in TIF, we can see all the 4xx models used the “FILE VERSION 3” version of the format, while the others have the full statement in the header. Not 100% clear on which format came first, but the 4xx models are some of the earliest models.

At the time, there was only Kodak software that could properly “develop” the RAW file taken by these camera models. Today that has changed and the format has been added to many open source libraries such as libraw and rawspeed. Many other commercial products also claim to support the DCS models including Adobe Camera Raw, which seems to be able to open these TIF’s.

Distinguishing these RAW TIF’s is important to properly manage them over the long term. These images currently identify in the PRONOM repository as regular TIF’s, fmt/353, so we would need to create a signature which identifies the standard TIFF header, but also uses bytes unique to this format. In the few samples I have the “VERSION 3” images all start with the litte-endian header, “49492A00”, while the other samples start with the big-endian header, “4D4D002A”. That makes it a little easier for each signature.

For for the “VERSION 3” format we could use a pattern such as 49492A00{12}4B4F44414B{11}(444353|454F53444353). This looks for the TIFF header, skips 12 bytes, looks for the word “KODAK”, skips 11 more bytes to then look for either “DCS” or “EOSDCS” right before the camera model number.

For the other format we also look for the TIFF header, but then find the whole string used in all the samples. 4D4D002A{60}5468697320696D6167652066696C652077617320637265617465642062792061204B6F64616B20444353{5}6469676974616C2063616D6572612E

This looks for the big-endian header, then the string, “This image file was created by a Kodak DCS”, skipping the model number, then the end of the string, “digital camera.” This should catch all the different models of this format.

You can find my proposed signature on my GitHub page, since none of the samples belong to me, you can find them above in some of the links.

RealVideo

For #WDPD24 and PRONOM Hackathon week this year, I want to find some older formats listed which did not have a signature. There is a list to choose from, but I wanted to find something I hadn’t worked on before. I came across two entries for Real Video:

PUIDNameExtension
fmt/204RealVideo Cliprv
x-fmt/277Real Videorv

I was familiar with Real Media and Real Audio, but had yet to come across any RealVideo with the RV extension. I thought it would be easy to find some references and samples, but that was not the case. I assume PRONOM originally added these based on MIME types available.

Real or RealNetworks is/was an Internet media company who jumped on the rapidly growing World Wide Web in 1995 to become a leader in Internet Media Delivery. Their initial offerings mainly focused on audio streaming and they accomplished all of this by providing free players and web browser extensions to make it easy to serve up a website with streaming media everyone could enjoy. Later adding video streaming optimized for the slower dialup and connections of the day. They used codecs based on common technology like H.263 and H.264, but used then to make their own proprietary codecs identified through FourCC codes, RV10-RV60.

So thought it would be easy to find a reference to the RV extension, I quickly discovered it wasn’t. Looking at the Wikipedia page on RealVideo, I found no reference to the RV extension. RV is an abbreviation for RealVideo, right? Well, I ended up finding a reference in the RealAudio page under file extensions. Ok, First clue to the existence of the RV extension. The page references RV as being used for video only files and was used by the flagship encoder (RealProducer).

RealProducer was the tool for creating the streaming audio and video formats that could then be used for your website or streaming platform. The RealProducer software came in a Basic version, which was free, and the Plus or Pro version, which was not free and provided more options. The first version of RealProducer to make video files was version 4. I was able to find a copy of the encoder and installed it under a Windows 95 emulator. To my surprise it only saved to the RealMedia RM file format. This format is well known and identified with PRONOM as x-fmt/190 also documented at the LoC.

This was the same with RealProducer 5, 7, 8, 9, and 10 that I was able to try. All made no mention of the RV extension. I was starting to feel this format didn’t exist or that some decided to use the RV extension on their own. Searches on Google yielded a couple results, mostly from users who had found a few files on their older discs and wanted to migrate them to something newer. I was able to find one example, one user shared, but it had the same header as the RealMedia format. The clue was in the file.

hexdump -C ambush_abb.rv
00000000  2e 52 4d 46 00 00 00 12  00 01 00 00 00 00 00 00  |.RMF............|
00000010  00 07 50 52 4f 50 00 00  00 32 00 00 00 03 6e e8  |..PROP...2....n.|
00000020  00 03 6e e8 00 00 03 e0  00 00 01 b3 00 00 6a 6f  |..n...........jo|
00000030  00 06 80 fa 00 00 08 b5  00 ba 41 73 00 00 03 55  |..........As...U|
00000040  00 03 00 09 43 4f 4e 54  00 00 00 40 00 00 00 00  |....CONT...@....|
00000050  00 00 00 08 28 43 29 20  32 30 30 35 00 26 00 00  |....(C) 2005.&..|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000270  00 09 61 75 64 69 6f 4d  6f 64 65 00 00 00 02 00  |..audioMode.....|
00000280  06 76 6f 69 63 65 00 00  00 00 2d 00 00 0d 43 72  |.voice....-...Cr|
00000290  65 61 74 69 6f 6e 20 44  61 74 65 00 00 00 02 00  |eation Date.....|
000002a0  13 39 2f 32 30 2f 32 30  30 36 20 31 34 3a 30 37  |.9/20/2006 14:07|
000002b0  3a 30 38 00 00 00 00 53  00 00 0c 47 65 6e 65 72  |:08....S...Gener|
000002c0  61 74 65 64 20 42 79 00  00 00 02 00 3a 52 65 61  |ated By.....:Rea|
000002d0  6c 50 72 6f 64 75 63 65  72 28 52 29 20 42 61 73  |lProducer(R) Bas|
000002e0  69 63 20 31 31 2e 30 20  66 6f 72 20 57 69 6e 64  |ic 11.0 for Wind|
000002f0  6f 77 73 2c 20 42 75 69  6c 64 20 31 31 2e 30 2e  |ows, Build 11.0.|
00000300  30 2e 32 30 30 39 00 00  00 00 31 00 00 11 4d 6f  |0.2009....1...Mo|
00000310  64 69 66 69 63 61 74 69  6f 6e 20 44 61 74 65 00  |dification Date.|
00000320  00 00 02 00 13 39 2f 32  30 2f 32 30 30 36 20 31  |.....9/20/2006 1|
00000330  34 3a 30 37 3a 30 38 00  00 00 00 1d 00 00 09 76  |4:07:08........v|
00000340  69 64 65 6f 4d 6f 64 65  00 00 00 02 00 07 6e 6f  |ideoMode......no|
00000350  72 6d 61 6c 00 44 41 54  41 00 ba 3e 1e 00 00 00  |rmal.DATA..>....|

RealProducer Basic 11 for Windows. The Wikipedia article did hint at this by saying “the latest version of RealProducer reverted to using .ra for audio only files and began using .rv for video files with or without audio.” Why would they use the RM extension for so long, then revert to a different extension with a later version? I found more in the User Manual for version 11.

• .rv – RealVideo
RealProducer uses the .rv file extension if the input is video-only or video-with-audio. You can also select the .rm file extension for video content.
Tip: Using the .rv file extension helps search engines identify the file as a RealVideo clip.

• .rm – RealAudio or RealVideo
RealProducer chooses the .rm file extension if it cannot determine the content of the input clip. You can use .rm file extension for any RealAudio or RealVideo clip, except for variable bit-rate clips.

Ok, so a few things to learn from this. One is the RV extension was used as the default for version 11 as they wanted search engines to identify them as a RealVideo clip. Second thing we learned is there is no difference between the two placeholders in PRONOM, one being a RealVideo file and the other being a RealVideo Clip. We don’t need both.

Now, is there any difference between an RV and RM file?

hexdump -C Producer11-01.rv | head
00000000 2e 52 4d 46 00 00 00 12 00 01 00 00 00 00 00 00 |.RMF............|
00000010 00 07 50 52 4f 50 00 00 00 32 00 00 00 03 6e e8 |..PROP...2....n.|
00000020 00 03 6e e8 00 00 03 e0 00 00 01 c7 00 00 01 66 |..n............f|
00000030 00 00 1b 57 00 00 07 41 00 02 91 0a 00 00 03 5e |...W...A.......^|
00000040 00 03 00 09 43 4f 4e 54 00 00 00 40 00 00 00 00 |....CONT...@....|
00000050 00 00 00 08 28 43 29 20 32 30 30 35 00 26 00 00 |....(C) 2005.&..|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 00 00 00 00 4d 44 50 52 00 00 00 70 00 00 00 00 |....MDPR...p....|
00000090 00 02 c2 a4 00 02 c2 a4 00 00 03 e0 00 00 01 9f |................|

hexdump -C Producer11-01.rm | head
00000000 2e 52 4d 46 00 00 00 12 00 01 00 00 00 00 00 00 |.RMF............|
00000010 00 07 50 52 4f 50 00 00 00 32 00 00 00 03 6e e8 |..PROP...2....n.|
00000020 00 03 6e e8 00 00 03 e0 00 00 01 a4 00 00 01 64 |..n............d|
00000030 00 00 1b 57 00 00 05 a4 00 02 5c 35 00 00 03 5e |...W......\5...^|
00000040 00 03 00 09 43 4f 4e 54 00 00 00 40 00 00 00 00 |....CONT...@....|
00000050 00 00 00 08 28 43 29 20 32 30 30 35 00 26 00 00 |....(C) 2005.&..|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 00 00 00 00 4d 44 50 52 00 00 00 70 00 00 00 00 |....MDPR...p....|
00000090 00 02 c2 a4 00 02 c2 a4 00 00 03 e0 00 00 01 a4 |................|

They both look very similar to me. Aside from a few bytes, they are practically identical. Lets see what MediaInfo has to say.

mediainfo Producer11-01.rv
General
Complete name : Producer11-01.rv
Format : RealMedia
File size : 164 KiB
Duration : 6 s 999 ms
Overall bit rate : 225 kb/s
Frame rate : 24.000 FPS
Copyright : (C) 2005
FileExtension_Invalid : rm rmvb ra

Video
ID : 0
Format : RealVideo 4
Codec ID : RV40
Codec ID/Info : Based on AVC (H.264), Real Player 9
Duration : 6 s 999 ms
Bit rate : 181 kb/s
Width : 640 pixels
Height : 424 pixels
Display aspect ratio : 3:2
Frame rate : 24.000 FPS
Bits/(Pixel*Frame) : 0.028
Stream size : 155 KiB (94%)

Audio
ID : 1
Format : Cooker
Codec ID : cook
Codec ID/Info : Based on G.722.1, Real Player 6
Duration : 7 s 429 ms
Bit rate : 44.1 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 40.0 KiB (24%)

mediainfo Producer11-01.rm
General
Complete name : Producer11-01.rm
Format : RealMedia
File size : 151 KiB
Duration : 6 s 999 ms
Overall bit rate : 225 kb/s
Frame rate : 24.000 FPS
Copyright : (C) 2005

Video
ID : 0
Format : RealVideo 4
Codec ID : RV40
Codec ID/Info : Based on AVC (H.264), Real Player 9
Duration : 6 s 999 ms
Bit rate : 181 kb/s
Width : 640 pixels
Height : 424 pixels
Display aspect ratio : 3:2
Frame rate : 24.000 FPS
Bits/(Pixel*Frame) : 0.028
Stream size : 155 KiB

Audio
ID : 1
Format : Cooker
Codec ID : cook
Codec ID/Info : Based on G.722.1, Real Player 6
Bit rate : 44.1 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits

Other than the RV file having a invalid file extension, they both identify as a RealMedia file and have identical properties. So it seems the RV file is really no different than the RM file. I think the best course of action for PRONOM is to deprecate these two RV PUID’s and just ad RV as an acceptable extension for the RealMedia format.

To add to the evidence, here is the output from ffprobe:

Input #0, rm, from 'Producer11-01.rm':
Metadata:
copyright : (C) 2005
comment :
ASMRuleBook : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
Audiences : 256k DSL or Cable;
audioMode : music
Creation Date : 11/12/2024 20:28:55
Generated By : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
Modification Date: 11/12/2024 20:28:55
videoMode : normal
Duration: 00:00:07.00, start: 0.000000, bitrate: 176 kb/s
Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s

Input #0, rm, from 'Producer11-01.rv':
Metadata:
copyright : (C) 2005
comment :
ASMRuleBook : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
Audiences : 256k DSL or Cable;
audioMode : music
Creation Date : 11/12/2024 20:28:16
Generated By : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
Modification Date: 11/12/2024 20:28:16
videoMode : normal
Duration: 00:00:07.43, start: 0.000000, bitrate: 181 kb/s
Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s

But wait, there are a couple formats we could add which are related to RealProducer. RealProducer used a few other formats to manage projects and other metadata for streaming. They include:

  • .RP RealPix Image
  • .RT RealText
  • .RPAD RealProducer Audience File
  • .RPJF RealProducer Job File
  • .RPSD RealProducer Server Destination
  • .RMHD RealMediaHD file
  • .RAM Playlist
  • .RPM Embedded RAM
File TypeExtensionMIME Type
Ram.ramaudio/x-pn-realaudio
Embedded Ram.rpmaudio/x-pn-realaudio-plugin
SMIL.smil and .smiapplication/smil
RealAudio.raaudio/x-pn-realaudio
RealVideo.rmapplication/x-pn-realmedia
Flash.swfapplication/x-shockwave-flash
RealPix.rpimage/vnd.rn-realpix
RealText.rttext/vnd.rn-realtext
https://web.archive.org/web/20120513203726/http://service.real.com/help/library/guides/production8/htmfiles/server.htm

Don’t get excited, the RealPix Image format really isn’t an image, it is simply an XML file with all the details of an image or group of images. Pretty boring. It was however a big thing in the day, even got a full guide written up for the process. “All information in the file occurs between an opening <imfl> tag and a closing </imfl> tag. This is the only tag that uses an end tag.” This format was the topic of discussion as malicious code could be in the RP file and executed just by having someone load your webpage. IMFL is obviously an acronym, but none of the documents I could find tells me what it stands for, so I did what everyone does now, I asked ChatGPT.

The RealPix format by RealNetworks, which was used for interactive multimedia content, indeed utilized IMFL as its tagged format. IMFL stands for “Interleaved Media File Language.” This markup was particularly designed to handle multimedia presentations, allowing the synchronization of images, audio, and video in a slideshow-style format. It used XML-like syntax where elements like <imfl>, <head>, and <fadein/> defined media objects, transitions, and their timing. Key components included attributes for positioning, color, and animation effects, making RealPix a flexible format for creating multimedia sequences compatible with RealPlayer.

For technical details, the RealPix format closely resembles SMIL (Synchronized Multimedia Integration Language) and supports strict tag closure and case sensitivity. This means all tags and attribute names must be lowercase, and attributes must be in double quotes, as seen in SMIL and RealSystem G2 markup, RealNetworks’ broader multimedia framework.

When I asked for a source, it could not give me one. So not sure if it is the correct answer, but it seems to fit. Here are some samples of RP, RT and SMIL files.

For RealText with the RT extension, we find a similar tagged text. This format is used to provide text presentations to go along with Images, Audio, or Video. The tagged text then describes when and how the text is displayed. This is all done in a player window, therefore the root tag of these RT documents starts and ends with <window>. I guess these could be considered a subtitle format for streaming media.

The SMIL files is interesting, it is known standard, but in many cases, does not have an XML declaration, therefore not identified by current PRONOM. They are used to link everything together. I might suggest a variant of the SMIL format to not have the XML declaration to identify these formats correctly.

<smil>
<body>
<par>
<textstream src=”rtsp://realserver.company.com/mary.rt”/>
<video src=”rtsp://realserver.company.com/mary.rm”/>
</par>
</body>
</smil>

The .RPAD RealProducer Audience File, .RPJF RealProducer Job File, .RPSD RealProducer Server Destination are all XML files for managing some of the configuration found in the RealProducer software.

cat 56k\ Dial-up.rpad
<?xml version="1.0" encoding="UTF-8"?>
<audience xmlns="http://ns.real.com/tools/audience.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.real.com/tools/audience.2.0 http://ns.real.com/tools/audience.2.0.xsd">
<avgBitrate type="uint">34000</avgBitrate>
<maxBitrate type="uint">68000</maxBitrate>
<streams>

cat RealProducer11-01.rpjf
<?xml version="1.0" encoding="UTF-8"?>
<job xmlns="http://ns.real.com/tools/job.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ns.real.com/tools/job.2.0 http://ns.real.com/tools/job.2.0.xsd">
<enableTwoPass type="bool">true</enableTwoPass>
<clipInfo>

cat Multicast\ Push\ Server.rpsd
<?xml version="1.0" encoding="UTF-8"?>
<destination xsi:type="pushServer" xmlns="http://ns.real.com/tools/server.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.real.com/tools/server.2.0 http://ns.real.com/tools/server.2.0.xsd">
<pluginName type="string">rn-server-rbs</pluginName>

Those three formats should be easy enough, especially if we look for Namespace urls.

The RAM and RPM formats are simply text files with a URL. You can find some samples here and here.

An RM and RV file are the same format as the RMVB file but just with a variable bitrate. Later on a new format was used to improve the quality of video. This format has the extension RMHD, referring to RealMedia HD. Let’s take a look.

hexdump -C DSC_0009.rmhd | head
00000000 2e 52 4d 50 00 00 00 12 00 01 00 00 00 00 00 00 |.RMP............|
00000010 00 07 50 52 4f 50 00 00 00 36 00 02 00 04 f7 33 |..PROP...6.....3|
00000020 00 04 f7 33 00 00 11 bd 00 00 02 5d 00 00 01 d2 |...3.......]....|
00000030 00 00 1b 2e 00 00 00 00 00 00 00 00 00 04 65 68 |..............eh|
00000040 00 00 01 6f 00 02 00 03 43 4f 4e 54 00 00 00 12 |...o....CONT....|
00000050 00 00 00 00 00 00 00 00 00 00 4d 44 50 52 00 00 |..........MDPR..|
00000060 00 76 00 00 00 00 00 03 24 64 00 03 24 64 00 00 |.v......$d..$d..|
00000070 11 bd 00 00 04 2a 00 00 00 00 00 00 00 00 00 00 |.....*..........|
00000080 1b 2e 0c 56 69 64 65 6f 20 53 74 72 65 61 6d 14 |...Video Stream.|
00000090 76 69 64 65 6f 2f 78 2d 70 6e 2d 72 65 61 6c 76 |video/x-pn-realv|

The format looks very similar, but has the magic header of .RMP instead of .RMF. MediaInfo and FFProbe are unaware of the format. The software mentions a RV11 codec which is confusing as the codecs went from RV10-RV60.

Phew, that was a lot considering the two formats I tried to research came up the same as an existing format. There are probably others I have missed. I did see a reference to an RMX format which seems to be an encrypted RM file. The header is the same so it will identify as a RealMedia file, but with the wrong extension. Let me know if you come across any. I have some samples of the formats mentioned here, plus a proposal of new signatures on my Github repository.

HFE

Last week I had the pleasure of attending the 20th annual iPres conference on Digital Preservation in Ghent, Belgium. I enjoyed hearing from many of my respected colleagues on many aspects of preservation including one of my favorite topics, floppy disks. There was tutorials, lightning talks, and even a workshop, presented by Leontien Talboom, Elizabeth Kata, Chris Knowles, and myself. We titled the workshop “A Guide to Imaging Obscure Floppy Disk Formats“. The workshop was conceived by a mutual interest in imaging Wang 5.25in word processor disks, but expanded to include imaging of Amstrad 3in disks, 240K Brother Typewriter Disks, and Macintosh 400/800k disks.

I brought my hand soldered FluxEngine board and others brought their Greaseweazle board to show off how imaging obscure and uncommon disks can be done on a budget.

Photo of workshop taken on a Mavica Floppy Disk camera
Image taken during workshop on a Mavica FD200 Floppy Disk Camera.

During the conference we talked a bit about the different type of hardware that can be used and the difference between a disk image and flux image. There seems to be quite the exhaustive list of different types of file formats, some specific to a platform and others more generic. I recently did a blog post on the formats used by the Applesauce software, which have some unique features.

There are many disk image types which should be researched and added to PRONOM and other format description sites, but today lets take a look at a generic format used by many tools.

The HxC Floppy Emulator file format which the extension HFE is a popular format used with floppy drive emulators. There is a lot of complexity with what is included in many of these image formats, some are simply a raw sector representation of the binary data on a disk, others contain the complete flux readings from a floppy disk. The HFE format contains a little more than a raw image, including a header, a track lookup table, and the bitstreams for each track all with the purpose of emulating the physical media. The HFE format contains only a single pass over the data, where other formats may contain multiple reading of each track to get more complete data which can be helpful for damaged or purposely copy-protected disks. You can read more on Ashley’s blog, Library of Congress format description.

HFE version list

When using the HxC Floppy Emulator software, you can open and save to many different formats. The main format being their HFE native format. It comes in 5 versions.

hexdump -C test01.hfe | head
00000000 48 58 43 50 49 43 46 45 00 53 02 00 e8 01 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

Above is a hexdump of the main SDCard HxC Floppy Emulator file format. The format specification shows the 8 byte header “HXCPICFE”. This is a very unique pattern and should be all we need to make a robust signature for the format, but we do need to take into account the other HFE “versions” and see if they might clash or need to be identified separately.

hexdump -C test02-a2.hfe | head 
00000000 48 58 43 50 49 43 46 45 00 53 02 00 d0 03 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “A2” version of the format has the same header but some different bytes further into the file.

hexdump -C test03-rev2.hfe | head
00000000 48 58 43 50 49 43 46 45 01 53 02 00 00 00 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “Rev 2” version also has the same header. But if you look at the 9th byte you can see the value changed from 00 to 01, which according to the specification, this is the revision byte.

hexdump -C test04-rev3.hfe | head 
00000000 48 58 43 48 46 45 56 33 00 53 02 00 e8 01 00 00 |HXCHFEV3.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

With “Rev 3” we see a change in the header with “HXCHFEV3” which appears to be referred to as HFEv3.

hexdump -C test05-stream.hfe | head 
00000000 48 78 43 5f 53 74 72 65 61 6d 5f 49 6d 61 67 65 |HxC_Stream_Image|
00000010 00 00 00 00 00 00 00 00 00 18 00 00 00 02 00 00 |................|
00000020 00 1a 00 00 53 00 00 00 02 00 00 00 40 9c 00 00 |....S.......@...|
00000030 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This last format seems to be a special HxC stream image.

It seems the best option is to make three signatures to identify the three main headers. Additional software can be used to further parse the disk image. If you would like to see some sample images, you can download a bunch here. You can also take a look at my GitHub repository to see additional samples and a proposed set of signatures.

ATRAC

The year was 2001 and I found myself in need of an audio player and recorder. I had been burning CD’s for a few years, making mixed CD’s was fun and convenient, but I needed more flexibility. After some research I decided on a device that was super popular outside the United States, but was gaining some loyal fans.

This MZ-G750 MiniDisc device could record in a standard high quality mode through RCA, optical digital cable, and an optional microphone in mini-plug. This model also had the LP2 and LP4 modes which compressed higher, but could record up to 320 minutes on one MD disc.

Sony accomplished this by using a propriety compression codec called ATRAC, or Adaptive TRansform Acoustic Coding. This compression format was used with the MiniDisc and other Sony devices like the the flash memory Walkman’s sold later.

I recorded and stored a lot of music on the few disc’s I purchased over the next year, but as you may have surmised, the iPod came out later that year. I waited a bit but eventually purchased the updated 10GB model and the MiniDisc only was used to make a few recordings over the next little while.

As good as the MiniDisc is, the model I owned could record in a digital format, but lacked the connections to transfer the audio to a computer unless you used the optical cable and captured in real time to a computer with an optical input. This was by design, even when they put USB ports on later models, the software only allowed sending audio to the MiniDisc, but not back from the device.

A few years back I heard of some work the community has done to bring MiniDisc’s back from shadows. Now there is a thriving market and some models can cost a pretty penny. With that came some great tools and the ability to copy from the device back to the computer. The only problem, my device lacks a USB port. I kept my eye out for a “good” deal on a NetMD MiniDisc device. It took some time, but I am happy to report I am now the proud owner of a MZ-N420D.

With a new USB capable NetMD in hand, lets take a look at the different ATRAC formats!

The most common ATRAC formats are the ATRAC3 versions which generally have the extension OMA or OMG. But let’s start with ATRAC1, the format used on my earlier MiniDisc device when captured in Standard Mode. Using the amazing https://webmd.pro/ tool, I was able to connect my new device and “archive” my disc.

hexdump -C Test1.aea | head
00000000 00 08 00 00 54 65 73 74 31 00 00 00 00 00 00 00 |....Test1.......|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 00 00 00 1e 01 00 00 02 00 00 00 00 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000b50 0c a0 45 57 54 44 32 35 41 44 22 34 32 24 13 23 |..EWTD25AD"42$.#|
00000b60 32 23 22 12 11 11 11 11 76 18 69 75 f8 63 69 a7 |2#".....v.iu.ci.|
00000b70 a4 5d 46 22 45 36 1f 59 55 9d 41 55 19 51 45 17 |.]F"E6.YU.AU.QE.|
00000b80 45 14 55 38 c2 cb 2c b2 88 26 fd b2 17 b3 f0 0f |E.U8..,..&......|

ffprobe -i Test1.aea
[aea @ 0x7fc5e6c04fc0] Estimating duration from bitrate, this may be inaccurate
Input #0, aea, from 'Test1.aea':
Duration: 00:00:01.63, bitrate: 302 kb/s
Stream #0:0: Audio: atrac1, 44100 Hz, stereo, fltp, 292 kb/s

ATRAC1 files can have the AEA extension, which ffmpeg can decode, but MediaInfo doesn’t appear to have added the support. According to the decoder the magic numbers for the ATRAC1 format are “Magic is ‘00 08 00 00‘ in little-endian”. This pattern matches my files, but the recent addition PRONOM fmt/1968 doesn’t match all the samples I have.

The magic numbers are too simple to be the only pattern used in a signature. The Track title follows the magic numbers but are not static. Then there are quite a bit of zero bytes, like a lot. All the samples I have seem to have some data around the 260 offset, then more zero bytes until around 2400 to 2800 byte offset range. I scanned all the samples I have through Tridscan, and it looks like the only bytes in common are the magic header, lots of zero’s, and a few strings.

	<GlobalStrings>
<String>ED33</String>
<String>EUD3</String>
<String>FTDC</String>
<String>T322</String>
<String>TC32</String>
<String>TC43</String>
<String>UC22</String>
<String>UED3</String>
<String>VD33</String>
<String>VETC</String>
<String>WEDD</String>
</GlobalStrings>

The ffmpeg libavformat code does tell us at byte 264 there will be a 01 or 02 which indicates channels. 44.1 kHz is assumed and the bitrate is calculated from a constant by how many channels, so not much else to identify common patterns. More testing needed.

ATRAC3 is what allowed my original MiniDisc to record in LP2 and LP4, extending the recording time. This format was also how some DRM was added to the device and computer to allow for some checking-in and checking-out of files, but to control their use. This was done with Desktop software from Sony, originally in the form of the title SonicStage, later incorporating OpenMG to manage the DRM. I used SonicStage to encode some audio into OMG and OMA formats.

OpenMG format files

These are audio files which have been converted to ATRAC3 format and encrypted in OpenMG format, which is the copyright protection technology for audio contents specific to OpenMG (with the extension .omg).

hexdump -C 01-Untitled.omg | head
00000000 30 80 30 80 06 07 66 6f 70 65 6e 4d 47 02 02 03 |0.0...fopenMG...|
00000010 eb 04 14 01 0f 50 00 00 04 00 00 00 ba d0 90 49 |.....P.........I|
00000020 3d 7f 61 7b 91 c4 30 06 02 67 01 02 02 3f 00 06 |=.a{..0..g...?..|
00000030 02 68 01 02 04 00 59 47 80 02 01 00 02 03 02 03 |.h....YG........|
00000040 a0 02 02 01 80 02 01 00 00 00 04 08 f5 94 79 c9 |..............y.|
00000050 6b 78 75 22 04 84 00 59 5e 30 83 0b 71 39 e3 e8 |kxu"...Y^0..q9..|
00000060 27 29 00 00 00 00 00 00 00 00 26 e2 65 d0 de e0 |')........&.e...|
00000070 69 19 73 45 1c c4 3b 36 8d 02 3b 72 bd eb 84 df |i.sE..;6..;r....|
00000080 cd 20 4e 43 d3 e3 23 8a 3f 9e df 80 f1 86 d1 aa |. NC..#.?.......|
00000090 2b 93 bf 09 59 0d d6 8f 78 5d 45 3a 9f d8 79 8b |+...Y...x]E:..y.|

ffprobe -i /01-Untitled.omg
[oma @ 0x7fed2440e980] Format oma detected only with low score of 1, misdetection possible!
[oma @ 0x7fed2440e980] Couldn't find the EA3 header !
/01-Untitled.omg: Invalid data found when processing input

The good news is there appears to be a standard header for the OMG format, but ffmpeg assumes they are OMA files. Turns out OMG was the original form of the format, but was replaced with OMA starting with SonicStage v2.1.

hexdump -C 01-Untitled.oma | head
00000000 65 61 33 03 00 00 00 00 17 76 54 49 54 32 00 00 |ea3......vTIT2..|
00000010 00 17 00 00 02 00 55 00 6e 00 74 00 69 00 74 00 |......U.n.t.i.t.|
00000020 6c 00 65 00 64 00 28 00 31 00 29 54 41 4c 42 00 |l.e.d.(.1.)TALB.|
00000030 00 00 11 00 00 02 00 55 00 6e 00 74 00 69 00 74 |.......U.n.t.i.t|
00000040 00 6c 00 65 00 64 54 58 58 58 00 00 00 17 00 00 |.l.e.dTXXX......|
00000050 02 00 4f 00 4d 00 47 00 5f 00 54 00 52 00 41 00 |..O.M.G._.T.R.A.|
00000060 43 00 4b 00 00 00 31 54 58 58 58 00 00 00 25 00 |C.K...1TXXX...%.|
00000070 00 02 00 4f 00 4d 00 47 00 5f 00 41 00 4c 00 42 |...O.M.G._.A.L.B|
00000080 00 4d 00 53 00 00 00 55 00 6e 00 74 00 69 00 74 |.M.S...U.n.t.i.t|
00000090 00 6c 00 65 00 64 54 58 58 58 00 00 00 23 00 00 |.l.e.dTXXX...#..|
*
00000c00  45 41 33 03 00 60 ff 80  00 00 00 00 01 0f 50 00  |EA3..`........P.|
00000c10  00 04 00 00 00 60 8a 07  e3 0a c9 91 63 46 c6 bc  |.....`......cF..|
00000c20  22 52 03 76 00 05 66 48  00 00 3b 86 00 00 00 00  |"R.v..fH..;.....|
00000c30  00 00 20 30 00 00 00 00  00 00 00 00 00 00 00 00  |.. 0............|
00000c40  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

ffprobe -i 01-Untitled.oma
Input #0, oma, from '01-Untitled.oma':
Metadata:
title : Untitled(1)
album : Untitled
OMG_TRACK : 1
OMG_ALBMS : Untitled
OMG_ASGTM : 2366000
OMG_TIT2S : Untitled(1)
TLEN : 353000
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Audio: atrac3al ([34][0][0][0] / 0x0022), 44100 Hz, stereo, fltp

We learned from trying an OMG file in ffprobe that ffmpeg is looking for EA3 header, which is found in this OMA file. Both of these formats should have a nice header to work from for a signature. In fact there has already been a request and signature submitted for the OMA format. Mine are slightly different, but only takes a small tweak to work with all my samples. Also, it seems the extension AA3 was used for awhile before settling on OMA. OMA can have a few different types:

ffprobe -i 02-Untitled.oma 
[oma @ 0x7fbc7ef047c0] Estimating duration from bitrate, this may be inaccurate
Input #0, oma, from '/Star Trek/02-Untitled.oma':
Metadata:
title : Untitled(2)
album : Star Trek
OMG_TRACK : 2
OMG_ALBMS : Star Trek
OMG_ASGTM : 2366000
OMG_TIT2S : Untitled(2)
TLEN : 27000
Duration: 00:00:27.21, start: 0.000000, bitrate: 193 kb/s
Stream #0:0: Audio: atrac3p ([1][0][0][0] / 0x0001), 44100 Hz, stereo, fltp, 192 kb/s

I’ll leave the technical properties to be handled by tools more suited for parsing the format like ffmpeg. Maybe MediaInfo could have the formats added, but until then, it might be best to simply identify the main format. I am also aware of some later additions to the ATRAC family, such as ATRAC3plus, ATRAC Advanced Lossless, and ATRAC9 (WAV RIFF). There are other extensions like AT3 out there which use the ATRAC codec, like Sony’s Playstation or PSP. I will have to keep my eyes out for the even more elusive Hi-MD MiniDisc devices to find out more. For now, take a look at some samples and my proposal for signatures on my GitHub.

ePic

Image compression has been around for awhile. It seems everyone took a crack at making better algorithms to improve quality and size. Some chose to invent new ways and others chose to use existing methods but with their own flare. Kodak tried this with their PhotoCD, but there was a couple other photo processing options that popped up in 90’s. One was Seattle FilmWorks and another was Konica PC PictureShow. Both of which used “proprietary” formats to deliver developed film on disk.

Seattle FilmWorks later called PhotoWorks, used an image format with the extension SFW and was based on BMP and JPG, but with their own twist. The same goes for the format used by Konica’s PC PictureShow.

Konica PC PictureShow Disk

If you took your film in to be developed at one of Konica’s photo labs, you could could have those images put on a diskette or later a CD-R. The disks came with software to view your photos called PC PhotoShow. The images stored on disk where in another proprietary format with the extension KQP. The KQP format was actually licensed from another company called Pegasus Imaging Corporation, later known as Accusoft. They developed their own way to compress a JPEG file which they called an ePic. An SDK called PICTools was offered for many years, but seems not to be available anymore.

ePIC (Proprietary)
  • Supports PIC format compression, replacing the JPEG Huffman encoder with the proprietary ELS entropy encoder for 15% more compression.
  • Can be losslessly converted back to JPEG format using Op_RORE.

A search on the internet for Konica KQP shows quite a few people over the years wondering what to do with their old disks and converting the old format to JPG, only to find a lack of information and available tools to do so. One such person used python to edit the file and making the file renderable as a JPG. While the method worked well for their KQP files, it might not work for all of them. Let’s look closer and understand why.

hexdump -C Sample.PIC | head
00000000 42 4d 00 00 00 00 00 00 00 00 42 04 00 00 44 00 |BM........B...D.|
00000010 00 00 34 08 00 00 24 fa ff ff 01 00 18 00 4a 50 |..4...$.......JP|
00000020 45 47 00 00 00 00 00 00 00 00 00 00 00 00 fc 00 |EG..............|
00000030 00 00 ec 00 00 00 2c 00 00 00 18 00 00 00 00 00 |......,.........|
00000040 00 00 02 00 00 00 08 00 00 00 01 00 00 00 01 00 |................|
00000050 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

At first glance the file appears to be a Bitmap (BMP), and it does have a Bitmap header claiming to have JPEG compression, but if we look a little further into the file.

identify -verbose Sample.PIC   
identify: length and filesize do not match `Sample.PIC' @ error/bmp.c/ReadBMPImage/950.
identify: unrecognized compression `Sample.PIC' @ error/bmp.c/ReadBMPImage/1019.

hexdump -C Sample.PIC
00000000 42 4d 00 00 00 00 00 00 00 00 42 04 00 00 44 00 |BM........B...D.|
00000010 00 00 34 08 00 00 24 fa ff ff 01 00 18 00 4a 50 |..4...$.......JP|
00000020 45 47 00 00 00 00 00 00 00 00 00 00 00 00 fc 00 |EG..............|
00000030 00 00 ec 00 00 00 2c 00 00 00 18 00 00 00 00 00 |......,.........|
00000040 00 00 02 00 00 00 08 00 00 00 01 00 00 00 01 00 |................|
00000050 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
00000410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000440 00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 02 02 |........JFIF....|
00000450 00 00 00 00 00 00 ff e1 00 0a 50 49 43 00 01 19 |..........PIC...|
00000460 1e 01 ff c0 00 11 08 05 dc 08 34 03 01 11 00 02 |..........4.....|

We find a JPG marker, in fact almost the whole jpg file is included, except the quantization tables for luminance and chrominance which are needed to properly display the image. This is the area the Pegasus company thought they could encode better to further compress the image. Their method was to use a new algorithm called ELS (Entropy Logarithmic-Scale). This new method was used by the PICTools software to make a Pegasus PIC file while Konica used it for their KQP format. They are identical. By choosing the luminance and chrominance values during compression, you could make a highly compressed image, but required specific software to render.

Pegasus also made use of a special custom APP marker (PIC) within the JPEG structure of the PIC/KQP and also any JPG compressed using their software. This marker which takes up around 8 bytes holds the luminance and chrominance values. Take the above sample for instance, it is compressing the image with a Luminance of 25 and a Chrominance of 30, these are integer values and in hex they would be “19” and “1E” respectively.

hexdump -C Sample.PIC      
00000440 00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 02 02 |........JFIF....|
00000450 00 00 00 00 00 00 ff e1 00 0a 50 49 43 00 01 19 |..........PIC...|
00000460 1e 01 ff c0 00 11 08 05 dc 08 34 03 01 11 00 02 |..........4.....|
00000470 11 01 03 11 01 ff c4 00 51 00 01 00 03 01 00 00 |........Q.......|

So in theory one could strip out any part of the file before the JPG beginning of file magic bytes (FF D8 FF E0), locate the APP marker, use the values to generate the two quantization tables, insert them in the appropriate spot and save out a JPG file.

This may be the case for the first few versions of the ePic format, but later versions got more complicated. It seems a “PIC2” version replaced the earlier versions and this format is a little more complicated.

hexdump -C Sample.KQP | head
00000000 50 49 43 32 01 08 00 00 00 64 00 01 00 b9 3e 00 |PIC2.....d....>.|
00000010 00 05 08 00 00 00 4a 50 47 45 03 00 00 00 16 24 |......JPGE.....$|
00000020 00 00 00 43 6f 6d 70 72 65 73 73 69 6f 6e 20 62 |...Compression b|
00000030 79 20 50 65 67 61 73 75 73 20 49 6d 61 67 69 6e |y Pegasus Imagin|
00000040 67 20 43 6f 72 70 2e 06 68 3e 00 00 ff d8 ff e0 |g Corp..h>......|
00000050 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 |..JFIF..........|
00000060 ff e1 00 16 50 49 43 00 03 00 00 01 00 00 00 00 |....PIC.........|
00000070 00 00 00 00 00 00 00 00 ff db 00 84 00 0f 0a 0a |................|
00000080 0a 0a 06 0f 0a 0a 0a 0f 0f 0f 0f 14 1e 14 14 14 |................|
00000090 14 14 28 1e 1e 19 1e 2d 28 32 32 2d 28 2d 2d 32 |..(....-(22-(--2|

Instead of the Bitmap (BMP) header, a proprietary PIC2 header is used, still containing a JPG in the JFIF format along with a the PIC APP marker, but encoded in a way that the simple method of adding a quantization table may not work. With the original format the JPG and the PIC/KQP were approximately the same size, this new version significantly reduces the size of the PIC/KQP in comparison with the JPG.

The ELS compression technology used in the ePic format seems to be patented by Pegasus and Accusoft, but is not entirely hidden as the libavcodec library includes a ELS decoder. Might be a fun project to use the code to decode the PIC/KQP formats fully.

In the meantime, a signature identifying the two versions should be added to PRONOM. Check out my proposal on my GitHub. If you need to convert your KQP or PIC files back to JPG here are a few links:

Konica PC PictureShow Version 4 (PIC2)

Accusoft PICTools Apollo Demo (Windows 7 Compatible)

Konica PC PictureShow for Macintosh

Interactive Quicktime

One of my favorite legacy formats to explore is any type of multimedia CD-ROM. The 1990’s and early 2000’s were filled with all sorts of multimedia for CD, Web, and Television. It is also one of the most difficult formats to try and preserve for the future. Many CD-ROM’s are filled with executables and/or Macromedia Director media, later having flash content. The operating systems and security needs today make playback almost impossible. For this reason many have built emulation services to mimic the original operation system and software to allow the many historic multimedia CD-ROM’s to once again interact with the user in a way many current systems still struggle with.

Many CD-ROM’s would come as Hybrid disc’s allowing them to be used on a Windows and Macintosh system, sometimes providing two different experiences. Then there were CD-Extra or Enhanced CD‘s as a separate session to an Audio CD which would contain bonus content playable only on a computer.

For fun I took a look back at some of my older Audio CD titles. I came across a couple, one claiming to be a “CD-Extra” and another an “Enhanced CD“. The CD-Extra disc when queried with cd-info claimed to have 12 tracks, with the 12th being a data XA track.

Disc mode is listed as: CD-ROM Mixed
CD-ROM Track List (1 - 12)
#: MSF LSN Type Green? Copy? Channels Premphasis?
1: 00:02:00 000000 audio false no 2 no
2: 02:13:66 009891 audio false no 2 no
3: 05:21:28 023953 audio false no 2 no
4: 08:18:19 037219 audio false no 2 no
5: 12:28:37 055987 audio false no 2 no
6: 16:11:58 072733 audio false no 2 no
7: 19:21:56 086981 audio false no 2 no
8: 23:17:49 104674 audio false no 2 no
9: 26:01:17 116942 audio false no 2 no
10: 28:30:02 128102 audio false no 2 no
11: 31:07:70 139945 audio false no 2 no
12: 37:29:46 168571 XA true no
170: 51:35:07 231982 leadout (520 MB raw, 516 MB formatted)
CD Analysis Report
CD-Plus/Extra
session #2 starts at track 12, LSN: 168571

Mounting the 12th track showed a mix of Macromedia Director (.DIR) files and quite a few Quicktime MOV movies. Playback was not possible on my current computer so I had to resort to using an emulator to experience this bonus content, full of band member photos and biographies.

The other disc I pulled out to explore was a bit different. Using cd-info the disc looked very similar:

Disc mode is listed as: CD-ROM Mixed
CD-ROM Track List (1 - 13)
#: MSF LSN Type Green? Copy? Channels Premphasis?
1: 00:02:00 000000 audio false no 2 no
2: 04:20:08 019358 audio false no 2 no
3: 08:04:27 036177 audio false no 2 no
4: 11:15:62 050537 audio false no 2 no
5: 14:54:32 066932 audio false no 2 no
6: 19:57:73 089698 audio false no 2 no
7: 26:12:36 117786 audio false no 2 no
8: 29:51:59 134234 audio false no 2 no
9: 34:44:00 156150 audio false no 2 no
10: 39:36:62 178112 audio false no 2 no
11: 42:06:01 189301 audio false no 2 no
12: 45:42:26 205526 audio false no 2 no
13: 57:10:54 257154 XA true no
170: 72:56:67 328117 leadout (735 MB raw, 730 MB formatted)
CD Analysis Report
CD-Plus/Extra
session #2 starts at track 13, LSN: 257154

The disc’s, even though were labeled CD-Extra and Enhanced CD, had the same structure and format. The difference was in the type of multimedia used. There was a simple application which launched Quicktime and loaded a single MOV movie. But, this was not your regular Quicktime Movie, this is a highly complex Interactive Quicktime movie.

The Quicktime movie could only be launched from an older operating system using Quicktime 6, and on the Macintosh, only a PPC CPU. The movie would launch with an interactive menu, allowing navigation as you might find on a DVD or Flash website, but all within a single MOV file. When I ran MediaInfo on the MOV file I got back quite a few tracks:

<media ref="/Volumes/VOLCANOECD/ALECD.mov">
<track type="General">
<VideoCount>10</VideoCount>
<AudioCount>1</AudioCount>
<OtherCount>51</OtherCount>
<FileExtension>mov</FileExtension>
<Format>QuickTime</Format>
<Format_Settings>Compressed header</Format_Settings>

Ten video tracks and 51 other tracks. Exploring with Quicktime, I could see the entire list of embedded content:

Quicktime movies, an Audio track, dozens of Flash, Photos, Animations, Sprites, with the possibility of more. These types of Quicktime files had requirements in order to run with Quicktime 6 being the last which could playback all the content correctly. Current versions of Quicktime give a warning on the lack of compatibility.

This Interactive Quicktime movie proudly claims; “Made with LiveStage Pro“, which was an authoring environment for Quicktime made by Totally Hip Software Inc. Started in 1995, but seemed to disappear after 2004 with no new development and by 2014 the website went offline.

If you would like to see a couple of Apple created simple examples see here.

LiveStage Pro was a very powerful authoring tool in its time, another similar tool called Electrifier competed for the interactive Quicktime market. Adobe GoLive also competed, but offered fewer features. The final Quicktime movie exported from LiveStage Pro was the main component, but the software did save a project format with the extension “LSD”. Versions 2 through 4 of LiveStage Pro had a similar header.

hexdump -C LiveStagePro4-s01.lsd | head
00000000 4c 53 41 46 00 00 00 04 00 00 09 16 00 00 00 00 |LSAF............|
00000010 00 00 00 00 00 00 00 00 00 00 09 0a 73 65 61 6e |............sean|
00000020 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 18 |................|
00000030 56 53 4e 6e 00 00 00 01 00 00 00 00 00 00 00 00 |VSNn............|
00000040 00 00 00 04 00 00 08 84 4d 50 52 4e 00 00 00 01 |........MPRN....|
00000050 00 00 00 49 00 00 00 00 00 00 00 21 6d 4f 55 54 |...I.......!mOUT|
00000060 00 00 00 01 00 00 00 00 00 00 00 00 55 6e 74 69 |............Unti|
00000070 74 6c 65 64 2e 6d 6f 76 00 00 00 00 18 57 6c 65 |tled.mov.....Wle|
00000080 66 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |f...............|
00000090 00 00 00 00 18 57 74 6f 70 00 00 00 01 00 00 00 |.....Wtop.......|

All the samples from version 2 through 4 have the first four bytes as “LSAF“. It also seems the next four bytes may be version related. Version 1 however has a different header.

hexdump -C contest.lsd | head
00000000 4c 53 50 72 00 00 00 08 00 00 00 00 00 00 02 80 |LSPr............|
00000010 01 e0 00 00 00 00 02 58 00 00 00 01 00 00 00 01 |.......X........|
00000020 00 00 00 02 00 00 00 00 00 08 00 00 00 00 00 00 |................|
00000030 00 00 08 53 02 d9 ff c9 04 76 02 97 01 00 44 00 |...S.....v....D.|
00000040 0b 02 fb 03 c9 00 00 00 01 00 00 00 01 00 00 00 |................|
00000050 00 07 41 63 74 69 6f 6e 73 00 00 00 00 00 00 00 |..Actions.......|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 05 00 00 00 01 50 49 43 |.............PIC|
00000080 54 ff ff 00 00 c1 ff 03 72 65 64 65 6e 6e 41 79 |T.......redennAy|
00000090 98 05 41 77 78 00 00 01 7a 00 10 00 00 31 fc 30 |..Awx...z....1.0|

Identification of a LiveStage project should be simple enough, but identifying and rendering back a Quicktime movie made by this software takes some work. In fact there are many “Enhanced CD’s” and CD-Extra titles out there with quite a few system requirements. If we are not careful, many of these little gems might get more difficult to experience or lost completely.

If you would like to explore the Quicktime Movie from the Enhanced CD mentioned here, send me a message. You can also take a look at my signature proposal and samples files on my Github for LiveStage.

SDIF

I have used and have researched a lot of audio editing software. Some are very simple and straightforward, others are feature rich and take some time to learn. While looking in a format, I came across some Audio software which nothing like I have used before. At first I was confused, I figured it would be simple to open a certain file format and play the audio. Not so fast.

Max is software which proudly says it is an, “infinitely flexible space to create your own interactive software”. Created by Cycling ’74 software, Max has been around for awhile, being developed in the mid 1980’s. It allows the user to make “patches” stringing around components and effects to accomplish an infinite amount of options and outcomes.

The software produces simple project files and patch files, but hey are just JSON data, at least in the latest version. But when working with audio files the software can save to a number of formats.

One of the options is a format called “SDIF”, which stands for “Sound Description Interchange Format“. SDIF was jointly developed by IRCAM and CNMAT, with proposals starting back in the mid-1990’s. Originally written as a Spectral Description, it was later changed to refer to a Sound Description.

The Specification states the general idea was to “store information related to signal processing and specifically of sound, in files, according to a common format to all data types. Thus, it is possible to store results or parameters of analyses, syntheses…” So not exactly the same as a simple WAVE file you can open and edit, this format was meant to store signal data for analysis.

Each SDIF file consists of a header and then an overall a succession of frames, not unlike chunks in the IFF/AIFF/RIFF formats, ordered in time. Each frame matrix declares a “Type” which can be a combination of many options. Lets take a look at a SDIF file:

hexdump -C test.sdif | head
00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
00000010 31 54 52 43 00 00 00 20 00 00 00 00 00 00 00 00 |1TRC... ........|
00000020 00 00 00 01 00 00 00 01 31 54 52 43 00 00 00 04 |........1TRC....|
00000030 00 00 00 00 00 00 00 04 31 54 52 43 00 00 00 c0 |........1TRC....|
00000040 3f 74 7a e1 40 00 00 00 00 00 00 01 00 00 00 01 |?tz.@...........|
00000050 31 54 52 43 00 00 00 04 00 00 00 0a 00 00 00 04 |1TRC............|
00000060 3f 80 00 00 45 95 35 c3 00 00 00 00 00 00 00 00 |?...E.5.........|
00000070 40 00 00 00 46 06 e2 14 00 00 00 00 00 00 00 00 |@...F...........|
00000080 40 40 00 00 45 3b 42 3d 00 00 00 00 00 00 00 00 |@@..E;B=........|
00000090 40 80 00 00 43 5d 94 7b 00 00 00 00 00 00 00 00 |@...C].{........|

This test file has the opening frame “SDIF“, to identify it as an SDIF, then a reference to the type “1TRC. I would try and explain a Matrix 1TRC Sinusoidal Track, but I have no idea what it means. Something, something sine wave, etc. Someone much smarter than me can make use of this format. Here are a couple examples of SDIF with other frame types.

hexdump -C angry_cat.part.sdif| head
00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
00000010 31 4e 56 54 00 00 00 88 ff ef ff ff ff ff ff ff |1NVT............|
00000020 ff ff ff fd 00 00 00 01 31 4e 56 54 00 00 03 01 |........1NVT....|
00000030 00 00 00 61 00 00 00 01 53 74 72 65 61 6d 49 44 |...a....StreamID|
00000040 09 30 0a 44 61 74 65 09 54 68 75 5f 41 75 67 5f |.0.Date.Thu_Aug_|
00000050 5f 33 5f 32 31 2e 33 32 2e 34 35 5f 32 30 30 30 |_3_21.32.45_2000|
00000060 5f 0a 54 61 62 6c 65 4e 61 6d 65 09 53 69 6e 75 |_.TableName.Sinu|
00000070 73 6f 69 64 61 6c 54 72 61 63 6b 73 0a 57 72 69 |soidalTracks.Wri|
00000080 74 74 65 6e 42 79 09 50 6d 5f 56 65 72 73 69 6f |ttenBy.Pm_Versio|
00000090 6e 5f 31 2e 32 2e 32 0a 00 00 00 00 00 00 00 00 |n_1.2.2.........|

hexdump -C cymbalum-c4.res.sdif| head
00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
00000010 31 52 45 53 00 00 0d 20 00 00 00 00 00 00 00 00 |1RES... ........|
00000020 00 00 00 04 00 00 00 01 31 52 45 53 00 00 00 04 |........1RES....|
00000030 00 00 00 d0 00 00 00 04 42 49 27 7a 39 59 fc ab |........BI'z9Y..|
00000040 3d 35 06 c9 00 00 00 00 42 6e 68 68 39 63 99 b1 |=5......Bnhh9c..|
00000050 3e 25 f7 c0 00 00 00 00 42 c6 02 bb 39 8c 31 79 |>%......B...9.1y|
00000060 3f bb 7e 6e 00 00 00 00 43 01 82 96 3a 1d 36 44 |?.~n....C...:.6D|
00000070 3e d9 21 12 00 00 00 00 43 07 35 f0 3a 20 6f 6e |>.!.....C.5.: on|
00000080 3f 02 32 7f 00 00 00 00 43 30 84 0b 39 97 f9 1b |?.2.....C0..9...|
00000090 3e c6 43 c7 00 00 00 00 43 4d e4 e4 39 88 14 90 |>.C.....CM..9...|

Unfortunately, the common tools I use to explore AV formats don’t seem to work on this format. MediaInfo, FFProbe, Exiftool, all give me unknown file warnings. So I had to compile the SDIF software in order to get some details.

querysdif angry_cat.part.sdif 
Header info of file angry_cat.part.sdif:

Format version: 3
Types version: 1

Ascii chunks of file angry_cat.part.sdif:

1NVT
{
StreamID 0;
Date Thu_Aug__3_21.32.45_2000_;
TableName SinusoidalTracks;
WrittenBy Pm_Version_1.2.2;
}

Data in file angry_cat.part.sdif (9504872 bytes):
1933 1TRC frames in stream 0 between time 0.000000 and 5.794875 containing
1933 1TRC matrices with 45 --400 rows, 4 -- 4 columns

An interesting thing is that a SDIF file can be in text form as well.

sdiftotext test.sdif 
SDIF


SDFC

1TRC 1 1 0
1TRC 0x0004 0 4

1TRC 1 1 0.005
1TRC 0x0004 10 4
1 4774.72 0 0
2 8632.52 0 0
3 2996.14 0 0
4 221.58 0 0
5 1943.02 0 0
6 123.951 0 0
7 6705.04 0 0
8 4304.97 0 0
9 3554.29 0 0
10 23.7822 0 0

1TRC 1 1 0.01
1TRC 0x0004 10 4
1 4774.72 0.0353114 2.06098
2 8632.52 0.00442518 0.68795
3 2996.14 0.0238517 -1.42295
4 221.58 0.0089712 -2.44141
5 1943.02 0.00768914 2.64629
6 123.951 0.0397061 -0.17527
7 6705.04 0.0245643 -0.168753
8 4304.97 0.00894803 1.45553
9 3554.29 0.0265175 2.57231
10 23.7822 0.0419019 -2.17731

1TRC 1 1 0.2
1TRC 0x0004 10 4
1 2284.56 0.02781 2.47054
2 4222.62 0.0151738 1.55309
3 31.1554 0.00421461 -0.657285
4 310.99 0.0122306 1.25794
5 215.192 0.0174093 1.25468
6 6253.69 0.000894192 2.21334
7 8533.32 0.0296167 2.07209
8 8044.77 0.0423002 2.54088
9 6087.45 0.0264733 -2.05523
10 7052.7 0.0287347 0.426339

1TRC 1 1 0.205
1TRC 0x0004 10 4
1 2284.56 0 0
2 4222.62 0 0
3 31.1554 0 0
4 310.99 0 0
5 215.192 0 0
6 6253.69 0 0
7 8533.32 0 0
8 8044.77 0 0
9 6087.45 0 0
10 7052.7 0 0

1TRC 1 1 0.21
1TRC 0x0004 0 4

ENDC
ENDF

An interesting format for sure. But wait, there is more!

My initial interest in this format was when I was given access to a set of MUBU files. I was unclear on how there were created at first and it took me down a long path of learning about SDIF and the Max software from Cycling ’74 and IRCAM. MUBU turns out to be a toolbox for Max which adds more analysis features.

MUBU stands for MUlti-BUffer, which helps overcome some limitations. It is actually a container using the SDIF standard. Lets take a look.

hexdump -C test.mubu | head
00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
00000010 31 4e 56 54 00 00 00 78 ff ef ff ff ff ff ff ff |1NVT...x........|
00000020 ff ff ff fd 00 00 00 01 31 4e 56 54 00 00 03 01 |........1NVT....|
00000030 00 00 00 53 00 00 00 01 4d 75 42 75 2e 43 6f 6e |...S....MuBu.Con|
00000040 74 61 69 6e 65 72 2e 4e 75 6d 54 72 61 63 6b 73 |tainer.NumTracks|
00000050 09 31 0a 4d 75 42 75 2e 43 6f 6e 74 61 69 6e 65 |.1.MuBu.Containe|
00000060 72 2e 56 65 72 73 69 6f 6e 09 31 2e 35 0a 4d 75 |r.Version.1.5.Mu|
00000070 42 75 2e 43 6f 6e 74 61 69 6e 65 72 2e 4e 75 6d |Bu.Container.Num|
00000080 42 75 66 66 65 72 73 09 31 0a 00 00 00 00 00 00 |Buffers.1.......|
00000090 31 4e 56 54 00 00 00 38 ff ef ff ff ff ff ff ff |1NVT...8........|

A MUBU file has the same SDIF frame header, but also include a “1NVT” frame, which is a Name Value Table. This is where the MUBU container is referenced. The MuBu file has its own structure:

If I query the MuBu file like I did the SDIF, I get the following:

querysdif test.mubu
Header info of file test.mubu:

Format version: 3
Types version: 1

Ascii chunks of file test.mubu:

1NVT
{
MuBu.Container.NumTracks 1;
MuBu.Container.Version 1.5;
MuBu.Container.NumBuffers 1;
}
1NVT
{
MuBu.Buffer.Index 0;
}
1NVT
{
MuBu.Track.MxRows 2;
AudioFile 1;
MuBu.Track.NonNumType 0;
MuBu.Track.MaxSize 93515;
meta_ISFT Lavf60.16.100;
MuBu.Track.Name mytrack;
MuBu.Track.BufferIndex 0;
MuBu.Track.SampleRate 48000;
FileName Wilhelm_Scream.wav;
MuBu.Track.MxVarRows 0;
MuBu.Track.MxCols 1;
meta_MetaDataSource WAV;
MuBu.Track.EndTime 1623.5;
FilePath /;
MuBu.Track.SampleOffset 0;
MuBu.Track.TimeTags 0;
MuBu.Track.Size 77929;
MuBu.Track.Index 0;
}

1TYP
{
1MTD M000 {unnamed}
1FTD M000
{
M000 Track-0-MatrixData;
}
}

Data in file test.mubu (3741392 bytes):
77929 M000 frames in stream 0 between time 0.000000 and 1.623500 containing
77929 M000 matrices with 2 -- 2 rows, 1 -- 1 columns

The MuBu file contains one audio track and one buffer. This is a simple test file, but MuBu files can be quite large with multiple tracks.

Working with the Max software or OpenMusic is not something I found to be easy to understand. I am sure if I was more musically inclined and with a little practice I could make some of this work. For the time being, a signature to identify a SDIF and MUBU will have to do. Check out the GitHub for my proposed signature and a couple examples.

Shorten

I was recently going through some of my old CD-R’s and came across this 11 year old fun memory.

I remember going to this 2003 Toad the Wet Sprocket concert in Salt Lake City with some friends, I had seen this band perform before, but this was the first time I was able to get a recording of the show. Normally having a recording of a concert of a well known band was a little shady, but for some bands, they not only allow recording of their live concerts, but they encourage it. There has been a few bands over the years who have this philosophy, one most have heard of is the Grateful Dead, because of all the tape trading, the band’s numerous concerts will live on forever.

The scene of recording concerts is still alive and well, and if you are into recording and sharing it is expected you share in a lossless audio format. The world of lossless audio is definitely in the minority of all those who listen to music on the daily. Most of us have been placated with the infinite playlists on services like Apple Music, Spotify, and Amazon Music. Most probably don’t care about owning music anymore, but for the few who consider themselves Audiophiles, having a lossless audio file is the only choice.

When it comes to formats, there are a few lossless formats to choose from, they all come with some advantages as well as some downsides. WAV files contain the full PCM audio stream, and while internet bandwidth today can handle full uncompressed audio, it can still be beneficial to use some compression for archiving or sharing over the web.

The most common lossless format today is the Free Lossless Audio Codec or FLAC, but there are also quite a few who like the Apple Lossless Audio Codec. Both offer many advantages, especially with metadata, cuesheets, and can contain cover album art. But many years ago another lossless format was most often used with bootleg recordings and audio sharing.

Shorten was one of the first lossless formats, developed by Tony Robinson in 1993 for SoftSound. It could cut the size in half of a typical 16-bit WAV file. It achieved this by using Huffman coding, kinda the same way a JPEG works, by reducing the frequency of how often patterns occur. Today FLAC and ALAC have replaced this format and offer improved features and support. Many audio players have dropped support for shorten making it difficult to use this old format.

The Shorten format uses the .SHN extension. It is one of the formats listed on the Library of Congress Sustainability of Digital Formats with the ID fdd000199, although a couple links don’t appear to work as it hasn’t been updated since 2011. Support was ended for this format and many of the links found on various websites are for broken, usually referencing the etree wiki. Much of which is archived on the Internet Archive.

Let’s take a look at the what makes up a lossless compressed SHN file. A quick look at a sample header:

hexdump -C test.shn | head
00000000  61 6a 6b 67 02 fb b1 70  09 f9 25 59 52 a4 d1 a8  |ajkg...p..%YR...|
00000010  dd cf 85 5a 01 57 a0 d5  a8 b6 6b 6d d2 41 10 80  |...Z.W....km.A..|
00000020  40 20 10 18 04 0a 01 44  d6 40 20 11 0d 8c 0a 01  |@ .....D.@ .....|
00000030  04 80 44 20 16 4b 0d d2  c3 b8 f8 55 a0 11 80 59  |..D .K.....U...Y|
00000040  98 56 1d b1 79 51 9f 39  f1 12 d2 d3 75 5c cd 08  |.V..yQ.9....u\..|
00000050  06 25 68 6b 52 5e 9f 4c  39 cd c1 32 c4 0d a9 b7  |.%hkR^.L9..2....|
00000060  69 34 56 f0 96 fa 46 89  a2 6e 8c ba d5 d0 58 de  |i4V...F..n....X.|
00000070  f5 44 5b aa 61 82 c7 85  88 37 d6 ee cb ab 4e 44  |.D[.a....7....ND|
00000080  91 19 b7 38 d4 20 ae 98  98 d1 2c 4a 4e 88 dd 3e  |...8. ....,JN..>|
00000090  36 68 1b 59 a8 7d 84 23  76 0a 84 21 a1 cd 80 8e  |6h.Y.}.#v..!....|

The first four bytes seem to be consistent among my samples. It makes me wonder if the ascii values have something to do with the author, Anthony (Tony) J. Robinson. In the source code for the shorten software, the file shorten.h defines the ascii “ajkg” as the magic header for the SHN format. Also found in current ffmpeg code. Although the tools don’t have much to say about them.

mediainfo test.shn 
General
Complete name                            : test.shn
Format                                   : Shorten
Format version                           : 2
File size                                : 3.17 MiB

Audio
Format                                   : Shorten
Compression mode                         : Lossless

ffprobe -i test.shn
Input #0, shn, from 'test.shn':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Audio: shorten, 44100 Hz, 2 channels, s16p

Using the older SHNTOOL, we can get more information.

shntool info test.shn  
-------------------------------------------------------------------------------
File name:                    test.shn
Handled by:                   shn format module
Length:                       0:32.23
WAVE format:                  0x0001 (Microsoft PCM)
Channels:                     2
Bits/sample:                  16
Samples/sec:                  44100
Average bytes/sec:            176400
Rate (calculated):            176400
Block align:                  4
Header size:                  44 bytes
Data size:                    5697720 bytes
Chunk size:                   5697756 bytes
Total size (chunk size + 8):  5697764 bytes
Actual file size:             3325489
File is compressed:           yes
Compression ratio:            0.5836
CD-quality properties:
  CD quality:                 yes
  Cut on sector boundary:     no
  Sector misalignment:        1176 bytes
  Long enough to be burned:   yes
WAVE properties:
  Non-canonical header:       no
  Extra RIFF chunks:          no
Possible problems:
  File contains ID3v2 tag:    no
  Data chunk block-aligned:   yes
  Inconsistent header:        no
  File probably truncated:    unknown
  Junk appended to file:      unknown
  Odd data size has pad byte: n/a
Extra shn-specific info:
  Seekable:                   yes

Many Shorten Audio Files are found out there in archives and file sharing sites, so even though the format isn’t used to create new files, it will still be around for awhile. My GitHub has my signature proposal and a couple of samples.