A2R / MOOF / WOZ

The list of disk image formats never seems to stop growing. Many have features specific to a particular medium and format. If you have ever imaged an older Macintosh floppy, you know they are special. Add in the copy protection found on many early Apple II floppies, and you need special drives, special hardware, and a special format to store the floppy data.

When imaging special media, especially unique media, it is best practice to image the floppies at the magnetic flux level.

Floppy disks contain magnetic fluctuations which are measured and recorded using specialized equipment. A popular method is using a KryoFlux board, floppy drive, and software. The software communicates over USB with a custom controller board, which in turn is connected to the floppy drive. If you are interested in the different controller boards, a good list has been compiled here.

A KryoFlux, FluxEngine, or Greaseweazle can image specialized disks like a Macintosh 800K floppy, but the best controller board for them is an Applesauce setup, which is specifically designed for the task. With that task come a few specialty formats.

A file format which can store flux data is a bit different from a regular disk image format. The flux data contains all the low-level recordings, which can then be interpreted into disk images much like the original floppy. An Applesauce flux image can contain all the small nuances of the original floppy, including any copy protection or other creative methods used by software vendors over the years. The format used for storing this flux data is the A2R format.

A2R is in its third iteration. Let’s take a look at the basics of the format.

hexdump -C Samplev3.a2r | head
00000000 41 32 52 33 ff 0a 0d 0a 49 4e 46 4f 25 00 00 00 |A2R3....INFO%...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 38 |.Applesauce v1.8|
00000020 38 2e 35 20 20 20 20 20 20 20 20 20 20 20 20 20 |8.5 |
00000030 20 02 01 01 00 52 57 43 50 e9 49 6e 01 01 24 f4 | ....RWCP.In..$.|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 43 01 00 |.............C..|
00000050 00 01 27 3a 25 00 91 d9 00 00 21 20 21 21 21 21 |..':%.....! !!!!|
00000060 1f 21 21 21 21 1f 24 5e 24 1f 21 21 20 21 24 5c |.!!!!.$^$.!! !$\|
00000070 24 20 21 21 21 1f 24 5c 25 21 21 1f 21 21 23 5b |$ !!!.$\%!!.!!#[|
00000080 25 20 21 21 21 1f 21 22 23 3f 41 3f 26 3e 43 3f |% !!!.!"#?A?&>C?|
00000090 43 5f 41 27 3d 61 41 27 3d 61 3f 28 3e 61 3f 26 |C_A'=aA'=a?(>a?&|

hexdump -C Samplev2.a2r | head
00000000 41 32 52 32 ff 0a 0d 0a 49 4e 46 4f 24 00 00 00 |A2R2....INFO$...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 31 |.Applesauce v1.1|
00000020 2e 36 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |.6 |
00000030 20 02 01 01 53 54 52 4d 75 17 5d 01 00 01 e6 da | ...STRMu.].....|
00000040 00 00 83 a9 12 00 12 1e 11 13 1e 13 1e 13 11 1f |................|
00000050 21 1f 11 13 1c 14 1e 30 14 20 1e 14 1e 14 1c 14 |!......0. ......|
00000060 1c 13 11 20 21 1f 11 11 0f 13 1e 14 1c 14 2e 21 |... !..........!|
00000070 13 1e 13 1e 14 1e 11 11 20 21 1f 11 11 13 1e 1f |........ !......|
00000080 13 20 30 21 11 11 0f 13 1e 13 11 30 1f 21 20 13 |. 0!.......0.! .|
00000090 11 30 1f 14 1e 30 14 1e 11 11 11 1e 13 11 1e 14 |.0...0..........|

The A2R format uses a chunk system to store the various pieces of the format. Earlier versions used a STRM chunk to store all the raw flux data; version 3 changed to an RWCP chunk. Applesauce uses a two-pass imaging process: a rapid first pass to determine where on the media surface track data exists, then a second pass that captures longer durations for processing and error correction.
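The chunk layout can be followed directly in the dumps above: after the 8-byte header comes a four-character chunk ID, then what reads as a 4-byte little-endian length (0x25 for INFO in the version 3 sample), and skipping that many bytes lands exactly on RWCP at offset 0x35. Here is a minimal Python sketch of that walk, assuming every chunk follows the same ID/length/payload pattern. The name `iter_chunks` is my own and not part of any Applesauce tooling.

```python
import struct
from typing import Iterator, Tuple

A2R_HEADER_LEN = 8  # "A2R2"/"A2R3" followed by FF 0A 0D 0A, as in the dumps


def iter_chunks(data: bytes, start: int = A2R_HEADER_LEN) -> Iterator[Tuple[str, int, int]]:
    """Yield (chunk_id, declared_size, payload_offset) for each chunk header found."""
    pos = start
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)  # assumed little-endian
        yield chunk_id, size, pos + 8
        pos += 8 + size  # skip the payload to reach the next chunk header


# First 64 bytes of Samplev3.a2r, copied from the hexdump above.
sample = bytes.fromhex(
    "41 32 52 33 ff 0a 0d 0a 49 4e 46 4f 25 00 00 00 "
    "01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 38 "
    "38 2e 35 20 20 20 20 20 20 20 20 20 20 20 20 20 "
    "20 02 01 01 00 52 57 43 50 e9 49 6e 01 01 24 f4 "
)

for chunk_id, size, offset in iter_chunks(sample):
    print(f"{chunk_id}: {size} bytes, payload at 0x{offset:02x}")
```

On this excerpt the walk reports INFO and then RWCP, whose declared size runs far past the 64 bytes pasted here, so the loop stops there.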

Once the full raw flux data has been captured, it can be interpreted as a disk image. The Applesauce software can make a regular disk image such as a Disk Copy 4.2 file, a well-known format identified in PRONOM as fmt/625, but it can also create a couple of special disk image formats which preserve the nuances of an original disk.

The WOZ Disk Image format is an offshoot of the Applesauce project. Capturing highly accurate bit data is of no use if you don’t have a container to hold the data. The WOZ format was designed to be able to contain every possible Apple ][ disk structure and layout. It can be so accurate that even copy protected software can’t tell that it isn’t an original disk.

The WOZ format has become very popular in the Apple II community and is ideal for emulating the old games and software titles popular in the early 1980s. You may have guessed where the name comes from. The Internet Archive has a large collection of WOZ disks in their WOZ-a-Day collection. Like A2R, WOZ is a chunk-based format, and it has two versions. Let’s take a look.

hexdump -C "WOZ 1.0/Blazing Paddles (Baudville).woz" | head
00000000 57 4f 5a 31 ff 0a 0d 0a f6 f5 92 d6 49 4e 46 4f |WOZ1........INFO|
00000010 3c 00 00 00 01 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 30 2e 32 36 20 20 20 20 20 20 20 |uce v0.26 |
00000030 20 20 20 20 20 20 20 20 20 00 00 00 00 00 00 00 | .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

hexdump -C "WOZ 2.0/Blazing Paddles (Baudville).woz" | head
00000000 57 4f 5a 32 ff 0a 0d 0a 21 da c2 c8 49 4e 46 4f |WOZ2....!...INFO|
00000010 3c 00 00 00 02 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 31 20 20 20 20 20 20 20 20 |uce v1.1 |
00000030 20 20 20 20 20 20 20 20 20 01 01 20 00 00 00 00 | .. ....|
00000040 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

Unlike a common disk image, a WOZ image contains more than the bits on the disk. It also contains a mapping of all the tracks and their associated data, which is how it can even capture copy protection that is usually only possible with a physical disk. The ‘TMAP’ chunk contains the track map and the ‘TRKS’ chunk contains all the data.

What WOZ is for the Apple II, MOOF is for the Macintosh. You may wonder about the funny name, but there is a long history around “Clarus the Dogcow”. I’m sure this factoid will help you impress your friends or win at trivia night. Again, the purpose of a special format for Macintosh disks is to allow disks to be emulated, even those with copy protection. You can find quite a collection of old Macintosh software in the MOOF format on the Internet Archive, and even emulate your favorite game, such as Dark Castle, which I played for hours as a kid. MOOF is also a chunk-based format; let’s take a look at the header.
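To make the track map a little more concrete, here is a small Python sketch that decodes the first TMAP bytes from the WOZ dumps above. It assumes, as I understand the published WOZ reference, that each byte stands for one quarter-track position, that its value is an index into TRKS, and that FF marks a position with no data. The function name `decode_tmap` is only illustrative.

```python
from typing import Dict

BLANK = 0xFF  # value that appears in the dump wherever no track is mapped


def decode_tmap(tmap: bytes) -> Dict[float, int]:
    """Map each quarter-track position to its TRKS index, skipping blank entries."""
    mapping = {}
    for i, entry in enumerate(tmap):
        if entry != BLANK:
            mapping[i / 4] = entry  # assumed: entry i is track i/4
    return mapping


# First 16 TMAP payload bytes from the WOZ dumps (file offsets 0x58 to 0x67).
tmap_head = bytes.fromhex("00 00 ff 01 01 01 ff 02 02 02 ff 03 03 03 ff 04")

for position, index in decode_tmap(tmap_head).items():
    print(f"track {position:5.2f} -> TRKS entry {index}")
```

Read this way, each whole track and its neighbouring quarter tracks point at the same TRKS entry, while the half tracks in between are blank.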

hexdump -C "Dark Castle v1.0 - Disk 1.moof" | head
00000000 4d 4f 4f 46 ff 0a 0d 0a b5 75 f9 4e 49 4e 46 4f |MOOF.....u.NINFO|
00000010 3c 00 00 00 01 01 00 01 10 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 37 33 20 20 20 20 20 20 20 |uce v1.73 |
00000030 20 20 20 20 20 20 20 20 20 00 13 00 00 00 00 00 | .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 ff 01 ff 02 ff 03 ff |TMAP............|
00000060 04 ff 05 ff 06 ff 07 ff 08 ff 09 ff 0a ff 0b ff |................|
00000070 0c ff 0d ff 0e ff 0f ff 10 ff 11 ff 12 ff 13 ff |................|
00000080 14 ff 15 ff 16 ff 17 ff 18 ff 19 ff 1a ff 1b ff |................|
00000090 1c ff 1d ff 1e ff 1f ff 20 ff 21 ff 22 ff 23 ff |........ .!.".#.|

All three formats, created for imaging and emulating Apple and Macintosh software, are well documented and open. They are also well suited for preservation, as they can contain extensive metadata in the INFO chunk which gives provenance information on the source of the files. The Applesauce software even has a camera to photograph the disk itself for archiving. All of this makes these formats great for preservation and emulation. Take a look at my proposal for a signature on my GitHub.
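All five headers dumped above share the same shape: a four-character tag followed by the bytes FF 0A 0D 0A. A rough Python sketch of an identification check built on that observation follows. It is not the PRONOM signature proposed on GitHub, just an illustration, and the `sniff` function and its labels are my own naming.

```python
from typing import Optional

# Four-character tags taken from the hexdumps above.
TAGS = {
    b"A2R2": "A2R version 2",
    b"A2R3": "A2R version 3",
    b"WOZ1": "WOZ version 1",
    b"WOZ2": "WOZ version 2",
    b"MOOF": "MOOF",
}
GUARD = b"\xff\x0a\x0d\x0a"  # the four bytes that follow every tag in the dumps


def sniff(header: bytes) -> Optional[str]:
    """Return a format label if the first 8 bytes match one of the known headers."""
    if len(header) >= 8 and header[4:8] == GUARD:
        return TAGS.get(header[:4])
    return None


print(sniff(bytes.fromhex("4d 4f 4f 46 ff 0a 0d 0a b5 75 f9 4e")))
print(sniff(bytes.fromhex("41 32 52 33 ff 0a 0d 0a 49 4e 46 4f")))
```

Checking the guard bytes as well as the tag keeps a plain text file that happens to start with "MOOF" from matching.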

Binder

Microsoft is never in short supply of file formats. Over the years they have made many changes and introduced lots of products, some lasting longer than others. The list is quite long.

One such product was Office Binder. Introduced with Office 95, it was a companion application for combining a number of OLE objects in one “Binder”, meant to be the digital version of the office binder one often uses for presentations or proposals.

You could add sections and include Word documents, images, PowerPoint presentations, Excel spreadsheets, basically any OLE object. Of course, a Binder file itself was an OLE compound object. Binder files had the extension OBD, and templates used OBT. The PRONOM registry has PUIDs for the different Binder versions, but there are some issues.

PUID      Format Name                                      Format Version   Extension
fmt/237   Microsoft Office Binder File for Windows         95               obd
fmt/240   Microsoft Office Binder File for Windows         97-2000          obd
fmt/238   Microsoft Office Binder Template for Windows     95               obt
fmt/241   Microsoft Office Binder Template for Windows     97-2000          obt
fmt/239   Microsoft Office Binder Wizard for Windows       95               obz
fmt/242   Microsoft Office Binder Wizard for Windows       97-2000          obz

filename : 'Binder95-s01.obd'
filesize : 5120
modified : 2024-08-08T21:24:34-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/240'
    format  : 'Microsoft Office Binder File for Windows'
    version : '97-2000'
    mime    :
    class   :
    basis   : 'extension match obd; container name Binder with name only'

It turns out only one of the PRONOM PUIDs has an actual signature; the others are placeholders. So when I run Siegfried on an Office Binder 95 file, it comes back as fmt/240, which points to an Office Binder 97-2000 file. It’s a simple signature, looking for an internal file named “Binder”, which is present in all the Binder file types.

<ContainerSignature Id="5500" ContainerType="OLE2">
  <Description>Microsoft Office Binder File for Windows 97-2000</Description>
  <Files>
    <File>
      <Path>Binder</Path>
    </File>
  </Files>
</ContainerSignature>

Taking a look inside the Office 95 Binder file, we can see the “Binder” file.

Path = Binder95-s01.obd
Type = Compound
Physical Size = 5120
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 316 320 [5]SummaryInformation
..... 144 192 Binder
..... 280 320 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
740 832 3 files

hexdump -C Binder95-s01/Binder
00000000 90 00 00 00 05 00 00 00 00 00 00 00 05 00 00 00 |................|
00000010 00 00 00 00 a1 6a 8a 8e cc 55 ef 11 ab 06 00 0c |.....j...U......|
00000020 29 b1 b4 d0 00 00 00 00 00 00 00 00 00 00 00 00 |)...............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 40 86 61 a6 |............@.a.|
00000040 0b ea da 01 00 00 00 00 00 00 00 00 40 86 61 a6 |............@.a.|
00000050 0b ea da 01 09 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 2c 00 00 00 00 00 00 00 01 00 00 00 |....,...........|
00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000080 2c 00 00 00 2c 00 00 00 13 03 00 00 44 02 00 00 |,...,.......D...|

The bytes within a “Binder” file have some patterns, but nothing decipherable.

Microsoft Office Binder was only included in three versions of Office: 95, 97, and 2000. Let’s look at the other two versions.

Path = Binder97-s04.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 28 64 HdrFtr
..... 144 192 Binder
..... 260 320 [5]SummaryInformation
..... 404 448 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
836 1024 4 files

Path = Binder2K-S01.obd
Type = Compound
Physical Size = 5632
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 28 64 HdrFtr
..... 144 192 Binder
..... 260 320 [5]SummaryInformation
..... 232 256 [5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
664 832 4 files

It looks like versions 97 and 2000 have an extra file. The “HdrFtr” file seems to reference a header and footer, which according to the documentation was a feature added in Office 97.

What’s new in Office Binder 97

Office Binder makes it possible for you to group all your documents, workbooks, and presentations for a project in one place. To get started with Office Binder 97, add a new or existing document to your binder. Use the new Office 97 features while you work in a binder……. Print headers and footers for a binder

We can use the “HdrFtr” file within the container to differentiate between the 95 and 97-2000 formats. A closer look at the DocumentSummaryInformation file might allow a more precise identification in the future. There doesn’t seem to be anything to distinguish an OBD file from an OBT template file, so those PUIDs may not be needed. The other format related to the Binder software has the OBZ extension. It is called a Wizard template file in some documentation, but I have been unable to find any type of “Wizard” functionality in the Office Binder apps to generate such a file. The OBZ format seems to have something to do with macros in Visual Basic. Luckily there are a few examples available on Office install discs.
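The HdrFtr check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the PRONOM container signature itself. It takes the top-level stream names as plain strings (copied here from the 7-Zip listings), and the function name and returned labels are my own.

```python
from typing import Iterable, Optional


def classify_binder(stream_names: Iterable[str]) -> Optional[str]:
    """Guess the Binder generation from the top-level stream names of the container."""
    names = set(stream_names)
    if "Binder" not in names:
        return None  # not a Binder container at all
    if "HdrFtr" in names:
        return "Office Binder 97-2000"  # header/footer stream only seen in 97 and 2000
    return "Office Binder 95"


# Stream names as shown in the listings above.
binder95 = ["[5]SummaryInformation", "Binder", "[5]DocumentSummaryInformation"]
binder97 = ["HdrFtr", "Binder", "[5]SummaryInformation", "[5]DocumentSummaryInformation"]

print(classify_binder(binder95))
print(classify_binder(binder97))
```

With only three sample files behind it, this rule is a hypothesis to test against more files rather than a settled fact.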

Path = CLIENT.OBZ
Type = Compound
Physical Size = 364032
Extension = doc
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
1995-07-05 17:25:15 D.... 7
1995-07-05 17:25:14 D.... 5
1995-07-05 17:25:13 D.... 4
..... 106 128 4/[1]CompObj
..... 20 64 4/[1]Ole
..... 8880 9216 4/WordDocument
..... 32 64 4/[3]View000
..... 492 512 4/[5]SummaryInformation
..... 236 256 4/[5]DocumentSummaryInformation
1995-07-05 17:25:14 D.... 6
..... 17760 17920 6/Book
..... 20 64 6/[1]Ole
..... 0 0 6/[3]View000
..... 102 128 6/[1]CompObj
..... 3260 3264 6/[5]SummaryInformation
..... 192 192 6/[5]DocumentSummaryInformation
..... 106 128 5/[1]CompObj
..... 20 64 5/[1]Ole
..... 8055 8192 5/WordDocument
..... 32 64 5/[3]View000
..... 7280 7680 5/[5]SummaryInformation
..... 220 256 5/[5]DocumentSummaryInformation
1995-07-05 17:25:16 D.... 9
1995-07-05 17:25:15 D.... 8
..... 13857 14336 8/Book
..... 20 64 8/[1]Ole
..... 0 0 8/[3]View000
..... 102 128 8/[1]CompObj
..... 188 192 8/[5]SummaryInformation
..... 196 256 8/[5]DocumentSummaryInformation
..... 854 896 Binder
1995-07-05 17:25:19 D.... 10
..... 80382 80384 10/Book
..... 20 64 10/[1]Ole
..... 0 0 10/[3]View000
..... 102 128 10/[1]CompObj
..... 4044 4096 10/[5]SummaryInformation
1995-07-05 17:25:19 D.... 10/_VBA_PROJECT
..... 9425 9728 10/_VBA_PROJECT/812f9922c6
..... 12302 12800 10/_VBA_PROJECT/7b2f9922a4
..... 36937 37376 10/_VBA_PROJECT/dir
..... 6609 6656 10/_VBA_PROJECT/7e2f9922b5
..... 23014 23040 10/_VBA_PROJECT/872f9922e8
..... 7995 8192 10/_VBA_PROJECT/842f9922d9
..... 5338 5632 10/_VBA_PROJECT/902f992333
..... 36119 36352 10/_VBA_PROJECT/8d2f99231e
..... 18129 18432 10/_VBA_PROJECT/932f992342
..... 13055 13312 10/_VBA_PROJECT/b42fbcaa59

..... 208 256 10/[5]DocumentSummaryInformation
..... 4228 4608 [5]SummaryInformation
..... 956 960 [5]DocumentSummaryInformation
..... 106 128 9/[1]CompObj
..... 20 64 9/[1]Ole
..... 5914 6144 9/WordDocument
..... 0 0 9/[3]View000
..... 1520 1536 9/[5]SummaryInformation
..... 220 256 9/[5]DocumentSummaryInformation
..... 16141 16384 7/Book
..... 20 64 7/[1]Ole
..... 0 0 7/[3]View000
..... 102 128 7/[1]CompObj
..... 188 192 7/[5]SummaryInformation
..... 192 192 7/[5]DocumentSummaryInformation
------------------- ----- ------------ ------------ ------------------------
1995-07-05 17:25:19 345316 351168 55 files, 8 folders

Sure enough, the OBZ file has a Visual Basic macro project (_VBA_PROJECT). Unfortunately, it appears to be nested in an additional folder within the container, with a numeric name which is likely to change from file to file. That will make identification in PRONOM much more difficult, as the signatures are not designed for variable names. Possibly something we can investigate later.
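Outside of PRONOM, the variable folder name is easy enough to work around. The Python sketch below looks for a _VBA_PROJECT storage sitting directly under any numerically named storage, next to a top-level Binder stream. It works on path strings like those in the listing above, it is based on this single sample, and the name `looks_like_obz` is hypothetical.

```python
from typing import Iterable


def looks_like_obz(paths: Iterable[str]) -> bool:
    """True if a Binder stream exists and some numbered storage holds a _VBA_PROJECT."""
    has_binder = False
    has_nested_vba = False
    for path in paths:
        parts = path.split("/")
        if parts == ["Binder"]:
            has_binder = True
        # The storage number (10 in CLIENT.OBZ) is treated as a wildcard.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "_VBA_PROJECT":
            has_nested_vba = True
    return has_binder and has_nested_vba


# A few paths copied from the CLIENT.OBZ listing above.
client_obz = [
    "4/WordDocument",
    "6/Book",
    "Binder",
    "10/Book",
    "10/_VBA_PROJECT/dir",
    "10/_VBA_PROJECT/812f9922c6",
]

print(looks_like_obz(client_obz))
```

Whether every OBZ file carries a macro project is not something one sample can confirm, so a negative result here should not be read as proof that a file is a plain OBD.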

Microsoft Binder was only released in Office 95, 97, and 2000, but was supported in Office XP and 2003 through an UNBIND.EXE application which would simply separate all the different objects back out to the individual files.

The Microsoft Office Binder is not included in Office 2003. However, if a Binder file created in a previous version of Office contains information you want to access, you can use the Unbind tool to pull out the information and save it in the formats of the appropriate programs. In order to do this procedure, the Unbind tool must be installed.

    As always, you can look at some sample files and my proposal for updated signatures on my GitHub page.

    UFO

    Researching file formats isn’t for everyone. Others may find it boring or even odd. Trying to explain to others the nuances of a binary format versus a container format would bring many tears. Their reactions sometimes are similar to hearing someone explain their belief in aliens. Passionate, but a bit on the crazy side.

    So with aliens and containers on my mind, let’s take a look at a format with the extension UFO. It is not an unidentified flying object or a UAP, though it may as well be an unidentified file object; in this case it is the “Ulead File for Objects” format. It is the exclusive file format for use with the PhotoImpact software from Ulead Systems, a Taiwanese developer known for many popular software programs. First released in 1996 with version 3, the PhotoImpact software was marketed as “a fully object-based tool, which pioneered a number of important innovations”.

    It was considered a fully object-based tool because the UFO format is based on the OLE Compound File storage format developed by Microsoft, which was popular at the time. So by using some OLE tools we can take a closer look at some of these Unidentified File Objects…

    oleid Sample.ufo 
    oleid 0.60.1 - http://decalage.info/oletools
    THIS IS WORK IN PROGRESS - Check updates regularly!
    Please report any issue at https://github.com/decalage2/oletools/issues

    Filename: Sample.ufo
    --------------------+--------------------+----------+--------------------------
    Indicator |Value |Risk |Description
    --------------------+--------------------+----------+--------------------------
    File format |Generic OLE file / |info |Unrecognized OLE file.
    |Compound File | |Root CLSID: - None
    |(unknown format) | |
    --------------------+--------------------+----------+--------------------------
    Container format |OLE |info |Container type
    --------------------+--------------------+----------+--------------------------
    Encrypted |False |none |The file is not encrypted
    --------------------+--------------------+----------+--------------------------
    VBA Macros |No |none |This file does not contain
    | | |VBA macros.
    --------------------+--------------------+----------+--------------------------
    XLM Macros |No |none |This file does not contain
    | | |Excel 4/XLM macros.
    --------------------+--------------------+----------+--------------------------
    External |0 |none |External relationships
    Relationships | | |such as remote templates,
    | | |remote OLE objects, etc
    --------------------+--------------------+----------+--------------------------

    Well, it is an OLE file, but it is unrecognized/unidentified by the oletools software. It also appears to be missing the root CLSID you commonly find in OLE files. Since this is an OLE container, we can also just use 7-Zip to peek inside.

    Path = Sample.ufo
    Type = Compound
    Physical Size = 937984
    Extension = compound
    Cluster Size = 512
    Sector Size = 64

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    1999-05-25 03:33:05 D.... OS-3
    1999-05-25 03:33:04 D.... OS-1
    1999-05-25 03:33:03 D.... OS-0
    ..... 31122 31232 OS-0/ObjectImage
    ..... 1316 1344 OS-0/ObjectData
    ..... 137996 138240 OS-0/PathStream
    ..... 19591 19968 OS-0/ObjectMask0
    1999-05-25 03:33:05 D.... OS-2
    ..... 43405 43520 OS-2/ObjectImage
    ..... 1316 1344 OS-2/ObjectData
    ..... 176204 176640 OS-2/PathStream
    ..... 25524 25600 OS-2/ObjectMask0
    ..... 41588 41984 OS-1/ObjectImage
    ..... 1316 1344 OS-1/ObjectData
    ..... 170132 170496 OS-1/PathStream
    ..... 25221 25600 OS-1/ObjectMask0
    ..... 34505 34816 LtfMainImage
    ..... 656 704 LtfHeader
    1999-05-25 03:33:06 D.... OS-4
    ..... 19249 19456 OS-4/ObjectImage
    ..... 1316 1344 OS-4/ObjectData
    ..... 4842 5120 LtfPreviewImage
    ..... 1160 1216 LtfObjectList
    ..... 31753 32256 OS-3/ObjectImage
    ..... 1316 1344 OS-3/ObjectData
    ..... 131892 132096 OS-3/PathStream
    ..... 19439 19456 OS-3/ObjectMask0
    ------------------- ----- ------------ ------------ ------------------------
    1999-05-25 03:33:06 920859 925120 22 files, 5 folders

    In this sample file, we have a bunch of directories and objects, but none of what we expect to see in an OLE file, such as the “SummaryInformation” or “DocumentSummaryInformation” streams we would see in a Word DOC file. Lacking the standard contents of the container, these files are very specific to the PhotoImpact software.

    Path = PhotoImpactX3-s01.ufo
    Type = Compound
    Physical Size = 5120
    Extension = compound
    Cluster Size = 512
    Sector Size = 64

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    ..... 20 64 HotspotStream
    ..... 656 704 LtfHeader
    ..... 20 64 SliceInfoStream
    ..... 412 448 LtfPreviewImage
    ..... 714 768 WebPropStream
    ..... 20 64 ManualHotspotScriptInfoStream
    ..... 20 64 ObjectHotspotScriptInfoStream
    ------------------- ----- ------------ ------------ ------------------------
    1862 2176 7 files

    Here is another UFO file, this one from PhotoImpact X3, the last version of the software, from when it was owned by Corel; it was phased out in 2009. This is the basic file structure with no objects added to the file. We can be fairly confident these are the base files in almost every other UFO file. It doesn’t have any of the “OS” folders which contain the objects, so I think the LtfHeader file might be our best bet for a signature. Let’s take a look at the hex values for a few of them.

    hexdump -C PhotoImpactX3-s01/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 58 02 00 00 02 00 ba dc |....LTF.X.......|
    00000010 ee 02 00 00 26 02 00 00 80 fc 0a 00 80 fc 0a 00 |....&...........|
    00000020 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    hexdump -C Sample/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 90 01 00 00 02 00 f7 bf |....LTF.........|
    00000010 90 01 00 00 90 01 00 00 80 fc 0a 00 80 fc 0a 00 |................|
    00000020 00 00 00 00 06 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    hexdump -C v3/ANIMALS/LtfHeader| head
    00000000 90 02 00 00 4c 54 46 00 64 00 00 00 02 00 6e 00 |....LTF.d.....n.|
    00000010 40 01 00 00 c8 00 00 00 80 fc 0a 00 80 fc 0a 00 |@...............|
    00000020 00 00 00 00 11 00 00 00 01 00 00 00 01 00 00 00 |................|
    00000030 01 00 00 00 60 00 00 00 3c 00 00 00 60 00 00 00 |....`...<...`...|

    Making a signature using the first 8 bytes of the LtfHeader file appears to have worked for all of the 3,400+ sample files I have collected. The problem is that it also matched another extension found in some of the later versions of PhotoImpact.
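For anyone who wants to try the same check outside PRONOM, here is a minimal Python sketch that compares the first 8 bytes of an extracted LtfHeader stream with the value shared by all three dumps above. The function name is mine, and getting the stream out of the container (with 7-Zip, for example) is left out. One thing worth noticing: read as a little-endian integer, the first four bytes come to 656, which is the LtfHeader stream size in both listings. That may be a coincidence, so the sketch only prints it and does not rely on it.

```python
import struct

# First 8 bytes shared by the three LtfHeader dumps above.
LTF_MAGIC = bytes.fromhex("90 02 00 00 4c 54 46 00")


def is_ltf_header(stream: bytes) -> bool:
    """True if the stream starts with the 8 bytes seen in every sample header."""
    return stream[:8] == LTF_MAGIC


# First 16 bytes of each LtfHeader dumped above.
samples = {
    "PhotoImpactX3-s01": bytes.fromhex("90 02 00 00 4c 54 46 00 58 02 00 00 02 00 ba dc"),
    "Sample": bytes.fromhex("90 02 00 00 4c 54 46 00 90 01 00 00 02 00 f7 bf"),
    "v3/ANIMALS": bytes.fromhex("90 02 00 00 4c 54 46 00 64 00 00 00 02 00 6e 00"),
}

for name, head in samples.items():
    (first_dword,) = struct.unpack_from("<I", head, 0)
    print(name, is_ltf_header(head), first_dword)
```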

    When you have successfully finished your template, make sure to save it in the Ulead File For Photo Project format (*.UFP). This allows you to open and use your template in the Photo Projects dialog box. In the Template tab, click Open Project and browse for the created file.

    UFP files appear to be a template version of the format, so we should be fine just adding the extension to the same signature.

    Well, this Unidentified File Object is no longer unidentifiable. Was it sent by aliens? Possibly, but at least we know where these UFOs came from: PhotoImpact. Take a look at the samples and proposed signature in my GitHub.

    Also, be sure to join us at this year’s iPRES conference and attend our workshop on container signatures in PRONOM!

    ePic

    Image compression has been around for a while. It seems everyone took a crack at making better algorithms to improve quality and size. Some chose to invent new ways, and others chose to use existing methods but with their own flair. Kodak tried this with their PhotoCD, but there were a couple of other photo processing options that popped up in the ’90s. One was Seattle FilmWorks and another was Konica PC PictureShow, both of which used “proprietary” formats to deliver developed film on disk.

    Seattle FilmWorks, later called PhotoWorks, used an image format with the extension SFW that was based on BMP and JPG, but with their own twist. The same goes for the format used by Konica’s PC PictureShow.

    Konica PC PictureShow Disk

    If you took your film in to be developed at one of Konica’s photo labs, you could have those images put on a diskette or, later, a CD-R. The disks came with software to view your photos called PC PictureShow. The images stored on disk were in another proprietary format with the extension KQP. The KQP format was actually licensed from another company called Pegasus Imaging Corporation, later known as Accusoft. They developed their own way to compress a JPEG file, which they called ePic. An SDK called PICTools was offered for many years, but seems not to be available anymore.

    ePIC (Proprietary)
    • Supports PIC format compression, replacing the JPEG Huffman encoder with the proprietary ELS entropy encoder for 15% more compression.
    • Can be losslessly converted back to JPEG format using Op_RORE.

    A search on the internet for Konica KQP shows quite a few people over the years wondering what to do with their old disks and how to convert the old format to JPG, only to find a lack of information and available tools. One such person used Python to edit the file and make it renderable as a JPG. While the method worked well for their KQP files, it might not work for all of them. Let’s look closer and understand why.

    hexdump -C Sample.PIC | head
    00000000 42 4d 00 00 00 00 00 00 00 00 42 04 00 00 44 00 |BM........B...D.|
    00000010 00 00 34 08 00 00 24 fa ff ff 01 00 18 00 4a 50 |..4...$.......JP|
    00000020 45 47 00 00 00 00 00 00 00 00 00 00 00 00 fc 00 |EG..............|
    00000030 00 00 ec 00 00 00 2c 00 00 00 18 00 00 00 00 00 |......,.........|
    00000040 00 00 02 00 00 00 08 00 00 00 01 00 00 00 01 00 |................|
    00000050 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
    00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    At first glance the file appears to be a Bitmap (BMP), and it does have a Bitmap header claiming to have JPEG compression. But look a little further into the file.

    identify -verbose Sample.PIC   
    identify: length and filesize do not match `Sample.PIC' @ error/bmp.c/ReadBMPImage/950.
    identify: unrecognized compression `Sample.PIC' @ error/bmp.c/ReadBMPImage/1019.

    hexdump -C Sample.PIC
    00000000 42 4d 00 00 00 00 00 00 00 00 42 04 00 00 44 00 |BM........B...D.|
    00000010 00 00 34 08 00 00 24 fa ff ff 01 00 18 00 4a 50 |..4...$.......JP|
    00000020 45 47 00 00 00 00 00 00 00 00 00 00 00 00 fc 00 |EG..............|
    00000030 00 00 ec 00 00 00 2c 00 00 00 18 00 00 00 00 00 |......,.........|
    00000040 00 00 02 00 00 00 08 00 00 00 01 00 00 00 01 00 |................|
    00000050 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
    00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    *
    00000400 00 00 60 00 00 00 00 00 60 00 00 60 00 00 00 00 |..`.....`..`....|
    00000410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    *
    00000440 00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 02 02 |........JFIF....|
    00000450 00 00 00 00 00 00 ff e1 00 0a 50 49 43 00 01 19 |..........PIC...|
    00000460 1e 01 ff c0 00 11 08 05 dc 08 34 03 01 11 00 02 |..........4.....|

    We find a JPG marker; in fact almost the whole JPG file is included, except for the luminance and chrominance quantization tables, which are needed to properly display the image. This is the area Pegasus thought they could encode better to further compress the image. Their method was to use a new algorithm called ELS (Entropy Logarithmic-Scale). It was used by the PICTools software to make a Pegasus PIC file, while Konica used it for their KQP format; the two are identical. By choosing the luminance and chrominance values during compression, you could make a highly compressed image, but one that required specific software to render.

    Pegasus also made use of a custom APP marker (PIC) within the JPEG structure of the PIC/KQP, and also in any JPG compressed using their software. This marker, which takes up around 8 bytes, holds the luminance and chrominance values. Take the above sample, for instance: it compresses the image with a luminance of 25 and a chrominance of 30. These are integer values; in hex they are “19” and “1E” respectively.

    hexdump -C Sample.PIC      
    00000440 00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 02 02 |........JFIF....|
    00000450 00 00 00 00 00 00 ff e1 00 0a 50 49 43 00 01 19 |..........PIC...|
    00000460 1e 01 ff c0 00 11 08 05 dc 08 34 03 01 11 00 02 |..........4.....|
    00000470 11 01 03 11 01 ff c4 00 51 00 01 00 03 01 00 00 |........Q.......|
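As a rough illustration of reading those two values, the Python sketch below finds the JPEG start-of-image bytes, then looks for an FF E1 segment whose payload begins with "PIC" and a null byte, and reads the bytes at the positions where 19 and 1E sit in this sample. The byte positions are inferred from this one file only, the PIC2 sample shown later carries different bytes in the same place, and the function name is my own. Nothing here comes from the PICTools documentation.

```python
from typing import Optional, Tuple

SOI = b"\xff\xd8"       # JPEG start-of-image marker
APP1 = b"\xff\xe1"      # the APP segment that carries "PIC" in the sample
PIC_TAG = b"PIC\x00"


def find_pic_values(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (jpeg_offset, luminance, chrominance), or None if no PIC segment is found."""
    jpeg_start = data.find(SOI)
    if jpeg_start < 0:
        return None
    tag = data.find(APP1, jpeg_start)
    while tag >= 0:
        # Segment layout: FF E1, 2-byte big-endian length, then the payload.
        payload = data[tag + 4:tag + 12]
        if len(payload) >= 7 and payload[:4] == PIC_TAG:
            # Assumed layout after the tag: one byte, luminance, chrominance.
            return jpeg_start, payload[5], payload[6]
        tag = data.find(APP1, tag + 2)
    return None


# Bytes 0x440 to 0x46f of Sample.PIC, copied from the hexdump above.
excerpt = bytes.fromhex(
    "00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 02 02 "
    "00 00 00 00 00 00 ff e1 00 0a 50 49 43 00 01 19 "
    "1e 01 ff c0 00 11 08 05 dc 08 34 03 01 11 00 02 "
)

print(find_pic_values(excerpt))
```

The offset returned is relative to the excerpt; on the full file it would be 0x442, the point from which the embedded JPEG data could be carved out.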

    So in theory one could strip out any part of the file before the start of the JPG data (FF D8, followed here by the FF E0 APP0 marker), locate the PIC APP marker, use the values to generate the two quantization tables, insert them in the appropriate spot, and save out a JPG file.

    This may be the case for the first few versions of the ePic format, but later versions got more complicated. It seems a “PIC2” version replaced the earlier ones.

    hexdump -C Sample.KQP | head
    00000000 50 49 43 32 01 08 00 00 00 64 00 01 00 b9 3e 00 |PIC2.....d....>.|
    00000010 00 05 08 00 00 00 4a 50 47 45 03 00 00 00 16 24 |......JPGE.....$|
    00000020 00 00 00 43 6f 6d 70 72 65 73 73 69 6f 6e 20 62 |...Compression b|
    00000030 79 20 50 65 67 61 73 75 73 20 49 6d 61 67 69 6e |y Pegasus Imagin|
    00000040 67 20 43 6f 72 70 2e 06 68 3e 00 00 ff d8 ff e0 |g Corp..h>......|
    00000050 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 |..JFIF..........|
    00000060 ff e1 00 16 50 49 43 00 03 00 00 01 00 00 00 00 |....PIC.........|
    00000070 00 00 00 00 00 00 00 00 ff db 00 84 00 0f 0a 0a |................|
    00000080 0a 0a 06 0f 0a 0a 0a 0f 0f 0f 0f 14 1e 14 14 14 |................|
    00000090 14 14 28 1e 1e 19 1e 2d 28 32 32 2d 28 2d 2d 32 |..(....-(22-(--2|

    Instead of the Bitmap (BMP) header, a proprietary PIC2 header is used. The file still contains a JPG in the JFIF format along with the PIC APP marker, but encoded in a way that the simple method of adding a quantization table may not work. With the original format the JPG and the PIC/KQP were approximately the same size; this new version significantly reduces the size of the PIC/KQP in comparison with the JPG.

    The ELS compression technology used in the ePic format seems to be patented by Pegasus and Accusoft, but is not entirely hidden, as the libavcodec library includes an ELS decoder. It might be a fun project to use the code to decode the PIC/KQP formats fully.

    In the meantime, a signature identifying the two versions should be added to PRONOM. Check out my proposal on my GitHub. If you need to convert your KQP or PIC files back to JPG here are a few links:

    Konica PC PictureShow Version 4 (PIC2)

    Accusoft PICTools Apollo Demo (Windows 7 Compatible)

    Konica PC PictureShow for Macintosh

    FASTA & FASTQ

    There seems to be a never-ending source of file formats out there. Documenting past obsolete formats, one would assume there is a point at which there are no more to find, but in reality more are rediscovered every day by the Digital Preservation community. When it comes to more modern formats, it seems more are invented every day, too many to keep up with identification. Document one and 10 more pop up. Such is the case for scientific formats, including sequencing formats.

    I was speaking with a colleague from another institution the other day and a file format was mentioned I hadn’t heard of before. It seems much of their scientific data was stored in a format called FASTA “Fast A” (“fast-aye”). This format specifically stores DNA sequence data and is used quite a bit, it seems. I was even more surprised the next day when I went to process some new submissions for our repository, only to find one submission contained three FASTA files. I love researching file formats, but sometimes in order to understand the format structure you have to know something about the content as well. Let’s explore the FASTA and FASTQ file formats. If you would like to take a peek at the Human Genome in FASTA, go here.

    Both the FASTA and FASTQ formats are text based and have a simple structure. Identification of each of these should be pretty simple, but to avoid conflicts with other formats, the signature might have to be more complex.

    The FASTA format is well documented as many in the scientific community use it. Basically the format starts with the greater than “>” character followed by a description, a new line character, then the sequence. For example:

    >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
    MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
    FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
    DIDGDGQVNYEEFVQMMTAK*

    Pretty straightforward, but so much of the format can be variable that a simple signature would clash with too many other formats. There are some rules about which characters can be used in the sequence, so it might be possible to limit the signature to only allow certain characters. At first I thought it might only be able to contain the standard characters representative of adenine (A), cytosine (C), guanine (G), and thymine (T), but as it turns out the FASTA format can contain Nucleic Acid Codes and Amino Acid Codes. These codes allow more than the four I was expecting, but do limit what can be represented.
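
    To make the idea of a character-limited check concrete, here is a rough Python sketch. It is not the signature I proposed. The function name and the 50-line limit are arbitrary, and the allowed character set (any ASCII letter plus “*” and “-”) is a deliberately loose assumption standing in for the exact nucleic acid and amino acid code tables.

```python
import string

# Assumption: a permissive stand-in for the nucleic/amino acid code tables.
ALLOWED = set(string.ascii_letters + "*-")

def looks_like_fasta(text: str, max_lines: int = 50) -> bool:
    """Heuristic: a '>' description line followed by sequence-only lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    saw_sequence = False
    for line in lines[1:max_lines]:
        if line.startswith(">"):
            continue  # description line of the next record
        if not set(line) <= ALLOWED:
            return False  # spaces, digits, punctuation: probably plain text
        saw_sequence = True
    return saw_sequence
```

    A check this loose will still accept some ordinary text files that happen to start with “>”, which is exactly the kind of conflict a real signature has to weigh.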

    Take the NCBI Sequence Viewer for a spin and download some data as FASTA.

    The FASTQ format adds more structure and is more limiting, but also presents some challenges. Here is a sample of its structure:

    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

    Instead of a greater than symbol, the FASTQ format uses an “@” symbol followed by an identifier. The identifier can be basically anything and as long as needed. There is a newline character followed by the DNA sequence, which is only the four characters I have heard of before. It can contain an A, C, G, T, or N. The “N” can represent an unidentified nucleotide or indicate that the software was unable to make a basecall. Then comes a newline character again and the “+” symbol. This is placed before the fourth line, which is a quality score and is the same number of characters as the sequence.
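
    The four-line structure described above can be written down as a small Python check. This is a sketch of the rules as I understand them from the sample, with a hypothetical function name, and it only looks at the first record.

```python
def looks_like_fastq_record(lines) -> bool:
    """Check the first four lines against the FASTQ layout described above."""
    if len(lines) < 4:
        return False
    header, seq, plus, qual = (ln.rstrip("\r\n") for ln in lines[:4])
    return (
        header.startswith("@")                  # identifier line
        and len(seq) > 0
        and set(seq.upper()) <= set("ACGTN")    # bases or an uncalled N
        and plus.startswith("+")                # separator line
        and len(qual) == len(seq)               # one quality score per base
    )
```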

    See what I mean when you have to learn about the context of a format in order to make a proper signature!

    One of the problems I am left with is how to determine how many of the sequence characters to use in the signature to not have any conflicts. Too few and it might conflict with another format or simple text file. Too many and the signature gets complicated and may exclude a short sequence file. As far as I can tell there is no set minimum or maximum for the sequence. Not sure what the genome for Pinus taeda would look like in FASTA with 22.18 billion base pairs. The other problem is that these formats are often compressed into a GZIP file, so they need to be extracted before identification.
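
    For the GZIP case, one way to peek inside without extracting first is to test for the two GZIP magic bytes (1F 8B) and open the file accordingly. The helper below is a minimal sketch with a made-up name; reading the text as ASCII with replacement of stray bytes is my own assumption.

```python
import gzip

def open_maybe_gzip(path):
    """Open a text file for reading, transparently handling GZIP input."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # GZIP magic number
        return gzip.open(path, "rt", encoding="ascii", errors="replace")
    return open(path, "rt", encoding="ascii", errors="replace")
```

    The first few lines read from the returned handle can then be checked for the FASTA or FASTQ structure.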

    These two formats are just a couple of the many sequencing formats being used in the bioinformatics community. I am sure others will pop up in the future. Until then, I have with the help of others put together a signature which seems to work well for the samples and data sets we have access to. Take a look at my GitHub for the signature proposal. If you find any issues, let me know!

    Interactive Quicktime

    One of my favorite legacy formats to explore is any type of multimedia CD-ROM. The 1990s and early 2000s were filled with all sorts of multimedia for CD, Web, and Television. It is also one of the most difficult formats to try and preserve for the future. Many CD-ROMs are filled with executables and/or Macromedia Director media, with later ones having Flash content. The operating systems and security needs of today make playback almost impossible. For this reason many have built emulation services to mimic the original operating system and software, allowing the many historic multimedia CD-ROMs to once again interact with the user in a way many current systems still struggle with.

    Many CD-ROMs would come as Hybrid discs, allowing them to be used on a Windows and Macintosh system, sometimes providing two different experiences. Then there were CD-Extra or Enhanced CDs, which added a separate session to an Audio CD containing bonus content playable only on a computer.

    For fun I took a look back at some of my older Audio CD titles. I came across a couple, one claiming to be a “CD-Extra” and another an “Enhanced CD“. The CD-Extra disc when queried with cd-info claimed to have 12 tracks, with the 12th being a data XA track.

    Disc mode is listed as: CD-ROM Mixed
    CD-ROM Track List (1 - 12)
    #: MSF LSN Type Green? Copy? Channels Premphasis?
    1: 00:02:00 000000 audio false no 2 no
    2: 02:13:66 009891 audio false no 2 no
    3: 05:21:28 023953 audio false no 2 no
    4: 08:18:19 037219 audio false no 2 no
    5: 12:28:37 055987 audio false no 2 no
    6: 16:11:58 072733 audio false no 2 no
    7: 19:21:56 086981 audio false no 2 no
    8: 23:17:49 104674 audio false no 2 no
    9: 26:01:17 116942 audio false no 2 no
    10: 28:30:02 128102 audio false no 2 no
    11: 31:07:70 139945 audio false no 2 no
    12: 37:29:46 168571 XA true no
    170: 51:35:07 231982 leadout (520 MB raw, 516 MB formatted)
    CD Analysis Report
    CD-Plus/Extra
    session #2 starts at track 12, LSN: 168571

    Mounting the 12th track showed a mix of Macromedia Director (.DIR) files and quite a few Quicktime MOV movies. Playback was not possible on my current computer so I had to resort to using an emulator to experience this bonus content, full of band member photos and biographies.

    The other disc I pulled out to explore was a bit different. Using cd-info the disc looked very similar:

    Disc mode is listed as: CD-ROM Mixed
    CD-ROM Track List (1 - 13)
    #: MSF LSN Type Green? Copy? Channels Premphasis?
    1: 00:02:00 000000 audio false no 2 no
    2: 04:20:08 019358 audio false no 2 no
    3: 08:04:27 036177 audio false no 2 no
    4: 11:15:62 050537 audio false no 2 no
    5: 14:54:32 066932 audio false no 2 no
    6: 19:57:73 089698 audio false no 2 no
    7: 26:12:36 117786 audio false no 2 no
    8: 29:51:59 134234 audio false no 2 no
    9: 34:44:00 156150 audio false no 2 no
    10: 39:36:62 178112 audio false no 2 no
    11: 42:06:01 189301 audio false no 2 no
    12: 45:42:26 205526 audio false no 2 no
    13: 57:10:54 257154 XA true no
    170: 72:56:67 328117 leadout (735 MB raw, 730 MB formatted)
    CD Analysis Report
    CD-Plus/Extra
    session #2 starts at track 13, LSN: 257154

    The discs, even though they were labeled CD-Extra and Enhanced CD, had the same structure and format. The difference was in the type of multimedia used. On this second disc there was a simple application which launched Quicktime and loaded a single MOV movie. But this was not your regular Quicktime movie; it was a highly complex Interactive Quicktime movie.

    The Quicktime movie could only be launched from an older operating system using Quicktime 6, and on the Macintosh, only a PPC CPU. The movie would launch with an interactive menu, allowing navigation as you might find on a DVD or Flash website, but all within a single MOV file. When I ran MediaInfo on the MOV file I got back quite a few tracks:

    <media ref="/Volumes/VOLCANOECD/ALECD.mov">
    <track type="General">
    <VideoCount>10</VideoCount>
    <AudioCount>1</AudioCount>
    <OtherCount>51</OtherCount>
    <FileExtension>mov</FileExtension>
    <Format>QuickTime</Format>
    <Format_Settings>Compressed header</Format_Settings>

    Ten video tracks and 51 other tracks. Exploring with Quicktime, I could see the entire list of embedded content:

    Quicktime movies, an Audio track, dozens of Flash, Photos, Animations, Sprites, with the possibility of more. These types of Quicktime files had specific requirements in order to run, with Quicktime 6 being the last version which could play back all the content correctly. Current versions of Quicktime give a warning on the lack of compatibility.

    This Interactive Quicktime movie proudly claims “Made with LiveStage Pro”, which was an authoring environment for Quicktime made by Totally Hip Software Inc. The company started in 1995, but seemed to disappear after 2004 with no new development, and by 2014 the website went offline.

    If you would like to see a couple of Apple created simple examples see here.

    LiveStage Pro was a very powerful authoring tool in its time. Another similar tool called Electrifier competed for the interactive Quicktime market. Adobe GoLive also competed, but offered fewer features. The final Quicktime movie exported from LiveStage Pro was the main component, but the software did save a project format with the extension “LSD”. Versions 2 through 4 of LiveStage Pro had a similar header.

    hexdump -C LiveStagePro4-s01.lsd | head
    00000000 4c 53 41 46 00 00 00 04 00 00 09 16 00 00 00 00 |LSAF............|
    00000010 00 00 00 00 00 00 00 00 00 00 09 0a 73 65 61 6e |............sean|
    00000020 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 18 |................|
    00000030 56 53 4e 6e 00 00 00 01 00 00 00 00 00 00 00 00 |VSNn............|
    00000040 00 00 00 04 00 00 08 84 4d 50 52 4e 00 00 00 01 |........MPRN....|
    00000050 00 00 00 49 00 00 00 00 00 00 00 21 6d 4f 55 54 |...I.......!mOUT|
    00000060 00 00 00 01 00 00 00 00 00 00 00 00 55 6e 74 69 |............Unti|
    00000070 74 6c 65 64 2e 6d 6f 76 00 00 00 00 18 57 6c 65 |tled.mov.....Wle|
    00000080 66 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |f...............|
    00000090 00 00 00 00 18 57 74 6f 70 00 00 00 01 00 00 00 |.....Wtop.......|

    All the samples from version 2 through 4 have the first four bytes as “LSAF“. It also seems the next four bytes may be version related. Version 1 however has a different header.

    hexdump -C contest.lsd | head
    00000000 4c 53 50 72 00 00 00 08 00 00 00 00 00 00 02 80 |LSPr............|
    00000010 01 e0 00 00 00 00 02 58 00 00 00 01 00 00 00 01 |.......X........|
    00000020 00 00 00 02 00 00 00 00 00 08 00 00 00 00 00 00 |................|
    00000030 00 00 08 53 02 d9 ff c9 04 76 02 97 01 00 44 00 |...S.....v....D.|
    00000040 0b 02 fb 03 c9 00 00 00 01 00 00 00 01 00 00 00 |................|
    00000050 00 07 41 63 74 69 6f 6e 73 00 00 00 00 00 00 00 |..Actions.......|
    00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000070 00 00 00 00 00 00 00 00 05 00 00 00 01 50 49 43 |.............PIC|
    00000080 54 ff ff 00 00 c1 ff 03 72 65 64 65 6e 6e 41 79 |T.......redennAy|
    00000090 98 05 41 77 78 00 00 01 7a 00 10 00 00 31 fc 30 |..Awx...z....1.0|

    Identification of a LiveStage project should be simple enough, but identifying and rendering back a Quicktime movie made by this software takes some work. In fact there are many Enhanced CD and CD-Extra titles out there with quite a few system requirements. If we are not careful, many of these little gems might get more difficult to experience or be lost completely.
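
    As a quick illustration of how simple the project identification can be, here is a Python sketch keyed on the two magic values seen in the hexdumps. The function name and labels are mine, and treating the four bytes after “LSAF” as a big-endian number that may relate to the version is a guess based on these samples only.

```python
import struct

def identify_livestage(header: bytes):
    """Return (label, value_after_magic) for a LiveStage project, else None."""
    magic = header[:4]
    if magic == b"LSAF" and len(header) >= 8:
        # Assumption: the next four bytes are a big-endian integer that
        # may be version related (it reads 4 in the version 4 sample).
        (value,) = struct.unpack(">I", header[4:8])
        return ("LiveStage Pro 2-4 project", value)
    if magic == b"LSPr":
        return ("LiveStage Pro 1 project", None)
    return None
```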

    If you would like to explore the Quicktime Movie from the Enhanced CD mentioned here, send me a message. You can also take a look at my signature proposal and sample files on my GitHub for LiveStage.

    SDIF

    I have used and researched a lot of audio editing software. Some are very simple and straightforward, others are feature rich and take some time to learn. While looking into a format, I came across some audio software which was nothing like anything I have used before. At first I was confused; I figured it would be simple to open a certain file format and play the audio. Not so fast.

    Max is software which proudly says it is an “infinitely flexible space to create your own interactive software”. Created by Cycling ’74 software, Max has been around for a while, having first been developed in the mid-1980s. It allows the user to make “patches”, stringing together components and effects to accomplish an endless number of options and outcomes.

    The software produces simple project files and patch files, but they are just JSON data, at least in the latest version. When working with audio files, however, the software can save to a number of formats.

    One of the options is a format called “SDIF”, which stands for “Sound Description Interchange Format“. SDIF was jointly developed by IRCAM and CNMAT, with proposals starting back in the mid-1990’s. Originally written as a Spectral Description, it was later changed to refer to a Sound Description.

    The Specification states the general idea was to “store information related to signal processing and specifically of sound, in files, according to a common format to all data types. Thus, it is possible to store results or parameters of analyses, syntheses…” So not exactly the same as a simple WAVE file you can open and edit, this format was meant to store signal data for analysis.

    Each SDIF file consists of a header followed by a succession of frames, not unlike chunks in the IFF/AIFF/RIFF formats, ordered in time. Each frame matrix declares a “Type” which can be a combination of many options. Let’s take a look at an SDIF file:

    hexdump -C test.sdif | head
    00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
    00000010 31 54 52 43 00 00 00 20 00 00 00 00 00 00 00 00 |1TRC... ........|
    00000020 00 00 00 01 00 00 00 01 31 54 52 43 00 00 00 04 |........1TRC....|
    00000030 00 00 00 00 00 00 00 04 31 54 52 43 00 00 00 c0 |........1TRC....|
    00000040 3f 74 7a e1 40 00 00 00 00 00 00 01 00 00 00 01 |?tz.@...........|
    00000050 31 54 52 43 00 00 00 04 00 00 00 0a 00 00 00 04 |1TRC............|
    00000060 3f 80 00 00 45 95 35 c3 00 00 00 00 00 00 00 00 |?...E.5.........|
    00000070 40 00 00 00 46 06 e2 14 00 00 00 00 00 00 00 00 |@...F...........|
    00000080 40 40 00 00 45 3b 42 3d 00 00 00 00 00 00 00 00 |@@..E;B=........|
    00000090 40 80 00 00 43 5d 94 7b 00 00 00 00 00 00 00 00 |@...C].{........|

    This test file has the opening frame “SDIF”, to identify it as an SDIF, then a reference to the type “1TRC”. I would try and explain a Matrix 1TRC Sinusoidal Track, but I have no idea what it means. Something, something sine wave, etc. Someone much smarter than me can make use of this format. Here are a couple examples of SDIF with other frame types.

    hexdump -C angry_cat.part.sdif | head
    00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
    00000010 31 4e 56 54 00 00 00 88 ff ef ff ff ff ff ff ff |1NVT............|
    00000020 ff ff ff fd 00 00 00 01 31 4e 56 54 00 00 03 01 |........1NVT....|
    00000030 00 00 00 61 00 00 00 01 53 74 72 65 61 6d 49 44 |...a....StreamID|
    00000040 09 30 0a 44 61 74 65 09 54 68 75 5f 41 75 67 5f |.0.Date.Thu_Aug_|
    00000050 5f 33 5f 32 31 2e 33 32 2e 34 35 5f 32 30 30 30 |_3_21.32.45_2000|
    00000060 5f 0a 54 61 62 6c 65 4e 61 6d 65 09 53 69 6e 75 |_.TableName.Sinu|
    00000070 73 6f 69 64 61 6c 54 72 61 63 6b 73 0a 57 72 69 |soidalTracks.Wri|
    00000080 74 74 65 6e 42 79 09 50 6d 5f 56 65 72 73 69 6f |ttenBy.Pm_Versio|
    00000090 6e 5f 31 2e 32 2e 32 0a 00 00 00 00 00 00 00 00 |n_1.2.2.........|

    hexdump -C cymbalum-c4.res.sdif | head
    00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
    00000010 31 52 45 53 00 00 0d 20 00 00 00 00 00 00 00 00 |1RES... ........|
    00000020 00 00 00 04 00 00 00 01 31 52 45 53 00 00 00 04 |........1RES....|
    00000030 00 00 00 d0 00 00 00 04 42 49 27 7a 39 59 fc ab |........BI'z9Y..|
    00000040 3d 35 06 c9 00 00 00 00 42 6e 68 68 39 63 99 b1 |=5......Bnhh9c..|
    00000050 3e 25 f7 c0 00 00 00 00 42 c6 02 bb 39 8c 31 79 |>%......B...9.1y|
    00000060 3f bb 7e 6e 00 00 00 00 43 01 82 96 3a 1d 36 44 |?.~n....C...:.6D|
    00000070 3e d9 21 12 00 00 00 00 43 07 35 f0 3a 20 6f 6e |>.!.....C.5.: on|
    00000080 3f 02 32 7f 00 00 00 00 43 30 84 0b 39 97 f9 1b |?.2.....C0..9...|
    00000090 3e c6 43 c7 00 00 00 00 43 4d e4 e4 39 88 14 90 |>.C.....CM..9...|

    Unfortunately, the common tools I use to explore AV formats don’t seem to work on this format. MediaInfo, FFProbe, Exiftool, all give me unknown file warnings. So I had to compile the SDIF software in order to get some details.

    querysdif angry_cat.part.sdif 
    Header info of file angry_cat.part.sdif:

    Format version: 3
    Types version: 1

    Ascii chunks of file angry_cat.part.sdif:

    1NVT
    {
    StreamID 0;
    Date Thu_Aug__3_21.32.45_2000_;
    TableName SinusoidalTracks;
    WrittenBy Pm_Version_1.2.2;
    }

    Data in file angry_cat.part.sdif (9504872 bytes):
    1933 1TRC frames in stream 0 between time 0.000000 and 5.794875 containing
    1933 1TRC matrices with 45 --400 rows, 4 -- 4 columns
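
    The “Format version” and “Types version” values reported by querysdif line up with the first 16 bytes in the hexdumps, so they can also be read without compiling anything. Below is a minimal Python sketch, assuming big-endian 32-bit fields in the order signature, size, format version, types version; the function and key names are my own.

```python
import struct

def read_sdif_header(data: bytes):
    """Parse the opening 'SDIF' frame; return a dict or None if not SDIF."""
    if len(data) < 16 or data[:4] != b"SDIF":
        return None
    # Three big-endian unsigned 32-bit integers follow the signature.
    size, format_version, types_version = struct.unpack(">III", data[4:16])
    return {
        "header_size": size,
        "format_version": format_version,
        "types_version": types_version,
        # Signature of the frame that follows the header, if present.
        "first_frame": data[16:20] if len(data) >= 20 else None,
    }
```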

    An interesting thing is that an SDIF file can be in text form as well.

    sdiftotext test.sdif 
    SDIF


    SDFC

    1TRC 1 1 0
    1TRC 0x0004 0 4

    1TRC 1 1 0.005
    1TRC 0x0004 10 4
    1 4774.72 0 0
    2 8632.52 0 0
    3 2996.14 0 0
    4 221.58 0 0
    5 1943.02 0 0
    6 123.951 0 0
    7 6705.04 0 0
    8 4304.97 0 0
    9 3554.29 0 0
    10 23.7822 0 0

    1TRC 1 1 0.01
    1TRC 0x0004 10 4
    1 4774.72 0.0353114 2.06098
    2 8632.52 0.00442518 0.68795
    3 2996.14 0.0238517 -1.42295
    4 221.58 0.0089712 -2.44141
    5 1943.02 0.00768914 2.64629
    6 123.951 0.0397061 -0.17527
    7 6705.04 0.0245643 -0.168753
    8 4304.97 0.00894803 1.45553
    9 3554.29 0.0265175 2.57231
    10 23.7822 0.0419019 -2.17731

    1TRC 1 1 0.2
    1TRC 0x0004 10 4
    1 2284.56 0.02781 2.47054
    2 4222.62 0.0151738 1.55309
    3 31.1554 0.00421461 -0.657285
    4 310.99 0.0122306 1.25794
    5 215.192 0.0174093 1.25468
    6 6253.69 0.000894192 2.21334
    7 8533.32 0.0296167 2.07209
    8 8044.77 0.0423002 2.54088
    9 6087.45 0.0264733 -2.05523
    10 7052.7 0.0287347 0.426339

    1TRC 1 1 0.205
    1TRC 0x0004 10 4
    1 2284.56 0 0
    2 4222.62 0 0
    3 31.1554 0 0
    4 310.99 0 0
    5 215.192 0 0
    6 6253.69 0 0
    7 8533.32 0 0
    8 8044.77 0 0
    9 6087.45 0 0
    10 7052.7 0 0

    1TRC 1 1 0.21
    1TRC 0x0004 0 4

    ENDC
    ENDF

    An interesting format for sure. But wait, there is more!

    My initial interest in this format was when I was given access to a set of MUBU files. I was unclear on how they were created at first, and it took me down a long path of learning about SDIF and the Max software from Cycling ’74 and IRCAM. MUBU turns out to be a toolbox for Max which adds more analysis features.

    MUBU stands for MUlti-BUffer, which helps overcome some limitations. It is actually a container using the SDIF standard. Let’s take a look.

    hexdump -C test.mubu | head
    00000000 53 44 49 46 00 00 00 08 00 00 00 03 00 00 00 01 |SDIF............|
    00000010 31 4e 56 54 00 00 00 78 ff ef ff ff ff ff ff ff |1NVT...x........|
    00000020 ff ff ff fd 00 00 00 01 31 4e 56 54 00 00 03 01 |........1NVT....|
    00000030 00 00 00 53 00 00 00 01 4d 75 42 75 2e 43 6f 6e |...S....MuBu.Con|
    00000040 74 61 69 6e 65 72 2e 4e 75 6d 54 72 61 63 6b 73 |tainer.NumTracks|
    00000050 09 31 0a 4d 75 42 75 2e 43 6f 6e 74 61 69 6e 65 |.1.MuBu.Containe|
    00000060 72 2e 56 65 72 73 69 6f 6e 09 31 2e 35 0a 4d 75 |r.Version.1.5.Mu|
    00000070 42 75 2e 43 6f 6e 74 61 69 6e 65 72 2e 4e 75 6d |Bu.Container.Num|
    00000080 42 75 66 66 65 72 73 09 31 0a 00 00 00 00 00 00 |Buffers.1.......|
    00000090 31 4e 56 54 00 00 00 38 ff ef ff ff ff ff ff ff |1NVT...8........|

    A MUBU file has the same SDIF frame header, but also includes a “1NVT” frame, which is a Name Value Table. This is where the MUBU container is referenced. The MuBu file has its own structure:

    If I query the MuBu file like I did the SDIF, I get the following:

    querysdif test.mubu
    Header info of file test.mubu:

    Format version: 3
    Types version: 1

    Ascii chunks of file test.mubu:

    1NVT
    {
    MuBu.Container.NumTracks 1;
    MuBu.Container.Version 1.5;
    MuBu.Container.NumBuffers 1;
    }
    1NVT
    {
    MuBu.Buffer.Index 0;
    }
    1NVT
    {
    MuBu.Track.MxRows 2;
    AudioFile 1;
    MuBu.Track.NonNumType 0;
    MuBu.Track.MaxSize 93515;
    meta_ISFT Lavf60.16.100;
    MuBu.Track.Name mytrack;
    MuBu.Track.BufferIndex 0;
    MuBu.Track.SampleRate 48000;
    FileName Wilhelm_Scream.wav;
    MuBu.Track.MxVarRows 0;
    MuBu.Track.MxCols 1;
    meta_MetaDataSource WAV;
    MuBu.Track.EndTime 1623.5;
    FilePath /;
    MuBu.Track.SampleOffset 0;
    MuBu.Track.TimeTags 0;
    MuBu.Track.Size 77929;
    MuBu.Track.Index 0;
    }

    1TYP
    {
    1MTD M000 {unnamed}
    1FTD M000
    {
    M000 Track-0-MatrixData;
    }
    }

    Data in file test.mubu (3741392 bytes):
    77929 M000 frames in stream 0 between time 0.000000 and 1.623500 containing
    77929 M000 matrices with 2 -- 2 rows, 1 -- 1 columns

    The MuBu file contains one audio track and one buffer. This is a simple test file, but MuBu files can be quite large with multiple tracks.
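
    Since a MuBu file is an SDIF file with MuBu keys in its Name Value Table, telling the two apart can be sketched as a two-step check. This Python example is only an illustration: the function name, the returned labels, and the 4096-byte scan window are assumptions of mine, chosen because the “MuBu.Container” text appears near the start of the sample.

```python
def classify_sdif(data: bytes, scan: int = 4096):
    """Return 'MuBu container', 'SDIF', or None for the given file bytes."""
    if data[:4] != b"SDIF":
        return None
    # Assumption: the 1NVT frame naming the MuBu container sits near the
    # top of the file, as it does in the sample shown above.
    if b"MuBu.Container" in data[:scan]:
        return "MuBu container"
    return "SDIF"
```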

    Working with the Max software or OpenMusic is not something I found to be easy to understand. I am sure if I was more musically inclined and with a little practice I could make some of this work. For the time being, a signature to identify an SDIF and MUBU will have to do. Check out the GitHub for my proposed signature and a couple examples.

    PROmotion

    The 1990s were an amazing time for multimedia. Compared to what is possible today, the graphics were simpler, but there were many software titles leading the charge in animation. Macromedia Director, along with Flash, dominated the interactive multimedia market for quite some time, eventually being picked up by Adobe and discontinued in 2013. Quite a few multimedia discs out there were built using Director.

    Competing with Director, another company had a strong product. Motion Works International was an early pioneer in the multimedia CD-ROM scene. Rumor has it Motion Works was started by a 12-year-old. Motion Works had been making software for use with the highly successful HyperCard software since 1988. In 1992 they released the successor to their ADDmotion software, a path-based animation tool called PROmotion.

    PROmotion was used with many multimedia titles, some in cooperation with the Corel Home series. In addition to commercial titles, PROmotion was a great tool for the creation of animation clips and other marketing material. I came across some stand-alone marketing files for old scriptwriting software called ScriptWare. When I unarchived the HQX file and installed the demo, I was presented with a set of files with the .MW extension.

    ls -l@
    total 10232
    -rw-r--r--@ 1 tyler  staff  1392 May  1 23:17 Read me first!
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	 452 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 begin_here.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	158901 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 characters.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	387029 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 cinovation.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	189509 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 cut paste.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	608405 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 formats.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	289698 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 modify formats.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	486730 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 notes.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	319250 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 overview.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	376854 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 scene shuffle.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	359746 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 script elements.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	279052 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 sw_menu.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	421836 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 title page.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	236614 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 transitions.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	471462 
    -rw-r--r--@ 1 tyler  staff     0 May  1 23:17 try it.MW
    	com.apple.FinderInfo	  32 
    	com.apple.ResourceFork	622312 
    
    getfileinfo sw_menu.MW 
    file: "sw_menu.MW"
    type: "APPL"
    creator: "AMvw"

    Looking at the files in the directory with their extended attributes, I can see all the .MW files have no data fork (0 bytes), only a resource fork. This is common for any Application on the MacOS systems prior to MacOS X. At first the MW extension made me think of MacWrite, but launching one of these MW files brought up an interactive menu. The type is APPL, which means Application.

    What I thought would be a demo of the application Scriptware was actually interactive animations demonstrating the software. By dumping the resource fork of one of the MW files I found some information which helped me know what software created these interactive demos.

    derez Scriptware\ Demo\ folder/sw_menu.MW
    
    data 'vers' (1) {
    	$"0103 8000 0000 0531 2E30 2E33 2941 4D20"            /* ..?....1.0.3)AM  */
    	$"5669 6577 6572 2031 2E30 2E33 0DA9 2031"            /* Viewer 1.0.3.? 1 */
    	$"3939 3320 4D6F 7469 6F6E 2057 6F72 6B73"            /* 993 Motion Works */
    	$"2049 6E74 6C2E"                                     /*  Intl. */
    };
    
    data 'vers' (2) {
    	$"0103 8000 0000 0531 2E30 2E33 1E50 6C61"            /* ..?....1.0.3.Pla */
    	$"7962 6163 6B20 6279 204D 6F74 696F 6E20"            /* yback by Motion  */
    	$"576F 726B 7320 496E 746C 2E"                        /* Works Intl. */
    };
    
    data 'STR#' (1250, "ADDmotion HC strings") {
    	$"000A 1641 4444 6D6F 7469 6F6E 5F65 7870"            /* ...ADDmotion_exp */
    	$"6F72 745F 6672 616D 650E 4144 446D 6F74"            /* ort_frame.ADDmot */
    	$"696F 6E5F 696E 666F 1141 4444 6D6F 7469"            /* ion_info.ADDmoti */
    	$"6F6E 5F73 7573 7065 6E64 1041 4444 6D6F"            /* on_suspend.ADDmo */
    	$"7469 6F6E 5F72 6573 756D 650E 4144 446D"            /* tion_resume.ADDm */
    	$"6F74 696F 6E5F 7175 6974 0E41 4444 6D6F"            /* otion_quit.ADDmo */
    	$"7469 6F6E 5F70 6C61 790E 4144 446D 6F74"            /* tion_play.ADDmot */
    	$"696F 6E5F 7374 6F70 0F41 4444 6D6F 7469"            /* ion_stop.ADDmoti */
    	$"6F6E 5F70 6175 7365 0000"                           /* on_pause.. */
    };

    Makes sense, MW stood for “Motion Works”. ADDmotion was another software title developed by Motion Works; most will remember it as an add-on for HyperCard for adding animation to stacks. These MW files were created using PROmotion and exported as stand-alone animations which include the “AM Viewer” built in. A regular PROmotion file, however, does not include a viewer and requires the software in order to open and run.

    -rwx------@ 1 tyler  staff      0 Apr 25 15:51 Example Animation
    	com.apple.FinderInfo	   32 
    	com.apple.ResourceFork	495272 

    PROmotion files are also Resource Fork only, making them difficult to manage outside of a Macintosh.

    getfileinfo Example\ Animation
    file: "Example Animation"
    type: "ADDm"
    creator: "ADDm"

    The files do have a Type/Creator code of “ADDm”, but with no data fork, identification through standard means is not possible. They also do not have the “vers” string to help identify them within the Resource Fork. Since standard methods of identification are impossible, I hope in the future there will be more tools available to read the Type/Creator codes while on the Mac, or in a disk image, or within a container and return back the Software which created the file and the file type.
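
    Until such tools exist, the Type/Creator codes can at least be pulled out of the 32-byte com.apple.FinderInfo attribute seen in the listings above, since the first four bytes hold the type and the next four the creator. The Python sketch below only parses bytes that have already been obtained; how you get them (for example from the hex output of xattr on a Mac) is left out, and the function name is hypothetical.

```python
def parse_finder_info(blob: bytes):
    """Return (type, creator) from raw com.apple.FinderInfo bytes."""
    if len(blob) < 8:
        raise ValueError("FinderInfo needs at least 8 bytes")
    # Classic Mac OS stored these four-character codes as Mac Roman text.
    file_type = blob[0:4].decode("mac_roman")
    creator = blob[4:8].decode("mac_roman")
    return file_type, creator
```

    A lookup table from codes such as “ADDm” to a software name would still have to be built by hand.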

    The products from Motion Works were significantly cheaper than animation tools such as Director, but were still pretty powerful for their day. I was surprised when I found the company didn’t last much longer than 1998 before disappearing. There are probably many stories like PROmotion, coming onto the scene with new and exciting features previously thought impossible, only to die out as other tools dominate the market.

    If you are interested in looking at the files yourself, here is a link to some original files, and the same files encoded in MacBinary.

    Scheduling EXport

    During a recent review of some help files for some older Final Draft software I came across this Q&A.

    Needless to say, I was intrigued, but let me give you a pro tip. Googling MovieMagic and “SEX” does not bring back results related to file formats. Also, probably best not to search at work.

    Movie Magic refers to software developed by Write Brothers/Screenplay. The main software, Screenwriter, is a word processor built specifically for writing screenplays for TV, movies, theater, etc. The software was first developed in 1983 and quickly replaced typewriters as the favorite for writing the very specific formatting required by screenplays.

    Screenwriter version 6 uses the extension MMSW and DEF for templates. Let’s have a look at one under the hood.

    hexdump -C Screenwriterv6-01.mmsw | head
    00000000  53 63 72 65 65 6e 77 72  69 74 65 72 57 69 6e 56  |ScreenwriterWinV|
    00000010  65 72 2e 20 36 2e 30 30  20 00 00 00 00 00 00 00  |er. 6.00 .......|
    00000020  00 00 00 0c 4e 4f 4e 41  4d 45 32 2e 4d 4d 53 57  |....NONAME2.MMSW|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000050  00 00 00 00 00 00 00 00  0f 0a 0a 00 00 00 00 0a  |................|
    00000060  01 0c 00 00 0f 0a 0a 01  00 00 01 0a 01 0c 00 00  |................|
    00000070  19 18 00 00 00 00 04 0a  01 0c 00 00 25 0a 0a 01  |............%...|
    00000080  00 00 03 0a 01 0c 00 00  0f 0a 0a 01 00 00 06 0a  |................|
    00000090  01 0c 00 00 1e 23 00 00  00 00 05 0a 01 0c 00 00  |.....#..........|

    The header is easy to interpret; version 6 is the latest version of the software. The rest of the file is binary that is not human readable, so there is not much to look at. The screenplay website has a chart for figuring out compatibility for all the versions and extensions.

    The other major version used the SCW extension.

    hexdump -C ScreenWriter4-s01.scw | head
    00000000  53 63 72 65 65 6e 77 72  69 74 65 72 77 69 6e 76  |Screenwriterwinv|
    00000010  65 72 2e 20 34 2e 31 31  61 00 00 00 00 00 00 00  |er. 4.11a.......|
    00000020  00 00 00 11 53 63 72 65  65 6e 57 72 69 74 65 72  |....ScreenWriter|
    00000030  34 2d 73 30 31 00 00 00  00 00 00 00 00 00 00 00  |4-s01...........|
    00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000050  00 00 00 00 00 00 00 00  0f 0a 0a 00 00 00 00 0a  |................|
    00000060  00 0c 00 00 0f 0a 0a 01  00 00 01 0a 00 0c 00 00  |................|
    00000070  19 18 00 00 00 00 04 0a  00 0c 00 00 25 0a 0a 01  |............%...|
    00000080  00 00 03 0a 00 0c 00 00  0f 0a 0a 01 00 00 06 0a  |................|
    00000090  00 0c 00 00 1e 23 00 00  00 00 05 0a 00 0c 00 00  |.....#..........|
    
    hexdump -C Screenwriterv6-01.scw | head
    00000000  53 63 72 65 65 6e 77 72  69 74 65 72 57 69 6e 56  |ScreenwriterWinV|
    00000010  65 72 2e 20 34 2e 39 30  00 00 00 00 00 00 00 00  |er. 4.90........|
    00000020  00 00 00 0c 4e 4f 4e 41  4d 45 32 2e 4d 4d 53 57  |....NONAME2.MMSW|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    Files from the earlier version appear to be structured the same way. Files generated on a Macintosh still seem to retain the “win” in the header, but capitalized as “Win” and “Ver”. The older ScriptThing format looks similar.

    hexdump -C NONAME1.SCR | head
    00000000  53 63 72 69 70 74 54 68  69 6e 67 20 56 65 72 2e  |ScriptThing Ver.|
    00000010  20 32 2e 31 39 00 00 00  00 00 07 4e 4f 4e 41 4d  | 2.19......NONAM|
    00000020  45 31 2e 53 43 57 00 00  00 00 00 00 00 00 00 00  |E1.SCW..........|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 07 00 01  |................|
    00000050  00 0f 49 0b 00 00 0f 49  0b 01 01 19 3b 01 00 04  |..I....I....;...|
    00000060  25 49 0b 01 03 0f 49 0b  01 06 1e 30 01 00 05 0f  |%I....I....0....|
    00000070  49 0b 01 02 0f 49 0b 01  0a 0f 49 0b 01 0b 0f 49  |I....I....I....I|
    00000080  0b 04 14 00 00 02 02 02  02 02 00 00 01 02 00 37  |...............7|
    00000090  0b 28 43 4f 4e 54 49 4e  55 45 44 29 00 0a 43 4f  |.(CONTINUED)..CO|
    
    hexdump -C MULTIMEDIA DEMO.SCW | head
    00000000  53 63 72 69 70 74 54 68  69 6e 67 20 57 69 6e 56  |ScriptThing WinV|
    00000010  65 72 2e 20 31 2e 32 35  64 00 08 44 49 43 54 46  |er. 1.25d..DICTF|
    00000020  49 4c 45 1a 4d 75 6c 74  69 6d 65 64 69 61 20 44  |ILE.Multimedia D|
    00000030  65 6d 6f 20 53 63 72 69  70 74 2e 53 43 52 00 00  |emo Script.SCR..|
    00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000050  00 00 00 00 00 00 00 00  0a 0e 0a 00 00 00 00 0a  |................|
    00000060  00 0c 00 00 0a 0d 0a 01  00 00 01 0a 00 0c 00 05  |................|
    00000070  14 1b 00 00 00 00 04 0a  00 0c 00 00 20 0d 0a 01  |............ ...|
    00000080  00 00 03 0a 00 0c 00 00  0c 17 0a 01 00 00 06 0a  |................|
    00000090  00 0c 00 01 19 26 00 00  00 00 05 0a 00 0c 00 00  |.....&..........|
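
    Since all of these headers start with a product name and a version string in plain ASCII, pulling them out is straightforward. The sketch below is only inferred from the five sample headers shown in this post; the pattern and the function name are my own assumptions, not anything documented by the vendor.

```python
import re

# Pattern inferred only from the sample headers shown in this post.
HEADER_RE = re.compile(
    rb"(Screenwriter|ScriptThing) ?(Win)?Ver\. (\d+\.\d+[a-z]?)",
    re.IGNORECASE,
)

def parse_header(data: bytes):
    """Return (product, has_win_tag, version), or None if nothing matches."""
    m = HEADER_RE.match(data[:32])
    if m is None:
        return None
    product, win, version = m.groups()
    return product.decode("ascii"), win is not None, version.decode("ascii")
```

    The middle value only reports whether a “win” tag is present in any capitalization, so it does not by itself say which platform wrote the file.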

    Screenwriting software is very specialized: the formatting is important, as is keeping track of characters, locations, props, etc. An important part of filming a movie based on a script is scheduling the different scenes and the characters needed for each one. This schedule can be produced by generating a Scheduling EXport. Not sure who decided on this extension, but there it is.

    hexdump -C Screenwriterv6-01.sex | head
    00000000  53 53 49 2a 00 23 00 00  00 e2 00 00 00 00 43 61  |SSI*.#........Ca|
    00000010  73 74 20 4d 65 6d 62 65  72 73 00 45 78 74 72 61  |st Members.Extra|
    00000020  73 00 53 74 75 6e 74 73  00 56 65 68 69 63 6c 65  |s.Stunts.Vehicle|
    00000030  73 00 50 72 6f 70 73 00  53 70 65 63 69 61 6c 20  |s.Props.Special |
    00000040  45 66 66 65 63 74 73 00  43 6f 73 74 75 6d 65 73  |Effects.Costumes|
    00000050  00 4d 61 6b 65 75 70 00  4c 69 76 65 73 74 6f 63  |.Makeup.Livestoc|
    00000060  6b 00 41 6e 69 6d 61 6c  20 48 61 6e 64 6c 65 72  |k.Animal Handler|
    00000070  00 4d 75 73 69 63 00 53  6f 75 6e 64 00 53 65 74  |.Music.Sound.Set|
    00000080  20 44 72 65 73 73 69 6e  67 00 47 72 65 65 6e 65  | Dressing.Greene|
    00000090  72 79 00 53 70 65 63 69  61 6c 20 45 71 75 69 70  |ry.Special Equip|

    Scheduling Export files begin with “SSI”, which I assume refers to Screenplay Systems, Inc. They start by listing, in plain text, all the different things which need scheduling. In Screenwriter 6 there are a couple of export types with the same extension, one called Gorilla Scheduling and another called CompanyMOVE ShowPlanner, but both are identical to the Movie Magic file, so I am not quite sure of their purpose. Maybe there would be more to discern from a whole script instead of the simple samples I made for this purpose.
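
    To peek at those plain-text labels without a hex editor, something like the following Python sketch could work. It only checks for the “SSI” bytes at the start and then collects printable ASCII runs that end in a NUL byte, much like the strings utility. I have not worked out the meaning of the bytes between the magic and the first label, so the function (a hypothetical name) makes no attempt to parse them.

```python
import re

def ssi_labels(data: bytes, min_len: int = 4) -> list[str]:
    """List NUL-terminated printable ASCII runs found after the 'SSI' magic."""
    if not data.startswith(b"SSI"):
        raise ValueError("missing SSI magic")
    # A run of at least min_len printable ASCII bytes, followed by a NUL.
    pattern = rb"[\x20-\x7e]{%d,}(?=\x00)" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data[3:])]
```

    On a full file this will also pick up any other text runs further in, so treat the result as a quick look rather than a parsed category list.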

    This was a fun format to research, although I had to be careful of the terms I used! You can check out my proposed signatures and samples on my GitHub.

    Sibelius

    Music notation software is among the earliest software for desktop computers. SCORE arrived in 1987, Finale in 1988, Capella in 1992, and Sibelius in 1993. Many others came and went during this time. Music notation software was so much more than the typical word processing or desktop publishing system. Specialized fonts were needed to display the music notation, and there are many other variables for different instruments, giving individuals and others the ability to create complicated compositions inexpensively.

    Sibelius [SI] + [BAY] + [LEE] + [UHS] was originally developed for the Acorn system in 1986, then released on Windows and Macintosh in 1998-99. The software became very popular and in 2006 was purchased by the software giant AVID. The software was used enough to get a preservation assessment by the British Library in 2017 and draft status format description by the Library of Congress, written by the amazing Ashley!

    Both reviews of the format emphasize the proprietary nature of the file format which has been used since the early versions. Aside from the early Acorn release, the Windows and Macintosh versions used a binary format with the SIB extension. They are actually quite easy to identify.

    hexdump -C Sibelius-s01.sib | head
    00000000  0f 53 49 42 45 4c 49 55  53 00 00 40 00 02 00 a4  |.SIBELIUS..@....|
    00000010  f1 ed 00 00 00 30 00 00  00 02 00 00 00 01 00 00  |.....0..........|
    00000020  00 2a 00 00 00 00 00 00  00 00 0f 53 49 42 45 4c  |.*.........SIBEL|
    00000030  49 55 53 00 00 40 00 02  00 00 00 00 00 00 00 3a  |IUS..@.........:|
    00000040  00 00 00 00 38 a1 28 06  b3 d2 2f 66 03 04 16 4e  |....8.(.../f...N|
    00000050  5f 5c 8d f3 95 27 3e f1  2a 1b 68 de 08 81 e8 9a  |_\...'>.*.h.....|
    00000060  ea 1c bf dd 54 0e 92 8d  4d be e3 34 ed 42 78 36  |....T...M..4.Bx6|
    00000070  d2 e1 67 7b 8d f7 98 6a  3a 70 c4 8b 0b 08 7b 26  |..g{...j:p....{&|
    00000080  f9 45 00 00 00 00 48 71  7c 4c 98 df 0b 38 7d 9d  |.E....Hq|L...8}.|
    00000090  2a 2d 84 9c a4 39 0f 4d  da a2 cc 97 ad 3d b0 55  |*-...9.M.....=.U|

    This is exactly how PRONOM and other identification methods determine they are Sibelius files. PRONOM has assigned the format fmt/696 and is looking for the hexadecimal bytes 0F534942454C495553.

    The problem with this identification method is that all the Sibelius files are identified as such, regardless of version. As mentioned by Ashley, the version of the software used is highly important, as new features were added all the time, making backwards compatibility difficult. Add in the fact that there were different releases for each version which would limit these features even more and I can see how a musician could get very frustrated. If you created a score in Sibelius 5 and tried to open it in Sibelius 5 Student version, you may find your composition lacking in many ways. The only way to avoid compatibility issues is to always open in the latest “Ultimate” version. Sibelius Ultimate can open all versions of the SIB format back to version 2. The software even has an export feature which allows you to export back to a previous version, stripping what is necessary to ensure compatibility.

    Sibelius export to previous version

    For those with a bunch of SIB files in their archives, how would you know which software version created the file? Well, let’s take a closer look at the bytes and see if we can find some patterns. Let it be known, I am not reverse engineering the format, just looking for patterns which will allow for proper identification!

    I am not the first person to ask this question; many others want to know the versions of their SIB files. Thankfully others have found some clues on which bytes hold the version information. It seems we can determine the version based on 4 bytes shortly after the SIBELIUS string, specifically the bytes at offsets 10-13 (counting from zero).

    hexdump -C Sibelius2-s01.sib | head
    00000000  0f 53 49 42 45 4c 49 55  53 00 00 08 00 22 00 47  |.SIBELIUS....".G|
    00000010  98 4c 00 00 00 3a 00 00  00 00 4e 81 49 34 41 2c  |.L...:....N.I4A,|
    00000020  fa 76 62 f9 71 53 a9 93  0f 54 1e 20 6c 63 61 4d  |.vb.qS...T. lcaM|
    00000030  f7 b2 b0 a7 5d bd 82 3a  0d 86 02 8b f2 89 d2 a0  |....]..:........|
    00000040  83 1f 8d e0 37 1b ed 1c  6a 8b 82 08 4b 6d 64 60  |....7...j...Kmd`|
    00000050  71 59 e8 aa ef b1 3c df  5c 25 0a 9f 66 50 69 de  |qY....<.\%..fPi.|
    00000060  2a d3 4e 2a cd 97 88 06  67 5f 50 64 0f 8f 86 2b  |*.N*....g_Pd...+|
    00000070  08 0d 3f f7 80 26 e0 63  f6 7d 4e f8 e7 c0 3f fc  |..?..&.c.}N...?.|
    00000080  7a 77 ea b3 4a b9 30 59  13 47 6e 09 0a 0b ae 3c  |zw..J.0Y.Gn....<|
    00000090  c1 93 85 f6 41 f8 58 22  4b 92 35 3f b2 f5 3f 9d  |....A.X"K.5?..?.|
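
    Reading those four bytes takes only a few lines of Python. The sketch below first checks for the same signature PRONOM uses, then returns the bytes at offsets 10-13 as two numbers. Treating them as a pair of big-endian 16-bit values is my own reading of the samples rather than anything documented, and the function name is hypothetical.

```python
import struct

SIB_MAGIC = bytes.fromhex("0F534942454C495553")  # 0x0F followed by "SIBELIUS"

def sib_version_bytes(header: bytes) -> tuple[int, int]:
    """Return the two big-endian 16-bit values stored at offsets 10-13."""
    if not header.startswith(SIB_MAGIC) or len(header) < 14:
        raise ValueError("not a Sibelius SIB header")
    # Assumption: offsets 10-11 and 12-13 are each one unsigned 16-bit value.
    return struct.unpack_from(">HH", header, 10)
```

    For the Sibelius 2 sample above this gives 0x0008 and 0x0022, and for the first sample in this post it gives 0x0040 and 0x0002.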

    From what others have gathered and updating it with more recent versions I have come up with a list.

    Version                                                   Hex 10-13
    Sibelius 1.2                                              00 00 00 0E
    Sibelius 2.x                                              00 08 xx xx
    Sibelius 3.x                                              00 0A xx xx
    Sibelius 4.x                                              00 1B xx xx
    Sibelius 5.0                                              00 2D 00 03
    Sibelius 5.1                                              00 2D 00 0D
    Sibelius 5.2.x – 5.4                                      00 2D 00 10
    Sibelius 6.0.x                                            00 36 00 01
    Sibelius 6.1                                              00 36 00 17
    Sibelius 6.2                                              00 36 00 1E
    Sibelius 7.0                                              00 39 00 0C
    Sibelius 7.0.1 – 7.0.2                                    00 39 00 0E
    Sibelius 7.0.3                                            00 39 00 13
    Sibelius 7.1.0                                            00 39 00 15
    Sibelius 7.1.2 – 7.1.3                                    00 39 00 16
    Sibelius 7.5.x                                            00 3D 00 0E
    Sibelius 8.0.0 – 8.0.1                                    00 3D 00 10
    Sibelius 8.1.x                                            00 3E 00 00
    Sibelius 8.2                                              00 3E 00 01
    Sibelius 8.3                                              00 3E 00 02
    Sibelius 8.4.x                                            00 3E 00 06
    Sibelius 8.5.x                                            00 3E 00 07
    Sibelius 8.6.x, 8.7.0, 8.7.1                              00 3F 00 00
    Sibelius 8.7.2, 2018.1, 2018.4.x, 2018.5, 2018.6, 2018.7  00 3F 00 01
    Sibelius 2018.11, 2018.12                                 00 3F 00 02
    Sibelius 2019.1                                           00 3F 00 04
    Sibelius 2019.4.x, 2019.5, 2019.7, 2019.9                 00 3F 00 06
    Sibelius 2019.12                                          00 3F 00 07
    Sibelius 8.6-2019.12                                      00 3F 00 0A
    Sibelius 2020.1                                           00 3F 00 0B
    Sibelius 2020.3, 2020.6                                   00 40 00 01
    Sibelius 2020.9                                           00 40 00 02
    Sibelius 2022.5                                           00 40 00 03
    Sibelius 2022.11                                          00 41 00 02
    Sibelius 2022.12                                          00 42 00 00
    Sibelius 2023.3                                           00 42 00 01
    Sibelius 2023.8                                           00 43 00 07
    Sibelius 2024.3.1                                         00 44 00 01

    That is a lot of versions, and I feel there may be some gaps that still need to be identified. It appears that the first two bytes are the major version and the second two bytes are the minor version, although a few major version values span several software versions. With this chart, one could be very specific in identifying which Sibelius version wrote the file, but for archiving purposes it seems we can group many of these, capturing just the major version. The export screenshot above seems to have broken down significant changes and grouped similar formats together, the biggest being 8.6 through 2019.12. A comparison of “student” and “first” files doesn’t reveal any obvious bytes which distinguish them, so for now they are all lumped together.
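
    Grouping by major version could look something like the Python sketch below. The ranges are simply my own grouping of the chart rows that share the same first two bytes, so any gaps in the chart carry over, and the names are hypothetical.

```python
SIB_MAGIC = bytes.fromhex("0F534942454C495553")  # 0x0F followed by "SIBELIUS"

# Major values (offsets 10-11) grouped from the chart; gaps are likely.
MAJOR_FAMILIES = {
    0x0000: "Sibelius 1.2",
    0x0008: "Sibelius 2.x",
    0x000A: "Sibelius 3.x",
    0x001B: "Sibelius 4.x",
    0x002D: "Sibelius 5.0 - 5.4",
    0x0036: "Sibelius 6.0 - 6.2",
    0x0039: "Sibelius 7.0 - 7.1",
    0x003D: "Sibelius 7.5 - 8.0",
    0x003E: "Sibelius 8.1 - 8.5",
    0x003F: "Sibelius 8.6 - 2020.1",
    0x0040: "Sibelius 2020.3 - 2022.5",
    0x0041: "Sibelius 2022.11",
    0x0042: "Sibelius 2022.12 - 2023.3",
    0x0043: "Sibelius 2023.8",
    0x0044: "Sibelius 2024.3.1",
}

def sib_family(header: bytes) -> str:
    """Map the major-version bytes of a SIB header to a version range."""
    if not header.startswith(SIB_MAGIC) or len(header) < 12:
        raise ValueError("not a Sibelius SIB header")
    major = int.from_bytes(header[10:12], "big")
    return MAJOR_FAMILIES.get(major, f"unknown (major 0x{major:04X})")
```

    An unknown major value is reported rather than guessed, which seems the safer choice while the chart is still incomplete.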

    There is one other similar format which needs to be mentioned. Sibelius Scorch was a product made to share scores online. It has since been replaced by Sibelius Cloud Publishing, but for a while it was the best way to share a score with others in a way that protected the original. I have no idea how the files were made, but scorestreet.net and sibeliusmusic.com were sites where you could upload your score for sharing. Some SCO files appear to have a PDF embedded within them for proper printing.

    hexdump -C smd_h_0000000000097761.sco | head
    00000000  0f 43 43 53 43 4f 52 43  48 00 00 36 00 1e 00 c0  |.CCSCORCH..6....|
    00000010  d4 55 00 00 00 30 00 00  00 01 00 00 00 01 00 00  |.U...0..........|
    00000020  00 22 0f 43 43 53 43 4f  52 43 48 00 00 36 00 1e  |.".CCSCORCH..6..|
    00000030  00 00 00 00 00 00 00 3a  00 00 00 00 03 56 11 b9  |.......:.....V..|
    00000040  70 dc fe 90 50 48 30 df  eb 39 88 23 8e 88 78 bf  |p...PH0..9.#..x.|
    00000050  da ab ab 5b e2 13 98 89  66 eb 94 67 8d 16 00 00  |...[....f..g....|
    00000060  00 00 cf 6f 0c 67 85 ec  57 90 e5 c1 ea 8a eb 9f  |...o.g..W.......|
    00000070  c8 13 d2 1d 75 bd a5 9f  eb b9 ef 1d 25 79 45 2c  |....u.......%yE,|
    00000080  05 bb 74 41 e8 8f 27 6a  01 07 d0 f5 3b 17 ce 87  |..tA..'j....;...|
    00000090  7b c2 82 d9 41 6b 82 2f  d8 b8 17 32 fa d3 59 05  |{...Ak./...2..Y.|
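
    The SCO header looks a lot like the SIB header, with CCSCORCH in place of SIBELIUS. A small Python sketch to tell the two apart and print the bytes at offsets 10-13 might look like this. Whether those bytes mean the same thing in a Scorch file is an assumption on my part, based on this one sample, and the names are hypothetical.

```python
MAGICS = {
    bytes.fromhex("0F534942454C495553"): "Sibelius score (SIB)",
    bytes.fromhex("0F434353434F524348"): "Sibelius Scorch (SCO)",
}

def classify(header: bytes):
    """Return (label, hex of offsets 10-13), or None for other files."""
    for magic, label in MAGICS.items():
        if header.startswith(magic) and len(header) >= 14:
            # Assumption: Scorch files keep version-like bytes at 10-13 too.
            return label, header[10:14].hex(" ").upper()
    return None
```

    For the sample above the four bytes come out as 00 36 00 1E, the same value the chart lists for Sibelius 6.2.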

    I am not sure of the best way to handle all the different versions within the PRONOM registry. I went ahead and made a few signatures based on the export dialog of Sibelius 2024. Even with combining a few together, it leaves us with 17 new PUIDs. Maybe further discussion can refine these down a bit more? Regardless, each file can be associated with a specific Sibelius version, making it easier to open and migrate if needed without fear of opening in the wrong version. Take a look at some samples and my signatures on my GitHub page and let me know if there is a better way.