CD Architect

Receiving electronic media from an outside source can be an adventure. Often times you find yourself sorting the valuable files and separating them from the chaff. There can be hidden files, cache files, application files, drivers, and everything in between. Determining what formats are important can sometimes be difficult, especially if you don’t know the file format of some of the files.

I was recently working on a collection of files which had been produced through some audio software. When working with audio, a WAVE file is what is usually kept as they contain the actual audio data. With these files they came with a couple other formats. One of those formats was a bunch of SFK peak files. These files are meant to be temporary as they are generated from the WAVE file to make opening of audio data faster. They are important, but can easily be regenerated. One could argue they have historical value, but also they don’t contain anything that can be used by itself, so alone they don’t have much value.

The other format found with the WAVE files have a CDP extension. These came up as unknown when using DROID. It is not a common extension so finding the name of the software which created the files wasn’t too hard. Let’s take a look at one of them.

hexdump -C tutor1.cdp | head
00000000 52 49 46 46 79 03 00 00 53 46 50 4a 66 6d 74 20 |RIFFy...SFPJfmt |
00000010 18 00 00 00 00 00 01 00 02 00 00 00 10 00 00 00 |................|
00000020 44 ac 00 00 03 00 00 00 01 00 00 00 4c 49 53 54 |D...........LIST|
00000030 88 00 00 00 66 6c 73 74 66 69 6c 65 23 00 00 00 |....flstfile#...|
00000040 44 3a 5c 53 6f 75 6e 64 73 5c 4e 65 77 20 54 75 |D:\Sounds\New Tu|
00000050 74 6f 72 20 66 69 6c 65 73 5c 53 6f 6e 67 33 2e |tor files\Song3.|
00000060 77 61 76 00 66 69 6c 65 23 00 00 00 44 3a 5c 53 |wav.file#...D:\S|
00000070 6f 75 6e 64 73 5c 4e 65 77 20 54 75 74 6f 72 20 |ounds\New Tutor |
00000080 66 69 6c 65 73 5c 53 6f 6e 67 32 2e 77 61 76 00 |files\Song2.wav.|
00000090 66 69 6c 65 23 00 00 00 44 3a 5c 53 6f 75 6e 64 |file#...D:\Sound|

Huh, this is a RIFF file. RIFF is most commonly used as the container used for WAVE and AVI files. You can read more about the RIFF format on a previous post. The RIFF container format can be used for all sorts of things. Looking at the internals we can see a few unique list chunk’s.

Lots of references to other files, specifically WAVE files. But not a lot of actual data. That is because this format turns out to be just a project format for some software called “CD Architect“. Sonic Foundry was an audio software developer for a few years before they sold their catalog to Sony in 2003. In looking at the manual for CD Architect version 5.2, it explains the CDP Project format.

CD Architect software handles the organization of your CD using a small project file (CDP) that saves information about source file locations, edits, cuts, and insertion points. This project file is not a multimedia file, but is instead used to create the CD when editing is finished.

Looking at another CDP file from the collection, I noticed something different.

hexdump -C CDArch50a-s01.cdp | head
00000000 72 69 66 66 2e 91 cf 11 a5 d6 28 db 04 c1 00 00 |riff......(.....|
00000010 20 0a 00 00 00 00 00 00 84 38 15 b3 da 08 85 44 | ........8.....D|
00000020 b2 2a 5b 70 a1 32 15 ff 5a 2d 8f b2 0f 23 d2 11 |.*[p.2..Z-...#..|
00000030 86 af 00 c0 4f 8e db 8a 00 02 00 00 00 00 00 00 |....O...........|
00000040 78 00 00 00 00 00 04 00 11 00 00 00 44 ac 00 00 |x...........D...|
00000050 00 00 00 00 00 c0 52 40 00 00 00 00 00 00 5e 40 |......R@......^@|
00000060 00 00 00 00 00 00 00 00 04 00 04 00 40 00 00 00 |............@...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 7c 00 00 00 |............|...|
00000080 50 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

That’s odd, the RIFF format is always uppercase ASCII, this is lowercase. Also the important RIFF form, which was “SFPJ” in the other sample, is missing. This is not a valid RIFF format.

But further down in the file I can see the same list chunks. Did they take RIFF format and make a proprietary version of their own? I think they may have. It seems the first example was from CD Architect version 4 and these other files are from CD Architect version 5. That complicates things. Sony stopped developing CD Architect after version 5.2d and maintained it for a few years before selling many of their titles to MAGIX Software. As far as I know there was never any new versions released. The software was very popular, as it had some really nice audio mastering features and was easy to use. Many were upset when the software was abandoned.

Creating a signature for both version 4 and version 5 CDP files will be pretty straightforward. I feel knowing what you have in a collection you are processing is the first step in making informed decisions. Wether or not you keep the project files are up for debate. Some may only want the final audio created from a CD Architect project, while others may want to see the way the audio was put together and mixed. Either way, the more you know…..

One more thing. CD Architect would default to saving a CDP project file, but could also save a “CD Image file”. This process actually would save the project to a full WAVE file with some extras baked in.

An image file is essentially a wave file with volume, crossfades, effects, mixes, and track information embedded. Burning an image file will reduce the risk of buffer underruns (especially if you have a complex project or are using a slow computer) since no audio processing is required. 

Interesting, normally when working with track information in a single WAVE file you would need a companion CUE Sheet in order to reference the track layout of the Audio CD. So I am curious how they do all of this. Lets take a look at a “CD Image”.

mediainfo CDArch52d-s02.wav
General
Complete name : CDArch52d-s02.wav
Format : Wave
Format settings : PcmWaveformat
File size : 5.05 MiB
Duration : 30 s 0 ms
Overall bit rate mode : Constant
Overall bit rate : 1 411 kb/s
Conformance errors : 2
RIFF : Yes
General compliance : File size 5292434 is less than expected size 5292823 (offset 0x8)
WAVE : Yes
General compliance : Element size 5292811 is more than maximal permitted size 5292422 (offset 0xC)

Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 30 s 0 ms
Bit rate mode : Constant
Bit rate : 1 411.2 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 5.05 MiB (100%)

Already seeing some issues with the format, but all the important bits are there. JHOVE doesn’t like them much either.

JhoveView (Rel. 1.32.0, 2024-09-12)
Date: 2024-12-11 16:01:08 MST
RepresentationInformation: CDArch52d-s02.wav
ReportingModule: WAVE-hul, Rel. 1.8.3 (2024-03-05)
LastModified: 2024-12-11 15:58:02 MST
Size: 5292434
Format: WAVE
Status: Not well-formed
SignatureMatches:
WAVE-hul
InfoMessage: Ignored unrecognized list type: "pqls"
ID: WAVE-HUL-15
Offset: 5292044
ErrorMessage: Unexpected end of file: Bytes missing = 389
ID: WAVE-HUL-3
Offset: 5292434
MIMEtype: audio/vnd.wave; codec=1
Profile: PCMWAVEFORMAT

JHOVE is giving me two issues. The major error is the file appears truncated according to both MediaInfo and JHOVE. The InfoMessage which is less of an issue but more of a heads up that the WAVE file has an extra LIST type. “PQLS”, which was also in the CPD RIFF file we looked at earlier. So it seems by making a “CD Image” of a project embeds the project chunk data into the WAVE container. Identification is not an issue as these WAVE’s follow the standard pattern and therefore identify correctly, but one might want to be aware through further characterization these WAVE’s have some not so obvious extra data.

My attempts to find any samples from version 3 of CD Architect have failed. Until then, my proposal is to add version 4 & 5 to PRONOM with the signature on my Github page. There you will find a few samples as well.

A2R / MOOF / WOZ

There seems to be a never ending growing list of disk image formats. Many have features which are specific to the media and format. If you have ever imaged an older Macintosh floppy you know they are special. If you add in copy-protection which many early Apple II floppies have, and you need special drives, hardware, and a special format to store the floppy data.

When imaging special media, especially with unique media, it is best practice to image the floppies at the magnetic flux level.

Floppy disks contain magnetic fluctuations which are measured and recorded using specialized equipment. A popular method is using a Kryoflux board, floppy drive, and software. The software communicates with a custom controller board connected to a floppy drive through USB. If you are interested in the different controller boards, a good list has been compiled here.

A Kryoflux, fluxengine, greaseweazle, all can image specialized disks like a Macintosh 800k floppy, but the best controller board for them is an Applesauce setup. They are specifically designed to for the task. With that task, comes a few specialty formats.

A file format which can store flux data is a bit different than a regular disk image format. The flux data contains all the low-level recordings which can then be interpreted into disk images much like the original floppy. In the case of an Applesauce flux image, it can contain all the small nuances of the original floppy, this includes recording any copy protection or other creative methods used by software vendors throughout the years. The format used for storing this flux data is the A2R format.

A2R is in its third iteration. Let’s take a look at the basics of the format.

hexdump -C Samplev3.a2r | head
00000000 41 32 52 33 ff 0a 0d 0a 49 4e 46 4f 25 00 00 00 |A2R3....INFO%...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 38 |.Applesauce v1.8|
00000020 38 2e 35 20 20 20 20 20 20 20 20 20 20 20 20 20 |8.5 |
00000030 20 02 01 01 00 52 57 43 50 e9 49 6e 01 01 24 f4 | ....RWCP.In..$.|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 43 01 00 |.............C..|
00000050 00 01 27 3a 25 00 91 d9 00 00 21 20 21 21 21 21 |..':%.....! !!!!|
00000060 1f 21 21 21 21 1f 24 5e 24 1f 21 21 20 21 24 5c |.!!!!.$^$.!! !$\|
00000070 24 20 21 21 21 1f 24 5c 25 21 21 1f 21 21 23 5b |$ !!!.$\%!!.!!#[|
00000080 25 20 21 21 21 1f 21 22 23 3f 41 3f 26 3e 43 3f |% !!!.!"#?A?&>C?|
00000090 43 5f 41 27 3d 61 41 27 3d 61 3f 28 3e 61 3f 26 |C_A'=aA'=a?(>a?&|

hexdump -C Samplev2.a2r | head
00000000 41 32 52 32 ff 0a 0d 0a 49 4e 46 4f 24 00 00 00 |A2R2....INFO$...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 31 |.Applesauce v1.1|
00000020 2e 36 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |.6 |
00000030 20 02 01 01 53 54 52 4d 75 17 5d 01 00 01 e6 da | ...STRMu.].....|
00000040 00 00 83 a9 12 00 12 1e 11 13 1e 13 1e 13 11 1f |................|
00000050 21 1f 11 13 1c 14 1e 30 14 20 1e 14 1e 14 1c 14 |!......0. ......|
00000060 1c 13 11 20 21 1f 11 11 0f 13 1e 14 1c 14 2e 21 |... !..........!|
00000070 13 1e 13 1e 14 1e 11 11 20 21 1f 11 11 13 1e 1f |........ !......|
00000080 13 20 30 21 11 11 0f 13 1e 13 11 30 1f 21 20 13 |. 0!.......0.! .|
00000090 11 30 1f 14 1e 30 14 1e 11 11 11 1e 13 11 1e 14 |.0...0..........|

The A2R format uses a chunk system to store the various pieces to the format. Earlier versions used a STRM Chunk to store all the raw flux data. Version 3 changed to a RWCP Chunk to store all the raw flux data. Applesauce uses a 2-pass imaging process, doing a rapid imaging to determine where on the media surface track data exists and then a second pass that captures longer durations for processing and error correction.

Once the full raw flux data has been captured that data can be interpreted as a disk image. The Applesauce software is able to make a regular disk image, a Disk Copy 4.2 file, which are well known and identify in PRONOM as fmt/625, but can also create a couple of special disk image formats which allow for special nuances on an original disk.

The WOZ Disk Image format is an offshoot of the Applesauce project. Capturing highly accurate bit data is of no use if you don’t have a container to hold the data. The WOZ format was designed to be able to contain every possible Apple ][ disk structure and layout. It can be so accurate that even copy protected software can’t tell that it isn’t an original disk.

The WOZ format has become very popular in the Apple II community and is ideal for emulating all the old games and software titles popular in the early 1980’s. You may have guessed where the name comes from. The internet archive has a large collection of WOZ disks in their WOZ-a-Day collection. The file format of a WOZ disk image is also a chunk based format similar to the A2R format, it has two versions. Let’s take a look.

hexdump -C WOZ 1.0/Blazing Paddles (Baudville).woz | head
00000000 57 4f 5a 31 ff 0a 0d 0a f6 f5 92 d6 49 4e 46 4f |WOZ1........INFO|
00000010 3c 00 00 00 01 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 30 2e 32 36 20 20 20 20 20 20 20 |uce v0.26 |
00000030 20 20 20 20 20 20 20 20 20 00 00 00 00 00 00 00 | .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

hexdump -C WOZ 2.0/Blazing Paddles (Baudville).woz | head
00000000 57 4f 5a 32 ff 0a 0d 0a 21 da c2 c8 49 4e 46 4f |WOZ2....!...INFO|
00000010 3c 00 00 00 02 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 31 20 20 20 20 20 20 20 20 |uce v1.1 |
00000030 20 20 20 20 20 20 20 20 20 01 01 20 00 00 00 00 | .. ....|
00000040 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

Unlike a common disk image, a WOZ image contains more than the bits on the disk, it contains a mapping of all the tracks and the associated data, this is how it can even contain copy-protection usually only possible with a physical disk. The ‘TMAP’ chunk contains a track map and the ‘TRKS’ chunk contains all the data.

What the WOZ is for the Apple II, MOOF was made for the Macintosh. You may wonder what is with the funny name, but there is a long history around “Clarus the Dogcow”. I’m sure this factoid will help you impress your friends or win at trivia night. Again, the purpose of the special format for Macintosh disks is to allow for emulating disks, even with copy protection. You can also find quite the collection of old Macintosh software in the MOOF format on the Internet Archive, even emulate your favorite game, such as Dark Castle, which I played for hours as a kid. Also a chunk based format, let’s take a look at the header.

hexdump -C Dark Castle v1.0 - Disk 1.moof | head
00000000 4d 4f 4f 46 ff 0a 0d 0a b5 75 f9 4e 49 4e 46 4f |MOOF.....u.NINFO|
00000010 3c 00 00 00 01 01 00 01 10 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 37 33 20 20 20 20 20 20 20 |uce v1.73 |
00000030 20 20 20 20 20 20 20 20 20 00 13 00 00 00 00 00 | .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 ff 01 ff 02 ff 03 ff |TMAP............|
00000060 04 ff 05 ff 06 ff 07 ff 08 ff 09 ff 0a ff 0b ff |................|
00000070 0c ff 0d ff 0e ff 0f ff 10 ff 11 ff 12 ff 13 ff |................|
00000080 14 ff 15 ff 16 ff 17 ff 18 ff 19 ff 1a ff 1b ff |................|
00000090 1c ff 1d ff 1e ff 1f ff 20 ff 21 ff 22 ff 23 ff |........ .!.".#.|

All three formats created for imaging and emulating Apple and Macintosh software are well documented and open. They are also well suited for preservation as they can contain extensive metadata in the INFO chunk which gives provenance information on the source of the files. The Applesauce software even has a camera to photograph the disk itself for archiving. All of this makes these formats great for preservation and emulation. Take a look at my proposal for a signature on my Github.