SCP

If you have been following previous posts about floppy disk flux captures, you may have read about the HFE or A2R flux image formats. Both are very useful in the preservation, archiving, and emulation of old software and games stored on decaying and copy-protected floppy disks. I also built a Fluxengine, which has come in handy more than once. It captures flux data in its own FLUX format. At work I also have access to a Kryoflux board, which captures each track as a separate RAW file.

Today we are looking at the SCP format. I recently purchased a Greaseweazle for personal use and the main format used while capturing raw flux data is SCP. It works a little better on my older MacBook Pro than the fluxengine and I wanted to have another option for capturing flux data. So far it has worked really well. Of course I wanted to know everything I could about the SCP format so the first thing I did was run Siegfried against a file.

filename : 'unknown.scp'
filesize : 47017278
modified : 2025-06-14T19:09:58-06:00
errors :
matches :
  - ns : 'pronom'
    id : 'UNKNOWN'
    format :
    version :
    mime :
    class :
    basis :
    warning : 'no match'
  - ns : 'wikidata'
    id : 'Q29000565'
    format : 'SuperCard Pro dump'
    URI : 'http://www.wikidata.org/entity/Q29000565'
    permalink : 'https://www.wikidata.org/w/index.php?oldid=1866792367&title=Q29000565'
    mime : 'application/octet-stream'
    basis : 'extension match scp; byte match at 0, 3 (Wikidata reference is empty)'

Looks like Wikidata has a signature pattern, but PRONOM does not. Let’s take a look and see how difficult it might be to add one.

hexdump -C unknown.scp | head
00000000 53 43 50 00 80 03 00 a3 23 00 00 00 d2 0f 26 99 |SCP.....#.....&.|
00000010 b0 02 00 00 14 43 04 00 c6 96 08 00 64 78 0d 00 |.....C......dx..|
00000020 ea bb 12 00 de 37 16 00 a2 b3 19 00 26 68 1e 00 |.....7......&h..|
00000030 42 b7 23 00 2a 33 27 00 c8 ae 2a 00 a8 54 2f 00 |B.#.*3'...*..T/.|
00000040 fc 94 34 00 e2 10 38 00 a8 8c 3b 00 98 68 40 00 |..4...8...;..h@.|
00000050 1c b6 45 00 14 32 49 00 cc ad 4c 00 9e 9b 51 00 |..E..2I...L...Q.|
00000060 0e d3 56 00 de 4e 5a 00 74 ca 5d 00 be 7b 62 00 |..V..NZ.t.]..{b.|
00000070 b4 b3 67 00 a8 2f 6b 00 68 ab 6e 00 50 88 73 00 |..g../k.h.n.P.s.|
00000080 0c ce 78 00 02 4a 7c 00 ae c5 7f 00 96 bd 84 00 |..x..J|.........|
00000090 8a 2d 8a 00 8a a9 8d 00 56 25 91 00 b6 a3 95 00 |.-......V%......|

Well, probably not hard at all. I love easy, well-understood headers. But a signature of only three bytes risks false positives, so let’s look a little closer at the published specification. Before we dive into the spec, it might be good to note a few things. The SCP image format was developed for another hobby board. The SuperCard Pro is a custom board that connects a floppy drive over USB to software which can capture flux data and help interpret it into an image format that can be written back to a floppy or used in an emulator. The software is Windows only, so those on Linux or macOS can’t use it, but since the specification was made public, many other boards and tools can read and write the format. Even though it is open, I worry about preserving the spec. When you try to ensure it is saved in the Wayback Machine, you get this fun page.

This sorry page is usually found when the owner of a URL has specifically asked for their domain to be excluded from the web archive. This worries me, as I have found many specifications have been lost to time. I would love to know why the owner has chosen to do this, but the spec is available now, so let’s dive in. The revision history appears to start in 2014, but the page is copyright 2012, so I assume the format was created around that time. It was last updated in February of 2024, so it is pretty up-to-date. One important update was made in 2021:

v2.3 - 06/03/21

* Added additional FLAG bit (bit 7) to identify a 3rd party flux creator. PLEASE
SET THIS BIT IF YOU ARE A 3RD PARTY DEVELOPER USING THE SCP FORMAT!

This update to version 2.3 added a flag bit to identify a 3rd party flux creator. This means software for a board like the Greaseweazle can flag that the file was created by a 3rd party rather than by the SuperCard Pro itself.

The header of an SCP file is made up of more than just the ASCII “SCP”.

All offsets are the start of the file (byte 0) unless otherwise stated.  The .scp image
consists of a disk definition header, the track data header offset table, and the flux
data for each track (preceeded by Track Data Header). The image file format is described
below:

BYTES 0x00-0x02 contains the ASCII of "SCP" as the first 3 bytes. If this is not found,
then the file is not ours.

Byte 0x03 holds the version of the software which created the SCP. My sample, created by my Greaseweazle, did not add a number here, only “00”. Byte 0x04 is the disk type, and there are some set definitions in the spec for this byte. My test sample uses “80”, but I am not sure what that represents. Bytes 0x05-0x07 hold other disk information (the number of revolutions, the start track, and the end track), and byte 0x08 is where we find the flags, which include the bit for flux creator. My sample has the value “23”, but since we are looking at the individual bit level, the value is a combination of all the bits in the flag area. The individual bits are “00100011”, so bits 0, 1, and 5 are set. Bit 7, the leftmost digit, is actually clear, so this Greaseweazle capture does not flag itself as 3rd party in the header, even though that is what it is.
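To make the header layout concrete, here is a minimal Python sketch that unpacks the first nine bytes of my sample. The function name, the dictionary keys, and the flag labels are my own shorthand based on my reading of the published spec, so treat them as assumptions and check them against the spec before relying on them.

```python
import struct

# Flag labels are assumptions based on my reading of the SCP spec.
# List index 0 corresponds to bit 0, the least significant bit.
FLAG_NAMES = ["index", "96tpi", "360rpm", "normalized",
              "read_write", "footer", "extended_mode", "third_party_creator"]

def parse_scp_header(data: bytes) -> dict:
    """Unpack the signature and the six single-byte fields that follow it."""
    if len(data) < 9 or data[:3] != b"SCP":
        raise ValueError("not an SCP header")
    version, disk_type, revolutions, start_track, end_track, flags = \
        struct.unpack_from("<6B", data, 3)
    return {
        "version": version,
        "disk_type": disk_type,
        "revolutions": revolutions,
        "start_track": start_track,
        "end_track": end_track,
        "flags": flags,
        "set_flags": [name for bit, name in enumerate(FLAG_NAMES)
                      if flags & (1 << bit)],
    }

# First nine bytes of the sample file, copied from the hexdump above.
header = parse_scp_header(bytes.fromhex("53 43 50 00 80 03 00 a3 23"))
print(header["set_flags"])
```

For the sample, the flags byte 0x23 has only bits 0, 1, and 5 set, so the flux creator bit (bit 7) does not show up in the list.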

So the only reliable static data in the header will be those first 3 bytes. There are some bytes later in the file which should be static: the start of the tracks, which include a Track Data Header. We can see from the spec that the last byte in the main header is 0x2AF, which makes the main header 688 bytes (0x2B0) long. Starting at the very next byte, offset 0x2B0, is the ASCII string TRK. Adding these 3 bytes should make for a nice signature.

000002b0  54 52 4b 00 a9 86 65 00  5e b5 00 00 28 00 00 00  |TRK...e.^...(...|
000002c0 ab 86 65 00 60 b5 00 00 e4 6a 01 00 56 87 65 00 |..e.`....j..V.e.|
000002d0 60 b5 00 00 a4 d5 02 00 00 39 00 7e 00 7c 00 ce |`........9.~.|..|
000002e0 00 c7 00 c7 00 cd 00 7e 00 7c 00 eb 00 4f 00 60 |.......~.|...O.`|
000002f0 00 39 00 77 00 cd 00 7c 00 7f 00 ce 00 c7 00 c6 |.9.w...|........|
00000300 00 ce 00 7a 00 80 00 cd 00 c8 00 c6 00 ce 00 7b |...z...........{|
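Rather than hard-coding 0x2B0, a tool could follow the track offset table. In my sample the four bytes at 0x10 are b0 02 00 00, which is 0x2B0 as a little-endian value. The sketch below assumes the table starts at 0x10 and holds 168 four-byte entries (0x10 + 168 * 4 = 0x2B0, which matches the header length above). The function names are hypothetical.

```python
import struct

TABLE_START = 0x10    # assumed start of the track offset table
TABLE_ENTRIES = 168   # assumed entry count: 0x10 + 168 * 4 == 0x2B0

def first_track_offset(data: bytes):
    """Return the first non-zero entry of the track offset table, or None."""
    for index in range(TABLE_ENTRIES):
        pos = TABLE_START + 4 * index
        if pos + 4 > len(data):
            break
        (offset,) = struct.unpack_from("<I", data, pos)
        if offset:
            return offset
    return None

def has_trk_header(data: bytes) -> bool:
    """Check that the first listed track starts with the ASCII string TRK."""
    offset = first_track_offset(data)
    return offset is not None and data[offset:offset + 3] == b"TRK"
```

Following the table keeps the check valid even if the first populated entry is not track 0, at the cost of reading a little more of the file than a fixed-offset signature would.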

We could use the TRK string for identification, but looking further into the spec, we can also see the SCP format may contain a footer.

; ------------------------------------------------------------------
; EXTENSION FOOTER FORMAT
; ------------------------------------------------------------------
;
; 0000 DRIVE MANUFACTURER STRING OFFSET - 4 bytes
; 0004 DRIVE MODEL STRING OFFSET - 4 bytes
; 0008 DRIVE SERIAL NUMBER STRING OFFSET - 4 bytes
; 000C CREATOR STRING OFFSET - 4 bytes
; 0010 APPLICATION NAME STRING OFFSET - 4 bytes
; 0014 COMMENTS STRING OFFSET - 4 bytes
; 0018 IMAGE CREATION TIMESTAMP - 8 bytes
; 0020 IMAGE MODIFICATION TIMESTAMP - 8 bytes
; 0028 APPLICATION VERSION (nibbles major/minor) - 1 byte
; 0029 SCP HARDWARE VERSION (nibbles major/minor) - 1 byte
; 002A SCP FIRMWARE VERSION (nibbles major/minor) - 1 byte
; 002B IMAGE FORMAT REVISION (nibbles major/minor) - 1 byte
; 002C 'FPCS' (ASCII CHARS) - 4 bytes

Here is the tail of my sample file. You can see it contains the ASCII characters listed here in the last four bytes. It also contains an application string, indicating the Greaseweazle software used to create the file. All very helpful information. We can also see the value “24” in the 5th to last byte, which indicates the file format version being used. This file uses version 2.4, but we know 2.5 is the latest. I wonder if it would be valuable to have separate identification for version 1 and 2 of the format? We could also consider identifying versions 2.3 and 2.4 separately, as they will have the additional 3rd party information.

hexdump -C unknown.scp | tail
02cd6cb0 00 85 00 5a 00 39 00 90 00 75 00 8e 00 42 00 3c |...Z.9...u...B.<|
02cd6cc0 00 78 00 2e 00 42 00 3a 00 47 00 78 00 42 00 46 |.x...B.:.G.x.B.F|
02cd6cd0 00 33 00 52 00 29 00 3a 00 55 00 5d 00 5b 00 54 |.3.R.).:.U.].[.T|
02cd6ce0 00 35 00 e0 00 48 00 91 00 75 00 3a 00 36 00 33 |.5...H...u.:.6.3|
02cd6cf0 00 55 02 03 01 d3 00 33 00 58 11 00 47 72 65 61 |.U.....3.X..Grea|
02cd6d00 73 65 77 65 61 7a 6c 65 20 31 2e 32 32 00 00 00 |seweazle 1.22...|
02cd6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 6c |...............l|
02cd6d20 cd 02 00 00 00 00 66 1d 4e 68 00 00 00 00 66 1d |......f.Nh....f.|
02cd6d30 4e 68 00 00 00 00 00 00 00 24 46 50 43 53 |Nh.......$FPCS|
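The footer layout above maps neatly onto a single unpack call. This is a rough Python sketch, not a full parser. It assumes the footer is the last 0x30 bytes of the file, that string offsets are measured from the start of the file, that each string is preceded by a 16-bit little-endian length (inferred from the 11 00 in front of “Greaseweazle 1.22” in my sample), and that the timestamps are Unix seconds. The function names are my own.

```python
import struct
from datetime import datetime, timezone

FOOTER_SIZE = 0x30  # 0x2C bytes of fields plus the 4-byte "FPCS" marker

def read_string(data: bytes, offset: int):
    """Read a footer string; an offset of 0 is treated as 'not present'.

    Assumption inferred from the sample: a 16-bit little-endian length
    comes first, followed by that many bytes of text.
    """
    if offset == 0:
        return None
    (length,) = struct.unpack_from("<H", data, offset)
    return data[offset + 2:offset + 2 + length].decode("utf-8", errors="replace")

def parse_scp_footer(data: bytes) -> dict:
    """Decode selected fields from the extension footer at the end of the file."""
    if len(data) < FOOTER_SIZE or data[-4:] != b"FPCS":
        raise ValueError("no SCP extension footer found")
    # Six 4-byte offsets, two 8-byte timestamps, four 1-byte version fields.
    fields = struct.unpack_from("<6I2q4B", data, len(data) - FOOTER_SIZE)
    app_offset, created, modified, revision = fields[4], fields[6], fields[7], fields[11]
    return {
        "application": read_string(data, app_offset),
        "created": datetime.fromtimestamp(created, tz=timezone.utc),
        "modified": datetime.fromtimestamp(modified, tz=timezone.utc),
        # Major version in the high nibble, minor version in the low nibble.
        "format_revision": f"{revision >> 4}.{revision & 0x0F}",
    }
```

For the tail shown above, the timestamp bytes 66 1d 4e 68 decode to 2025-06-15 01:09:58 UTC, which lines up with the modified time Siegfried reported, so the Unix seconds assumption looks reasonable for this file.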

So maybe we don’t need the TRK header in our signature, just the first 3 bytes and last 4 bytes. I believe this should allow for proper identification, while avoiding false positives.
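As a quick sanity check of that idea outside of PRONOM, here is a small Python sketch of the same test: three bytes at the beginning of the file and four at the end. The function name is hypothetical, and this is only a stand-in for a real signature, not the signature itself.

```python
import io
import os

def looks_like_scp(stream) -> bool:
    """Check for 'SCP' at the start and 'FPCS' at the end of a binary stream."""
    stream.seek(0, os.SEEK_END)
    if stream.tell() < 7:
        return False  # too short to hold both markers
    stream.seek(0)
    head = stream.read(3)
    stream.seek(-4, os.SEEK_END)
    tail = stream.read(4)
    return head == b"SCP" and tail == b"FPCS"

# Works with any seekable binary stream, for example an open file:
# with open("unknown.scp", "rb") as handle:
#     print(looks_like_scp(handle))
```

One trade-off to keep in mind: the spec describes the footer as an extension, so an SCP written without it would not match this stricter test.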

I have a proposal for a PRONOM signature and a sample file on my GitHub page. Other sample files can be found all over the interwebs, with many on archive.org.

HFE

Last week I had the pleasure of attending the 20th annual iPres conference on Digital Preservation in Ghent, Belgium. I enjoyed hearing from many of my respected colleagues on many aspects of preservation, including one of my favorite topics, floppy disks. There were tutorials, lightning talks, and even a workshop presented by Leontien Talboom, Elizabeth Kata, Chris Knowles, and myself. We titled the workshop “A Guide to Imaging Obscure Floppy Disk Formats”. The workshop was conceived out of a mutual interest in imaging Wang 5.25in word processor disks, but expanded to include imaging of Amstrad 3in disks, 240K Brother Typewriter Disks, and Macintosh 400/800k disks.

I brought my hand soldered FluxEngine board and others brought their Greaseweazle board to show off how imaging obscure and uncommon disks can be done on a budget.

Image taken during workshop on a Mavica FD200 Floppy Disk Camera.

During the conference we talked a bit about the different types of hardware that can be used and the difference between a disk image and a flux image. There seems to be quite an exhaustive list of file formats, some specific to a platform and others more generic. I recently did a blog post on the formats used by the Applesauce software, which have some unique features.

There are many disk image types which should be researched and added to PRONOM and other format description sites, but today let’s take a look at a generic format used by many tools.

The HxC Floppy Emulator file format, which uses the extension HFE, is a popular format used with floppy drive emulators. There is a lot of complexity in what is included in many of these image formats: some are simply a raw sector representation of the binary data on a disk, while others contain the complete flux readings from a floppy disk. The HFE format contains a little more than a raw image, including a header, a track lookup table, and the bitstreams for each track, all with the purpose of emulating the physical media. The HFE format contains only a single pass over the data, where other formats may contain multiple readings of each track to get more complete data, which can be helpful for damaged or purposely copy-protected disks. You can read more on Ashley’s blog and in the Library of Congress format description.

HFE version list

When using the HxC Floppy Emulator software, you can open and save to many different formats, the main one being its native HFE format, which comes in 5 versions.

hexdump -C test01.hfe | head
00000000 48 58 43 50 49 43 46 45 00 53 02 00 e8 01 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

Above is a hexdump of the main SDCard HxC Floppy Emulator file format. The format specification shows the 8 byte header “HXCPICFE”. This is a distinctive pattern and should be all we need to make a robust signature for the format, but we do need to take into account the other HFE “versions” and see if they might clash or need to be identified separately.

hexdump -C test02-a2.hfe | head 
00000000 48 58 43 50 49 43 46 45 00 53 02 00 d0 03 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “A2” version of the format has the same header but some different bytes further into the file.

hexdump -C test03-rev2.hfe | head
00000000 48 58 43 50 49 43 46 45 01 53 02 00 00 00 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “Rev 2” version also has the same header. But if you look at the 9th byte you can see the value changed from 00 to 01. According to the specification, this is the revision byte.

hexdump -C test04-rev3.hfe | head 
00000000 48 58 43 48 46 45 56 33 00 53 02 00 e8 01 00 00 |HXCHFEV3.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

With “Rev 3” we see a change in the header with “HXCHFEV3” which appears to be referred to as HFEv3.

hexdump -C test05-stream.hfe | head 
00000000 48 78 43 5f 53 74 72 65 61 6d 5f 49 6d 61 67 65 |HxC_Stream_Image|
00000010 00 00 00 00 00 00 00 00 00 18 00 00 00 02 00 00 |................|
00000020 00 1a 00 00 53 00 00 00 02 00 00 00 40 9c 00 00 |....S.......@...|
00000030 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This last format seems to be a special HxC stream image.

It seems the best option is to make three signatures to identify the three main headers. Additional software can be used to further parse the disk image. If you would like to see some sample images, you can download a bunch here. You can also take a look at my GitHub repository to see additional samples and a proposed set of signatures.
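To illustrate how the three headers separate the five sample files, here is a small Python sketch. The labels and the function name are hypothetical, and this is only a stand-in for the proposed signatures. It also reports the byte at offset 8, which the dumps above show changing from 00 to 01 for the “Rev 2” file.

```python
# Labels are my own shorthand, not official format names.
HFE_MAGICS = {
    b"HXCPICFE": "HFE",
    b"HXCHFEV3": "HFEv3",
    b"HxC_Stream_Image": "HxC stream image",
}

def identify_hfe(data: bytes):
    """Return (label, revision) for a known header, or None if nothing matches.

    The revision is the byte at offset 8 for the two 8-byte headers and
    None for the stream image, whose header is laid out differently.
    """
    for magic, label in HFE_MAGICS.items():
        if data.startswith(magic):
            revision = data[8] if len(magic) == 8 and len(data) > 8 else None
            return label, revision
    return None

# First 16 bytes of test03-rev2.hfe from the hexdump above.
rev2 = bytes.fromhex("48 58 43 50 49 43 46 45 01 53 02 00 00 00 00 00")
print(identify_hfe(rev2))
```

The plain and “A2” files both come back as revision 0 here, which fits the observation that they share a header and differ only further into the file.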

A2R / MOOF / WOZ

There seems to be a never-ending, ever-growing list of disk image formats. Many have features which are specific to the media and format. If you have ever imaged an older Macintosh floppy you know they are special. If you add in copy protection, which many early Apple II floppies have, you need special drives, hardware, and a special format to store the floppy data.

When imaging special or unique media, it is best practice to image the floppies at the magnetic flux level.

Floppy disks contain magnetic fluctuations which are measured and recorded using specialized equipment. A popular method is using a Kryoflux board, floppy drive, and software. The software communicates with a custom controller board connected to a floppy drive through USB. If you are interested in the different controller boards, a good list has been compiled here.

A Kryoflux, fluxengine, or greaseweazle can all image specialized disks like a Macintosh 800k floppy, but the best controller board for them is an Applesauce setup. It is specifically designed for the task. With that task come a few specialty formats.

A file format which can store flux data is a bit different than a regular disk image format. The flux data contains all the low-level recordings which can then be interpreted into disk images much like the original floppy. In the case of an Applesauce flux image, it can contain all the small nuances of the original floppy, this includes recording any copy protection or other creative methods used by software vendors throughout the years. The format used for storing this flux data is the A2R format.

A2R is in its third iteration. Let’s take a look at the basics of the format.

hexdump -C Samplev3.a2r | head
00000000 41 32 52 33 ff 0a 0d 0a 49 4e 46 4f 25 00 00 00 |A2R3....INFO%...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 38 |.Applesauce v1.8|
00000020 38 2e 35 20 20 20 20 20 20 20 20 20 20 20 20 20 |8.5             |
00000030 20 02 01 01 00 52 57 43 50 e9 49 6e 01 01 24 f4 | ....RWCP.In..$.|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 43 01 00 |.............C..|
00000050 00 01 27 3a 25 00 91 d9 00 00 21 20 21 21 21 21 |..':%.....! !!!!|
00000060 1f 21 21 21 21 1f 24 5e 24 1f 21 21 20 21 24 5c |.!!!!.$^$.!! !$\|
00000070 24 20 21 21 21 1f 24 5c 25 21 21 1f 21 21 23 5b |$ !!!.$\%!!.!!#[|
00000080 25 20 21 21 21 1f 21 22 23 3f 41 3f 26 3e 43 3f |% !!!.!"#?A?&>C?|
00000090 43 5f 41 27 3d 61 41 27 3d 61 3f 28 3e 61 3f 26 |C_A'=aA'=a?(>a?&|

hexdump -C Samplev2.a2r | head
00000000 41 32 52 32 ff 0a 0d 0a 49 4e 46 4f 24 00 00 00 |A2R2....INFO$...|
00000010 01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 31 |.Applesauce v1.1|
00000020 2e 36 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |.6              |
00000030 20 02 01 01 53 54 52 4d 75 17 5d 01 00 01 e6 da | ...STRMu.].....|
00000040 00 00 83 a9 12 00 12 1e 11 13 1e 13 1e 13 11 1f |................|
00000050 21 1f 11 13 1c 14 1e 30 14 20 1e 14 1e 14 1c 14 |!......0. ......|
00000060 1c 13 11 20 21 1f 11 11 0f 13 1e 14 1c 14 2e 21 |... !..........!|
00000070 13 1e 13 1e 14 1e 11 11 20 21 1f 11 11 13 1e 1f |........ !......|
00000080 13 20 30 21 11 11 0f 13 1e 13 11 30 1f 21 20 13 |. 0!.......0.! .|
00000090 11 30 1f 14 1e 30 14 1e 11 11 11 1e 13 11 1e 14 |.0...0..........|

The A2R format uses a chunk system to store the various pieces to the format. Earlier versions used a STRM Chunk to store all the raw flux data. Version 3 changed to a RWCP Chunk to store all the raw flux data. Applesauce uses a 2-pass imaging process, doing a rapid imaging to determine where on the media surface track data exists and then a second pass that captures longer durations for processing and error correction.
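The chunk layout is visible in the two dumps above: after the 8 byte header comes a 4 byte chunk ID (INFO), a 4 byte little-endian size (0x25 for the version 3 sample), and then the chunk data, with the next chunk ID (RWCP) starting right after. Here is a minimal Python sketch of walking that structure. It assumes every chunk follows the same ID, size, data pattern, and the function name is my own.

```python
import struct

A2R_HEADER_LEN = 8  # "A2R" plus a version character, then ff 0a 0d 0a

def iter_a2r_chunks(data: bytes):
    """Yield (chunk_id, chunk_offset, data_size) for each chunk header found."""
    if data[:3] != b"A2R" or data[4:8] != b"\xff\x0a\x0d\x0a":
        raise ValueError("not an A2R file")
    pos = A2R_HEADER_LEN
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield chunk_id, pos, size
        pos += 8 + size  # skip the 8-byte chunk header and its data

# First 64 bytes of Samplev3.a2r from the hexdump above.
sample = bytes.fromhex(
    "41 32 52 33 ff 0a 0d 0a 49 4e 46 4f 25 00 00 00"
    "01 41 70 70 6c 65 73 61 75 63 65 20 76 31 2e 38"
    "38 2e 35 20 20 20 20 20 20 20 20 20 20 20 20 20"
    "20 02 01 01 00 52 57 43 50 e9 49 6e 01 01 24 f4"
)
for chunk in iter_a2r_chunks(sample):
    print(chunk)
```

On this truncated sample the walk finds INFO at offset 8 and RWCP at offset 0x35, then stops because the RWCP data runs past the end of the bytes supplied.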

Once the full raw flux data has been captured, that data can be interpreted as a disk image. The Applesauce software is able to make a regular disk image, a Disk Copy 4.2 file, which is well known and identified in PRONOM as fmt/625, but it can also create a couple of special disk image formats which allow for special nuances on an original disk.

The WOZ Disk Image format is an offshoot of the Applesauce project. Capturing highly accurate bit data is of no use if you don’t have a container to hold the data. The WOZ format was designed to be able to contain every possible Apple ][ disk structure and layout. It can be so accurate that even copy protected software can’t tell that it isn’t an original disk.

The WOZ format has become very popular in the Apple II community and is ideal for emulating all the old games and software titles popular in the early 1980s. You may have guessed where the name comes from. The Internet Archive has a large collection of WOZ disks in their WOZ-a-Day collection. The file format of a WOZ disk image is also chunk based, similar to the A2R format, and it has two versions. Let’s take a look.

hexdump -C "WOZ 1.0/Blazing Paddles (Baudville).woz" | head
00000000 57 4f 5a 31 ff 0a 0d 0a f6 f5 92 d6 49 4e 46 4f |WOZ1........INFO|
00000010 3c 00 00 00 01 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 30 2e 32 36 20 20 20 20 20 20 20 |uce v0.26       |
00000030 20 20 20 20 20 20 20 20 20 00 00 00 00 00 00 00 |         .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

hexdump -C "WOZ 2.0/Blazing Paddles (Baudville).woz" | head
00000000 57 4f 5a 32 ff 0a 0d 0a 21 da c2 c8 49 4e 46 4f |WOZ2....!...INFO|
00000010 3c 00 00 00 02 01 00 01 01 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 31 20 20 20 20 20 20 20 20 |uce v1.1        |
00000030 20 20 20 20 20 20 20 20 20 01 01 20 00 00 00 00 |         .. ....|
00000040 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 00 ff 01 01 01 ff 02 |TMAP............|
00000060 02 02 ff 03 03 03 ff 04 04 04 ff 05 05 05 ff 06 |................|
00000070 06 06 ff 07 07 07 ff 08 08 08 ff 09 09 09 ff 0a |................|
00000080 0a 0a ff 0b 0b 0b ff 0c 0c 0c ff 0d 0d 0d ff 0e |................|
00000090 0e 0e ff 0f 0f 0f ff 10 10 10 ff 11 11 11 ff 12 |................|

Unlike a common disk image, a WOZ image contains more than the bits on the disk. It contains a mapping of all the tracks and the associated data, which is how it can even contain copy protection usually only possible with a physical disk. The ‘TMAP’ chunk contains a track map and the ‘TRKS’ chunk contains all the data.

What WOZ is for the Apple II, MOOF is for the Macintosh. You may wonder what is with the funny name, but there is a long history around “Clarus the Dogcow”. I’m sure this factoid will help you impress your friends or win at trivia night. Again, the purpose of the special format for Macintosh disks is to allow for emulating disks, even with copy protection. You can also find quite the collection of old Macintosh software in the MOOF format on the Internet Archive, and even emulate your favorite game, such as Dark Castle, which I played for hours as a kid. MOOF is also a chunk based format, so let’s take a look at the header.

hexdump -C "Dark Castle v1.0 - Disk 1.moof" | head
00000000 4d 4f 4f 46 ff 0a 0d 0a b5 75 f9 4e 49 4e 46 4f |MOOF.....u.NINFO|
00000010 3c 00 00 00 01 01 00 01 10 41 70 70 6c 65 73 61 |<........Applesa|
00000020 75 63 65 20 76 31 2e 37 33 20 20 20 20 20 20 20 |uce v1.73       |
00000030 20 20 20 20 20 20 20 20 20 00 13 00 00 00 00 00 |         .......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 54 4d 41 50 a0 00 00 00 00 ff 01 ff 02 ff 03 ff |TMAP............|
00000060 04 ff 05 ff 06 ff 07 ff 08 ff 09 ff 0a ff 0b ff |................|
00000070 0c ff 0d ff 0e ff 0f ff 10 ff 11 ff 12 ff 13 ff |................|
00000080 14 ff 15 ff 16 ff 17 ff 18 ff 19 ff 1a ff 1b ff |................|
00000090 1c ff 1d ff 1e ff 1f ff 20 ff 21 ff 22 ff 23 ff |........ .!.".#.|

All three formats created for imaging and emulating Apple and Macintosh software are well documented and open. They are also well suited for preservation, as they can contain extensive metadata in the INFO chunk which gives provenance information on the source of the files. The Applesauce software even has a camera to photograph the disk itself for archiving. All of this makes these formats great for preservation and emulation. Take a look at my proposal for a signature on my GitHub.
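Looking across the dumps, all five headers share a pattern: a 4 byte ASCII magic followed by ff 0a 0d 0a. The INFO chunk then starts at offset 8 in the A2R samples and at offset 12 in the WOZ and MOOF samples, where four extra bytes sit in between (a CRC32, as I understand the WOZ spec). Here is a small Python sketch that uses just those observations. The labels and the function name are hypothetical, and it is not a replacement for a proper signature.

```python
# Magic -> (label, offset of the first chunk as seen in the sample dumps).
APPLESAUCE_MAGICS = {
    b"A2R2": ("A2R version 2", 8),
    b"A2R3": ("A2R version 3", 8),
    b"WOZ1": ("WOZ version 1", 12),
    b"WOZ2": ("WOZ version 2", 12),
    b"MOOF": ("MOOF", 12),
}
MAGIC_SUFFIX = b"\xff\x0a\x0d\x0a"

def identify_applesauce(data: bytes):
    """Return (label, first_chunk_offset) for a known header, else None."""
    entry = APPLESAUCE_MAGICS.get(data[:4])
    if entry is None or data[4:8] != MAGIC_SUFFIX:
        return None
    label, first_chunk = entry
    # Confirm the INFO chunk ID sits where the sample dumps show it.
    if data[first_chunk:first_chunk + 4] != b"INFO":
        return None
    return label, first_chunk

# First 16 bytes of the Dark Castle MOOF sample from the hexdump above.
moof = bytes.fromhex("4d 4f 4f 46 ff 0a 0d 0a b5 75 f9 4e 49 4e 46 4f")
print(identify_applesauce(moof))
```

Requiring INFO at the expected offset is stricter than the magic alone. It rests on the assumption that INFO is always the first chunk, which holds for every sample shown here.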