SCP

If you have been following previous posts about floppy disk flux captures, you may have read about the HFE or A2R flux image formats. Both are very useful in the preservation, archiving and emulation of old software and games stored on decaying and copy-protected floppy disks. I also built a FluxEngine, which has come in handy more than once. It captures flux data in its own FLUX format. At work I also have access to a KryoFlux board, which captures to separate RAW track files.

Today we are looking at the SCP format. I recently purchased a Greaseweazle for personal use, and the main format it uses when capturing raw flux data is SCP. It works a little better on my older MacBook Pro than the FluxEngine, and I wanted to have another option for capturing flux data. So far it has worked really well. Of course I wanted to know everything I could about the SCP format, so the first thing I did was run Siegfried against a file.

filename : 'unknown.scp'
filesize : 47017278
modified : 2025-06-14T19:09:58-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  :
    version :
    mime    :
    class   :
    basis   :
    warning : 'no match'
  - ns        : 'wikidata'
    id        : 'Q29000565'
    format    : 'SuperCard Pro dump'
    URI       : 'http://www.wikidata.org/entity/Q29000565'
    permalink : 'https://www.wikidata.org/w/index.php?oldid=1866792367&title=Q29000565'
    mime      : 'application/octet-stream'
    basis     : 'extension match scp; byte match at 0, 3 (Wikidata reference is empty)'

Looks like Wikidata has a signature pattern, but PRONOM does not. Let’s take a look and see how difficult it might be to create one.

hexdump -C unknown.scp | head
00000000 53 43 50 00 80 03 00 a3 23 00 00 00 d2 0f 26 99 |SCP.....#.....&.|
00000010 b0 02 00 00 14 43 04 00 c6 96 08 00 64 78 0d 00 |.....C......dx..|
00000020 ea bb 12 00 de 37 16 00 a2 b3 19 00 26 68 1e 00 |.....7......&h..|
00000030 42 b7 23 00 2a 33 27 00 c8 ae 2a 00 a8 54 2f 00 |B.#.*3'...*..T/.|
00000040 fc 94 34 00 e2 10 38 00 a8 8c 3b 00 98 68 40 00 |..4...8...;..h@.|
00000050 1c b6 45 00 14 32 49 00 cc ad 4c 00 9e 9b 51 00 |..E..2I...L...Q.|
00000060 0e d3 56 00 de 4e 5a 00 74 ca 5d 00 be 7b 62 00 |..V..NZ.t.]..{b.|
00000070 b4 b3 67 00 a8 2f 6b 00 68 ab 6e 00 50 88 73 00 |..g../k.h.n.P.s.|
00000080 0c ce 78 00 02 4a 7c 00 ae c5 7f 00 96 bd 84 00 |..x..J|.........|
00000090 8a 2d 8a 00 8a a9 8d 00 56 25 91 00 b6 a3 95 00 |.-......V%......|

Well, probably not hard at all. I love easy, well-understood headers. But a signature of only three bytes can lead to false positives, so let’s look a little closer at the published specification.

Before we dive into the spec, it might be good to note a few things. The SCP image format was developed for another hobby board. The SuperCard Pro is a custom board that connects a floppy drive through USB to software which can capture flux data and help interpret the data into an image format, which can then be written back to a floppy or used in an emulator. The software is Windows only, so those on Linux or macOS can’t use it, but since the specification was made public, many other boards and tools can read and write the format. Even though it is open, I worry about preserving the spec. When you try to ensure it is saved in the Wayback Machine you get this fun page.

This sorry page is usually found when the owner of a URL has asked specifically for their domain to be excluded from the web archive. This worries me, as I have found many specifications have been lost to time. I would love to know why the owner has chosen to do this, but it is available now, so let’s dive in. The version history appears to start in 2014, but the page is copyright 2012, so I assume the format was created around that time. It was last updated in February of 2024, so it is pretty up to date. One important update was made in 2021:

v2.3 - 06/03/21

* Added additional FLAG bit (bit 7) to identify a 3rd party flux creator. PLEASE
SET THIS BIT IF YOU ARE A 3RD PARTY DEVELOPER USING THE SCP FORMAT!

This update to version 2.3 added a bit to indicate a 3rd party flux creator. This means software for a board like the Greaseweazle can flag itself as the creator, rather than the file looking like it was created by the SuperCard Pro software.

The header of an SCP file is made up of more than just the ASCII “SCP”.

All offsets are the start of the file (byte 0) unless otherwise stated.  The .scp image
consists of a disk definition header, the track data header offset table, and the flux
data for each track (preceeded by Track Data Header). The image file format is described
below:

BYTES 0x00-0x02 contains the ASCII of "SCP" as the first 3 bytes. If this is not found,
then the file is not ours.

Byte 0x03 holds the version of the software which created the SCP. My sample, created by my Greaseweazle, did not add a number here, only “00”. Byte 0x04 is the disk type, and there are some set definitions in the spec for this byte. My test sample uses “80”, but I’m not sure what that represents. Bytes 0x05-0x07 are used for other disk information, but byte 0x08 is where we find the flags, which include the bit for flux creator. My sample has the value “23”, but since we are looking at the individual bit level, the value is a combination of all the bits in the flag area. The individual bits are “00100011”. Counting from the least significant bit as bit 0, bits 0, 1 and 5 are set, while bit 7 (the leftmost digit, 0x80) is not. So this sample does not actually carry the 3rd party flag, even though it was created by a Greaseweazle.
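To make the byte positions concrete, here is a minimal Python sketch that pulls the fields discussed above out of the first 16 bytes of the sample, copied from the first hexdump row. The function name and dictionary keys are hypothetical choices for illustration, and the sketch assumes bits are numbered from the least significant end, so bit 7 corresponds to the mask 0x80.

```python
# First 16 bytes of the sample, copied from the hexdump above.
header = bytes.fromhex("53435000800300a323000000d20f2699")

# Assumption: bit 0 is the least significant bit, so bit 7 is mask 0x80.
FLUX_CREATOR_BIT = 7


def describe_header(data: bytes) -> dict:
    """Return the header fields discussed in the text (names are illustrative)."""
    if data[0:3] != b"SCP":
        raise ValueError("not an SCP image")
    flags = data[0x08]
    return {
        "version": data[0x03],      # 0x00 in the sample
        "disk_type": data[0x04],    # 0x80 in the sample
        "flags": flags,             # 0x23 in the sample
        "flags_binary": format(flags, "08b"),
        "third_party": bool(flags & (1 << FLUX_CREATOR_BIT)),
    }


info = describe_header(header)
print(info["flags_binary"], info["third_party"])
```

Under that bit numbering the sample prints 00100011 and False, since 0x23 does not include the 0x80 mask.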

So the only reliable static data in the header will be those first 3 bytes. There are some bytes later in the file which should be static: the start of the tracks, each of which begins with a Track Data Header. We can see from the spec that the last byte in the main header is 0x2AF, which makes the main header 688 bytes long (offsets 0x000 through 0x2AF). Starting at the next byte, offset 0x2B0, is the ASCII string TRK. Adding these 3 bytes should make for a nice signature.

000002b0  54 52 4b 00 a9 86 65 00  5e b5 00 00 28 00 00 00  |TRK...e.^...(...|
000002c0 ab 86 65 00 60 b5 00 00 e4 6a 01 00 56 87 65 00 |..e.`....j..V.e.|
000002d0 60 b5 00 00 a4 d5 02 00 00 39 00 7e 00 7c 00 ce |`........9.~.|..|
000002e0 00 c7 00 c7 00 cd 00 7e 00 7c 00 eb 00 4f 00 60 |.......~.|...O.`|
000002f0 00 39 00 77 00 cd 00 7c 00 7f 00 ce 00 c7 00 c6 |.9.w...|........|
00000300 00 ce 00 7a 00 80 00 cd 00 c8 00 c6 00 ce 00 7b |...z...........{|
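The offset arithmetic can be checked with a short Python sketch. It assumes the track offset table begins at 0x10 with little-endian 32-bit entries, which is a reading of the first hexdump (the bytes at 0x10 are b0 02 00 00) rather than something quoted from the spec here. The function names are hypothetical, and the buffer is a synthetic stand-in built from the sample's values, not the real file.

```python
import struct

HEADER_LAST_BYTE = 0x2AF            # last byte of the main header, per the spec
FIRST_TRACK = HEADER_LAST_BYTE + 1  # 0x2B0, so the header is 688 bytes long


def first_track_offset(data: bytes) -> int:
    """Read the first table entry (assumed little-endian uint32 at 0x10)."""
    (offset,) = struct.unpack_from("<I", data, 0x10)
    return offset


def has_trk_at(data: bytes, offset: int = FIRST_TRACK) -> bool:
    """Check for the ASCII string TRK at the given offset."""
    return data[offset:offset + 3] == b"TRK"


# Synthetic stand-in for the sample: only the bytes of interest are filled in.
sample = bytearray(FIRST_TRACK + 4)
sample[0:3] = b"SCP"
sample[0x10:0x14] = bytes.fromhex("b0020000")  # copied from the hexdump
sample[FIRST_TRACK:FIRST_TRACK + 3] = b"TRK"

print(FIRST_TRACK, hex(first_track_offset(sample)), has_trk_at(sample))
```

In the sample the first table entry and the end of the header agree on 0x2B0, which is what makes TRK land at a predictable spot.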

We could use the TRK string for identification, but looking further into the spec, we can also see the SCP format may contain a footer.

; ------------------------------------------------------------------
; EXTENSION FOOTER FORMAT
; ------------------------------------------------------------------
;
; 0000 DRIVE MANUFACTURER STRING OFFSET - 4 bytes
; 0004 DRIVE MODEL STRING OFFSET - 4 bytes
; 0008 DRIVE SERIAL NUMBER STRING OFFSET - 4 bytes
; 000C CREATOR STRING OFFSET - 4 bytes
; 0010 APPLICATION NAME STRING OFFSET - 4 bytes
; 0014 COMMENTS STRING OFFSET - 4 bytes
; 0018 IMAGE CREATION TIMESTAMP - 8 bytes
; 0020 IMAGE MODIFICATION TIMESTAMP - 8 bytes
; 0028 APPLICATION VERSION (nibbles major/minor) - 1 byte
; 0029 SCP HARDWARE VERSION (nibbles major/minor) - 1 byte
; 002A SCP FIRMWARE VERSION (nibbles major/minor) - 1 byte
; 002B IMAGE FORMAT REVISION (nibbles major/minor) - 1 byte
; 002C 'FPCS' (ASCII CHARS) - 4 bytes

Here is the tail of my sample file. You can see it contains the ASCII characters listed here as the last four bytes. It also contains an application string, indicating the Greaseweazle software used to create the file. All very helpful information. We can also see in the 5th to last byte the value “24”, which indicates the file format version being used. Version 2.4 is used in this file, but we know 2.5 is the latest. I wonder if it would be valuable to have separate identification for version 1 and 2 of the format? Could also consider assigning version 2.3 and 2.4 as unique, as they will have the additional 3rd party information.

hexdump -C unknown.scp | tail
02cd6cb0 00 85 00 5a 00 39 00 90 00 75 00 8e 00 42 00 3c |...Z.9...u...B.<|
02cd6cc0 00 78 00 2e 00 42 00 3a 00 47 00 78 00 42 00 46 |.x...B.:.G.x.B.F|
02cd6cd0 00 33 00 52 00 29 00 3a 00 55 00 5d 00 5b 00 54 |.3.R.).:.U.].[.T|
02cd6ce0 00 35 00 e0 00 48 00 91 00 75 00 3a 00 36 00 33 |.5...H...u.:.6.3|
02cd6cf0 00 55 02 03 01 d3 00 33 00 58 11 00 47 72 65 61 |.U.....3.X..Grea|
02cd6d00 73 65 77 65 61 7a 6c 65 20 31 2e 32 32 00 00 00 |seweazle 1.22...|
02cd6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 6c |...............l|
02cd6d20 cd 02 00 00 00 00 66 1d 4e 68 00 00 00 00 66 1d |......f.Nh....f.|
02cd6d30 4e 68 00 00 00 00 00 00 00 24 46 50 43 53 |Nh.......$FPCS|
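As a rough check of the footer layout, the last 0x30 bytes of the sample can be unpacked in Python following the field sizes in the spec excerpt above. This is only a sketch: the field names are hypothetical, little-endian byte order is assumed, and treating the timestamps as Unix epoch seconds is an assumption too, although the decoded value does line up with the modified time Siegfried reported.

```python
import struct
from datetime import datetime, timezone

# Six 4-byte offsets, two 8-byte timestamps, four 1-byte versions, 'FPCS'.
FOOTER = struct.Struct("<6I2Q4B4s")  # 0x30 bytes in total

# Last 0x30 bytes of the sample, rebuilt from the hexdump rows above.
footer_bytes = (
    bytes(2)                                                  # end of row 02cd6d00
    + bytes(14) + bytes.fromhex("fa6c")                       # row 02cd6d10
    + bytes.fromhex("cd0200000000661d4e6800000000661d")       # row 02cd6d20
    + bytes.fromhex("4e68") + bytes(7) + bytes.fromhex("24")  # row 02cd6d30
    + b"FPCS"
)


def parse_footer(tail: bytes) -> dict:
    """Unpack the extension footer from the end of a buffer (names illustrative)."""
    fields = FOOTER.unpack(tail[-FOOTER.size:])
    if fields[12] != b"FPCS":
        raise ValueError("no extension footer found")
    revision = fields[11]
    return {
        "app_name_offset": fields[4],
        "created": fields[6],
        "modified": fields[7],
        # Major/minor version packed as two nibbles, per the spec excerpt.
        "format_revision": f"{revision >> 4}.{revision & 0x0F}",
    }


footer = parse_footer(footer_bytes)
created_utc = datetime.fromtimestamp(footer["created"], tz=timezone.utc)
print(hex(footer["app_name_offset"]), footer["format_revision"], created_utc.isoformat())
```

The application name offset decodes to 0x2cd6cfa, which is where the hexdump shows a two-byte value of 17 followed by the 17 characters of "Greaseweazle 1.22".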

So maybe we don’t need the TRK header in our signature, just the first 3 bytes and last 4 bytes. I believe this should allow for proper identification, while avoiding false positives.
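The idea can be tried out with a few lines of Python. This is only a sketch of the check described above, not the PRONOM signature itself. The function and helper names are hypothetical, and the two temporary files are synthetic stand-ins rather than real captures.

```python
import os
import tempfile


def looks_like_scp(path: str) -> bool:
    """True if the file starts with b'SCP' and ends with b'FPCS'."""
    with open(path, "rb") as fh:
        head = fh.read(3)
        fh.seek(0, os.SEEK_END)
        if fh.tell() < 7:  # too short to hold both markers
            return False
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == b"SCP" and tail == b"FPCS"


def write_temp(payload: bytes) -> str:
    """Write payload to a temporary .scp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".scp")
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
    return path


with_footer = write_temp(b"SCP" + bytes(32) + b"FPCS")
without_footer = write_temp(b"SCP" + bytes(32))

with_footer_result = looks_like_scp(with_footer)
without_footer_result = looks_like_scp(without_footer)
print(with_footer_result, without_footer_result)

os.remove(with_footer)
os.remove(without_footer)
```

The second file shows the trade-off: since the spec says the footer may be present, an SCP written without one would not match a signature that requires the trailing FPCS.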

I have a proposal for a PRONOM signature and a sample file on my GitHub page. Other sample files can be found all over the interwebs, with many on archive.org.