HFE

Last week I had the pleasure of attending the 20th annual iPres conference on Digital Preservation in Ghent, Belgium. I enjoyed hearing from many of my respected colleagues on many aspects of preservation including one of my favorite topics, floppy disks. There was tutorials, lightning talks, and even a workshop, presented by Leontien Talboom, Elizabeth Kata, Chris Knowles, and myself. We titled the workshop “A Guide to Imaging Obscure Floppy Disk Formats“. The workshop was conceived by a mutual interest in imaging Wang 5.25in word processor disks, but expanded to include imaging of Amstrad 3in disks, 240K Brother Typewriter Disks, and Macintosh 400/800k disks.

I brought my hand soldered FluxEngine board and others brought their Greaseweazle board to show off how imaging obscure and uncommon disks can be done on a budget.

Photo of workshop taken on a Mavica Floppy Disk camera
Image taken during workshop on a Mavica FD200 Floppy Disk Camera.

During the conference we talked a bit about the different type of hardware that can be used and the difference between a disk image and flux image. There seems to be quite the exhaustive list of different types of file formats, some specific to a platform and others more generic. I recently did a blog post on the formats used by the Applesauce software, which have some unique features.

There are many disk image types which should be researched and added to PRONOM and other format description sites, but today lets take a look at a generic format used by many tools.

The HxC Floppy Emulator file format which the extension HFE is a popular format used with floppy drive emulators. There is a lot of complexity with what is included in many of these image formats, some are simply a raw sector representation of the binary data on a disk, others contain the complete flux readings from a floppy disk. The HFE format contains a little more than a raw image, including a header, a track lookup table, and the bitstreams for each track all with the purpose of emulating the physical media. The HFE format contains only a single pass over the data, where other formats may contain multiple reading of each track to get more complete data which can be helpful for damaged or purposely copy-protected disks. You can read more on Ashley’s blog, Library of Congress format description.

HFE version list

When using the HxC Floppy Emulator software, you can open and save to many different formats. The main format being their HFE native format. It comes in 5 versions.

hexdump -C test01.hfe | head
00000000 48 58 43 50 49 43 46 45 00 53 02 00 e8 01 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

Above is a hexdump of the main SDCard HxC Floppy Emulator file format. The format specification shows the 8 byte header “HXCPICFE”. This is a very unique pattern and should be all we need to make a robust signature for the format, but we do need to take into account the other HFE “versions” and see if they might clash or need to be identified separately.

hexdump -C test02-a2.hfe | head 
00000000 48 58 43 50 49 43 46 45 00 53 02 00 d0 03 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “A2” version of the format has the same header but some different bytes further into the file.

hexdump -C test03-rev2.hfe | head
00000000 48 58 43 50 49 43 46 45 01 53 02 00 00 00 00 00 |HXCPICFE.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The “Rev 2” version also has the same header. But if you look at the 9th byte you can see the value changed from 00 to 01, which according to the specification, this is the revision byte.

hexdump -C test04-rev3.hfe | head 
00000000 48 58 43 48 46 45 56 33 00 53 02 00 e8 01 00 00 |HXCHFEV3.S......|
00000010 07 01 01 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

With “Rev 3” we see a change in the header with “HXCHFEV3” which appears to be referred to as HFEv3.

hexdump -C test05-stream.hfe | head 
00000000 48 78 43 5f 53 74 72 65 61 6d 5f 49 6d 61 67 65 |HxC_Stream_Image|
00000010 00 00 00 00 00 00 00 00 00 18 00 00 00 02 00 00 |................|
00000020 00 1a 00 00 53 00 00 00 02 00 00 00 40 9c 00 00 |....S.......@...|
00000030 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This last format seems to be a special HxC stream image.

It seems the best option is to make three signatures to identify the three main headers. Additional software can be used to further parse the disk image. If you would like to see some sample images, you can download a bunch here. You can also take a look at my GitHub repository to see additional samples and a proposed set of signatures.