Shorten

I was recently going through some of my old CD-R’s and came across this 11 year old fun memory.

I remember going to this 2003 Toad the Wet Sprocket concert in Salt Lake City with some friends, I had seen this band perform before, but this was the first time I was able to get a recording of the show. Normally having a recording of a concert of a well known band was a little shady, but for some bands, they not only allow recording of their live concerts, but they encourage it. There has been a few bands over the years who have this philosophy, one most have heard of is the Grateful Dead, because of all the tape trading, the band’s numerous concerts will live on forever.

The scene of recording concerts is still alive and well, and if you are into recording and sharing it is expected you share in a lossless audio format. The world of lossless audio is definitely in the minority of all those who listen to music on the daily. Most of us have been placated with the infinite playlists on services like Apple Music, Spotify, and Amazon Music. Most probably don’t care about owning music anymore, but for the few who consider themselves Audiophiles, having a lossless audio file is the only choice.

When it comes to formats, there are a few lossless formats to choose from, they all come with some advantages as well as some downsides. WAV files contain the full PCM audio stream, and while internet bandwidth today can handle full uncompressed audio, it can still be beneficial to use some compression for archiving or sharing over the web.

The most common lossless format today is the Free Lossless Audio Codec or FLAC, but there are also quite a few who like the Apple Lossless Audio Codec. Both offer many advantages, especially with metadata, cuesheets, and can contain cover album art. But many years ago another lossless format was most often used with bootleg recordings and audio sharing.

Shorten was one of the first lossless formats, developed by Tony Robinson in 1993 for SoftSound. It could cut the size in half of a typical 16-bit WAV file. It achieved this by using Huffman coding, kinda the same way a JPEG works, by reducing the frequency of how often patterns occur. Today FLAC and ALAC have replaced this format and offer improved features and support. Many audio players have dropped support for shorten making it difficult to use this old format.

The Shorten format uses the .SHN extension. It is one of the formats listed on the Library of Congress Sustainability of Digital Formats with the ID fdd000199, although a couple links don’t appear to work as it hasn’t been updated since 2011. Support was ended for this format and many of the links found on various websites are for broken, usually referencing the etree wiki. Much of which is archived on the Internet Archive.

Let’s take a look at the what makes up a lossless compressed SHN file. A quick look at a sample header:

hexdump -C test.shn | head
00000000  61 6a 6b 67 02 fb b1 70  09 f9 25 59 52 a4 d1 a8  |ajkg...p..%YR...|
00000010  dd cf 85 5a 01 57 a0 d5  a8 b6 6b 6d d2 41 10 80  |...Z.W....km.A..|
00000020  40 20 10 18 04 0a 01 44  d6 40 20 11 0d 8c 0a 01  |@ .....D.@ .....|
00000030  04 80 44 20 16 4b 0d d2  c3 b8 f8 55 a0 11 80 59  |..D .K.....U...Y|
00000040  98 56 1d b1 79 51 9f 39  f1 12 d2 d3 75 5c cd 08  |.V..yQ.9....u\..|
00000050  06 25 68 6b 52 5e 9f 4c  39 cd c1 32 c4 0d a9 b7  |.%hkR^.L9..2....|
00000060  69 34 56 f0 96 fa 46 89  a2 6e 8c ba d5 d0 58 de  |i4V...F..n....X.|
00000070  f5 44 5b aa 61 82 c7 85  88 37 d6 ee cb ab 4e 44  |.D[.a....7....ND|
00000080  91 19 b7 38 d4 20 ae 98  98 d1 2c 4a 4e 88 dd 3e  |...8. ....,JN..>|
00000090  36 68 1b 59 a8 7d 84 23  76 0a 84 21 a1 cd 80 8e  |6h.Y.}.#v..!....|

The first four bytes seem to be consistent among my samples. It makes me wonder if the ascii values have something to do with the author, Anthony (Tony) J. Robinson. In the source code for the shorten software, the file shorten.h defines the ascii “ajkg” as the magic header for the SHN format. Also found in current ffmpeg code. Although the tools don’t have much to say about them.

mediainfo test.shn 
General
Complete name                            : test.shn
Format                                   : Shorten
Format version                           : 2
File size                                : 3.17 MiB

Audio
Format                                   : Shorten
Compression mode                         : Lossless

ffprobe -i test.shn
Input #0, shn, from 'test.shn':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Audio: shorten, 44100 Hz, 2 channels, s16p

Using the older SHNTOOL, we can get more information.

shntool info test.shn  
-------------------------------------------------------------------------------
File name:                    test.shn
Handled by:                   shn format module
Length:                       0:32.23
WAVE format:                  0x0001 (Microsoft PCM)
Channels:                     2
Bits/sample:                  16
Samples/sec:                  44100
Average bytes/sec:            176400
Rate (calculated):            176400
Block align:                  4
Header size:                  44 bytes
Data size:                    5697720 bytes
Chunk size:                   5697756 bytes
Total size (chunk size + 8):  5697764 bytes
Actual file size:             3325489
File is compressed:           yes
Compression ratio:            0.5836
CD-quality properties:
  CD quality:                 yes
  Cut on sector boundary:     no
  Sector misalignment:        1176 bytes
  Long enough to be burned:   yes
WAVE properties:
  Non-canonical header:       no
  Extra RIFF chunks:          no
Possible problems:
  File contains ID3v2 tag:    no
  Data chunk block-aligned:   yes
  Inconsistent header:        no
  File probably truncated:    unknown
  Junk appended to file:      unknown
  Odd data size has pad byte: n/a
Extra shn-specific info:
  Seekable:                   yes

Many Shorten Audio Files are found out there in archives and file sharing sites, so even though the format isn’t used to create new files, it will still be around for awhile. My GitHub has my signature proposal and a couple of samples.

Multiplan

This is a follow up post to the post “EARLY MICROSOFT EXCEL” earlier this year.

I have to admit, often when I am researching file formats I can get distracted by a shinier format I come across. I often go down rabbit holes and forget the reason I started down the path I am on. I try and focus on the current needs in my life as a Digital Preservation Manager, but can get easily sidetracked. I always look forward to November every year so I can celebrate World Digital Preservation Day which sometimes comes along with a PRONOM research week. This gives me a chance to look at formats that may need attention which are not normally on my radar.

This week I a taking a look again at Multiplan. There is a PRONOM PUID for version 4, but does not have a description nor does it have a binary signature. It is was also lacking a File Format Wiki entry. So I decided to dive in. I had already bumped into the format while doing some research on early Microsoft Excel formats. This includes the SYLK format which needed a little update.

Microsoft Multiplan was the parent of Microsoft Excel. Multiplan was built for many different types of computers in the 1980’s, but was never ported to Windows. So to use Multiplan you have to be comfortable with using DOS. If you want to take Multiplan for a spin, head over to PCjs Machines and load up one of the many emulated systems they have.

In the end, Multiplan had four versions, but the last one, version 4.2, had some big changes, especially to the file format. More on that in a minute.

Mutiplan Version 1 – DOS

hexdump -C MP1.MOD  | head
00000000  08 e7 00 00 58 09 01 00  08 00 01 00 00 00 0a 00  |....X...........|
00000010  40 00 00 00 2e f5 0a 80  27 07 94 00 12 00 01 00  |@.......'.......|
00000020  0a 00 01 00 0c 0a 08 00  27 00 0d 80 04 00 01 00  |........'.......|
00000030  54 00 00 00 27 00 10 00  54 52 41 4e 53 46 48 f5  |T...'...TRANSFH.|
00000040  00 80 84 0a 68 61 52 f5  58 f5 5a f5 4e f5 0c 0a  |....haR.X.Z.N...|
00000050  12 00 01 00 72 f5 72 f5  0a 80 4b 0b 0f 00 12 00  |....r.r...K.....|
00000060  0c 0a 01 00 0c 00 01 00  08 00 20 4e 40 00 09 00  |.......... N@...|
00000070  8a f5 0a 80 4a 07 30 00  48 00 01 00 20 4e 00 00  |....J.0.H... N..|
00000080  28 0c 18 00 04 00 0d 80  03 00 28 0c 04 00 00 00  |(.........(.....|
00000090  26 00 00 00 54 52 d0 01  00 00 a4 f5 0a 00 62 0b  |&...TR........b.|

Mutiplan Version 1 – Macintosh

hexdump -C Multiplan1  | head
00000000  11 ab 00 00 13 e8 00 00  00 00 00 00 00 02 02 8c  |................|
00000010  00 18 00 0e 02 a4 02 b2  00 0e 02 fe 00 03 00 0e  |................|
00000020  00 bd 01 e3 2f 0f 00 08  15 5e 19 d1 03 5e 19 dd  |..../....^...^..|
00000030  61 60 60 5e 16 90 00 67  60 60 60 8f 5f 03 e8 7a  |a``^...g```._..z|
00000040  30 61 60 60 13 5f 03 e8  7b 90 00 67 60 60 60 8f  |0a``._..{..g```.|
00000050  16 85 67 60 60 60 8f 16  6d 85 61 60 60 13 5e 10  |..g```..m.a``.^.|
00000060  7b 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 6e  |{..g```..z1.j..n|
00000070  85 14 77 60 16 6f 85 67  60 60 60 90 00 67 60 60  |..w`.o.g```..g``|
00000080  60 90 00 67 60 60 60 8f  13 7a 31 14 6a d7 16 70  |`..g```..z1.j..p|
00000090  85 14 77 60 16 71 85 67  60 60 60 90 00 67 60 60  |..w`.q.g```..g``|

Mutiplan Version 2 – DOS

hexdump -C MP2.MOD | head
00000000  0c ec 00 00 08 ab 08 00  1f 00 1a 00 03 00 27 03  |..............'.|
00000010  4b 05 00 00 00 00 00 00  00 00 00 1d c8 14 03 00  |K...............|
00000020  00 00 2f 00 9a 2e b3 fc  46 02 34 04 f3 16 00 00  |../.....F.4.....|
00000030  00 00 00 00 08 00 10 22  00 00 0d 06 84 1d 08 1d  |......."........|
00000040  ff 03 83 0a c8 18 48 1a  02 19 00 00 00 00 15 1b  |......H.........|
00000050  98 15 85 15 03 00 2a 00  00 37 46 32 1c 00 18 00  |......*..7F2....|
00000060  00 00 01 00 01 00 a9 03  0f 80 e8 14 00 00 01 00  |................|
00000070  6a 1c 00 00 01 00 0d 00  0f 80 0a 15 00 00 77 20  |j.............w |
00000080  00 00 01 00 6e 61 6c 20  00 00 2a 00 00 00 04 00  |....nal ..*.....|
00000090  00 00 0d 00 14 19 00 00  d4 06 0e 80 24 15 00 00  |............$...|

The DOS files for Version 2 begin with 0CEC0000 08AB0800, but a file for the Xenix system starts with 0AEC0000 08AB0800. So it appears the first byte may be different depending on the system.

hexdump -C MP3.MOD | head                         
00000000  0c ed 00 00 08 ab 08 00  1f 00 1a 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000110  00 00 02 00 01 00 00 00  00 00 ff 0f ff 00 00 00  |................|
00000120  00 00 05 00 06 00 46 00  36 00 42 00 00 00 00 00  |......F.6.B.....|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 01 00 00  |................|
00000150  00 fe 0f 00 fe 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The DOS files for Version 3 begin with a similar hex pattern, 0CED0000 08AB0800. This would make sense as the documentation for Multiplan 4.2 states it supports opening of Version 2 & 3, but not Version 1.

There was also a companion product that went along with Multiplan, it was called Microsoft Chart. Here is a file from version 3:

hexdump -C EXAMPLE1.MC | head
00000000  90 01 00 00 08 ab 00 00  00 00 00 00 00 00 04 00  |................|
00000010  80 00 05 00 04 00 43 10  00 00 00 00 00 00 00 00  |......C.........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  e8 ff 04 00 00 22 24 36  a4 1f 00 00 11 00 24 00  |....."$6......$.|
00000090  03 00 64 00 00 00 cc 0c  cc 0c cc 0c cc 0c 00 00  |..d.............|
000000a0  00 00 00 00 ff 7f 00 00  01 f0 00 00 00 5f 00 00  |............._..|
000000b0  00 00 a2 ff a0 ff 00 00  01 f0 01 00 64 00 00 00  |............d...|
000000c0  00 00 01 f0 00 00 01 70  00 00 8e ff 8c ff 00 ff  |.......p........|
000000d0  0d 00 00 00 20 14 01 00  00 00 00 00 00 00 00 00  |.... ...........|

The Chart file format has a similar byte pattern with the 08AB pattern and looks similar to the BIFF format. We will have to make sure it doesn’t conflict with any signatures so it can be identified separately.

Version 4 of Multiplan was the first to use the BIFF (Binary Interchange File Format). Technically Version BIFF2, not much is know about BIFF1 or if it ever existed. BIFF2 is the exact same format as Excel 2.0 used, so there will be some problems if we want to identify them separately. They currently identify as fmt/55.

hexdump -C MP4.MOD | head
00000000  09 00 04 00 40 01 10 00  42 00 02 00 b5 01 66 00  |....@...B.....f.|
00000010  1b 00 00 00 00 00 00 00  00 00 ff ff 0f 01 00 01  |................|
00000020  00 01 00 01 00 00 00 00  00 00 00 00 00 0d 00 02  |................|
00000030  00 01 00 0e 00 02 00 01  00 0f 00 02 00 00 00 11  |................|
00000040  00 02 00 00 00 2a 00 02  00 00 00 6b 00 13 00 01  |.....*.....k....|
00000050  00 00 00 00 00 fe 0f 00  fe 40 02 e0 3d d0 2f 00  |.........@..=./.|
00000060  01 00 26 00 08 00 00 00  00 00 00 00 e0 3f 27 00  |..&..........?'.|
00000070  08 00 00 00 00 00 00 00  e0 3f 28 00 08 00 00 00  |.........?(.....|
00000080  00 00 00 00 f0 3f 29 00  08 00 00 00 00 00 00 00  |.....?).........|
00000090  f0 3f 70 00 0b 00 00 00  2e 00 02 04 f0 0a 00 f0  |.?p.............|

hexdump -C EXCEL2.XLS | head
00000000  09 00 04 00 02 00 10 00  0b 00 10 00 71 02 00 00  |............q...|
00000010  01 00 29 00 06 03 00 00  dc 0d 00 00 0c 00 02 00  |..).............|
00000020  64 00 0d 00 02 00 01 00  0e 00 02 00 01 00 0f 00  |d...............|
00000030  02 00 01 00 10 00 08 00  fc a9 f1 d2 4d 62 50 3f  |............MbP?|
00000040  11 00 02 00 00 00 22 00  02 00 00 00 40 00 02 00  |......".....@...|
00000050  00 00 2a 00 02 00 00 00  2b 00 02 00 00 00 25 00  |..*.....+.....%.|
00000060  02 00 2c 01 31 00 09 00  c8 00 00 00 04 48 65 6c  |..,.1........Hel|
00000070  76 32 00 0e 00 00 00 00  00 00 00 90 01 00 00 00  |v2..............|
00000080  00 00 8d 31 00 09 00 c8  00 01 00 04 48 65 6c 76  |...1........Helv|
00000090  32 00 0e 00 00 00 00 00  00 00 bc 02 00 00 00 00  |2...............|

You can see in the hex values above a difference of two bytes in the header. The reason the Multiplan file identifies as an Excel 2 file is the PRONOM signature ignores those two bytes and allows them to be anything. Some specifications say these aren’t used, but clearly there is a use for them. We could probably use the same signature for Multiplan, but include the two bytes, then set the priority to the Multiplan signature.

Multiplan 4.2 is very different.

hexdump -C MP42.MOD | head
00000000  0c ef 4d 50 a4 01 00 00  00 00 00 00 00 00 00 00  |..MP............|
00000010  00 00 00 00 00 00 80 02  00 00 00 00 00 00 00 2e  |................|
00000020  ff 0f ff 00 01 00 d0 02  d0 02 a0 05 a0 05 d0 2f  |.............../|
00000030  e0 3d 40 02 09 00 03 00  02 04 0a 00 00 00 fe 0f  |.=@.............|
00000040  00 fe 00 00 01 00 01 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  01 00 00 00 06 01 15 50  |...............P|
00000060  05 00 00 00 00 00 00 00  06 00 13 00 07 00 07 00  |................|
00000070  00 00 00 00 00 00 08 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The hex values for the first 4 bytes have a similar pattern. 0CEF, Which seems to be in sequence where Version 3 left off. Microsoft calls this new format, New or Normal Binary File Format. They claim it is “the fastest loading and fastest saving file format ever“! Exciting as the new format probably was, it didn’t last long. Multiplan was phased out so Excel could shine.

When I was younger I didn’t use DOS very often because the computer my father brought home in the mid 1980’s was a Macintosh. I use DOS more now in my research then I did when I was younger. Using the DOS interface is not easy. There are a lot of key commands you need to know intuitively just to navigate, but it is fascinating to see how far software has come. Early Excel, Multiplan, and Chart were all intertwined, but hopefully combing through all of these samples can bring some clarity. Take a look at the draft signature I made and all the samples that go with it on my GitHub page.