FlashPix

Is there a perfect raster image format? TIFF has been around for quite some time and is generally accepted as a preferred preservation format. There have been a few attempts to have a single file contain multiple resolutions for different uses, lower resolution for the web and higher resolution for print. Even the semi-popular JPEG2000 added multiple resolutions to improve on the JPEG format. Kodak came up with a few ideas to do this as well. The Kodak PCD format, also known as PhotoCD or Image PAC, was used for a while before it was abandoned. Another was FlashPix.

I briefly mentioned FlashPix in an earlier post about the Microsoft Picture It! format. They are extremely similar. Both have the same basic structure in a Compound Object format. Some of the FlashPix files generated by Picture It! even have the same identifiers in the CompObj header.

FlashPix was supposed to be the answer to all the problems with storing bitmap image data and how we view the web. Kodak partnered with some big names; Microsoft Corporation, Hewlett-Packard Company and Live Picture, Inc. were among them. Kodak marketed the format and even included it as a native file format on some of its new digital cameras. The format was made official in June of 1996, with a white paper explaining all the benefits and the architecture. There was a lot of hype, with some even calling it “Not your Grandma’s format”. Many graphics applications started to include support for the new format, including Adobe Photoshop. So what happened? Why didn’t the format catch on? Some say it was the size of storing multiple resolutions in one file; others believe it was the complicated Compound Object structure that led to its demise. Either way, the format had a lot of hype in the late 1990s, but by the year 2000 it had gone silent and all the websites went away.

FlashPix did have a big impact, and many software and hardware products were made compatible. There are a few stories left behind of those who scanned all their photos to the FlashPix format only to find a few years later it was unsupported on more modern computers. There were also a few early digital cameras which could capture directly to the format. Take my Kodak DC260 zoom camera, circa 1998. By changing the Capture Preferences, I can switch between JPG and FPX.

Using exiftool we can take a look at one of the images from the camera:

exiftool P0004795.FPX
ExifTool Version Number         : 12.73
File Name                       : P0004795.FPX
Directory                       : GitHub/digicam_corpus/Kodak/DC260/DC260_01
File Size                       : 251 kB
File Modification Date/Time     : 2024:01:06 12:54:20-07:00
File Access Date/Time           : 2024:01:06 13:20:46-07:00
File Inode Change Date/Time     : 2024:01:06 13:04:34-07:00
File Permissions                : -rwxrwxrwx
File Type                       : FPX
File Type Extension             : fpx
MIME Type                       : image/vnd.fpx
Code Page                       : Unicode UTF-16, little endian
Data Object ID                  : 13BC5A58-6B90-1B6B-12C9-0800201177F8
Data Object Status              : Exists, Not Purgeable
Creating Transform              : Source Image
Using Transforms                : 
Cached Image Height             : 1024
Cached Image Width              : 1536
Comp Obj User Type Len          : 16
Comp Obj User Type              : FlashPix_Object
Visible Outputs                 : 1
Maximum Image Index             : 1
Maximum Transform Index         : 0
Maximum Operation Index         : 0
Thumbnail Clip                  : (Binary data 18480 bytes, use -b option to extract)
Revision Number                 : 1
Create Date                     : 2024:01:06 12:53:29
Modify Date                     : 2024:01:06 12:53:29
Software                        : KODAK DIGITAL SCIENCE DC260
Image Width                     : 1536
Image Height                    : 1024
Subimage Width                  : 1536
Subimage Height                 : 1024
Subimage Color                  : RGB
Subimage Numerical Format       : 8-bit, Unsigned
Decimation Method               : None (Full-sized Image)
JPEG Tables                     : (Binary data 558 bytes, use -b option to extract)
Number Of Resolutions           : 1
Max JPEG Table Index            : 1
Scene Type                      : Original Scene
Software Release                : KODAK DIGITAL SCIENCE DC260
Make                            : Eastman Kodak Company
Camera Model Name               : KODAK DIGITAL SCIENCE DC260
Serial Number                   : 7577
Exposure Time                   : 1/180
F Number                        : 4.7
Exposure Program                : Program AE
Exposure Compensation           : 0
Subject Distance                : 0.520 m
Metering Mode                   : Center-weighted average
Light Source                    : Unknown
Focal Length                    : 24.0 mm
Max Aperture Value              : 4.6
Flash                           : No Flash
Exposure Index                  : 90
Sharpness Approximation         : 0
File Source                     : Digital Camera
Sensing Method                  : One-chip color area
Extension Create Date           : 2024:01:06 12:53:29
Extension Modify Date           : 2024:01:06 12:53:29
Creating Application            : Picoss
Extension Name                  : ijuhsimasa
Extension Persistence           : Always Valid
Extension Description           : Data Object Store 000001
Storage-Stream Pathname         : /Data Object Store 000001
Extension Class ID              : 56616000-C154-11CE-8553-00AA00A1F95B
Used Extension Numbers          : 1
Screen Nail                     : (Binary data 4304 bytes, use -b option to extract)
Subimage Tile Count             : 384
Subimage Tile Width             : 64
Subimage Tile Height            : 64
Num Channels                    : 3
Audio Stream                    : (Binary data 30780 bytes, use -b option to extract)
Aperture                        : 4.7
Image Size                      : 1536x1024
Megapixels                      : 1.6
Shutter Speed                   : 1/180
Preview Image                   : (Binary data 4164 bytes, use -b option to extract)
Focal Length                    : 24.0 mm

The file is also identified by PRONOM:

sf P0004795.FPX 
---
siegfried   : 1.11.0
scandate    : 2024-01-17T23:13:59-07:00
signature   : default.sig
created     : 2023-12-17T15:54:41+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'P0004795.FPX'
filesize : 250880
modified : 2024-01-06T12:54:20-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/56'
    format  : 'Kodak FlashPix Image'
    version : 
    mime    : 'image/vnd.fpx'
    class   : 'Image (Raster)'
    basis   : 'extension match fpx; container name CompObj with byte match at 53, 36 (signature 2/2)'
    warning : 

As you may notice, PRONOM has two signatures for the FlashPix format, and this image was identified with signature #2. The first signature looks for the string “FlashPix Object”, while the second looks for the CLSID, which is unique to each compound object format. FlashPix has the CLSID {56616700-c154-11ce-8553-00aa00a1f95b}. Looking at the other samples I have, there is a lot of variation in the use of the string and the CLSID.

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

The images from the Kodak camera use the “FlashPix_Object” string, so with the underscore they don’t match the first signature, while the files I made using Picture It! use a couple of variations. Many don’t use the string at all. Others use a slightly different CLSID, in both uppercase and lowercase. We will have to suggest adjustments to the current signature to identify them all.
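One way to cope with this variation when checking files by hand is to ignore the user type string and compare only the CLSID, case-insensitively. The following is a minimal Python sketch of that idea, not a PRONOM signature. The function name is made up for illustration, and treating every CLSID from the sample list above as FlashPix is an assumption.

```python
import re

# CLSIDs observed in the sample list above, lowercased. Treating all of
# them as FlashPix is an assumption made for this sketch.
KNOWN_FLASHPIX_CLSIDS = {
    "56616000-c154-11ce-8553-00aa00a1f95b",
    "56616700-c154-11ce-8553-00aa00a1f95b",
    "56616800-c154-11ce-8553-00aa00a1f95b",
    "56616700-c154-11ce-0000-000000000000",
}

# A CLSID written in its usual text form, wrapped in braces.
CLSID_RE = re.compile(
    r"\{([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\}",
    re.IGNORECASE,
)


def find_flashpix_clsid(text):
    """Return the lowercased CLSID if text contains a known one, else None."""
    match = CLSID_RE.search(text)
    if match is None:
        return None
    clsid = match.group(1).lower()
    return clsid if clsid in KNOWN_FLASHPIX_CLSIDS else None
```

Calling `find_flashpix_clsid` on any of the sample strings above returns the same lowercase CLSID form regardless of the user type string in front of it, which is roughly what an adjusted signature would need to tolerate.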

Looking at the contents of the OLE container we can see some interesting things.

Path = P0004795.FPX
Type = Compound
Physical Size = 250880
Extension = compound
Cluster Size = 512
Sector Size = 64

Size         Compressed     Name
------------ ------------  ------------------------
188          192           [5]Data Object 000001
272          320           [1]CompObj
388          448           [5]Extension List
144          192           [5]Global Info
                           Data Object Store 000001
18704        18944         [5]SummaryInformation
816          832           Data Object Store 000001/[5]Image Contents
272          320           Data Object Store 000001/[1]CompObj
988          1024          Data Object Store 000001/[5]Extension List
1624         1664          Data Object Store 000001/[5]Image Info
4332         4608          Data Object Store 000001/[5]Screen Nail_bd0100609719a180
                           Data Object Store 000001/Resolution 0005
                           Data Object Store 000001/Audio_bd0100609719a180
1112         1152          Data Object Store 000001/[5]KDC_bd0100609719a180
72           128           Data Object Store 000001/[5]SummaryInformation
108          128           Data Object Store 000001/Audio_bd0100609719a180/[5]Audio Info
30808        31232         Data Object Store 000001/Audio_bd0100609719a180/Audio Stream 000000
6208         6656          Data Object Store 000001/Resolution 0005/Subimage 0000 Header
176378       176640        Data Object Store 000001/Resolution 0005/Subimage 0000 Data
------------ ------------  ------------------------
242414       244480        16 files, 3 folders

The main CompObj is where we find the identification information, but the Data Object Store 000001 directory is where all the image data is stored. In a multiple-resolution image we might see additional Resolution directories. You may also notice an Audio directory. Yes, this image was captured and then audio was recorded with it. Not a video, but an audio clip associated with the image. FlashPix can contain audio streams. This isn’t the first time we have seen this; HP cameras also have this function, which, as it turns out, is stored in a FlashPix Exif extension within a JPEG.
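The tile numbers exiftool reported earlier can be checked with a little arithmetic, and the same arithmetic hints at what a multiple-resolution file might hold. The Python sketch below uses the 64-pixel tile size and the 1536x1024 image size from the exiftool output. The rule in `pyramid` (halve each level until the image fits in a single tile) is an assumption for illustration, not something taken from this file or checked against the specification.

```python
import math

TILE = 64  # tile edge in pixels, from "Subimage Tile Width/Height"


def tile_count(width, height, tile=TILE):
    """Number of tiles needed to cover a width x height image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def pyramid(width, height, tile=TILE):
    """List (width, height) per level, halving until one tile is enough.

    The halving rule is an assumption made for this sketch.
    """
    levels = [(width, height)]
    while width > tile or height > tile:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        levels.append((width, height))
    return levels


print(tile_count(1536, 1024))
for w, h in pyramid(1536, 1024):
    print(w, h, tile_count(w, h))
```

For 1536x1024 the tile count comes out to 24 x 16 = 384, which agrees with the Subimage Tile Count above. Under the assumed halving rule a full pyramid for this image would have six levels, which might be why the single stored level is named Resolution 0005, but that is a guess.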

The FlashPix native format may have disappeared, but the format lives on as an extension to Exif data, allowing you to embed audio and other media within a JPEG file. The code for FlashPix was given to ImageMagick and is maintained by them.

RCA-VOC

I wonder sometimes what goes through a software or hardware developer’s mind when deciding on a format to use for a new device. There are so many audio formats out there to choose from. I am sure there are pros and cons to using one technology over another, but it seems a few decide to go ahead and make their own. I am sure there is some commercial advantage to developing a proprietary audio format, but with all the established choices it seems unnecessary.

Sony developed their own audio compression formats, which I explored in an earlier blog post. I came across a small goofy looking RCA voice recorder, model VR6320.

Many of these RCA VR series recorders can record in a WAV or a VOC file format. The WAV files are pretty run of the mill, but the VOC format is unique to RCA recorders.

The VOC format is not to be confused with another audio format with the same extension. The Creative Voice format is a bit better known. It was used with Creative’s sound cards (the Sound Blaster family) many folks had in their Windows computers in the 1990s. But the RCA file format is different, and because it shares the same extension it needs its own identification so the two are not confused with each other.

sf REC00001.VOC 
---
siegfried   : 1.10.1
scandate    : 2023-11-19T23:33:47-07:00
signature   : default.sig
created     : 2023-05-12T09:10:13Z
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V112.xml; container-signature-20230510.xml'
---
filename : 'REC00001.VOC'
filesize : 47231
modified : 2015-01-09T20:51:10-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match; possibilities based on extension are fmt/1736'

The RCA VOC file format seems to be undocumented; there isn’t much information available. You can always download a copy of the RCA Digital Voice Manager software, which may or may not run on your current system, and convert the VOC files to WAV, or you can use a piece of software coded in 2008 called “devoc”. The developer used to have a website where you could upload a VOC file and it would be converted automatically, but it is no longer available. The code can also be found here.

Let’s take a look at the header of a couple of the files I have:

hexdump -C REC00001.VOC | head
00000000  56 43 50 31 36 32 5f 56  4f 43 5f 46 69 6c 65 0c  |VCP162_VOC_File.|
00000010  0f 01 09 14 32 1c 00 00  0b 44 03 00 00 00 00 00  |....2....D......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001b0  00 10 00 00 00 00 00 00  00 00 00 10 00 00 00 00  |................|
000001c0  00 00 00 00 00 10 00 00  00 00 00 00 00 00 00 10  |................|
000001d0  00 00 00 00 00 00 00 ff  ff ff ff ff ff ff ff ff  |................|
000001e0  ef 11 14 d3 96 77 57 44  34 33 34 44 43 33 44 43  |.....wWD434DC3DC|
000001f0  43 34 44 34 43 43 34 44  43 43 33 35 43 33 43 34  |C4D4CC4DCC35C3C4|
00000200  34 43 43 24 34 43 43 33  44 51 33 42 14 44 32 43  |4CC$4CC3DQ3B.D2C|

hexdump -C A0000003.VOC | head
00000000  52 50 35 31 32 30 5f 56  4f 43 5f 46 69 6c 65 78  |RP5120_VOC_Filex|
00000010  08 06 16 0a 0f 20 00 04  17 01 03 00 00 00 00 00  |..... ..........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000180  00 03 b9 2f 00 07 62 af  00 0b 0c 2f 00 0e b5 af  |.../..b..../....|
00000190  00 12 5f 2f 00 00 00 00  00 00 00 00 00 00 00 00  |.._/............|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 e1  |................|
00000fb0  ea eb ea fe df ae 4e a1  1d cd 1c cf 9f de cf 3b  |......N........;|

Most of the samples I have show “VCP162_VOC_File” in the header, but I have one sample with “RP5120_VOC_File”. I have heard of others, one being “V432_Voice_File”. There could be more variations. One could assume the header is somehow associated with the model number of the device, but that doesn’t appear to be the case, although there is a device with the model number “RP 5120”. It might be that the older RP series get one header and the newer VR series get VCP? I will need more samples to confirm; if you have any, send them my way. Also, according to the manuals, there are SP and LP modes to manage the bitrate of the file and squeeze more minutes onto the built-in memory of these devices. This doesn’t appear to affect identification, but it might be good to differentiate in the future.
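Until a signature is in PRONOM, telling the two kinds of VOC file apart only takes a look at the first few bytes. Here is a minimal Python sketch of that check. The function name is hypothetical, the “V432_Voice_File” string is included only because it has been reported (I have no sample of it), and the Creative magic is the well-known “Creative Voice File” text followed by byte 0x1A.

```python
# Header strings seen (or, for V432, only reported) at offset 0 of RCA VOC files.
RCA_MAGICS = (b"VCP162_VOC_File", b"RP5120_VOC_File", b"V432_Voice_File")

# Creative Voice files begin with this ASCII text followed by 0x1A.
CREATIVE_MAGIC = b"Creative Voice File\x1a"


def classify_voc(header):
    """Classify the first bytes of a .VOC file as 'rca', 'creative' or 'unknown'."""
    if header.startswith(RCA_MAGICS):
        return "rca"
    if header.startswith(CREATIVE_MAGIC):
        return "creative"
    return "unknown"


def classify_voc_file(path):
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as handle:
        return classify_voc(handle.read(32))
```

Feeding it the first 16 bytes of either hexdump above gives `'rca'`. Anything with a header string not in the tuple falls through to `'unknown'`, which is a useful prompt to go looking for another variation.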

For now you can take a look at the signature on my GitHub page.

HighMAT

Before the days of streaming and devices like Smart TVs, AppleTV and Fire sticks, a few companies tried their best to come up with ways to make viewing your media on your TV mainstream. In a previous blog post I touched on the Kodak PhotoCD method, but there is one you are probably even less familiar with: HighMAT. HighMAT, or High-Performance Media Access Technology, was co-developed by Microsoft and Panasonic. You may have at one point owned a DVD player which had the technology built in, but may have never used it. It came on the scene around 2002, but was abandoned by 2008.

Panasonic DVD/CD Player with HighMAT playback.

There were quite a few devices stamped with the HighMAT logo. The technology allowed you to play back audio and images like a DVD, with a menu and everything.

There were three different types of HighMAT-compatible devices: Audio, Audio-Image, and Audio-Image-Video.

Writing data to the HighMAT format could be done with a plugin for Windows which added the ability to burn audio playlists in the HighMAT format from Windows Media Player, or through the standard CD Writing Wizard built into Windows XP. An extra screen would come up asking if you would like to make the CD HighMAT compatible. Video-compatible HighMAT CDs could be made through Movie Maker.

When a HighMAT CD-R/CD-RW is authored we get an interesting CD. It appears to be a Mode 2 Form 1 format:

/dev/disk10 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:        CD_partition_scheme                        *846.4 MB   disk10
   1:       CD_ROM_Mode_2_Form_1 Highmat02               2.7 MB     disk10s0

If you would like to check out a sample disc, you can grab the ISO or BIN/CUE here.

tree /Volumes/Highmat02
/Volumes/Highmat02
├── Audio\ Samples
│   ├── 17\ [no\ artist]\ -\ Speaker\ Identification\ Test.wma
│   └── sine.wma
├── HIGHMAT
│   ├── AUTHOR.XML
│   ├── CONTENTS.HMT
│   ├── IMAGES
│   │   ├── T0.HMT
│   │   ├── T1.HMT
│   │   ├── T10.HMT
│   │   ├── T11.HMT
│   │   ├── T12.HMT
│   │   ├── T13.HMT
│   │   ├── T2.HMT
│   │   ├── T3.HMT
│   │   ├── T4.HMT
│   │   ├── T5.HMT
│   │   ├── T6.HMT
│   │   ├── T7.HMT
│   │   ├── T8.HMT
│   │   └── T9.HMT
│   ├── MENU.HMT
│   ├── PLAYLIST
│   │   ├── 00000001.HMT
│   │   ├── 00000002.HMT
│   │   ├── 00000003.HMT
│   │   ├── 00000004.HMT
│   │   ├── 00000005.HMT
│   │   ├── 00000006.HMT
│   │   └── 00000007.HMT
│   └── TEXT.HMT
└── My\ Pics
    ├── Blue\ hills.jpg
    ├── Sunset.jpg
    ├── Water\ lilies.jpg
    └── Winter.jpg

5 directories, 31 files

There is a lot going on here, so let’s take a look at a few of the formats we find in this disc structure. The files added to the CD are converted to WMA if you checked the “Convert Files” feature and are accessible like a normal data CD. The HighMAT folder is created to make a compatible HighMAT disc. Except for one XML file, the rest of the files in the HighMAT folder all have an HMT extension. The author.xml file contains the root element <HMT> with some filenames indicating some of the HMT files may be thumbnails. If we open one of the HMT thumbnail files in a hex editor we can see:

Just a plain old JPG header. Exiftool tells us it is a small 160×120 pixel image, so it must be a thumbnail. But let’s take a look at another HMT file.

Even though the Menu.hmt file has the same extension as the thumbnails, this file is definitely not a JPG file with pixel data. Same goes for the Contents and Text files as well, unique formats.
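Since the extension tells us nothing here, a quick way to sort the HMT files is to sniff the first bytes of each one. This is a small Python sketch, with hypothetical function names, that only separates the JPEG thumbnails from everything else. It assumes a JPEG always starts with the bytes FF D8 FF (the start-of-image marker followed by the first byte of the next marker) and does not attempt to recognise the menu, contents, text or playlist formats.

```python
from pathlib import Path


def sniff_hmt(data):
    """Return 'jpeg' if the bytes start like a JPEG file, else 'other'."""
    # FF D8 is the JPEG start-of-image marker; the next marker begins with FF.
    return "jpeg" if data[:3] == b"\xff\xd8\xff" else "other"


def survey(highmat_dir):
    """Map each .HMT file under highmat_dir to 'jpeg' or 'other'."""
    results = {}
    for path in sorted(Path(highmat_dir).rglob("*.HMT")):
        with path.open("rb") as handle:
            results[str(path)] = sniff_hmt(handle.read(3))
    return results
```

Running `survey("/Volumes/Highmat02/HIGHMAT")` on the sample disc should, if the observations above hold for every file, flag the files in IMAGES as `'jpeg'` and the rest as `'other'`.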

The files in the playlist folder also have a unique format.

So it seems all the HighMAT folder really does is add compatibility for hardware to provide a menu to access the original data, providing playlists and thumbnails to navigate the data on your TV screen.

I came across one of these discs while processing a collection of CD-R discs donated to our library. Normally I would copy the images and other data off the disc to our preservation system, but this disc made me stop to think about the best way to preserve the data. Is a disc image appropriate or is the HighMAT folder even worth preserving if we have the original files from the disc? Finding hardware or a software player to present the disc as intended is getting harder to do. I am curious what others think of the value of this content.

I chose not to submit any signatures to PRONOM for the moment as we assess. It would be difficult to properly identify each format with all of them having the same extension, especially the JPG thumbnails as HMT is not a valid extension for the format. Take a look at my sample files and if you have come across this format before, let me know.

Shockwave Audio

Ok, confession time.

There are only a couple of moments in my tech history which had a profound effect on me, enough to sear the memory of the moment into my brain. When I was in college around 1997 I had a decent CD collection and I had learned how to copy those AIFF files off the disc and use them on my trusty PowerCenter Pro. These files were huge, at the time. I knew a regular size song would take up around 50MB on my hard drive. This was a lot of space back in 1997, but I could then mix them with other songs, something I did sometimes for friends I had on the dance team. I didn’t have a CD burner at the time so I would transfer them to cassette tape. I know, but remember this was the 1990s when everything was changing and expensive.

One night I was exploring the world wide web and I happened across someone sharing a few songs. I assumed they were just clips as they were only 5MB in size, a tenth the size they should be. I downloaded the song, which of course still took a few minutes back in those days. When I played the song, I was dumbfounded, it was the whole song. I was completely confused. How could they take a 4+ minute song and compress it down to under 5MB? This was amazing.

I started grabbing every song I could find. Before long I had quite the collection. And before you judge me for downloading music from the web, this was a couple years before the advertisement we all remember reminding us that we wouldn’t steal a car so why would we steal music.

The files I found on the internet were MP3 files, the same we are familiar with today. Back then creating MP3 files wasn’t easy. MP3 was actually a licensed product so you had to get a little creative in order to make them. On my Macintosh PowerCenter Pro, there were even fewer options. I was already familiar with the sound editing application from Macromedia called SoundEdit 16; it was the tool I used to do all my editing. I found there was a plugin you could add which allowed export to a format called Shockwave Audio. This was meant for use in Macromedia’s Director application to add sound to the growing Flash animation industry. Once I got the plugin installed I couldn’t stop making files, and I made them as fast as I could. For a whole album this could take over an hour on my hardware, but it was worth it. Before long I had a large collection of popular music ready to play at a moment’s notice. My player of choice was MacAMP, a sibling of the popular WinAMP. I even borrowed some equipment from a friend who DJ’d on the weekends and DJ’d a college dance. I lugged my whole PowerCenter Pro tower and 17-inch Trinitron monitor over to the school. It was so much fun, and folks didn’t understand when they asked to see my CD collection.

Enough about transgressions from my youth, let’s talk about the Shockwave Audio format.

To create a SWA file you would first need SoundEdit 16 Version 2, and then the plugins to enable export. This would only run on PowerPC computers running Macintosh OS or Classic in Mac OS X. For this post I pulled out my trusty PowerBook G4 Titanium running MacOS 9 and MacOS X 10.2. I installed SoundEdit 16 and put the plugins in the Xtras folder, and we are good to go.

Before you export you need to set the bitrate you prefer for the final file, with options from 8 KBits up to 160 KBits per second. The higher the bitrate, the longer the export took and the larger the file.

SoundEdit 16 had a native audio format and also frequently used the SoundDesigner II format to save the uncompressed files. On a Macintosh you had to be careful as these formats did not travel well to other systems on account of the resource forks associated with the data.

Because these SWA files were meant to be used in websites and other non-Mac systems, they did not have a resource fork, but had the Creator/Type codes, SwaT/SHCK. An extension wasn’t necessary for use on your Macintosh, but it was best to use .swa.

Here is what the data looks like for a SWA file.

Even though the SWA format uses MPEG compression, this is not a typical header you might see in an MP3. There were no ID3 tags at the time, so there is not much in terms of metadata.

General
Complete name                            : tone2.swa
Format                                   : MPEG Audio
File size                                : 80.7 KiB
Duration                                 : 5 s 166 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 128 kb/s
FileExtension_Invalid                    : m1a mpa mpa1 mp1 m2a mpa2 mp2 mp3

Audio
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 3
Format settings                          : Joint stereo / MS Stereo
Duration                                 : 5 s 172 ms
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Frame rate                               : 38.281 FPS (1152 SPF)
Compression mode                         : Lossy
Stream size                              : 80.7 KiB (100%)

ffprobe -i tone2.swa 
[mp3 @ 0x155704a60] Format mp3 detected only with low score of 25, misdetection possible!
[mp3 @ 0x155704a60] Skipping 324 bytes of junk at 0.
[mp3 @ 0x155704a60] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'tone2.swa':
Duration: 00:00:05.15, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

There are a few consistencies among all my files. They all begin with the hex values “00000140000000030000” for the first 10 bytes, and all of them seem to have the string “MACRZ” at offset 36. I haven’t been able to find an open specification for this file format, so we will have to go with what we can find in the samples. According to the ffprobe output above, there are 324 bytes of header before the first MP3 frame starts.
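Those observations are enough for a rough check in code. The Python sketch below tests for the 10-byte prefix and the “MACRZ” string, and slices off the header to get at the MP3 frames. The function names are hypothetical, and treating the header as a fixed 324 bytes is an assumption based only on what ffprobe skipped in tone2.swa.

```python
SWA_PREFIX = bytes.fromhex("00000140000000030000")  # first 10 bytes in the samples
SWA_TAG = b"MACRZ"                                   # string seen at offset 36
SWA_TAG_OFFSET = 36
SWA_HEADER_SIZE = 324  # bytes ffprobe skipped in tone2.swa; assumed fixed here


def looks_like_swa(data):
    """True if data starts with the byte patterns seen in the SWA samples."""
    tag = data[SWA_TAG_OFFSET:SWA_TAG_OFFSET + len(SWA_TAG)]
    return data.startswith(SWA_PREFIX) and tag == SWA_TAG


def mp3_payload(data):
    """Return everything after the assumed 324-byte SWA header."""
    if not looks_like_swa(data):
        raise ValueError("not a recognised SWA header")
    return data[SWA_HEADER_SIZE:]
```

One detail that may or may not mean anything: the first four bytes, read as a big-endian integer, are 0x140 = 320, which is 324 minus those four bytes. That could be a length field, but without a specification it is only a guess.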

MPEG signatures are difficult: there is no file header, just a sequence of frames. This is why there are often so many identification conflicts with the MP3 format. These SWA files do indeed identify as MP3 files, but with an extension mismatch.
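To show how little there is to match on, here is a minimal Python sketch that decodes the 4-byte header found at the start of each MPEG audio frame. It only handles MPEG-1 Layer III, the combination MediaInfo reported for tone2.swa, and the function name is hypothetical. The example bytes FF FB 90 64 in the usage note are a typical header for a 128 kb/s, 44.1 kHz joint stereo frame, not bytes read from my sample.

```python
# Lookup tables for MPEG-1 Layer III only; other versions and layers
# use different tables and are rejected by this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152


def parse_frame_header(header):
    """Decode an MPEG-1 Layer III frame header, or return None if it is not one."""
    if len(header) < 4:
        return None
    b0, b1, b2 = header[:3]
    # 11 sync bits, all set.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None
    # Version bits 11 = MPEG-1, layer bits 01 = Layer III.
    if (b1 >> 3) & 0x03 != 0x03 or (b1 >> 1) & 0x03 != 0x01:
        return None
    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
        "frames_per_second": sample_rate / SAMPLES_PER_FRAME,
    }
```

For `parse_frame_header(bytes.fromhex("fffb9064"))` the frame rate works out to 44100 / 1152 = 38.28125, in line with the 38.281 FPS MediaInfo shows. With only eleven fixed sync bits to anchor on, it is easy to see why plenty of non-MP3 data can look like a frame.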

filename : 'tone2.swa'
filesize : 82661
modified : 1970-01-01T00:00:00-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/134'
    format  : 'MPEG 1/2 Audio Layer 3'
    version : 
    mime    : 'audio/mpeg'
    class   : 'Audio'
    basis   : 'byte match at 0, 4088 (signature 5/9)'
    warning : 'extension mismatch'

If we wanted to distinguish an SWA from an MP3 we would need to create a new signature and give it priority over the MP3 signature. There is enough of a header that this would be possible, and fairly easy, but since they are, in reality, just MP3 files, does it matter? Trying to play a SWA on a modern computer is only possible if you change the extension to MP3.

If you want to take a look at some samples you can grab a couple I made on my GitHub page or check out some commercially made files for an awesome Star Trek Starship Creator game.

Sony Voice Recorder

Sony’s IC Recorders have been popular small digital voice recorders for many consumers. The current models all use common recording formats like Linear PCM WAVE files or MP3, but it wasn’t always so. One of the first models, the ICD-R100, would record to the ICS audio format, which was Sony’s original sound format used on the IC Recorders. I am still looking for samples of this format. If you do have a need to convert this format, Sony has free converter software.

The next generation of IC Recorders used a Memory Stick and therefore recorded audio to the MSV (Memory Stick Voice) format. There were actually two different types of MSV files, the first used the ADPCM codec and the next used the LPEC codec. Later IC Recorders would record to the DVF (Digital Voice Format) which also had a couple versions, one using the LPEC codec and the other the older TRC codec.

AFAIK, none of the codecs used in these file formats has been made public, and these formats are not readable by tools such as MediaInfo. The only way to know the details of a file and to play or convert it is to use Sony software. That software has been discontinued, and its replacement, Sound Organizer, can only recognize the LPEC codec versions of MSV and DVF. There is also a plugin for Windows Media Player available here, which is required even for Switch to work.

PRONOM currently has one signature for the LPEC versions of MSV and DVF, so let’s look closer at the formats and see if we can determine what they are from the header.

The CODECs

ADPCM is an abbreviation for “Adaptive differential pulse-code modulation”. It appears to have been used only with the ICD-MS1 and possibly MS2 digital recorders.

TRC may be an abbreviation of Truespeech’s “Triple Rate CODER” or “Triple Rate Codec”, but not much info exists.

LPEC is a proprietary compression format. It is an abbreviation of “Long-term Predicated Excitation Coding”. It even had its own trademarked logo, which was cancelled in 2015.

The Software

The first IC Recorders came with PCLINK software, then came with the “MemoryStick Voice Editor” software. List of compatible formats.

Digital Voice Editor came next. It could read and convert everything except “ICS” files. Click here to download the last version. Version 1 compatible formats. Version 2 compatible formats. Version 3 Compatible Formats. The software was officially retired in 2016.

The current software for managing audio files from IC Recorder is Sound Organizer. The software does open and convert some MSV/DVF files as long as they use the LPEC codec. Sound Organizer Compatible formats.

Also note, Sony made one ICD-CX series recorder which could also capture photos. It requires the Visual & Voice Player software. Audio is recorded in the DVF format.

Test Data Set

In order to explore the different formats I first needed to gather some samples. There are a few out there, but with the Digital Voice Editor 3 software, I was able to take a sample file and convert it to the many options available. You can see in the screenshot below, the different samples, their extension and the codec used. You can find my samples in GitHub here.

All MSV and DVF files have a similar pattern. The first 32 bytes have the text string “MS_VOICE SONY CORPORATION”. In between MS_VOICE and SONY, there are 4 bytes which vary slightly between the different formats. Here is a table of samples and the 4 bytes so we can see the differences.

Model            CODEC     Extension   Hex Values
ICD-Px0          TRC       DVF         01020000
ICD-Px8          TRC       DVF         01020000
ICD-Px7          TRC       DVF         01020000
ICD-SXxx0        LPEC      MSV         01030000
ICD-SXx8         LPEC      MSV         01030000
ICD-SXx7         LPEC      MSV         01030000
ICD-SXx6         LPEC      DVF         01020000
ICD-SXx5         LPEC      DVF         01020000
ICD-SXx0         LPEC      DVF         01020000
ICD-MX           LPEC      MSV         01020000
ICD-BM           LPEC      MSV         01020000
ICD-ST           LPEC      DVF         01020000
ICD-MS5xx        LPEC      MSV         01010000
ICD-S            LPEC      MSV         01010000
ICD-BPx50        LPEC      DVF         01010000
ICD-BP100/x20    LPEC      DVF         01010000
ICD-MS1/MS2      ADPCM     MSV         01000000
ICD-R100/R200    Unknown   ICS

There is an obvious pattern to the hex values as they increment: 0100, 0101, 0102, and 0103. But there is some overlap between extension and codec, so this is probably more of a version number than something specific to the codec. Currently the PRONOM signature for this format, fmt/472, has the pattern for the 0102 version, but none of the others. We could simply add a variable in the signature for the different values and update the PRONOM signature so more samples would be identified. This would work well if there was a secondary characterization process to get technical metadata such as the codec and quality, but I am unaware of any tool that gathers this information from the format. So I wonder if we can find any hints in the file to identify the codec, so we have multiple PRONOM signatures to choose from. Also, you can see from the screenshot above that some of the LPEC formats have specific model numbers in the codec column, which could mean they may not be exactly the same. Each IC Recorder model has different quality settings, and it appears some settings may not be compatible with other models.

Looking beyond the first 16 bytes, there are a lot of hex values which are unknown. A close comparison of all the samples leads me to the 4 bytes at offset 60. They seem to be the same for files with the same settings. Below is a chart of those values.

Extension   CODEC                                 Quality   Offset 60
DVF         TRC                                   HQ        00300001
DVF         TRC                                   SP        00350001
DVF         TRC                                   LP        00370001
DVF         LPEC (ICD-BP-100/x20)                 SP        00150001
DVF         LPEC (ICD-BP-100/x20)                 LP        00190001
DVF         LPEC                                  SP        002A0001
DVF         LPEC                                  LP        002C0001
MSV         LPEC (ICD-BM/MX/SXx7/SXx8/SXxx0)      SP        004A0001
MSV         LPEC (ICD-BM/MX/SXx7/SXx8/SXxx0)      LP        004C0001
MSV/DVF     LPEC (ICD-SXx7/SXx8/SXxx0)            STHQ      00200002
MSV/DVF     LPEC (ICD-SXx7/SXx8/SXxx0)            ST        00240002
MSV         ADPCM                                 SP        00050001
MSV         ADPCM                                 LP        00090001

Just to be sure this value at offset 60 was indeed an indication of codec and quality, I manually swapped the 4 bytes in an LPEC ST file for those from a TRC HQ file. Sure enough, the software now saw the file as a TRC HQ audio file, even though the original is a stereo file.
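In the absence of a characterization tool, the two charts can be turned into a small lookup. The Python sketch below reads the version bytes and the offset 60 bytes from a header and reports the codec and quality. The function name is hypothetical, and the byte positions (MS_VOICE at 0, the version value at offset 12, SONY CORPORATION at offset 16) are my reading of the layout implied by the signature patterns further down, so treat them as assumptions.

```python
# Offset-60 values from the chart above, mapped to (codec, quality).
OFFSET_60 = {
    "00300001": ("TRC", "HQ"),
    "00350001": ("TRC", "SP"),
    "00370001": ("TRC", "LP"),
    "00150001": ("LPEC", "SP"),
    "00190001": ("LPEC", "LP"),
    "002a0001": ("LPEC", "SP"),
    "002c0001": ("LPEC", "LP"),
    "004a0001": ("LPEC", "SP"),
    "004c0001": ("LPEC", "LP"),
    "00200002": ("LPEC", "STHQ"),
    "00240002": ("LPEC", "ST"),
    "00050001": ("ADPCM", "SP"),
    "00090001": ("ADPCM", "LP"),
}


def describe_sony_voice(header):
    """Return version, codec and quality for an MSV/DVF header, or None."""
    if len(header) < 64:
        return None
    if header[:8] != b"MS_VOICE" or header[16:32] != b"SONY CORPORATION":
        return None
    # bytes.hex() gives lowercase, matching the dictionary keys.
    codec, quality = OFFSET_60.get(header[60:64].hex(), ("unknown", "unknown"))
    return {"version": header[12:16].hex(), "codec": codec, "quality": quality}
```

Passing the first 64 bytes of a file is enough. A value at offset 60 that is not in the chart comes back as `'unknown'`, which is likely given I have only seen a subset of the possible settings.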

There is a very good chance these are not all the options. I only have one physical recorder, which only records in mono. But this gives us a really good idea of how to tell the difference between files. Below are the patterns I am submitting to PRONOM.

MSV ADPCM

4D535F564F494345{4}01000000534F4E5920434F52504F524154494F4E{28}00(05|09)0001

DVF TRC

4D535F564F494345{4}01020000534F4E5920434F52504F524154494F4E{28}00(30|35|37)0001

MSV/DVF LPEC

4D535F564F494345{4}01(01|02|03)0000534F4E5920434F52504F524154494F4E{28}00(15|19|20|24|2A|2C|4A|4C)00(01|02)

Perhaps we can also alter the existing PRONOM signature for fmt/472 to the following, to catch anything we may miss:

4D535F564F494345{8}534F4E5920434F52504F524154494F4E6D73766C637374772E73706900000000
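For testing patterns like these against sample files before they land in a signature file, they can be translated into ordinary regular expressions. The Python sketch below handles only the three constructs used above: hex byte pairs, `{n}` for n arbitrary bytes, and `(a|b)` alternatives. It is not a full implementation of the PRONOM pattern syntax, the function name is hypothetical, and it assumes each pattern is anchored at the start of the file.

```python
import re


def pronom_to_regex(pattern):
    """Compile a simple PRONOM-style hex pattern into a bytes regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "{":
            # {n} means "skip exactly n bytes of any value".
            end = pattern.index("}", i)
            parts.append(b".{%d}" % int(pattern[i + 1:end]))
            i = end + 1
        elif ch in "(|)":
            # Grouping and alternation map straight onto regex syntax.
            parts.append(ch.encode("ascii"))
            i += 1
        else:
            # Two hex digits form one literal byte; escape it for the regex.
            parts.append(re.escape(bytes.fromhex(pattern[i:i + 2])))
            i += 2
    # DOTALL so that "." also matches byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


DVF_TRC = pronom_to_regex(
    "4D535F564F494345{4}01020000534F4E5920434F52504F524154494F4E{28}00(30|35|37)0001"
)
```

Using `DVF_TRC.match(data)` on the first 64 bytes of a file checks it from offset 0, which is how I would expect the signature to be applied. The other patterns can be compiled the same way.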

This is one example of a file format with a proprietary component which was never released by the vendor. When the vendor stopped supporting the software to open and read these formats, the risk for long-term preservation increased. It would be really nice if, when a vendor discontinues a technology which was used by consumers, they would make the documentation for the format openly available. If you know more about the format, or if you have samples which don’t match the patterns mentioned here, please reach out.