KODAK TIFF

Years ago I bought my first digital camera. It was an Epson PhotoPC 3100z, and I bought it because it could capture a digital image directly to a TIFF file. I don’t think most people would care about such a feature, but I thought it was awesome. Granted, it filled up the small 32MB CompactFlash card pretty quickly, so I had to upgrade to a 512MB card, which set me back a bit.

TIFF images are pretty universal: they have a well-known structure and have been around for a very long time. I have written about TIFFs before, so I won’t go into too much detail about the format. The format is well respected in the preservation community, although one of the best websites documenting the various TIFF tags, Aware Systems, went dark this year; here is an archived version.

Many digital cameras, from the earliest models to today’s, use the TIFF format to store RAW sensor data. Most use their own extension and follow well-established methods for storing the sensor data in an IFD with lots of common and custom tags. The DNG format is an open RAW format which uses the TIFF format to store sensor data, although many DNG files use SubIFDs and can be incompatible with some software.

The first digital camera was invented by a Kodak employee, Steve Sasson, in 1975; more precisely, he was the first to use a CCD sensor in a self-contained unit. This led Kodak to push the technology forward, and in 1991 it released the Kodak DCS digital system, which used Nikon cameras equipped with a digital sensor. These early digital cameras were quite expensive, and they used early CF cards and SCSI connections. Kodak released a few models in the DCS series, first on Nikon bodies, then on some Canon bodies. These early cameras used the TIFF format to store the RAW sensor data, but for some reason Kodak chose a proprietary structure and compression while still using the TIF extension.

Kodak was responsible for many new image file formats. I’m not sure why they decided to take a common format like TIFF, keep the TIF extension, and still make it proprietary. The RAW files created by the DCS series of cameras had to be opened with special plugins or software; if you tried to open these TIFFs with anything else, you would see only the small thumbnail image stored in IFD0 instead of the full-size image hidden in a SubIFD.
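To see the thumbnail-versus-SubIFD split for yourself, you can list the tags in IFD0 and look for the SubIFDs tag (330, 0x014A), the standard TIFF way of pointing to further image directories. The Python sketch below does this with only the standard library. `read_ifd0` is a hypothetical helper: it handles classic (non-BigTIFF) files only and does not decode anything Kodak-specific. In the DCS460D dump below, for example, bytes 4 to 7 (`00 03 00 00`, little-endian) put IFD0 at offset 0x300.

```python
import struct

SUBIFDS_TAG = 0x014A  # TIFF tag 330 (SubIFDs)

def read_ifd0(data: bytes):
    """Return (byte_order, ifd0_entries, subifd_offsets) for a classic TIFF.

    ifd0_entries maps tag -> (type, count, raw value/offset field).
    """
    if data[:4] == b"II*\x00":
        bo = "<"  # "II": little-endian
    elif data[:4] == b"MM\x00*":
        bo = ">"  # "MM": big-endian
    else:
        raise ValueError("not a classic TIFF header")
    (ifd0,) = struct.unpack_from(bo + "I", data, 4)      # offset of IFD0
    (count,) = struct.unpack_from(bo + "H", data, ifd0)  # number of entries
    entries, subifds = {}, []
    for i in range(count):
        # Each 12-byte entry: tag (2), type (2), count (4), value or offset (4)
        tag, typ, n, value = struct.unpack_from(bo + "HHII", data, ifd0 + 2 + 12 * i)
        entries[tag] = (typ, n, value)
        if tag == SUBIFDS_TAG and typ in (4, 13):  # LONG or IFD offsets
            if n == 1:
                subifds.append(value)  # a single offset fits in the entry itself
            else:
                subifds.extend(struct.unpack_from(bo + "%dI" % n, data, value))
    return bo, entries, subifds
```

Passing the bytes of a sample, e.g. `read_ifd0(open(path, "rb").read())`, would return the byte order, the raw IFD0 entries and any SubIFD offsets. Whether a given DCS file actually uses tag 330 is something to confirm against the samples.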

Finding samples of this format is particularly hard, as they have the common TIF extension. The cameras are also pretty rare, and finding one is difficult, especially in working condition. I was only aware of a couple of samples on the rawsamples.ch site, but that wasn’t enough to understand the format, as the two files had different structures.

hexdump -C RAW_KODAK_DCS460D_FILEVERSION_3.TIF | head
00000000 49 49 2a 00 00 03 00 00 7c 01 00 00 00 00 00 00 |II*.....|.......|
00000010 4b 4f 44 41 4b 20 20 20 20 20 20 20 20 20 20 20 |KODAK           |
00000020 44 43 53 34 36 30 44 20 20 20 20 20 20 20 20 20 |DCS460D         |
00000030 46 49 4c 45 20 56 45 52 53 49 4f 4e 20 33 20 20 |FILE VERSION 3  |
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 30 35 31 39 39 38 20 20 20 20 20 20 20 20 20 20 |051998          |
00000060 34 36 30 2d 32 39 35 30 00 00 00 00 00 00 00 00 |460-2950........|
00000070 31 39 39 30 3a 30 31 3a 30 31 20 31 32 3a 30 32 |1990:01:01 12:02|
00000080 3a 30 37 00 5b 20 32 5d 0d 49 53 4f 3a 20 20 20 |:07.[ 2].ISO:   |
00000090 20 20 20 20 20 38 30 20 20 0d 41 70 65 72 74 75 |     80  .Apertu|

hexdump -C RAW_KODAK_DCS560C.TIF | head
00000000 4d 4d 00 2a 00 00 11 76 00 04 f7 50 00 00 00 00 |MM.*...v...P....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000040 54 68 69 73 20 69 6d 61 67 65 20 66 69 6c 65 20 |This image file |
00000050 77 61 73 20 63 72 65 61 74 65 64 20 62 79 20 61 |was created by a|
00000060 20 4b 6f 64 61 6b 20 44 43 53 35 36 30 43 20 64 | Kodak DCS560C d|
00000070 69 67 69 74 61 6c 20 63 61 6d 65 72 61 2e 20 28 |igital camera. (|
00000080 6e 75 6c 6c 29 20 20 00 00 00 00 00 00 00 00 00 |null)  .........|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

There is (or was) a website, https://raw.pixls.us/, but it has been offline since last June; the main site still works, but the raw subdomain is unreachable. Luckily, the Wayback Machine had archived a few samples.

I also found a reference on an older website to a sample set Kodak maintained for developers using its SDK, but that is no longer available either. You can also find the old website on the Wayback Machine.

With a few more samples to refer to, it is easier to understand the headers and put together a signature. There was an SDK, but it seems difficult to locate today; the manual, however, does give us a little more information on the different models and their formats.

So, from the SDK statement, the TIF samples I have, and others I have in the more recent DCR format, I can conclude that the custom TIF format was used with the DCS 3xx, 4xx, 5xx and 6xx models, and that from the 7xx models on, the DCR format was used for camera RAW. Looking closer at the TIF samples, we can see that all the 4xx models used the “FILE VERSION 3” variant of the format, while the others have the full statement (“This image file was created by a Kodak DCS…”) in the header. It is not 100% clear which variant came first, but the 4xx models are some of the earliest models.

At the time, only Kodak software could properly “develop” the RAW files taken by these camera models. That has since changed, and the format has been added to many open source libraries, such as LibRaw and RawSpeed. Many commercial products also claim to support the DCS models, including Adobe Camera Raw, which seems to be able to open these TIFs.

Distinguishing these RAW TIFs is important for managing them properly over the long term. These images currently identify in the PRONOM registry as regular TIFFs (fmt/353), so we need a signature that matches the standard TIFF header but also uses bytes unique to this format. In the few samples I have, the “VERSION 3” images all start with the little-endian header, “49492A00”, while the other samples start with the big-endian header, “4D4D002A”. That makes each signature a little easier to write.

For the “VERSION 3” format, we could use a pattern such as 49492A00{12}4B4F44414B{11}(444353|454F53444353). This looks for the TIFF header, skips 12 bytes, looks for the word “KODAK”, then skips 11 more bytes to look for either “DCS” or “EOSDCS” right before the camera model number.
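To test this pattern against local samples, it translates fairly directly into a Python regular expression over bytes: each `{n}` gap becomes `.{n}` (with `re.DOTALL` so any byte value matches) and the alternation is kept. This is only a sketch for checking files by hand, not how DROID evaluates signatures internally; `is_kodak_v3` is a hypothetical name.

```python
import re

# 49492A00 {12} 4B4F44414B {11} (444353|454F53444353), anchored at the start
KODAK_V3 = re.compile(
    rb"\AII\*\x00"       # little-endian TIFF header 49 49 2A 00
    rb".{12}"            # skip 12 bytes (IFD0 offset and other header data)
    rb"KODAK"            # 4B 4F 44 41 4B at offset 16
    rb".{11}"            # skip 11 bytes of space padding
    rb"(?:DCS|EOSDCS)",  # 444353 or 454F53444353 at offset 32
    re.DOTALL,
)

def is_kodak_v3(header: bytes) -> bool:
    """True if the first bytes match the 'FILE VERSION 3' pattern."""
    return KODAK_V3.match(header) is not None
```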

For the other format, we also look for the TIFF header, but then match the whole string used in all the samples: 4D4D002A{60}5468697320696D6167652066696C652077617320637265617465642062792061204B6F64616B20444353{5}6469676974616C2063616D6572612E

This looks for the big-endian header, then the string “This image file was created by a Kodak DCS”, skips the model number, and then matches the end of the string, “digital camera.” This should catch all the different models of this format.
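The big-endian pattern can be sketched the same way. One assumption is baked into the fixed `{5}` gap: it fits a four-character model number plus a space, as in “560C ” from the DCS560C dump, so a model name of a different length would need a variable-length gap. `is_kodak_statement` is a hypothetical name.

```python
import re

# 4D4D002A {60} "This image file was created by a Kodak DCS" {5} "digital camera."
KODAK_STATEMENT = re.compile(
    rb"\AMM\x00\*"                                  # big-endian TIFF header
    rb".{60}"                                       # skip to offset 64
    rb"This image file was created by a Kodak DCS"  # fixed text
    rb".{5}"                                        # model number, e.g. b"560C "
    rb"digital camera\.",
    re.DOTALL,
)

def is_kodak_statement(header: bytes) -> bool:
    """True if the first bytes match the big-endian DCS statement pattern."""
    return KODAK_STATEMENT.match(header) is not None
```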

You can find my proposed signatures on my GitHub page; since none of the samples belong to me, you can find them through the links above.

RealVideo

For #WDPD24 and PRONOM Hackathon week this year, I wanted to find some older formats listed in PRONOM that did not have a signature. There is a list to choose from, but I wanted something I hadn’t worked on before. I came across two entries for RealVideo:

PUID        Name              Extension
fmt/204     RealVideo Clip    rv
x-fmt/277   Real Video        rv

I was familiar with RealMedia and RealAudio, but had yet to come across any RealVideo files with the RV extension. I thought it would be easy to find some references and samples, but that was not the case. I assume PRONOM originally added these based on the available MIME types.

Real, or RealNetworks, is (or was) an Internet media company that jumped on the rapidly growing World Wide Web in 1995 to become a leader in Internet media delivery. Its initial offerings focused mainly on audio streaming, and it accomplished this by providing free players and web browser extensions that made it easy to serve up a website with streaming media everyone could enjoy. It later added video streaming optimized for the slower dial-up connections of the day. Its codecs were based on common technology like H.263 and H.264, but RealNetworks used that technology to build its own proprietary codecs, identified by the FourCC codes RV10 to RV60.

So I thought it would be easy to find a reference to the RV extension, but I quickly discovered it wasn’t. Looking at the Wikipedia page on RealVideo, I found no reference to the RV extension. RV is an abbreviation for RealVideo, right? Well, I ended up finding a reference on the RealAudio page under file extensions. OK, that’s the first clue to the existence of the RV extension. The page says RV was used for video-only files by the flagship encoder (RealProducer).

RealProducer was the tool for creating the streaming audio and video formats that could then be used on your website or streaming platform. It came in a Basic version, which was free, and a Plus (or Pro) version, which was not free and provided more options. The first version of RealProducer to make video files was version 4. I was able to find a copy of the encoder and installed it under a Windows 95 emulator. To my surprise, it only saved to the RealMedia RM file format. This format is well known, identified in PRONOM as x-fmt/190, and also documented at the LoC.

The same was true of RealProducer 5, 7, 8, 9 and 10, which I was able to try: none made any mention of the RV extension. I was starting to feel this format didn’t exist, or that some people had decided to use the RV extension on their own. Searches on Google yielded a couple of results, mostly from users who had found a few files on their older discs and wanted to migrate them to something newer. I was able to find one example a user shared, but it had the same header as the RealMedia format. The clue was in the file.

hexdump -C ambush_abb.rv
00000000  2e 52 4d 46 00 00 00 12  00 01 00 00 00 00 00 00  |.RMF............|
00000010  00 07 50 52 4f 50 00 00  00 32 00 00 00 03 6e e8  |..PROP...2....n.|
00000020  00 03 6e e8 00 00 03 e0  00 00 01 b3 00 00 6a 6f  |..n...........jo|
00000030  00 06 80 fa 00 00 08 b5  00 ba 41 73 00 00 03 55  |..........As...U|
00000040  00 03 00 09 43 4f 4e 54  00 00 00 40 00 00 00 00  |....CONT...@....|
00000050  00 00 00 08 28 43 29 20  32 30 30 35 00 26 00 00  |....(C) 2005.&..|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000270  00 09 61 75 64 69 6f 4d  6f 64 65 00 00 00 02 00  |..audioMode.....|
00000280  06 76 6f 69 63 65 00 00  00 00 2d 00 00 0d 43 72  |.voice....-...Cr|
00000290  65 61 74 69 6f 6e 20 44  61 74 65 00 00 00 02 00  |eation Date.....|
000002a0  13 39 2f 32 30 2f 32 30  30 36 20 31 34 3a 30 37  |.9/20/2006 14:07|
000002b0  3a 30 38 00 00 00 00 53  00 00 0c 47 65 6e 65 72  |:08....S...Gener|
000002c0  61 74 65 64 20 42 79 00  00 00 02 00 3a 52 65 61  |ated By.....:Rea|
000002d0  6c 50 72 6f 64 75 63 65  72 28 52 29 20 42 61 73  |lProducer(R) Bas|
000002e0  69 63 20 31 31 2e 30 20  66 6f 72 20 57 69 6e 64  |ic 11.0 for Wind|
000002f0  6f 77 73 2c 20 42 75 69  6c 64 20 31 31 2e 30 2e  |ows, Build 11.0.|
00000300  30 2e 32 30 30 39 00 00  00 00 31 00 00 11 4d 6f  |0.2009....1...Mo|
00000310  64 69 66 69 63 61 74 69  6f 6e 20 44 61 74 65 00  |dification Date.|
00000320  00 00 02 00 13 39 2f 32  30 2f 32 30 30 36 20 31  |.....9/20/2006 1|
00000330  34 3a 30 37 3a 30 38 00  00 00 00 1d 00 00 09 76  |4:07:08........v|
00000340  69 64 65 6f 4d 6f 64 65  00 00 00 02 00 07 6e 6f  |ideoMode......no|
00000350  72 6d 61 6c 00 44 41 54  41 00 ba 3e 1e 00 00 00  |rmal.DATA..>....|

RealProducer Basic 11 for Windows. The Wikipedia article did hint at this, saying “the latest version of RealProducer reverted to using .ra for audio only files and began using .rv for video files with or without audio.” Why would they use the RM extension for so long, then switch to a different extension in a later version? I found more in the User Manual for version 11.

• .rv – RealVideo
RealProducer uses the .rv file extension if the input is video-only or video-with-audio. You can also select the .rm file extension for video content.
Tip: Using the .rv file extension helps search engines identify the file as a RealVideo clip.

• .rm – RealAudio or RealVideo
RealProducer chooses the .rm file extension if it cannot determine the content of the input clip. You can use .rm file extension for any RealAudio or RealVideo clip, except for variable bit-rate clips.

OK, so there are a few things to learn from this. First, RV was the default extension in version 11 because they wanted search engines to identify these files as RealVideo clips. Second, there is no difference between the two placeholders in PRONOM, one being a RealVideo file and the other a RealVideo Clip. We don’t need both.

Now, is there any difference between an RV and RM file?

hexdump -C Producer11-01.rv | head
00000000 2e 52 4d 46 00 00 00 12 00 01 00 00 00 00 00 00 |.RMF............|
00000010 00 07 50 52 4f 50 00 00 00 32 00 00 00 03 6e e8 |..PROP...2....n.|
00000020 00 03 6e e8 00 00 03 e0 00 00 01 c7 00 00 01 66 |..n............f|
00000030 00 00 1b 57 00 00 07 41 00 02 91 0a 00 00 03 5e |...W...A.......^|
00000040 00 03 00 09 43 4f 4e 54 00 00 00 40 00 00 00 00 |....CONT...@....|
00000050 00 00 00 08 28 43 29 20 32 30 30 35 00 26 00 00 |....(C) 2005.&..|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 00 00 00 00 4d 44 50 52 00 00 00 70 00 00 00 00 |....MDPR...p....|
00000090 00 02 c2 a4 00 02 c2 a4 00 00 03 e0 00 00 01 9f |................|

hexdump -C Producer11-01.rm | head
00000000 2e 52 4d 46 00 00 00 12 00 01 00 00 00 00 00 00 |.RMF............|
00000010 00 07 50 52 4f 50 00 00 00 32 00 00 00 03 6e e8 |..PROP...2....n.|
00000020 00 03 6e e8 00 00 03 e0 00 00 01 a4 00 00 01 64 |..n............d|
00000030 00 00 1b 57 00 00 05 a4 00 02 5c 35 00 00 03 5e |...W......\5...^|
00000040 00 03 00 09 43 4f 4e 54 00 00 00 40 00 00 00 00 |....CONT...@....|
00000050 00 00 00 08 28 43 29 20 32 30 30 35 00 26 00 00 |....(C) 2005.&..|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 00 00 00 00 4d 44 50 52 00 00 00 70 00 00 00 00 |....MDPR...p....|
00000090 00 02 c2 a4 00 02 c2 a4 00 00 03 e0 00 00 01 a4 |................|

They both look very similar to me; aside from a few bytes, they are practically identical. Let’s see what MediaInfo has to say.

mediainfo Producer11-01.rv
General
Complete name : Producer11-01.rv
Format : RealMedia
File size : 164 KiB
Duration : 6 s 999 ms
Overall bit rate : 225 kb/s
Frame rate : 24.000 FPS
Copyright : (C) 2005
FileExtension_Invalid : rm rmvb ra

Video
ID : 0
Format : RealVideo 4
Codec ID : RV40
Codec ID/Info : Based on AVC (H.264), Real Player 9
Duration : 6 s 999 ms
Bit rate : 181 kb/s
Width : 640 pixels
Height : 424 pixels
Display aspect ratio : 3:2
Frame rate : 24.000 FPS
Bits/(Pixel*Frame) : 0.028
Stream size : 155 KiB (94%)

Audio
ID : 1
Format : Cooker
Codec ID : cook
Codec ID/Info : Based on G.722.1, Real Player 6
Duration : 7 s 429 ms
Bit rate : 44.1 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 40.0 KiB (24%)

mediainfo Producer11-01.rm
General
Complete name : Producer11-01.rm
Format : RealMedia
File size : 151 KiB
Duration : 6 s 999 ms
Overall bit rate : 225 kb/s
Frame rate : 24.000 FPS
Copyright : (C) 2005

Video
ID : 0
Format : RealVideo 4
Codec ID : RV40
Codec ID/Info : Based on AVC (H.264), Real Player 9
Duration : 6 s 999 ms
Bit rate : 181 kb/s
Width : 640 pixels
Height : 424 pixels
Display aspect ratio : 3:2
Frame rate : 24.000 FPS
Bits/(Pixel*Frame) : 0.028
Stream size : 155 KiB

Audio
ID : 1
Format : Cooker
Codec ID : cook
Codec ID/Info : Based on G.722.1, Real Player 6
Bit rate : 44.1 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits

Other than the RV file being flagged with an invalid file extension, they both identify as RealMedia files and have essentially identical properties. So it seems the RV file is really no different from the RM file. I think the best course of action for PRONOM is to deprecate these two RV PUIDs and just add RV as an acceptable extension for the RealMedia format.

To add to the evidence, here is the output from ffprobe:

Input #0, rm, from 'Producer11-01.rm':
Metadata:
copyright : (C) 2005
comment :
ASMRuleBook : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
Audiences : 256k DSL or Cable;
audioMode : music
Creation Date : 11/12/2024 20:28:55
Generated By : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
Modification Date: 11/12/2024 20:28:55
videoMode : normal
Duration: 00:00:07.00, start: 0.000000, bitrate: 176 kb/s
Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s

Input #0, rm, from 'Producer11-01.rv':
Metadata:
copyright : (C) 2005
comment :
ASMRuleBook : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
Audiences : 256k DSL or Cable;
audioMode : music
Creation Date : 11/12/2024 20:28:16
Generated By : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
Modification Date: 11/12/2024 20:28:16
videoMode : normal
Duration: 00:00:07.43, start: 0.000000, bitrate: 181 kb/s
Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s
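Another way to compare the two files is to walk their top-level chunks directly. Judging from the hexdumps above, each chunk starts with a four-character ID followed by a 32-bit big-endian size that includes the 8-byte chunk header: .RMF has size 0x12 and PROP starts at offset 0x12, PROP has size 0x32 and CONT starts at 0x44, and so on. The sketch below relies only on that observed layout and does not parse chunk bodies; `rm_chunks` is a hypothetical name.

```python
import struct

def rm_chunks(data: bytes):
    """Yield (offset, chunk_id, size) for each top-level RealMedia chunk.

    Assumes each chunk starts with a 4-byte ID and a 32-bit big-endian size
    that includes this 8-byte header, as seen in the dumps above.
    """
    pos = 0
    while pos + 8 <= len(data):
        cid, size = struct.unpack_from(">4sI", data, pos)
        yield pos, cid.decode("latin-1"), size
        if size < 8:
            break  # malformed size; stop rather than loop forever
        pos += size
```

For example, comparing `[cid for _, cid, _ in rm_chunks(rv_bytes)]` with the same list built from the .rm file would show whether the two containers share the same chunk sequence.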

But wait, there are a few more formats related to RealProducer that we could add. RealProducer used several other formats to manage projects and other metadata for streaming. They include:

  • .RP RealPix Image
  • .RT RealText
  • .RPAD RealProducer Audience File
  • .RPJF RealProducer Job File
  • .RPSD RealProducer Server Destination
  • .RMHD RealMediaHD file
  • .RAM Playlist
  • .RPM Embedded RAM

File Type       Extension        MIME Type
Ram             .ram             audio/x-pn-realaudio
Embedded Ram    .rpm             audio/x-pn-realaudio-plugin
SMIL            .smil and .smi   application/smil
RealAudio       .ra              audio/x-pn-realaudio
RealVideo       .rm              application/x-pn-realmedia
Flash           .swf             application/x-shockwave-flash
RealPix         .rp              image/vnd.rn-realpix
RealText        .rt              text/vnd.rn-realtext
Source: https://web.archive.org/web/20120513203726/http://service.real.com/help/library/guides/production8/htmfiles/server.htm

Don’t get excited: the RealPix Image format isn’t really an image; it is simply an XML file with all the details of an image or group of images. Pretty boring. It was, however, a big thing in its day, and it even got a full guide written up for the process: “All information in the file occurs between an opening <imfl> tag and a closing </imfl> tag. This is the only tag that uses an end tag.” This format was also a topic of discussion because malicious code could be embedded in an RP file and executed just by having someone load your webpage. IMFL is obviously an acronym, but none of the documents I could find say what it stands for, so I did what everyone does now: I asked ChatGPT.

The RealPix format by RealNetworks, which was used for interactive multimedia content, indeed utilized IMFL as its tagged format. IMFL stands for “Interleaved Media File Language.” This markup was particularly designed to handle multimedia presentations, allowing the synchronization of images, audio, and video in a slideshow-style format. It used XML-like syntax where elements like <imfl>, <head>, and <fadein/> defined media objects, transitions, and their timing. Key components included attributes for positioning, color, and animation effects, making RealPix a flexible format for creating multimedia sequences compatible with RealPlayer.

For technical details, the RealPix format closely resembles SMIL (Synchronized Multimedia Integration Language) and supports strict tag closure and case sensitivity. This means all tags and attribute names must be lowercase, and attributes must be in double quotes, as seen in SMIL and RealSystem G2 markup, RealNetworks’ broader multimedia framework.

When I asked for a source, it could not give me one. So I’m not sure it is the correct answer, but it seems to fit. Here are some samples of RP, RT and SMIL files.

For RealText, with the RT extension, we find similar tagged text. This format is used to provide text presentations to accompany images, audio or video. The tagged text describes when and how the text is displayed. This is all done in a player window, so the root tag of these RT documents starts and ends with <window>. I guess these could be considered a subtitle format for streaming media.

The SMIL files are interesting. SMIL is a known standard, but in many cases these files do not have an XML declaration and are therefore not identified by PRONOM’s current signature. They are used to link everything together. I would suggest a SMIL variant without the XML declaration so these files are identified correctly.

<smil>
<body>
<par>
<textstream src="rtsp://realserver.company.com/mary.rt"/>
<video src="rtsp://realserver.company.com/mary.rm"/>
</par>
</body>
</smil>
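Since RP, RT and these SMIL files are tagged text that may lack an XML declaration, one way to sketch identification is to skip an optional UTF-8 byte-order mark, whitespace and XML declaration, then look at the root tag: <imfl> for RealPix, <window> for RealText, <smil> for SMIL. This is a heuristic under those assumptions (it does not skip comments or a DOCTYPE, for instance), and `sniff_real_markup` is a hypothetical name.

```python
import re

# Optional UTF-8 BOM, whitespace and XML declaration, then the root tag name
ROOT_TAG = re.compile(
    rb"\A(?:\xef\xbb\xbf)?\s*(?:<\?xml[^>]*\?>\s*)?<(imfl|window|smil)[\s>/]",
    re.IGNORECASE,
)
FORMATS = {b"imfl": "RealPix", b"window": "RealText", b"smil": "SMIL"}

def sniff_real_markup(head: bytes):
    """Return 'RealPix', 'RealText', 'SMIL' or None based on the root tag."""
    m = ROOT_TAG.match(head)
    return FORMATS[m.group(1).lower()] if m else None
```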

The .RPAD RealProducer Audience File, .RPJF RealProducer Job File and .RPSD RealProducer Server Destination are all XML files for managing some of the configuration found in the RealProducer software.

cat 56k\ Dial-up.rpad
<?xml version="1.0" encoding="UTF-8"?>
<audience xmlns="http://ns.real.com/tools/audience.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.real.com/tools/audience.2.0 http://ns.real.com/tools/audience.2.0.xsd">
<avgBitrate type="uint">34000</avgBitrate>
<maxBitrate type="uint">68000</maxBitrate>
<streams>

cat RealProducer11-01.rpjf
<?xml version="1.0" encoding="UTF-8"?>
<job xmlns="http://ns.real.com/tools/job.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ns.real.com/tools/job.2.0 http://ns.real.com/tools/job.2.0.xsd">
<enableTwoPass type="bool">true</enableTwoPass>
<clipInfo>

cat Multicast\ Push\ Server.rpsd
<?xml version="1.0" encoding="UTF-8"?>
<destination xsi:type="pushServer" xmlns="http://ns.real.com/tools/server.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.real.com/tools/server.2.0 http://ns.real.com/tools/server.2.0.xsd">
<pluginName type="string">rn-server-rbs</pluginName>

Those three formats should be easy enough to identify, especially if we look for the namespace URLs.
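A minimal sketch of that namespace check, using the three namespace URIs from the samples above. It only knows the 2.0 namespaces seen here (other versions, if they exist, would need their own entries), and it finds the default `xmlns="..."` declaration with a simple regular expression rather than a real XML parser; the names are hypothetical.

```python
import re

# Default namespaces seen in the RealProducer sample files above
NAMESPACES = {
    b"http://ns.real.com/tools/audience.2.0": "RealProducer Audience File (.rpad)",
    b"http://ns.real.com/tools/job.2.0": "RealProducer Job File (.rpjf)",
    b"http://ns.real.com/tools/server.2.0": "RealProducer Server Destination (.rpsd)",
}
# Matches xmlns="http://ns.real.com/tools/..." but not prefixed xmlns:xsi="..."
XMLNS = re.compile(rb'xmlns="(http://ns\.real\.com/tools/[^"]+)"')

def sniff_realproducer_xml(head: bytes):
    """Return a format name for a known RealProducer namespace, else None."""
    m = XMLNS.search(head)
    return NAMESPACES.get(m.group(1)) if m else None
```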

The RAM and RPM formats are simply text files containing a URL. You can find some samples here and here.

RM and RV files use the same format as RMVB files; RMVB simply uses a variable bitrate. Later on, a new format was introduced to improve video quality. It uses the RMHD extension, for RealMedia HD. Let’s take a look.

hexdump -C DSC_0009.rmhd | head
00000000 2e 52 4d 50 00 00 00 12 00 01 00 00 00 00 00 00 |.RMP............|
00000010 00 07 50 52 4f 50 00 00 00 36 00 02 00 04 f7 33 |..PROP...6.....3|
00000020 00 04 f7 33 00 00 11 bd 00 00 02 5d 00 00 01 d2 |...3.......]....|
00000030 00 00 1b 2e 00 00 00 00 00 00 00 00 00 04 65 68 |..............eh|
00000040 00 00 01 6f 00 02 00 03 43 4f 4e 54 00 00 00 12 |...o....CONT....|
00000050 00 00 00 00 00 00 00 00 00 00 4d 44 50 52 00 00 |..........MDPR..|
00000060 00 76 00 00 00 00 00 03 24 64 00 03 24 64 00 00 |.v......$d..$d..|
00000070 11 bd 00 00 04 2a 00 00 00 00 00 00 00 00 00 00 |.....*..........|
00000080 1b 2e 0c 56 69 64 65 6f 20 53 74 72 65 61 6d 14 |...Video Stream.|
00000090 76 69 64 65 6f 2f 78 2d 70 6e 2d 72 65 61 6c 76 |video/x-pn-realv|

The format looks very similar, but it has the magic header .RMP instead of .RMF. MediaInfo and FFprobe are unaware of the format. The software mentions an RV11 codec, which is confusing, as the codecs went from RV10 to RV60.
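Since only the magic differs at the start, telling the two apart needs just the first few bytes. In every dump here, the 18-byte (0x12) file header is followed by a PROP chunk at offset 18, and in the RMHD dump the chunk sizes chain the same way as in RM (PROP is 0x36 bytes, so CONT starts at 0x48). The sketch below therefore also requires PROP at offset 18; that extra check is an assumption drawn from these few samples, and `real_media_variant` is a hypothetical name.

```python
# First four bytes of the RealMedia variants discussed above
MAGICS = {
    b".RMF": "RealMedia (.rm, .rv, .rmvb)",
    b".RMP": "RealMedia HD (.rmhd)",
}

def real_media_variant(head: bytes):
    """Classify by magic; also require a PROP chunk at offset 18 as in the dumps."""
    if head[18:22] != b"PROP":
        return None
    return MAGICS.get(head[:4])
```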

Phew, that was a lot, considering the two formats I set out to research turned out to be the same as an existing format. There are probably others I have missed. I did see a reference to an RMX format, which seems to be an encrypted RM file. The header is the same, so it will identify as a RealMedia file, but with the wrong extension. Let me know if you come across any. I have samples of the formats mentioned here, plus a proposal for new signatures, in my GitHub repository.

PAR

Some file formats have a unique extension. Some formats use three-character extensions that are so well known that other software rarely reuses them. Take the PDF extension, for example: I’m pretty sure no one else will use it, because it is so well known. Other extensions often get reused by several different software titles; plenty of titles use the DOC extension.

Part of defining a file format I come across is also defining other formats that use the same extension or the same basic patterns. I want the format I am researching to be identified correctly, but I also don’t want other formats to be falsely identified as it.

When using the DROID tool, if a file can’t be identified using a signature, the tool then checks whether its extension matches any format in the PRONOM registry. If it finds one, it identifies the file as that format, with the identification method “Extension”. This can be confusing and dangerous.

A format using the PAR extension came up in discussion recently. Let’s take a look at what we know about files with the PAR extension. Using the handy tool at digipres.org, we can see there are many formats using it.

Apparently many people like to use this extension with their software. One might assume that any file with the PAR extension must be one of the formats in this list, but that assumption would be wrong. The PRONOM registry has no record of any format using the PAR extension. Hopefully we can add a few, so identification doesn’t have to rely on the extension alone.

A PArchive, or Parity Volume Set, is a group of file formats used for error correction and data integrity checking. Only the first version used the PAR extension; it is now obsolete, with version 2 being the latest stable version.

hexdump -C archive.par | head
00000000 50 41 52 00 00 00 00 00 00 00 01 00 00 09 00 02 |PAR.............|
00000010 8f d0 ce 2e 21 db 3b e5 41 d5 18 be d3 0e 52 f0 |....!.;.A.....R.|
00000020 de b6 b3 9f 53 09 ff ba 16 6b ca d2 48 a6 ca 45 |....S....k..H..E|
00000030 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000040 60 00 00 00 00 00 00 00 4e 00 00 00 00 00 00 00 |`.......N.......|
00000050 ae 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 4e 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |N...............|
00000070 45 16 01 00 00 00 00 00 76 da 44 2b 43 5f b5 bd |E.......v.D+C_..|
00000080 08 7b d2 b0 2e 16 7d 86 46 75 7b 79 f0 36 75 3b |.{....}.Fu{y.6u;|
00000090 a1 14 22 f3 0c 77 85 3c 70 00 61 00 72 00 2d 00 |.."..w.<p.a.r.-.|

hexdump -C Testing.docx.par2 | head
00000000 50 41 52 32 00 50 4b 54 84 00 00 00 00 00 00 00 |PAR2.PKT........|
00000010 76 1f e0 a4 5a 32 e0 84 d9 e9 32 32 06 9f 03 ff |v...Z2....22....|
00000020 71 48 73 d5 59 c6 ae 7c c7 21 3d ba 8d e5 ea 04 |qHs.Y..|.!=.....|
00000030 50 41 52 20 32 2e 30 00 46 69 6c 65 44 65 73 63 |PAR 2.0.FileDesc|
00000040 5d 74 b5 3d 64 ae 1f d8 ae 41 f1 8c 2f 7a cc c1 |]t.=d....A../z..|
00000050 27 9b bc 61 46 21 4d 37 a3 c7 f2 07 b4 b8 df 81 |'..aF!M7........|

Pretty straightforward. The only thing that would have made it easier is if the first version had used “PAR1”, but be glad it didn’t, as that signature is used by another format!

hexdump -C null_list.parquet | head
00000000 50 41 52 31 15 00 15 18 15 18 2c 15 02 15 00 15 |PAR1......,.....|
00000010 06 15 06 00 00 02 00 00 00 02 00 02 00 00 00 02 |................|
00000020 01 26 42 1c 15 02 19 25 00 06 19 38 09 65 6d 70 |.&B....%...8.emp|
00000030 74 79 6c 69 73 74 04 6c 69 73 74 04 69 74 65 6d |tylist.list.item|
00000040 15 00 16 02 16 3a 16 3a 26 08 3c 36 02 00 00 00 |.....:.:&.<6....|
00000050 15 02 19 4c 48 0c 61 72 72 6f 77 5f 73 63 68 65 |...LH.arrow_sche|
00000060 6d 61 15 02 00 35 02 18 09 65 6d 70 74 79 6c 69 |ma...5...emptyli|
00000070 73 74 15 02 15 06 4c 3c 00 00 00 35 04 18 04 6c |st....L<...5...l|
00000080 69 73 74 15 02 00 15 02 25 02 18 04 69 74 65 6d |ist.....%...item|
00000090 6c bc 00 00 00 16 02 19 1c 19 1c 26 42 1c 15 02 |l..........&B...|

Apache Parquet is a more modern format used to store column-oriented data. At least they used a unique file extension!
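Because “PAR”, “PAR1” and “PAR2” share a prefix, a sniffer has to compare more than the first three bytes. The Python sketch below uses the eight-byte openings shown in the dumps (“PAR” plus five NUL bytes for version 1, “PAR2\0PKT” for PAR2) and, for Parquet, the fact that the Parquet format specification puts the “PAR1” magic at both the start and the end of the file. `sniff_par_family` is a hypothetical name, and only the first PAR2 packet header is checked.

```python
def sniff_par_family(data: bytes):
    """Tell Parquet, PAR2 and PAR 1.0 apart using their leading (and trailing) magic."""
    # Parquet: "PAR1" at both the beginning and the end of the file
    if len(data) >= 8 and data[:4] == b"PAR1" and data[-4:] == b"PAR1":
        return "Apache Parquet"
    # PAR2: every packet starts with "PAR2\0PKT"
    if data[:8] == b"PAR2\x00PKT":
        return "Parity Volume Set 2.0 (PAR2)"
    # PAR 1.0: "PAR" followed by five NUL bytes, as in archive.par above
    if data[:8] == b"PAR\x00\x00\x00\x00\x00":
        return "Parity Volume Set 1.0 (PAR)"
    return None
```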

Another common piece of software that uses the PAR extension is Solid Edge by Siemens, which uses it for its 3D part format. For some reason, this format still uses the OLE compound file container.

7z l tinyscrew.par 

Path = tinyscrew.par
Type = Compound
Physical Size = 86528
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 31964 32256 PSMcluster0
..... 12 64 Versions
2001-12-19 15:44:14 D.... Display
2001-12-19 15:44:14 D.... ACIS
..... 8462 8704 ACIS/Solid1.sab
..... 238 256 PSMroots
2001-12-19 15:44:14 D.... Display/Cache0
2001-12-19 15:44:14 D.... Display/Styles
..... 1725 1728 Display/Styles/Library0
..... 12 64 Display/Styles/DefaultStyles
..... 88 128 Display/Cache0/Info
..... 4248 4608 Display/Cache0/L1-T1
..... 8 64 JSitesList
2001-12-19 15:44:14 D.... PARASOLID
..... 3389 3392 PARASOLID/STREAM434.D_B
..... 10402 10752 PARASOLID/STREAM434.P_B
..... 4 64 DocVersion2
..... 199 256 PSMclustertable
..... 8 64 PSMuserroots
..... 512 512 JVisibleData
2001-12-19 15:44:14 D.... PSMspacemap
..... 66 128 PSMspacemap/0x00002000
..... 6090 6144 PSMspacemap/0x00000000
..... 174 192 PSMspacemap/0x00004000
..... 4716 5120 PSMtypetable
..... 8 64 FamilyMembers
..... 8 64 BuildVersions
..... 150 192 PartsLiteData
..... 596 640 [5]C3teagxwOttdbfkuIaamtae3Ie
..... 476 512 [5]SummaryInformation
..... 12 64 PSMsegmenttable
..... 96 128 MSConvertedPropertyset
..... 148 192 [5]K4teagxwOttdbfkuIaamtae3Ie
..... 280 320 [5]DocumentSummaryInformation
..... 116 128 [5]SszbwomgY1udb2whAaq5u2jwCg
..... 264 320 [5]Rfunnyd1AvtdbfkuIaamtae3Ie
..... 140 192 Dynamic Attributes Metadata
..... 458 512 Unclustered Dynamic Attributes
------------------- ----- ------------ ------------ ------------------------
2001-12-19 15:44:14 75069 77824 32 files, 6 folders

We will have to use a container signature to identify this format correctly. Solid Edge also has ASM and DFT formats that use the same OLE container. Hopefully there are some unique features we can use to identify them.
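DROID container signatures work by looking for named streams inside the container. As a rough stand-in, the sketch below checks for the OLE compound file header (D0 CF 11 E0 A1 B1 1A E1), then searches the raw bytes for a few storage and stream names from the tinyscrew.par listing, encoded as UTF-16LE, which is how compound file directory entries store names. This is a heuristic, not a directory parser: the names could in principle also appear inside stream data, and it is untested whether ASM or DFT files contain the same names. `PART_MARKERS` and `looks_like_solid_edge_part` are hypothetical.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary header

# Hypothetical markers taken from the tinyscrew.par listing above;
# not yet checked against Solid Edge ASM or DFT files.
PART_MARKERS = ("PARASOLID", "PartsLiteData", "PSMroots")

def looks_like_solid_edge_part(data: bytes) -> bool:
    """Crude check: OLE header plus all marker names present as UTF-16LE bytes."""
    if not data.startswith(OLE_MAGIC):
        return False
    # Directory entry names are stored as UTF-16LE, so search in that encoding
    return all(name.encode("utf-16-le") in data for name in PART_MARKERS)
```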

One other file format that uses the PAR extension is not listed in any of the registries: not in PRONOM, TrID, Wikidata, or others. I came across it while researching another format, DVD Studio Pro. On a Macintosh running the now-discontinued DVD Studio Pro, one could save a DVD mastering project as a “file” with the DSPPROJ extension. I use the term file loosely here, as it wasn’t actually a file: it was a folder with an extension that macOS would interpret as a single file. These are the package formats Apple used, and still uses, quite frequently. Moving this folder to another system results in a plain folder of content.

tree sample.dspproj 
/sample.dspproj
└── Contents
    ├── PkgInfo
    └── Resources
        ├── Audio
        ├── MPEG
        ├── Menu
        ├── ModuleDataB
        ├── ObjectDataB
        ├── Openers.plist
        ├── Overlay
        ├── Picture
        ├── Render Data
        │   ├── C4272B0100797459.M2V
        │   └── PAR
        │       └── C4272B0100797459.M2V.par
        ├── Styles
        ├── Temp
        ├── Templates
        └── Thumbnails

14 directories, 6 files

This PAR extension is explained in the DVD Studio Pro manual:

About the Parse Files
To use an asset in a project, DVD Studio Pro needs to know some general information about it, such as its length, type, and integrity. Video assets encoded within DVD Studio Pro can include this information in the encoded files, or can create separate files for it. Assets encoded by Compressor outside of DVD Studio Pro can include this information if you select the “Add DVD Studio Pro meta-data” option in the Extras pane of the Encoder settings.
Assets encoded with other encoders, or with the “Add DVD Studio Pro meta-data” option disabled when using Compressor, must be parsed before DVD Studio Pro can use them. Parsing creates a small file, with the same name as the video asset and a “.par” extension that contains the required information. The parse file can take from several seconds to several minutes to create, depending on the size of the asset file.
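The manual's naming rule (asset name plus “.par”), together with the sample bundle layout where the parse file sits in a PAR folder beside the asset, makes it straightforward to pair each parse file with its video. A short sketch; `pair_parse_files` is a hypothetical helper, and the folder layout is inferred from a single bundle rather than from Apple documentation.

```python
from pathlib import Path
from typing import Dict, Optional

def pair_parse_files(bundle: Path) -> Dict[Path, Optional[Path]]:
    """Map every *.par file inside a .dspproj bundle to the asset it describes.

    Hypothetical helper. Assumes, from one sample bundle, that a parse file is
    named <asset name> + ".par" and lives either next to the asset or in a PAR
    subfolder beside it.
    """
    pairs: Dict[Path, Optional[Path]] = {}
    for par in sorted(bundle.rglob("*.par")):
        asset_name = par.name[: -len(".par")]  # drop the ".par" suffix
        candidates = (par.parent / asset_name, par.parent.parent / asset_name)
        # None means no matching asset was found (e.g. a partially copied bundle).
        pairs[par] = next((c for c in candidates if c.is_file()), None)
    return pairs
```

Calling `pair_parse_files(Path("sample.dspproj"))` on the bundle above would be expected to map the `.par` file in `Render Data/PAR` to `Render Data/C4272B0100797459.M2V`; an orphaned parse file maps to `None`.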

hexdump -C E4712E541A60E300.M2V.par | head
00000000 56 50 41 52 00 00 00 20 00 00 00 00 00 01 e2 40 |VPAR... .......@|
00000010 00 00 00 00 00 c6 19 7c 2f 55 73 65 72 73 2f 74 |.......|/Users/t|
00000020 79 6c 65 72 2f 44 6f 63 75 6d 65 6e 74 73 2f 46 |yler/Documents/F|
00000030 69 6e 61 6c 20 52 65 6e 64 65 72 20 66 6f 72 20 |inal Render for |
00000040 44 56 44 20 56 51 42 2f 56 61 72 73 69 74 79 51 |DVD VQB/VarsityQ|
00000050 42 20 44 56 44 2f 56 61 72 73 69 74 79 51 42 2d |B DVD/VarsityQB-|
00000060 44 69 73 63 32 2e 64 73 70 70 72 6f 6a 2f 43 6f |Disc2.dspproj/Co|
00000070 6e 74 65 6e 74 73 2f 52 65 73 6f 75 72 63 65 73 |ntents/Resources|
00000080 2f 52 65 6e 64 65 72 20 44 61 74 61 2f 45 34 37 |/Render Data/E47|
00000090 31 32 45 35 34 31 41 36 30 45 33 30 30 2e 4d 32 |12E541A60E300.M2|
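Going only by this one dump, the parse file starts with the ASCII magic `VPAR` and embeds an absolute path to the project bundle starting at offset 0x18. A minimal sketch that checks the magic and extracts that path. The offset, and the assumption that the path ends at the first non-printable byte, are inferred from a single sample, not from any documentation.

```python
from typing import Optional

VPAR_MAGIC = b"VPAR"
# Assumption: in the one sample dumped above, the embedded path starts at 0x18.
PATH_OFFSET = 0x18

def read_vpar_path(data: bytes) -> Optional[str]:
    """Return the embedded project path if data looks like a DVD Studio Pro parse file."""
    if not data.startswith(VPAR_MAGIC) or len(data) <= PATH_OFFSET:
        return None
    end = PATH_OFFSET
    # Assumption: the path runs until the first non-printable byte.
    # Non-ASCII characters in the path would also stop the scan early.
    while end < len(data) and 0x20 <= data[end] < 0x7F:
        end += 1
    return data[PATH_OFFSET:end].decode("ascii")

def vpar_path_from_file(path: str, limit: int = 4096) -> Optional[str]:
    """Read the start of a file and try to extract the embedded path."""
    with open(path, "rb") as f:
        return read_vpar_path(f.read(limit))
```

The embedded path is also a handy provenance clue: it records where the project lived on the original machine.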

Parity, Parts, and Parse files, oh my.

If you thought we were done, you would be wrong! Let’s look at yet another PAR format.

hexdump -C MESSROH.PAR | head
00000000 08 69 64 73 32 30 30 30 30 d0 4e 01 51 46 42 00 |.ids20000.N.QFB.|
00000010 98 d0 4e 01 80 01 58 01 b6 b9 f7 bf 82 30 00 00 |..N...X......0..|
00000020 dc 08 00 00 60 51 f2 bf 82 30 01 59 ff ff ff ff |....`Q...0.Y....|
00000030 a4 d0 4e 01 28 3e f2 bf 78 63 a4 01 dc 08 00 0b |..N.(>..xc......|
00000040 5a 45 52 4f 2d 4f 46 46 53 45 54 01 18 0e ac 01 |ZERO-OFFSET.....|
00000050 d4 d0 4e 01 00 ac 43 00 18 0e ac 01 d4 d0 4e 01 |..N...C.......N.|
00000060 51 46 42 00 ec d0 4e 01 d4 00 4e 01 b6 b9 f7 bf |QFB...N...N.....|
00000070 5c 4c 75 81 5c 81 00 00 45 07 41 00 c0 0a 00 01 |\Lu.\...E.A.....|
00000080 cd d0 41 00 d5 d0 41 00 5c 81 00 00 dc 0a a4 01 |..A...A.\.......|
00000090 5b 5d 42 00 cc d0 4e 01 72 5d 42 00 7a 5d 42 00 |[]B...N.r]B.z]B.|

hexdump -C DUMMYDAT.PAR | head
00000000 08 73 65 69 73 6d 69 63 31 00 00 00 00 00 00 00 |.seismic1.......|
00000010 00 00 00 00 00 01 58 00 00 00 00 00 00 00 00 00 |......X.........|
00000020 00 00 00 00 00 00 00 00 00 00 01 59 00 00 00 00 |...........Y....|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a |................|
00000040 41 4b 55 53 54 49 4b 4c 4f 47 00 00 00 00 00 00 |AKUSTIKLOG......|
00000050 00 00 00 00 02 2f 2f 00 08 41 47 43 2d 47 41 49 |.....//..AGC-GAI|
00000060 4e 00 00 00 00 00 00 00 00 00 00 00 00 32 00 00 |N............2..|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This PAR format is called “Reflexw data-format”. It is a header that is always paired with a DAT file; together they store geophysical wave data from devices such as GPR (ground-penetrating radar). Reflexw is software made by Sandmeier geophysical research.
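One possible reading of the two dumps above: at offsets 0x00, 0x15, 0x2A and 0x3F there is a length byte followed by exactly that many ASCII characters (08 + “ids20000” or “seismic1”, 01 + “X”, 01 + “Y”, then 0B + “ZERO-OFFSET” or 0A + “AKUSTIKLOG”). That looks like Pascal-style strings in 21-byte slots, and in MESSROH.PAR the bytes after each string look like leftover memory rather than zero padding, which might explain why the headers differ between files. This is inferred from two samples, not from the Reflexw specification; `read_reflexw_strings` is a hypothetical helper.

```python
from typing import List, Optional

# Offsets seen in the two sample dumps: a 21-byte stride, i.e. one length byte
# plus room for 20 characters (an assumption drawn from two files only).
FIELD_OFFSETS = (0x00, 0x15, 0x2A, 0x3F)
MAX_LEN = 20

def read_reflexw_strings(header: bytes) -> Optional[List[str]]:
    """Read the length-prefixed text fields; None if any of them looks implausible."""
    fields: List[str] = []
    for off in FIELD_OFFSETS:
        if off >= len(header):
            return None
        n = header[off]
        raw = header[off + 1 : off + 1 + n]
        # Only the first n bytes count; whatever follows in the slot is ignored.
        if n > MAX_LEN or len(raw) != n or not all(0x20 <= b < 0x7F for b in raw):
            return None
        fields.append(raw.decode("ascii"))
    return fields
```

Both samples put “X” and “Y” in the second and third slots, which could become part of a signature if it holds across more files.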

The PAR file samples I have don’t seem to share a consistent header, as each has a unique set of bytes, but all of them have some similar bytes later in the file, at around offset 0x1D8 (472):

000001d0 00 00 a0 3d 00 00 a0 41 00 00 00 00 00 00 00 00 |...=...A........|
000001e0 0a d7 23 3c 00 00 80 3f 00 00 00 00 00 00 00 00 |..#<...?........|
000001f0 00 00 00 00 cc cc dc 40 00 00 00 00 00 00 00 00 |.......@........|
00000200 00 00 80 3f 00 00 00 00 00 00 00 00 00 00 00 00 |...?............|
00000210 00 00 00 00 00 00 00 00 17 b7 d1 38 00 00 00 00 |...........8....|

It seems this sequence of bytes is the only thing consistent across all my samples. I have no idea what it means or references. The specification does describe some bytes which should lead to proper identification, but the integer used for the “HeaderMarker” is expected to hold the 4-byte value “00 00 00 01”, which on its own isn’t enough to cleanly identify the format. I’d love to hear what others can see in the spec. You can find some sample files here.
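One possible reading of those consistent bytes: the non-zero groups sit on 4-byte boundaries, and if each group of four is taken as a little-endian IEEE 754 float32, they decode to round numbers. That is a guess, not something taken from the specification; a small sketch to try it on the rows above:

```python
import struct
from typing import Tuple

def row_as_floats(row: bytes) -> Tuple[float, ...]:
    """Interpret a hexdump row as consecutive little-endian IEEE 754 float32 values."""
    count = len(row) // 4
    return struct.unpack(f"<{count}f", row[: count * 4])

# The consistent rows from the dump above, keyed by file offset.
rows = {
    0x1D0: "00 00 a0 3d 00 00 a0 41 00 00 00 00 00 00 00 00",
    0x1E0: "0a d7 23 3c 00 00 80 3f 00 00 00 00 00 00 00 00",
    0x1F0: "00 00 00 00 cc cc dc 40 00 00 00 00 00 00 00 00",
    0x200: "00 00 80 3f 00 00 00 00 00 00 00 00 00 00 00 00",
    0x210: "00 00 00 00 00 00 00 00 17 b7 d1 38 00 00 00 00",
}
for offset, hexrow in rows.items():
    values = row_as_floats(bytes.fromhex(hexrow))
    print(f"{offset:#06x}", ["%g" % v for v in values])
```

Read this way, the rows give 0.078125 and 20.0 at 0x1D0, roughly 0.01 and 1.0 at 0x1E0, roughly 6.9 at 0x1F4, 1.0 at 0x200 and roughly 0.0001 at 0x218. Whether these are stored parameters, and which ones, remains an open question.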

So we have Parity files, Parts files, Parse files, Parquet files, and a Header file. I am sure others will be found and added to this lot. Hopefully the PAR files you run across will match one of these patterns! I am still working on a signature proposal. Stay tuned!
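As a rough illustration only (not the signature proposal mentioned above), a first-pass triage of a file with a PAR extension might look like the sketch below. The Parchive and Parquet magic values come from those formats' own specifications, the VPAR check rests on the single parse file dumped earlier, and Reflexw headers are deliberately left unmatched because they lack a reliable magic. `guess_par_kind` is a hypothetical name.

```python
from typing import Optional

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 container magic

def guess_par_kind(data: bytes) -> Optional[str]:
    """Rough triage of a .par file from its leading (and, for Parquet, trailing) bytes."""
    if data.startswith(b"PAR2\x00PKT"):
        return "Parchive 2.0 (parity)"
    if data.startswith(b"PAR\x00\x00\x00\x00\x00"):
        return "Parchive 1.0 (parity)"
    # Parquet files begin and end with the 4-byte magic "PAR1".
    if len(data) >= 8 and data.startswith(b"PAR1") and data.endswith(b"PAR1"):
        return "Apache Parquet"
    if data.startswith(CFB_MAGIC):
        return "OLE2 container (possibly Solid Edge; needs a container signature)"
    if data.startswith(b"VPAR"):
        return "DVD Studio Pro parse file"
    return None  # Reflexw headers and anything else need deeper checks
```

Because the Parquet check looks at the end of the file, the caller has to pass the whole file (or at least its first and last few bytes), not just a header read.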