Microstation

July 11, 2025 by Thor

I recently was able to image a few Bernoulli Disks for a collection using a SCSI device I have found quite useful. The disks had been sitting around for quite some time waiting for the right tools and resources to extract the contents. I mentioned the accomplishment to a few coworkers and one asked me if I would extract the contents from an old disk they used for school back in the 1990s. They had spent a whopping $99 at the local bookstore for a disk which held a total of 150MB. Not GBs like we are used to now, but megabytes. I have some cameras which take RAW photos larger than would fit on one disk. Once I had the data extracted from their disk, I took a look at the contents. There were a few file formats on the disk I was unfamiliar with. A quick scan with DROID revealed some matches and a few problems.

Turns out the data were files written by an old version of Bentley Microstation. The files dated from late 1995 and the disk was formatted as FAT16, which leans more toward use on a DOS system, but could have been used with the newly released Windows 95. The Bentley Microstation 95 software wasn’t released until November of 1995, so my guess is these Microstation files were created with Microstation version 5 for DOS.

disktype HD6_imaged-004.hda 

Regular file, size 144.0 MiB (150998016 bytes)
No type and creator code
DOS/MBR partition map
Partition 4: 144.0 MiB (150978560 bytes, 294880 sectors from 32, bootable)
  Type 0x06 (FAT16)
  FAT16 file system (hints score 5 of 5)
    Volume size 143.8 MiB (150810624 bytes, 36819 clusters of 4 KiB)
    Volume name "ode 009 - I"

PRONOM has a few entries for the Microstation software:

PUID	Format Name	Version	Extension
x-fmt/346	Microstation CAD Drawing	95	DGN
fmt/502	Bentley V8 DGN		DGN
fmt/1626	MicroStation Symbology Resource File		RSC
fmt/1549	Bentley Microstation Hidden Line File		HLN
fmt/1358	MicroStation Base File		BSE
fmt/1183	MicroStation Material Palette		PAL
fmt/1177	MicroStation Material Library		MAT

The files found on this old Bernoulli disk gave varied results in identification. Most of the DGN files give me multiple identifications in DROID.

A little digging and we can learn a bit about the major formats. Intergraph and Bentley used a binary version of their drawing format, DGN, from versions 2 until 7, spanning 1987 to 2001. With the release of version 8, they made a major change to the format. Version 8 uses the Microsoft OLE2 container to enhance the format, allowing it to hold multiple drawings and more information about the model. With this change, the format became proprietary. Sure, they started an OpenDGN program to make the format more compatible with other systems, but you had to request access and sign an NDA in order to get a copy of the format specifications, which doesn’t sound “open” to me. You can read another file format researcher’s thoughts on this on her blog.

So I know many of these files are not Version 8 of the DGN format as they are not OLE2 containers, but the other issue is that x-fmt/346 for the Microstation CAD drawing 95 is an outline record. It has no signature. So DROID is guessing based on extension only. We need to dig deeper.
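Ruling out version 8 comes down to checking for the OLE2 magic number at the start of the file. A minimal Python sketch of that check follows; the function names are my own and purely illustrative.

```python
# OLE2 (Compound File Binary) containers start with this fixed 8-byte magic.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def is_ole2(header: bytes) -> bool:
    """Return True if the leading bytes look like an OLE2 container."""
    return header[:8] == OLE2_MAGIC


def file_is_ole2(path: str) -> bool:
    """Read the first 8 bytes of a file and test them for the OLE2 magic."""
    with open(path, "rb") as handle:
        return is_ole2(handle.read(8))
```

A DGN that fails this test is not a version 8 container, which is all this check can say; it does not confirm an older DGN on its own.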

I noticed that many of the DGN files in my sample set also identified as a “Microstation Hidden Line File”, but instead of an HLN extension, they use DGN.

sf samp15.dgn 

filename : 'samp15.dgn'
filesize : 359424
modified : 1998-09-01T12:31:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [359422 2]]'
    warning : 'extension mismatch'

hexdump -C samp15.dgn | head
00000000  08 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 c8 45  |............ ..E|
00000020  00 00 00 00 00 00 00 00  40 06 0c 00 01 05 dc a0  |........@.......|
00000030  ff ff ff ff ff ff ff ff  b5 8b 9f 63 b9 88 85 a7  |...........c....|
00000040  00 00 00 00 19 00 b4 86  13 00 fe be 00 00 00 00  |................|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|

hexdump -C samp7.dgn | head
00000000  c8 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 04 7a 45  |..............zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 01 05 fc b0  |................|
00000030  ff ff ff ff ff ff ff ff  0d 00 9d b5 0c 00 74 93  |..............t.|
00000040  ff ff a6 fd 09 00 40 11  05 00 50 aa 00 00 e5 f8  |......@...P.....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Looking at a couple of files in the same sample set, some use the header “08 09 fe 02 01 08 00 00” while another uses “c8 09 fe 02 01 08 00 00”. This is why samp15.dgn identifies as an HLN file, as the signature matches, while samp7.dgn uses “C8” instead of “08”, so it does not identify as an HLN file. What is the difference and what is an HLN file?

First let’s define an HLN file. The name of the format is “Hidden Line File”, although most references call it a “Visible Edges File”. Confusing, but the definition is: “a 2D or 3D DGN file that contains the edges visible in a 3D view (that is, with those edges that would be hidden, removed).”

Looking at a couple HLN files, we can see the format is the same as DGN files:

hexdump -C test-2d.hln | head
00000000  08 09 fe 02 08 01 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  00 00 00 00 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C test-3d.hln | head
00000000  c8 09 fe 02 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  ff ff 0c fe 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|

Same difference between the two previous files. These two files also explain the difference between the “08” and the “c8” values. Microstation uses the first to indicate it is a 2D file and the latter to indicate a 3D file. The DGN format has been documented in libdgn and this distinction is referenced.
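The 2D/3D distinction can be read straight from the first byte. Here is a rough Python sketch based only on the sample headers shown in this post; the function name is hypothetical and the check is not a full parse of the DGN element header.

```python
def dgn_dimension(header: bytes) -> str:
    """Classify a pre-V8 DGN/HLN header as '2D', '3D', or 'unknown'.

    Assumes what the samples in this post show: byte 0 is 0x08 for 2D
    and 0xC8 for 3D, followed by the bytes 09 FE 02.
    """
    if len(header) < 4 or header[1:4] != bytes.fromhex("09fe02"):
        return "unknown"
    if header[0] == 0x08:
        return "2D"
    if header[0] == 0xC8:
        return "3D"
    return "unknown"
```

Feeding it the first bytes of samp15.dgn and samp7.dgn from above should give “2D” and “3D” respectively.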

This presents a problem with the current PRONOM identification.

filename : 'MS95-2D.dgn'
filesize : 12288
modified : 2025-06-05T21:13:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [12286 2]]'
    warning : 'extension mismatch'

filename : 'MS95-3D.dgn'
filesize : 12800
modified : 2025-06-05T21:14:00-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/346'
    format  : 'Microstation CAD Drawing'
    version : '95'
    mime    : 
    class   : 
    basis   : 'extension match dgn'
    warning : 'match on extension only'

The 2D files mis-identify as Hidden Line Files and the 3D files are identified through extension only. We learned from the previous test that Hidden Line Files can be both 2D and 3D and are the same format as DGN, so a separate identification PUID is unnecessary. The x-fmt/346 identification also doesn’t have a signature, so a few things need to change.

The other issue is a Hidden Line File is also available in version 8+.

filename : 'Microstationv8-s01.hln'
filesize : 7168
modified : 2025-06-05T19:48:09-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/502'
    format  : 'Bentley V8 DGN'
    version : 
    mime    : 
    class   : 'Image (Vector)'
    basis   : 'container name Dgn~H with name only'
    warning : 'extension mismatch'

They also identify as Bentley V8 DGN files, but with an extension mismatch. This should be easy to remedy with the addition of the extension HLN to the signature. The container signature seems to work well, no need to change anything.

My suggestions to fix these issues would be:

Deprecate x-fmt/346
Change the name of fmt/1549 from “Bentley Microstation Hidden Line File” to “Microstation CAD Drawing” and use the version 2-7 to distinguish it from v8
Change the signature for fmt/1549 from “0809FE” to “(08|C8)09FE02”, with no EOF sequence of “FFFF”
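To see what the signature change would do, here is a small Python sketch comparing the two rules. It models the current behaviour as described in this post (BOF “0809FE” plus EOF “FFFF”) against the proposed BOF-only pattern. It is not how DROID or Siegfried evaluate signatures internally, and the function names are illustrative.

```python
# The two leading byte sequences allowed by the proposed (08|C8)09FE02 pattern.
PROPOSED_HEADERS = (bytes.fromhex("0809fe02"), bytes.fromhex("c809fe02"))


def matches_current(data: bytes) -> bool:
    """Current rule as described here: BOF 0809FE and EOF FFFF."""
    return data.startswith(bytes.fromhex("0809fe")) and data.endswith(b"\xff\xff")


def matches_proposed(data: bytes) -> bool:
    """Proposed rule: BOF (08|C8)09FE02 with no EOF requirement."""
    return data[:4] in PROPOSED_HEADERS
```

Under the proposed rule both 2D and 3D files match on their header, and a file that does not end in “FFFF” is no longer excluded.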

The other option would be to make fmt/1549 the 2D drawing format and x-fmt/346 could be used for the 3D drawing format. What do you think?

I have uploaded a few samples to my GitHub page. Curious if your examples of DGN files match what I am seeing. There are a few other related formats that will need to be explored, but this should help for now.

miniDVD

June 20, 2025 by Thor

Let’s talk about the DVD format for a minute. Specifically the miniDVD media format.

DVDs are indeed versatile, as the name implies. You can find data on them written in many different filesystems and formats, including digital video. DVD-Video is a video format which replaced VHS tapes as a main source of home movie entertainment. Eventually the public could afford to record their own video onto these discs and enjoy them for years. With the popularity of high definition video, DVDs are not as popular as they once were, but still provide a decent experience.

I often see the DVD-Video format in archives I work with and we use tools to “RIP” the already digital data from the disc into a new format. I use the term “RIP”, to indicate we are not digitizing the format as it already contains digital data. DVD-Video is a standard that is used on most discs and looks something like this:

tree /Volumes/VIDEO_ESSENTIALS 
/Volumes/VIDEO_ESSENTIALS
├── AUDIO_TS
└── VIDEO_TS
    ├── VIDEO_TS.BUP
    ├── VIDEO_TS.IFO
    ├── VIDEO_TS.VOB
    ├── VTS_01_0.BUP
    ├── VTS_01_0.IFO
    ├── VTS_01_0.VOB
    ├── VTS_01_1.VOB
    ├── VTS_01_2.VOB
    ├── VTS_01_3.VOB
    ├── VTS_01_4.VOB
    ├── VTS_02_0.BUP
    ├── VTS_02_0.IFO
    ├── VTS_02_0.VOB
    └── VTS_02_1.VOB

3 directories, 14 files

There is usually an AUDIO_TS and a VIDEO_TS folder. The Video folder is full of video files, but the Audio folder is always empty. Apparently it was going to be used for an audio format that was abandoned, so it remains empty. Oftentimes I will see this folder absent on non-commercial discs.

An issue that has come up many times is that folks copy the folder structure from the disc to preserve the video, as they would with any digital file. This can be a problem, as the structure was meant for the software and hardware used to access the DVD-Video format. The files by themselves often cannot provide the same experience, and if the disc contains any sort of encryption, the files are useless. This is a complex, multi-part format and should either remain together in this structure or be migrated to a new format, such as an MKV, for preservation.

Enter the miniDVD. It is a smaller version of the standard CD/DVD optical disc size. It was very popular as a recording medium for some digital video cameras, much like the Sony miniDVD Handycam I own. You can pop a blank disc into the camera and it prepares it for you, which takes a couple of minutes, then gives you 20 minutes of recording in high quality and up to 60 minutes with a lower quality. The discs can hold up to 1.4GB and will have the same structure as its big brother.

tree /Volumes/2025_05_23_07H36M_PM 
/Volumes/2025_05_23_07H36M_PM
└── VIDEO_TS
    ├── VIDEO_TS.BUP
    ├── VIDEO_TS.IFO
    ├── VIDEO_TS.VOB
    ├── VTS_01_0.BUP
    ├── VTS_01_0.IFO
    └── VTS_01_1.VOB

2 directories, 6 files

It is missing the AUDIO_TS folder, which is fine, but here is the catch. In order for the disc to be readable by another device, it has to be finalized!

Finalizing is an action which has to happen to any optical disc to “close” out the disc. This process adds important directory and file system data so computers and DVD players can read the disc properly. Many cameras like mine and other DVD recorders require this step when you are finished recording. Unfortunately, it’s an extra step which can take a few minutes, so it is often forgotten. I have had many optical discs come to me over the years because they show up as blank or uninitialized when read on a computer. I fear many people have put them aside or thrown them away as blank, not knowing they have data on them. Luckily, with most burnable discs, you can often see the difference between a blank disc and a burned disc from the underside, the writable surface.

The filesystem used on most DVD-Video discs is called UDF, Universal Disk Format. It is often combined on hybrid discs with ISO-9660 and HFS for compatibility, but can be the only filesystem as well. According to the specifications, a UDF formatted disc should have a Volume Recognition Sequence to identify it as a UDF disc. On a finalized disc I can find this sequence, but on an un-finalized disc, it is missing. This makes sense, as the disc is often seen as unformatted. A tool I use to explore a disc like this is IsoBuster.
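If you have a raw image of a disc, the Volume Recognition Sequence can be checked with a few lines of Python. This is a sketch under some assumptions: the recognition area starts at byte 32768 (sector 16 at 2048 bytes per sector), each descriptor is 2048 bytes with a 5-character standard identifier at bytes 1 to 5, and a UDF volume is bracketed by “BEA01” and “TEA01” with “NSR02” or “NSR03” in between. Media with other sector sizes may lay this out differently, and the function names are my own.

```python
SECTOR = 2048              # size of one Volume Structure Descriptor
VRS_OFFSET = 16 * SECTOR   # the recognition area begins at byte 32768

KNOWN_IDS = {b"BEA01", b"NSR02", b"NSR03", b"TEA01", b"CD001", b"BOOT2", b"CDW02"}


def parse_vrs(area: bytes) -> list:
    """Return the identifiers found in bytes read from VRS_OFFSET onward."""
    found = []
    for start in range(0, len(area), SECTOR):
        identifier = area[start + 1:start + 6]
        if identifier not in KNOWN_IDS:
            break  # first unknown or blank descriptor ends the sequence
        found.append(identifier.decode("ascii"))
    return found


def looks_like_udf(identifiers: list) -> bool:
    """A UDF volume should announce BEA01 and an NSR02/NSR03 descriptor."""
    return "BEA01" in identifiers and any(i in identifiers for i in ("NSR02", "NSR03"))


def read_vrs(path: str, descriptors: int = 16) -> list:
    """Read the recognition area from a disc image file and parse it."""
    with open(path, "rb") as image:
        image.seek(VRS_OFFSET)
        return parse_vrs(image.read(descriptors * SECTOR))
```

On an image of an un-finalized disc I would expect `read_vrs` to come back empty, matching what is described above.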

Another interesting feature of my Sony Handycam is the option to choose what type of disc you would like to prepare when you insert a blank disc. I get the option to choose Video or VR mode. Video is your normal DVD-Video format, but VR Mode is something a little different.

tree /Volumes/2025_05_23_08H29M_PM 
/Volumes/2025_05_23_08H29M_PM
└── DVD_RTAV
    ├── VR_MANGR.BUP
    ├── VR_MANGR.IFO
    └── VR_MOVIE.VRO

2 directories, 3 files

Instead of your expected VIDEO_TS folder, we see a DVD_RTAV folder with some different files inside. No, this is not a Virtual Reality mode, like I originally thought. The VR simply stands for Video Recording and is a standard. It is meant to allow for easier editing of the video, but is not compatible with your standard DVD player. The VRO format used is pretty cool. It is a container format, MPEG-PS, for both audio and video, and can contain both 4:3 and 16:9 aspect ratios, unlike a VOB where the aspect ratio is set.

hexdump -C /Volumes/2025_05_23_08H29M_PM/DVD_RTAV/VR_MOVIE.VRO | head
00000000  00 00 01 ba 44 00 04 00  04 01 01 89 c3 f8 00 00  |....D...........|
00000010  01 bb 00 12 80 c4 e1 04  e1 7f b9 e0 e8 b8 c0 20  |............... |
00000020  bd e0 3a bf e0 02 00 00  01 bf 07 d4 50 00 00 00  |..:.........P...|
00000030  00 4d e3 00 00 00 00 00  ff ff ff ff ff 00 00 00  |.M..............|
00000040  00 00 00 00 00 00 00 00  53 4f 4e 59 5f 4d 4f 42  |........SONY_MOB|
00000050  49 4c 45 20 20 20 20 20  20 20 20 20 20 20 20 20  |ILE             |
00000060  20 20 20 20 20 20 20 20  41 52 49 5f 44 41 54 41  |        ARI_DATA|
00000070  01 02 ff ff 53 4f 4e 59  00 44 43 52 2d 44 56 44  |....SONY.DCR-DVD|
00000080  30 30 34 47 00 01 55 53  52 54 59 50 45 31 4c 4b  |004G..USRTYPE1LK|
00000090  00 10 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The VRO file does identify as an MPEG Program Stream (x-fmt/386), but does contain a little extra information. My trusty copy of the book DVD Demystified has a bunch more info on this format if you are interested, you can find a copy here. The VRO format is an MPEG-PS so identification is covered, but the current PRONOM signature doesn’t like the VRO extension. The BUP & IFO files on the disc are not identified. This is because the PRONOM signature, which covers both of these formats, is looking for the ASCII string “DVDVIDEO-VTS” or “DVDVIDEO-VMG”. It won’t find either of those strings as this is not the DVD-Video standard. Instead it should look for the string “DVD_RTR_VMG” found in these files.

hexdump -C /Volumes/2025_05_23_08H29M_PM/DVD_RTAV/VR_MANGR.IFO | head
00000000  44 56 44 5f 52 54 52 5f  56 4d 47 30 00 00 7f ff  |DVD_RTR_VMG0....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 02 07  |................|
00000020  00 11 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  1e 5c 03 11 ff ff ff ff  ff ff ff ff ff ff ff ff  |.\..............|
00000050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000060  ff ff 4d 41 59 20 32 33  20 32 30 32 35 20 20 20  |..MAY 23 2025   |
00000070  38 3a 32 39 50 4d 00 00  00 00 00 00 00 00 00 00  |8:29PM..........|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
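Until a signature is in place, the three strings mentioned above are easy to test for yourself. A minimal Python sketch follows; the labels are my own descriptions rather than PRONOM format names.

```python
# Leading ASCII strings seen in IFO/BUP files and what they indicate.
IFO_MAGICS = {
    b"DVDVIDEO-VMG": "DVD-Video manager information",
    b"DVDVIDEO-VTS": "DVD-Video title set information",
    b"DVD_RTR_VMG": "DVD-VR manager information",
}


def identify_ifo(header: bytes) -> str:
    """Match the start of an IFO or BUP file against the known strings."""
    for magic, label in IFO_MAGICS.items():
        if header.startswith(magic):
            return label
    return "unknown"
```

Passing in the first 16 bytes of VR_MANGR.IFO from the hexdump above should return the DVD-VR label.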

I will probably suggest this addition to PRONOM for identification, but if you need to work with this format, you can use tools like: https://www.pixelbeat.org/programs/dvd-vr

DaVinci Resolve

May 2, 2025 by Thor

A previous post was about LUTs, the little files needed to color grade your photos and videos. One of the best systems for color grading video in use by professionals today is DaVinci Resolve. The system originally was all hardware based, but in 2004, as computers were able to process higher quality video, da Vinci Systems released new digital systems.

Like most professional multimedia editing software, DaVinci Resolve uses projects to manage work. Projects are generally where all the settings are stored, but they don’t generally store the actual media used in the project. Project files are often XML with unique schemas, but others pack a little more into the project file.

hexdump -C project.drp | head
00000000  50 4b 03 04 14 00 08 00  08 00 f2 54 90 5a ef 18  |PK.........T.Z..|
00000010  b0 25 47 0c 00 00 db 1b  00 00 0b 00 00 00 70 72  |.%G...........pr|
00000020  6f 6a 65 63 74 2e 78 6d  6c 9d 58 d9 72 5b 37 12  |oject.xml.X.r[7.|
00000030  7d cf 57 68 f4 7e 4d ec  4b 8a 51 ca b1 92 89 aa  |}.Wh.~M.K.Q.....|
00000040  2c db 65 29 79 9d 6a 00  0d 85 09 45 aa 48 4a 71  |,.e)y.j....E.HJq|
00000050  fe 7e 0e ee 42 51 94 9c  68 c6 29 85 17 0d a0 d1  |.~..BQ..h.).....|
00000060  e8 3e bd 61 fe fd 97 db  e5 c9 03 6f b6 8b f5 ea  |.>.a.......o....|
00000070  bb 53 f9 46 9c 9e f0 2a  af cb 62 75 f3 dd e9 2f  |.S.F...*..bu.../|
00000080  d7 3f 75 e1 f4 fb b3 6f  e6 ff ea ba f3 f4 f6 ee  |.?u....o........|
00000090  ee 57 de 60 55 7c 23 df  98 37 42 48 79 7a 72 9e  |.W.`U|#..7BHyzr.|

DaVinci Resolve keeps all projects in a database, but you can export them to a project file. A DaVinci Resolve Project file uses a ZIP container to store all the project settings in one file. Let’s see what also might be inside.

Path = project.drp
Type = zip
Physical Size = 543860

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08 .....      1010030       287793  project.xml
2018-02-27 20:25:08 .....        21173         6856  MediaPool/Master/000_Timelines/MpFolder.xml
2018-02-27 20:25:08 .....       492690        28067  MediaPool/Master/001_Audio/MpFolder.xml
2018-02-27 20:25:08 .....        20177         3588  MediaPool/Master/002_gfx/MpFolder.xml
2018-02-27 20:25:08 .....        11025         2611  MediaPool/Master/003_VO/MpFolder.xml
2018-02-27 20:25:08 .....        98309         7042  MediaPool/Master/004_ScreenCaptures Consolidated/MpFolder.xml
2018-02-27 20:25:08 .....      1278493        66424  MediaPool/Master/005_Video H264/MpFolder.xml
2018-02-27 20:25:08 .....         1995          748  MediaPool/Master/MpFolder.xml
2018-02-27 20:25:08 .....      1638204       137086  SeqContainer/909a0a2c-4183-4310-9f78-6e15c3c59cb4.xml
2018-02-27 20:25:08 .....         8806         1169  Gallery.xml
2018-02-27 20:25:08 .....        12697          696  media.dat
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08            4593599       542080  11 files

Looks like a lot of XML! The consistent XML in all the DRP files is the aptly named “project.xml” along with “Gallery.xml”.

cat project.xml | head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="19.1.4.0011" DbPrjVer="14"-->
<SM_Project DbId="db65f2ee-2bff-41cd-b478-f96c26e9609f">
 <FieldsBlob>000000010000000700000026005400650078007400520065006e006400650072004900740065006d005600650063004200410000000c00ffffffff0000002400520065006e0064006500720043006100630068006500560065007200730069006f006e0000000200000000010000001e00500072006f006a00650063007400460065006100740075007200650073000000050000000000000000010000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000030000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800330033003400320034003300380036002d0034006400330030002d0034003600610035002d0061006100340033002d006100330035003200620066006500370038003200640063000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>86f03abc-9354-47d9-9006-a55b6b1d49cf</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>6CB133A11B81</SysId>
 <ProjectId>0</ProjectId>

It appears the version of DaVinci Resolve is pretty important. If you try to open a DRP file without using the most up-to-date software, you might run into problems. From what I can see, every time a new major version is released, the updates to the XML cause the project to error when imported. So knowing the version of the DRP file can be a critical piece of metadata needed in understanding the format. There are some helpful apps created by DaVinci Resolve users you can try, or you can try a little Python script to report back the version used in a DRP or a whole folder of DRP files.
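For illustration, here is one way such a version check could look in Python. It is a sketch, not necessarily the script mentioned above, and it assumes the version lives in the `DbAppVer`/`DbPrjVer` comment near the top of project.xml, as in the sample shown. The function name is hypothetical.

```python
import re
import zipfile

# Matches the comment line seen in the sample: <!--DbAppVer="..." DbPrjVer="..."-->
VERSION_COMMENT = re.compile(rb'DbAppVer="([^"]+)"\s+DbPrjVer="([^"]+)"')


def drp_versions(source):
    """Return (app_version, project_version) from a DRP, or None if not found.

    `source` may be a file path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            with archive.open("project.xml") as member:
                head = member.read(1024)  # the comment sits near the top
    except (zipfile.BadZipFile, KeyError):
        return None
    match = VERSION_COMMENT.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii"), match.group(2).decode("ascii")
```

To cover a whole folder, you could loop over `pathlib.Path(folder).glob("*.drp")` and call `drp_versions` on each path.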

There is one other file used by the DaVinci Resolve software. It uses the DRT extension and is for exporting and importing single timelines to the software. Like a DRP it is a simple project file that only points to the media used in the project and only stores the settings needed.

Path = timeline.drt
Type = zip
Physical Size = 215159

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42 .....        45726         8888  project.xml
2021-04-21 21:16:42 .....       670306       198698  MediaPool/Master/MpFolder.xml
2021-04-21 21:16:42 .....        98268         7089  SeqContainer/7eb849f3-41cb-4e3f-baa8-d5b134b57aa7.xml
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42             814300       214675  3 files

This DRT file also has a project.xml file, but doesn’t have the Gallery.xml file we normally find in a DRP file. We can use this to distinguish the difference. The project.xml is the same as the DRP, so this distinction is important.

cat project.xml |head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="17.1.1.0009" DbPrjVer="10"-->
<SM_Project DbId="ec6cb2e2-0b3c-43b8-8f90-a5fcb973af3b">
 <FieldsBlob>00000001000000040000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000020000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800660030003800380038003300390038002d0066006400620037002d0034006300320036002d0061003700310032002d003300360038006200300036003300300065003400330031000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>04d71873-a504-40c6-bde5-41709691a2c9</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>94F6D6F3F60F</SysId>
 <ProjectId>0</ProjectId>

Both formats use the XML root tag “SM_Project”. This can also be used to define a signature for the two formats, as “project.xml” could be used by a different format and we don’t want a false identification.
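Put together, the two clues (the “SM_Project” root tag and the presence of Gallery.xml) are enough for a rough classifier. The Python sketch below is based only on the samples in this post, so treat the rules as assumptions rather than a specification. The function name and return labels are my own.

```python
import zipfile
import xml.etree.ElementTree as ET


def classify_resolve_export(source) -> str:
    """Return 'DRP', 'DRT', or 'unknown' for a path or binary file object."""
    root_tag = None
    try:
        with zipfile.ZipFile(source) as archive:
            names = set(archive.namelist())
            if "project.xml" not in names:
                return "unknown"
            with archive.open("project.xml") as member:
                # Stop at the first start event: that element is the root.
                for _event, element in ET.iterparse(member, events=("start",)):
                    root_tag = element.tag
                    break
    except (zipfile.BadZipFile, ET.ParseError):
        return "unknown"
    if root_tag != "SM_Project":
        return "unknown"
    # In the samples here, only full project exports carry Gallery.xml.
    return "DRP" if "Gallery.xml" in names else "DRT"
```

Checking the root tag first keeps an unrelated ZIP that happens to contain a project.xml from being reported as a Resolve file.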

I was able to trace the use of the DRP format back to DaVinci Resolve version 9. In version 8, it appears projects are exported using the name and extension “Default Project.resolve.zip”. From what I could find, DaVinci Resolve version 9 was a big re-write, so it makes sense to settle on a more useful extension. The project.xml file in a version 8 export is slightly different.

cat project.xml | head
<SM_Project DbId="9ba0c4dc-d99c-4b7f-b0da-d254d91e34e2" DbAppVer="8.2 (#153)">
 <LockId></LockId>
 <User>159415b8-7515-43bf-b5f5-00d98949434b</User>
 <UserId>-1</UserId>
 <SysId>7cd1c388ea29</SysId>
 <ProjectId>0</ProjectId>
 <RevivalTaskSetID>-1</RevivalTaskSetID>
 <PlayHeadsSplitDisplay>false</PlayHeadsSplitDisplay>
 <pGallery>
  <Gallery::GyGallery DbId="9884d8ff-096e-4df0-b833-0e75e6e07e15">

Still uses the “SM_Project” root tag, but displays the DbAppVer information differently. It would be good to find more examples of the version 8 and earlier to see how this format has evolved over time. For now, I have created a signature you can test if you happen to have any DRP files in your archive.
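Because the version string sits in two different places, anything that reports it has to look in both. A small Python sketch that works on the first lines of project.xml as text follows; the two patterns reflect only the layouts shown in this post, and the function name is illustrative.

```python
import re

# Newer layout: <!--DbAppVer="19.1.4.0011" DbPrjVer="14"--> before the root tag.
IN_COMMENT = re.compile(r'<!--\s*DbAppVer="([^"]*)"')
# Version 8 layout: DbAppVer is an attribute on the SM_Project root element.
IN_ROOT_ATTRIBUTE = re.compile(r'<SM_Project\b[^>]*\bDbAppVer="([^"]*)"')


def resolve_app_version(xml_head: str):
    """Return the DbAppVer value from either layout, or None."""
    for pattern in (IN_COMMENT, IN_ROOT_ATTRIBUTE):
        match = pattern.search(xml_head)
        if match:
            return match.group(1)
    return None
```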

Camtasia

February 14, 2025 by Thor

Not to be confused with Fantasia, a magical screen recording tool has been around for many years. Books have been written on the use of this software to instruct others on how to teach and demonstrate other software and ideas.

Unlike Fantasia, the screen recording software Camtasia was not made by Disney, but does contain some proprietary data. Camtasia is a screen recording software by the developer TechSmith. First released in 2002, it was available first for Windows and much later, Macintosh.

The first versions of Camtasia would encode screen recordings in an AVI container, using the TSCC codec. The TSCC codec, aka TechSmith Screen Capture Codec, was developed by TechSmith and the codec was distributed freely. Let’s see what MediaInfo knows about it.

mediainfo Camtasia1-s01.avi 
General
Complete name                            : Camtasia1-s01.avi
Format                                   : AVI
Format/Info                              : Audio Video Interleave
Format settings                          : BitmapInfoHeader
File size                                : 1.66 MiB
Duration                                 : 2 s 333 ms
Overall bit rate                         : 5 966 kb/s
Frame rate                               : 15.000 FPS

Video
ID                                       : 0
Format                                   : TechSmith
Codec ID                                 : tscc
Codec ID/Info                            : TechSmith Screen Capture
Duration                                 : 2 s 333 ms
Bit rate                                 : 87.3 kb/s
Width                                    : 320 pixels
Height                                   : 240 pixels
Display aspect ratio                     : 4:3
Frame rate                               : 15.000 FPS
Bit depth                                : 8 bits
Bits/(Pixel*Frame)                       : 0.076
Stream size                              : 24.9 KiB (1%)

The AVI video format was the default recording format for the first couple versions. In version 3 the default format changed to the proprietary CAMREC format.

Camrec video files are a proprietary TechSmith file format that is used to store multiple files and information in a single package. Overall, .camrec files store your screen and camera recording plus some meta data about the various streams.
However, it is important to note that you cannot view or play .camrec files outside of Camtasia Studio.

The CAMREC video format isn’t entirely proprietary and uses a common container.

hexdump -C Camtasia3-s01.camrec | head
00000000  d0 cf 11 e0 a1 b1 1a e1  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  3e 00 04 00 fe ff 0c 00  |........>.......|
00000020  06 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00000030  01 00 00 00 00 00 00 00  00 10 00 00 02 00 00 00  |................|
00000040  01 00 00 00 fe ff ff ff  00 00 00 00 00 00 00 00  |................|
00000050  fc 03 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000060  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

file Camtasia3-s01.camrec 
Camtasia3-s01.camrec: Composite Document File V2 Document, Cannot read section info

7z l Camtasia3-s01.camrec  

Scanning the drive for archives:
1 file, 4696576 bytes (4587 KiB)                

Path = Camtasia3-s01.camrec
Type = Compound
ERRORS:
Unexpected end of archive
Physical Size = 4698112
Extension = compound
Cluster Size = 4096
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....         3912         3968  manifest.camxml
                    .....      4672000      4673536  Screen_Stream.avi
------------------- ----- ------------ ------------  ------------------------
                               4675912      4677504  2 files

The CAMREC file might be unknown to most video players, but the AVI within the compound object is the same as the versions before it. Camtasia even has a built in extractor if you really need to pull the AVI out of the format.

7z l Camtasia8-s01.camrec
Path = Camtasia8-s01.camrec
Type = Compound
Physical Size = 33849344
Extension = compound
Cluster Size = 4096
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....         4798         8192  manifest.camxml
                    .....         4286         8192  cursor-1.ico
                    .....          766          768  cursor-0.ico
                    .....         9565        12288  Events.dat
                    .....           36           64  Keyboard.dat
                    .....     33764198     33767424  Screen_Stream.avi
------------------- ----- ------------ ------------  ------------------------
                              33783649     33796928  6 files

Each CAMREC file contains a manifest.camxml. They seem to be UTF-16 XML files, with and without the XML declaration. The Screen_Stream.avi file seems to be in all my samples, but it is not clear if there can be a variant without an AVI file.
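Once a manifest.camxml has been extracted (with 7z, for example), the mixed use of the XML declaration makes it a little awkward to parse. The Python sketch below handles both cases. It assumes that files without a byte order mark are little-endian UTF-16, which is a guess on my part, and it makes no assumptions about the element names inside the manifest.

```python
import re
import xml.etree.ElementTree as ET


def decode_manifest(raw: bytes) -> str:
    """Decode UTF-16 manifest bytes, with or without a byte order mark."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")      # the BOM selects the byte order
    return raw.decode("utf-16-le")       # assumption: little-endian without BOM


def parse_manifest(raw: bytes) -> ET.Element:
    """Return the root element of a manifest.camxml given its raw bytes."""
    text = decode_manifest(raw)
    # Drop the declaration if present so the already-decoded text parses cleanly.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    return ET.fromstring(text)
```

Usage would be along the lines of `parse_manifest(open("manifest.camxml", "rb").read())`, followed by walking the returned element to see which streams the recording lists.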

This CAMREC container was used in the Camtasia Studio software until version 8.4 when the default was changed to a new Codec, based on MPEG4, with the TREC extension.

mediainfo capture-1.trec 
General
Complete name                            : capture-1.trec
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42 (mp42/isom)
File size                                : 277 KiB
Duration                                 : 3 s 41 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 746 kb/s
Frame rate                               : 19.091 FPS
Encoded date                             : 2025-02-11 03:48:25 UTC
Tagged date                              : 2025-02-11 03:48:34 UTC
FileExtension_Invalid                    : braw mov mp4 m4v m4a m4b m4p m4r 3ga 3gpa 3gpp 3gp 3gpp2 3g2 k3g jpm jpx mqv ismv isma ismt f4a f4b f4v

Video
ID                                       : 1
Format                                   : tsc2-D0
Codec ID                                 : tsc2-D0
Duration                                 : 2 s 933 ms
Bit rate                                 : 495 kb/s
Width                                    : 924 pixels
Height                                   : 696 pixels
Display aspect ratio                     : 4:3
Frame rate mode                          : Variable
Frame rate                               : 19.091 FPS
Minimum frame rate                       : 10.000 FPS
Maximum frame rate                       : 30.000 FPS
Bits/(Pixel*Frame)                       : 0.040
Stream size                              : 177 KiB (64%)
Title                                    : 100
Encoded date                             : 2025-02-11 03:48:25 UTC
Tagged date                              : 2025-02-11 03:48:34 UTC

TechSmith Recording File (TREC) files will identify as MP4 in most identification tools; you will need MediaInfo or a similar tool to see which codec is used. If we look at the header of the MP4 TREC file:

hexdump -C Camtasia8.4-s01.trec | head
00000000  00 00 00 18 66 74 79 70  6d 70 34 32 00 00 00 00  |....ftypmp42....|
00000010  6d 70 34 32 69 73 6f 6d  00 00 00 88 66 72 65 65  |mp42isom....free|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000a0  00 00 00 01 6d 64 61 74  00 00 00 00 01 1f c4 b9  |....mdat........|
000000b0  01 02 02 01 40 00 00 00  7f 7f 7f 7f 7f 7f 7f 7f  |....@...........|
000000c0  7f 7f 7f 7f 7f 7f 7f 7f  7f 7f 7f 7f 7f 7f 7f 7f  |................|
*
000000f0  7f 7f 7f 7f 7f 7f 7f 63  da 11 00 00 d6 46 18 e0  |.......c.....F..|
00000100  77 ff 43 ff aa e4 eb 9c  dc 8f 9a 56 7a 30 71 ea  |w.C........Vz0q.|

We see the standard header for an MP4 file. The codec specific to the Camtasia software is identified later in the file, so identification using a PRONOM signature might be challenging. Looking at the hex of the file, near the end, you can find embedded PNGs and other data. VLC and FFmpeg can read the codec, but players like QuickTime struggle.
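To make the header dump a little easier to read, here is a small Python sketch that walks the top-level boxes of an MP4-style file. It follows the usual box layout (a 4-byte big-endian size, a 4-byte type, and a 64-bit size when the first size field is 1). The function name is my own, and nothing here is specific to TREC.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (offset, box_type, box_size) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        yield offset, box_type.decode("latin-1"), size
        if size < 8:
            break  # malformed size; stop rather than loop forever
        offset += size
```

Run against the bytes in the dump above, this reports an ftyp box of 0x18 bytes, a free box of 0x88 bytes, and an mdat box at offset 0xa0 that uses the 64-bit size form.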

A promising section near the end shows the name and version of Camtasia Studio. More data needed.

hexdump -C Camtasia8.4-s02.trec
0569a3b0  00 00 00 00 00 00 00 00  00 00 01 54 53 43 52 00  |...........TSCR.|
0569a3c0  00 00 00 00 00 00 50 01  00 00 00 00 00 00 00 43  |......P........C|
0569a3d0  61 6d 74 61 73 69 61 20  53 74 75 64 69 6f 00 00  |amtasia Studio..|
0569a3e0  00 00 00 38 2e 34 00 00  00 00 00 00 00 00 00 00  |...8.4..........|
0569a3f0  00 00 00 57 69 6e 00 60  bb c4 00 00 00 00 00 00  |...Win.`........|
0569a400  00 00 00 00 00 00 00 00  00 00 01 54 53 43 4d 00  |...........TSCM.|
0569a410  00 00 00 00 00 95 2c 00  00 00 01 44 41 54 41 00  |......,....DATA.|

Camtasia also uses a lot of project files to manage the video editing process of your screen recordings. The project files can vary between the Windows and Macintosh versions.

The older versions of Camtasia for Windows, up until version 8.4, used the CAMPROJ extension for their projects. These are XML and simply use “<Project_Data>” for the root element, with version 8 adding a later element, “<CSMLData>”, to manage the assets. Other projects also have a File element that begins with either “tscrec4://” or “TSCRec://”. But it may be best to identify the older versions by the “<ClipBin_Array>” element.
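A quick way to test those markers against a pile of CAMPROJ files is a plain substring check. This is only a sketch: the function name `camproj_hints` is hypothetical, it expects the file already decoded to text, and it does not validate the XML.

```python
def camproj_hints(text: str) -> dict:
    """Report which of the CAMPROJ markers mentioned above occur in the text."""
    return {
        "project_data_root": "<Project_Data" in text,
        "clipbin_array": "<ClipBin_Array" in text,
        "csml_data": "<CSMLData" in text,
        "tscrec_uri": "tscrec4://" in text or "TSCRec://" in text,
    }
```

Running this over a sample set would show how consistently each marker appears before committing one to a signature.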

Camtasia for Mac version 2 used the CMPROJ extension for projects, which is actually an Apple bundle/package. It also used a recording file with the CMREC extension, which is likewise an Apple bundle/package containing MOV and DAT files.

The most recent versions of Camtasia for Mac and Windows use the TSCPROJ extension. They are plain text files that resemble JSON.

{
  "title" : "",
  "description" : "",
  "author" : "",
  "width" : 854.0,
  "height" : 480.0,
  "version" : "0.5",
  "editRate" : 30,
  "authoringClientName" :  {
    "name" : "Camtasia",
    "platform" : "Mac",
    "version" : "3.1.7"
  }
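The excerpt above is cut off, but if a complete TSCPROJ file parses as strict JSON (an assumption on my part, since I have only said it resembles JSON), the authoring client could be pulled out with Python's standard json module. The function name is made up; the key names come from the excerpt.

```python
import json


def tscproj_client(text: str):
    """Return (name, platform, version) from authoringClientName, or None."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return None  # not strict JSON after all
    client = doc.get("authoringClientName") if isinstance(doc, dict) else None
    if not isinstance(client, dict):
        return None
    return client.get("name"), client.get("platform"), client.get("version")
```

A None result on real files would be a hint that the format strays from strict JSON somewhere.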

There are a few formats related to Camtasia, but the CAMREC format is the one that shows up the most in my work. So today I am only proposing a signature for CAMREC and the CAMPROJ formats. We will have to have some discussion on the TREC format to determine if standard MPEG-4 identification is fine or if the format needs its own PUID. You can find some examples and my proposed signature on my Github page.

CD Architect

December 13, 2024 by Thor

Receiving electronic media from an outside source can be an adventure. Often times you find yourself sorting the valuable files and separating them from the chaff. There can be hidden files, cache files, application files, drivers, and everything in between. Determining what formats are important can sometimes be difficult, especially if you don’t know the file format of some of the files.

I was recently working on a collection of files which had been produced through some audio software. When working with audio, a WAVE file is what is usually kept, as it contains the actual audio data. These files came with a couple of other formats. One of those formats was a bunch of SFK peak files. These files are meant to be temporary, as they are generated from the WAVE file to make opening the audio data faster. They are important, but can easily be regenerated. One could argue they have historical value, but they don’t contain anything that can be used by itself, so alone they don’t have much value.

The other format found with the WAVE files has a CDP extension. These came up as unknown when using DROID. It is not a common extension, so finding the name of the software which created the files wasn’t too hard. Let’s take a look at one of them.

hexdump -C tutor1.cdp | head
00000000  52 49 46 46 79 03 00 00  53 46 50 4a 66 6d 74 20  |RIFFy...SFPJfmt |
00000010  18 00 00 00 00 00 01 00  02 00 00 00 10 00 00 00  |................|
00000020  44 ac 00 00 03 00 00 00  01 00 00 00 4c 49 53 54  |D...........LIST|
00000030  88 00 00 00 66 6c 73 74  66 69 6c 65 23 00 00 00  |....flstfile#...|
00000040  44 3a 5c 53 6f 75 6e 64  73 5c 4e 65 77 20 54 75  |D:\Sounds\New Tu|
00000050  74 6f 72 20 66 69 6c 65  73 5c 53 6f 6e 67 33 2e  |tor files\Song3.|
00000060  77 61 76 00 66 69 6c 65  23 00 00 00 44 3a 5c 53  |wav.file#...D:\S|
00000070  6f 75 6e 64 73 5c 4e 65  77 20 54 75 74 6f 72 20  |ounds\New Tutor |
00000080  66 69 6c 65 73 5c 53 6f  6e 67 32 2e 77 61 76 00  |files\Song2.wav.|
00000090  66 69 6c 65 23 00 00 00  44 3a 5c 53 6f 75 6e 64  |file#...D:\Sound|

Huh, this is a RIFF file. RIFF is most commonly used as the container for WAVE and AVI files. You can read more about the RIFF format in a previous post. The RIFF container format can be used for all sorts of things. Looking at the internals we can see a few unique LIST chunks.
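To see those chunks without squinting at hex, here is a short Python sketch of a RIFF chunk walker that collects the paths stored in the “file” chunks of the “flst” LIST. The chunk names come from the dump above. The function names are my own, and treating each “file” payload as a Latin-1 path is an assumption based on this one sample.

```python
import struct


def iter_chunks(data: bytes, start: int, end: int):
    """Yield (chunk_id, payload) for RIFF chunks between start and end."""
    pos = start
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        # Chunk payloads are padded to an even length.
        pos += 8 + size + (size & 1)


def referenced_files(data: bytes) -> list:
    """Return the paths held in 'file' chunks inside the 'flst' LIST."""
    if data[:4] != b"RIFF" or data[8:12] != b"SFPJ":
        raise ValueError("not a RIFF/SFPJ file")
    paths = []
    for chunk_id, payload in iter_chunks(data, 12, len(data)):
        if chunk_id == b"LIST" and payload[:4] == b"flst":
            for sub_id, sub in iter_chunks(payload, 4, len(payload)):
                if sub_id == b"file":
                    paths.append(sub.rstrip(b"\x00").decode("latin-1"))
    return paths
```

In the dump, each path is 0x23 (35) bytes long, so the zero byte after “.wav” is the RIFF pad byte rather than part of the chunk. That is why the walker skips one extra byte after odd-sized chunks.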

Lots of references to other files, specifically WAVE files. But not a lot of actual data. That is because this format turns out to be just a project format for some software called “CD Architect“. Sonic Foundry was an audio software developer for a few years before they sold their catalog to Sony in 2003. In looking at the manual for CD Architect version 5.2, it explains the CDP Project format.

CD Architect software handles the organization of your CD using a small project file (CDP) that saves information about source file locations, edits, cuts, and insertion points. This project file is not a multimedia file, but is instead used to create the CD when editing is finished.

Looking at another CDP file from the collection, I noticed something different.

hexdump -C CDArch50a-s01.cdp | head
00000000  72 69 66 66 2e 91 cf 11  a5 d6 28 db 04 c1 00 00  |riff......(.....|
00000010  20 0a 00 00 00 00 00 00  84 38 15 b3 da 08 85 44  | ........8.....D|
00000020  b2 2a 5b 70 a1 32 15 ff  5a 2d 8f b2 0f 23 d2 11  |.*[p.2..Z-...#..|
00000030  86 af 00 c0 4f 8e db 8a  00 02 00 00 00 00 00 00  |....O...........|
00000040  78 00 00 00 00 00 04 00  11 00 00 00 44 ac 00 00  |x...........D...|
00000050  00 00 00 00 00 c0 52 40  00 00 00 00 00 00 5e 40  |......R@......^@|
00000060  00 00 00 00 00 00 00 00  04 00 04 00 40 00 00 00  |............@...|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 7c 00 00 00  |............|...|
00000080  50 00 00 00 a0 00 00 00  00 00 00 00 00 00 00 00  |P...............|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

That’s odd. RIFF identifiers are always uppercase ASCII, and this one is lowercase. Also the important RIFF form type, which was “SFPJ” in the other sample, is missing. This is not a valid RIFF file.

But further down in the file I can see the same list chunks. Did they take the RIFF format and make a proprietary version of their own? I think they may have. It seems the first example was from CD Architect version 4 and these other files are from CD Architect version 5. That complicates things. Sony stopped developing CD Architect after version 5.2d and maintained it for a few years before selling many of their titles to MAGIX Software. As far as I know, there were never any new versions released. The software was very popular, as it had some really nice audio mastering features and was easy to use. Many were upset when the software was abandoned.

Creating a signature for both version 4 and version 5 CDP files will be pretty straightforward. I feel knowing what you have in a collection you are processing is the first step in making informed decisions. Whether or not you keep the project files is up for debate. Some may only want the final audio created from a CD Architect project, while others may want to see the way the audio was put together and mixed. Either way, the more you know…..
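As a rough illustration of telling the two layouts apart, here is a Python sketch that checks the leading bytes of each sample shown above. It is not the PRONOM signature itself. The version labels follow my guess about versions 4 and 5, and treating all 16 leading bytes of the second sample as constant is an assumption. As far as I can tell, those 16 bytes also match the “riff” chunk GUID used by Sonic Foundry's Wave64 layout, which would explain the resemblance, but I have not confirmed that for CDP files.

```python
# First 16 bytes of the second sample (CDArch50a-s01.cdp), copied from the dump.
LOWERCASE_RIFF_HEADER = bytes.fromhex("726966662e91cf11a5d628db04c10000")


def classify_cdp(data: bytes) -> str:
    """Classify a CDP file by the two header layouts seen in the samples."""
    if data[:4] == b"RIFF" and data[8:12] == b"SFPJ":
        return "riff-sfpj"  # first sample, thought to be version 4
    if data[:16] == LOWERCASE_RIFF_HEADER:
        return "lowercase-riff"  # second sample, thought to be version 5
    return "unknown"
```

More samples would show whether the bytes after the lowercase “riff” really are fixed from file to file.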

One more thing. CD Architect would default to saving a CDP project file, but could also save a “CD Image file”. This process actually would save the project to a full WAVE file with some extras baked in.

An image file is essentially a wave file with volume, crossfades, effects, mixes, and track information embedded. Burning an image file will reduce the risk of buffer underruns (especially if you have a complex project or are using a slow computer) since no audio processing is required.

Interesting. Normally when working with track information in a single WAVE file you would need a companion CUE sheet in order to reference the track layout of the audio CD. So I am curious how they do all of this. Let’s take a look at a “CD Image”.

mediainfo CDArch52d-s02.wav
General
Complete name                            : CDArch52d-s02.wav
Format                                   : Wave
Format settings                          : PcmWaveformat
File size                                : 5.05 MiB
Duration                                 : 30 s 0 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 411 kb/s
Conformance errors                       : 2
 RIFF                                    : Yes
  General compliance                     : File size 5292434 is less than expected size 5292823 (offset 0x8)
 WAVE                                    : Yes
  General compliance                     : Element size 5292811 is more than maximal permitted size 5292422 (offset 0xC)

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 30 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 411.2 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits
Stream size                              : 5.05 MiB (100%)

Already seeing some issues with the format, but all the important bits are there. JHOVE doesn’t like them much either.

JhoveView (Rel. 1.32.0, 2024-09-12)
 Date: 2024-12-11 16:01:08 MST
 RepresentationInformation: CDArch52d-s02.wav
  ReportingModule: WAVE-hul, Rel. 1.8.3 (2024-03-05)
  LastModified: 2024-12-11 15:58:02 MST
  Size: 5292434
  Format: WAVE
  Status: Not well-formed
  SignatureMatches:
   WAVE-hul
  InfoMessage: Ignored unrecognized list type: "pqls"
   ID: WAVE-HUL-15
   Offset: 5292044
  ErrorMessage: Unexpected end of file: Bytes missing = 389
   ID: WAVE-HUL-3
   Offset: 5292434
  MIMEtype: audio/vnd.wave; codec=1
  Profile: PCMWAVEFORMAT

JHOVE is giving me two issues. The major error is that the file appears truncated, according to both MediaInfo and JHOVE. The InfoMessage is less of an issue and more of a heads-up that the WAVE file has an extra LIST type, “pqls”, which was also in the CDP RIFF file we looked at earlier. So it seems making a “CD Image” of a project embeds the project chunk data into the WAVE container. Identification is not an issue, as these WAVEs follow the standard pattern and therefore identify correctly, but one might want to be aware, through further characterization, that these WAVEs carry some not-so-obvious extra data.
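The two tools agree on the numbers. MediaInfo expects 5292823 bytes but finds 5292434, a difference of 389, which is exactly the “Bytes missing” figure from JHOVE. Here is a small Python sketch of that check. The function name is my own, and the declared size of 5292815 is inferred from MediaInfo's expected size minus the 8-byte RIFF prefix, not read from the file.

```python
import struct


def riff_shortfall(header: bytes, actual_size: int) -> int:
    """Bytes missing relative to the size declared in the RIFF header."""
    if header[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (declared,) = struct.unpack_from("<I", header, 4)
    # The size field does not count the 8 bytes of 'RIFF' plus the size itself.
    expected = declared + 8
    return max(0, expected - actual_size)


# Inferred header for CDArch52d-s02.wav: declared size 5292815 (5292823 - 8).
sample_header = b"RIFF" + struct.pack("<I", 5292815) + b"WAVE"
print(riff_shortfall(sample_header, 5292434))  # 389
```

The same 389 falls out of MediaInfo's second message as well (5292811 minus 5292422), so both complaints point at one discrepancy rather than two.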

My attempts to find any samples from version 3 of CD Architect have failed. Until then, my proposal is to add version 4 & 5 to PRONOM with the signature on my Github page. There you will find a few samples as well.

RealVideo

November 15, 2024 by Thor

For #WDPD24 and PRONOM Hackathon week this year, I wanted to find some older formats listed in PRONOM which did not have a signature. There is a list to choose from, but I wanted to find something I hadn’t worked on before. I came across two entries for Real Video:

PUID	Name	Extension
fmt/204	RealVideo Clip	rv
x-fmt/277	Real Video	rv

I was familiar with Real Media and Real Audio, but had yet to come across any RealVideo with the RV extension. I thought it would be easy to find some references and samples, but that was not the case. I assume PRONOM originally added these based on MIME types available.

Real, or RealNetworks, is/was an Internet media company that jumped on the rapidly growing World Wide Web in 1995 to become a leader in Internet media delivery. Their initial offerings mainly focused on audio streaming, and they accomplished all of this by providing free players and web browser extensions that made it easy to serve up a website with streaming media everyone could enjoy. They later added video streaming optimized for the slower dial-up connections of the day. They used codecs based on common technology like H.263 and H.264, but used them to make their own proprietary codecs, identified through the FourCC codes RV10-RV60.

I thought it would be easy to find a reference to the RV extension, but I quickly discovered it wasn’t. Looking at the Wikipedia page on RealVideo, I found no reference to the RV extension. RV is an abbreviation for RealVideo, right? Well, I ended up finding a reference on the RealAudio page under file extensions. OK, first clue to the existence of the RV extension. The page references RV as being used for video-only files and says it was used by the flagship encoder (RealProducer).

RealProducer was the tool for creating the streaming audio and video formats that could then be used for your website or streaming platform. The RealProducer software came in a Basic version, which was free, and the Plus or Pro version, which was not free and provided more options. The first version of RealProducer to make video files was version 4. I was able to find a copy of the encoder and installed it under a Windows 95 emulator. To my surprise it only saved to the RealMedia RM file format. This format is well known and identified with PRONOM as x-fmt/190 also documented at the LoC.

This was the same with RealProducer 5, 7, 8, 9, and 10, all of which I was able to try. None made any mention of the RV extension. I was starting to feel this format didn’t exist, or that some people decided to use the RV extension on their own. Searches on Google yielded a couple of results, mostly from users who had found a few files on their older discs and wanted to migrate them to something newer. I was able to find one example a user had shared, but it had the same header as the RealMedia format. The clue was in the file.

hexdump -C ambush_abb.rv
00000000  2e 52 4d 46 00 00 00 12  00 01 00 00 00 00 00 00  |.RMF............|
00000010  00 07 50 52 4f 50 00 00  00 32 00 00 00 03 6e e8  |..PROP...2....n.|
00000020  00 03 6e e8 00 00 03 e0  00 00 01 b3 00 00 6a 6f  |..n...........jo|
00000030  00 06 80 fa 00 00 08 b5  00 ba 41 73 00 00 03 55  |..........As...U|
00000040  00 03 00 09 43 4f 4e 54  00 00 00 40 00 00 00 00  |....CONT...@....|
00000050  00 00 00 08 28 43 29 20  32 30 30 35 00 26 00 00  |....(C) 2005.&..|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000270  00 09 61 75 64 69 6f 4d  6f 64 65 00 00 00 02 00  |..audioMode.....|
00000280  06 76 6f 69 63 65 00 00  00 00 2d 00 00 0d 43 72  |.voice....-...Cr|
00000290  65 61 74 69 6f 6e 20 44  61 74 65 00 00 00 02 00  |eation Date.....|
000002a0  13 39 2f 32 30 2f 32 30  30 36 20 31 34 3a 30 37  |.9/20/2006 14:07|
000002b0  3a 30 38 00 00 00 00 53  00 00 0c 47 65 6e 65 72  |:08....S...Gener|
000002c0  61 74 65 64 20 42 79 00  00 00 02 00 3a 52 65 61  |ated By.....:Rea|
000002d0  6c 50 72 6f 64 75 63 65  72 28 52 29 20 42 61 73  |lProducer(R) Bas|
000002e0  69 63 20 31 31 2e 30 20  66 6f 72 20 57 69 6e 64  |ic 11.0 for Wind|
000002f0  6f 77 73 2c 20 42 75 69  6c 64 20 31 31 2e 30 2e  |ows, Build 11.0.|
00000300  30 2e 32 30 30 39 00 00  00 00 31 00 00 11 4d 6f  |0.2009....1...Mo|
00000310  64 69 66 69 63 61 74 69  6f 6e 20 44 61 74 65 00  |dification Date.|
00000320  00 00 02 00 13 39 2f 32  30 2f 32 30 30 36 20 31  |.....9/20/2006 1|
00000330  34 3a 30 37 3a 30 38 00  00 00 00 1d 00 00 09 76  |4:07:08........v|
00000340  69 64 65 6f 4d 6f 64 65  00 00 00 02 00 07 6e 6f  |ideoMode......no|
00000350  72 6d 61 6c 00 44 41 54  41 00 ba 3e 1e 00 00 00  |rmal.DATA..>....|

RealProducer Basic 11 for Windows. The Wikipedia article did hint at this by saying “the latest version of RealProducer reverted to using .ra for audio only files and began using .rv for video files with or without audio.” Why would they use the RM extension for so long, then revert to a different extension with a later version? I found more in the User Manual for version 11.

• .rv – RealVideo
    RealProducer uses the .rv file extension if the input is video-only or video-with-audio. You can also select the .rm file extension for video content.
    Tip: Using the .rv file extension helps search engines identify the file as a RealVideo clip.

• .rm – RealAudio or RealVideo
    RealProducer chooses the .rm file extension if it cannot determine the content of the input clip. You can use .rm file extension for any RealAudio or RealVideo clip, except for variable bit-rate clips.

Ok, so a few things to learn from this. One is the RV extension was used as the default for version 11 as they wanted search engines to identify them as a RealVideo clip. Second thing we learned is there is no difference between the two placeholders in PRONOM, one being a RealVideo file and the other being a RealVideo Clip. We don’t need both.

Now, is there any difference between an RV and RM file?

hexdump -C Producer11-01.rv | head
00000000  2e 52 4d 46 00 00 00 12  00 01 00 00 00 00 00 00  |.RMF............|
00000010  00 07 50 52 4f 50 00 00  00 32 00 00 00 03 6e e8  |..PROP...2....n.|
00000020  00 03 6e e8 00 00 03 e0  00 00 01 c7 00 00 01 66  |..n............f|
00000030  00 00 1b 57 00 00 07 41  00 02 91 0a 00 00 03 5e  |...W...A.......^|
00000040  00 03 00 09 43 4f 4e 54  00 00 00 40 00 00 00 00  |....CONT...@....|
00000050  00 00 00 08 28 43 29 20  32 30 30 35 00 26 00 00  |....(C) 2005.&..|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  00 00 00 00 4d 44 50 52  00 00 00 70 00 00 00 00  |....MDPR...p....|
00000090  00 02 c2 a4 00 02 c2 a4  00 00 03 e0 00 00 01 9f  |................|

hexdump -C Producer11-01.rm | head
00000000  2e 52 4d 46 00 00 00 12  00 01 00 00 00 00 00 00  |.RMF............|
00000010  00 07 50 52 4f 50 00 00  00 32 00 00 00 03 6e e8  |..PROP...2....n.|
00000020  00 03 6e e8 00 00 03 e0  00 00 01 a4 00 00 01 64  |..n............d|
00000030  00 00 1b 57 00 00 05 a4  00 02 5c 35 00 00 03 5e  |...W......\5...^|
00000040  00 03 00 09 43 4f 4e 54  00 00 00 40 00 00 00 00  |....CONT...@....|
00000050  00 00 00 08 28 43 29 20  32 30 30 35 00 26 00 00  |....(C) 2005.&..|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  00 00 00 00 4d 44 50 52  00 00 00 70 00 00 00 00  |....MDPR...p....|
00000090  00 02 c2 a4 00 02 c2 a4  00 00 03 e0 00 00 01 a4  |................|

They both look very similar to me. Aside from a few bytes, they are practically identical. Let’s see what MediaInfo has to say.

mediainfo Producer11-01.rv
General
Complete name                            : Producer11-01.rv
Format                                   : RealMedia
File size                                : 164 KiB
Duration                                 : 6 s 999 ms
Overall bit rate                         : 225 kb/s
Frame rate                               : 24.000 FPS
Copyright                                : (C) 2005
FileExtension_Invalid                    : rm rmvb ra

Video
ID                                       : 0
Format                                   : RealVideo 4
Codec ID                                 : RV40
Codec ID/Info                            : Based on AVC (H.264), Real Player 9
Duration                                 : 6 s 999 ms
Bit rate                                 : 181 kb/s
Width                                    : 640 pixels
Height                                   : 424 pixels
Display aspect ratio                     : 3:2
Frame rate                               : 24.000 FPS
Bits/(Pixel*Frame)                       : 0.028
Stream size                              : 155 KiB (94%)

Audio
ID                                       : 1
Format                                   : Cooker
Codec ID                                 : cook
Codec ID/Info                            : Based on G.722.1, Real Player 6
Duration                                 : 7 s 429 ms
Bit rate                                 : 44.1 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits
Stream size                              : 40.0 KiB (24%)

mediainfo Producer11-01.rm
General
Complete name                            : Producer11-01.rm
Format                                   : RealMedia
File size                                : 151 KiB
Duration                                 : 6 s 999 ms
Overall bit rate                         : 225 kb/s
Frame rate                               : 24.000 FPS
Copyright                                : (C) 2005

Video
ID                                       : 0
Format                                   : RealVideo 4
Codec ID                                 : RV40
Codec ID/Info                            : Based on AVC (H.264), Real Player 9
Duration                                 : 6 s 999 ms
Bit rate                                 : 181 kb/s
Width                                    : 640 pixels
Height                                   : 424 pixels
Display aspect ratio                     : 3:2
Frame rate                               : 24.000 FPS
Bits/(Pixel*Frame)                       : 0.028
Stream size                              : 155 KiB

Audio
ID                                       : 1
Format                                   : Cooker
Codec ID                                 : cook
Codec ID/Info                            : Based on G.722.1, Real Player 6
Bit rate                                 : 44.1 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits

Other than the RV file being flagged with an invalid file extension, they both identify as RealMedia files and report the same stream properties. So it seems the RV file is really no different than the RM file. I think the best course of action for PRONOM is to deprecate these two RV PUIDs and just add RV as an acceptable extension for the RealMedia format.
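One more way to compare the two files is to list their chunk headers side by side. In the hex dumps, each chunk appears to start with a 4-byte ID, a 4-byte big-endian size that includes the header, and a 2-byte version. This Python sketch walks a file on that reading. The layout is inferred from the dumps and the function name is my own.

```python
import struct


def rm_chunks(data: bytes):
    """Yield (offset, chunk_id, size, version) for each RealMedia chunk header."""
    pos = 0
    while pos + 10 <= len(data):
        chunk_id, size, version = struct.unpack_from(">4sIH", data, pos)
        yield pos, chunk_id.decode("latin-1"), size, version
        if size < 10:
            break  # malformed size; stop rather than loop forever
        pos += size
```

On the dumps above this lands on .RMF at 0x00, PROP at 0x12, CONT at 0x44 and MDPR at 0x84 for both the RV and RM samples, which fits the idea that only the extension differs.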

To add to the evidence, here is the output from ffprobe:

Input #0, rm, from 'Producer11-01.rm':
  Metadata:
    copyright       : (C) 2005
    comment         : 
    ASMRuleBook     : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
    Audiences       : 256k DSL or Cable;
    audioMode       : music
    Creation Date   : 11/12/2024 20:28:55
    Generated By    : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
    Modification Date: 11/12/2024 20:28:55
    videoMode       : normal
  Duration: 00:00:07.00, start: 0.000000, bitrate: 176 kb/s
  Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
  Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s

Input #0, rm, from 'Producer11-01.rv':
  Metadata:
    copyright       : (C) 2005
    comment         : 
    ASMRuleBook     : #($Bandwidth >= 0),Stream1Bandwidth = 44100, Stream0Bandwidth = 180900;
    Audiences       : 256k DSL or Cable;
    audioMode       : music
    Creation Date   : 11/12/2024 20:28:16
    Generated By    : RealProducer(R) Plus 11.1 for Windows, Build 11.1.0.2676
    Modification Date: 11/12/2024 20:28:16
    videoMode       : normal
  Duration: 00:00:07.43, start: 0.000000, bitrate: 181 kb/s
  Stream #0:0: Video: rv40 (RV40 / 0x30345652), yuv420p, 640x424, 180 kb/s, 24 fps, 24 tbr, 1k tbn
  Stream #0:1: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s

But wait, there are a couple of formats related to RealProducer that we could add. RealProducer used a few other formats to manage projects and other metadata for streaming. They include:

.RP RealPix Image
.RT RealText
.RPAD RealProducer Audience File
.RPJF RealProducer Job File
.RPSD RealProducer Server Destination
.RMHD RealMediaHD file
.RAM Playlist
.RPM Embedded RAM

File Type	Extension	MIME Type
Ram	.ram	audio/x-pn-realaudio
Embedded Ram	.rpm	audio/x-pn-realaudio-plugin
SMIL	.smil and .smi	application/smil
RealAudio	.ra	audio/x-pn-realaudio
RealVideo	.rm	application/x-pn-realmedia
Flash	.swf	application/x-shockwave-flash
RealPix	.rp	image/vnd.rn-realpix
RealText	.rt	text/vnd.rn-realtext

https://web.archive.org/web/20120513203726/http://service.real.com/help/library/guides/production8/htmfiles/server.htm

Don’t get excited, the RealPix Image format really isn’t an image, it is simply an XML file with all the details of an image or group of images. Pretty boring. It was however a big thing in the day, even got a full guide written up for the process. “All information in the file occurs between an opening <imfl> tag and a closing </imfl> tag. This is the only tag that uses an end tag.” This format was the topic of discussion as malicious code could be in the RP file and executed just by having someone load your webpage. IMFL is obviously an acronym, but none of the documents I could find tells me what it stands for, so I did what everyone does now, I asked ChatGPT.

The RealPix format by RealNetworks, which was used for interactive multimedia content, indeed utilized IMFL as its tagged format. IMFL stands for “Interleaved Media File Language.” This markup was particularly designed to handle multimedia presentations, allowing the synchronization of images, audio, and video in a slideshow-style format. It used XML-like syntax where elements like <imfl>, <head>, and <fadein/> defined media objects, transitions, and their timing. Key components included attributes for positioning, color, and animation effects, making RealPix a flexible format for creating multimedia sequences compatible with RealPlayer.

For technical details, the RealPix format closely resembles SMIL (Synchronized Multimedia Integration Language) and supports strict tag closure and case sensitivity. This means all tags and attribute names must be lowercase, and attributes must be in double quotes, as seen in SMIL and RealSystem G2 markup, RealNetworks’ broader multimedia framework.

When I asked for a source, it could not give me one. So not sure if it is the correct answer, but it seems to fit. Here are some samples of RP, RT and SMIL files.

For RealText with the RT extension, we find a similar tagged text. This format is used to provide text presentations to go along with Images, Audio, or Video. The tagged text then describes when and how the text is displayed. This is all done in a player window, therefore the root tag of these RT documents starts and ends with <window>. I guess these could be considered a subtitle format for streaming media.

The SMIL files are interesting. SMIL is a known standard, but in many cases these files do not have an XML declaration and are therefore not identified by current PRONOM signatures. They are used to link everything together. I might suggest a SMIL variant without the XML declaration so these files identify correctly.

<smil>
 <body>
  <par>
   <textstream src="rtsp://realserver.company.com/mary.rt"/>
   <video src="rtsp://realserver.company.com/mary.rm"/>
  </par>
 </body>
</smil>

The .RPAD RealProducer Audience File, .RPJF RealProducer Job File, .RPSD RealProducer Server Destination are all XML files for managing some of the configuration found in the RealProducer software.

cat 56k\ Dial-up.rpad
<?xml version="1.0" encoding="UTF-8"?>
<audience xmlns="http://ns.real.com/tools/audience.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://ns.real.com/tools/audience.2.0 http://ns.real.com/tools/audience.2.0.xsd">
  <avgBitrate type="uint">34000</avgBitrate>
  <maxBitrate type="uint">68000</maxBitrate>
  <streams>

cat RealProducer11-01.rpjf
<?xml version="1.0" encoding="UTF-8"?>
<job xmlns="http://ns.real.com/tools/job.2.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ns.real.com/tools/job.2.0 http://ns.real.com/tools/job.2.0.xsd">
  <enableTwoPass type="bool">true</enableTwoPass>
  <clipInfo>

cat Multicast\ Push\ Server.rpsd
<?xml version="1.0" encoding="UTF-8"?>
<destination xsi:type="pushServer" xmlns="http://ns.real.com/tools/server.2.0"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://ns.real.com/tools/server.2.0 http://ns.real.com/tools/server.2.0.xsd">
  <pluginName type="string">rn-server-rbs</pluginName>

Those three formats should be easy enough to identify, especially if we look for the namespace URLs.
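Here is a minimal Python sketch of that idea, using the standard library XML parser to read the root element and its namespace. The three namespace URLs and root names are taken from the samples above. The function name and the labels it returns are only for illustration.

```python
import xml.etree.ElementTree as ET

# (namespace, root element) pairs as they appear in the samples above.
KNOWN_ROOTS = {
    ("http://ns.real.com/tools/audience.2.0", "audience"): "RealProducer Audience File (RPAD)",
    ("http://ns.real.com/tools/job.2.0", "job"): "RealProducer Job File (RPJF)",
    ("http://ns.real.com/tools/server.2.0", "destination"): "RealProducer Server Destination (RPSD)",
}


def identify_realproducer_xml(data: bytes):
    """Return a label for a known RealProducer XML file, or None."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    if not root.tag.startswith("{"):
        return None  # root element has no namespace
    namespace, _, local = root.tag[1:].partition("}")
    return KNOWN_ROOTS.get((namespace, local))
```

A byte-pattern signature would match the namespace string directly rather than parse the XML, but parsing is a handy way to double check samples.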

The RAM and RPM formats are simply text files with a URL. You can find some samples here and here.

RM and RV files share the same format as RMVB files; RMVB simply uses a variable bitrate. Later on, a new format was used to improve the quality of video. This format has the extension RMHD, referring to RealMedia HD. Let’s take a look.

hexdump -C DSC_0009.rmhd | head
00000000  2e 52 4d 50 00 00 00 12  00 01 00 00 00 00 00 00  |.RMP............|
00000010  00 07 50 52 4f 50 00 00  00 36 00 02 00 04 f7 33  |..PROP...6.....3|
00000020  00 04 f7 33 00 00 11 bd  00 00 02 5d 00 00 01 d2  |...3.......]....|
00000030  00 00 1b 2e 00 00 00 00  00 00 00 00 00 04 65 68  |..............eh|
00000040  00 00 01 6f 00 02 00 03  43 4f 4e 54 00 00 00 12  |...o....CONT....|
00000050  00 00 00 00 00 00 00 00  00 00 4d 44 50 52 00 00  |..........MDPR..|
00000060  00 76 00 00 00 00 00 03  24 64 00 03 24 64 00 00  |.v......$d..$d..|
00000070  11 bd 00 00 04 2a 00 00  00 00 00 00 00 00 00 00  |.....*..........|
00000080  1b 2e 0c 56 69 64 65 6f  20 53 74 72 65 61 6d 14  |...Video Stream.|
00000090  76 69 64 65 6f 2f 78 2d  70 6e 2d 72 65 61 6c 76  |video/x-pn-realv|

The format looks very similar, but has the magic header of .RMP instead of .RMF. MediaInfo and ffprobe are unaware of the format. The software mentions an RV11 codec, which is confusing as the codecs went from RV10 to RV60.
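
To see how similar the layout is, the chunks can be walked with a few lines of Python. This is only a sketch based on the hexdump above, assuming each chunk starts with a 4-byte ID followed by a 4-byte big-endian size that counts the whole chunk, which is what the offsets of PROP, CONT and MDPR in this sample suggest. The function name is hypothetical.

```python
import struct


def rm_chunks(data: bytes):
    """Yield (chunk_id, offset, size) for RealMedia-style chunks.

    Assumes a 4-byte ID and a 4-byte big-endian size that includes the
    8 bytes of ID and size themselves, as the sample above suggests.
    """
    if data[:4] not in (b".RMF", b".RMP"):
        raise ValueError("no .RMF or .RMP magic at offset 0")
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from(">4sI", data, pos)
        if size < 8:  # a size this small cannot be valid, so stop
            break
        yield chunk_id.decode("latin-1"), pos, size
        pos += size
```

Run against the start of the RMHD sample, this steps from .RMP to PROP to CONT, the same sequence the hexdump shows.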

Phew, that was a lot, considering the two formats I tried to research turned out to be the same as an existing format. There are probably others I have missed. I did see a reference to an RMX format which seems to be an encrypted RM file. The header is the same so it will identify as a RealMedia file, but with the wrong extension. Let me know if you come across any. I have some samples of the formats mentioned here, plus a proposal of new signatures on my GitHub repository.

HFE

September 27, 2024 by Thor 2 Comments

Last week I had the pleasure of attending the 20th annual iPres conference on Digital Preservation in Ghent, Belgium. I enjoyed hearing from many of my respected colleagues on many aspects of preservation including one of my favorite topics, floppy disks. There were tutorials, lightning talks, and even a workshop, presented by Leontien Talboom, Elizabeth Kata, Chris Knowles, and myself. We titled the workshop “A Guide to Imaging Obscure Floppy Disk Formats”. The workshop grew out of a mutual interest in imaging Wang 5.25in word processor disks, but expanded to include imaging of Amstrad 3in disks, 240K Brother Typewriter Disks, and Macintosh 400/800k disks.

I brought my hand-soldered FluxEngine board and others brought their Greaseweazle boards to show off how imaging obscure and uncommon disks can be done on a budget.

Image taken during the workshop on a Mavica FD200 floppy disk camera.

During the conference we talked a bit about the different types of hardware that can be used and the difference between a disk image and a flux image. There seems to be quite an exhaustive list of file formats, some specific to a platform and others more generic. I recently did a blog post on the formats used by the Applesauce software, which have some unique features.

There are many disk image types which should be researched and added to PRONOM and other format description sites, but today let’s take a look at a generic format used by many tools.

The HxC Floppy Emulator file format, which has the extension HFE, is a popular format used with floppy drive emulators. There is a lot of variety in what these image formats include: some are simply a raw sector representation of the binary data on a disk, others contain the complete flux readings from a floppy disk. The HFE format contains a little more than a raw image, including a header, a track lookup table, and the bitstreams for each track, all with the purpose of emulating the physical media. The HFE format contains only a single pass over the data, where other formats may contain multiple readings of each track to get more complete data, which can be helpful for damaged or purposely copy-protected disks. You can read more on Ashley’s blog and in the Library of Congress format description.

When using the HxC Floppy Emulator software, you can open and save to many different formats. The main one is the native HFE format, which comes in 5 versions.

hexdump -C test01.hfe | head
00000000  48 58 43 50 49 43 46 45  00 53 02 00 e8 01 00 00  |HXCPICFE.S......|
00000010  07 01 01 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

Above is a hexdump of the main SDCard HxC Floppy Emulator file format. The format specification shows the 8 byte header “HXCPICFE”. This is a very unique pattern and should be all we need to make a robust signature for the format, but we do need to take into account the other HFE “versions” and see if they might clash or need to be identified separately.

hexdump -C test02-a2.hfe | head 
00000000  48 58 43 50 49 43 46 45  00 53 02 00 d0 03 00 00  |HXCPICFE.S......|
00000010  07 01 01 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

The “A2” version of the format has the same header but some different bytes further into the file.

hexdump -C test03-rev2.hfe | head
00000000  48 58 43 50 49 43 46 45  01 53 02 00 00 00 00 00  |HXCPICFE.S......|
00000010  07 01 01 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

The “Rev 2” version also has the same header. But if you look at the 9th byte you can see the value changed from 00 to 01. According to the specification, this is the revision byte.

hexdump -C test04-rev3.hfe | head 
00000000  48 58 43 48 46 45 56 33  00 53 02 00 e8 01 00 00  |HXCHFEV3.S......|
00000010  07 01 01 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

With “Rev 3” we see a change in the header with “HXCHFEV3” which appears to be referred to as HFEv3.

hexdump -C test05-stream.hfe | head 
00000000  48 78 43 5f 53 74 72 65  61 6d 5f 49 6d 61 67 65  |HxC_Stream_Image|
00000010  00 00 00 00 00 00 00 00  00 18 00 00 00 02 00 00  |................|
00000020  00 1a 00 00 53 00 00 00  02 00 00 00 40 9c 00 00  |....S.......@...|
00000030  07 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This last format seems to be a special HxC stream image.

It seems the best option is to make three signatures to identify the three main headers. Additional software can be used to further parse the disk image. If you would like to see some sample images, you can download a bunch here. You can also take a look at my GitHub repository to see additional samples and a proposed set of signatures.
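
The three headers from the hexdumps above are easy to check in code. The following Python sketch is not a PRONOM signature, just an illustration of the same idea; the function name and labels are made up for this example, and the revision byte is read from the 9th byte as described above.

```python
# Magic strings taken from the hexdumps above. Labels are informal.
HFE_MAGICS = {
    b"HXCPICFE": "HFE (HxC Floppy Emulator)",
    b"HXCHFEV3": "HFE v3",
    b"HxC_Stream_Image": "HxC stream image",
}


def identify_hfe(head: bytes):
    """Return a label for the first bytes of a file, or None if no match."""
    for magic, label in HFE_MAGICS.items():
        if head.startswith(magic):
            # For the original header the 9th byte is the revision byte.
            if magic == b"HXCPICFE" and len(head) > 8:
                return f"{label}, revision byte {head[8]}"
            return label
    return None
```

The “A2” sample would come back the same as the first one, since the header and revision byte are identical; telling those apart would need a parser that looks further into the file.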

ATRAC

September 6, 2024 by Thor 3 Comments

The year was 2001 and I found myself in need of an audio player and recorder. I had been burning CDs for a few years. Making mixed CDs was fun and convenient, but I needed more flexibility. After some research I decided on a device that was super popular outside the United States, but was gaining some loyal fans.

This MZ-G750 MiniDisc device could record in a standard high quality mode through RCA, optical digital cable, and an optional microphone in mini-plug. This model also had the LP2 and LP4 modes, which used heavier compression but could record up to 320 minutes on one MiniDisc.

Sony accomplished this by using a proprietary compression codec called ATRAC, or Adaptive TRansform Acoustic Coding. This compression format was used with the MiniDisc and other Sony devices like the flash memory Walkmans sold later.

I recorded and stored a lot of music on the few discs I purchased over the next year, but as you may have surmised, the iPod came out later that year. I waited a bit but eventually purchased the updated 10GB model, and the MiniDisc was only used to make a few recordings over the next little while.

As good as the MiniDisc is, the model I owned could record in a digital format, but lacked the connections to transfer the audio to a computer unless you used the optical cable and captured in real time to a computer with an optical input. This was by design, even when they put USB ports on later models, the software only allowed sending audio to the MiniDisc, but not back from the device.

A few years back I heard of some work the community has done to bring MiniDiscs back from the shadows. Now there is a thriving market and some models can cost a pretty penny. With that came some great tools and the ability to copy from the device back to the computer. The only problem: my device lacks a USB port. I kept my eye out for a “good” deal on a NetMD MiniDisc device. It took some time, but I am happy to report I am now the proud owner of an MZ-N420D.

With a new USB capable NetMD in hand, let’s take a look at the different ATRAC formats!

The most common ATRAC formats are the ATRAC3 versions which generally have the extension OMA or OMG. But let’s start with ATRAC1, the format used on my earlier MiniDisc device when captured in Standard Mode. Using the amazing https://webmd.pro/ tool, I was able to connect my new device and “archive” my disc.

hexdump -C Test1.aea | head
00000000  00 08 00 00 54 65 73 74  31 00 00 00 00 00 00 00  |....Test1.......|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 00 00 1e 01 00 00  02 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000b50  0c a0 45 57 54 44 32 35  41 44 22 34 32 24 13 23  |..EWTD25AD"42$.#|
00000b60  32 23 22 12 11 11 11 11  76 18 69 75 f8 63 69 a7  |2#".....v.iu.ci.|
00000b70  a4 5d 46 22 45 36 1f 59  55 9d 41 55 19 51 45 17  |.]F"E6.YU.AU.QE.|
00000b80  45 14 55 38 c2 cb 2c b2  88 26 fd b2 17 b3 f0 0f  |E.U8..,..&......|

ffprobe -i Test1.aea
[aea @ 0x7fc5e6c04fc0] Estimating duration from bitrate, this may be inaccurate
Input #0, aea, from 'Test1.aea':
  Duration: 00:00:01.63, bitrate: 302 kb/s
  Stream #0:0: Audio: atrac1, 44100 Hz, stereo, fltp, 292 kb/s

ATRAC1 files can have the AEA extension, which ffmpeg can decode, but MediaInfo doesn’t appear to support it yet. According to the decoder the magic numbers for the ATRAC1 format are “Magic is ‘00 08 00 00‘ in little-endian”. This pattern matches my files, but the recently added PRONOM signature fmt/1968 doesn’t match all the samples I have.

The magic numbers are too simple to be the only pattern used in a signature. The track title follows the magic numbers but is not static. Then there are quite a few zero bytes, like a lot. All the samples I have seem to have some data around offset 260, then more zero bytes until somewhere in the 2400 to 2800 byte offset range. I scanned all the samples I have through Tridscan, and it looks like the only bytes in common are the magic header, lots of zeros, and a few strings.

	<GlobalStrings>
		<String>ED33</String>
		<String>EUD3</String>
		<String>FTDC</String>
		<String>T322</String>
		<String>TC32</String>
		<String>TC43</String>
		<String>UC22</String>
		<String>UED3</String>
		<String>VD33</String>
		<String>VETC</String>
		<String>WEDD</String>
	</GlobalStrings>

The ffmpeg libavformat code does tell us that byte 264 will be a 01 or 02, which indicates the number of channels. 44.1 kHz is assumed and the bitrate is calculated from a constant and the number of channels, so there is not much else to identify common patterns. More testing needed.
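
Put together, the few fixed points described above can be expressed as a small Python check. This is a minimal sketch, not a port of the ffmpeg probe: it only tests the magic bytes and the channel byte at offset 264, and it assumes the title is a NUL-terminated string sitting between the magic and offset 260. Both function names are hypothetical.

```python
AEA_MAGIC = b"\x00\x08\x00\x00"


def looks_like_aea(data: bytes) -> bool:
    """Weak check: magic at offset 0 and a channel count of 1 or 2 at 264."""
    if len(data) < 265:
        return False
    if data[:4] != AEA_MAGIC:
        return False
    return data[264] in (1, 2)


def aea_title(data: bytes) -> str:
    """Read the track title that follows the magic bytes.

    Assumption for this sketch: the title ends at the first NUL byte and
    never extends past offset 260, where other data starts in my samples.
    """
    raw = data[4:260].split(b"\x00", 1)[0]
    return raw.decode("latin-1")
```

With only four magic bytes and one channel byte to go on, this would produce false positives on other files, which is the same problem a signature faces.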

ATRAC3 is what allowed my original MiniDisc to record in LP2 and LP4, extending the recording time. This format was also how some DRM was added to the device and computer, allowing files to be checked in and out while controlling their use. This was done with desktop software from Sony, originally in the form of the title SonicStage, later incorporating OpenMG to manage the DRM. I used SonicStage to encode some audio into OMG and OMA formats.

OpenMG format files

These are audio files which have been converted to ATRAC3 format and encrypted in OpenMG format, which is the copyright protection technology for audio contents specific to OpenMG (with the extension .omg).

hexdump -C 01-Untitled.omg | head
00000000  30 80 30 80 06 07 66 6f  70 65 6e 4d 47 02 02 03  |0.0...fopenMG...|
00000010  eb 04 14 01 0f 50 00 00  04 00 00 00 ba d0 90 49  |.....P.........I|
00000020  3d 7f 61 7b 91 c4 30 06  02 67 01 02 02 3f 00 06  |=.a{..0..g...?..|
00000030  02 68 01 02 04 00 59 47  80 02 01 00 02 03 02 03  |.h....YG........|
00000040  a0 02 02 01 80 02 01 00  00 00 04 08 f5 94 79 c9  |..............y.|
00000050  6b 78 75 22 04 84 00 59  5e 30 83 0b 71 39 e3 e8  |kxu"...Y^0..q9..|
00000060  27 29 00 00 00 00 00 00  00 00 26 e2 65 d0 de e0  |')........&.e...|
00000070  69 19 73 45 1c c4 3b 36  8d 02 3b 72 bd eb 84 df  |i.sE..;6..;r....|
00000080  cd 20 4e 43 d3 e3 23 8a  3f 9e df 80 f1 86 d1 aa  |. NC..#.?.......|
00000090  2b 93 bf 09 59 0d d6 8f  78 5d 45 3a 9f d8 79 8b  |+...Y...x]E:..y.|

ffprobe -i /01-Untitled.omg 
[oma @ 0x7fed2440e980] Format oma detected only with low score of 1, misdetection possible!
[oma @ 0x7fed2440e980] Couldn't find the EA3 header !
/01-Untitled.omg: Invalid data found when processing input

The good news is there appears to be a standard header for the OMG format, but ffmpeg assumes they are OMA files. Turns out OMG was the original form of the format, but was replaced with OMA starting with SonicStage v2.1.

hexdump -C 01-Untitled.oma | head
00000000  65 61 33 03 00 00 00 00  17 76 54 49 54 32 00 00  |ea3......vTIT2..|
00000010  00 17 00 00 02 00 55 00  6e 00 74 00 69 00 74 00  |......U.n.t.i.t.|
00000020  6c 00 65 00 64 00 28 00  31 00 29 54 41 4c 42 00  |l.e.d.(.1.)TALB.|
00000030  00 00 11 00 00 02 00 55  00 6e 00 74 00 69 00 74  |.......U.n.t.i.t|
00000040  00 6c 00 65 00 64 54 58  58 58 00 00 00 17 00 00  |.l.e.dTXXX......|
00000050  02 00 4f 00 4d 00 47 00  5f 00 54 00 52 00 41 00  |..O.M.G._.T.R.A.|
00000060  43 00 4b 00 00 00 31 54  58 58 58 00 00 00 25 00  |C.K...1TXXX...%.|
00000070  00 02 00 4f 00 4d 00 47  00 5f 00 41 00 4c 00 42  |...O.M.G._.A.L.B|
00000080  00 4d 00 53 00 00 00 55  00 6e 00 74 00 69 00 74  |.M.S...U.n.t.i.t|
00000090  00 6c 00 65 00 64 54 58  58 58 00 00 00 23 00 00  |.l.e.dTXXX...#..|
*
00000c00  45 41 33 03 00 60 ff 80  00 00 00 00 01 0f 50 00  |EA3..`........P.|
00000c10  00 04 00 00 00 60 8a 07  e3 0a c9 91 63 46 c6 bc  |.....`......cF..|
00000c20  22 52 03 76 00 05 66 48  00 00 3b 86 00 00 00 00  |"R.v..fH..;.....|
00000c30  00 00 20 30 00 00 00 00  00 00 00 00 00 00 00 00  |.. 0............|
00000c40  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

ffprobe -i 01-Untitled.oma
Input #0, oma, from '01-Untitled.oma':
  Metadata:
    title           : Untitled(1)
    album           : Untitled
    OMG_TRACK       : 1
    OMG_ALBMS       : Untitled
    OMG_ASGTM       : 2366000
    OMG_TIT2S       : Untitled(1)
    TLEN            : 353000
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Audio: atrac3al ([34][0][0][0] / 0x0022), 44100 Hz, stereo, fltp

We learned from trying an OMG file in ffprobe that ffmpeg is looking for an EA3 header, which is found in this OMA file. Both of these formats should have a nice header to work from for a signature. In fact there has already been a request and signature submitted for the OMA format. My samples are slightly different, but it only takes a small tweak to make the signature work with all of them. Also, it seems the extension AA3 was used for a while before settling on OMA. OMA can have a few different types:

ffprobe -i 02-Untitled.oma 
[oma @ 0x7fbc7ef047c0] Estimating duration from bitrate, this may be inaccurate
Input #0, oma, from '/Star Trek/02-Untitled.oma':
  Metadata:
    title           : Untitled(2)
    album           : Star Trek
    OMG_TRACK       : 2
    OMG_ALBMS       : Star Trek
    OMG_ASGTM       : 2366000
    OMG_TIT2S       : Untitled(2)
    TLEN            : 27000
  Duration: 00:00:27.21, start: 0.000000, bitrate: 193 kb/s
  Stream #0:0: Audio: atrac3p ([1][0][0][0] / 0x0001), 44100 Hz, stereo, fltp, 192 kb/s
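
The OMA hexdump earlier shows two headers: a lowercase “ea3” tag at offset 0 and an uppercase “EA3” header at 0xC00. The sketch below ties them together, assuming the lowercase tag is laid out like an ID3v2 tag (the TIT2, TALB and TXXX frames suggest this) with a 4-byte “syncsafe” size at offset 6. In the sample the size bytes 00 00 17 76 decode to 3062, and 3062 plus the 10-byte tag header is 3072, or 0xC00, which is where the EA3 header sits. The byte at offset 32 of the EA3 header is 0x22 in that sample, matching the 0x0022 codec tag ffprobe reported, so I treat it as a codec byte. The function names are hypothetical.

```python
def syncsafe(size_bytes: bytes) -> int:
    """Decode a 4-byte ID3v2-style size, 7 useful bits per byte."""
    b0, b1, b2, b3 = size_bytes
    return (b0 << 21) | (b1 << 14) | (b2 << 7) | b3


def locate_ea3(data: bytes):
    """Return (offset, codec_byte) for the EA3 header, or None.

    Assumes an optional ID3v2-like "ea3" tag at the start of the file,
    followed directly by the "EA3" header.
    """
    offset = 0
    if data[:3] == b"ea3":
        if len(data) < 10:
            return None
        offset = 10 + syncsafe(data[6:10])
    if data[offset:offset + 3] != b"EA3":
        return None
    if len(data) < offset + 33:
        return None
    return offset, data[offset + 32]
```

Only the first few kilobytes of a file are needed, enough to cover the tag and the header that follows it.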

I’ll leave the technical properties to be handled by tools more suited for parsing the format like ffmpeg. Maybe MediaInfo could have the formats added, but until then, it might be best to simply identify the main format. I am also aware of some later additions to the ATRAC family, such as ATRAC3plus, ATRAC Advanced Lossless, and ATRAC9 (WAV RIFF). There are other extensions like AT3 out there which use the ATRAC codec, for example on Sony’s PlayStation or PSP. I will have to keep my eyes out for the even more elusive Hi-MD MiniDisc devices to find out more. For now, take a look at some samples and my proposal for signatures on my GitHub.

Worldox

August 23, 2024 by Thor 1 Comment

Most File Systems have unique ways for doing things, but also many things in common. On a Macintosh you might have some extended attributes, or that pesky hidden .DS_Store file no one really knows why it’s there. On Windows you may find a hidden thumbs.db file throwing off your file count. Hidden files are everywhere. Many have a real purpose, and that purpose may be insignificant or important in finding or giving context to other files.

While processing a collection from a USB drive the other day, I came across a few files I hadn’t seen before. They were hidden files nestled in with a few folders of PDFs. They have a unique name, so I figured it would be easy to find some documentation on them on the web. Turns out, there is very little.

-rwx------@ 1 tyler  staff    235 Aug 22 00:04 XNAME.CRS
-rwx------@ 1 tyler  staff    235 Aug 22 00:04 XNAME.LIB

The files were only a couple years old, so I figured there had to be some modern software which created them. A look inside the files with a hex editor didn’t provide much information.

hexdump -C XNAME.LIB 
00000000  22 80 21 36 00 00 00 00  00 00 00 00 00 00 00 00  |".!6............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000090  00 00 00 00 00 00 4c 3c  55 6e 61 73 73 69 67 6e  |......L<Unassign|
000000a0  65 64 3e 00 00 00 00 00  00 00 00 00 00 00 00 00  |ed>.............|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

I was about to give up. There wasn’t much data, and since they were hidden files I assumed they were probably just cached files with little value. Wanting to learn more, I did some more digging. At first I thought they might have something to do with Dropbox, as one user said they just showed up one day, but I later found they were probably created by document management software known as Worldox. I found a support page claiming these two files are part of a database.

XNAME.LIB	Contains document numbers (DOS names), extended names, and file security information.
XNAME.CRS	Contains custom profile field and version control information.

There is a key term in the definition of XNAME.LIB, “extended names”. I was curious what that meant and found Worldox has been around a while. The World Software Corporation has been around since 1988 and Worldox was released in 1993, but before that it specialized in an interesting DOS software package called “Extend-A-Name” or “Extend-A-File”. The name gives away its purpose: it literally extends the name beyond the limited 8 characters you could use in DOS. I can remember trying to decide on a filename that would accurately describe my file so I knew what it was later on. 8 characters is not enough to explain the content of a file, especially if you have hundreds or thousands of files to manage.

Extend-a-File was software which bonded with another piece of software like WordPerfect and loaded itself in memory. Then when you went to create a file or locate a file within WordPerfect, Extend-a-File would take over and allow you to create a file with a traditional 8-character name, but also a much longer name.

This extended name allowed you to describe the file’s content in much more detail, which also made it very easy to find previous documents.

Pretty slick, this software really would make a big difference to managing a large amount of files in the old DOS days. Ok, it adds extended names, but where is this information stored? That is where the XNAME files come into play.

hexdump -C XNAME.LIB | head
00000000  6d 92 15 59 47 47 15 00  00 00 00 00 00 00 00 00  |m..YGG..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000090  00 00 00 00 00 00 4c 10  20 4d 41 53 54 45 52 20  |......L. MASTER |
000000a0  4c 49 42 52 41 52 59 20  2d 20 41 6c 6c 20 46 69  |LIBRARY - All Fi|
000000b0  6c 65 73 20 4c 69 73 74  65 64 00 20 64 72 69 76  |les Listed. driv|
000000c0  65 20 43 20 00 58 4e 50  4c 55 53 2e 24 24 24 00  |e C .XNPLUS.$$$.|
000000d0  fd 05 fe 49 6e 73 75 66  66 69 63 69 65 6e 74 20  |...Insufficient |
000000e0  64 69 73 6b 20 73 70 61  63 65 20 46 54 68 69 73  |disk space FThis|
000000f0  20 69 73 20 61 20 74 65  73 74 20 6f 66 20 58 4e  | is a test of XN|
00000100  41 4d 45 20 20 20 20 20  20 20 20 20 20 20 20 20  |AME             |
00000110  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000120  20 20 20 20 20 20 20 20  00 56 44 53 30 30 30 30  |        .VDS0000|
00000130  2e 44 4f 43 00 00 96 00  56 44 53 00 00 00 00 4f  |.DOC....VDS....O|
00000140  55 30 30 30 30 30 30 0a  09 1d 00 02 00 00 00 00  |U000000.........|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This XNAME.LIB was generated by a running copy of XNPLUS circa 1990, bonded with a copy of DisplayWrite 4. It adds much more information within the “Library”.

So it seems this method of storing the extended filenames and other metadata started in the Extend-a-File software and has been carried along into modern versions of the Worldox Document Management software. Much of its purpose, extending an 8-character filename to around 60 characters, is no longer needed, as most systems now allow filenames of roughly 255 characters. I imagine there is more the software can add to these files, but the samples I have really don’t have any information in them at all. The Worldox software seems to be marketed toward law firms and others who have a lot of documents to manage, but I have been unable to find a way to play with the software to see what can be embedded within the XNAME.LIB files.

There is also some discussion out there about whether to back up these two hidden files and what might happen if they are lost. Regardless, you may want to think twice before tossing them as I almost did. They could contain valuable information needed to give context.

I am not sure it is possible to have a good signature for identification of these files. The samples I have and others I found online, here, here, here, and here, just don’t have much data within them. In fact they are all exactly 235 bytes. The only consistent byte within them and the samples I generated from XNPLUS is “4C” at offset 150, but everything else seems arbitrary. Here is a sample I generated from XNPLUS if you want to take a closer look.
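
Since a byte signature looks out of reach, about the best that can be done is to collect a few weak hints. The Python sketch below gathers the ones mentioned above: the filename, the 235 byte size of the Worldox samples, and the 4C at offset 150. The function name and the dictionary keys are made up for this example, and none of these hints is proof on its own.

```python
def xname_hints(filename: str, data: bytes) -> dict:
    """Collect weak hints that a file is an XNAME.LIB or XNAME.CRS file."""
    return {
        # The two filenames documented on the support page.
        "name_matches": filename.upper() in ("XNAME.LIB", "XNAME.CRS"),
        # The one byte that was consistent across my samples.
        "marker_at_150": len(data) > 150 and data[150] == 0x4C,
        # Size of the near-empty files written by Worldox.
        "size_is_235": len(data) == 235,
    }
```

A file that matches on name alone is probably still worth keeping with the documents it sits beside.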

A2R / MOOF / WOZ

August 16, 2024 by Thor 2 Comments

There seems to be a never-ending, growing list of disk image formats. Many have features which are specific to the media and format. If you have ever imaged an older Macintosh floppy you know they are special. Add in the copy protection which many early Apple II floppies have, and you need special drives, hardware, and a special format to store the floppy data.

When imaging special or unique media, it is best practice to image the floppies at the magnetic flux level.

Floppy disks contain magnetic fluctuations which are measured and recorded using specialized equipment. A popular method is using a Kryoflux board, floppy drive, and software. The software communicates with a custom controller board connected to a floppy drive through USB. If you are interested in the different controller boards, a good list has been compiled here.

A Kryoflux, FluxEngine, or Greaseweazle can image specialized disks like a Macintosh 800k floppy, but the best controller board for them is an Applesauce setup. It is specifically designed for the task. With that task come a few specialty formats.

A file format which can store flux data is a bit different than a regular disk image format. The flux data contains all the low-level recordings which can then be interpreted into disk images much like the original floppy. In the case of an Applesauce flux image, it can contain all the small nuances of the original floppy, this includes recording any copy protection or other creative methods used by software vendors throughout the years. The format used for storing this flux data is the A2R format.

A2R is in its third iteration. Let’s take a look at the basics of the format.

hexdump -C Samplev3.a2r | head
00000000  41 32 52 33 ff 0a 0d 0a  49 4e 46 4f 25 00 00 00  |A2R3....INFO%...|
00000010  01 41 70 70 6c 65 73 61  75 63 65 20 76 31 2e 38  |.Applesauce v1.8|
00000020  38 2e 35 20 20 20 20 20  20 20 20 20 20 20 20 20  |8.5             |
00000030  20 02 01 01 00 52 57 43  50 e9 49 6e 01 01 24 f4  | ....RWCP.In..$.|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 43 01 00  |.............C..|
00000050  00 01 27 3a 25 00 91 d9  00 00 21 20 21 21 21 21  |..':%.....! !!!!|
00000060  1f 21 21 21 21 1f 24 5e  24 1f 21 21 20 21 24 5c  |.!!!!.$^$.!! !$\|
00000070  24 20 21 21 21 1f 24 5c  25 21 21 1f 21 21 23 5b  |$ !!!.$\%!!.!!#[|
00000080  25 20 21 21 21 1f 21 22  23 3f 41 3f 26 3e 43 3f  |% !!!.!"#?A?&>C?|
00000090  43 5f 41 27 3d 61 41 27  3d 61 3f 28 3e 61 3f 26  |C_A'=aA'=a?(>a?&|

hexdump -C Samplev2.a2r | head
00000000  41 32 52 32 ff 0a 0d 0a  49 4e 46 4f 24 00 00 00  |A2R2....INFO$...|
00000010  01 41 70 70 6c 65 73 61  75 63 65 20 76 31 2e 31  |.Applesauce v1.1|
00000020  2e 36 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |.6              |
00000030  20 02 01 01 53 54 52 4d  75 17 5d 01 00 01 e6 da  | ...STRMu.].....|
00000040  00 00 83 a9 12 00 12 1e  11 13 1e 13 1e 13 11 1f  |................|
00000050  21 1f 11 13 1c 14 1e 30  14 20 1e 14 1e 14 1c 14  |!......0. ......|
00000060  1c 13 11 20 21 1f 11 11  0f 13 1e 14 1c 14 2e 21  |... !..........!|
00000070  13 1e 13 1e 14 1e 11 11  20 21 1f 11 11 13 1e 1f  |........ !......|
00000080  13 20 30 21 11 11 0f 13  1e 13 11 30 1f 21 20 13  |. 0!.......0.! .|
00000090  11 30 1f 14 1e 30 14 1e  11 11 11 1e 13 11 1e 14  |.0...0..........|

The A2R format uses a chunk system to store the various pieces to the format. Earlier versions used a STRM Chunk to store all the raw flux data. Version 3 changed to a RWCP Chunk to store all the raw flux data. Applesauce uses a 2-pass imaging process, doing a rapid imaging to determine where on the media surface track data exists and then a second pass that captures longer durations for processing and error correction.
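
The chunk layout is easy to see in the hexdumps above: after the 8-byte signature, each chunk has a 4-byte ID followed by a 4-byte little-endian size, then the chunk data. In the version 3 sample the INFO chunk is 0x25 (37) bytes long, so the next chunk starts at offset 8 + 8 + 37 = 53 (0x35), which is where RWCP appears. A minimal Python sketch of that walk, with a hypothetical function name:

```python
import struct


def a2r_chunks(data: bytes):
    """Yield (chunk_id, data_offset, size) for each chunk in an A2R file."""
    if data[:3] != b"A2R" or data[4:8] != b"\xff\n\r\n":
        raise ValueError("not an A2R file")
    pos = 8  # chunks start right after the 8-byte signature
    while pos + 8 <= len(data):
        # 4-byte ASCII chunk ID, then a 4-byte little-endian size
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id.decode("ascii", "replace"), pos + 8, size
        pos += 8 + size
```

The fourth byte of the signature is the version digit, so the same walk works for the version 2 sample, where the chunk after INFO is STRM instead of RWCP.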

Once the full raw flux data has been captured, that data can be interpreted as a disk image. The Applesauce software is able to make a regular disk image, such as a Disk Copy 4.2 file, which is well known and identified in PRONOM as fmt/625, but it can also create a couple of special disk image formats which preserve the special nuances of an original disk.

The WOZ Disk Image format is an offshoot of the Applesauce project. Capturing highly accurate bit data is of no use if you don’t have a container to hold the data. The WOZ format was designed to be able to contain every possible Apple ][ disk structure and layout. It can be so accurate that even copy protected software can’t tell that it isn’t an original disk.

The WOZ format has become very popular in the Apple II community and is ideal for emulating all the old games and software titles popular in the early 1980s. You may have guessed where the name comes from. The Internet Archive has a large collection of WOZ disks in their WOZ-a-Day collection. The file format of a WOZ disk image is also chunk based, similar to the A2R format, and it has two versions. Let’s take a look.

hexdump -C WOZ 1.0/Blazing Paddles (Baudville).woz | head
00000000  57 4f 5a 31 ff 0a 0d 0a  f6 f5 92 d6 49 4e 46 4f  |WOZ1........INFO|
00000010  3c 00 00 00 01 01 00 01  01 41 70 70 6c 65 73 61  |<........Applesa|
00000020  75 63 65 20 76 30 2e 32  36 20 20 20 20 20 20 20  |uce v0.26       |
00000030  20 20 20 20 20 20 20 20  20 00 00 00 00 00 00 00  |         .......|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  54 4d 41 50 a0 00 00 00  00 00 ff 01 01 01 ff 02  |TMAP............|
00000060  02 02 ff 03 03 03 ff 04  04 04 ff 05 05 05 ff 06  |................|
00000070  06 06 ff 07 07 07 ff 08  08 08 ff 09 09 09 ff 0a  |................|
00000080  0a 0a ff 0b 0b 0b ff 0c  0c 0c ff 0d 0d 0d ff 0e  |................|
00000090  0e 0e ff 0f 0f 0f ff 10  10 10 ff 11 11 11 ff 12  |................|

hexdump -C WOZ 2.0/Blazing Paddles (Baudville).woz | head
00000000  57 4f 5a 32 ff 0a 0d 0a  21 da c2 c8 49 4e 46 4f  |WOZ2....!...INFO|
00000010  3c 00 00 00 02 01 00 01  01 41 70 70 6c 65 73 61  |<........Applesa|
00000020  75 63 65 20 76 31 2e 31  20 20 20 20 20 20 20 20  |uce v1.1        |
00000030  20 20 20 20 20 20 20 20  20 01 01 20 00 00 00 00  |         .. ....|
00000040  0d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  54 4d 41 50 a0 00 00 00  00 00 ff 01 01 01 ff 02  |TMAP............|
00000060  02 02 ff 03 03 03 ff 04  04 04 ff 05 05 05 ff 06  |................|
00000070  06 06 ff 07 07 07 ff 08  08 08 ff 09 09 09 ff 0a  |................|
00000080  0a 0a ff 0b 0b 0b ff 0c  0c 0c ff 0d 0d 0d ff 0e  |................|
00000090  0e 0e ff 0f 0f 0f ff 10  10 10 ff 11 11 11 ff 12  |................|

Unlike a common disk image, a WOZ image contains more than the bits on the disk. It contains a mapping of all the tracks and the associated data, which is how it can even contain copy protection usually only possible with a physical disk. The ‘TMAP’ chunk contains a track map and the ‘TRKS’ chunk contains all the data.

What the WOZ is for the Apple II, MOOF was made for the Macintosh. You may wonder what is with the funny name, but there is a long history around “Clarus the Dogcow”. I’m sure this factoid will help you impress your friends or win at trivia night. Again, the purpose of the special format for Macintosh disks is to allow for emulating disks, even with copy protection. You can also find quite the collection of old Macintosh software in the MOOF format on the Internet Archive, even emulate your favorite game, such as Dark Castle, which I played for hours as a kid. Also a chunk based format, let’s take a look at the header.

hexdump -C Dark Castle v1.0 - Disk 1.moof | head
00000000  4d 4f 4f 46 ff 0a 0d 0a  b5 75 f9 4e 49 4e 46 4f  |MOOF.....u.NINFO|
00000010  3c 00 00 00 01 01 00 01  10 41 70 70 6c 65 73 61  |<........Applesa|
00000020  75 63 65 20 76 31 2e 37  33 20 20 20 20 20 20 20  |uce v1.73       |
00000030  20 20 20 20 20 20 20 20  20 00 13 00 00 00 00 00  |         .......|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  54 4d 41 50 a0 00 00 00  00 ff 01 ff 02 ff 03 ff  |TMAP............|
00000060  04 ff 05 ff 06 ff 07 ff  08 ff 09 ff 0a ff 0b ff  |................|
00000070  0c ff 0d ff 0e ff 0f ff  10 ff 11 ff 12 ff 13 ff  |................|
00000080  14 ff 15 ff 16 ff 17 ff  18 ff 19 ff 1a ff 1b ff  |................|
00000090  1c ff 1d ff 1e ff 1f ff  20 ff 21 ff 22 ff 23 ff  |........ .!.".#.|

All three formats created for imaging and emulating Apple and Macintosh software are well documented and open. They are also well suited for preservation as they can contain extensive metadata in the INFO chunk, which gives provenance information on the source of the files. The Applesauce software even supports a camera to photograph the disk itself for archiving. All of this makes these formats great for preservation and emulation. Take a look at my proposal for a signature on my GitHub.
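
As a small example of that provenance metadata, the creator string can be pulled out of the INFO chunk of a WOZ or MOOF file. This Python sketch is based only on the hexdumps above: it assumes chunks start at offset 12 (the 8-byte signature plus 4 bytes that differ between samples, treated as opaque here), and that the INFO data holds 5 single-byte fields followed by a 32-byte space-padded creator string. The function name is hypothetical.

```python
import struct

MAGICS = (b"WOZ1", b"WOZ2", b"MOOF")


def read_creator(data: bytes):
    """Return (magic, info_version, creator) from a WOZ or MOOF image."""
    if data[:4] not in MAGICS or data[4:8] != b"\xff\n\r\n":
        raise ValueError("not a WOZ or MOOF file")
    pos = 12  # signature (8 bytes) plus 4 bytes not interpreted here
    while pos + 8 <= len(data):
        # 4-byte chunk ID, then a 4-byte little-endian size
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"INFO":
            # Assumed layout: 5 one-byte fields, then 32 bytes of creator
            creator = body[5:37].decode("utf-8", "replace").rstrip()
            return data[:4].decode("ascii"), body[0], creator
        pos += 8 + size
    return None
```

For the Dark Castle sample this picks out the “Applesauce v1.73” string visible in the hexdump. The A2R INFO chunk places its creator string at a different position, so it would need its own offsets.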