There are probably many reasons why a software developer might want to create a proprietary format to store their files in. The software may require special features that don’t fit into an existing format. I would hope a developer would try to use existing formats, or even better open formats, but for many reasons, which probably include profits, they choose to re-invent the wheel often.
MAGIX is a German company which started making software in 1994. In 2001 they developed their first video editing software which was called Movie Edit Pro. The software seems to be well received and is still in use today.
Like most video editing software, project files are used to store all the edits and links to video files. These are usually smaller text based, with many using XML as the project format. Not MAGIX, they decided to go with a different yet known format for their project files.
Yes, they used the RIFF container format for their projects. Seems an odd choice, especially for video production although it is well suited for it. AVI is another video format which uses the RIFF container. The MVP project file uses the ID SEKD with the format MVPH. Earlier versions of Movie Edit Pro used a different extension.
The MVD format used on an earlier version of Movie Edit Pro is also a RIFF, and with the ID of SEKD, but has a format of SVIP.
RIFFpad can break down the chunks we see in an MVP file. Each of the LIST chunks has their own subchunks as well. I assume this his how the editing software stores each video/audio track references, etc. So I give it to MAGIX for at least using an understandable format to store their projects.
MAGIX has also used RIFF in many of its supporting formats. So far I have found mfx, afx, ifx, cfx, ctf, tfx, ufx, mmt, mmm, hdp, each having their own format:
Not sure the best way to manage all of these in terms of identification, as I am not sure what what is the purpose of each format. Maybe for now I’ll make a generic to catch them all as a MAGIX File.
Extension
ID
FORMAT
AFX
SEKD
SAFX
CFX
SEKD
SCFX
CTF
SEKD
SVIP
HDP
SEKD
SHDP
IFX
SEKD
SIFX
MFX
SEKD
MAFX
MMM
SEKD
SVIP
MMT
SEKD
SVIP
MVD
SEKD
SVIP
MVP
SEKD
MVPH
MXM
MXMD
mxmi
TFX
SEKD
STFX
UFX
SEKD
SVIP
But, when it comes to their proprietary MAGIX Video format, I think they may have pushed things a little too far. Meet the MXV format:
I am not sure what I am looking at, is it a RIFF? Is it a RIFF variant like RF64? MAGIX claims the format is:
This is the MAGIX video format for quicker processing with MAGIX products. It offers very low loss of quality, but it cannot be played via conventional DVD players.
A look around the internet doesn’t bring much up in reference to this format. Just my recent page on the format wiki. A search for MXRIFF64 bring up nothing. But a closer look at other strings within the MXV file reveal we are probably looking at some sort of MPEG format.
I was able to locate a project on GitHub which claims to be able to demux the MXV format. The software is written in GO and appears to indicate this format is chunked based and has most of the chunks figured out. So if you find yourself stuck with some MXV files and don’t want to use the latest from MAGIX, this might be the tool for you.
This demuxer also has an interesting file you can download. It is called a “GRAMMAR” file and can be loaded into hex viewers like Synalyze It! can show the parts of a file you load. Its a great way to explore a format!
None of these formats are found in PRONOM, project files are not usually kept in archives, but if would be good to know about the RIFF files if they do turn up. The video format is for sure something the archival world should know about. MediaInfo is currently not aware of this format, but seems like it might be an easy task.
As usual, you can see some samples and my proposal signatures on my GitHub.
Many software titles we have all used began life under a different brand or even title. Larger software companies gobble up smaller developers, some brands merge, and others change names for whatever reason. Adobe has bought many smaller companies over the years, sometimes developing the acquired software and other times burying the software to avoid competition. Pagemaker was bought to give InDesign life, many Macromedia titles were incorporated or shelved. Such is life in the software world.
In understanding a file format, often times you need to follow this trail backwards to understand when file formats changed and compatibility is dropped. Often times the formats remained the same, but the extension is changed. Or the software name changes and formats are updated, but the extension remains the same. There can also be multiple titles which all use a common format, further complicating the identification of the formats.
Let’s look closer at the a title which changed names and file formats a few times over the years. Micrografx was founded in 1982 and were pretty well known for their innovation in computer graphics. They have released many titles over the years, but one of the first was In*A*Vision graphic software for Windows 1.0 in 1986. This software used a format with the .PIC extension. A couple years later version 2, was renamed to Micrografx Designer and used the .DRW extension. This extension was also used by Micrografx Draw, another similar program.
Micrografx Designer continued to be released until version 9 which is when it was purchased by Corel who continued to release new versions, although it is said the software was just a variation of CorelDraw, and now Designer is part of the CorelDraw Technical Suite. Other Micrografx software such as Picture Publisher was discontinued and customers were encouraged to use Corel’s PaintShop Pro instead. Somewhere in the middle of all this, Micrografx spun off a separate business unit called iGrafx, which Designer was marketed under for a short time.
Let’s break down the names, extensions used, and format type.
In*A*Vision & Draw, binary format, PIC extension
Micrografx Designer & Draw, binary format, DRW extension
Micrografx Designer version 4, RIFF format, DS4 & MGX extension
Micrografx Designer versions 6-9, OLE Container format, DSF extension
Micrografx/Corel Designer versions 10-12, RIFF format, DES extension
Corel Designer version X4-Current, ZIP/XML format, DES extension
You can import Corel DESIGNER files. Files from version 10 and later have the filename extension .des. Files from Micrografx versions 6 to 9 have the filename extension .dsf. Version 4 files have the filename extension .ds4. The .drw filename extension is used for a Micrografx 2.x or 3.x file. Micrografx template files (DST) are also supported.
The PRONOM registry has a few of these formats with signatures and documented, but not all, let’s see where the gaps are.
So from the PRONOM list, it appears we have good identification on the original PIC and DRW formats. Then the Designer DSF OLE container is taken care of as well. That leaves us with DS4 and DES formats.
Each RIFF format has a four byte identifier type after the first eight bytes which identify the RIFF. The DS4 file uses the code “MGX ” to identify itself. Which also appears to be used with their clipart format, MGX. We can use the same identification method we use for other RIFF’s to identify this format.
Starting with version 10 of Corel Designer, the RIFF format is used again and has a different type. With Version 10 using “DESA”, then for version 10.5:
Looks like the container holds a RIFF inside along with some thumbnails, metadata, and other things. The mimetype file simple holds “application/x-vnd.corel.designer.document+zip”. The riffData.cdr however looks like this:
Another RIFF, and seems to be in the same sequence, but going from version 12 to X4 we seemed to have skipped “DESD”. Maybe there was a developer version in between as they transitioned. Version X5 looks similar and has the RIFF sequence “DESF”. When we get to X6 the structure changes.
The mimetype remains the same, but we see additional files within the structure. Also the riffData.cdr file is missing. Looking at each file we can see the root.dat file is a RIFF and follows the same sequence.
The last sample I have is for Corel Designer 2022, but there could be more. I created new signatures for all the samples I have, you can see them in my Github as usual. I decided to group some of the versions together to simplify things a bit, but if anyone thinks they should be broken out into individual versions, let me know.
In honor of #Marchintosh, I threatened in an earlier post to discuss The Writing Center, one of the many writing programs marketed by the Learning Company for the Mac. This one was developed by Datapak Software, Inc and I think they wanted to watch the world burn.
This format was different enough from the Student Writing Center and the “Ultimate Writing & Creativity Center” to need its own post. Moreover, I am pretty sure the developers of this software were actively trying to frustrate anyone trying to document the format. Let me explain.
In the early Macintosh world, very rarely were extensions used. Current systems use extensions to link the file to an application which can open the file. On the Mac, the system would use special attributes called Type / Creator codes. These codes were registered with Apple so they would be unique to a specific software and type of file. The codes used the FourCC system and unfortunately Apple never released a full list of codes used. Some folks over the years have tried to document as many as they can. Many used simple understandable codes, for example, A Microsoft Word document has a Type / Creator of W6BN / MSWD. The creator code of MSWD is very readable, and the type code W6BN is unique to a document from version 6 of Microsoft Word.
This Sample Report file from The Writing Center, when investigated with the ResEdit tool show interesting Type / Creator codes. If we look at the hexadecimals values for the codes. The first four bytes are the Type code and the second set of 4 bytes are the Creator code.
The first thing to know is the encoding for all Type / Creator codes is MacRoman, so if we look up the hexadecimal code for “0A” we learn it is the character for a new Line Feed, why in the world would you use the line feed character? The developers must have had a sense of humor, or are psychopaths, and I’m leaning toward the latter. Trying to put this character into any sort of spreadsheet or text based document with other codes throws everything off! When I try and use a spreadsheet with a group of codes and then use a script to look them up on the command line I get crazy formatting. Not to mentioned the second character in the creator code is “1A” which is a substitute character.
This is just one example of crazy characters being used in Type / Creator codes. Stay tuned for more on these in future discussions.
Even though the Type / Creator codes are very useful in identification of this format, often times the Finder attribute is lost. This can happen if the file is moved off an HFS disk, usually a network or through the internet. Then all we have is the binary data fork and a file with no extension. So finding a signature to identify this format is useful.
Looking at the hexadecimal values of the header of a couple samples doesn’t initially look promising, the first few bytes are very different meaning there is no magic bytes at the beginning of the file. In fact the only thing the same is the mention of the Geneva font used in the document. Looking further into the files.
The only bytes I could find near the beginning that seemed semi consistent is the highlighted bytes above. I did however notice some consistent bytes at the end of each of the files.
The four bytes at the end of each file by themselves would not be a good signature as there are many formats which end with a few “FF” sequences. But maybe combined with bytes near the beginning, a signature might be found. I added a couple samples to my Github page if you would like to take a look. In order to retain the extended attributes, I encoded the files as MacBinary.
lsar -L "Sample Report.bin"
Sample Report.bin: MacBinary
Sample Report:
Name: Sample Report
Size: 29.4 KB (29,404 bytes)
Compressed size: 29.4 KB (29,440 bytes)
Last modified: Thursday, July 25, 1991 at 12:58:20 PM
Created: Saturday, October 13, 1990 at 1:10:54 AM
Mac OS type code: ?WP1 (0x0a575031)
Mac OS creator code: ??WP (0x0a1a5750)
Mac OS Finder flags: 0x0100
Index in file: 0
Length of embedded data: 29404
Start of embedded data: 128
Original archive entry: Is an embedded MacBinary file: Yes
I think when most of us have some data to sort or make sense of, we tend to gravitate toward a spreadsheet. Using Excel or LibreOffice, or if you really like to party, OpenRefine. There are plenty of meme’s out there representing the frustration people have with bugs, features and limitations of Excel specifically.
There are more tools out there for making sense of data, one some people have access to is Microsoft’s more advanced PowerBI tool. Marketed as a Data Visualization tool it is accessible to many with a Office 365 subscription. It offers expanded features than excel and isn’t as limited in row maximums.
PowerBi was recently the topic of a Code4Lib editorial issue. The writer of an article for their journal posted two PowerBI datasets which a reader later noticed had private data. After some miscommunications and misunderstandings an open letter was drafted and received some support. Code4Lib did release a statement and lessons were learned.
One statement from the Code4Lib staff caught my eye. “The released files were in a proprietary file format, Microsoft Power BI, with which none of the editors have experience.”
We all use tools for our jobs we are most familiar or available to us. No one can be an expert in all file formats. Some us try, but things change so fast it is impossible. But, we can do more in documenting and making formats identifiable through the tools we use for digital preservation. The File Format Wiki and PRONOM have had no mention of Power BI, so let’s change that.
Microsoft Power BI was released in 2011 and has been part of the Microsoft Power Platform. Power BI can gather data from many sources. The software can be accessed in the Office 365 cloud, but also using a Desktop application. In the desktop application, all the data sources and connections are stored in a single file with the extension PBIX. But there are other related formats.
Just like many modern Microsoft formats it is a ZIP container with a mixture of XML and JSON. There is also a DataModel file along with Settings and Connections. A quick peek at some of the contents shows us:
So it looks like the ZIP structure follows the standard for OpenXML packages as it contains a “[Content_Types].xml” file. So using this XML alone would clash with too many other formats. From what I could find the “DataModel” file is what stores the data is more unique to this format, even though the name is pretty generic. Using a string within the file would probably help be more accurate. The “DataModel” file does have unicode double byte strings we can use. “STREAM_STORAGE_SIGNATURE” seems like a unique enough string to use, but it looks like it may not be unique to PBIX. Looks like the “DataModel” file is a Microsoft “MS-XLDM” file format and is a “Spreadsheet Data Model File Format“.
There is a variation to the DataModel file and I am not sure when the standard is used verses this variation, “This backup was created using XPress9 compression”. Not sure if it is versioning or how the file is saved, but they both seem to function correctly.
After a bit of digging it seems like the MS-XLDM format can be found within an XSLX file. I found an example with these datasets. Within an XSLX there can be a found a file “xl/model/item.data” and it has the same structure as DataModel within a PBIX.
Because this file has a different filename and is in a different path, using “DataModel” should keep identification specific to a PBIX file.
The Power BI Report has a template option. This format uses the .PBIT extension and doesn’t contain any data only a template to use with other data. The structure is roughly the same, but doesn’t contain the “DataModel” file, but “DataModelSchema”, which appears to be a JSON file.
The DataModelSchema JSON has some plain text strings which could be used for identification. Later in the file there is a string, “defaultPowerBIDataSourceVersion“.
The amazing Ashley recently did a little writeup on the Sibelius music notation software. I thought I would take the opportunity to talk about another music notation software which needs a little update. Finale was created in 1987 for the Macintosh by a company called Coda Music and became quite popular with musicians and composers. The ability to use a computer to typeset a musical score was a huge advancement. This was all possible by the use of music notation fonts.
Finale was originally written by Coda Music Technology, owned for a time by Net4Music, now currently owned by MakeMusic. Over the years there has been additional products developed along side Finale.
The first version of Finale was developed for the Macintosh and didn’t have an extension. But by version 3.5 there was a comparable Windows version and the use of the extension .MUS. In order to share the files between the different platforms Finale also created an ETF file, which instead of the binary MUS the ETF is a plain text “transportable” file.
Both formats are based on the Enigma or “Environment for Notation Intuitive Graphic Music Algorithms” format. These formats were last used with Finale 2012 when a new format took over in 2014. Let’s start from the beginning.
By Version 3 we see the format stabilize and this header is used until Finale 2012. There was other various products which also used the format so there is some variation.
The current PRONOM identification for fmt/397 is looking for the “ENIGMA BINARY FILE” bytes but also the string “Finale(R)”, so this PrintMusic variation is not identified correctly.
Another format that is a little more rare to see, but is part of the Finale formats collection. Finale Performance Assessment File (.fpa) is an older format discontinued in 2007, but has a similar format. It was a tool similar to the current SmartMusic tool.
The current signature of ETF files is only able to correctly identify the later version of the string in all caps. The fmt/398 PRONOM ID could use an alternate signature to ensure all variations are identified correctly. There is a couple versions of the specification out there, but does not add much to what is known.
Starting in 2014 Finale starting using a new file format to store its notations. The native format now uses the MUSX extension. This new format uses a ZIP container to store all the data. Let’s take a look at the inside.
It seems the presence of the NotationMetadata.xml file and the mimetype would be sufficient for identification in a container signature.
The current version of Finale can export to a few different “Music XML” versions. This includes MUSICXML, regular XML, and a compressed MXL file. The only one needs attention is the compressed MXL file and added to PRONOM. It already has a PUID, fmt/897, but no signature. Here is what it looks like inside the ZIP container.
Looks like a standard identifiable MUSICXML file within the container with a mimetype of “application/vnd.recordare.musicxml”. The MUSICXML file will be impossible to use for identification because of the variable file name, but the mimetype should do just fine.
Hopefully that covers all the major formats that need identification. I saw on a list that I will soon be working on an old Macintosh which has hundreds of Finale files, I hope these updates cover those needs! Take a look at my GitHub for my signatures and plenty of samples.
The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulties in Digital Preservation is understanding these areas of GIS, CAD, and 3D Modeling software and the file formats which belong to the software titles in this space. Not only are the file formats plentiful but the software is extensive and expensive. Documentation is lacking in understanding the different file formats associated with each software title. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.
I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.
SolidWorks is mechanical design software used for making 3D models which can be made to be individual parts, part of larger assemblies and added to drawings giving engineers access to 3D deisgn on their desktops. Bought by Dassault Systèmes in 1997, they are the makers of the CATIA CAD software. Since 1995 a new version was released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.
There are three types of SolidWorks file formats, the SolidWork part (sldprt), the assembly (sldasm), and drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but quickly added “sld” to avoid confusion with other CAD tools.
We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.
Almost the same contents, the same version directory. The only difference in content is the file Defaults in the Contents directory. But hard to know if all have the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.
The SolidWorks 2000 format added additional files to the container which can help.
Starting in 2015 the format changed from an OLE container, to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:
The newer version of the format is much different and is in a proprietary binary format with no specifications, which makes it much more difficult to know which parts of the file can be used for identification. All these new formats have the hex values “00 00 00 04” as bytes 4 through 7. Not very unique for identification. There is another set of bytes which does seem to be consistent for all samples so far, but they vary in their location. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The 10th byte often has the value 34, but in many samples either has 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW also have this pattern which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.
More work is needed to really develop signatures that can identify each format from SolidWorks definitely. My initial assumptions we not completely correct and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and properly identifying them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in tough if you have additional samples or ideas on better identification.
Is there a perfect raster image format? TIFF has been around quite some time and is generally accepted as a preferred preservation format. There have been a few attempts to have a single file contain multiple resolutions with the purpose of providing resolutions for different uses, lower-resolution for web and higher-resolution for print. Even the semi popular JPEG2000 added multiple resolutions to improve the JPEG format. Kodak came up with a few ideas to do this as well. The Kodak PCD, PhotoCD or Image PAC files was one that was used for awhile before it was abandoned. Another was FlashPix.
I briefly mentioned FlashPix on an earlier post about the Microsoft Picture It! format. They are extremely similar. Both. have the same basic structure in a Compound Object format. Some of the FlashPix files generated by Picture It! even have the same identifiers in the CompObj header.
FlashPix was supposed to be the answer to all the problems with storing bitmap image data and how we view the web. Kodak partnered with some big names, Microsoft Corporation, Hewlett-Packard Company and Live Picture, Inc, were among them. Kodak marketed the format and even included it as a native file format to some of its new digital cameras. The format was made official in June of 1996, with a Whitepaper explaining all the benefits and architecture. There was a lot of hype, some even calling it, “Not your Grandma’s format“. Many graphics software started to include support for the new format, including Adobe Photoshop. So what happened, why didn’t the format catch on? Some say it was the size of storing multiple resolutions in one file, others believe it was the complicated Compound Object structure that lead to its demise. Either way, the format had a lot of hype in the late 1990’s, but by the year 2000, it had gone silent and all the websites went away.
FlashPix did have a big impact, and there were many software and hardware devices which were made compatible. There are a few stories left behind of those who scanned all their photos to the FlashPix format only to find a few years later it was unsupported on more modern computers. There was also a few early digital camera’s which could capture directly to the format. Take my Kodak DC260 zoom camera, circa 1998. Changing the Capture Preferences, I can switch between a JPG and FPX.
Using exiftool we can take a look at one of the images from the camera:
exiftool P0004795.FPX
ExifTool Version Number : 12.73
File Name : P0004795.FPX
Directory : GitHub/digicam_corpus/Kodak/DC260/DC260_01
File Size : 251 kB
File Modification Date/Time : 2024:01:06 12:54:20-07:00
File Access Date/Time : 2024:01:06 13:20:46-07:00
File Inode Change Date/Time : 2024:01:06 13:04:34-07:00
File Permissions : -rwxrwxrwx
File Type : FPX
File Type Extension : fpx
MIME Type : image/vnd.fpx
Code Page : Unicode UTF-16, little endian
Data Object ID : 13BC5A58-6B90-1B6B-12C9-0800201177F8
Data Object Status : Exists, Not Purgeable
Creating Transform : Source Image
Using Transforms :
Cached Image Height : 1024
Cached Image Width : 1536
Comp Obj User Type Len : 16
Comp Obj User Type : FlashPix_Object
Visible Outputs : 1
Maximum Image Index : 1
Maximum Transform Index : 0
Maximum Operation Index : 0
Thumbnail Clip : (Binary data 18480 bytes, use -b option to extract)
Revision Number : 1
Create Date : 2024:01:06 12:53:29
Modify Date : 2024:01:06 12:53:29
Software : KODAK DIGITAL SCIENCE DC260
Image Width : 1536
Image Height : 1024
Subimage Width : 1536
Subimage Height : 1024
Subimage Color : RGB
Subimage Numerical Format : 8-bit, Unsigned
Decimation Method : None (Full-sized Image)
JPEG Tables : (Binary data 558 bytes, use -b option to extract)
Number Of Resolutions : 1
Max JPEG Table Index : 1
Scene Type : Original Scene
Software Release : KODAK DIGITAL SCIENCE DC260
Make : Eastman Kodak Company
Camera Model Name : KODAK DIGITAL SCIENCE DC260
Serial Number : 7577
Exposure Time : 1/180
F Number : 4.7
Exposure Program : Program AE
Exposure Compensation : 0
Subject Distance : 0.520 m
Metering Mode : Center-weighted average
Light Source : Unknown
Focal Length : 24.0 mm
Max Aperture Value : 4.6
Flash : No Flash
Exposure Index : 90
Sharpness Approximation : 0
File Source : Digital Camera
Sensing Method : One-chip color area
Extension Create Date : 2024:01:06 12:53:29
Extension Modify Date : 2024:01:06 12:53:29
Creating Application : Picoss
Extension Name : ijuhsimasa
Extension Persistence : Always Valid
Extension Description : Data Object Store 000001
Storage-Stream Pathname : /Data Object Store 000001
Extension Class ID : 56616000-C154-11CE-8553-00AA00A1F95B
Used Extension Numbers : 1
Screen Nail : (Binary data 4304 bytes, use -b option to extract)
Subimage Tile Count : 384
Subimage Tile Width : 64
Subimage Tile Height : 64
Num Channels : 3
Audio Stream : (Binary data 30780 bytes, use -b option to extract)
Aperture : 4.7
Image Size : 1536x1024
Megapixels : 1.6
Shutter Speed : 1/180
Preview Image : (Binary data 4164 bytes, use -b option to extract)
Focal Length : 24.0 mm
The file also does identify in PRONOM:
sf P0004795.FPX
---
siegfried : 1.11.0
scandate : 2024-01-17T23:13:59-07:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'P0004795.FPX'
filesize : 250880
modified : 2024-01-06T12:54:20-07:00
errors :
matches :
- ns : 'pronom'
id : 'x-fmt/56'
format : 'Kodak FlashPix Image'
version :
mime : 'image/vnd.fpx'
class : 'Image (Raster)'
basis : 'extension match fpx; container name CompObj with byte match at 53, 36 (signature 2/2)'
warning :
If you notice, PRONOM has two signatures for the FlashPix format, this image was identified with signature #2. The first signature looks for the string “FlashPix Object”, but the second looks for the CLSID which is unique to each compound object format. FlashPix has the CLSID: {56616700-c154-11ce-8553-00aa00a1f95b}. Looking at many of the other samples I have there is much variation on the use of the string and CLSID.
The images from the Kodak Camera use “FlashPix_Object” string so with the underscore it doesn’t match the first signature, but others I made using Picture It! software used a couple variations. Many don’t use the string at all. Others use a sightly different CLSID in both uppercase and lowercase. We will have to suggest adjustments to the current signature to identify them all.
Looking at the contents of the OLE container we can see some interesting things.
Path = P0004795.FPX
Type = Compound
Physical Size = 250880
Extension = compound
Cluster Size = 512
Sector Size = 64
Size Compressed Name
------------ ------------ ------------------------
188 192 [5]Data Object 000001
272 320 [1]CompObj
388 448 [5]Extension List
144 192 [5]Global Info
Data Object Store 000001
18704 18944 [5]SummaryInformation
816 832 Data Object Store 000001/[5]Image Contents
272 320 Data Object Store 000001/[1]CompObj
988 1024 Data Object Store 000001/[5]Extension List
1624 1664 Data Object Store 000001/[5]Image Info
4332 4608 Data Object Store 000001/[5]Screen Nail_bd0100609719a180
Data Object Store 000001/Resolution 0005
Data Object Store 000001/Audio_bd0100609719a180
1112 1152 Data Object Store 000001/[5]KDC_bd0100609719a180
72 128 Data Object Store 000001/[5]SummaryInformation
108 128 Data Object Store 000001/Audio_bd0100609719a180/[5]Audio Info
30808 31232 Data Object Store 000001/Audio_bd0100609719a180/Audio Stream 000000
6208 6656 Data Object Store 000001/Resolution 0005/Subimage 0000 Header
176378 176640 Data Object Store 000001/Resolution 0005/Subimage 0000 Data
------------ ------------ ------------------------
242414 244480 16 files, 3 folders
The main CompObj is where we find the identification information, but the Data Object Store 000001 directory is where all the image data is stored. In a multiple resolution image we might see additional Resolution directories. You may also notice a mention of an Audio directory. Yes, this image was captured and then audio was recorded with it. Not a video, but an audio clip associated with the image. FlashPix can contain audio streams. This isn’t the first time we have seen this, HP camera’s also have this function which as it turns out is stored in a FlashPix exif extension within a JPEG.
The FlashPix native format may have disappeared, but the format lives on as an extension to Exif data, allowing you to embed audio and other media within a JPEG file. The code for FlashPix was given to ImageMagick and is maintained by them.
Usually in the software world file formats are fairly efficient, the structure is meant to provide a way to store the data of the software being used. There isn’t much need to add additional unnecessary additions. This isn’t always true, but in the early days, disk space was expensive so compression and efficiency ruled. There also wasn’t much need to hide anything or complicate things. That is unless it is intended. This makes me think of two things, Polyglots and Steganography.
Steganography is the art of embedding data within an image. With digital images you can hide another image within the main image by using the most and least significant bits. Fun use of technology, but not something you normally would find in your regular desktop software.
Imagine my surprise when I was researching the Picture It! software and the MIX file format only to discover Microsoft decided to make their own polyglot of sorts for their PNG Plus format which replaced the MIX format, then both obsolete when Digital Image was discontinued in 2007. The PNG Plus format was the native format for the Microsoft Picture It! and Digital Image software often found with the Microsoft Works or Digital Imaging suite of software.
Save Menu from Digital Image Pro
According to the help within Digital Image:
The PNG Plus format uses the standard PNG extension but provides saving of layers and pages within the PNG format. Since the PNG format cannot do this natively, how did Microsoft accomplish this? Well, by throwing an OLE container into the middle of the file of course!
PNG Plus files are your regular PNG format and will identify as such. But they are just a low resolution thumbnail of the full image. Let’s take a look:
exiftool PictureIt7-s02.png
ExifTool Version Number : 12.70
File Name : PictureIt7-s02.png
File Size : 26 kB
File Modification Date/Time : 2023:12:26 22:01:58-07:00
File Access Date/Time : 2024:01:01 12:31:07-07:00
File Inode Change Date/Time : 2023:12:26 22:01:58-07:00
File Permissions : -rwx------
File Type : PNG
File Type Extension : png
MIME Type : image/png
Image Width : 500
Image Height : 333
Bit Depth : 8
Color Type : RGB with Alpha
Compression : Deflate/Inflate
Filter : Adaptive
Interlace : Noninterlaced
SRGB Rendering : Perceptual
Gamma : 2.2
White Point X : 0.3127
White Point Y : 0.329
Red X : 0.64
Red Y : 0.33
Green X : 0.3
Green Y : 0.6
Blue X : 0.15
Blue Y : 0.06
Warning : [minor] Text/EXIF chunk(s) found after PNG IDAT (may be ignored by some readers)
Title : PictureIt7-s02
Image Size : 500x333
Megapixels : 0.167
Looks like there is some additional data after the IDAT chunk.
What what do we have here? Near the end of the file before the IEND chunk is an OLE file with the very recognizable hex values of “D0CF11E0“. Let’s strip out the OLE file and take a look.
Path = PictureIt7-s02-ole
Type = Compound
WARNINGS:
There are data after the end of archive
Physical Size = 8704
Tail Size = 7764
Extension = compound
Cluster Size = 512
Sector Size = 64
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2023-12-26 22:01:58 D.... DataStore
2023-12-26 22:01:58 D.... Text
..... 2560 2560 Text/CONTENTS
..... 86 128 Text/[1]CompObj
..... 96 128 DataStore/3
..... 4 64 DataStore/1
..... 121 128 DataStore/0
..... 57 64 DataStore/2
..... 98 128 DataStore/5
..... 4 64 DataStore/4
..... 1254 1280 DataStore/7
..... 4 64 DataStore/6
..... 4 64 DataStore/8
------------------- ----- ------------ ------------ ------------------------
2023-12-26 22:01:58 4288 4672 11 files, 2 folders
Interesting, I don’t think I have come across a standard format with a container embedded within. I have come across many OLE and ZIP containers which contain other common formats within, but this format is definitely unique. Others have added features in the IDAT chunk, such as a web shell. I am sure there are others out there. The CompObj file found within the Text directory is very similar to the Microsoft Works and Publisher format. Although trying to open the file in Publisher doesn’t work!
PRONOM uses binary and container signatures to identify file formats. Even though this file format contains a valid OLE container, because it is within a regular binary file format, I don’t believe a container signature would work. The difficulty will be to clearly identify this new format without falsely identifying a regular PNG instead. The OLE file format header is not in a consistent location to use a specific offset. Making the string a variable location can causes some undo processing, so lets look to see if there is anything else we can use to make a positive ID.
The PNG file format is based on chunks, you have to have IHDR, then an IDAT and the IEND chunk. If we take a look at a regular PNG file using a libpng tool pngcheck, we see this:
pngcheck -cvt rgb-8.png
File: rgb-8.png (759 bytes)
chunk IHDR at offset 0x0000c, length 13
256 x 256 image, 24-bit RGB, non-interlaced
chunk tEXt at offset 0x00025, length 44, keyword: Copyright
? 2013,2015 John Cunningham Bowler
chunk iTXt at offset 0x0005d, length 116, keyword: Licensing
compressed, language tag = en
no translated keyword, 101 bytes of UTF-8 text
chunk IDAT at offset 0x000dd, length 518
zlib: deflated, 32K window, maximum compression
chunk IEND at offset 0x002ef, length 0
No errors detected in rgb-8.png (5 chunks, 99.6% compression).
The required chunk are there, but a couple extra, the tEXt and iTXt, which are textual metadata you can add. Now lets look at a PNG Plus file:
pngcheck -cvt PictureIt7-s02.png
File: PictureIt7-s02.png (26066 bytes)
chunk IHDR at offset 0x0000c, length 13
500 x 333 image, 32-bit RGB+alpha, non-interlaced
chunk sRGB at offset 0x00025, length 1
rendering intent = perceptual
chunk gAMA at offset 0x00032, length 4: 0.45455
chunk cHRM at offset 0x00042, length 32
White x = 0.3127 y = 0.329, Red x = 0.64 y = 0.33
Green x = 0.3 y = 0.6, Blue x = 0.15 y = 0.06
chunk IDAT at offset 0x0006e, length 9460
zlib: deflated, 32K window, fast compression
chunk cmOD at offset 0x0256e, length 0
Microsoft Picture It private, ancillary, unsafe-to-copy chunk
chunk cpIp at offset 0x0257a, length 16384
Microsoft Picture It private, ancillary, safe-to-copy chunk
chunk iTXt at offset 0x06586, length 24, keyword: Title
uncompressed, no language tag
no translated keyword, 15 bytes of UTF-8 text
chunk tEXt at offset 0x065aa, length 20, keyword: Title
PictureIt7-s02
chunk IEND at offset 0x065ca, length 0
No errors detected in PictureIt7-s02.png (10 chunks, 96.1% compression).
It looks like we have the required chunks and some textual chunks but also a couple chunks which pngcheck describes as private and identify’s them as Microsoft Picture It chunks. The cpIp chunk is the one which contains the OLE container. This is the chunk we need to identify in a signature. The problem is the offset for the cpIp chunk is not the same each time. Here is one from Digital Image 10 Pro.
chunk cpIp at offset 0x737a7, length 245760
Microsoft Picture It private, ancillary, safe-to-copy chunk
Significantly further in the file that the other example. These samples currently identify as PNG 1.2 files. PRONOM fmt/13 so we can use the signature and add to it, but it currently doesn’t look for IDAT only the iTXt chunk, which is probably not optimal. For PNG Plus, lets get the header which includes IHDR, IDAT, then the cpIp chunk then an end of file sequence for IEND. Take a look at my signature and samples, I am curious how many PNG Plus files are out there hidden to the world.
Turns out there is another PNG flavor which has been enhanced to allow for layers and pages. Adobe Fireworks uses a PNG format as their native format. They also use private chunks, but not within an OLE container. They use additional chunks, but before the IDAT chunk:
It’s hard to know which each of the chunks are for and if they are all required for the Fireworks PNG format. From the book on PNG.
In addition to supporting PNG as an output format, Fireworks actually uses PNG as its native file format for day-to-day intermediate saves. This is possible thanks to PNG’s extensible “chunk-based” design, which allows programs to incorporate application-specific data in a well-defined way. Macromedia has embraced this capability, defining at least four custom chunk types that hold various things pertinent to the editor. Unfortunately, one of them (pRVW) violates the PNG naming rules by claiming to be an officially registered, public chunk type, but this was an oversight and should be fixed in version 2.0.
Most everyone has heard of Microsoft Office, the suite of applications used by millions everyday. Less people know about Microsoft Works, which was a lower cost alternative, but was quite popular as a home office suite of applications. One tool which often came with the Works suite was a digital image tool called Picture It!
Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.
Picture It! used a wizard type interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature a new file format was used to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to the FlashPix image format, which was supposed to be an image file format to solve many delivery issues, but didn’t seem to gain hold despite being created by Kodak, HP, and others. In fact many of the MIX files I found on Microsoft disks are actually FlashPix files.
The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.
7z l PictureIt1-s02.mix
--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 328 384 [5]Data Object 000001
..... 396 448 [5]Transform 000004
..... 872 896 [5]Operation 000001
..... 320 320 [1]CompObj
..... 292 320 [5]Global Info
..... 872 896 [5]Operation 000002
..... 144 192 [5]Operation 000003
..... 684 704 [5]Transform 000008
..... 1028 1088 [5]Transform 000009
..... 328 384 [5]Data Object 000009
..... 324 384 [5]Data Object 000005
2023-12-27 11:04:39 D.... Data Object Store 000001
..... 328 384 [5]Data Object 000010
..... 20932 20992 [5]SummaryInformation
..... 200 256 [5]Microsoft Embedding Info
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0001
..... 1400 1408 Data Object Store 000001/[5]Image Contents
..... 230 256 Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0000
..... 28 64 Data Object Store 000001/Resolution 0000/Subimage 0000 Data
..... 80 128 Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0002
..... 28 64 Data Object Store 000001/Resolution 0002/Subimage 0000 Data
..... 208 256 Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0004
..... 28 64 Data Object Store 000001/Resolution 0004/Subimage 0000 Data
..... 1792 1792 Data Object Store 000001/Resolution 0004/Subimage 0000 Header
..... 124 128 Data Object Store 000001/[5]SummaryInformation
..... 28 64 Data Object Store 000001/Resolution 0005/Subimage 0000 Data
..... 6976 7168 Data Object Store 000001/Resolution 0005/Subimage 0000 Header
..... 28 64 Data Object Store 000001/Resolution 0003/Subimage 0000 Data
..... 544 576 Data Object Store 000001/Resolution 0003/Subimage 0000 Header
..... 28 64 Data Object Store 000001/Resolution 0001/Subimage 0000 Data
..... 128 128 Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------ ------------------------
2023-12-27 11:04:39 38698 39872 29 files, 7 folders
This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:
sf PictureIt1-s02.mix
---
siegfried : 1.11.0
scandate : 2023-12-27T11:06:32-07:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/111'
format : 'OLE2 Compound Document Format'
version :
mime :
class : 'Text (Structured)'
basis : 'byte match at 0, 30'
warning :
Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:
The container signature is looking into the OLE container for the “CompObj” file (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting at the 32nd byte. That is pretty specific. The sample file I am using as an example has the following string of bytes.
Ok, so this sample has a similar string but is missing the “version 1” text. It seems the samples used to created the PRONOM signature was working off samples which included the version 1 in the header of CompObj. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look a file from version 2 to compare:
Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?
Diving into this and gathering many samples has brought a lot of variants to surface. Let’s see if we can list all the CompObj header variants.
Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}
Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}
Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}
Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}
PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}
PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}
FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}
Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001), then the Picture It! software used a new format, PNG Plus to store the layered stacks. More on that in a future post! Although some later versions seems to be able to open the older MIX format. Version 4 of the MIX format seems to be the last as the 2001 software had only version 4 files on it. Probably safe to say only the 4 versions are needed for identification.
You may notice the additional unique identifier I included in each format. This is called a Class ID for the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This Unique ID could possibly be a better solution for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have a unique ID. With all the variations in the version 1 strings, the ID remains the same. For version 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1. Not to complicate things.
So it seems in order to get proper identification of these similar formats we need to:
Clean up version 1 identification for fmt/936
Add a signature for 2, 3, and 4
Add a version 2 signature for the PhotoDraw format
Add some additional signature variations for the FlashPix format.
The Class ID’s could be used to distinguish different versions and formats, but many of the ID’s are identical, this could mean they are the same format. But for now we can just add the additional variation strings and it should identify everything for now. The FlashPix format needs more research as there is so many different variations and it’s so close to the MIX format. Take a look at my GitHub submission, maybe you have some additional variations to add?
One of the important parts about Digital Preservation is to gather significant properties of the digital files we hope to preserve. This can allow us to base our risk assessments off of more data than just an extension. For example, a TIFF file is a mighty good preservation format. Well documented and adopted by the preservation community, and with hundreds if not thousands of software tools to render and make use of the format. But if a TIFF file uses compression like LZW, or if it happens to have multiple pages, those are good things to know about. Most formats might have a stable set of properties, but sometimes can have properties which adds more risk to the format becoming difficult to render or migrate.
A DNG or Digital Negative developed by Adobe was supposed to solve the issues with proprietary RAW digital camera formats. Rendering a PhaseOne IIQ file often times requires the full CaptureOne software which can be expensive. Adobe spends quite a bit of resources in adding support to its Camera RAW toolkit and adding the ability to take majority of these RAW formats and move them into a DNG. There is also more and more camera manufacturers who image directly to a DNG as their native RAW format. This is the case for Apple’s ProRAW format which uses the DNG specification.
Another manufacturer is the Insta360 camera’s. Their 360 camera’s can use two lenses to capture 180 degrees from each and then stitch into a 360 photo or video. They can capture compressed images and videos, but also in RAW. Because of the two lenses and sensors, their DNG’s can get quite large. For this reason I recently asked PRONOM to adjust their signatures to allow for a bigger offset of DNG information in the larger RAW images.
exiftool IMG_20230913_141939_00_039.dng
ExifTool Version Number : 12.70
File Name : IMG_20230913_141939_00_039.dng
File Size : 143 MB
File Type : DNG
File Type Extension : dng
MIME Type : image/x-adobe-dng
Exif Byte Order : Little-endian (Intel, II)
Subfile Type : Full-resolution image
Image Width : 5984
Image Height : 11968
Bits Per Sample : 16
Compression : Uncompressed
Photometric Interpretation : Color Filter Array
Make : Arashi Vision
Camera Model Name : Insta360 X3
DNG files are actually based on the TIFF format, TIFF/EP to be precise, which means there is some good history behind the format and understanding of its structure. DNG does add many new tags and new features, so there is much more going on. Here is a TIFFInfo view of a DNG. Lots of new tags…..
tiffinfo IMG_20230913_141939_00_039.dng
TIFFReadDirectory: Warning, Unknown field with tag 33421 (0x828d) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33422 (0x828e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50937 (0xc6f9) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50938 (0xc6fa) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50940 (0xc6fc) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51009 (0xc741) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51107 (0xc7a3) encountered.
=== TIFF directory 0 ===
TIFF Directory at offset 0x889946c (143234156)
Subfile Type: (0 = 0x0)
Image Width: 5984 Image Length: 11968
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: None
Photometric Interpretation: 32803 (0x8023)
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 11968
Planar Configuration: single image plane
Make: Arashi Vision
Model: Insta360 X3
Software: v1.0.69_build1
DateTime: 2023:09:13 14:19:40
Tag 33421: 2,2
Tag 33422: 1,2,0,1
EXIFIFDOffset: 0x8
GPSIFDOffset: 0x3e6
DNGVersion: 1,3,0,0
DNGBackwardVersion: 1,3,0,0
UniqueCameraModel: Insta360 X3
An IFD (Image File Directory) is the building block of a TIFF file. A TIFF file can have multiple IFD’s within a single file. But an IFD can also be a thumbnail, metadata or GPS info. For a DNG, they use the IFD structure as well, but often, the first IFD is a lower resolution of the full image.
It can get confusing, especially for tools we use to extract metadata and significant properties from a DNG for preservation. Within Rosetta, the preservation system I use at work, there is no dedicated DNG extractor, so we use JHOVE, as it is the tool we use for our TIFF images. This presents a problem as the process only extracts properties for the first IFD assuming it is the main IFD, but in many cases it reports back the image is much smaller in pixel dimensions than it actually is. More work is needed to improve extracting correct significant properties for DNG and other RAW image formats.
Adobe released a new version of DNG this year. In June, DNG version 1.7.0.0 was finalized. The new version brought a few new features, two of which are including JPEG XL compression and a new HDR colorimetric value. In order to add JPEG XL compression DNG version 1.7 is required. Here is how one looks in exiftool, created with Adobe DNG Converter 16.1.
exiftool _MG_9375_1.dng
ExifTool Version Number : 12.70
File Name : _MG_9375_1.dng
File Size : 5.4 MB
File Type : DNG
File Type Extension : dng
MIME Type : image/x-adobe-dng
Exif Byte Order : Little-endian (Intel, II)
Make : Canon
Camera Model Name : Canon EOS DIGITAL REBEL XT
Preview Image Start : 91884
Orientation : Rotate 270 CW
Rows Per Strip : 171
Preview Image Length : 10305
Software : Adobe DNG Converter 16.1 (Macintosh)
Modify Date : 2023:12:18 11:45:06
Artist : unknown
Image Width : 3516
Image Height : 2328
Bits Per Sample : 16
Compression : JPEG XL
DNG Version : 1.7.1.0
DNG Backward Version : 1.7.1.0
I had recently submitted a new signature for DNG 1.7 to PRONOM, but I found this new DNG version falls outside the signature I created. I had made the assumption all DNG’s report their version based on the last two values of 0.0, so I created the signature to look for 1.7.0.0. This is wrong now that I can see an example of version 1.7.1.0.
In order to fix the issue, I would need to change all the DNG signatures to remove the last two bytes so:
12C601000400000001070000 would change to 12C60100040000000107
This would allow for identification if some DNG files have a point version.
The pace at which manufacturers are producing camera’s with new features is much faster than the Digital Preservation community can keep up with. As new technologies get released, we play catch up trying to identify new formats and variations to existing ones. I guess that is job security?