In honor of World Digital Preservation Day, I wanted to write a little about format headers, the magic that makes some files more easily identifiable than others.
When it comes to binary file formats, some developers make the format clearly identifiable in a header, while others leave it ambiguous. Still others have a little fun, leaving clues and references to popular culture.
A few of my favorites, based on their headers:
- Early CoolEdit / Audition files began with the string “COOLNESS”.
- Medi8or format with string “MatchWare Medi8or Version 3.00”
- MacCaption with string “File Format=MacCaption_MCC V2.0”
- HyperWriter format with string “HyperWriter!”
- ExpressPublisher and AnFX Java Movie with hex values “CAFEBEEF”
- TIFF format, which has at bytes 2-3 “an arbitrary but carefully chosen number (42)”
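As a small illustration of how a header like TIFF's gets checked in practice, here is a minimal Python sketch. It covers only classic TIFF: bytes 0-1 give the byte order ("II" for little-endian, "MM" for big-endian) and bytes 2-3 hold the number 42 in that byte order. The function name is my own, not part of any library.

```python
import struct


def looks_like_tiff(header: bytes) -> bool:
    """Return True if the first four bytes match the classic TIFF header."""
    if len(header) < 4:
        return False
    order = header[:2]
    if order == b"II":      # little-endian byte order
        fmt = "<H"
    elif order == b"MM":    # big-endian byte order
        fmt = ">H"
    else:
        return False
    # Bytes 2-3 must decode to 42 in the byte order declared by bytes 0-1.
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42
```

Calling `looks_like_tiff(open(path, "rb").read(4))` is enough for a first pass; a real identification tool would go on to validate the rest of the file.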
A couple of my current least favorites:
- MP3 format, which can have no header, just frames, and which clashes with everything.
- Canvas format, whose early versions (CVS) have no standard header.
- Leica Cyclone PTS format with just point cloud data, no headers.
- Adobe Flash (FLA), whose later versions use a non-standard ZIP container that throws a Central Directory error.
Like I said, some developers make it very obvious what software created a file and others seem to make things difficult. I understand there is a need to optimize files to keep them from getting bloated and taking up too much space, but many of the size limits from the early days of computing are not an issue anymore. Can’t we be clearer when designing a file format?
Today I want to document one format that is very easy to identify, because it spells out its format very verbosely, but that is very hard to preserve because of the lack of additional documentation.
Meet the Composite File Management System file format:
hexdump -C sample.br4
00000000  43 43 6d 46 20 2d 20 55  6e 69 76 65 72 73 61 6c  |CCmF - Universal|
00000010  20 2d 20 41 78 69 6f 6d  20 2d 20 41 47 50 20 2d  | - Axiom - AGP -|
00000020  20 43 6f 6d 70 6f 73 69  74 65 20 46 69 6c 65 20  | Composite File |
00000030  4d 61 6e 61 67 65 6d 65  6e 74 20 53 79 73 74 65  |Management Syste|
00000040  6d 20 28 55 6e 69 76 65  72 73 61 6c 29 20 2d 20  |m (Universal) - |
00000050  43 72 65 61 74 65 64 20  62 79 20 41 6e 64 72 65  |Created by Andre|
00000060  61 20 50 65 73 73 69 6e  6f 2c 20 44 65 63 65 6d  |a Pessino, Decem|
00000070  62 65 72 20 31 39 39 35  20 28 76 65 72 73 2e 20  |ber 1995 (vers. |
00000080  35 29 20 2d 20 43 6f 70  79 72 69 67 68 74 28 63  |5) - Copyright(c|
00000090  29 20 31 39 39 35 2d 39  36 20 62 79 20 4d 65 74  |) 1995-96 by Met|
000000a0  61 54 6f 6f 6c 73 2c 20  49 6e 63 2e 20 2d 20 50  |aTools, Inc. - P|
000000b0  72 6f 75 64 6c 79 20 6d  61 64 65 20 69 6e 20 74  |roudly made in t|
000000c0  68 65 20 55 53 41 2c 20  6c 61 6e 64 20 6f 66 20  |he USA, land of |
000000d0  74 68 65 20 66 72 65 65  2c 20 68 6f 6d 65 20 6f  |the free, home o|
000000e0  66 20 74 68 65 20 62 72  61 76 65 2e 00 00 00 00  |f the brave.....|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
Where to start? First off, this is the Bryce 4 file format. Bryce was 3D modeling and animation software developed by MetaTools, later MetaCreations. MetaCreations was also the developer of the popular software Ray Dream Studio/Infini-D, Fractal Design Painter, and Kai’s Power Tools.
Secondly, the header refers to a Composite File Management System (Universal), or CCmF, which I have found to be the file format behind many other extensions, including .goo, .brc, .br3, .br4, .br5, .sfp, .shp, and .obp. It doesn’t always have the verbose header; some files have the following instead:
hexdump -C Tutorial.obp | head
00000000  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000050  20 20 20 20 20 20 20 20  20 20 20 20 20 20 43 43  |              CC|
00000060  6d 46 69 6c 65 3a 3a 6b  49 64 65 6e 74 69 66 79  |mFile::kIdentify|
00000070  34 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |4               |
00000080  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
Different, but still contains the CCmF identification string. Others have the verbose header, but further down inside the file.
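Because the identification string is not always at offset 0, a fixed-offset magic check is not enough for this family. Below is a minimal Python sketch that searches the start of a file for both forms seen in the hexdumps above. The two byte strings come straight from those dumps; the function names and the 64 KiB search window are my own assumptions, since I don't know how far into a file the header can appear.

```python
# Byte strings copied from the two hexdumps above.
VERBOSE_MAGIC = b"CCmF - Universal"
SHORT_MAGIC = b"CCmFile::k"


def find_ccmf_signatures(data: bytes) -> list:
    """Return sorted (offset, label) pairs for each CCmF signature found."""
    hits = []
    for label, magic in (("verbose", VERBOSE_MAGIC), ("short", SHORT_MAGIC)):
        offset = data.find(magic)  # first occurrence only, -1 if absent
        if offset != -1:
            hits.append((offset, label))
    return sorted(hits)


def scan_file(path, window=64 * 1024):
    """Scan the first `window` bytes of a file (window size is a guess)."""
    with open(path, "rb") as fh:
        return find_ccmf_signatures(fh.read(window))
```

For the Tutorial.obp sample above this would report the short string at offset 0x5e, and for sample.br4 the verbose header at offset 0.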
With this format being used by so many well-known software titles, I assumed information on the format would be readily available. Alas, not so much. The format even had the name of the creator! “Created by Andrea Pessino, December 1995”. So I reached out. He was on Twitter and I asked about the file format and if there was any documentation available. Twitter (X) has since deleted his responses after he deleted his account, but he told me he wasn’t sure where the documentation might be. One other developer also commented and confirmed they didn’t know where any of the documentation went after they left.
MetaCreations sold Bryce to Corel in 2000, and in 2004 Corel sold it to Daz3D, the current owner. It is no longer actively developed, as it was never made into a 64-bit application. A blog post explains the format a little more, but concludes it is a secret known only to Daz.
It seems there is a community who would like to see Bryce more open, maybe even open-sourced. This thread discusses the format and the underlying Axiom format used.
The creator Andrea Pessino was able to track down some documentation on the CCmF file structure for me. He explained Axiom was an entire codebase for all MetaTools/Creations applications and plugins. So the CCmF system was more than a file format. The documentation included some information on versioning of a CCmF.
There seem to be a few versions of the CCmF file structure:
- CCmFile::kIdentify which corresponds with December 1995 (vers. 5)
- CCmFile::kIdentify2 which corresponds with March 1997 (vers. 7)
- CCmFile::kIdentify3 which corresponds with October 1998 (vers. 9)
- CCmFile::kDfFormat which is a Generic Composite File
The documentation given to me was up to date for 1998, but after Corel purchased Bryce some updates were evidently made, as many material files have the identifier “CCmFile::kIdentify4”.
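To keep track of which variant a sample belongs to, the list above can be turned into a small lookup. This is only a sketch under assumptions: I am assuming the identifier appears literally in the file as "CCmFile::" followed by the constant name (as it does for kIdentify4 in my samples), and that the verbose header always carries a "(vers. N)" string like the Bryce 4 sample. The function name and the description for kIdentify4 are mine, not from the documentation.

```python
import re
from typing import Optional

# Mapping from the list above; kIdentify4 is not in the 1998 documentation.
KNOWN_IDENTIFIERS = {
    "kIdentify": "December 1995 (vers. 5)",
    "kIdentify2": "March 1997 (vers. 7)",
    "kIdentify3": "October 1998 (vers. 9)",
    "kDfFormat": "Generic Composite File",
    "kIdentify4": "undocumented, seen in later material files",
}

IDENT_RE = re.compile(rb"CCmFile::(k[A-Za-z]+\d*)")
VERBOSE_RE = re.compile(rb"\(vers\. (\d+)\)")


def describe_ccmf(data: bytes) -> Optional[str]:
    """Describe the CCmF variant found in `data`, or None if none is found."""
    match = IDENT_RE.search(data)
    if match:
        name = match.group(1).decode("ascii")
        return KNOWN_IDENTIFIERS.get(name, "unrecognized identifier " + name)
    # Fall back to the version number embedded in the verbose header.
    if b"CCmF - " in data:
        match = VERBOSE_RE.search(data)
        if match:
            return "verbose header, vers. " + match.group(1).decode("ascii")
    return None
```

Anything reported as unrecognized is a good candidate for a closer look, since it may point to a version the documentation never covered.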
Bryce 6 and 7 were released by Daz3D and have a different file header. They use the extensions .BR6 and .BR7, with the header:
hexdump -C Bryce7-s01.br7 | head
00000000  42 72 79 63 65 5f 36 2e  30 5f 46 69 6c 65 00 00  |Bryce_6.0_File..|
00000010  11 00 00 00 d4 07 00 00  00 20 00 00 e5 07 00 00  |......... ......|
00000020  00 0a 00 00 00 10 00 00  00 08 78 9c 63 64 60 60  |..........x.cd``|
00000030  60 04 e2 8c cc f4 0c 85  e4 9c fc d2 14 85 92 d4  |`...............|
00000040  8a 92 d2 a2 54 86 11 05  18 a1 18 04 82 76 c8 b5  |....T........v..|
00000050  be 0e 7c 60 8f 4e 93 67  f2 07 32 f5 d1 0e 30 31  |..|`.N.g..2...01|
00000060  40 fc ca 0c c5 60 bf 33  a2 ab da e2 8c c0 70 e0  |@....`.3......p.|
00000070  00 22 58 a0 9c ff 2a 40  fc bf 16 88 ff c3 c3 2e  |."X...*@........|
00000080  13 64 20 83 82 13 50 29  50 ad 17 50 ef 3c 20 ce  |.d ...P)P..P.< .|
00000090  72 66 64 86 19 31 cd 09  42 57 b9 80 71 43 9d 0b  |rfd..1..BW..qC..|
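Note that this Bryce 7 sample still says "Bryce_6.0_File", so the header alone does not appear to distinguish .BR6 from .BR7. A minimal Python check for this newer header might look like the sketch below. It is based on a single sample: I am assuming the two trailing NUL bytes are part of the signature, and the function name is my own.

```python
# First 16 bytes of the Bryce7-s01.br7 sample above, including two NULs.
BRYCE6_MAGIC = b"Bryce_6.0_File\x00\x00"


def is_bryce6_family(header: bytes) -> bool:
    """Return True if the data starts with the Bryce 6/7 header string."""
    return header.startswith(BRYCE6_MAGIC)
```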
I still need to gather more samples from the various extensions related to this format and the software related to them. There is more work to do to understand the different uses of the short CCmFile string and the more detailed header, as well as the differences between objects, materials, and models. When I asked Andrea why he used such a verbose file header, his answer was basically: why not!