I wonder sometimes what goes through a software/hardware developers mind when deciding a format to use for a new device. There are so many options our there for audio formats to choose from. I am sure there are pros and cons to using one technology over another but it seems a few decide to go ahead and make their own. I am sure there is some commercial advantage to developing a proprietary audio format, but with all the established choices it seems unnecessary.
Sony developed their own audio compression formats, which I explored in an earlier blog post. I came across a small goofy looking RCA voice recorder, model VR6320.
Many of these RCA VR series recorders can record in a WAV or a VOC file format. The WAV files are pretty run of the mill, but the VOC format is unique to RCA recorders.
The VOC format is not to be confused with another audio format with the same extension. The Creative Voice Format is a bit more well known. It was used with the Creative’s sound cards (Sound Blaster family) many folks had in their Windows computers in the 1990’s. But the RCA file format is different, and because of the same extension needs its own identification so they are not confused with each other.
sf REC00001.VOC
---
siegfried : 1.10.1
scandate : 2023-11-19T23:33:47-07:00
signature : default.sig
created : 2023-05-12T09:10:13Z
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V112.xml; container-signature-20230510.xml'
---
filename : 'REC00001.VOC'
filesize : 47231
modified : 2015-01-09T20:51:10-07:00
errors :
matches :
- ns : 'pronom'
id : 'UNKNOWN'
format :
version :
mime :
class :
basis :
warning : 'no match; possibilities based on extension are fmt/1736'
The RCA VOC file format seems to be undocumented, there isn’t much available. You can always download a copy of the RCA Digital Voice Manager software, which may or may not run on your current system, and convert the VOC files to WAV or you can use a piece of software coded in 2008 called “devoc“. The developer used to have an online website you could upload the VOC to and it would convert it automatically, but is not longer available. The code can also be found here.
Let’s take a look at the header of a couple of the files I have:
Most of samples I have show “VCP162_VOC_File” in the header, but I have one sample with “RP5120_VOC_File“. I have heard of others, one being “V432_Voice_File“. There could be more variations. One could assume the header is somehow associated with the model number of the device, but that doesn’t appear to be the case. Although there is a device with the model number “RP 5120“. It might be that the older RP series get one header and the newer VR Series get VCP? I will need more samples to confirm, if you have any send them my way. Also, according to the manuals, there is a SP and LP mode to manage the bitrate of the file to squeeze more minutes on the built in memory of these devices. This doesn’t appear to affect identification, but might be good to differentiate in the future.
For now you can take a look at the signature on my GitHub page.
All I had to go on was it was an Adobe format and the acronym “ACD”. One of the first results that came up in a google search was a post in the Adobe forums with someone asking what to do with some old ACD and ACI files they found on a disc, circa 2000, labeled “Adobe Capture”. The only thing I remember about Adobe Capture was some scanning tools related to Adobe Acrobat, but I didn’t remember coming across any ACD files related to Acrobat.
Initially it wasn’t easy to find more information on this format. Eventually I was able to narrow it down to stand-alone software adobe released called “Adobe Acrobat Capture”. Originally released in 1995 it was eventually discontinued in 2010. The software was marketed under the ePaper name and connected to Acrobat through the creation of a PDF from scanned images. The software was compatible with many scanner models and would process the scanned images, run Optical Character recognition, and export to a searchable PDF. These tools are built into Adobe Acrobat today.
One of the reasons the software was being so elusive is the fact it was sold with a high price tag and required the use of a hardware key, or dongle, in order to process scans. The hardware key also managed the type of license you purchased which may limit the number of pages you are allowed to scan within a certain period of time. So the software is very difficult to run today, if you do happen to find a copy out there in Internet land.
In order to document these file formats for preservation purposes I needed to find some samples. I was excited to find a demonstration CD on the Internet Archive, but unfortunately it contained no examples of the ACD file format.
A little sleuthing on the Wayback Machine helped me find a few user guides and brochures. I was also able to find there was three versions of Adobe Acrobat Capture. In a Product Brochure, you can see a screenshot of the software with a document open with the ACD extension.
If you are OCD like me you might have noticed the window in this screenshot is typical of the older Windows 3.1 or Windows NT system. So this was indeed an older product released by Adobe.
The Adobe Acrobat Capture 3.0 Demonstration CD-ROM from the Internet Archive luckily has a UserGuide PDF on the disc and was able to help me understand the ACD format a little more.
Looks like the ACD format is an intermediate format used by the software to manage the process between scanning and export to PDF. ACD was also defined as an “Acrobat Capture Document” which makes sense. They were also mentioned as being “multipage files in Acrobat Capture Document (ACD)”. The UserGuide also mentioned an ACP format which it referenced as “one-page files are in Acrobat Capture Page (ACP) format.” So more research is needed.
Lets start with Adobe Acrobat Capture 2.0 as I managed to get a few samples from an installer I found. Here is a hexdump of an ACD file and its corresponding ACI file.
The ACD file is unique, PRONOM and even TrID was unaware of the format. But to the keen observer, the ACI format is very recognizable. You may have seen this header before:
Lets take a closer look at an ACI file to see if they are a true TIFF image or if there is any customization to the format.
Looks like a true TIFF image with no special tags or unique properties. They are 1-bit TIFF’s compressed with CCITT RLE. Not sure there would be any need to create a special signature for these ACI files.
Looking closer at the ACD file format, we can see they reference ACI files, so probably safe to assume the ACD file doesn’t contain the full raster data for each image:
From the limited sample set I have access, all the ACD files begin with the same Hex values, “02044747C900”. Along with the common header we can assume there should be at least one ACI file referenced in the first part of the file. Because it is referenced as a filepath, the ACI string would be variable in its offset.
Adobe Acrobat Capture 3.0 turns out to be a different format. But looks familiar………
The ACD has some of the same hex values as the previous version, but with some extra bytes at the beginning and it looks like the ACP is a straight up PDF. But may have some interesting tags, like “CAPT_info”.
The problem we will face when trying to write a signature for this version of ACD is the container signature needs a static file name to reference, and it appears the name of the container is also the name of the ACD file within the container. So every file will be different. I wish there was a way in the PRONOM signature syntax to reference an extension and ignore the filename, but currently there no method to do this. The only thing inside the container which seems to be consistent is the file “FILES.LST”. So lets take a peek inside if it.
hexdump -C FILES.LST | head
00000000 5b 41 43 44 31 5d 0d 0a 49 53 43 4f 4d 50 4f 53 |[ACD1]..ISCOMPOS|
00000010 49 54 45 3d 54 52 55 45 0d 0a 4e 55 4d 46 49 4c |ITE=TRUE..NUMFIL|
00000020 45 53 3d 31 0d 0a 46 49 4c 45 4e 41 4d 45 31 3d |ES=1..FILENAME1=|
00000030 43 6f 6e 74 72 61 63 74 2e 61 63 70 0d 0a |Contract.acp..|
Ok, there seems to be some static information that is unique to the ACD format. I bet the string “[ACD1]” would be sufficient enough to make a solid signature.
This is a good format example of a limited amount of information on the file format used by a well known company which has become obsolete and disappeared. Take a look at my signatures, maybe you have some old ACD files you were unaware of!
I have to admit, often when I am researching file formats I can get distracted by a shinier format I come across. I often go down rabbit holes and forget the reason I started down the path I am on. I try and focus on the current needs in my life as a Digital Preservation Manager, but can get easily sidetracked. I always look forward to November every year so I can celebrate World Digital Preservation Day which sometimes comes along with a PRONOM research week. This gives me a chance to look at formats that may need attention which are not normally on my radar.
This week I a taking a look again at Multiplan. There is a PRONOM PUID for version 4, but does not have a description nor does it have a binary signature. It is was also lacking a File Format Wiki entry. So I decided to dive in. I had already bumped into the format while doing some research on early Microsoft Excel formats. This includes the SYLK format which needed a little update.
Microsoft Multiplan was the parent of Microsoft Excel. Multiplan was built for many different types of computers in the 1980’s, but was never ported to Windows. So to use Multiplan you have to be comfortable with using DOS. If you want to take Multiplan for a spin, head over to PCjs Machines and load up one of the many emulated systems they have.
In the end, Multiplan had four versions, but the last one, version 4.2, had some big changes, especially to the file format. More on that in a minute.
The DOS files for Version 2 begin with 0CEC0000 08AB0800, but a file for the Xenix system starts with 0AEC0000 08AB0800. So it appears the first byte may be different depending on the system.
The DOS files for Version 3 begin with a similar hex pattern, 0CED0000 08AB0800. This would make sense as the documentation for Multiplan 4.2 states it supports opening of Version 2 & 3, but not Version 1.
There was also a companion product that went along with Multiplan, it was called Microsoft Chart. Here is a file from version 3:
The Chart file format has a similar byte pattern with the 08AB pattern and looks similar to the BIFF format. We will have to make sure it doesn’t conflict with any signatures so it can be identified separately.
Version 4 of Multiplan was the first to use the BIFF (Binary Interchange File Format). Technically Version BIFF2, not much is know about BIFF1 or if it ever existed. BIFF2 is the exact same format as Excel 2.0 used, so there will be some problems if we want to identify them separately. They currently identify as fmt/55.
You can see in the hex values above a difference of two bytes in the header. The reason the Multiplan file identifies as an Excel 2 file is the PRONOM signature ignores those two bytes and allows them to be anything. Some specifications say these aren’t used, but clearly there is a use for them. We could probably use the same signature for Multiplan, but include the two bytes, then set the priority to the Multiplan signature.
The hex values for the first 4 bytes have a similar pattern. 0CEF, Which seems to be in sequence where Version 3 left off. Microsoft calls this new format, New or Normal Binary File Format. They claim it is “the fastest loading and fastest saving file format ever“! Exciting as the new format probably was, it didn’t last long. Multiplan was phased out so Excel could shine.
When I was younger I didn’t use DOS very often because the computer my father brought home in the mid 1980’s was a Macintosh. I use DOS more now in my research then I did when I was younger. Using the DOS interface is not easy. There are a lot of key commands you need to know intuitively just to navigate, but it is fascinating to see how far software has come. Early Excel, Multiplan, and Chart were all intertwined, but hopefully combing through all of these samples can bring some clarity. Take a look at the draft signature I made and all the samples that go with it on my GitHub page.
In honor of World Digital Preservation Day, I wanted to write a little about format headers, the magic that makes some files more easily identifiable than others.
When it comes to binary file formats, some developers decide to make the format clearly identifiable in a header and others choose to make it ambiguous. Others have a little fun with leaving little clues and references to popular culture.
Like I said some developers make it very obvious what software created the file format and others seem to make things difficult. I understand there is a need to optimize files to keep them from getting bloated and taking up too much space, but many of the size limits from the early days of computing are not an issue anymore. Can’t we be more clear when designing a file format?
Today I want to document one format which was very easy to identify as it spelled out its format very verbosely, but because of the lack of additional documentation makes it very hard to preserve.
Meet the Composite File Management System file format:
Secondly, this format refers to a Universal File Management System or CCmF, which I have found to be the file format for many other extensions, some of which are .goo, .brc, .br3, .br4, .br5, .sfp, .shp, .obp. It doesn’t always have the verbose header, some of them have the following:
Different, but still contains the CCmF identification string. Others have the verbose header, but further down inside the file.
With this format being used with so many well known software titles, I assumed information on the format would we readily available. Alas, not so much. The format even had the name of the creator! “Created by Andrea Pessino, December 1995”. So I reached out. He was on Twitter and I asked about the file format and if there was any documentation available. Twitter (X) has since deleted his responses after he deleted his account, but he told me he wasn’t sure where the documentation might be. One other developer also commented and confirmed they didn’t know where any of the documentation went after they left.
MetaCreations sold Bryce to Corel in 2000, then in 2004 sold it to Daz3D, the current owners. It’s not actively developed anymore being that it was never made into a 64bit application. A blog post explains the format a little more, but concludes it is a secret known only to Daz.
It seems there is a community who would like to see Bryce more open, maybe even open-sourced. This thread discusses the format and the underlying Axiom format used.
The creator Andrea Pessino was able to track down some documentation on the CCmF file structure for me. He explained Axiom was an entire codebase for all MetaTools/Creations applications and plugins. So the CCmF system was more than a file format. The documentation included some information on versioning of a CCmF.
There seems to be a few versions of the CCmF file structure.
CCmFile::kIdentify which corresponds with December 1995 (vers. 5)
CCmFile::kIdentify2 which corresponds with March 1997 (vers. 7)
CCmFile::kIdentify3 which corresponds with October 1998 (vers. 9)
CCmFile::kDfFormat which is a Generic Composite File
The documentation given to me was up to date for 1998, but after Corel purchased Bryce there was some updates made as many material files have the identifier “CCmFile::kIdentify4“.
Bryce 6 & 7 were released by Daz3D and have a different file header. They have the extension .BR6 & .BR7 with the header:
I still need to gather more samples from the various extensions related to this format and the software related to them. More work to do understanding the different uses of the short CCmFile string and the more detailed header and the differences between objects, materials, and models. When I asked Andrea why he used such a verbose file header, his answer was basically, why not!