iView

February 13, 2026 by Thor

It seems to be a common theme through the history of software that some titles get bought, sold, rebranded, integrated, and discontinued by a number of companies. I find it interesting to trace a popular software title back to its humble beginnings. Often when a piece of software gets bought, the file formats don’t change much, at least at first.

A little shareware program called iView was started in 1996 by a company called Script Software, which later changed its name to Plum Amazing. iView then became iView Multimedia, followed by an iView MediaPro version, before it was bought by Microsoft, which changed the name to Expression Media. After a couple of years the software was bought by Phase One and then discontinued. Let’s take a look at the history.

iView, according to its website in 1997, is simply the easiest and fastest way to view and catalog pictures for the Mac. The software initially only worked on the Macintosh, and the catalog file it produced did not have an extension, but it did have a Type/Creator code. A catalog produced by version 2 of the iView software had the Type/Creator code IVWc/IVW2.

% hexdump -C iView2-s01 | head
00000000  00 00 00 05 30 32 35 69  47 4f 53 58 3a 4c 69 62  |....025iGOSX:Lib|
00000010  72 61 72 79 3a 41 70 70  6c 69 63 61 74 69 6f 6e  |rary:Application|
00000020  20 53 75 70 70 6f 72 74  3a 41 70 70 6c 65 3a 69  | Support:Apple:i|
00000030  43 68 61 74 20 49 63 6f  6e 73 3a 46 72 75 69 74  |Chat Icons:Fruit|
00000040  3a 47 72 65 65 6e 20 41  70 70 6c 65 2e 67 69 66  |:Green Apple.gif|
00000050  03 46 44 63 00 00 0f ef  03 46 44 63 08 93 65 58  |.FDc.....FDc..eX|
00000060  00 01 5c 50 00 01 5a c8  68 ff f7 40 08 93 65 4b  |..\P..Z.h..@..eK|
00000070  08 13 9a c0 ff d1 3a 80  00 a3 c8 a0 00 00 28 00  |......:.......(.|
00000080  00 05 48 64 00 00 a0 24  00 00 39 ec 00 00 00 0a  |..Hd...$..9.....|
00000090  08 93 65 64 44 00 00 24  3d 14 51 84 3d 9d 74 bc  |..edD..$=.Q.=.t.|

The iView format is a proprietary binary format used to store a catalog of multimedia files with their metadata and thumbnails. The media viewer had support for quite a few popular formats. The file seems to store the path to each of the files it has cataloged, so some of these iView files can get pretty large.

In 2003 the iView software was ported to Windows, which brought a formal extension to the catalog format. This was also when iView made the switch from the classic MacOS to MacOSX, where extensions were also encouraged. iView came in two different editions, a standard shareware version and a Media Pro version, each with its own version numbers. iView MediaPro was not compatible with Macintosh 68K machines or systems earlier than 8.6. The last Media Pro version was version 3.8.6. You can get most of the old software versions here.

% hexdump -C iViewPro302-s01.ivc | head
00000000  00 00 00 00 30 32 35 69  46 53 4d 21 00 00 00 2e  |....025iFSM!....|
00000010  66 6c 64 72 00 00 00 2e  00 00 00 00 00 00 00 06  |fldr............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  3c 72 6f 6f 74 3e 42 4c  |........<root>BL|
00000040  44 4f 00 00 00 0c 31 00  02 00 00 00 01 01 00 00  |DO....1.........|
00000050  00 00 55 53 46 33 00 00  00 02 01 03 43 4d 52 53  |..USF3......CMRS|
00000060  00 00 01 ed 01 00 00 02  0a 01 00 00 00 00 00 00  |................|
00000070  00 02 f2 01 00 00 00 00  00 00 00 00 a2 01 00 00  |................|
00000080  00 00 02 01 03 00 00 00  a1 01 00 00 00 00 00 00  |................|
00000090  00 00 48 00 00 00 00 00  00 00 00 00 03 01 00 00  |..H.............|

This time the file has an extension, IVC, but with a familiar pattern at the beginning: the string “025i” (hex 30 32 35 69) at offset 4. The iView files from previous versions have the same bytes, but only files from Media Pro versions 2 and 3 match an existing PRONOM identification.

% sf iViewPro302-s01.ivc 
filename : 'iViewPro302-s01.ivc'
filesize : 3757
modified : 2025-09-17T17:39:27-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/647'
    format  : 'Microsoft Expression Media'
    version : '2'
    mime    : 
    class   : 'Presentation'
    basis   : 'extension match ivc; byte match at [[4 4] [3737 16]]'

These are iView Media Pro files, so why are they identifying as Microsoft Expression Media files? That is because Microsoft bought iView Media Pro on June 27, 2006. Microsoft rebranded the software as Expression Media, not to be confused with Expression Studio. It was available for Windows and Macintosh, but not everyone was happy with the purchase. Version 1 of Expression Media was released the next year and was a free upgrade for iView Media Pro users. The format doesn’t appear to have changed much at all. In fact, an iView Media Pro 3 file with no content and an Expression Media 1 file are practically identical.

% hexdump -C Expression1-s01.ivc | head
00000000  00 00 00 00 30 32 35 69  46 53 4d 21 00 00 00 2e  |....025iFSM!....|
00000010  66 6c 64 72 00 00 00 2e  00 00 00 00 00 00 00 06  |fldr............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  3c 72 6f 6f 74 3e 42 4c  |........<root>BL|
00000040  44 4f 00 00 00 0c 31 00  02 00 00 00 01 01 00 00  |DO....1.........|
00000050  00 00 55 53 46 33 00 00  00 02 01 03 43 4d 52 53  |..USF3......CMRS|
00000060  00 00 01 ed 01 00 00 02  0a 01 00 00 00 00 00 00  |................|
00000070  00 02 f2 01 00 00 00 00  00 00 00 00 a2 01 00 00  |................|
00000080  00 00 02 01 03 00 00 00  a1 01 00 00 00 00 00 00  |................|
00000090  00 00 48 00 00 00 00 00  00 00 00 00 03 01 00 00  |..H.............|

The next year brought a version 2 of Expression Media, often found bundled with a Special Edition of Office 2008 for Mac, but also a standalone product for Windows. But the catalog format remained the same.

% hexdump -C Expression2-s01.ivc | head       
00000000  00 00 00 04 30 32 35 69  3a 43 3a 5c 44 4f 43 55  |....025i:C:\DOCU|
00000010  4d 45 7e 31 5c 41 4c 4c  55 53 45 7e 31 5c 44 4f  |ME~1\ALLUSE~1\DO|
00000020  43 55 4d 45 7e 31 5c 4d  59 50 49 43 54 7e 31 5c  |CUME~1\MYPICT~1\|
00000030  53 41 4d 50 4c 45 7e 31  5c 57 69 6e 74 65 72 2e  |SAMPLE~1\Winter.|
00000040  6a 70 67 00 00 00 00 00  00 00 00 00 00 00 00 00  |jpg.............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Even though all of these versions have the same 4 bytes at the beginning, not all of them match the current PRONOM signature. fmt/647 is specifically for Expression Media version 2 files, but also identifies iView Media Pro 2 & 3 and Expression Media 1 files. It doesn’t identify earlier files because the signature is also looking for some bytes near the end of the file.

% hexdump -C iViewPro302-s01.ivc | tail       

00000e90  00 00 00 00 00 00 00 00  00 53 56 61 72 00 00 00  |.........SVar...|
00000ea0  04 00 00 01 f4 30 32 35  69 00 00 00 08           |.....025i....|

The same 4 bytes, “025i”, appear near the end of the file as well. The signature also uses another string near the end, “SVar”. I am not sure what the string is used for, but it is not in earlier versions.

% hexdump -C iView157-01 | tail 

00000420  00 00 00 00 00 00 00 00  00 00 00 00 30 32 35 69  |............025i|
00000430  00 00 00 08                                       |....|

And the even earlier versions are missing the “025i” at the end.

% hexdump -C iView2-s01 | tail

000062b0  2a ae ed d4 1a eb d4 04  c4 88 76 88 c4 d6 d4 04  |*.........v.....|
000062c0  c4 79 69 79 c4 d6 d4 04  c4 78 67 78 c4 ec d4 04  |.yiy.....xgx....|
000062d0  81 d4 f1 d4 00 ff                                 |......|
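The three tails above suggest a quick way to sort catalogs into generations. Here is a minimal Python sketch of that check. The function name, the labels, and the 64-byte search window are assumptions for illustration, and the offsets come only from the handful of samples shown in this post, not from any format documentation.

```python
from typing import Optional

# Magic strings seen at offset 4 in the sample catalogs from this post.
HEAD_MAGICS = {
    b"025i": "iView / Expression Media",
    b"030i": "Phase One Media Pro",
}


def describe_catalog(data: bytes) -> Optional[dict]:
    """Summarise the markers observed in the samples; None if no known magic."""
    magic = data[4:8]
    if magic not in HEAD_MAGICS:
        return None
    return {
        "family": HEAD_MAGICS[magic],
        # In the iView samples that carry a trailer, the magic is followed
        # by four more bytes before end of file (only checked on those samples).
        "tail_magic": data[-8:-4] == magic,
        # "SVar" sat 20 bytes before EOF in the Media Pro 3.0.2 sample;
        # a small window is searched rather than assuming a fixed position.
        "svar_near_end": b"SVar" in data[-64:],
    }
```

A catalog with the head magic, the tail magic, and “SVar” looks like the Media Pro and Expression Media files; one with the head and tail magic but no “SVar” looks like the iView 1.5.7 sample; and one with only the head magic looks like the iView 2 sample.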

Microsoft Expression Media was short-lived. Microsoft decided to sell off the software to Phase One in 2010. Phase One is the developer of Capture One, a professional photo editing program, so it makes sense they would want a cataloging tool to go with their flagship product. Phase One retained the name Media Pro from the original iView Media Pro software.

Phase One took the software and did make modifications, starting with the extension used to store the catalogs. They also decided to adjust the format slightly, changing the “025i” bytes to “030i”.

% hexdump -C PhaseOneMediaProv1.mpcatalog | head 
00000000  00 00 00 05 30 33 30 69  4a 4d 61 63 31 30 37 3a  |....030iJMac107:|
00000010  4c 69 62 72 61 72 79 3a  41 70 70 6c 69 63 61 74  |Library:Applicat|
00000020  69 6f 6e 20 53 75 70 70  6f 72 74 3a 41 70 70 6c  |ion Support:Appl|
00000030  65 3a 69 43 68 61 74 20  49 63 6f 6e 73 3a 46 72  |e:iChat Icons:Fr|
00000040  75 69 74 3a 47 72 65 65  6e 20 41 70 70 6c 65 2e  |uit:Green Apple.|
00000050  67 69 66 00 00 00 00 00  00 00 00 00 00 00 00 00  |gif.............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The Phase One Media Pro software uses the extension MPCATALOG, but can also open the older IVC catalogs.

% sf PhaseOneMediaProv1.mpcatalog 

filename : 'PhaseOneMediaProv1.mpcatalog'
filesize : 21353
modified : 2025-09-16T20:37:07-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/648'
    format  : 'Media View Pro'
    version : 
    mime    : 
    class   : 'Presentation'
    basis   : 'extension match mpcatalog; byte match at [[4 4] [21329 16]]'

MPCATALOG files are identified in PRONOM using a signature similar to the one used for the IVC files. The name of the format isn’t quite right, though; MediaPro is probably a better name than “Media View Pro”.
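The “basis” line in the two Siegfried reports can be turned into numbers to see where each signature actually matched. The following is a small Python sketch; the helper names are hypothetical, and it only understands the bracketed “[[offset length] ...]” form shown in the PRONOM results above.

```python
import re


def byte_matches(basis: str) -> list[tuple[int, int]]:
    """Return (offset, length) pairs from a 'byte match at [[o l] [o l]]' string."""
    return [(int(o), int(n)) for o, n in re.findall(r"\[(\d+) (\d+)\]", basis)]


def gap_to_eof(filesize: int, offset: int, length: int) -> int:
    """Number of bytes left between the end of a match and the end of the file."""
    return filesize - (offset + length)


ivc = byte_matches("extension match ivc; byte match at [[4 4] [3737 16]]")
mpc = byte_matches("extension match mpcatalog; byte match at [[4 4] [21329 16]]")

# File sizes are taken from the two reports: 3757 and 21353 bytes.
print(gap_to_eof(3757, *ivc[1]))   # 4
print(gap_to_eof(21353, *mpc[1]))  # 8
```

For the IVC sample, the 16-byte match starting at 3737 lines up with the “SVar” string in the tail dump and ends 4 bytes before the end of the file. The MPCATALOG match ends 8 bytes before the end, so the two trailers are similar but apparently not identical in layout.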

So it seems the identification is already available in PRONOM for the later MediaPro files, both iView MediaPro and Expression Media, and a second identification for the PhaseOne catalog. So we will need to either adjust the identification to include the earlier iView versions and adjust the names or we can create a new signature for the older versions. It would be good to find out what version added the change to the format, but with all the different software versions, it might be hard to nail down.

Enjoy some samples.

December 5, 2025 by Thor

The main subject of these posts is obsolete software and file formats. I prefer to focus on older software titles and collect them when I can. I have also found older Macintosh software to be particularly interesting, as many of the qualities of early Macintosh use are lost today. In researching a very early Macintosh title, I came across an article from 1999 in the Washington Post. The article, now 26 years old, was already commenting about “antique” software which was less than 20 years old at the time. Is there a term for even more antique? The title of the article? “Old Enthusiasts Are Scouring the Web to Find ‘Antique’ Software”. I feel this hasn’t changed. I still scour the web to find old software, and if the enthusiasts were “old” 26 years ago, then I am ancient.

Back in 1983, a little company called Living Videotext run by Dave Winer, who had developed a couple products for the Apple II, saw an opportunity to bring their product to the Macintosh. Their product, ThinkTank, was the fourth title to ship for the new Macintosh released in 1984.

ThinkTank was an “idea processor”, not a word processor, but “a tool for organizing your thoughts on a computer screen. You could create an outline, then indent, move an item up a list, or out a level. Flesh out the details, and quickly record a top-level idea you had overlooked.” It was the beginning of the outliner tools created by the company.

% hexdump -C Sample | head
00000000  2e 48 45 41 44 20 30 20  2b 20 20 4d 61 6a 6f 72  |.HEAD 0 +  Major|
00000010  20 4c 65 61 67 75 65 20  42 61 73 65 62 61 6c 6c  | League Baseball|
00000020  20 54 65 61 6d 73 0d 2e  48 45 41 44 20 31 20 2b  | Teams..HEAD 1 +|
00000030  20 20 4c 65 61 67 75 65  73 20 61 6e 64 20 44 69  |  Leagues and Di|
00000040  76 69 73 69 6f 6e 73 0d  2e 48 45 41 44 20 32 20  |visions..HEAD 2 |
00000050  2b 20 20 41 6d 65 72 69  63 61 6e 20 4c 65 61 67  |+  American Leag|
00000060  75 65 0d 2e 48 45 41 44  20 33 20 2b 20 20 57 65  |ue..HEAD 3 +  We|
00000070  73 74 65 72 6e 20 44 69  76 69 73 69 6f 6e 0d 2e  |stern Division..|
00000080  48 45 41 44 20 34 20 2d  20 20 43 61 6c 69 66 6f  |HEAD 4 -  Califo|
00000090  72 6e 69 61 20 41 6e 67  65 6c 73 0d 2e 48 45 41  |rnia Angels..HEA|

The files created by ThinkTank are plain text, with each entry starting with the ASCII string “.HEAD”. There was also a DOS version of ThinkTank, but the files it used were .DB and .SAV, although its templates, stored as .TXT, did use this same format.

% hexdump -C SAMPLE.TXT | head
00000000  2e 48 45 41 44 20 30 20  2b 20 20 50 65 72 66 6f  |.HEAD 0 +  Perfo|
00000010  72 6d 61 6e 63 65 20 52  65 76 69 65 77 0d 0a 2e  |rmance Review...|
00000020  48 45 41 44 20 31 20 2d  20 20 4e 61 6d 65 3a 20  |HEAD 1 -  Name: |
00000030  0d 0a 2e 48 45 41 44 20  31 20 2d 20 20 4a 6f 62  |...HEAD 1 -  Job|
00000040  20 54 69 74 6c 65 3a 20  0d 0a 2e 48 45 41 44 20  | Title: ...HEAD |
00000050  31 20 2d 20 20 52 65 76  69 65 77 20 44 61 74 65  |1 -  Review Date|
00000060  3a 20 0d 0a 2e 48 45 41  44 20 31 20 2d 20 20 52  |: ...HEAD 1 -  R|
00000070  65 76 69 65 77 20 70 65  72 69 6f 64 20 66 6f 72  |eview period for|
00000080  3a 20 0d 0a 2e 48 45 41  44 20 31 20 2b 20 20 4f  |: ...HEAD 1 +  O|
00000090  62 6a 65 63 74 69 76 65  73 20 4d 65 74 2f 4e 6f  |bjectives Met/No|

Turns out this was a special format they called “dot-head“, aptly named for the head of the file. It was used as an interchange format to move outlines between ThinkTank, another program called Ready!, and the later product MORE.

MORE was developed to be multiple tools in one, meant to “Unite idea processing technology with the desktop publishing revolution”. MORE replaced ThinkTank in 1986 and promised more flexibility by creating charts and presentations quickly from your outline. MORE initially used the same dot-head format, although the ASCII keyword could be in lowercase.

% hexdump -C MORE1 | head
00000000  2e 68 65 61 64 20 30 20  2b 20 20 48 6f 6d 65 0d  |.head 0 +  Home.|
00000010  2e 68 65 61 64 20 31 20  2d 20 20 0d 2e 68 65 61  |.head 1 -  ..hea|
00000020  64 20 31 20 2d 20 20 54  65 73 74 69 6e 67 0d 2e  |d 1 -  Testing..|
00000030  68 65 61 64 20 31 20 2d  20 20 0d                 |head 1 -  .|
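Because dot-head is plain text, the three dumps above are enough to sketch a reader for it. The Python below is a minimal example under stated assumptions: each heading line is “.HEAD”, a level number, a “+” or “-” flag, and the text; lines end in CR (Macintosh) or CR LF (DOS); and the keyword may be upper or lower case. The meaning of the “+”/“-” flag is not documented in this post, so it is kept verbatim. Treating any other line as body text of the previous heading is a guess, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# ".HEAD <level> <flag>  <text>", keyword in either case, as seen in the samples.
HEAD_RE = re.compile(r"^\.head (\d+) ([+-]) {0,2}(.*)$", re.IGNORECASE)


@dataclass
class Heading:
    level: int
    flag: str   # "+" or "-", kept as found; its meaning is an open question
    text: str
    body: list = field(default_factory=list)  # non-heading lines that follow


def parse_dot_head(raw: bytes, encoding: str = "mac_roman") -> list:
    """Parse dot-head bytes into a flat list of Heading records."""
    headings = []
    # Accept CR LF (DOS samples) as well as bare CR (Macintosh samples).
    for line in re.split(r"\r\n|\r|\n", raw.decode(encoding)):
        match = HEAD_RE.match(line)
        if match:
            level, flag, text = match.groups()
            headings.append(Heading(int(level), flag, text))
        elif line and headings:
            # Assumption: anything else belongs to the preceding heading.
            headings[-1].body.append(line)
    return headings
```

Calling `parse_dot_head(open("Sample", "rb").read())` on the ThinkTank sample should yield headings at levels 0 through 4, beginning with “Major League Baseball Teams”. For the DOS templates, passing `encoding="cp437"` is probably the safer choice.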

In 1987 Living Videotext was purchased by Symantec. Shortly after, Symantec released MORE II and a rebranded DOS application called GrandView, based on ThinkTank.

Let’s take a look at GrandView. It was built from the DOS version of ThinkTank and was compatible with the same formats. It had great reviews at the time and provided the first outliner for Symantec. It was written by John Friend, the developer who created PC Outline, which was often bundled with WordStar.

GrandView could import and export into any of the other products.

GrandView version 1 went with a new file format.

% hexdump -C PROJECT.GV | head
00000000  0b 00 01 00 1a 4a 4c 46  5f 49 44 06 00 02 00 01  |.....JLF_ID.....|
00000010  01 0a 00 03 00 16 00 26  00 2c 00 05 00 05 00 78  |.......&.,.....x|
00000020  06 00 07 00 ff ff 06 00  1a 00 01 00 26 00 08 00  |............&...|
00000030  26 00 ee 4d 02 00 00 00  c0 a8 00 00 00 00 00 00  |&..M............|
00000040  01 00 00 00 00 00 01 00  00 00 00 00 00 00 00 01  |................|
00000050  00 00 22 00 0a 00 20 1c  00 00 20 1c 00 00 d0 b6  |.."... ... .....|
00000060  00 00 10 ef 00 00 20 1c  00 00 20 1c 00 00 60 35  |...... ... ...`5|
00000070  01 00 01 00 05 00 10 00  62 15 00 13 00 cc 93 88  |........b.......|
00000080  10 54 54 59 2e 50 44 56  00 00 00 00 00 00 2c 00  |.TTY.PDV......,.|
00000090  1c 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

GrandView 2.0 also used the same format.

% hexdump -C TEST.GV | head
00000000  0b 00 01 00 1a 4a 4c 46  5f 49 44 06 00 02 00 01  |.....JLF_ID.....|
00000010  02 0a 00 03 00 02 00 2b  00 3a 00 05 00 05 00 78  |.......+.:.....x|
00000020  06 00 07 00 ff ff 26 00  08 00 2b 00 d5 3e 02 00  |......&...+..>..|
00000030  00 00 d0 b6 00 00 00 00  00 00 01 00 00 00 00 00  |................|
00000040  01 00 00 00 00 00 00 00  00 01 00 00 22 00 0a 00  |............"...|
00000050  20 1c 00 00 20 1c 00 00  d0 b6 00 00 10 ef 00 00  | ... ...........|
00000060  20 1c 00 00 20 1c 00 00  60 35 01 00 01 00 05 00  | ... ...`5......|
00000070  10 00 60 15 00 13 00 9b  5d 83 14 48 50 4c 33 2e  |..`.....]..HPL3.|
00000080  50 44 56 00 00 00 00 00  2c 00 1c 00 00 00 00 00  |PDV.....,.......|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

GrandView was also compatible with the Macintosh counterpart, MORE.

Symantec then released a new version of the MORE software for the Macintosh in 1988, adding new presentation features. MORE II went away from the dot-head format and used a new proprietary format.

% hexdump -C MORE2-s01 | head
00000000  00 03 4d 52 49 49 00 80  00 00 00 80 00 00 00 78  |..MRII.........x|
00000010  00 00 00 00 00 00 00 00  00 00 00 f8 00 00 00 a8  |................|
00000020  00 00 01 a0 00 00 00 28  00 00 01 c8 00 00 00 18  |.......(........|
00000030  00 00 01 e0 00 00 00 00  00 00 01 e0 00 00 00 0c  |................|
00000040  00 00 01 ec 00 00 00 0c  00 00 01 e0 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  00 03 00 00 00 48 00 48  00 00 00 00 02 d8 02 28  |.....H.H.......(|
00000090  ff e1 ff e2 02 f9 02 46  03 47 05 28 03 fc 00 02  |.......F.G.(....|
000000a0  00 00 00 48 00 48 00 00  00 00 02 d8 02 28 00 01  |...H.H.......(..|

Then in 1990 Symantec released MORE 3.0 with even more features and improvements to the user experience, and added a companion tool, MORE Graph.

% hexdump -C MORE3-s01 | head
00000000  00 06 4d 4f 52 33 00 80  00 00 00 80 00 00 00 78  |..MOR3.........x|
00000010  00 00 00 f8 00 00 01 b4  00 00 02 ac 00 00 00 a8  |................|
00000020  00 00 11 16 00 00 00 32  00 00 11 48 00 00 00 20  |.......2...H... |
00000030  00 00 11 68 00 00 00 00  00 00 11 68 00 00 00 10  |...h.......h....|
00000040  00 00 11 83 00 00 00 0c  00 00 11 68 00 00 00 00  |...........h....|
00000050  00 00 00 00 00 00 03 54  00 00 0d c2 00 00 11 78  |.......T.......x|
00000060  00 00 00 0b 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 03 00 00 00 48 00 48  00 00 00 00 02 d8 02 28  |.....H.H.......(|
00000090  ff e1 ff e2 02 f9 02 46  03 47 05 28 03 fc 00 02  |.......F.G.(....|

The MORE 3 format got a new header but appears similar in structure to the previous version. And the new companion tool MORE Graph had yet another format.

% hexdump -C MORE3-graph | head 
00000000  00 01 00 00 01 09 00 00  00 0c 00 01 09 19 80 00  |................|
00000010  01 09 54 65 73 74 00 00  00 07 00 01 00 03 00 00  |..Test..........|
00000020  00 0b 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000610  00 00 00 00 00 00 00 01  06 47 65 6e 65 76 61 00  |.........Geneva.|
00000620  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000670  01 00 00 1f ca 33 a0 00  2b 00 04 02 d8 03 d8 09  |.....3..+.......|
00000680  57 6f 72 6b 73 68 65 65  74 00 00 00 00 00 00 00  |Worksheet.......|

Luckily these early Macintosh-based formats didn’t use a resource fork, making them fully compatible with their PC counterpart.

One of the coolest parts of this long list of outline software is that years later, after Symantec discontinued the product, the original creator, Dave Winer, petitioned Symantec to allow him to release the antique software free and clear to the public. How cool is that? I would really like to see this happen more, as other software titles die and get swept under the rug, leaving the community to try to find copies, preserve them, and make sense of the formats. Not only were the early versions made available, a tool was built to migrate the MORE format to a more open XML format, allowing the ideas trapped in these ancient formats to be re-imagined.

MORE 3.1 was the final version of the software to be released by Symantec. The files produced by MORE 3.1 have an identical header to the standard 3.0 version, so we probably only need one signature for the two versions.

% hexdump -C MORE31-s01 | head 
00000000  00 06 4d 4f 52 33 00 80  00 00 00 80 00 00 00 78  |..MOR3.........x|
00000010  00 00 00 f8 00 00 01 b4  00 00 02 ac 00 00 00 a8  |................|
00000020  00 00 11 16 00 00 00 32  00 00 11 48 00 00 00 20  |.......2...H... |
00000030  00 00 11 68 00 00 00 00  00 00 11 68 00 00 00 10  |...h.......h....|
00000040  00 00 11 83 00 00 00 0c  00 00 11 68 00 00 00 00  |...........h....|
00000050  00 00 00 00 00 00 03 54  00 00 0d c2 00 00 11 78  |.......T.......x|
00000060  00 00 00 0b 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 03 00 00 00 48 00 48  00 00 00 00 02 d8 02 28  |.....H.H.......(|
00000090  ff e1 ff e2 02 f9 02 46  03 47 05 28 03 fc 00 02  |.......F.G.(....|
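Until proper PRONOM signatures exist, the headers in these dumps can be checked with a small lookup table. This is only a Python sketch built from the samples in this post: the labels and the function name are assumptions, the offsets are simply where the strings appear in the dumps, and MORE Graph is left out because its sample shows no obvious magic string.

```python
from typing import Optional

# (offset, magic bytes, label) as observed in the hexdumps above.
SIGNATURES = [
    (5, b"JLF_ID", "GrandView outline"),
    (2, b"MRII", "MORE II document"),
    (2, b"MOR3", "MORE 3.x document"),  # 3.0 and 3.1 share this header
]


def identify_outline(data: bytes) -> Optional[str]:
    """Return a label for the first matching header, or None if nothing matches."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    # Dot-head files are plain text; the keyword appears in either case.
    if data[:5].lower() == b".head":
        return "dot-head outline (ThinkTank / MORE)"
    return None
```

Keeping one entry for “MOR3” reflects the observation that the 3.0 and 3.1 samples are byte-for-byte identical in their first 160 bytes.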

If you would like to try out the MORE software, download this disk image, and drag onto the Macintosh emulator below. The image will automatically mount and you should be able to take MORE 3.1 for a spin!

Outlining software still has a good place in idea generation and presentation. OmniOutliner can probably trace its roots to these “Antique” titles! Stay tuned for some PRONOM signatures to go along with these many format examples. For now you can gather some of the samples from my Github page.

Textor

October 17, 2025 by Thor

Many of us lived through the Word Processing Wars of the late 1980s and early 1990s. It was an overwhelming time with many options to choose from, each adding new features with every update while trying to become the leader in the word processing game. Early DOS versions had steep learning curves, which built loyalty among those who committed to muscle memory all the key commands needed to produce the perfect document. The many options for word processing brought just as many file formats to save your work in. Many titles used the same file extensions or encouraged users to choose their own, using their initials instead. Often the files created by these software titles used standard ASCII text but mixed in their own formatting codes, all of which tends to make identification in preservation difficult.

I recently acquired a large lot of older software. It has been fun sorting through it and learning about the different titles. One title stuck out, as I hadn’t heard of it before. I found an old article which included the software in a comparison of word processing software in 1993. The article compares the following executive word processing software.

LotusWrite 2.0
JustWrite 2.0
Professional Write Plus 1.0
CA-Textor 6.0
Ami Pro 3.0
Word for Windows 2.0a
WordPerfect 5.1

You are probably familiar with a few of these titles, but the one that stuck out to me was CA-Textor 6.0. In my lot of software I came across a two-disk installer for CA-Textor 6.0 for Windows. It was developed by Computer Associates International, Inc., which opened its doors in 1976 and developed or acquired many software titles.

In the case of CA-Textor, it was purchased from a French company, Talor à Paris, which had been producing Textor, a popular word processor in France, for DOS since 1983. The original developer, Thierry Lorthiois, had high hopes for a French product to exist in a world of giant American companies. Even with over 70,000 copies sold, the release of Textor 4 in 1988 saw much market share lost to Microsoft Word. By 1989, Computer Associates had purchased Textor, rebranded Textor 5 for DOS, and added mouse compatibility; then in late 1991 it released a Windows version of Textor and named it CA-Textor, in line with its other products. It would be the only version released by Computer Associates and disappeared into the void like many word processors of the time.

CA-Textor 6.0 for Windows appears to be a well-designed word processor for its time. Reviews were mixed, and it scored decently in many comparisons. In the article mentioned above, however, it scored the lowest of all the word processors. The final verdict says:

CA-Textor fails to offer the usability shortcuts of the other programs, and scores well below the other programs in editing, formatting and graphics manipulation.

It was possibly reviews like this which caused Computer Associates to never update or release a new version of the software.

The first thing I noticed was the way the software handles files. It defaults to a new “Library” method, in which each file is connected to a Library that stores a folder of files along with their full names and descriptions.

Single files can still be saved from CA-Textor by choosing DOS file, but the extension used is not clear.

Using .TXT for a formatted file seems like a bad recommendation. So let’s take a look at a few of the files generated by CA-Textor.

The new Library File has the extension .TAL.

% hexdump -C TEXTOR.TAL | head
00000000  43 3a 5c 54 45 58 54 4f  52 5c 54 45 58 54 4f 52  |C:\TEXTOR\TEXTOR|
00000010  2e 54 41 4c 00 00 00 00  00 00 00 00 00 00 00 00  |.TAL............|
00000020  00 00 00 00 00 00 00 00  00 54 45 58 54 4f 52 00  |.........TEXTOR.|
00000030  00 00 c1 46 8d ec 1a 47  8d ec c1 46 8d ec 05 00  |...F...G...F....|
00000040  01 00 01 00 00 00 00 00  00 00 00 00 00 ff ff 54  |...............T|
00000050  42 58 54 66 00 0a 00 00  00 00 00 65 00 00 00 00  |BXTf.......e....|
00000060  00 00 00 00 00 54 65 73  74 00 00 00 00 00 00 00  |.....Test.......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 54 45  |..............TE|
00000090  58 54 4f 52 00 00 00 e9  46 8d ec e9 46 8d ec 01  |XTOR....F...F...|

The first few bytes are the path of the file.

It seems the individual files connected to the Library also have the .TAL extension, but they have a different header.

% hexdump -C OBSO0006.TAL | head
00000000  01 14 00 45 54 01 02 58  54 00 06 49 57 01 00 00  |...ET..XT..IW...|
00000010  00 00 00 65 00 87 16 06  80 00 4f 62 73 6f 6c 65  |...e......Obsole|
00000020  74 65 54 68 6f 72 00 00  00 00 00 00 00 00 00 00  |teThor..........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 4f 62 73 6f 6c  65 74 65 54 68 6f 72 00  |...ObsoleteThor.|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 54 45 58 54  |............TEXT|
00000070  4f 52 00 00 00 f4 b0 8e  ec 1d b1 8e ec 00 00 00  |OR..............|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 07 11 00 02 00 00  |................|

The CA-Textor software also installed some sample files that we can look at.

% hexdump -C SAMPLE01.SAM | head
00000000  01 14 00 45 54 01 02 58  54 00 06 49 57 3b 00 00  |...ET..XT..IW;..|
00000010  00 da 05 00 00 04 01 06  80 00 44 3a 5c 44 4f 43  |..........D:\DOC|
00000020  54 5c 53 41 4d 50 4c 45  30 31 2e 53 41 4d 00 00  |T\SAMPLE01.SAM..|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 44 4f 53 00 00  00 00 00 00 00 00 00 00  |...DOS..........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 43 72 61 69  |............Crai|
00000070  67 00 00 00 00 34 24 b5  29 a1 8a b5 29 00 00 00  |g....4$.)...)...|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 07 61 00 02 00 00  |...........a....|

A later build of CA-Textor had more sample files, this time with a different extension, but with the same bytes at the beginning of the file.

% hexdump -C TEMP0005.TEM | head
00000000  01 14 00 45 54 01 02 58  54 00 06 49 57 22 00 00  |...ET..XT..IW"..|
00000010  00 e0 40 12 00 10 00 06  80 00 46 61 78 20 43 6f  |..@.......Fax Co|
00000020  76 65 72 20 31 00 00 00  00 00 00 00 00 00 00 00  |ver 1...........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 54 65 6d 70 6c  61 74 65 73 00 00 00 00  |...Templates....|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 54 45 58 54  |............TEXT|
00000070  4f 52 00 00 00 12 04 d7  2b 5d 80 e1 2b 00 00 00  |OR......+]..+...|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 10 3d 02 01 00 00  |...........=....|

The good news is there is a pattern emerging, but not the same extension. I get the feeling they didn’t see much value in the extension for this software. When I save a file in the software as a DOS file, it doesn’t automatically pick an extension for me. I left the extension off and saved a file in the DOS format.

% hexdump -C TEST1 | head
00000000  01 14 00 45 54 01 02 58  54 00 06 49 57 1d 00 00  |...ET..XT..IW...|
00000010  00 2f 67 d8 6e 1d 00 06  80 00 43 3a 5c 54 45 58  |./g.n.....C:\TEX|
00000020  54 4f 52 5c 54 45 53 54  31 00 00 00 00 00 00 00  |TOR\TEST1.......|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 44 4f 53 00 00  00 00 00 00 00 00 00 00  |...DOS..........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 54 45 58 54  |............TEXT|
00000070  4f 52 00 00 00 c0 ce d0  ae 99 53 8d ec 00 00 00  |OR........S.....|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 07 0d 00 02 00 00  |................|

We see the same pattern at the head, but also a clear mention of DOS, just like the sample files included. Since I don’t have any earlier DOS versions to compare, I have to assume this is the same with at least Textor 5. I did find a mention of someone trying to convert their older Textor 5 documents to modern formats and they mention they are in the TAL format.
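Lining up the four document dumps (OBSO0006.TAL, SAMPLE01.SAM, TEMP0005.TEM and TEST1) shows the same 13 leading bytes and three NUL-padded strings at the same offsets. Here is a minimal Python sketch that pulls them out. The field names “name”, “library” and “owner” are guesses based on the values seen in the samples, the field widths are inferred from where the next field starts, and none of this is confirmed by any Textor documentation.

```python
from typing import Optional

# The 13 bytes shared by all four document samples in this post.
TEXTOR_MAGIC = bytes.fromhex("01140045540102585400064957")

# (hypothetical field name, offset, width) inferred from the hexdumps.
FIELDS = [
    ("name", 0x1A, 41),     # e.g. "ObsoleteThor", "C:\TEXTOR\TEST1"
    ("library", 0x43, 41),  # e.g. "ObsoleteThor", "Templates", "DOS"
    ("owner", 0x6C, 9),     # e.g. "TEXTOR", "Craig"
]


def read_textor_header(data: bytes, encoding: str = "cp437") -> Optional[dict]:
    """Return the three header strings, or None if the magic is missing."""
    if not data.startswith(TEXTOR_MAGIC):
        return None
    header = {}
    for name, offset, width in FIELDS:
        raw = data[offset:offset + width].split(b"\x00", 1)[0]
        header[name] = raw.decode(encoding, "replace")
    return header
```

For the file saved as a DOS file, this would report “DOS” in the second field and the full path in the first, which matches what the sample files installed with the software show.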

% sf OBSO0006.TAL 

filename : 'OBSO0006.TAL'
filesize : 915
modified : 2025-10-05T13:26:56-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'

% python3 trid.py OBSO0006.TAL 
TrID - File Identifier v2.41 - (C) 2003-2025 By M.Pontello

File: OBSO0006.TAL
       Unknown!

The Textor format is not known to PRONOM via Siegfried and is also unknown to TrID, which now has a Python release! I did go ahead and add the signature to Wikidata, which can be used in Siegfried. If there is a need, we can submit to PRONOM as well.

% sf OBSO0006.TAL              
---
siegfried   : 1.11.2
scandate    : 2025-10-05T15:24:44-06:00
signature   : default.sig
created     : 2025-03-01T15:28:08+11:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V120.xml; container-signature-20240715.xml'
  - name    : 'wikidata'
    details : 'wikidata-definitions-4.0.0 (2025-10-05, DROID_SignatureFile_V120.xml, container-signature-20240715.xml)'
---
filename : 'OBSO0006.TAL'
filesize : 915
modified : 2025-10-05T13:26:56-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'
  - ns        : 'wikidata'
    id        : 'Q136442756'
    format    : 'Textor document'
    URI       : 'http://www.wikidata.org/entity/Q136442756'
    permalink : 'https://www.wikidata.org/w/index.php?oldid=2413044878&title=Q136442756'
    mime      : 
    basis     : 'extension match tal; byte match at 0, 13 (Wikidata reference is empty)'
    warning   :

There is also a software tool called WINCONV from MacDisk, available until recently, meant for converting word processing formats to modern and Mac-compatible formats. This software will convert Textor 2/3/4/5/6 files to a text or RTF file. In the software, Textor 2/3 are placed in one group and 4, 5, and 6 in another. Unfortunately, it doesn’t confirm any extensions that might be used.

I was able to find a copy of Textor 2.2.

It took me a few minutes to figure out some of the controls. Aside from being in French, it was a little different than other word processing software.

After a bit of playing around in the software and trying many of the functions, I saved out a few files. At first, all the files were placed into a pair of files, called “TEXTOR.TEX” and “TEXTOR.LIG”. Creating a new document and saving would just update these two files. They seem to function in the same way the library function works in the Windows 6.0 version.

% hexdump -C TEXTOR.TEX | head
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  00 00 cd 00 60 00 05 00  00 00 54 45 00 00 00 00  |....`.....TE....|
00000090  00 00 00 00 00 00 54 45  58 54 4f 52 20 20 00 00  |......TEXTOR  ..|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000c0  00 00 cd 00 62 00 05 00  ff ff 54 45 00 00 00 00  |....b.....TE....|
000000d0  00 00 00 00 00 00 54 45  58 54 4f 52 20 20 00 00  |......TEXTOR  ..|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

% hexdump -C TEXTOR.LIG | head 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002f80  ff ff 1f 54 65 73 74 69  6e 67 0d 0a 0d 20 20 20  |...Testing...   |
00002f90  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00003000  ff ff 1f 54 68 69 73 20  69 73 20 61 20 54 65 73  |...This is a Tes|
00003010  74 20 6f 66 20 54 65 78  74 6f 72 20 56 65 72 73  |t of Textor Vers|
00003020  69 6f 6e 20 32 2e 32 0d  0a 0d 20 20 20 20 20 20  |ion 2.2...      |
00003030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

It seems the text portion of my document was saved in the LIG file and additional data, probably some description and user names, went into the TEX file. I then stumbled on a setup executable in the same directory that gave me some options.

THE TEXT DATABASE WILL BE CREATED ON THE DISK IN DRIVE (B)B
F1 – CREATING A TEXT DATABASE >1000 DOCUMENTS INACCESSIBLE BY MS-DOS
F2 – CREATING A TEXT DATABASE MANAGED BY MS-DOS (1 file per document)

Ok, so the software has two options: one for creating a database of text, which we discovered above, and one for setting the software to create one file per document. When I selected F2, I was greeted with an error, and it took me a minute to realize the first line meant a disk had to be in Drive B. Once I got it all configured I was able to save out a single file for a document.

% hexdump -C TEST02.BAT | head
00000000  1f 54 65 73 74 69 6e 67  20 32 6e 64 20 4f 70 74  |.Testing 2nd Opt|
00000010  69 6f 6e 0d 0a 1a 00 00  00 00 00 00 00 00 00 00  |ion.............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Not much to go on. The file is plain ASCII apart from a single 0x1F byte at the beginning and a carriage return, line feed, and 0x1A (DOS end-of-file) byte at the end. The BAT extension is a little unexpected; I usually see those as batch scripts in DOS. Let’s try a more complex text document. More text, a tab, centering a line...

% hexdump -C TXT223.BAT | head 
00000000  02 27 08 47 2d 49 2d 2d  2d 2d 2d 2d 2d 21 2d 2d  |.'.G-I-------!--|
00000010  2d 2d 2d 2d 2d 2d 2d 21  2d 2d 2d 2d 2d 2d 2d 2d  |-------!--------|
00000020  2d 21 2d 2d 2d 2d 2d 2d  2d 2d 2d 21 2d 2d 2d 2d  |-!---------!----|
00000030  2d 2d 2d 2d 2d 21 2d 2d  2d 2d 2d 2d 2d 2d 2d 21  |-----!---------!|
00000040  2d 2d 2d 2d 2d 2d 2d 2d  2d 21 2d 44 0d 0a 02 27  |---------!-D...'|
00000050  08 23 46 37 32 2c 30 30  2c 30 30 2c 30 30 2c 30  |.#F72,00,00,00,0|
00000060  30 2c 30 30 2c 34 30 2c  30 31 2c 30 31 2c 30 31  |0,00,40,01,01,01|
00000070  2c 30 32 2c 30 30 2c 37  32 2c 30 30 2c 30 30 2c  |,02,00,72,00,00,|
00000080  30 30 2c 30 30 2c 30 30  2c 30 32 2c 23 0d 0a 54  |00,00,00,02,#..T|
00000090  65 73 74 69 6e 67 20 73  6f 6d 65 20 6f 66 20 74  |esting some of t|

That gave me more to work with, but it is a bit of a mess. These files look like those of other early DOS word processing programs: they used ASCII, but embedded their own formatting codes throughout, which only their software understood. This is why it is difficult to identify older WordStar or WordPerfect files.

This was a fun format to explore, I did learn a little French, but also had to dig deep to find the little information I was able to mention here. I would love to find a copy of Textor 4 or 5, which I believe are different than versions 2 & 3 and different than the Windows 6 version I have. There is one edition available on eBay currently, but seems to be the first version. If someone has the means in France this would be good to preserve. Feel free to look at the samples I made.

ACE

September 12, 2025 by Thor Leave a comment

Without divulging any youthful indiscretions, I recently was going back through some of my personal archives and came across a disc I burned around 2002 with some music stored on it. Normally I would find MP3 files, but in this case the file had an ACE extension. I remembered the format as an alternative to the common RAR or ZIP format often used to compress content for transporting (sharing) around the internet. I did what I normally do when something is compressed and reached for 7zip. But to my surprise, it threw an error.

% 7z l sample.ace 

Scanning the drive for archives:
1 file, 12501419 bytes (12 MiB)    

Listing archive: sample.ace


ERROR: sample.ace : Can not open the file as archive

7zip can usually handle most common archives, but a part of me remembered there were two versions of WinACE back in the day: version 1, which was a free version, and version 2, which was for paid users of WinACE. How do I know which version I have? That is a question I frequently find myself asking. The first step was to check the PRONOM registry.

% sf sample.ace 
---
siegfried   : 1.11.2
scandate    : 2025-09-11T09:01:25-06:00
signature   : default.sig
created     : 2025-03-01T15:28:08+11:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V120.xml; container-signature-20240715.xml'
---
filename : 'sample.ace'
filesize : 12501419
modified : 2025-09-11T09:04:36-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  : 
    version : 
    mime    : 
    class   : 
    basis   : 
    warning : 'no match'

Nope, this format is not known to PRONOM. Let’s try another tool.

% file sample.ace 
sample.ace: ACE archive data version 20, from Win/32, version 20 to extract, solid

Ok, so the file tool knows it is a version 2 ACE file and requires version 2 to extract. Good info from a file identification tool. Now let’s see what we can find to extract this file on MacOS. The website Winace.com is long gone as this compression tool lost popularity and the final release was over 14 years ago. Looking at the website in the WaybackMachine we can see some downloads available. One is UnACE for Mac OS X, which, upon further review, only works on the older PowerPC Macs. There is an open source version of unace for Linux, but it only supports version 1, the free version of the format.

Below is a screenshot of the DOS version of the ACE software. Created by Marcel Lemke.

It might be good to mention that WinRAR used to support the ACE format, but with WinACE support ending years ago and with some new vulnerabilities and folks using it for malware, support was dropped in 2019.

Luckily, I still have my PowerMac G5 lying around waiting for this very situation. After a quick install, unace was able to unarchive my music and I was able to listen to some of my favorite songs from 23 years ago. I still wanted to find a modern solution and later discovered there is a Python project which can read and extract both versions. Acefile is a pure Python, no-dependencies implementation of UnACE. I had a little issue installing on an older Catalina laptop, but it worked well on later MacOS versions. Acefile has a few features that are helpful in not only extracting, but testing and dumping the headers of an ACE file. I did install WinACE in a Windows XP Virtual Machine to make a few samples, here is one of them.

% acefile-unace --test sample.ace 
success  test.tif
total 1 tested, 1 ok, 0 failed

The test feature works well to ensure the file is complete and can be extracted, but doesn’t give me much to go on for knowing the version. Let’s try dumping the header.

% acefile-unace --header sample.ace 
volume
    filename    sample.ace
    filesize    12501419
    headers     MAIN:1 FILE:1 RECOVERY:0 others:0
header
    hdr_crc     0x4900
    hdr_size    44
    hdr_type    0x00        MAIN
    hdr_flags   0x8100      V20FORMAT|SOLID
    magic       b'**ACE**'
    eversion    20          2.0
    cversion    20          2.0
    host        0x02        Win32
    volume      0
    datetime    0x5b2aae37  2025-09-10 21:49:46
    reserved1   c8 51 62 e3 5b 80 00 00
    advert      b''
    comment     b''
    reserved2   b'\x00e\x9c\xb1\xd8\x00\x03\n\x00\x00@\x08\x00test.'
header
    hdr_crc     0x3626
    hdr_size    39
    hdr_type    0x01        FILE32
    hdr_flags   0x8001      ADDSIZE|SOLID
    packsize    12501328
    origsize    25264236
    datetime    0x5b2aadcd  2025-09-10 21:46:26
    attribs     0x00000080  NORMAL
    crc32       0x9290955a
    comptype    0x02        blocked
    compqual    0x03        normal
    params      0x000a
    reserved1   0x4000
    filename    b'test.tif'
    comment     b''
    ntsecurity  b''
    reserved2   b''

This is very helpful. We can see the output shows the magic bytes, but also the e(xtraction)version and c(reation)version. We can also find this information in the open source unace technical documentation.

       2      HEAD_CRC      CRC16 over block up from HEAD_TYPE
       2      HEAD_SIZE     size of the block from HEAD_TYPE
                              up to the last byte of this block

       1      HEAD_TYPE     archive header type is 0
       2      HEAD_FLAGS    contains most important information about the
                            archive

                               bit  discription

                                0   0  (no ADDSIZE field)
                                1   presence of a main comment

                                9   SFX-archive
                                10  dictionary size limited to 256K
                                    (because of a junior SFX)
                                11  archive consists of multiple volumes
                                12  main header contains AV-string
                                13  recovery record present
                                14  archive is locked
                                15  archive is solid

       7      ACESIGN       fixed string: '**ACE**' serves to find the
                              archive header

       1      VER_EXTRACT   version needed to extract archive
       1      VER_CREATED   version used to create the archive

I think we have enough to go on to create a signature, we just need to see what the 1-byte version numbers look like in an actual file.

% hexdump -C sample.ace | head
00000000  00 49 2c 00 00 00 81 2a  2a 41 43 45 2a 2a 14 14  |.I,....**ACE**..|
00000010  02 00 37 ae 2a 5b c8 51  62 e3 5b 80 00 00 00 65  |..7.*[.Qb.[....e|
00000020  9c b1 d8 00 03 0a 00 00  40 08 00 74 65 73 74 2e  |........@..test.|
00000030  26 36 27 00 01 01 80 50  c1 be 00 6c 80 81 01 cd  |&6'....P...l....|
00000040  ad 2a 5b 80 00 00 00 5a  95 90 92 02 03 0a 00 00  |.*[....Z........|
00000050  40 08 00 74 65 73 74 2e  74 69 66 28 25 a4 89 04  |@..test.tif(%...|
00000060  fa 43 b1 05 49 0c a3 76  8e 16 a9 2c 92 44 34 8c  |.C..I..v...,.D4.|
00000070  2c 12 e7 28 67 68 49 69  a7 92 4a 10 07 da 10 16  |,..(ghIi..J.....|
00000080  9c 16 4a 10 07 2b 9c ae  30 a9 50 c4 0a 69 51 a6  |..J..+..0.P..iQ.|
00000090  c9 64 a7 24 09 93 3d 81  26 31 a9 c2 68 32 c1 33  |.d.$..=.&1..h2.3|

As you can see above, we have our magic bytes **ACE** starting at offset 7 (the eighth byte) and taking up seven bytes. Then the two bytes after it both have the hex value 14. If we convert that hex value to decimal we get “20”. Let’s look at another:

% hexdump -C sample2.ace | head
00000000  61 67 31 00 00 00 90 2a  2a 41 43 45 2a 2a 0a 0c  |ag1....**ACE**..|
00000010  02 00 50 7c 31 26 d7 2b  c0 48 af 83 ce d9 16 2a  |..P|1&.+.H.....*|
00000020  55 4e 52 45 47 49 53 54  45 52 45 44 20 56 45 52  |UNREGISTERED VER|
00000030  53 49 4f 4e 2a 34 5f 24  00 01 01 80 00 00 00 00  |SION*4_$........|
00000040  35 00 00 00 3c 7c 31 26  10 00 00 00 ff ff ff ff  |5...<|1&........|
00000050  01 05 0a 00 2a 55 05 00  61 75 64 69 6f 45 72 23  |....*U..audioEr#|
00000060  00 01 01 80 00 00 00 00  35 00 00 00 3c 7c 31 26  |........5...<|1&|
00000070  10 00 00 00 ff ff ff ff  01 05 0a 00 2a 55 04 00  |............*U..|
00000080  42 49 54 53 98 14 24 00  01 01 80 00 00 00 00 35  |BITS..$........5|
00000090  00 00 00 3c 7c 31 26 10  00 00 00 ff ff ff ff 01  |...<|1&.........|

Hmm, now we have two different values. “0A” converts to decimal “10” and “0C” converts to decimal “12”. So we can infer this ACE file was created in version 1.2 and requires at least version 1.0 to extract. Let’s try another:

% hexdump -C sample3.ace | head   
00000000  c0 3f 2c 00 00 00 81 2a  2a 41 43 45 2a 2a 0a 14  |.?,....**ACE**..|
00000010  02 00 dc ad 2a 5b 23 52  89 e0 5b 80 00 00 00 65  |....*[#R..[....e|
00000020  9c b1 d8 00 03 0a 00 00  40 08 00 74 65 73 74 2e  |........@..test.|
00000030  92 f3 27 00 01 01 80 54  c3 be 00 6c 80 81 01 cd  |..'....T...l....|
00000040  ad 2a 5b 80 00 00 00 5a  95 90 92 01 03 0a 00 00  |.*[....Z........|
00000050  40 08 00 74 65 73 74 2e  74 69 66 28 25 a4 89 04  |@..test.tif(%...|
00000060  fa 43 b1 05 49 0c a3 76  8e 16 a9 2c 92 44 34 8c  |.C..I..v...,.D4.|
00000070  2c 12 e7 28 67 68 49 69  a7 92 4a 10 07 da 10 16  |,..(ghIi..J.....|
00000080  9c 16 4a 10 07 2b 9c ae  30 a9 50 c4 0a 69 51 a6  |..J..+..0.P..iQ.|
00000090  c9 64 a7 24 09 93 3d 81  26 31 a9 c2 68 32 c1 33  |.d.$..=.&1..h2.3|

Again we have “0A” which converts to decimal “10” and hex 14, which converts to decimal “20”. So made with version 2.0 of the software, but made compatible with version 1.0 for extraction. One more:

% hexdump -C sample4.ace | head
00000000  8b d6 31 00 00 00 90 2a  2a 41 43 45 2a 2a 0b 0b  |..1....**ACE**..|
00000010  02 00 cd b4 3e 26 4a e3  a1 80 32 4b c1 d9 16 2a  |....>&J...2K...*|
00000020  55 4e 52 45 47 49 53 54  45 52 45 44 20 56 45 52  |UNREGISTERED VER|
00000030  53 49 4f 4e 2a aa 08 24  00 01 01 00 00 00 00 00  |SION*..$........|
00000040  00 00 00 00 83 b2 3e 26  10 00 00 00 ff ff ff ff  |......>&........|
00000050  01 05 0a 00 2a 55 05 00  4d 75 73 69 63 77 73 27  |....*U..Musicws'|
00000060  00 01 01 00 00 00 00 00  00 00 00 00 83 b2 3e 26  |..............>&|
00000070  10 00 00 00 ff ff ff ff  01 05 0a 00 2a 55 08 00  |............*U..|
00000080  52 65 73 6f 75 72 63 65  93 75 25 00 01 01 00 00  |Resource.u%.....|
00000090  00 00 00 00 00 00 00 83  b2 3e 26 10 00 00 00 ff  |.........>&.....|

Both extraction and creation version are hex “0B” which converts to decimal “11”. I would have assumed any 1.0 release could extract anything created with later 1.x versions, but I guess that might not be true. I am not clear on all the versions released, so I am not sure how many versions I should include in a signature. I did look through some of the captured pages on the WayBackMachine and believe the last 1.x version was version 1.32.
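To double check the readings above, here is a minimal Python sketch that unpacks the first 16 bytes of a main header using the field layout from the unace documentation excerpt. The function names and the short flag labels are my own, and it only covers the fields quoted in this post, so treat it as an illustration rather than a full parser.

```python
import struct

# Field order and sizes follow the unace documentation excerpt quoted earlier:
# HEAD_CRC(2) HEAD_SIZE(2) HEAD_TYPE(1) HEAD_FLAGS(2) ACESIGN(7)
# VER_EXTRACT(1) VER_CREATED(1), all little-endian.
MAIN_HEADER = struct.Struct("<HHBH7sBB")

# Bit numbers come from the same excerpt; the short names are my own labels.
FLAG_LABELS = {9: "sfx", 11: "multivolume", 12: "av_string",
               13: "recovery", 14: "locked", 15: "solid"}


def format_version(value):
    """Turn a version byte such as 20 into the string '2.0'."""
    return f"{value // 10}.{value % 10}"


def read_ace_main_header(data):
    """Return version and flag info from the first 16 bytes, or None."""
    if len(data) < MAIN_HEADER.size:
        return None
    _crc, _size, head_type, flags, magic, ver_extract, ver_created = (
        MAIN_HEADER.unpack_from(data, 0)
    )
    if magic != b"**ACE**" or head_type != 0:
        return None
    return {
        "extract": format_version(ver_extract),
        "created": format_version(ver_created),
        "flags": [name for bit, name in sorted(FLAG_LABELS.items())
                  if flags & (1 << bit)],
    }


if __name__ == "__main__":
    # First 16 bytes of sample2.ace, copied from the hexdump above.
    head = bytes.fromhex("61673100000090" "2a2a4143452a2a" "0a0c")
    print(read_ace_main_header(head))
```

Run against the sample2.ace bytes, this reports extraction version 1.0 and creation version 1.2, matching the manual reading, and it also picks up the AV-string flag that goes with the UNREGISTERED VERSION text in the dump.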

When building these signatures, it should be easy to create two signatures based on their extraction version. But should the creation version be a factor? Version 1.0 could look like this:

2A2A4143452A2A(0A|0B|0C|0D)(0A|0B|0C|0D|14)

This accounts for versions 1.0 through 1.3 for the extraction version and 1.0 through 2.0 for the creation version. Version 2.0 doesn’t seem to indicate minor versions, with all 2.0 versions using hex 14 (decimal 20). So a signature could be:

2A2A4143452A2A1414

Both would start from offset 7 from the beginning of the file. Is there a better solution?
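Before submitting anything, the two proposed patterns can be tried out with Python regular expressions over the first bytes of a file. This is only a rough stand-in for how PRONOM or Siegfried would evaluate a signature; the pattern names and the `classify_ace` helper are hypothetical.

```python
import re

# The two proposed signatures, written as byte regexes.
# Version 1: extraction version 0A-0D, creation version 0A-0D or 14.
ACE_V1 = re.compile(rb"\*\*ACE\*\*[\x0a-\x0d][\x0a-\x0d\x14]")
# Version 2: both version bytes are hex 14.
ACE_V2 = re.compile(rb"\*\*ACE\*\*\x14\x14")

SIGNATURE_OFFSET = 7  # magic follows HEAD_CRC, HEAD_SIZE, HEAD_TYPE, HEAD_FLAGS


def classify_ace(data):
    """Return 'ACE 2.0', 'ACE 1.x', or None for the given leading bytes."""
    if ACE_V2.match(data, SIGNATURE_OFFSET):
        return "ACE 2.0"
    if ACE_V1.match(data, SIGNATURE_OFFSET):
        return "ACE 1.x"
    return None


if __name__ == "__main__":
    # Leading bytes of sample3.ace: created with 2.0, extractable with 1.0.
    print(classify_ace(bytes.fromhex("c03f2c000000812a2a4143452a2a0a14")))
```

Keying on the extraction version means sample3.ace, made with version 2.0 but compatible with 1.0, lands in the 1.x group, which is the behavior the split signature implies.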

I will warn you, there are a couple of ACE formats out there which you may come across. One being an image/texture format for Microsoft Train Simulator. That might be for another day. There is another use of the ACE archive which is worth discussing. The Comic Book Archive file with the extension CBA will use the ACE archive for storing a series of images used in some Comic Book Readers. They are indeed ACE archive files, only having the different extension and a specific purpose. Maybe adding the CBA extension to the signature would be sufficient?

I am sure there are some other properties, seen above, of the ACE format we could discuss, such as encryption, the differences between Solid and SFX, and dictionary headers, but I think for now, identification of the format and the main version difference is sufficient. For now, check out my GitHub page for my signature proposal and a few samples I made.

Page Perfect

August 22, 2025 by Thor Leave a comment

PagePerfect: the Promise of Desktop Publishing Realized

Now, PagePerfect has arrived. And suddenly PC desktop publishing is a lot
simpler and less expensive, because PagePerfect integrates desktop
publishing, word processing, and graphics editing all in one package.

The 1980’s was a time of growth in personal computing and one industry was progressing rapidly. Previously, in order to print more than just words, you had to use a complex arrangement of type, masking, and screening, all done by hand. Now with a personal computer you could create and print well designed layouts. There were many software applications that came on the scene in these early days. My personal favorite was QuarkXPress. I used the software in the early 1990’s and spent the next few years working in a commercial printshop using it. What once took a team of skilled workers to set copy, mask, blueline, etc., took only one person with the right software.

I recently came across a set of floppy disks for some software called PagePerfect, by a well-known software company, IMSI.

This article in a 1988 PC Magazine announces this new revolutionary software. This was early on in the days of computer desktop publishing and even on a DOS system the software was powerful. It didn’t always get the best reviews in terms of ease of use, but it was well built. The company behind this powerful software wasn’t IMSI as you might expect. It was programmed by a different company, Beyond Words, started by three former employees of MicroPro, the makers of WordStar. Beyond Words liked to “leave sales to others”, which included IMSI and a big contract with Canon for what they called their Desktop Publishing System.

IMSI was able to market the software well and it was well priced. The name PagePerfect didn’t last long, and in 1989 they renamed the software IMSI Publisher. I’m not 100% sure, but it might have to do with WordPerfect asserting some copyright to the name around that same time. By 1990, the software was not seen much anymore, but another name pops up, Beyond Words Composer 2.0.

All three versions of the software have a very similar interface.

But the one thing they all have in common is their file formats. Unfortunately, they used the same extensions many word processors used during this time and after: .DOC, and also .STY, which was used frequently by Microsoft Word as well. It makes sense, a Document is shortened to DOC and a Stylesheet is shortened to STY. So if you have any DOC files which don’t open in Word, you might look here. The other problem is that the file format is not plain text but a proprietary binary format.

hexdump -C TEST.DOC | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 3c af  13 5b 1e 00 00 00 95 63  |......<..[.....c|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  68 01 00 00 0a 00 de 01  00 00 00 00 00 00 00 00  |h...............|
00000040  de 01 00 00 8b 60 00 00  1e 00 69 62 00 00 2c 01  |.....`....ib..,.|
00000050  00 00 1e 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..............[B|
00000060  57 44 4f 43 5d 00 00 00  00 32 2e 30 39 00 00 00  |WDOC]....2.09...|
00000070  00 00 00 00 0a 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6c 00 00 00  00 00 00 00 00 00 00 00  |....l...........|

The one positive is the very obvious strings of text in the header. [BWDB] and [BWDOC], which one could infer as Beyond Words DB and Beyond Words Document. A later Beyond Words Composer document has the same header but a higher version number.

hexdump -C WELCOME.DOC | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 aa 14  56 16 29 00 00 00 30 84  |........V.)...0.|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  b0 01 00 00 0c 00 26 02  00 00 00 00 00 00 00 00  |......&.........|
00000040  26 02 00 00 70 80 00 00  29 00 96 82 00 00 9a 01  |&...p...).......|
00000050  00 00 29 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..)...........[B|
00000060  57 44 4f 43 5d 00 00 00  00 33 2e 30 31 00 00 00  |WDOC]....3.01...|
00000070  00 00 00 00 0c 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6e 00 00 00  00 00 00 00 00 00 00 00  |....n...........|

If we look at the Stylesheets we see the same patterns.

hexdump -C SAMPLE.STY | head   
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 51 10  76 10 09 00 00 00 da 2c  |......Q.v......,|
00000020  00 00 5e 00 00 00 18 00  00 00 01 00 76 00 00 00  |..^.........v...|
00000030  68 01 00 00 0a 00 de 01  00 00 00 00 00 00 00 00  |h...............|
00000040  de 01 00 00 a2 2a 00 00  09 00 80 2c 00 00 5a 00  |.....*.....,..Z.|
00000050  00 00 09 00 00 00 00 00  00 00 00 00 00 00 5b 42  |..............[B|
00000060  57 44 4f 43 5d 00 00 00  00 32 2e 30 39 00 00 00  |WDOC]....2.09...|
00000070  00 00 00 00 0a 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 6c 00 00 00  00 00 00 00 00 00 00 00  |....l...........|

I haven’t been able to find any specific bytes which differentiate the Stylesheets from the Documents. They may well be the same format, so for now we will consider them the same. These stylesheets seem to function as templates, which often share a format with the documents made from them.

Apart from the document layout, the software can also create and use databases, which appear to be a similar format but with different offsets.

hexdump -C DOCUMENT.TBL | head
00000000  5b 42 57 44 42 5d 00 00  00 00 00 31 2e 30 30 00  |[BWDB].....1.00.|
00000010  00 00 00 00 00 00 6b 10  36 00 00 00 18 00 00 00  |......k.6.......|
00000020  01 00 4e 00 00 00 68 01  00 00 0a 00 b6 01 00 00  |..N...h.........|
00000030  00 00 00 00 00 00 5b 42  57 44 4f 43 5d 00 00 00  |......[BWDOC]...|
00000040  00 32 2e 30 39 00 00 00  00 00 00 00 0a 00 00 00  |.2.09...........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 6c 00 00 00  |............l...|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Prior to me diving into this format, the only tool which had some information on this format was TrID, which identified all the DOC and STY files as Beyond Words Composer style, which is mostly true. Hopefully with this background you can be aware of the different software names this format was used with and with some luck convert the files to something less proprietary.
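For anyone wanting to sort a pile of mystery DOC, STY, and TBL files, the [BWDB] and [BWDOC] markers in the dumps above can be checked with a short Python sketch. In every sample shown here the four-character version string sits 11 bytes after the start of its marker; that spacing is my observation from these dumps, not a documented rule, and the function name is hypothetical.

```python
# Assumption from the hexdumps in this post: each marker is followed by null
# padding and a 4-character ASCII version string 11 bytes after the marker start.
VERSION_OFFSET = 11
VERSION_LENGTH = 4
MARKERS = (b"[BWDB]", b"[BWDOC]")


def read_bw_markers(data):
    """Return {marker: version} for a Beyond Words file, or {} if not one."""
    if not data.startswith(b"[BWDB]"):
        return {}
    found = {}
    for marker in MARKERS:
        pos = data.find(marker)
        if pos == -1:
            continue
        start = pos + VERSION_OFFSET
        version = data[start:start + VERSION_LENGTH]
        found[marker.decode("ascii")] = version.decode("ascii", errors="replace")
    return found


if __name__ == "__main__":
    # First 80 bytes of DOCUMENT.TBL, copied from the hexdump above.
    sample = bytes.fromhex(
        "5b425744425d00000000003" "12e303000"
        "0000000000006b103600000018000000"
        "01004e00000068010000" "0a00b6010000"
        "0000000000005b4257444f435d000000"
        "00322e303900000000000000" "0a000000"
    )
    print(read_bw_markers(sample))
```

The [BWDOC] version (2.09 for PagePerfect era files, 3.01 for the Composer sample) looks like the more useful value for telling the generations apart, since [BWDB] reads 1.00 in every file shown.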

Some disks that came with my PagePerfect install disks do have some personal documents created with the software, but I wonder how much this software really was used in the late 1980’s and early 1990’s, because after that point, you don’t hear about the software anymore. There are some references to the software getting absorbed into another product, IBM DisplayWrite 5/2. I would be curious if others have come across this file format.

More Student Writing Center

August 8, 2025 by Thor Leave a comment

Most of what you will find on this blog is file format identification. I see this as the first step in a longer process of preservation and ultimately access. Hopefully the analysis of some file formats can help make better decisions when needing to render the file in an emulator or migrate to another format. I don’t spend much time trying to parse the files I look at to understand the actual content, just enough to properly identify and differentiate between important versions of the format.

One area I sometimes touch on, but often skim over, is encryption. Many file formats are binary, meaning they use a sequence of bytes to encode data, which is more efficient than human readable text and is often compressed. The byte layout used to store data is designed by the developer of the software. They can encode the data however they choose, which is often unreadable by anyone else and is proprietary. A file can also be further encrypted by a password to limit use, even with the right software.

I recently had one of the numerous fans of this blog reach out and ask about the post I made on the software Student Writing Center. They had a bunch of journal files from their youth and couldn’t find a way to read these older files. I offered my help as I still have the software and a nice emulator to run the old software.

As I was going through and converting the journal entries into a PDF, I came across a few which asked for a password to open. You can see below the explanation from the help menu confirms the file format is a proprietary format only readable by their software and the password feature is to further protect the content.

Finding a few of the journal documents password protected was frustrating at first. I was converting some documents that are over 26 years old, so I doubted the password would be remembered. When I asked, they gave me a couple passwords to try, but nothing worked. But I don’t give up that easily!

My first thought was to take all the text from the other journal entries and make a dictionary and then use it to try and brute-force the password. There are some great tools to do this like hashcat. With tools like this, you need to retrieve a hash of the password. This is a scrambled representation of the password stored in the file. So the first step was to find where the password was stored in the file. Since I have the software and can make new password protected files using a password of my choice, this proved a simple task. Create two identical files and give each a different password. Then compare the two files in a hex editor to find the difference.

There it is. The password field in the software only let me put in 10 characters and these 10 bytes lit up when I ran a difference between the two files. I went to check the files given to me which also had password protection and found they also had a similar pattern. In fact I noticed from a few checks that the passwords I used also had a pattern in the file.

For this file I used the number “1” ten times. In that same location it repeated the same byte value “85” 10 times. After a couple more tests I could see this wasn’t an algorithm I needed to crack, but a simple replacement. I created a few more files using all the letters in the alphabet and all the numbers and came up with a substitution cipher.

Obviously the passwords used in the documents I was trying to open didn’t all use the full 10 characters, but the password was always preceded by the values “00” and had the values “1A46461A” after the password. The byte prior to the “00” indicates the length of the password. From there I just needed to decode the bytes between those two offsets.

So for this file with an 8 byte sequence “90D54F4FA3FBBA94” decodes to: password. How cool is that? To make things even easier, the passwords used in Student Writing Center are not case sensitive. There are additional values for symbols. You can see the entire substitution list here.
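The lookup can be sketched in a few lines of Python. The table below only contains the byte values that appear in this post (the “1” test and the letters in “password”), so anything else decodes to “?”; a real tool would load the full substitution list. The record layout (length byte, “00”, password bytes, then “1A46461A”) is as described above, and the function names are hypothetical.

```python
# Partial substitution table: only the pairs shown in this post.
KNOWN_BYTES = {
    0x85: "1",
    0x90: "p", 0xD5: "a", 0x4F: "s", 0xA3: "w",
    0xFB: "o", 0xBA: "r", 0x94: "d",
}
TRAILER = bytes.fromhex("1A46461A")  # bytes that follow the stored password
MAX_LENGTH = 10                      # the password field allows 10 characters


def find_password_bytes(data):
    """Locate <length> 00 <password bytes> 1A46461A and return the password bytes."""
    end = data.find(TRAILER)
    while end != -1:
        for length in range(1, MAX_LENGTH + 1):
            start = end - length
            # The length byte sits two bytes before the password, then a 00.
            if start >= 2 and data[start - 2] == length and data[start - 1] == 0:
                return data[start:end]
        end = data.find(TRAILER, end + 1)
    return None


def decode_password(raw):
    """Map each stored byte back to its character; unknown bytes become '?'."""
    return "".join(KNOWN_BYTES.get(value, "?") for value in raw)


if __name__ == "__main__":
    # Synthetic record built from the 8-byte example above, with filler around it.
    record = bytes([8, 0]) + bytes.fromhex("90D54F4FA3FBBA94") + TRAILER
    blob = b"\xaa" * 16 + record + b"\x00" * 4
    print(decode_password(find_password_bytes(blob)))
```

Since the passwords are not case sensitive, the table only needs one entry per letter.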

One other thing related to identification. Would it be important to identify a password protected file differently than a regular file? At offset 0xDA there seems to be an indicator that the file is password protected: “00” if not, “01” if protected.

What do you think? Should this property be identified as a separate file format from a regular file or is this property something that should be gathered using additional tools that can gather additional properties from a file like this?
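If this ends up as a property gathered by a separate tool, the check itself is tiny. A minimal sketch, assuming the flag at offset 0xDA behaves the way it did in my test files; the function name is hypothetical.

```python
PASSWORD_FLAG_OFFSET = 0xDA  # observed in the journal files examined for this post


def is_password_protected(data):
    """Return True or False from the flag byte, or None if it cannot be read."""
    if len(data) <= PASSWORD_FLAG_OFFSET:
        return None
    flag = data[PASSWORD_FLAG_OFFSET]
    if flag == 0x00:
        return False
    if flag == 0x01:
        return True
    return None  # unexpected value, so make no claim either way


if __name__ == "__main__":
    # Synthetic buffer with the flag byte set, standing in for a real journal file.
    fake = bytearray(0x100)
    fake[PASSWORD_FLAG_OFFSET] = 0x01
    print(is_password_protected(bytes(fake)))
```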

Speaking of additional tools, there is a pretty cool project called the Import library for legacy Mac documents, or libmwaw, which claims to have support for Student Writing Center documents and a lot more. It indeed does, but not the journal format, only the main letter format. I bet it wouldn’t take much to add the journal format to the library, something I will look into.

Microstation

July 11, 2025 by Thor Leave a comment

I recently was able to image a few Bernoulli Disks for a collection using a SCSI device I have found quite useful. The disks had been sitting around for quite some time waiting for the right tools and resources to extract the contents. I mentioned the accomplishment to a few coworkers and one asked me if I would extract the contents from their old disk they used for school back in the 1990’s. They had spent a whopping $99 at the local bookstore for a disk which held a total of 150MB. Not GB’s like we are used to now, but megabytes. I have some cameras which take RAW photos larger than would fit on one disk. Once I had the data extracted from their disk, I took a look at the contents. There were a few file formats on the disk I was unfamiliar with. A quick scan with DROID revealed some matches and a few problems.

Turns out the data were files written by an old version of Bentley Microstation. The files dated from late 1995 and the disk was formatted for FAT16 which leans more to being used in a DOS system, but could have been used with the newly released Windows 95. The Bentley Microstation 95 software wasn’t released until November of 1995, so my guess is these Microstation files were created with Microstation version 5 for DOS.

disktype HD6_imaged-004.hda 

Regular file, size 144.0 MiB (150998016 bytes)
No type and creator code
DOS/MBR partition map
Partition 4: 144.0 MiB (150978560 bytes, 294880 sectors from 32, bootable)
  Type 0x06 (FAT16)
  FAT16 file system (hints score 5 of 5)
    Volume size 143.8 MiB (150810624 bytes, 36819 clusters of 4 KiB)
    Volume name "ode 009 - I"

PRONOM has a few entries for the Microstation software:

PUID	Format Name	Version	Extension
x-fmt/346	Microstation CAD Drawing	95	DGN
fmt/502	Bentley V8 DGN		DGN
fmt/1626	MicroStation Symbology Resource File		RSC
fmt/1549	Bentley Microstation Hidden Line File		HLN
fmt/1358	MicroStation Base File		BSE
fmt/1183	MicroStation Material Palette		PAL
fmt/1177	MicroStation Material Library		MAT

The files found on this old Bernoulli disk gave varied results in identification. Most of the DGN files give me multiple identifications in DROID.

A little digging and we can learn a bit about the major formats. Intergraph and Bentley used a binary version of their drawing format, DGN, from versions 2 through 7, spanning 1987 to 2001. With the release of version 8, they made a major change to the format. Version 8 uses the Microsoft OLE2 container to enhance the format, allowing it to hold multiple drawings and more information about the model. With this change, the format became proprietary. Sure, they started an OpenDGN program to make the format more compatible with other systems, but you had to request access and sign an NDA in order to get a copy of the format specifications, which doesn’t sound “open” to me. You can read another file format researcher’s thoughts on this on her blog.

So I know many of these files are not Version 8 of the DGN format as they are not OLE2 containers, but the other issue is that x-fmt/346 for the Microstation CAD drawing 95 is an outline record. It has no signature. So DROID is guessing based on extension only. We need to dig deeper.

I noticed that many of the DGN files in my sample set also identified as a “Microstation Hidden Line File”, but instead of an HLN extension, they use DGN.

sf samp15.dgn 

filename : 'samp15.dgn'
filesize : 359424
modified : 1998-09-01T12:31:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [359422 2]]'
    warning : 'extension mismatch'

hexdump -C samp15.dgn | head
00000000  08 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 c8 45  |............ ..E|
00000020  00 00 00 00 00 00 00 00  40 06 0c 00 01 05 dc a0  |........@.......|
00000030  ff ff ff ff ff ff ff ff  b5 8b 9f 63 b9 88 85 a7  |...........c....|
00000040  00 00 00 00 19 00 b4 86  13 00 fe be 00 00 00 00  |................|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|

hexdump -C samp7.dgn | head
00000000  c8 09 fe 02 01 08 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 04 7a 45  |..............zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 01 05 fc b0  |................|
00000030  ff ff ff ff ff ff ff ff  0d 00 9d b5 0c 00 74 93  |..............t.|
00000040  ff ff a6 fd 09 00 40 11  05 00 50 aa 00 00 e5 f8  |......@...P.....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Looking at a couple of files in the same sample set, some use the header “08 09 fe 02 01 08 00 00” while another uses “c8 09 fe 02 01 08 00 00”. This is why samp15.dgn identifies as an HLN file, as the signature matches, while samp7.dgn uses “C8” instead of “08”, so it does not identify as an HLN file. What is the difference and what is an HLN file?

First let’s define an HLN file. The name of the format is “Hidden Line File”, although most references refer to it as a “Visible Edges File“. Confusing, but the definition is: “a 2D or 3D DGN file that contains the edges visible in a 3D view (that is, with those edges that would be hidden, removed).”

Looking at a couple HLN files, we can see the format is the same as DGN files:

hexdump -C test-2d.hln | head
00000000  08 09 fe 02 08 01 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  00 00 00 00 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  80 40 00 00 00 00 00 00  |.........@......|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

hexdump -C test-3d.hln | head
00000000  c8 09 fe 02 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 20 00 7a 45  |............ .zE|
00000020  00 00 00 00 00 00 00 00  e8 03 0a 00 00 05 fc b2  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 5b f5 ff ff fe f9  |..........[.....|
00000040  ff ff 0c fe 01 00 d3 cb  01 00 36 2a 00 00 e8 03  |..........6*....|
00000050  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  80 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.@..............|

Same difference between the two previous files. These two files also explain the difference between the “08” and the “c8” values. Microstation uses the first to indicate it is a 2D file and the latter to indicate a 3D file. The DGN format has been documented in libdgn and this distinction is referenced.
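
The 2D/3D distinction can be checked from the first byte alone. Here is a small Python sketch, assuming only what the samples above show (a first byte of 0x08 for 2D or 0xC8 for 3D, followed by 09 FE 02). The function name is illustrative, and this is not a full DGN parser.

```python
def dgn_dimension(header: bytes) -> str:
    """Guess whether a pre-V8 DGN/HLN file is 2D or 3D from its first four bytes."""
    if len(header) < 4 or header[1:4] != b"\x09\xfe\x02":
        return "unknown"  # not the header seen in the samples
    if header[0] == 0x08:
        return "2D"
    if header[0] == 0xC8:
        return "3D"
    return "unknown"
```

Any other leading byte is reported as unknown rather than guessed at, since these samples only demonstrate the two values.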

This presents a problem with the current PRONOM identification.

filename : 'MS95-2D.dgn'
filesize : 12288
modified : 2025-06-05T21:13:52-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1549'
    format  : 'Bentley Microstation Hidden Line File'
    version : 
    mime    : 
    class   : 'Model'
    basis   : 'byte match at [[0 3] [12286 2]]'
    warning : 'extension mismatch'

filename : 'MS95-3D.dgn'
filesize : 12800
modified : 2025-06-05T21:14:00-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/346'
    format  : 'Microstation CAD Drawing'
    version : '95'
    mime    : 
    class   : 
    basis   : 'extension match dgn'
    warning : 'match on extension only'

The 2D files misidentify as Hidden Line Files and the 3D files are identified through extension only. We learned from a previous test that Hidden Line Files can be both 2D and 3D and are the same format as DGN, so a separate identification PUID is unnecessary, but the x-fmt/346 identification doesn’t have a signature, so a few things need to change.

The other issue is a Hidden Line File is also available in version 8+.

filename : 'Microstationv8-s01.hln'
filesize : 7168
modified : 2025-06-05T19:48:09-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/502'
    format  : 'Bentley V8 DGN'
    version : 
    mime    : 
    class   : 'Image (Vector)'
    basis   : 'container name Dgn~H with name only'
    warning : 'extension mismatch'

They also identify as Bentley V8 DGN files, but with an extension mismatch. This should be easy to remedy with the addition of the extension HLN to the signature. The container signature seems to work well, no need to change anything.

My suggestions to fix these issues would be:

Deprecate x-fmt/346
Change name of fmt/1549 from “Bentley Microstation Hidden Line File” to “Microstation CAD Drawing” and use the version 2-7 to distinguish from v8
Change the signature for fmt/1549 from “0809FE” to “(08|C8)09FE02”, with no EOF sequence of “FFFF”
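
To try the suggested byte sequence against a folder of files before any PRONOM change, it can be expressed as a regular expression. This Python sketch covers the proposed pattern only, anchored at the start of the file with no end-of-file check. It is not a PRONOM or DROID signature file, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Proposed BOF sequence (08|C8)09FE02, written as a bytes regex.
PROPOSED_DGN_BOF = re.compile(rb"(?:\x08|\xc8)\x09\xfe\x02")

def matches_proposed_signature(data: bytes) -> bool:
    """True when the data starts with the proposed sequence."""
    return PROPOSED_DGN_BOF.match(data) is not None

def scan_folder(folder: str):
    """Yield (path, matched) for each .dgn or .hln file under folder."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".dgn", ".hln"}:
            with path.open("rb") as fh:
                yield path, matches_proposed_signature(fh.read(4))
```

Files that fail the match would be worth a closer look, since V8 files are OLE2 containers and are not expected to match.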

The other option would be to make fmt/1549 the 2D drawing format and x-fmt/346 could be used for the 3D drawing format. What do you think?

I have uploaded a few samples to my GitHub page. Curious if your examples of DGN files match what I am seeing. There are a few other related formats that will need to be explored, but this should help for now.

miniDVD

June 20, 2025 by Thor Leave a comment

Let’s talk about the DVD format for a minute. Specifically the miniDVD media format.

DVDs are indeed versatile, as the name implies. You can find data on them written in many different filesystems and formats, including digital video. DVD-Video is a video format which replaced VHS tapes as the main source of home movie entertainment. Eventually the public could afford to record their own video onto these discs and enjoy them for years. With the popularity of high definition video, DVDs are not as popular as they once were, but still provide a decent experience.

I often see the DVD-Video format in archives I work with and we use tools to “RIP” the already digital data from the disc into a new format. I use the term “RIP”, to indicate we are not digitizing the format as it already contains digital data. DVD-Video is a standard that is used on most discs and looks something like this:

tree /Volumes/VIDEO_ESSENTIALS 
/Volumes/VIDEO_ESSENTIALS
├── AUDIO_TS
└── VIDEO_TS
    ├── VIDEO_TS.BUP
    ├── VIDEO_TS.IFO
    ├── VIDEO_TS.VOB
    ├── VTS_01_0.BUP
    ├── VTS_01_0.IFO
    ├── VTS_01_0.VOB
    ├── VTS_01_1.VOB
    ├── VTS_01_2.VOB
    ├── VTS_01_3.VOB
    ├── VTS_01_4.VOB
    ├── VTS_02_0.BUP
    ├── VTS_02_0.IFO
    ├── VTS_02_0.VOB
    └── VTS_02_1.VOB

3 directories, 14 files

There is usually an AUDIO_TS and a VIDEO_TS folder. The Video folder is full of video files, but the Audio folder is always empty. Apparently it was reserved for the DVD-Audio format, which never caught on, so on DVD-Video discs it remains empty. Oftentimes I will see this folder absent on non-commercial discs.

An issue that has come up many times is that folks copy the folder structure from the disc to preserve the video, as they would with any digital file. This can be a problem, as the structure was meant for the software and hardware used to access the DVD-Video format. The files by themselves often cannot provide the same experience, and if the disc uses any sort of encryption, the copied files are useless. This is a complex, multi-part format and should either remain together in this structure or be migrated to a new format, such as MKV, for preservation.

Enter the miniDVD. It is a smaller version of the standard CD/DVD optical disc size. It was very popular as a recording medium for some digital video cameras, much like the Sony miniDVD Handycam I own. You can pop a blank disc into the camera and it prepares it for you, which takes a couple of minutes, then gives you 20 minutes of recording in high quality and up to 60 minutes in a lower quality. The discs can hold up to 1.4GB and will have the same structure as their big brother.

tree /Volumes/2025_05_23_07H36M_PM 
/Volumes/2025_05_23_07H36M_PM
└── VIDEO_TS
    ├── VIDEO_TS.BUP
    ├── VIDEO_TS.IFO
    ├── VIDEO_TS.VOB
    ├── VTS_01_0.BUP
    ├── VTS_01_0.IFO
    └── VTS_01_1.VOB

2 directories, 6 files

It is missing the AUDIO_TS folder, which is fine, but here is the catch. In order for the disc to be readable by another device, it has to be finalized!

Finalizing is an action which has to happen to any recordable optical disc to “close” out the disc. This process adds important directory and file system data so computers and DVD players can read the disc properly. Many cameras like mine and other DVD recorders require this step when you are finished recording. Unfortunately, it’s an extra step which can take a few minutes, so it is often forgotten. I have had many optical discs come to me over the years because they show up as blank or uninitialized when read on a computer. I fear many people have put them aside or thrown them away as blank, not knowing they have data on them. Luckily with most burnable discs, you can often see the difference between a blank disc and a burned disc from the underside, the writable surface.

The filesystem used on most DVD-Video discs is called UDF, Universal Disk Format. It is often combined on hybrid discs with ISO-9660 and HFS for compatibility, but can be the only filesystem as well. According to the specifications, a UDF formatted disc should have a Volume Recognition Sequence to identify it as a UDF disc. On a finalized disc I can find this sequence, but on an un-finalized disc, it is missing. This makes sense as the disc is often seen as unformatted. A tool I use to explore a disc like this is ISOBuster.
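
For a disc image, the presence of the Volume Recognition Sequence can also be checked with a few lines of Python by walking the descriptors that begin at sector 16. The sketch below assumes 2048-byte sectors and data read from the very start of the disc. The identifiers (BEA01, NSR02/NSR03, TEA01, plus CD001 for ISO 9660) are the standard ones, while the function names are made up for illustration.

```python
SECTOR = 2048
VRS_START = 16 * SECTOR  # volume descriptors begin at sector 16
KNOWN_IDS = {b"BEA01", b"BOOT2", b"CD001", b"CDW02", b"NSR02", b"NSR03", b"TEA01"}

def volume_identifiers(image: bytes, max_descriptors: int = 64):
    """List the standard identifiers found in the volume recognition area."""
    found = []
    for n in range(max_descriptors):
        offset = VRS_START + n * SECTOR
        ident = image[offset + 1 : offset + 6]  # bytes 1-5 of each descriptor
        if ident not in KNOWN_IDS:
            break  # end of the recognition area (or nothing there at all)
        found.append(ident.decode("ascii"))
    return found

def looks_like_udf(image: bytes) -> bool:
    """True when both the BEA01 marker and an NSR descriptor are present."""
    ids = volume_identifiers(image)
    return "BEA01" in ids and ("NSR02" in ids or "NSR03" in ids)
```

Only the beginning of the image is needed, so reading the first megabyte with open(path, "rb").read(1024 * 1024) is enough. An empty list would be consistent with the un-finalized discs described above.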

Another interesting feature of my Sony Handycam is the option to choose what type of disc you would like to prepare when you insert a blank disc. I get the option to choose Video or VR mode. Video is your normal DVD-Video format, but VR Mode is something a little different.

tree /Volumes/2025_05_23_08H29M_PM 
/Volumes/2025_05_23_08H29M_PM
└── DVD_RTAV
    ├── VR_MANGR.BUP
    ├── VR_MANGR.IFO
    └── VR_MOVIE.VRO

2 directories, 3 files

Instead of your expected VIDEO_TS folder, we see a DVD_RTAV folder with some different files inside. No, this is not a Virtual Reality mode, like I originally thought. The VR simply stands for Video Recording and is a standard. It is meant to allow for easier editing of the video format, but is not compatible with your standard DVD player. The VRO format used is pretty cool. It is a container format, MPEG-PS, for both audio and video, and can contain both 4:3 and 16:9 aspect ratios, unlike a VOB where the aspect ratio is set.

hexdump -C /Volumes/2025_05_23_08H29M_PM/DVD_RTAV/VR_MOVIE.VRO | head
00000000  00 00 01 ba 44 00 04 00  04 01 01 89 c3 f8 00 00  |....D...........|
00000010  01 bb 00 12 80 c4 e1 04  e1 7f b9 e0 e8 b8 c0 20  |............... |
00000020  bd e0 3a bf e0 02 00 00  01 bf 07 d4 50 00 00 00  |..:.........P...|
00000030  00 4d e3 00 00 00 00 00  ff ff ff ff ff 00 00 00  |.M..............|
00000040  00 00 00 00 00 00 00 00  53 4f 4e 59 5f 4d 4f 42  |........SONY_MOB|
00000050  49 4c 45 20 20 20 20 20  20 20 20 20 20 20 20 20  |ILE             |
00000060  20 20 20 20 20 20 20 20  41 52 49 5f 44 41 54 41  |        ARI_DATA|
00000070  01 02 ff ff 53 4f 4e 59  00 44 43 52 2d 44 56 44  |....SONY.DCR-DVD|
00000080  30 30 34 47 00 01 55 53  52 54 59 50 45 31 4c 4b  |004G..USRTYPE1LK|
00000090  00 10 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The VRO file does identify as an MPEG Program Stream (x-fmt/386), but does contain a little extra information. My trusty copy of the book DVD Demystified has a bunch more info on this format if you are interested; you can find a copy here. The VRO format is an MPEG PS so identification is covered, but the current PRONOM signature doesn’t like the VRO extension. The BUP & IFO files on the disc are not identified. This is because the PRONOM signature, which covers both of these formats, is looking for the ASCII string “DVDVIDEO-VTS” or “DVDVIDEO-VMG”. It won’t find either of those strings as this is not the DVD-Video standard. Instead it should look for the string “DVD_RTR_VMG” found in these files.

hexdump -C /Volumes/2025_05_23_08H29M_PM/DVD_RTAV/VR_MANGR.IFO | head
00000000  44 56 44 5f 52 54 52 5f  56 4d 47 30 00 00 7f ff  |DVD_RTR_VMG0....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 02 07  |................|
00000020  00 11 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  1e 5c 03 11 ff ff ff ff  ff ff ff ff ff ff ff ff  |.\..............|
00000050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000060  ff ff 4d 41 59 20 32 33  20 32 30 32 35 20 20 20  |..MAY 23 2025   |
00000070  38 3a 32 39 50 4d 00 00  00 00 00 00 00 00 00 00  |8:29PM..........|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
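
The difference between the two kinds of IFO and BUP files is visible in the leading ASCII identifier. Here is a Python sketch of that check, using only the strings mentioned above. The labels returned and the function name are illustrative, not PRONOM format names.

```python
def classify_ifo(header: bytes) -> str:
    """Classify an IFO or BUP file from its leading ASCII identifier."""
    if header.startswith(b"DVDVIDEO-VMG"):
        return "DVD-Video manager"
    if header.startswith(b"DVDVIDEO-VTS"):
        return "DVD-Video title set"
    if header.startswith(b"DVD_RTR_VMG"):
        return "DVD-VR manager"
    return "unknown"
```

Reading the first 16 bytes of each IFO or BUP file is enough to feed this function.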

I will probably suggest this addition to PRONOM for identification, but if you need to work with this format, you can use tools like: https://www.pixelbeat.org/programs/dvd-vr

DaVinci Resolve

May 2, 2025 by Thor Leave a comment

A previous post was about LUTs, the little files needed to color grade your photos and videos. One of the best systems for color grading video in use by professionals today is DaVinci Resolve. The system originally was all hardware based, but in 2004, as computers became able to process higher quality video, da Vinci Systems released new digital systems.

Like most professional multimedia editing software, DaVinci Resolve uses projects to manage work. Projects are generally where all the settings for the work are stored, but they don’t generally store the actual media used in the project. Project files are often XML with unique schemas, but others pack a little more into the project file.

hexdump -C project.drp | head
00000000  50 4b 03 04 14 00 08 00  08 00 f2 54 90 5a ef 18  |PK.........T.Z..|
00000010  b0 25 47 0c 00 00 db 1b  00 00 0b 00 00 00 70 72  |.%G...........pr|
00000020  6f 6a 65 63 74 2e 78 6d  6c 9d 58 d9 72 5b 37 12  |oject.xml.X.r[7.|
00000030  7d cf 57 68 f4 7e 4d ec  4b 8a 51 ca b1 92 89 aa  |}.Wh.~M.K.Q.....|
00000040  2c db 65 29 79 9d 6a 00  0d 85 09 45 aa 48 4a 71  |,.e)y.j....E.HJq|
00000050  fe 7e 0e ee 42 51 94 9c  68 c6 29 85 17 0d a0 d1  |.~..BQ..h.).....|
00000060  e8 3e bd 61 fe fd 97 db  e5 c9 03 6f b6 8b f5 ea  |.>.a.......o....|
00000070  bb 53 f9 46 9c 9e f0 2a  af cb 62 75 f3 dd e9 2f  |.S.F...*..bu.../|
00000080  d7 3f 75 e1 f4 fb b3 6f  e6 ff ea ba f3 f4 f6 ee  |.?u....o........|
00000090  ee 57 de 60 55 7c 23 df  98 37 42 48 79 7a 72 9e  |.W.`U|#..7BHyzr.|

DaVinci Resolve keeps all projects in a database, but you can export them to a project file. A DaVinci Resolve Project file uses a ZIP container to store all the project settings in one file. Let’s see what else might be inside.

Path = project.drp
Type = zip
Physical Size = 543860

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08 .....      1010030       287793  project.xml
2018-02-27 20:25:08 .....        21173         6856  MediaPool/Master/000_Timelines/MpFolder.xml
2018-02-27 20:25:08 .....       492690        28067  MediaPool/Master/001_Audio/MpFolder.xml
2018-02-27 20:25:08 .....        20177         3588  MediaPool/Master/002_gfx/MpFolder.xml
2018-02-27 20:25:08 .....        11025         2611  MediaPool/Master/003_VO/MpFolder.xml
2018-02-27 20:25:08 .....        98309         7042  MediaPool/Master/004_ScreenCaptures Consolidated/MpFolder.xml
2018-02-27 20:25:08 .....      1278493        66424  MediaPool/Master/005_Video H264/MpFolder.xml
2018-02-27 20:25:08 .....         1995          748  MediaPool/Master/MpFolder.xml
2018-02-27 20:25:08 .....      1638204       137086  SeqContainer/909a0a2c-4183-4310-9f78-6e15c3c59cb4.xml
2018-02-27 20:25:08 .....         8806         1169  Gallery.xml
2018-02-27 20:25:08 .....        12697          696  media.dat
------------------- ----- ------------ ------------  ------------------------
2018-02-27 20:25:08            4593599       542080  11 files

Looks like a lot of XML! The XML files found consistently in all the DRP files are the aptly named “project.xml” along with “Gallery.xml”.

cat project.xml | head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="19.1.4.0011" DbPrjVer="14"-->
<SM_Project DbId="db65f2ee-2bff-41cd-b478-f96c26e9609f">
 <FieldsBlob>000000010000000700000026005400650078007400520065006e006400650072004900740065006d005600650063004200410000000c00ffffffff0000002400520065006e0064006500720043006100630068006500560065007200730069006f006e0000000200000000010000001e00500072006f006a00650063007400460065006100740075007200650073000000050000000000000000010000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000030000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800330033003400320034003300380036002d0034006400330030002d0034003600610035002d0061006100340033002d006100330035003200620066006500370038003200640063000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>86f03abc-9354-47d9-9006-a55b6b1d49cf</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>6CB133A11B81</SysId>
 <ProjectId>0</ProjectId>

It appears the version of DaVinci Resolve is pretty important. If you try to open a DRP file without using the most up-to-date software, you might run into problems. From what I can see, every time a new major version is released, the updates to the XML cause an error when the project is imported into older software. So the version of the DRP file can be a critical piece of metadata needed in understanding the format. There are some helpful apps created by DaVinci Resolve users you can try, or you can try a little Python script to report back the version used in a DRP or a whole folder of DRP files.
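
A minimal sketch of what such a version check could look like follows. It is an independent illustration rather than any particular existing script, and it only assumes what the samples in this post show: the export is a ZIP containing project.xml, and a DbAppVer="..." value appears near the top of that file. The function names are hypothetical.

```python
import re
import zipfile
from pathlib import Path

APP_VER = re.compile(rb'DbAppVer="([^"]+)"')

def resolve_version(source):
    """Return the DbAppVer string from a Resolve export, or None if absent."""
    try:
        with zipfile.ZipFile(source) as zf:
            with zf.open("project.xml") as fh:
                head = fh.read(4096)  # the version sits in the first few lines
    except (zipfile.BadZipFile, KeyError):
        return None  # not a ZIP, or no project.xml inside
    match = APP_VER.search(head)
    return match.group(1).decode("utf-8", "replace") if match else None

def report(target: str):
    """Print the version for one file, or for every .drp/.drt under a folder."""
    root = Path(target)
    files = [root] if root.is_file() else sorted(
        p for p in root.rglob("*") if p.suffix.lower() in {".drp", ".drt"})
    for path in files:
        print(f"{path}\t{resolve_version(path) or 'unknown'}")
```

Calling report() with a file or folder path prints one line per export. The same pattern would also pick up a DbAppVer value written as an attribute on the root element.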

There is one other file used by the DaVinci Resolve software. It uses the DRT extension and is for exporting and importing single timelines to the software. Like a DRP it is a simple project file that only points to the media used in the project and only stores the settings needed.

Path = timeline.drt
Type = zip
Physical Size = 215159

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42 .....        45726         8888  project.xml
2021-04-21 21:16:42 .....       670306       198698  MediaPool/Master/MpFolder.xml
2021-04-21 21:16:42 .....        98268         7089  SeqContainer/7eb849f3-41cb-4e3f-baa8-d5b134b57aa7.xml
------------------- ----- ------------ ------------  ------------------------
2021-04-21 21:16:42             814300       214675  3 files

This DRT file also has a project.xml file, but doesn’t have the Gallery.xml file we normally find in a DRP file. Since the project.xml has the same structure in both, the presence or absence of Gallery.xml is what we can use to tell the two formats apart.

cat project.xml |head
<?xml version="1.0" encoding="UTF-8"?>
<!--DbAppVer="17.1.1.0009" DbPrjVer="10"-->
<SM_Project DbId="ec6cb2e2-0b3c-43b8-8f90-a5fcb973af3b">
 <FieldsBlob>00000001000000040000002e00500072006f006a00650063007400440062004d006900670072006100740069006f006e00530074006100740065000000040000000000000000020000002e0049007300500072006f006a0065006300740041006700650049006e004d006900630072006f00530065006300730000000100010000001400470061006c006c0065007200790052006500660000000a000000004800660030003800380038003300390038002d0066006400620037002d0034006300320036002d0061003700310032002d003300360038006200300036003300300065003400330031000000260046007500730069006f006e00530069007a0069006e006700560065007200730069006f006e000000020000000002</FieldsBlob>
 <LockId/>
 <User>04d71873-a504-40c6-bde5-41709691a2c9</User>
 <Folder/>
 <UserId>-1</UserId>
 <SysId>94F6D6F3F60F</SysId>
 <ProjectId>0</ProjectId>

Both formats use the XML root tag “SM_Project”. This can also be used to define a signature for the two formats, as a file named “project.xml” could be used by a different format and we don’t want a false identification.
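
Put together, the checks described so far (a ZIP container, a project.xml whose root element is SM_Project, and Gallery.xml present or absent) can be sketched in Python. This is only an illustration of the logic, assuming the layout of the samples shown in this post. It is not a PRONOM container signature, and the function name is hypothetical.

```python
import re
import zipfile

# Optional XML declaration and comments, then the SM_Project root element.
SM_PROJECT_ROOT = re.compile(
    rb"\s*(?:<\?xml[^>]*\?>\s*)?(?:<!--.*?-->\s*)*<SM_Project[\s>]", re.DOTALL)

def classify_resolve_export(source) -> str:
    """Return 'DRP', 'DRT', or 'unknown' for a ZIP-based Resolve export."""
    try:
        with zipfile.ZipFile(source) as zf:
            names = set(zf.namelist())
            if "project.xml" not in names:
                return "unknown"
            with zf.open("project.xml") as fh:
                head = fh.read(8192)  # enough to reach the root element
    except zipfile.BadZipFile:
        return "unknown"
    if not SM_PROJECT_ROOT.match(head):
        return "unknown"  # some other format that happens to use project.xml
    return "DRP" if "Gallery.xml" in names else "DRT"
```

The root element is checked with a regular expression on the first few kilobytes rather than a full XML parse, which keeps the check cheap for large project.xml files.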

I was able to trace the use of the DRP format back to DaVinci Resolve version 9. In version 8, it appears projects are exported using the name and extension “Default Project.resolve.zip”. From what I could find, DaVinci Resolve version 9 was a big rewrite, so it makes sense that they settled on a more useful extension. The project.xml file in a version 8 export is slightly different.

cat project.xml | head
<SM_Project DbId="9ba0c4dc-d99c-4b7f-b0da-d254d91e34e2" DbAppVer="8.2 (#153)">
 <LockId></LockId>
 <User>159415b8-7515-43bf-b5f5-00d98949434b</User>
 <UserId>-1</UserId>
 <SysId>7cd1c388ea29</SysId>
 <ProjectId>0</ProjectId>
 <RevivalTaskSetID>-1</RevivalTaskSetID>
 <PlayHeadsSplitDisplay>false</PlayHeadsSplitDisplay>
 <pGallery>
  <Gallery::GyGallery DbId="9884d8ff-096e-4df0-b833-0e75e6e07e15">

It still uses the “SM_Project” root tag, but displays the DbAppVer information differently. It would be good to find more examples from version 8 and earlier to see how this format has evolved over time. For now, I have created a signature you can test if you happen to have any DRP files in your archive.

Scrivener

March 21, 2025 by Thor 1 Comment

Word Processors are everywhere and have some of the most recognizable file formats. Some are very simple in that they just contain plain text, others are more complex. There are formats which allow for images and others which can handle different languages and writing directions.

A writing platform I recently learned about is called Scrivener. It was first released in 2007 by a company called Literature & Latte Ltd, and has a Macintosh and a Windows version. The software is marketed toward writers, as there are some features that help with note taking, research and much more. It also allows for adding multimedia and even full webpages.

This is accomplished by a file format which uses a non-traditional method for storing all the data needed to render the format.

tree Scrivener3-s01.scriv
Scrivener3-s01.scriv
├── Files
│   ├── Data
│   │   ├── 921B4A08-54C0-4B69-94FD-428F56FDAB89
│   │   │   └── content.rtf
│   │   └── docs.checksum
│   ├── binder.autosave
│   ├── binder.backup
│   ├── search.indexes
│   ├── styles.xml
│   ├── version.txt
│   └── writing.history
├── Scrivener3-s01.scrivx
└── Settings
    ├── recents.txt
    ├── ui-common.xml
    └── ui.ini

Scrivener uses a folder structure to store all the data used in the format. The folder has an extension, .scriv. The format includes some rich text, backups, indexes, version history and more. One unique format within the folder is an XML file with the extension .scrivx. This makes the format proprietary, and it can only be rendered using the Scrivener software.

cat Scrivener3-s01.scrivx | head
<?xml version="1.0" encoding="UTF-8"?>
<ScrivenerProject Template="No" Version="2.0" Identifier="DF5DA7F0-27DB-4815-A050-B4D6F23CABA7" Creator="SCRWIN-3.1.5.1" Device="DESKTOP-JMM4K7M" Modified="2025-03-14 22:15:28 -0600" ModID="B4A944C3-FF79-49F6-A737-158BEB4E58BB">
    <Binder>
        <BinderItem UUID="17807D28-117A-409E-B12D-B34922B6CC6F" Type="DraftFolder" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:17 -0600">
            <Title>Draft</Title>
            <MetaData>
                <IncludeInCompile>Yes</IncludeInCompile>
            </MetaData>
            <Children>
                <BinderItem UUID="921B4A08-54C0-4B69-94FD-428F56FDAB89" Type="Text" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:23 -0600">

The XML has enough to tell these files apart from other XML files, so the signature would be straightforward. Earlier versions of Scrivener sometimes have the SCRIVX file, but sometimes have a file with a .scrivproj extension instead. On a Macintosh this file is in a binary plist format, which is different from earlier Windows versions. It seems they may have unified them under version 2 or 3, where versions 1 and 2 for Windows use project version 1 and version 3 uses project version 2.

hexdump -C Scrivener1-s01.scriv/binder.scrivproj | head
00000000  62 70 6c 69 73 74 30 30  d4 00 01 00 02 00 03 00  |bplist00........|
00000010  04 00 05 00 1d 01 d8 01  d9 54 24 74 6f 70 58 24  |.........T$topX$|
00000020  6f 62 6a 65 63 74 73 58  24 76 65 72 73 69 6f 6e  |objectsX$version|
00000030  59 24 61 72 63 68 69 76  65 72 dc 00 06 00 07 00  |Y$archiver......|
00000040  08 00 09 00 0a 00 0b 00  0c 00 0d 00 0e 00 0f 00  |................|
00000050  10 00 11 00 12 00 13 00  14 00 15 00 16 00 17 00  |................|
00000060  18 00 19 00 1a 00 15 00  1b 00 1c 5a 4c 61 62 65  |...........ZLabe|
00000070  6c 54 69 74 6c 65 59 4c  61 62 65 6c 4c 69 73 74  |lTitleYLabelList|
00000080  5e 42 69 6e 64 65 72 43  6f 6e 74 65 6e 74 73 5f  |^BinderContents_|
00000090  10 0f 44 65 66 61 75 6c  74 4c 61 62 65 6c 54 61  |..DefaultLabelTa|
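
The two kinds of project file can be told apart from their first bytes. Here is a Python sketch, assuming only what the samples above show: a ScrivenerProject root element for the XML form and the bplist00 magic for the binary plist form. Note that bplist00 on its own only says the file is a binary plist, not that it belongs to Scrivener. The function name and labels are illustrative.

```python
import re

# Optional XML declaration and comments, then the ScrivenerProject root element.
SCRIVX_ROOT = re.compile(
    rb"\s*(?:<\?xml[^>]*\?>\s*)?(?:<!--.*?-->\s*)*<ScrivenerProject[\s>]", re.DOTALL)

def classify_scrivener_project(header: bytes) -> str:
    """Classify a .scrivx or .scrivproj file from its first bytes."""
    if header.startswith(b"bplist00"):
        return "binary plist"
    if SCRIVX_ROOT.match(header):
        return "Scrivener XML project"
    return "unknown"
```

Reading the first kilobyte of the file is plenty for either check.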

Since the developers of Scrivener decided to make the SCRIV format simply a folder with different content within, something special happens on MacOS. The Scrivener software registers all the extensions it uses with the MacOS launch services. This process then changes the way the SCRIV folder is displayed in the MacOS Finder. It now appears as a single file and is given a file type. This is called a Document Package format.

By right-clicking on the “file” you can then browse the package contents. There is nothing in the folder itself or hidden in any attributes which causes this to happen. It is all controlled by what extensions have been registered with the launch services database. We can however ask MacOS to give us some extended metadata details about the package, as long as the file is on an Apple filesystem like HFS or APFS.

mdls Scrivener3-s01.scriv 
_kMDItemDisplayNameWithExtensions      = "Scrivener3-s01.scriv"
kMDItemContentCreationDate             = 2025-03-15 04:15:17 +0000
kMDItemContentCreationDate_Ranking     = 2025-03-15 00:00:00 +0000
kMDItemContentModificationDate         = 2025-03-15 04:15:18 +0000
kMDItemContentModificationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentType                     = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree                 = (
    "com.literatureandlatte.scrivener3.scriv",
    "public.directory",
    "public.item",
    "com.apple.package",
    "public.content",
    "public.composite-content"
)
kMDItemDateAdded                       = 2025-03-21 04:38:48 +0000
kMDItemDateAdded_Ranking               = 2025-03-21 00:00:00 +0000
kMDItemDisplayName                     = "Scrivener3-s01.scriv"
kMDItemDocumentIdentifier              = 0
kMDItemFSContentChangeDate             = 2025-03-15 04:15:18 +0000
kMDItemFSCreationDate                  = 2025-03-15 04:15:17 +0000
kMDItemFSCreatorCode                   = ""
kMDItemFSFinderFlags                   = 0
kMDItemFSHasCustomIcon                 = (null)
kMDItemFSInvisible                     = 0
kMDItemFSIsExtensionHidden             = 0
kMDItemFSIsStationery                  = (null)
kMDItemFSLabel                         = 0
kMDItemFSName                          = "Scrivener3-s01.scriv"
kMDItemFSNodeCount                     = 3
kMDItemFSOwnerGroupID                  = 20
kMDItemFSOwnerUserID                   = 501
kMDItemFSSize                          = 31155
kMDItemFSTypeCode                      = ""
kMDItemInterestingDate_Ranking         = 2025-03-15 00:00:00 +0000
kMDItemKind                            = "Scrivener Project"
kMDItemLogicalSize                     = 31155
kMDItemPhysicalSize                    = 69632

There are a lot of additional details available using the mdls command, including the content type “com.apple.package“. This tool works with any file in MacOS and can be very useful in getting all the information you may need for preservation.
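
For scripting, the same check can be wrapped in Python. The sketch below shells out to mdls with its -name option to request only kMDItemContentTypeTree, then looks for “com.apple.package” in the result. It only works on MacOS, the parsing assumes the parenthesised list layout shown above, and the function names are made up for illustration.

```python
import re
import subprocess

def parse_type_tree(mdls_output: str):
    """Pull the quoted identifiers out of the parenthesised list mdls prints."""
    return re.findall(r'"([^"]+)"', mdls_output.split("=", 1)[-1])

def is_package(type_tree) -> bool:
    """True when the content type tree includes com.apple.package."""
    return "com.apple.package" in type_tree

def content_type_tree(path: str):
    """Ask mdls (MacOS only) for the content type tree of a file or folder."""
    result = subprocess.run(
        ["mdls", "-name", "kMDItemContentTypeTree", path],
        capture_output=True, text=True, check=True)
    return parse_type_tree(result.stdout)
```

On a Mac with Scrivener installed, is_package(content_type_tree(path)) would be expected to return True for a SCRIV folder. As noted above, the answer depends on what is registered with launch services.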

Until the tools we use for format identification can recognize package formats, tools like this may be needed to gather the necessary metadata for preservation. In the meantime, identification of the package content is the best we can hope for. Creating a signature for the XML-based SCRIVX format is the first step.

Stay tuned for more on the package format, as I will be bringing it up more in the Digital Preservation community.