In honor of the 25th of May, aka #525FloppyDay, I have a few thoughts on 5.25″ Floppy Disks, more specifically, the sleeves that protect them.
This year marks the 50th anniversary of the 5.25″ Floppy Disk. Prior to 1976, Floppy Disks were a massive 8 inches square and also used sleeves. It wasn’t until the disk was made smaller, as the “mini-diskette” in 1976, that the media became more popular and was used in most desktop computer systems in the 1980’s. I won’t get into all the different ways data was encoded on these diskettes as that is a massive topic, so today we will look at an often overlooked feature of the media: the protective sleeve.
For the sake of consistency, I will call this protection a sleeve, but may also be referred to as an envelope, jacket, or a pouch. Pouch being used by the folks down-under. The sleeve was mainly used for storage of the disk, but was also a great marketing tool and on the backside a reminder of the proper care of the disk. Floppy disk manufacturers and software companies would use the sleeve to print their logo and other important information about the disk, some opting for blank or generic graphics. Today we will look at some of the creative and not so creative uses.
Sleeves for floppy disks were made of a few different types of material. Plain paper sleeves were very common, many of which have yellowed over time. Some were very thin, others quite thick to offer more protection. One type of material often used for a sleeve is called Olefin. Kind of a cross between paper and plastic, this spun bonded material was used by many manufacturers because of its superior durability and its ability to repel water. This material, made by DuPont, was named Tyvek and has many uses, one being to protect the walls of your home.
As far as I know there is no “archive” documenting all floppy disk sleeves. There is a website called the Original Disk Sleeve Archive, started years ago, which has many sleeves referenced but hasn’t been updated in a while. Jason Scott at the Internet Archive has also uploaded scans he has made, here and here.
Today I would like to share my collection of Sleeves with you. I scanned them myself from my collection at a massive 1200 dpi so you can explore all the details and find creative uses for them.
You will find most of the sleeves have a logo or name on the front, but the back may have some interesting icons. These icons helped remind us of the fragile nature of floppy disks and the care needed to make them last. The icons can be understood without any explanation, using familiar images. Some have the typical circle with a line through it, indicating to NOT do something; others use simple words in a few languages to get the meaning across. Then you have some who decided to go with humor or clever phrases to get the message across. In all, the messages were clear and understood, but as the floppy disk lost its usefulness over time, the icons and messages disappeared to time as well.
You might notice many of the different sleeves in my collection have a common set of icons, which many companies probably bought instead of designing their own. Others put some time and effort into designing their own.
Generally they all follow the same message: do not bend, only use soft tip pens or markers, never use paper clips, don’t touch the magnetic media with your hands, and most of all, keep magnets away from the disk. Others would indicate to be sure to insert the disk into the drive carefully and not to force it. Many list the proper temperature and humidity to store the disks, which is a very wide range.
The sources of magnetism are very different from what we might find today in a typical office or home. No longer do we see large heavy telephones with huge magnets inside, or huge CRT monitors, nor do many of us keep large multimedia speakers at our desks, but these were real dangers when floppy disks were in use. For those who use floppy disks today, we have new unknown dangers to contend with: the modern laptop and cell phone, which can be deadly to magnetic media. I explore this and other myths in a paper submitted to iPres this fall, stay tuned!
One of my favorite practical back sides is this one from Elephant Memory Systems. Don’t touch the shiny Parts! Also a reminder of making a copy of your data if it’s important, a good preservation principle.
There are some slightly humorous instructions, like not letting your disk drink or smoke.
Or this clever set of instructions. Letting us know to avoid letting our disks die an agonizing death and to keep them happy.
But of course the most famous of all disk care backside instructions go to the Beagle Bros. They decided to go with something less practical and more humorous to get their point across.
The Beagle Bros were a software company and distributor who mainly distributed Apple II software in the 1980’s. The art is credited to Fred Crone, an artist for the Beagle Bros, who with his wife Sara created much of the art used by the software company. The warnings were so popular that the company was frequently asked if they could be reprinted, and permission was happily given. You can find more images and history at the Beagle Bros Repository. Which is your favorite?
If you happen to catch me at a conference or send me a nice message below, I might part with a few stickers I have made using many of the icons and warnings you find on the back of these sleeves. Enjoy.
I found myself in the same situation again with a colleague asking me for help identifying an unknown file. The file in question did not have an extension and to make things harder, the file could not be shared with me, only the header. So not having much to go on I started with some assumptions. Not having an extension leans toward the file being from an early Macintosh system. My favorite. I asked some follow up questions and learned the file was from around 2002, but no longer had any extended attributes that might contain a type creator code to help with identification. There was also mention of layers and fonts later in the file, I was told.
Looking at the header sent to me, there was an obvious first choice to look into.
cd cd 20 07 43 41 4e 56 (ÍÍ CANV)
With the ASCII text “CANV”, the initial thought was this file was related to the popular Canvas software. I did a post on the Canvas formats a couple years ago so I went back through my files and could not find any match. Most uncompressed Canvas files use “CANVAS5” or “CANVAS6”, nothing which was shortened to CANV. I checked all the samples I have from many versions. Back to square one.
I looked at other graphics and desktop publishing programs from the time period, and I even asked AI to help me narrow it down. AI also recommended Canvas, along with many others I was already aware of and had dismissed. AI did not know what to make of the CANV string in the header, so it was not too helpful. I did follow a couple of leads it gave me, but they led to dead ends. One title I looked into was called Desktop Publisher Pro. Files from this software also had a unique header, but nothing resembling what I was looking for. Maybe I will do another post on this software in the future.
So the next step in my process was to scan through all my sample data sets for something that matched or was close. I made a simple signature with the CANV header, but also one with the “CDCD” hex values as it seemed unique as well. I set the scan to run overnight on my sample set and the next morning I was met with disappointment. Not one match. I decided to run the same signatures across a few of my other drives of personal files, just for good measure. The next morning I had a surprise. Within the tens of thousands of files scanned, one file popped. The file the scan found also had no extension and the software that created it was no longer on my computer so the file was not associated with any software. But luckily in my case, the Type/Creator code was still attached!
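For anyone who wants to try a similar overnight scan without a full identification tool, here is a minimal Python sketch. It is not the scanner I used; the function names and the 16-byte header window are my own assumptions, and the only facts it relies on are the two patterns from the header above (the bytes CD CD at the start, and the ASCII text CANV shortly after).

```python
from pathlib import Path

# Patterns taken from the header discussed above.
CDCD = bytes.fromhex("cdcd")
CANV = b"CANV"
# Assumption: looking at the first 16 bytes is enough to catch both patterns.
HEADER_WINDOW = 16


def match_header(header: bytes) -> list[str]:
    """Return the names of the simple signatures that match a file header."""
    hits = []
    if header.startswith(CDCD):
        hits.append("CDCD")
    if CANV in header[:HEADER_WINDOW]:
        hits.append("CANV")
    return hits


def scan_tree(root: str) -> dict[str, list[str]]:
    """Walk a directory tree and report every file with at least one hit."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                header = handle.read(HEADER_WINDOW)
        except OSError:
            continue  # unreadable file, skip it and keep scanning
        hits = match_header(header)
        if hits:
            results[str(path)] = hits
    return results
```

Note that the CANV check on its own also fires on Canvas files that begin with “CANVAS5”, so a file matching both patterns is the stronger candidate.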
There it is! The header with CANV. Looking up the Creator code “ARTD” in my TCDB tool, I learned the creator of the file.
% python3 TC-lookup-draft-uni.py "MC Wrap"
Type Code: DISC
Creator Code: ARTD
Size of Data Fork: 18483930 bytes
Size of Resource Fork: 0 bytes
Rows with Type Code b'DISC' and Creator Code b'ARTD':
Row index: 11828
File Name: Discus—Disc Lable
File Type: DISC
Creator: ARTD
Comments: Discus
Category: nan
Extension: nan
Dup: Data by Ilan Szekely, Jerusalem: nan
------------------------------
The database of Type/Creator codes identified the file as being created by the Discus software. A little search and I remembered the file in question was created by Discus RE which was included with Roxio Toast, a CD/DVD burning software popular on the Macintosh for many years. Discus RE was labeling software bundled with Toast for a few versions, lastly Toast version 7.
I happen to have Toast 6 on my older PowerMac G5 machine which included Discus RE 2.74. I made a few samples to compare.
You may have noticed that both my file and this sample have a slightly different header from the one sent to me: my samples have “CDCD007” while the header sent to me has “CDCD2007”. I am not sure what this slight difference means, so I need to see more samples. Since this was a sample from Discus version 2, I tracked down samples from version 3 and 4.
Version 3 seems to have the same structure. Version 4 is a little different and also has an extension this time. The first two bytes are the same, but they are followed by 0008, and the CANV is missing, replaced by PREV.
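To compare headers across versions, a small parser helps. The sketch below assumes a layout I have only inferred from the samples described here: two bytes CD CD, two bytes whose meaning I don't know, then a four-character ASCII tag (CANV or PREV). The class and function names are made up for illustration, and this is not a description of the official Discus format.

```python
from typing import NamedTuple, Optional


class DiscusHeader(NamedTuple):
    marker: str  # bytes 2-3 as hex; meaning unknown
    tag: str     # bytes 4-7 as ASCII


# Tags seen in the samples described above.
KNOWN_TAGS = {b"CANV", b"PREV"}


def parse_discus_header(data: bytes) -> Optional[DiscusHeader]:
    """Split the first 8 bytes into the pieces that vary between samples."""
    if len(data) < 8 or data[:2] != b"\xcd\xcd":
        return None
    tag = data[4:8]
    if tag not in KNOWN_TAGS:
        return None
    return DiscusHeader(marker=data[2:4].hex(), tag=tag.decode("ascii"))
```

Keeping the two middle bytes as a separate field makes it easy to tabulate them per software version as more samples turn up.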
I learned the Discus software was created by a software company called Magic Mouse Productions. Discus was written by Edward de Jong, the founder of Magic Mouse Productions, who has also created many other software titles. Discus sold millions of copies, many of which were bundled, but also full versions which unlocked many other label templates and a lot more artwork. The Discus software also included a large selection of artwork which could be used in the label design. Looking at a sample from the full version, I was hoping to see the CDCD2007, but found them to be similar to the other samples.
I actually reached out to Edward de Jong and asked him about the file format. His response,
The Discus file format is fairly straightforward
The first four bytes are the signature of a Discus file
The Discus software was built for the PowerPC chip for the Macintosh, but also for Windows. The last couple of updates to Discus version 4 added Intel chip support, but then no new versions were created. Edward explained….
I still use Discus myself, but on the Macintosh it stops running after OSX 10.14 because Apple cruelly discontinued their emulator for older operating systems and the Intel instruction set.
After Toast version 7, Roxio used a different labeling software bundled with their burning software. This and many others still need some research and documentation. For now, a signature for Discus will help in weeding out these labels, which some might find un-needed but others may find invaluable.
Take a look at some samples and my signature proposal and let me know what you think.
If you remember the older post about Cafe Beef, you’ll appreciate the file format we explore in this post, which uses the hex values “BE DEAD”. I guess they jinxed themselves because the software didn’t survive a refresh in 2009 and died. At one point the software was considered remarkable, being awarded 4.5 Mice by Macworld Magazine in August 2002.
When a colleague reached out to me recently with a file they were not familiar with I jumped in. I love a good challenge. The file had no extension, but was thought to have come from a Windows system. With a little digging I was able to identify the file as a Now Contact file which does have a Windows and Macintosh version, but with no extension, my money was on the file coming from the Mac.
I started my search with the obvious, the first few bytes. Since I only had one file, I wasn’t sure if this would be helpful, but looking at the bytes, I figured it was significant.
The first three bytes are “BE DE AD”; BE DEAD seems to be done on purpose. A quick search on the web showed no results, no mention of this unique header. I even turned to AI, asking Grok if it knew the source of this byte sequence. It had no idea. I began digging through the file looking for clues to its software source. The ASCII text I could see indicated some sort of customer database, and the file name of “CONTACT FILE” seemed to confirm this. I found some dates from 2002 and started looking at popular CRM and PIM software at the time. I then found a reference to a note the user left saying they opened the file on a different Power Tower Pro. I owned one of these clones back in college, so I immediately knew they were using a Macintosh! A quick search of popular contact management software from the early 2000’s revealed a few suspects. I took a look at a product from Now Software, Now Up-to-Date & Contact version 3.9 and I found the header I was looking for! Had the file sent to me retained its extended attributes from the Mac, I would have found this software much quicker.
The Now Contact software has a few functionalities including a Word Processor, but let’s stick to the contact manager for now. Let’s take a look at a sample file from version 1.
This file does not have the “BE DE AD” header, but something else. I do see a repeated pattern of the text “KnDB” which also happens to be the Type code used on the Macintosh.
These version 1 files don’t seem to have a static header, but they do have common byte sequences. I will need to make more samples to get a proper signature constructed.
Now Contact skipped version 2 so the next version to be released was 3.0. What do these files look like?
They have the same header as the file I received. Let’s try and open my file in Now Contact 3.9.
Oops, that didn’t work. There must be something in my file which tells the software it is from a newer version. After some digging in the file I can see some possible version text.
The file has some repeated text with v400. Sure enough, the file opens in version 4 without problems; I am able to view all the contacts, and the software even allows me to export them as a CSV. Looking at a sample file from a version 4 install confirms the version information.
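The two checks I did by eye, the “BE DE AD” header and the repeated version text, can be roughly sketched in Python. The only facts behind this are the three header bytes and the “v400” text seen in the version 4 file; the idea that other versions would use the same “v” plus three digits pattern is purely my assumption, as are the names used here.

```python
import re
from collections import Counter
from typing import Optional

NOW_CONTACT_MAGIC = bytes.fromhex("bedead")
# Assumption: version text looks like "v" followed by three digits (only v400 was observed).
VERSION_TAG = re.compile(rb"v\d{3}")


def guess_now_contact_version(data: bytes) -> Optional[str]:
    """Return the most frequent version tag, 'unknown', or None if the header differs."""
    if not data.startswith(NOW_CONTACT_MAGIC):
        return None
    tags = Counter(VERSION_TAG.findall(data))
    if not tags:
        return "unknown"
    # The real tag repeats through the file, so the most common match wins.
    return tags.most_common(1)[0][0].decode("ascii")
```

Counting matches rather than taking the first one is a small guard against a stray “v” followed by digits inside the contact data itself.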
Now Software updated the software for a few years in the early 1990’s. There were Windows versions as well and the format is the same except one detail.
Now Up-to-Date & Contact released version 5.3 around 2008 which finally provided support for Intel processors. It was the last version released before Now Software attempted a full re-write of the software in 2009 named Now X (code-named “NightHawk”). The software did not receive good reviews and by 2010 the company ceased operations. So far I have come up empty in getting a copy of this doomed version, but I will update this post if I am able to get my hands on a copy.
For now, you can take a look at some sample files on Github, which I will also add some PRONOM signatures to soon.
With all the different file formats that are found in everyday computing, most formats which find their way to my archive have historical value. We know we can’t keep everything and have to assign value to all we decide to keep for the long term. Some files have sensitive data and we have to follow guidelines for their proper handling. Identification of files helps us know what type of data might be kept inside the format, so often I need to also identify formats we don’t plan on keeping.
I was recently looking through a large digital collection and a report on the files which did not identify in the initial scan. A few popped out to me because of their extension, TAX. Tax records are one thing we need to identify so we can properly handle them, but not likely keep in our repository.
These tax files come from the popular US-based TurboTax software. The software gets a new version for every year as tax laws are constantly changing. The software has also been around since 1984, so there are many versions to be aware of. Add the fact that there are personal and business versions along with DOS, Windows, and Macintosh versions, and identification might get complicated. None of these are documented in the PRONOM registry. Wikidata is aware of a couple of the extensions, but does not have any signatures to help in identification.
Luckily, this collection of files I was processing had a number of years’ worth of records. Using them and a few others I was able to put together a decent timeline of formats used, at least from the early 1990’s on. The format seemed to settle on the .TAX extension around the 1994 Windows version. Before this, a group of files in DOS together stored the data. Let’s look at a sample of the 1994 file from Windows.
The nice easy to read header is gone, but some other patterns start to appear. It seems most of the files from these early versions also used a code near the beginning that may help. “S1995US1040PER” is similar to the “S1994US1040” in the 1994 file. One could assume the “1040” is the tax form most Americans are used to, along with “US” preceding the number. Then at the end of the string we see “PER”. This may refer to different versions of the Tax software, a Personal version for the individual, and possibly other versions for business as well. I believe TurboTax also had versions for Canadians, so there may be many variations on this string. This could get complex. Let’s jump ahead to a 1999 file.
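Since this string carries the tax year and form number, it can be pulled apart with a regular expression. The following Python sketch only covers the shapes I have actually seen (“S1994US1040”, “S1995US1040PER”); the 512-byte search window and the field names are assumptions of mine, and Canadian or business variants would likely need a looser pattern.

```python
import re
from typing import Optional

# "S", a four-digit year, "US", a four-digit form number, then an optional "PER".
FORM_TAG = re.compile(rb"S(\d{4})US(\d{4})(PER)?")


def find_form_tag(data: bytes, window: int = 512) -> Optional[dict]:
    """Look for the year/form string near the start of a .TAX file."""
    match = FORM_TAG.search(data[:window])
    if match is None:
        return None
    suffix = match.group(3)
    return {
        "tax_year": int(match.group(1)),
        "form": match.group(2).decode("ascii"),
        "suffix": suffix.decode("ascii") if suffix else None,
    }
```

A result like `{"tax_year": 1995, "form": "1040", "suffix": "PER"}` is handy for sorting a pile of .TAX files by year before deciding what to do with them.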
The same string is visible, but of course with the year “1999”. We can also see a pattern with the first 4 bytes, “c0 45 01 5f” which seem to be consistent with the 1995 file. The file I have for 1998 is consistent as well. Jumping to the new millennium, we see a change.
Two changes we see with this file. One, the ASCII string is different. S2000US1120, 1120 being the U.S. Corporation Income Tax Return. So this version of the software was different. The other change is the first 4 bytes. They changed to “c0 45 01 64”, with the last byte changing from 5F to 64. Jumping to 2003, we see the same values.
Back to a 1040 form, but with the same header as the 2000 file. I am removing some lines, just to be safe and not exposing any personal data. In 2004 we see a major change in the format.
% hexdump -C TurboTax2004.tax | head
00000000 54 54 46 4e 01 01 6f 68 dc 62 00 00 00 00 4b 01 |TTFN..oh.b....K.|
Again, removing some lines to ensure safety. This header is very different and there is no human readable ASCII in the file, which means it is binary and probably encoded. This header is new; I assume TTFN references TurboTax format? file? or possibly, “Turbo Tax Financial Network”?
This header is then used for the next few years ending in 2013, but before we get there, the extension makes a change as well. In 2008, instead of the simple .TAX extension, the software begins to save the tax file with the extension .TAX2008. I don’t have a 2008 document, but I do have a sample 2009 document.
2014 is where I get a little confused. I have one file which uses the TTFN header and another which uses what becomes the standard going forward. But definitely in 2015, the format starts using the ZIP container as a structure for the format. Here is a sample from 2015.
The files all seem to have a manifest.xml and a unique identifier. 7-Zip also mentions a header issue with the ZIP files. Something maybe done on purpose? Now comes the odd part, the manifest.xml file does not render as an XML file, it is binary.
% hexdump -C TurboTax2017/manifest.xml | head
00000000 a1 b1 fe fb 37 18 dd 9c 08 2d 9c 86 23 00 10 fa |....7....-..#...|
00000010 12 60 92 bb dc 92 a5 df 1a 24 16 4e a9 28 89 80 |.`.......$.N.(..|
00000020 64 33 66 55 c5 93 f0 68 44 d0 7c f9 56 86 42 2c |d3fU...hD.|.V.B,|
00000030 80 ba 8a 95 2a 82 6d 32 75 84 b1 f1 e2 18 93 5c |....*.m2u......\|
00000040 82 4d 18 f9 ed 23 4f dc d6 b5 7f f2 20 1e 30 59 |.M...#O..... .0Y|
00000050 d5 7f 47 7d aa f5 8d bd 8b 10 20 ec 8a c7 43 df |..G}...... ...C.|
00000060 52 90 a9 70 4d 68 b4 76 fa c8 37 85 f5 56 25 82 |R..pMh.v..7..V%.|
00000070 ea 16 06 54 b0 b4 bc 43 16 fb 70 7b 7a 79 a5 8b |...T...C..p{zy..|
00000080 3c 79 7d ef ac 32 fc 35 ce 0f fa a2 6f e7 c3 a4 |<y}..2.5....o...|
00000090 92 a1 a4 c8 83 dd 9f 32 f4 ea d3 1a eb 89 15 a3 |.......2........|
Of the samples I have which have a manifest.xml, they all begin with “a1 b1 fe fb”, which apparently is the header for an AES CBC encrypted file. A clever person was able to decrypt the file to reveal the actual XML.
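Checking for that manifest prefix is easy to script. Below is a cautious Python sketch using the standard zipfile module. It assumes manifest.xml sits at the top level of the container, which I have not verified for every year, and because 7-Zip already complains about the headers, the module may refuse some files outright; the function simply returns None in that case. The function name is hypothetical.

```python
import zipfile
from typing import Optional

# First four bytes seen in every manifest.xml sample described above.
MANIFEST_PREFIX = bytes.fromhex("a1b1fefb")


def manifest_has_known_prefix(source) -> Optional[bool]:
    """Return True/False for the prefix check, or None if no manifest can be read.

    `source` may be a path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            # Assumption: the manifest is stored at the root of the container.
            if "manifest.xml" not in archive.namelist():
                return None
            with archive.open("manifest.xml") as member:
                return member.read(4) == MANIFEST_PREFIX
    except zipfile.BadZipFile:
        return None
```

This only tells you whether the manifest starts with the same four bytes as my samples; it makes no attempt to decrypt anything.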
TurboTax isn’t sold on physical disk anymore, but you can download the current tax year version from their website. I am not a user of their product so I am not sure if the latest version still saves files in the same way. If you do use it currently, I would love to know if it is still the same.
So to recap, the headers are:
1994 “TurboTax Format=WIN Version=13
1995-99 “C045015F”
2000-03 “C0450164”
2004-13 “TTFN”
2014-current “ZIP Container”
This should be enough to create five new signatures for identification. Extensions will be a problem since they change every year, but we can add them to the list. With these signatures we can now identify all the tax files we have and set them aside if not needed.
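Until proper signatures exist, the recap can be turned into a quick triage function. This Python sketch is not a PRONOM signature. It assumes each header sits at the very start of the file, and it uses the standard ZIP local file header bytes (“PK” followed by 03 04) for the newest container, which may not hold if the ZIP header oddity 7-Zip reports affects the first bytes.

```python
from typing import Optional

# (prefix, era) pairs taken from the recap above; all are assumed to start at offset 0.
TURBOTAX_HEADERS = [
    (b"TurboTax Format=WIN", "1994"),
    (bytes.fromhex("c045015f"), "1995-99"),
    (bytes.fromhex("c0450164"), "2000-03"),
    (b"TTFN", "2004-13"),
    (b"PK\x03\x04", "2014-current"),  # standard ZIP local file header
]


def classify_turbotax(header: bytes) -> Optional[str]:
    """Map the first bytes of a file to the TurboTax era it most likely belongs to."""
    for prefix, era in TURBOTAX_HEADERS:
        if header.startswith(prefix):
            return era
    return None
```

Keep in mind that any ZIP file will land in the last bucket, so the extension and the manifest.xml check are still needed, and that 2014 files may show up with either the TTFN or the ZIP header.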
It seems to be a common theme through the history of software that some titles get bought, sold, rebranded, integrated, and discontinued by a number of companies. I find it interesting to find out a popular software title’s humble beginnings. Often when a piece of software gets bought, the file formats don’t change much, at least at first.
A little shareware program called iView was started by a company called Script Software in 1996. They later changed their name to Plum Amazing. iView then became iView Multimedia, then an iView MediaPro version before it was bought by Microsoft where they changed the name to Expression Media. After a couple years the software was bought by Phase One and then discontinued. Let’s take a look at the history.
iView, according to their website in 1997, is simply the easiest and fastest way to view and catalog pictures for the Mac. The software initially only worked on the Macintosh and the Catalog file it produced did not have an extension. But they did have a Type/Creator code. A catalog produced by version 2 of the iView software was IVWc/IVW2.
The iView format is a proprietary binary format used to store a catalog of multimedia formats with their metadata and thumbnail. The media viewer had support for quite a few popular formats. The file seems to have paths to each of the files it has cataloged, so some of these iView files can get pretty large.
In 2003 the iView software was ported to Windows. That brought a formal extension to the catalog format. This was also the time the iView software made the switch from the classic MacOS to MacOSX, and extensions were also encouraged at this time. iView had two different versions, a standard shareware version and a Media Pro version, each with its own version numbers. iView MediaPro was not compatible with Macintosh 68K machines or systems earlier than 8.6. The last Media Pro version was version 3.8.6. You can get most of the old software versions here.
This time the file has an extension, IVC, but a familiar pattern at the beginning: the string 025i, hex values “30323569”, at byte 4. The iView files from previous versions have the same bytes, but only Media Pro version 2 & 3 files match an existing PRONOM identification.
% sf iViewPro302-s01.ivc
filename : 'iViewPro302-s01.ivc'
filesize : 3757
modified : 2025-09-17T17:39:27-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/647'
    format  : 'Microsoft Expression Media'
    version : '2'
    mime    :
    class   : 'Presentation'
    basis   : 'extension match ivc; byte match at [[4 4] [3737 16]]'
These are iView Media Pro files, so why are they identifying as Microsoft Expression Media files? That is because Microsoft bought iView Media Pro on June 27, 2006. Microsoft rebranded the software as Expression Media, not to be confused with Expression Studio. It was available for Windows and Macintosh, but not everyone was happy with the purchase. Version 1 of Expression Media was released the next year and was a free upgrade for iView Media Pro users. The format doesn’t appear to have changed much at all. In fact, an iView Media Pro 3 file with no content and an Expression Media 1 file are practically identical.
Even though all of these versions have the same 4 bytes at the beginning, not all of them match the current PRONOM signature. fmt/647 is specifically for Expression Media version 2 files, but also identifies iView Media Pro 2 & 3 and Expression Media 1 files. It doesn’t identify earlier files because the signature is also looking for some bytes near the end of the file.
The same 4 bytes appear at the end of the file as well. There is also a string used in the signature at the end, “SVar”. I am not sure what the string is used for, but it is not in earlier versions.
Microsoft Expression Media was short lived. Microsoft decided to sell off the software to Phase One in 2010. Phase One is the developer of Capture One, a professional photo editing program. It makes sense they would want a cataloging tool to go with their flagship product. Phase One retained the name Media Pro from the original iView Media Pro software.
Phase One took the software and did make modifications, starting with the extension used to store the catalogs. They also decided to adjust the format slightly, changing the “025i” bytes to “030i”.
The Phase One Media Pro software uses the extension MPCATALOG, but can also open the older IVC catalogs as well.
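The four bytes at offset 4 are enough for a rough first sort between the two catalog generations. This Python sketch checks only that marker and ignores the bytes near the end of the file that the PRONOM signatures also require, so it will accept the earlier iView catalogs too. The labels are my own wording, not registry names.

```python
from typing import Optional

# Marker strings found at byte offset 4, as described above.
CATALOG_MARKERS = {
    b"025i": "iView / Expression Media catalog",
    b"030i": "Phase One Media Pro catalog",
}


def identify_catalog(data: bytes) -> Optional[str]:
    """Return a rough label based only on the four bytes at offset 4."""
    return CATALOG_MARKERS.get(data[4:8])
```

Because the older extension-less Macintosh catalogs share the “025i” bytes, this is the kind of loose match a new signature for the early versions could start from.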
% sf PhaseOneMediaProv1.mpcatalog
filename : 'PhaseOneMediaProv1.mpcatalog'
filesize : 21353
modified : 2025-09-16T20:37:07-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/648'
    format  : 'Media View Pro'
    version :
    mime    :
    class   : 'Presentation'
    basis   : 'extension match mpcatalog; byte match at [[4 4] [21329 16]]'
MPCATALOG files are identified in PRONOM using a similar signature as the one used for the IVC files. Although the name of the format isn’t quite right, MediaPro is probably a better name.
So it seems the identification is already available in PRONOM for the later MediaPro files, both iView MediaPro and Expression Media, and a second identification for the PhaseOne catalog. So we will need to either adjust the identification to include the earlier iView versions and adjust the names or we can create a new signature for the older versions. It would be good to find out what version added the change to the format, but with all the different software versions, it might be hard to nail down.
The main subject of these posts is obsolete software and file formats. I prefer to focus on older software titles and collect them when I can. I have also found older Macintosh software to be particularly interesting as many of the qualities of early Macintosh use are lost today. In researching a very early Macintosh title, I came across an article from 1999 published by the Washington Post. The article, now 26 years old, was already commenting about “antique” software which was less than 20 years old at the time. Is there a term for even more antique? The title of the article? “Old Enthusiasts Are Scouring the Web to Find ‘Antique’ Software”. I feel this hasn’t changed: I still scour the web to find old software, and if the enthusiasts were “old” 26 years ago, then I am ancient.
The files created by ThinkTank are plain text with the ASCII “HEAD”. There was also a DOS version of ThinkTank, but the files used were .DB and .SAV, although the templates in the .TXT format did use this same format.
Turns out this was a special format they called “dot-head“, aptly named for the head of the file. It was used as an interchange format to move outlines between ThinkTank, another program called Ready!, and the later product MORE.
MORE was developed to be multiple tools in one. Meant to “Unite idea processing technology with the desktop publishing revolution“. MORE replaced ThinkTank in 1986 and promised more flexibility by creating charts and presentations quickly from your outline. MORE used the same dot-head format initially, also the ASCII could be in lowercase.
In 1987 Living Videotext was purchased by Symantec. Shortly after Symantec released MORE II and a rebranded DOS application called GrandView based on ThinkTank.
Let’s take a look at GrandView, it was built from the DOS version of ThinkTank and compatible with the same formats. It had great reviews at the time and provided the first outliner for Symantec. It was written by the developer, John Friend, who created PC Outline which was often bundled with WordStar.
GrandView could import and export into any of the other products.
GrandView was also compatible with the Macintosh counterpart, MORE.
Symantec then released a new version of the MORE software for the Macintosh in 1988, adding new presentation features. MORE II went away from the dot-head format and used a new proprietary format.
The MORE 3 format got a new header but appears similar in structure to the previous version. And the new companion tool MORE Graph had yet another format.
Luckily these early Macintosh based formats didn’t use a resource fork, making them fully compatible with their PC counterpart.
One of the coolest parts of this long list of outline software, is that years later, after Symantec discontinued the product, the original creator, Dave Winer, petitioned Symantec to allow him to release the antique software free and clear to the public. How cool is that? I would really like to see this happen more as other software titles die and get swept under the rug leaving the community to try and find copies, preserve them and make sense of the formats. Not only were the early versions made available, a tool was built to migrate the MORE format to more open XML, allowing the ideas trapped in these ancient formats to be re-imagined.
MORE 3.1 was the final version of the software to be released by Symantec. The files produced by MORE 3.1 have an identical header to the standard 3.0 version, so we probably only need one signature for the two versions.
If you would like to try out the MORE software, download this disk image, and drag onto the Macintosh emulator below. The image will automatically mount and you should be able to take MORE 3.1 for a spin!
Outlining software still has a good place in idea generation and presentation. OmniOutliner can probably trace its roots to these “Antique” titles! Stay tuned for some PRONOM signatures to go along with these many format examples. For now you can gather some of the samples from my Github page.
Many of us lived through the Word Processing Wars of the late 1980’s and early 1990’s. It was an overwhelming time of many options to choose from, each providing new features with each update, trying to become the leader in the word processing game. Early DOS versions had steep learning curves which built loyalty among those who committed to muscle memory all the key commands needed to produce the perfect document. With the many options to choose from for word processing came just as many file formats to save your work. Many titles used the same file extensions or encouraged users to choose their own, using their initials instead. Often the files created by these software titles used standard ASCII text but mixed in their own formatting codes, all of which tends to make identification in preservation difficult.
I recently acquired a large lot of older software. It has been fun sorting through it and learning about the different titles. One title stuck out, as I hadn’t heard of it before. I found an old article which included the software in a comparison of word processing software in 1993. The article compares the following executive word processing software.
LotusWrite 2.0
JustWrite 2.0
Professional Write Plus 1.0
CA-Textor 6.0
Ami Pro 3.0
Word for Windows 2.0a
WordPerfect 5.1
You are probably familiar with a few of these titles, but the one that stuck out to me was CA-Textor 6.0. In my lot of software I came across a two disk installer for CA-Textor 6.0 for Windows. It was developed by Computer Associates International, Inc., which opened its doors in 1976 and developed or acquired many software titles.
In the case of CA-Textor, it was purchased from a French company, Talor à Paris, who had been producing Textor, a popular word processor in France, for DOS since 1983. The original developer, Thierry Lorthiois, had high hopes for a French product to exist in a world of giant American companies. Even with over 70,000 copies sold, the release of Textor 4 in 1988 saw much market share lost to Microsoft Word. By 1989, Computer Associates had purchased Textor, rebranded Textor 5 for DOS, and added mouse compatibility. Then in late 1991 it released a Windows version of Textor and named it CA-Textor, in line with its other products. It would be the only version released by Computer Associates, and it disappeared into the void like many word processors of the time.
CA-Textor 6.0 for Windows appears to be a well designed word processor for its time. The reviews were mixed, but it scored decently in many comparisons. In the article mentioned above, however, it scored the lowest of all the word processors. The final result says:
CA-Textor fails to offer the usability shortcuts of the other programs, and scores well below the other programs in editing, formatting and graphics manipulation.
It was possibly reviews like this which caused Computer Associates to never update or release a new version of the software.
The first thing I noticed with the software was the way it handles files. The software defaults to a new “Library” method, in which each file is attached to a Library that holds a folder of files along with their full names and descriptions.
Single files can still be saved from CA-Textor by choosing DOS file, but the extension used is not clear.
Using .TXT for a formatted file seems like a bad recommendation. So let’s take a look at a few of the files generated by CA-Textor.
The good news is there is a pattern emerging, but not the same extension. I get the feeling they didn’t see much value in the extension for this software. When I save a file in the software as a DOS file, it doesn’t automatically pick an extension for me. I left the extension off and saved a file in the DOS format.
We see the same pattern at the head, but also a clear mention of DOS, just like the sample files included. Since I don’t have any earlier DOS versions to compare, I have to assume this is the same with at least Textor 5. I did find a mention of someone trying to convert their older Textor 5 documents to modern formats and they mention they are in the TAL format.
% sf OBSO0006.TAL
filename : 'OBSO0006.TAL'
filesize : 915
modified : 2025-10-05T13:26:56-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  :
    version :
    mime    :
    class   :
    basis   :
    warning : 'no match'
% python3 trid.py OBSO0006.TAL
TrID - File Identifier v2.41 - (C) 2003-2025 By M.Pontello
File: OBSO0006.TAL Unknown!
The Textor format is not known to PRONOM via Siegfried and also unknown to TrID, which now has a Python release! I did go ahead and add the signature to Wikidata, which can be used in Siegfried. If there is a need, we can submit to PRONOM as well.
% sf OBSO0006.TAL
---
siegfried   : 1.11.2
scandate    : 2025-10-05T15:24:44-06:00
signature   : default.sig
created     : 2025-03-01T15:28:08+11:00
identifiers :
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V120.xml; container-signature-20240715.xml'
  - name    : 'wikidata'
    details : 'wikidata-definitions-4.0.0 (2025-10-05, DROID_SignatureFile_V120.xml, container-signature-20240715.xml)'
---
filename : 'OBSO0006.TAL'
filesize : 915
modified : 2025-10-05T13:26:56-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  :
    version :
    mime    :
    class   :
    basis   :
    warning : 'no match'
  - ns        : 'wikidata'
    id        : 'Q136442756'
    format    : 'Textor document'
    URI       : 'http://www.wikidata.org/entity/Q136442756'
    permalink : 'https://www.wikidata.org/w/index.php?oldid=2413044878&title=Q136442756'
    mime      :
    basis     : 'extension match tal; byte match at 0, 13 (Wikidata reference is empty)'
    warning   :
There is also a software tool, meant for converting Word Processing formats to modern and Mac compatible formats, which was available until recently: WINCONV from MacDisk. This software will convert Textor 2/3/4/5/6 files to a text file or RTF. In the software it separates Textor 2/3 into their own group and 4, 5, and 6 into their own. Unfortunately it doesn’t confirm any extensions that might be used.
It took me a few minutes to figure out some of the controls. Aside from being in French, it was a little different than other Word Processing software.
After a bit of playing around in the software and trying many of the functions, I saved out a few files. At first, all the files were placed into a pair of files, called “TEXTOR.TEX” and “TEXTOR.LIG”. Creating a new document and saving would just update these two files. They seem to function in the same way the library function works in the Windows 6.0 version.
It seems the text portion of my document was saved in the LIG file and additional data, probably some description and user names into the TEX file. I then stumbled on a setup executable in the same directory that gave me some options.
THE TEXT DATABASE WILL BE CREATED ON THE DISK IN DRIVE (B)B
F1 – CREATING A TEXT DATABASE >1000 DOCUMENTS INACCESSIBLE BY MS-DOS
F2 – CREATING A TEXT DATABASE MANAGED BY MS-DOS (1 file per document)
Ok, so the software has two options. One for creating a database of text which we discovered above, and setting the software to create one file per document. When I selected F2, I was greeted with an error, which took me a minute to realize the first line required a disk to be in Drive B. Once I got it all configured I was able to save out a single file for a document.
Not much to go on: the file is just full of plain ASCII other than a simple byte at the beginning and some new line bytes at the end. The BAT extension is a little unexpected, as I usually see those as batch scripts in DOS. Let’s try a more complex text document. More text, a tab, centering a line...
That gave me more to work with, but it is a bit of a mess. These files seem to be more like those of some other early DOS word processing programs, which used ASCII but embedded their own formatting codes throughout that only their software understood. This is why it is difficult to identify older WordStar or WordPerfect files.
This was a fun format to explore, I did learn a little French, but also had to dig deep to find the little information I was able to mention here. I would love to find a copy of Textor 4 or 5, which I believe are different than versions 2 & 3 and different than the Windows 6 version I have. There is one edition available on eBay currently, but seems to be the first version. If someone has the means in France this would be good to preserve. Feel free to look at the samples I made.
Without divulging any youthful indiscretions, I recently was going back through some of my personal archives and came across a disc I burned around 2002 with some music stored on it. Normally I would find MP3 files, but in this case the file had an ACE extension. I remembered the format as an alternative to the common RAR or ZIP format often used to compress content for transporting (sharing) around the internet. I did what I normally do when something is compressed and reached for 7zip. But to my surprise, it threw an error.
% 7z l sample.ace
Scanning the drive for archives: 1 file, 12501419 bytes (12 MiB)
Listing archive: sample.ace
ERROR: sample.ace : Can not open the file as archive
7zip usually can handle most common archives, but a part of me remembered there were two versions of WinACE back in the day: Version 1, which was a free version, and Version 2, which was for paid users of WinACE. How do I know which version I have? It is a question I frequently find myself asking. The first step was to check the PRONOM registry.
% sf sample.ace
---
siegfried   : 1.11.2
scandate    : 2025-09-11T09:01:25-06:00
signature   : default.sig
created     : 2025-03-01T15:28:08+11:00
identifiers :
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V120.xml; container-signature-20240715.xml'
---
filename : 'sample.ace'
filesize : 12501419
modified : 2025-09-11T09:04:36-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'UNKNOWN'
    format  :
    version :
    mime    :
    class   :
    basis   :
    warning : 'no match'
Nope, this format is not known to PRONOM. Let’s try another tool.
% file sample.ace
sample.ace: ACE archive data version 20, from Win/32, version 20 to extract, solid
Ok, so the file tool knows it is a version 2 ACE file and requires version 2 to extract. Good info from a file identification tool. Now let’s see what we can find to extract this file on MacOS. The website Winace.com is long gone, as this compression tool lost popularity and the final release was over 14 years ago. Looking at the website in the WaybackMachine we can see some downloads available. One is UnACE for Mac OS X, which upon further review only works on the older PowerPC Macs. There is an open source version of unace for Linux, but it only supports version 1, the free version of the format.
Below is a screenshot of the DOS version of the ACE software. Created by Marcel Lemke.
It might be good to mention that WinRAR used to support the ACE format, but with WinACE support ending years ago and with some new vulnerabilities and folks using it for malware, support was dropped in 2019.
Luckily, I still have my PowerMac G5 lying around waiting for this very situation. After a quick install, unace was able to unarchive my music and I was able to listen to some of my favorite songs from 23 years ago. I still wanted to find a modern solution and later discovered there is a Python project which can read and extract both versions. Acefile is a pure Python, no-dependencies implementation of the UnACE format. I had a little issue installing it on an older Catalina laptop, but it worked well on later MacOS versions. Acefile has a few features that are helpful in not only extracting, but testing and dumping the headers of an ACE file. I did install WinACE in a Windows XP Virtual Machine to make a few samples, here is one of them.
The test feature works well to ensure the file is complete and can be extracted, but doesn’t give me much to go on for knowing the version. Let’s try dumping the header.
This is very helpful. We can see the output shows the magic bytes, but also the e(xtraction)version and c(reation)version. We can also find this information in the open source unace technical documentation.
size  field        description
2     HEAD_CRC     CRC16 over block up from HEAD_TYPE
2     HEAD_SIZE    size of the block from HEAD_TYPE up to the last byte of this block
1     HEAD_TYPE    archive header type is 0
2     HEAD_FLAGS   contains most important information about the archive
                   bit  description
                    0   0 (no ADDSIZE field)
                    1   presence of a main comment
                    9   SFX-archive
                   10   dictionary size limited to 256K (because of a junior SFX)
                   11   archive consists of multiple volumes
                   12   main header contains AV-string
                   13   recovery record present
                   14   archive is locked
                   15   archive is solid
7     ACESIGN      fixed string: '**ACE**' serves to find the archive header
1     VER_EXTRACT  version needed to extract archive
1     VER_CREATED  version used to create the archive
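As a rough sketch, the field layout quoted above could be read in Python like this. The function name `parse_ace_header` is my own, and I am assuming the multi-byte fields are little-endian, which the excerpt does not state. Only the flag bits listed above are decoded.

```python
import struct

ACE_SIGN = b"**ACE**"

def parse_ace_header(data):
    """Read the start of an ACE main header from raw bytes.

    Returns a dict of fields, or None if the ACESIGN string is not
    where the documented layout says it should be.
    """
    # HEAD_CRC (2) + HEAD_SIZE (2) + HEAD_TYPE (1) + HEAD_FLAGS (2) = 7 bytes,
    # followed by the 7-byte ACESIGN and two 1-byte version fields.
    if len(data) < 16 or data[7:14] != ACE_SIGN:
        return None
    # Assumption: little-endian byte order for the 2-byte fields.
    head_crc, head_size, head_type, head_flags = struct.unpack_from("<HHBH", data, 0)
    return {
        "head_crc": head_crc,
        "head_size": head_size,
        "head_type": head_type,
        "sfx": bool(head_flags & (1 << 9)),
        "multivolume": bool(head_flags & (1 << 11)),
        "locked": bool(head_flags & (1 << 14)),
        "solid": bool(head_flags & (1 << 15)),
        "ver_extract": data[14],  # raw byte value, e.g. 20
        "ver_created": data[15],  # raw byte value, e.g. 20
    }
```

Calling it on the first 16 bytes of a file is enough, for example `parse_ace_header(open("sample.ace", "rb").read(16))`.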
I think we have enough to go on to create a signature. We just need to see what the one-byte version numbers look like in an actual file.
As you can see above, we have our magic bytes **ACE** starting at offset 7 (the eighth byte of the file) and taking up seven bytes. Then come two bytes, both with the hex value 14. If we convert that hex value to decimal we get “20”. Let’s look at another:
Hmm, now we have two different values. “0A” converts to decimal “10” and “0C” converts to decimal “12”. So we can infer this ACE file was created in version 1.2 and requires at least version 1.0 to extract. Let’s try another:
Again we have “0A” which converts to decimal “10” and hex 14, which converts to decimal “20”. So made with version 2.0 of the software, but made compatible with version 1.0 for extraction. One more:
Both extraction and creation version are hex “0B” which converts to decimal “11”. I would have assumed any 1.0 release could extract anything created with later 1.x versions, but I guess that might not be true. I am not clear on all the versions released, so I am not sure how many versions I should include in a signature. I did look through some of the captured pages on the WayBackMachine and feel the last 1.x version was version 1.32.
When building these signatures, it should be easy to create two signatures based on their extraction version. But should the creation version be a factor? Version 1.0 could look like this:
2A2A4143452A2A(0A|0B|0C|0D)(0A|0B|0C|0D|14)
This accounts for versions 1.0 through 1.3 for the extract version and 1.0 through 2.0 for the creation version. Version 2.0 doesn’t seem to indicate minor versions, with all 2.0 versions using hex 14 (decimal 20). So a signature could be:
2A2A4143452A2A1414
Both would start from offset 7 from the beginning of the file. Is there a better solution?
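To try the two proposed signatures against sample files, here is a minimal Python sketch. It is my own translation of the patterns above, not how Siegfried or DROID evaluate them, and the function name and the "ACE 1.x" / "ACE 2.0" labels are made up for illustration.

```python
MAGIC = bytes.fromhex("2A2A4143452A2A")   # '**ACE**'
SIG_OFFSET = 7                            # both signatures start at offset 7

V1_EXTRACT = {0x0A, 0x0B, 0x0C, 0x0D}     # extract version 1.0 through 1.3
V1_CREATED = V1_EXTRACT | {0x14}          # created by 1.0 through 1.3, or 2.0

def classify_ace(data):
    """Apply the two proposed signatures to the start of a file."""
    window = data[SIG_OFFSET:SIG_OFFSET + len(MAGIC) + 2]
    if len(window) < len(MAGIC) + 2 or not window.startswith(MAGIC):
        return "unknown"
    ver_extract, ver_created = window[-2], window[-1]
    # Signature 2: 2A2A4143452A2A1414
    if ver_extract == 0x14 and ver_created == 0x14:
        return "ACE 2.0"
    # Signature 1: 2A2A4143452A2A(0A|0B|0C|0D)(0A|0B|0C|0D|14)
    if ver_extract in V1_EXTRACT and ver_created in V1_CREATED:
        return "ACE 1.x"
    return "unknown"
```

Keying on the extraction version means a file made with 2.0 but saved for 1.0 compatibility lands in the 1.x group, which matches what a user needs to know to open it.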
I will warn you, there are a couple of ACE formats out there which you may come across. One being an image/texture format for Microsoft Train Simulator. That might be for another day. There is another use of the ACE archive which is worth discussing. The Comic Book Archive file with the extension CBA will use the ACE archive for storing a series of images used in some Comic Book Readers. They are indeed ACE archive files, only having the different extension and a specific purpose. Maybe adding the CBA extension to the signature would be sufficient?
I am sure there are some other properties, seen above, of the ACE format we could discuss, encryption, the differences between Solid and SFX, and dictionary headers, but I think for now, identification of the format and the main version difference is sufficient. For now, check out my Github page for my signature proposal and a few samples I made.
PagePerfect: the Promise of Desktop Publishing Realized
Now, PagePerfect has arrived. And suddenly PC desktop publishing is a lot simpler and less expensive, because PagePerfect integrates desktop publishing, word processing, and graphics editing all in one package.
The 1980’s was a time of growth in personal computing, and one industry in particular was progressing rapidly. Previously, in order to print more than just words, you had to use a complex arrangement of type, masking, and screening, all done by hand. Now, with a personal computer, you could design and print well designed layouts. Many software applications came on the scene in these early days. My personal favorite was QuarkXPress; I used the software in the early 1990’s and spent the next few years working in a commercial printshop using it. What once took a team of skilled workers to set copy, mask, blueline, etc. took only one person with the right software.
I recently came across a set of floppy disks for some software called PagePerfect, by a well known software company IMSI.
This article in a 1988 PC Magazine announces this new revolutionary software. This was early on in the days of computer desktop publishing and even on a DOS system the software was powerful. It didn’t always get the best reviews in terms of ease of use, but it was well built. The company behind this powerful software wasn’t IMSI as you might expect. It was programmed by a different company, Beyond Words, started by three former employees of MicroPro, the makers of WordStar. Beyond Words liked to “leave sales to others” which included IMSI and a big contract with Canon called their Desktop Publishing System.
IMSI was able to market the software well, and it was well priced. The name PagePerfect didn’t last long, and soon after, in 1989, they renamed the software IMSI Publisher. I’m not 100% sure, but it might have to do with WordPerfect asserting some copyright to the name around that same time. By 1990, the software was not seen much anymore, but another name pops up, Beyond Words Composer 2.0.
All three versions of the software have a very similar interface.
But the one thing they all have in common is their file formats. Unfortunately they used the same extensions many word processing software used during this time and after. .DOC and also .STY which was used frequently by Microsoft Word as well. It makes sense, a Document is shortened to DOC and a Stylesheet is shortened to STY. So if you have any DOC files which don’t open in Word, you might look here. The other problem is the file format used is not plain text and is in a binary proprietary format.
The one positive is the very obvious strings of text in the header. [BWDB] and [BWDOC], which one could infer as Beyond Words DB and Beyond Words Document. A later Beyond Words Composer document has the same header but a higher version number.
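Since I have not pinned down exact offsets for these strings, a quick way to screen a pile of mystery DOC and STY files is simply to look for them near the start of the file. This Python sketch assumes the markers appear within the first 512 bytes, which is a guess on my part, and the function name is my own.

```python
BW_MARKERS = (b"[BWDB]", b"[BWDOC]")

def find_bw_markers(data, window=512):
    """Return {marker: offset} for each Beyond Words marker found
    in the first `window` bytes (the window size is an assumption)."""
    head = data[:window]
    return {m.decode("ascii"): head.find(m) for m in BW_MARKERS if m in head}
```

An empty result on a .DOC file suggests looking elsewhere, for example at Microsoft Word or another word processor that used the same extension.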
I haven’t been able to find any specific bytes which differentiate the Stylesheets from the Documents. They may well be the same format, so for now we will consider them the same. These stylesheets seem to function as templates, which are often the same format as the documents themselves.
Apart from the document layout, the software can also create and use databases, which appear to be a similar format but with different offsets.
Prior to me diving into this format, the only tool which had some information on this format was TrID, which identified all the DOC and STY files as Beyond Words Composer style, which is mostly true. Hopefully with this background you can be aware of the different software names this format was used with and with some luck convert the files to something less proprietary.
Some disks that came with my PagePerfect install disks do have some personal documents created with the software, but I wonder how much this software really was used in the late 1980’s and early 1990’s, because after that point, you don’t hear about the software anymore. There are some references to the software getting absorbed into another product, IBM DisplayWrite 5/2. I would be curious if others have come across this file format.
Most of what you will find on this blog is file format identification. I see this as the first step in a longer process of preservation and ultimately access. Hopefully the analysis of some file formats can help make better decisions when needing to render the file in an emulator or migrate to another format. I don’t spend much time trying to parse the files I look at to understand the actual content, just enough to properly identify and differentiate between important versions of the format.
One area I sometimes touch on, but often skim over, is encryption. Many file formats are binary, meaning they use a sequence of bytes to encode data, which is more efficient than human readable text and is often compressed. The byte layout used to store the data is designed by the developer of the software; they can encode the data however they choose, so the result is often proprietary and unreadable by anyone else. A file can also be further encrypted with a password to limit use, even with the right software.
I recently had one of the numerous fans of this blog reach out and ask about the post I made on the software Student Writing Center. They had a bunch of journal files from their youth and couldn’t find a way to read these older files. I offered my help as I still have the software and a nice emulator to run the old software.
As I was going through and converting the journal entries into a PDF, I came across a few which asked for a password to open. You can see below the explanation from the help menu confirms the file format is a proprietary format only readable by their software and the password feature is to further protect the content.
Finding a few of the journal documents password protected was frustrating at first. I was converting some documents that are over 26 years old, and I doubted the password would be remembered. When I asked, they gave me a couple passwords to try, but nothing worked. But I don’t give up that easily!
My first thought was to take all the text from the other journal entries and make a dictionary and then use it to try and brute-force the password. There are some great tools to do this like hashcat. With tools like this, you need to retrieve a hash of the password. This is an encrypted sequence of the password stored in the file. So the first step was to find where the password was stored in the file. Since I have the software and can make new password protected files using a password of my choice this proved a simple task. Create two identical files, add a different password to each, then compare the two files in a hex editor to find the difference.
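I did the comparison by eye in a hex editor, but the same idea can be scripted. This is a minimal Python sketch; the function name and the file names in the usage note are hypothetical.

```python
def differing_runs(a, b):
    """Return a list of (start, end) offset ranges where two byte strings
    differ, comparing only up to the length of the shorter one."""
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i          # a new run of differences begins
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, min(len(a), len(b))))
    return runs
```

With two test files saved from the software, something like `differing_runs(open("test_a.jou", "rb").read(), open("test_b.jou", "rb").read())` would list the ranges worth inspecting. A single short run is a strong hint that you have found the password field.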
There it is. The password field in the software only let me put in 10 characters and these 10 bytes lit up when I ran a difference between the two files. I went to check the files given to me which also had password protection and found they also had a similar pattern. In fact I noticed from a few checks that the passwords I used also had a pattern in the file.
For this file I used the number “1” ten times. In that same location it repeated the same byte value “85”, 10 times. After a couple more tests I could see this wasn’t an algorithm I needed to crack, but a simple replacement. I created a few more files using all the letters in the alphabet and all the numbers and came up with a substitution cypher.
Obviously the passwords used in the documents I was trying to open didn’t all use the full 10 characters, but the password was always preceded by the values “00” and had the values “1A46461A” after the password. The byte prior to the “00” indicates the length of the password. From there I just needed to decode the bytes between those two offsets.
So for this file, the 8 byte sequence “90D54F4FA3FBBA94” decodes to: password. How cool is that? To make things even easier, the passwords used in Student Writing Center are not case sensitive. There are additional values for symbols. You can see the entire substitution list here.
One other thing related to identification: would it be important to identify a password protected file differently than a regular file? At offset 0xDA there seems to be an indicator that the file is password protected: “00” if not, “01” if protected.
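Putting those observations together, here is a rough Python sketch of the decoding. The substitution table only contains the byte values mentioned in this post (“1” and the letters in “password”), so anything else prints as “?”. The function names are my own, and searching for the “1A46461A” trailer anywhere in the file is an assumption that may give false hits on some files.

```python
# Partial table: only the values shown in this post. The full list is longer.
SUBSTITUTION = {
    0x85: "1",
    0x90: "p", 0xD5: "a", 0x4F: "s", 0xA3: "w",
    0xFB: "o", 0xBA: "r", 0x94: "d",
}
TRAILER = bytes.fromhex("1A46461A")
MAX_PASSWORD_LEN = 10  # the password field only accepts 10 characters

def extract_password_bytes(data):
    """Find <length> 00 <password bytes> 1A46461A and return the password bytes."""
    end = data.find(TRAILER)
    while end != -1:
        for n in range(1, MAX_PASSWORD_LEN + 1):
            start = end - n
            # The byte before the password is 00, and the one before that is the length.
            if start >= 2 and data[start - 1] == 0x00 and data[start - 2] == n:
                return data[start:end]
        end = data.find(TRAILER, end + 1)
    return None

def decode_password(raw):
    """Map each stored byte back to its character, '?' if not in the table."""
    return "".join(SUBSTITUTION.get(b, "?") for b in raw)
```

Because the passwords are not case sensitive, the table only needs one entry per letter.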
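Checking that flag is simple enough to script. This Python sketch only reflects what I have seen in my own samples, and the function name is hypothetical. Any value other than 00 or 01 is reported as undetermined instead of guessed at.

```python
PASSWORD_FLAG_OFFSET = 0xDA  # observed in my samples, not from any documentation

def is_password_protected(data):
    """Return True/False based on the byte at 0xDA, or None if the file is
    too short or the byte is something other than 00 or 01."""
    if len(data) <= PASSWORD_FLAG_OFFSET:
        return None
    flag = data[PASSWORD_FLAG_OFFSET]
    if flag == 0x00:
        return False
    if flag == 0x01:
        return True
    return None
```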
What do you think? Should this property be identified as a separate file format from a regular file or is this property something that should be gathered using additional tools that can gather additional properties from a file like this?
Speaking of additional tools, there is a pretty cool project called the Import library for legacy Mac documents, or libmwaw, which claims to have support for Student Writing Center documents and a lot more. It indeed does, but not the journal format, only the main letter format. I bet it wouldn’t take much to add the journal format to the library, something I will look into.