Page Perfect

PagePerfect: the Promise of Desktop Publishing Realized

Now, PagePerfect has arrived. And suddenly PC desktop publishing is a lot
simpler and less expensive, because PagePerfect integrates desktop
publishing, word processing, and graphics editing all in one package.

The 1980s were a time of growth in personal computing, and one industry in particular, publishing, was progressing rapidly. Previously, printing anything more than plain words required a complex arrangement of type, masking, and screening, all done by hand. With a personal computer you could now design and print well-designed layouts. Many software applications came on the scene in these early days. My personal favorite was QuarkXPress; I used it in the early 1990s and spent the next few years working in a commercial print shop using the software. What once took a team of skilled workers to set copy, mask, blueline, and so on now took one person with the right software.

I recently came across a set of floppy disks for some software called PagePerfect, from the well-known software company IMSI.

This article in a 1988 issue of PC Magazine announced the new, revolutionary software. This was early in the days of desktop publishing, and even on a DOS system the software was powerful. It didn’t always get the best reviews for ease of use, but it was well built. The company behind it wasn’t IMSI, as you might expect. It was programmed by a different company, Beyond Words, which was started by three former employees of MicroPro, the makers of WordStar. Beyond Words liked to “leave sales to others,” which included IMSI and a big contract with Canon, who sold it as their Desktop Publishing System.

IMSI marketed the software well, and it was well priced. The PagePerfect name didn’t last long, and in 1989 the software was renamed IMSI Publisher. I’m not 100% sure why, but it may have had to do with WordPerfect asserting trademark rights to the name around the same time. By 1990 the software was rarely seen anymore, but another name pops up: Beyond Words Composer 2.0.

All three versions of the software have a very similar interface.

More importantly, all three share the same file formats. Unfortunately, those formats used the same extensions that many word processors used during this time and after: .DOC, and also .STY, which Microsoft Word used frequently as well. It makes sense: Document is shortened to DOC and Stylesheet to STY. So if you have any DOC files that don’t open in Word, this format is worth checking. The other problem is that the format is not plain text; it is a proprietary binary format.

hexdump -C TEST.DOC | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 3c af 13 5b 1e 00 00 00 95 63 |......<..[.....c|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 68 01 00 00 0a 00 de 01 00 00 00 00 00 00 00 00 |h...............|
00000040 de 01 00 00 8b 60 00 00 1e 00 69 62 00 00 2c 01 |.....`....ib..,.|
00000050 00 00 1e 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..............[B|
00000060 57 44 4f 43 5d 00 00 00 00 32 2e 30 39 00 00 00 |WDOC]....2.09...|
00000070 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6c 00 00 00 00 00 00 00 00 00 00 00 |....l...........|

The one positive is the very obvious text strings in the header, [BWDB] and [BWDOC], which one could infer stand for Beyond Words DB and Beyond Words Document. A later Beyond Words Composer document has the same header but a higher version number after the [BWDOC] tag (3.01 instead of 2.09).
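Because the markers are plain ASCII, a few lines of Python are enough to pull them out. The sketch below is my own and does not come from any existing tool. The function names are made up for illustration. The layout it assumes is inferred only from the hexdumps in this post: a tag, then some NUL padding, then a NUL-terminated version string. It searches for [BWDOC] rather than reading a fixed offset. In these documents the tag sits at 0x5E, but it may move in other file types.

```python
import sys

# Marker strings seen in the hexdumps of TEST.DOC, WELCOME.DOC and SAMPLE.STY.
OUTER_TAG = b"[BWDB]"
INNER_TAG = b"[BWDOC]"


def read_tag_version(data: bytes, tag: bytes, start: int = 0):
    """Find `tag` at or after `start`, skip the NUL padding after it, and
    return (tag_offset, version_string), or None if the tag is absent."""
    pos = data.find(tag, start)
    if pos < 0:
        return None
    i = pos + len(tag)
    while i < len(data) and data[i] == 0:  # skip NUL padding
        i += 1
    end = data.find(b"\x00", i)            # version is NUL-terminated
    if end < 0:
        end = len(data)
    return pos, data[i:end].decode("ascii", errors="replace")


def parse_bw_header(data: bytes):
    """Describe a Beyond Words header, or return None if the data does not
    start with [BWDB]. Layout assumptions come only from sample hexdumps."""
    if not data.startswith(OUTER_TAG):
        return None
    _, db_version = read_tag_version(data, OUTER_TAG)
    inner = read_tag_version(data, INNER_TAG, len(OUTER_TAG))
    return {
        "db_version": db_version,
        "doc_offset": inner[0] if inner else None,
        "doc_version": inner[1] if inner else None,
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, parse_bw_header(f.read(512)))
```

You can run it as `python bw_header.py TEST.DOC WELCOME.DOC`, where the script name is arbitrary. Going by the dumps, TEST.DOC should report db_version 1.00 and doc_version 2.09, and WELCOME.DOC should report doc_version 3.01.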

hexdump -C WELCOME.DOC | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 aa 14 56 16 29 00 00 00 30 84 |........V.)...0.|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 b0 01 00 00 0c 00 26 02 00 00 00 00 00 00 00 00 |......&.........|
00000040 26 02 00 00 70 80 00 00 29 00 96 82 00 00 9a 01 |&...p...).......|
00000050 00 00 29 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..)...........[B|
00000060 57 44 4f 43 5d 00 00 00 00 33 2e 30 31 00 00 00 |WDOC]....3.01...|
00000070 00 00 00 00 0c 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6e 00 00 00 00 00 00 00 00 00 00 00 |....n...........|

If we look at the stylesheets, we see the same patterns.

hexdump -C SAMPLE.STY | head   
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 51 10 76 10 09 00 00 00 da 2c |......Q.v......,|
00000020 00 00 5e 00 00 00 18 00 00 00 01 00 76 00 00 00 |..^.........v...|
00000030 68 01 00 00 0a 00 de 01 00 00 00 00 00 00 00 00 |h...............|
00000040 de 01 00 00 a2 2a 00 00 09 00 80 2c 00 00 5a 00 |.....*.....,..Z.|
00000050 00 00 09 00 00 00 00 00 00 00 00 00 00 00 5b 42 |..............[B|
00000060 57 44 4f 43 5d 00 00 00 00 32 2e 30 39 00 00 00 |WDOC]....2.09...|
00000070 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 6c 00 00 00 00 00 00 00 00 00 00 00 |....l...........|

I haven’t been able to find any specific bytes that differentiate the stylesheets from the documents. They may well be the same format, and for now we will treat them as such. The stylesheets seem to function as templates, and templates are often stored in the same format as the documents they produce.
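One way to hunt for a distinguishing byte is to diff the header of a known document against that of a known stylesheet. The hypothetical helper below lists the offsets where two byte strings differ. It is a brute-force comparison, not a format parser.

```python
import sys


def differing_offsets(a: bytes, b: bytes, length: int = 0xA0):
    """Return the offsets within the first `length` bytes where a and b differ.
    An offset present in only one input counts as a difference."""
    diffs = []
    for off in range(length):
        x = a[off] if off < len(a) else None
        y = b[off] if off < len(b) else None
        if x != y:
            diffs.append(off)
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        print([hex(o) for o in differing_offsets(fa.read(0xA0), fb.read(0xA0))])
```

Fed the TEST.DOC and SAMPLE.STY bytes from the dumps above, it reports differences only at 0x16 to 0x1A, 0x1E, 0x1F, a few offsets between 0x44 and 0x4F, and 0x52. The tag and version fields are identical. With only two samples, this cannot rule out a type marker elsewhere in the file.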

Apart from document layout, the software can also create and use databases, which appear to use a similar format but with different offsets (the [BWDOC] tag starts at 0x36 here rather than 0x5E).

hexdump -C DOCUMENT.TBL | head
00000000 5b 42 57 44 42 5d 00 00 00 00 00 31 2e 30 30 00 |[BWDB].....1.00.|
00000010 00 00 00 00 00 00 6b 10 36 00 00 00 18 00 00 00 |......k.6.......|
00000020 01 00 4e 00 00 00 68 01 00 00 0a 00 b6 01 00 00 |..N...h.........|
00000030 00 00 00 00 00 00 5b 42 57 44 4f 43 5d 00 00 00 |......[BWDOC]...|
00000040 00 32 2e 30 39 00 00 00 00 00 00 00 0a 00 00 00 |.2.09...........|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 6c 00 00 00 |............l...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

Before I dug into this format, the only tool with any information on it was TrID, which identified all the DOC and STY files as Beyond Words Composer style. That is mostly true. Hopefully, with this background, you will know the different software names this format was used under and, with some luck, be able to convert the files to something less proprietary.
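Because the extensions collide with Microsoft Word, sorting a pile of .DOC and .STY files by their first bytes is more reliable than trusting the extension. This is a minimal sketch with hypothetical names. It assumes the [BWDB] tag always sits at offset 0, which holds for every file shown here. The OLE2 check is my own addition for contrast. Word 97 to 2003 .doc files are OLE2 compound files beginning with D0 CF 11 E0 A1 B1 1A E1. Older Word versions will simply land in "unknown".

```python
import os
import sys

BW_MAGIC = b"[BWDB]"                                 # seen at offset 0 in every sample
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")       # OLE2 compound file header
EXTENSIONS = {".DOC", ".STY", ".TBL"}


def classify(head: bytes) -> str:
    """Classify a file by its leading bytes rather than its extension."""
    if head.startswith(BW_MAGIC):
        return "Beyond Words (PagePerfect / IMSI Publisher / Composer)"
    if head.startswith(OLE2_MAGIC):
        return "OLE2 compound file (e.g. Word 97-2003)"
    return "unknown"


def scan(root: str):
    """Yield (path, classification) for each .DOC/.STY/.TBL file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].upper() in EXTENSIONS:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    yield path, classify(f.read(8))


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, kind in scan(sys.argv[1]):
        print(f"{path}: {kind}")
```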

Some of the disks that came with my PagePerfect install disks do contain personal documents created with the software, but I wonder how much it was really used in the late 1980s and early 1990s, because after that point you don’t hear about it anymore. There are some references to the software being absorbed into another product, IBM DisplayWrite 5/2. I would be curious whether others have come across this file format.

Scrivener

Word processors are everywhere and have some of the most recognizable file formats. Some formats are very simple and contain only plain text; others are more complex. Some allow for images, and others can handle different languages and writing directions.

A writing platform I recently learned about is called Scrivener. It was first released in 2007 by a company called Literature & Latte Ltd, and it has both Macintosh and Windows versions. The software is marketed toward writers, with features that help with note-taking, research, and much more. It also allows adding multimedia and even full webpages.

This is accomplished with a file format that takes a non-traditional approach to storing all the data needed to render a project.

tree Scrivener3-s01.scriv
Scrivener3-s01.scriv
├── Files
│   ├── Data
│   │   ├── 921B4A08-54C0-4B69-94FD-428F56FDAB89
│   │   │   └── content.rtf
│   │   └── docs.checksum
│   ├── binder.autosave
│   ├── binder.backup
│   ├── search.indexes
│   ├── styles.xml
│   ├── version.txt
│   └── writing.history
├── Scrivener3-s01.scrivx
└── Settings
    ├── recents.txt
    ├── ui-common.xml
    └── ui.ini

Scrivener uses a folder structure to store all of a project’s data, and the folder itself has the extension .scriv. Inside are rich text files, backups, search indexes, version history, and more. One format unique to the folder is an XML file with the extension .scrivx. This makes the format proprietary: although individual pieces such as content.rtf can be opened elsewhere, the project as a whole can only be rendered by the Scrivener software.
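A .scriv project is just a folder, so identifying one outside macOS comes down to looking at its layout. This sketch checks only what appears in the tree listing above for a Scrivener 3 project: a .scrivx file at the top level and content.rtf files under Files/Data. The function name is hypothetical. Older projects, like the Scrivener 1 example later in this post that keeps a binder.scrivproj file, would need a different check.

```python
from pathlib import Path


def describe_scriv_package(path):
    """Summarize a Scrivener 3 style .scriv folder, or return None if `path`
    is not a directory with a .scriv extension. Layout assumptions are taken
    from a single sample project."""
    p = Path(path)
    if not (p.is_dir() and p.suffix.lower() == ".scriv"):
        return None
    return {
        # the XML project file that sits next to Files/ and Settings/
        "scrivx": sorted(f.name for f in p.glob("*.scrivx")),
        # one folder per binder item, each holding its own content.rtf
        "rtf_documents": sorted(
            f.relative_to(p).as_posix() for f in p.glob("Files/Data/*/content.rtf")
        ),
        "has_version_txt": (p / "Files" / "version.txt").is_file(),
    }
```

For the sample project, this should list Scrivener3-s01.scrivx and one content.rtf under the 921B4A08-... folder.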

cat Scrivener3-s01.scrivx | head
<?xml version="1.0" encoding="UTF-8"?>
<ScrivenerProject Template="No" Version="2.0" Identifier="DF5DA7F0-27DB-4815-A050-B4D6F23CABA7" Creator="SCRWIN-3.1.5.1" Device="DESKTOP-JMM4K7M" Modified="2025-03-14 22:15:28 -0600" ModID="B4A944C3-FF79-49F6-A737-158BEB4E58BB">
<Binder>
<BinderItem UUID="17807D28-117A-409E-B12D-B34922B6CC6F" Type="DraftFolder" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:17 -0600">
<Title>Draft</Title>
<MetaData>
<IncludeInCompile>Yes</IncludeInCompile>
</MetaData>
<Children>
<BinderItem UUID="921B4A08-54C0-4B69-94FD-428F56FDAB89" Type="Text" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:23 -0600">

The XML contains enough to tell these files apart from other XML files: the root element is ScrivenerProject, with Version and Creator attributes. A signature would be straightforward. Earlier versions of Scrivener sometimes have the SCRIVX file, and sometimes a file with a .scrivproj extension. On the Macintosh this file is in binary plist format, which differs from the earlier Windows versions. It seems the formats may have been unified under version 2 or 3: versions 1 and 2 for Windows use project version 1, and version 3 uses project version 2.
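A SCRIVX signature can key on the root element. This minimal sketch has hypothetical function names. It uses ElementTree's iterparse and stops at the first start tag, so it also works on a truncated file such as the head output above. It assumes the root is an un-namespaced ScrivenerProject element, as in the Scrivener 3 sample. Other versions may differ.

```python
import io
import xml.etree.ElementTree as ET


def scrivx_info(stream):
    """Read only as far as the root element of an XML stream. Return the
    root's identifying attributes if it is <ScrivenerProject>, else None."""
    for _event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag != "ScrivenerProject":
            return None
        # attribute names as seen in the Scrivener 3 sample
        return {name: elem.get(name) for name in ("Version", "Creator", "Identifier", "Modified")}
    return None


def scrivx_info_from_bytes(data: bytes):
    return scrivx_info(io.BytesIO(data))
```

For the sample file, this should report Version 2.0 and Creator SCRWIN-3.1.5.1.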

hexdump -C Scrivener1-s01.scriv/binder.scrivproj | head
00000000 62 70 6c 69 73 74 30 30 d4 00 01 00 02 00 03 00 |bplist00........|
00000010 04 00 05 00 1d 01 d8 01 d9 54 24 74 6f 70 58 24 |.........T$topX$|
00000020 6f 62 6a 65 63 74 73 58 24 76 65 72 73 69 6f 6e |objectsX$version|
00000030 59 24 61 72 63 68 69 76 65 72 dc 00 06 00 07 00 |Y$archiver......|
00000040 08 00 09 00 0a 00 0b 00 0c 00 0d 00 0e 00 0f 00 |................|
00000050 10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00 |................|
00000060 18 00 19 00 1a 00 15 00 1b 00 1c 5a 4c 61 62 65 |...........ZLabe|
00000070 6c 54 69 74 6c 65 59 4c 61 62 65 6c 4c 69 73 74 |lTitleYLabelList|
00000080 5e 42 69 6e 64 65 72 43 6f 6e 74 65 6e 74 73 5f |^BinderContents_|
00000090 10 0f 44 65 66 61 75 6c 74 4c 61 62 65 6c 54 61 |..DefaultLabelTa|
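The keys visible in the dump ($top, $objects, $version, $archiver) follow the layout Apple's NSKeyedArchiver produces. Python's standard plistlib (3.8 or later) can read binary plists, including the UID references these archives use, so a first look takes only a few lines. This sketch has a hypothetical function name. It only confirms the bplist00 magic and reports the archiver and the top-level entry names. Rebuilding Scrivener's binder from the $objects table would mean following those UID references, which is not attempted here.

```python
import plistlib

BPLIST_MAGIC = b"bplist00"


def peek_scrivproj(data: bytes):
    """Return ($archiver value, sorted $top keys) for a binary plist archive,
    or None if the data does not start with the bplist00 magic."""
    if not data.startswith(BPLIST_MAGIC):
        return None
    plist = plistlib.loads(data)  # plistlib detects the binary format itself
    if not isinstance(plist, dict):
        return None, []
    return plist.get("$archiver"), sorted(plist.get("$top", {}))
```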

Because the developers of Scrivener made the SCRIV format simply a folder with different content inside, something special happens on macOS. Scrivener registers the extensions it uses with macOS Launch Services, and this changes the way a SCRIV folder is displayed in the Finder: it now appears as a single file and is given a file type. This is called a document package.

By right-clicking the “file” and choosing Show Package Contents, you can browse what is inside. Nothing in the folder itself or hidden in its attributes causes this to happen; it is controlled entirely by which extensions have been registered in the Launch Services database. We can, however, ask macOS for extended metadata about the package, as long as it is on an Apple file system such as HFS or APFS.

mdls Scrivener3-s01.scriv 
_kMDItemDisplayNameWithExtensions = "Scrivener3-s01.scriv"
kMDItemContentCreationDate = 2025-03-15 04:15:17 +0000
kMDItemContentCreationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentModificationDate = 2025-03-15 04:15:18 +0000
kMDItemContentModificationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentType = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree = (
"com.literatureandlatte.scrivener3.scriv",
"public.directory",
"public.item",
"com.apple.package",
"public.content",
"public.composite-content"
)
kMDItemDateAdded = 2025-03-21 04:38:48 +0000
kMDItemDateAdded_Ranking = 2025-03-21 00:00:00 +0000
kMDItemDisplayName = "Scrivener3-s01.scriv"
kMDItemDocumentIdentifier = 0
kMDItemFSContentChangeDate = 2025-03-15 04:15:18 +0000
kMDItemFSCreationDate = 2025-03-15 04:15:17 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = (null)
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = (null)
kMDItemFSLabel = 0
kMDItemFSName = "Scrivener3-s01.scriv"
kMDItemFSNodeCount = 3
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 31155
kMDItemFSTypeCode = ""
kMDItemInterestingDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemKind = "Scrivener Project"
kMDItemLogicalSize = 31155
kMDItemPhysicalSize = 69632

The mdls command exposes a lot of additional detail, including “com.apple.package” in the content type tree. The tool works with any file on macOS and can be very useful for gathering the information you need for preservation.
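Because mdls prints plain text, its output can be captured into a preservation record. This is a rough parser I sketched against the output above, not a documented format, and the function names are hypothetical. It assumes one `key = value` per line, with arrays spread across lines between `(` and `)`. It keeps dates and numbers as raw strings.

```python
import re

_KEY_RE = re.compile(r"^(\w+)\s*=\s*(.*)$")


def _unquote(value: str):
    """Strip surrounding double quotes; map mdls's (null) to None."""
    if value == "(null)":
        return None
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_mdls(text: str) -> dict:
    """Parse mdls text output into a dict. Multi-line ( ... ) arrays become
    lists; all other values stay strings (or None for (null))."""
    result, key, items = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if items is not None:                 # inside a ( ... ) array
            if line == ")":
                result[key], items = items, None
            elif line:
                items.append(_unquote(line.rstrip(",")))
            continue
        m = _KEY_RE.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        if value == "(":
            items = []                        # array starts on the next line
        else:
            result[key] = _unquote(value)
    return result
```

On a Mac, the input text can come from `subprocess.run(["mdls", path], capture_output=True, text=True).stdout`.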

Until the tools we use for format identification can recognize package formats, tools like mdls may be needed to gather the necessary metadata for preservation. In the meantime, identifying the content inside the package is the best we can hope for, and creating a signature for the XML-based SCRIVX format is the first step.

Stay tuned for more on package formats, as I will be bringing the topic up more in the digital preservation community.