Scrivener

Word processors are everywhere and have some of the most recognizable file formats. Some are very simple in that they just contain plain text, while others are more complex. Some formats allow for images, and others can handle different languages and writing directions.

A writing platform I recently learned about is called Scrivener. It was first released in 2007 by a company called Literature & Latte Ltd, and has Macintosh and Windows versions. The software is marketed toward writers, as it has features that help with note taking, research and much more. It also allows for adding multimedia and even full webpages.

This is made possible by a file format that uses a non-traditional method for storing all the data needed to render a project.

tree Scrivener3-s01.scriv
Scrivener3-s01.scriv
├── Files
│   ├── Data
│   │   ├── 921B4A08-54C0-4B69-94FD-428F56FDAB89
│   │   │   └── content.rtf
│   │   └── docs.checksum
│   ├── binder.autosave
│   ├── binder.backup
│   ├── search.indexes
│   ├── styles.xml
│   ├── version.txt
│   └── writing.history
├── Scrivener3-s01.scrivx
└── Settings
    ├── recents.txt
    ├── ui-common.xml
    └── ui.ini

Scrivener uses a folder structure to store all the data used in the format. The folder has the extension .scriv. It includes rich text, backups, indexes, version history and more. One unique format within the folder is an XML file with the extension .scrivx. This file is proprietary to Scrivener, and a project can only be rendered using the Scrivener software.

cat Scrivener3-s01.scrivx | head
<?xml version="1.0" encoding="UTF-8"?>
<ScrivenerProject Template="No" Version="2.0" Identifier="DF5DA7F0-27DB-4815-A050-B4D6F23CABA7" Creator="SCRWIN-3.1.5.1" Device="DESKTOP-JMM4K7M" Modified="2025-03-14 22:15:28 -0600" ModID="B4A944C3-FF79-49F6-A737-158BEB4E58BB">
<Binder>
<BinderItem UUID="17807D28-117A-409E-B12D-B34922B6CC6F" Type="DraftFolder" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:17 -0600">
<Title>Draft</Title>
<MetaData>
<IncludeInCompile>Yes</IncludeInCompile>
</MetaData>
<Children>
<BinderItem UUID="921B4A08-54C0-4B69-94FD-428F56FDAB89" Type="Text" Created="2025-03-14 22:15:17 -0600" Modified="2025-03-14 22:15:23 -0600">

The XML has enough structure to tell these files apart from other XML files, so a signature would be straightforward. Earlier versions of Scrivener sometimes have the SCRIVX file but sometimes have a file with a .scrivproj extension instead. On a Macintosh this file is in the binary plist format, which is different from earlier Windows versions. It seems the two platforms may have been unified under version 2 or 3: versions 1 and 2 for Windows use project version 1, and version 3 uses project version 2.

hexdump -C Scrivener1-s01.scriv/binder.scrivproj | head
00000000 62 70 6c 69 73 74 30 30 d4 00 01 00 02 00 03 00 |bplist00........|
00000010 04 00 05 00 1d 01 d8 01 d9 54 24 74 6f 70 58 24 |.........T$topX$|
00000020 6f 62 6a 65 63 74 73 58 24 76 65 72 73 69 6f 6e |objectsX$version|
00000030 59 24 61 72 63 68 69 76 65 72 dc 00 06 00 07 00 |Y$archiver......|
00000040 08 00 09 00 0a 00 0b 00 0c 00 0d 00 0e 00 0f 00 |................|
00000050 10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00 |................|
00000060 18 00 19 00 1a 00 15 00 1b 00 1c 5a 4c 61 62 65 |...........ZLabe|
00000070 6c 54 69 74 6c 65 59 4c 61 62 65 6c 4c 69 73 74 |lTitleYLabelList|
00000080 5e 42 69 6e 64 65 72 43 6f 6e 74 65 6e 74 73 5f |^BinderContents_|
00000090 10 0f 44 65 66 61 75 6c 74 4c 61 62 65 6c 54 61 |..DefaultLabelTa|
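
As a rough illustration of how the two project file flavors could be told apart, here is a minimal Python sketch. The function name sniff_scrivener_project_file is made up for this example, and the checks are based only on the samples shown above (the "bplist" magic bytes and the ScrivenerProject root element). It is not a formal signature.

```python
def sniff_scrivener_project_file(header: bytes) -> str:
    """Classify the first bytes of a Scrivener project file.

    Returns "binary-plist", "scrivx-xml", or "unknown".
    """
    # Binary property lists start with the ASCII magic "bplist".
    if header.startswith(b"bplist"):
        return "binary-plist"

    # Drop an optional UTF-8 byte order mark and leading whitespace.
    if header.startswith(b"\xef\xbb\xbf"):
        header = header[3:]
    text = header.lstrip()

    # The XML flavor in the sample has a <ScrivenerProject ...> root element.
    if text.startswith(b"<?xml") and b"<ScrivenerProject" in text:
        return "scrivx-xml"
    return "unknown"
```

Reading the first kilobyte or so of the file in binary mode, for example open(path, "rb").read(1024), should be enough input for this check on files like the samples.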

Since the developers of Scrivener decided to make the SCRIV format simply a folder with different content within, something special happens on MacOS. The Scrivener software registers all the extensions it uses with the MacOS Launch Services. This changes the way the SCRIV folder is displayed in the MacOS Finder: it now appears as a single file and is given a file type. This is called a Document Package format.

By right-clicking on the “file” you can then browse the package contents. Nothing in the folder itself or hidden in any attributes causes this to happen; it is all controlled by which extensions have been registered with the Launch Services database. We can, however, ask MacOS to give us some extended metadata details about the package, as long as the file is on an Apple filesystem like HFS or APFS.

mdls Scrivener3-s01.scriv 
_kMDItemDisplayNameWithExtensions = "Scrivener3-s01.scriv"
kMDItemContentCreationDate = 2025-03-15 04:15:17 +0000
kMDItemContentCreationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentModificationDate = 2025-03-15 04:15:18 +0000
kMDItemContentModificationDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemContentType = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree = (
"com.literatureandlatte.scrivener3.scriv",
"public.directory",
"public.item",
"com.apple.package",
"public.content",
"public.composite-content"
)
kMDItemDateAdded = 2025-03-21 04:38:48 +0000
kMDItemDateAdded_Ranking = 2025-03-21 00:00:00 +0000
kMDItemDisplayName = "Scrivener3-s01.scriv"
kMDItemDocumentIdentifier = 0
kMDItemFSContentChangeDate = 2025-03-15 04:15:18 +0000
kMDItemFSCreationDate = 2025-03-15 04:15:17 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = (null)
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = (null)
kMDItemFSLabel = 0
kMDItemFSName = "Scrivener3-s01.scriv"
kMDItemFSNodeCount = 3
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 31155
kMDItemFSTypeCode = ""
kMDItemInterestingDate_Ranking = 2025-03-15 00:00:00 +0000
kMDItemKind = "Scrivener Project"
kMDItemLogicalSize = 31155
kMDItemPhysicalSize = 69632

There are a lot of additional details available using the mdls command, including the content type of “com.apple.package”. This tool works with any file in MacOS and can be very useful for getting all the information you may need for preservation.
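
If you want to check for the package type in a script, the text that mdls prints can be parsed. Below is a minimal Python sketch of that idea. The function names and the SAMPLE string are made up for illustration, and the parsing assumes the output looks like the listing above.

```python
import re
from typing import List


def content_type_tree(mdls_output: str) -> List[str]:
    """Extract the kMDItemContentTypeTree entries from mdls text output."""
    match = re.search(
        r"kMDItemContentTypeTree\s*=\s*\((.*?)\)", mdls_output, re.DOTALL
    )
    if match is None:
        return []
    # Entries are double-quoted type identifiers separated by commas.
    return re.findall(r'"([^"]+)"', match.group(1))


def is_package(mdls_output: str) -> bool:
    """True when the content type tree lists com.apple.package."""
    return "com.apple.package" in content_type_tree(mdls_output)


# Shortened copy of the listing above, used only as test input.
SAMPLE = '''kMDItemContentType = "com.literatureandlatte.scrivener3.scriv"
kMDItemContentTypeTree = (
"com.literatureandlatte.scrivener3.scriv",
"public.directory",
"com.apple.package"
)
kMDItemKind = "Scrivener Project"
'''

print(is_package(SAMPLE))
```

On a Mac the text could come from subprocess.run(["mdls", path], capture_output=True, text=True).stdout. On other systems the mdls command is not available, so only the parsing part applies.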

Until the tools we use for format identification can recognize package formats, tools like this may be needed to gather the necessary metadata for preservation. In the meantime, identification of the package content is the best we can hope for. Creating a signature for the XML-based SCRIVX format is the first step.

Stay tuned for more on the package format, as I will be bringing it up more in the Digital Preservation community.

Apple Package Format

Let’s talk about Apple’s iWork software. Apple’s office suite was first released in 2005 and provided a word processor (Pages), a presentation application (Keynote), and, a little later, a spreadsheet (Numbers). They are exclusive to the Macintosh and iOS devices.

iWork was released in a few different versions. They get a little confusing as each application has its own version which all seemed to unify and stabilize in 2020. Here is a matrix of major versions.

Version       Package or ZIP
iWork ’05     Package
iWork ’06     Package
iWork ’08     Package
iWork ’09     ZIP
iWork 2013    Package
iWork 2014    ZIP
iWork 2019    ZIP
iWork 2020    ZIP

You may already be aware, but MacOS can sometimes be weird. I use the term weird in a loving, sometimes proud way, but I admit there were some “odd” choices made in regard to how applications and documents are used and stored on a Mac.

On early Macintosh computers Apple used an interesting method of storing resources for applications and some file formats. The Resource Fork for an application contained all the “resources” needed to run in the operating system. It would contain all the icons, warning screens, graphics, sounds, etc. This held true until Mac OS X came along and then Apple started using a bundle or package format. Still in use today, what appears to be a single file or application is actually a folder of all the resources needed to run the application.

Show Package Contents

By right-clicking or Control-clicking on the icon, you can open the folder and see all the contents that make up the application.

Directory listing of Pages.app on MacOS

Nifty, right? It is MacOS that knows which extensions to treat as a package. If you were to move the application over to another system, it would be a folder with the extension “.app”.

For an application I can see how this makes sense as it will only execute in the MacOS environment. The problem comes in when you use the same package method for the documents the application creates.

Contents of Pages version 1 sample file.

So instead of a single “file” with a bytestream, you get a folder of files which make up the file format. Here is Apple’s description:

Document Packages

If your document file formats are getting too complex to manage because of several disparate types of data, you might consider adopting a package format for your documents. Document packages give the illusion of a single document to users but provide you with flexibility in how you store the document data internally. Especially if you use several different types of standard data formats, such as JPEG, GIF, or XML, document packages make accessing and managing that data much easier.

Apple actually defines two similar methods:

Although bundles and packages are sometimes referred to interchangeably, they actually represent very distinct concepts:

  • A package is any directory that the Finder presents to the user as if it were a single file.
  • A bundle is a directory with a standardized hierarchical structure that holds executable code and the resources used by that code.

A couple of years ago a processed digital collection made its way down to me. It had been processed by a new digital archivist, and when I went to prepare the collection for preservation, I found a folder with the extension .pages and inside it a whole directory of files, many of which they had renamed and arranged. Needless to say, I had to track down the original disk so I could properly preserve the file.
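
A quick scan before processing could flag this situation. Here is a minimal Python sketch that walks a folder and reports directories whose names end in a document package extension. The function name and the extension list are my own assumptions for illustration; the list only covers extensions that come up in this post plus .numbers, and a real one would be longer.

```python
import os

# Illustrative list only, not a complete registry of package extensions.
PACKAGE_EXTENSIONS = {".pages", ".key", ".numbers", ".scriv", ".app"}


def find_package_directories(root):
    """Return directories under root whose names end in a package extension."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Iterate over a copy because matching entries are removed below.
        for name in list(dirnames):
            extension = os.path.splitext(name)[1].lower()
            if extension in PACKAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
                # Do not descend into the package; its contents belong together.
                dirnames.remove(name)
    return sorted(found)
```

Anything this returns is a candidate to treat as one object rather than a folder of loose files to rename or rearrange.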

So looking back at the earlier table, iWork switched back and forth between the package format and a ZIP container. For preservation purposes, the ZIP container is easier to maintain outside MacOS. Let’s look a little closer at each. If you would like to follow along, I have copied a few samples onto a hybrid ISO.

iWork ’05 through iWork ’08 used the same package format and structure. Because they are a package format, they are difficult to preserve as original files. I suppose you could zip them up, but probably the best option is to open with a current version of Pages and save to the newer ZIP container format.

tree iWork08/Keynote-06.key 
├── Contents
│   └── PkgInfo
├── QuickLook
│   └── Thumbnail.jpg
├── index.apxl.gz
└── theme-files
    ├── Blue 2.jpg
    ├── Blue 2.tif
    ├── Cool Gray-2.jpg
    ├── Cool Gray.tif
    ├── Green-8.jpg
    ├── Green.tif
    ├── Headlines_bullet.pdf
    ├── Headlines_star.pdf
    ├── Orange 2.tif
    ├── Orange_2.jpg
    ├── Purple-6.jpg
    ├── Purple.tif
    ├── Red.jpg
    ├── Red.tif
    ├── endpoints.pdf
    └── headlines_hi-res.jpg
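
For the "zip them up" option mentioned above, a package folder can be wrapped in an ordinary ZIP so it travels as one file. This is only a sketch in Python, and the function name zip_package is hypothetical. The result is a generic ZIP wrapper, not the iWork ’09 container format, and as written it stores neither empty folders nor extended attributes.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Store every file of a package directory inside one ZIP archive."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(package_dir):
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Keep the package folder name as the top-level entry.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

Write the ZIP somewhere outside the package folder, so the archive does not end up trying to include itself.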

iWork ’09 changed this practice. The documents saved from Pages, Keynote, and Numbers were contained in a ZIP file and can be identified using the PRONOM registry container signatures.

filename : 'iWork 2013/Pages2013-Sample09.pages'
filesize : 105900
modified : 2019-11-21T20:36:00-07:00
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1439'
    format  : 'Apple iWork Pages'
    version : '09'
    class   : 'Word Processor'
    basis   : 'extension match pages; container name index.xml with byte match at 195, 76' 
Sample09.pages
Type = zip
WARNINGS:
Headers Error
Physical Size = 105900

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-11-21 20:36:00 .....       364773        22413  index.xml
2019-11-21 20:36:00 .....         7007         7007  Hardcover_bullet_black.png
2019-11-21 20:36:00 .....        69176        69176  Simple_Noise_2x.jpg
2019-11-21 20:36:00 .....          232          232  buildVersionHistory.plist
2019-11-21 20:36:00 .....         6406         6406  QuickLook/Thumbnail.png
------------------- ----- ------------ ------------  ------------------------
2019-11-21 20:36:00             447594       105234  5 files

Then, for reasons unknown, Apple went back to a package format with iWork 2013. The content and structure changed, though: it’s a package with an Index.zip instead of an index.xml.

Pages2013-Sample.pages
├── Data
│   └── Hardcover_bullet_black-13.png
├── Index.zip
├── Metadata
│   ├── BuildVersionHistory.plist
│   ├── DocumentIdentifier
│   └── Properties.plist
├── preview-micro.jpg
├── preview-web.jpg
└── preview.jpg

3 directories, 8 files

The ZIP within the package contains a new Apple format: IWA.

Pages2013-Sample.pages/Index.zip
Type = zip
Physical Size = 39361

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-11-21 20:47:14 .....         3860         3860  Index/Document.iwa
2019-11-21 20:47:14 .....           26           26  Index/Tables/DataList.iwa
2019-11-21 20:47:14 .....          336          336  Index/ViewState.iwa
2019-11-21 20:47:14 .....          160          160  Index/CalculationEngine.iwa
2019-11-21 20:47:14 .....          121          121  Index/DocumentStylesheet.iwa
2019-11-21 20:47:14 .....        31931        31931  Index/ThemeStylesheet.iwa
2019-11-21 20:47:14 .....           22           22  Index/AnnotationAuthorStorage.iwa
2019-11-21 20:47:14 .....         1889         1889  Index/Metadata.iwa
------------------- ----- ------------ ------------  ------------------------
2019-11-21 20:47:14              38345        38345  8 files

Luckily Apple came to their senses and went back to the ZIP container format for iWork 2014 and later. The container signature looks for the IWA file Apple started using with iWork 2013.

filename : 'iWork 2014/Pages2014-Sample.pages'
filesize : 66256
modified : 2019-11-22T00:03:56-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/1441'
    format  : 'Apple iWork Document'
    version : '14'
    class   : 'Presentation, Spreadsheet, Word Processor'
    basis   : 'extension match pages; container name Index/Document.iwa with byte match at 16, 6; name Metadata/Properties.plist with name only'
Path = iWork 2014/Pages2014-Sample.pages
Type = zip
Physical Size = 66256

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-11-22 00:03:54 .....         3930         3930  Index/Document.iwa
2019-11-22 00:03:54 .....          364          364  Index/ViewState.iwa
2019-11-22 00:03:54 .....          206          206  Index/CalculationEngine.iwa
2019-11-22 00:03:54 .....        33573        33573  Index/DocumentStylesheet.iwa
2019-11-22 00:03:54 .....           22           22  Index/AnnotationAuthorStorage.iwa
2019-11-22 00:03:54 .....           23           23  Index/DocumentMetadata.iwa
2019-11-22 00:03:54 .....         8761         8761  Index/Metadata.iwa
2019-11-22 00:03:54 .....          322          322  Metadata/Properties.plist
2019-11-22 00:03:54 .....           36           36  Metadata/DocumentIdentifier
2019-11-22 00:03:54 .....          273          273  Metadata/BuildVersionHistory.plist
2019-11-22 00:03:54 .....        14611        14611  preview.jpg
2019-11-22 00:03:54 .....          838          838  preview-micro.jpg
2019-11-22 00:03:54 .....         1571         1571  preview-web.jpg
------------------- ----- ------------ ------------  ------------------------
2019-11-22 00:03:54              64530        64530  13 files
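
Putting the layouts side by side, a script can make a rough guess at which one it is looking at. The Python sketch below only checks the member names that appear in the listings in this post (Index.zip, index.xml and Index/Document.iwa). The function name and the labels it returns are made up for illustration, and it is no substitute for a real container signature.

```python
import os
import zipfile


def classify_iwork_document(path):
    """Return a rough label for the storage layout of an iWork document."""
    if os.path.isdir(path):
        # Package layouts are plain directories on disk.
        if os.path.isfile(os.path.join(path, "Index.zip")):
            return "package-index-zip"  # layout shown for iWork 2013
        return "package-other"  # for example the iWork '05 to '08 layout
    if not zipfile.is_zipfile(path):
        return "unknown"
    try:
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        # Python's zipfile can reject archives that other tools still open.
        return "unknown"
    if "Index/Document.iwa" in names:
        return "zip-iwa"  # layout shown for iWork 2014
    if "index.xml" in names:
        return "zip-index-xml"  # layout shown for iWork '09
    return "unknown"
```

The ’09 sample above produced a headers warning in its listing, so a file like that could come back as "unknown" here even though other tools can read it.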

Now, iWork was not the only Apple software to use the package/bundle format for its documents. Be advised the following software may save to the package format.

I remember a few years ago, Trent Reznor (NIN) decided to release a few of his tracks in the Garageband format. A little harder to find these days, but the good old wayback machine kept a copy for us! Grab them here. Be warned, they may be in the package format. Thanks Apple!