Up until recently I was working in a Corporate archive preserving all sort of content. The corporation throughout the years used many different software packages to produce all sorts of data. When I moved to an academic library I saw much of the same content, but there was a some new file formats which I needed to document and manage. Many of those come from scholarly journals , theses, dissertation, and data sets for projects.
One format which I came across often but seems to be missing from the standard file format known lists was the RefMan citation format. This format is a simple text based format which serves to standardize citations from scholarly sources. Created by Research Information Systems, the format uses the RIS extension used by Procite and Reference Manager (RefMan). ISI ResearchSoft managed the format for a bit in the 1990’s, this is where you can find most of the specifications.
Now that I am a little more familiar with the format I see it everywhere! Find any scholarly journal and there will usually be a “cite” feature to download the citation in a few formats, RIS being one of the most common.
It can be called by a few names, mostly based on the systems which support it. You might see Ris (Zotero), or EasyBib, Mendeley, ProCite, Reference Manager, and others. But they all follow the same format.
The format is simple plain text format, there are codes which indicate the different field types and tags. The basic structure would look like this:
TY - BOOK AU - Owens, Trevor LA - eng PB - Johns Hopkins University Press Baltimore, Maryland CY - Baltimore, Maryland SN - 9781421426976; 1421426978 PY - 2018 TI - The theory and craft of digital preservation LK - https://worldcat.org/title/1030899528 ER -
Provider: The name of the information provider (required) Database: The name of the database (optional) Tagformat: Name of the tag format used identify fields (optional) Content: media type for the body of the file (required)
Creation of a PRONOM signature for this text format is pretty straight forward. Looking for the TY and ER string should be enough to ensure the format doesn’t clash with other text based formats. Text formats are notoriously difficult to identify, but when they have expected patterns it makes it a little easier. I had to add a little buffer at the beginning of the signature to allow for the newer header information, but more samples will be needed to see if this is enough to identify the format in all situations. Take a look and see if it works for you!