Scholars Portal Registry of File Formats
1. Policy Statement
To achieve the necessary operational efficiencies, Scholars Portal (SP) needs to identify the format of each file sent by a publisher as soon as it is received. Early identification supports smooth ingestion and later migration. To do this, SP uses DROID, JHOVE, and PRONOM.
While Scholars Portal is not dependent on or restricted to any particular format or group of formats, it aims to use well-known, widely accepted formats that support long-term preservation. Almost all the files in the repository are in PDF, XML, JPEG, or GIF format. If a publisher wants to use a different format, the publisher and SP must first reach an agreement.
2. Implementation Examples
2.1. DROID
- Scholars Portal uses DROID for format identification during ingestion, when a file format is associated with each file.
- DROID (Digital Record Object Identification) is a software tool developed by The National Archives (UK) to perform automated batch identification of file formats. It is a platform-independent Java tool, freely available to download under an open source license. [1]
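As a rough illustration of what signature-based format identification involves, the following Python sketch matches the leading bytes of a file against a small table of well-known "magic numbers". It is not DROID's implementation and does not use DROID's signature file; the table, the function names `identify` and `identify_file`, and the format labels are assumptions chosen for this example.

```python
# Illustrative signature table: leading "magic" bytes -> format label.
# A simplified stand-in for a real signature file, limited to the
# formats named in the policy statement.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"<?xml", "XML"),
]


def identify(data: bytes) -> str:
    """Return a format label for the given leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A real identification tool checks far more signatures, including ones that are not at the start of the file, so this sketch should be read only as a picture of the general idea.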
2.2. JHOVE
- Scholars Portal uses JHOVE for further format-specific identification, validation, and characterization of each file.
- JHOVE (JSTOR/Harvard Object Validation Environment) is an extensible framework for format validation created through a collaboration between JSTOR and Harvard University Library.
- The initial release of JHOVE includes modules for arbitrary byte streams; ASCII and UTF-8 encoded text; GIF, JPEG, JPEG 2000, and TIFF images; AIFF and WAVE audio; PDF, HTML, and XML; as well as text and XML output handlers. [2]
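To make the difference between validation and characterization concrete, here is a minimal Python sketch for XML only. It uses the standard library parser as a stand-in and is not JHOVE or its API: "validation" is reduced to a well-formedness check, and "characterization" to two simple properties. The function name `characterize_xml` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def characterize_xml(data: bytes) -> dict:
    """Check well-formedness and report a few basic properties."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Not well-formed: report the parser's message and stop.
        return {"well_formed": False, "error": str(exc)}
    return {
        "well_formed": True,
        "root_element": root.tag,
        # root.iter() walks the root element and all of its descendants.
        "element_count": sum(1 for _ in root.iter()),
    }
```

For example, `characterize_xml(b"<a><b/><c/></a>")` reports a well-formed document with root element `a` and three elements, while a document with a mismatched tag is reported as not well-formed.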
2.3. PRONOM
- During DROID identification, a file format is associated with each file and, where possible, the file is linked to the format's entry in PRONOM, the format registry of The National Archives (UK).
- PRONOM is a resource providing impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value. [3]
Example characterization and reference to format registry:
<PREMIS:format>
  <PREMIS:formatDesignation>
    <PREMIS:formatName>Acrobat PDF 1.4 - Portable Document Format</PREMIS:formatName>
    <PREMIS:formatVersion>1.4</PREMIS:formatVersion>
  </PREMIS:formatDesignation>
  <PREMIS:formatRegistry>
    <PREMIS:formatRegistryName>http://www.nationalarchives.gov.uk/pronom</PREMIS:formatRegistryName>
    <PREMIS:formatRegistryKey>fmt/18</PREMIS:formatRegistryKey>
  </PREMIS:formatRegistry>
</PREMIS:format>
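A record like the one above could be assembled from an identification result along the following lines. This is a minimal Python sketch, not Scholars Portal's actual tooling: the function name `premis_format_block` and its parameters are hypothetical, and the output is a fragment that assumes the `PREMIS` namespace prefix is declared elsewhere in the enclosing document.

```python
from xml.sax.saxutils import escape

# Template mirroring the element names used in the example record.
TEMPLATE = """<PREMIS:format>
  <PREMIS:formatDesignation>
    <PREMIS:formatName>{name}</PREMIS:formatName>
    <PREMIS:formatVersion>{version}</PREMIS:formatVersion>
  </PREMIS:formatDesignation>
  <PREMIS:formatRegistry>
    <PREMIS:formatRegistryName>{registry_name}</PREMIS:formatRegistryName>
    <PREMIS:formatRegistryKey>{registry_key}</PREMIS:formatRegistryKey>
  </PREMIS:formatRegistry>
</PREMIS:format>"""


def premis_format_block(name: str, version: str,
                        registry_name: str, registry_key: str) -> str:
    """Build a PREMIS format fragment, escaping XML special characters."""
    return TEMPLATE.format(
        name=escape(name),
        version=escape(version),
        registry_name=escape(registry_name),
        registry_key=escape(registry_key),
    )


# Reproduce the example record for a PDF 1.4 file.
block = premis_format_block(
    "Acrobat PDF 1.4 - Portable Document Format",
    "1.4",
    "http://www.nationalarchives.gov.uk/pronom",
    "fmt/18",
)
```

Escaping the values keeps the fragment well-formed even if a format name contains characters such as `&` or `<`.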
3. References
- Sourceforge.net. (2009). DROID. Retrieved from http://droid.sourceforge.net/
- Harvard University Library. (2009, February 25). JHOVE – JSTOR/Harvard Object Validation Environment. Retrieved from http://hul.harvard.edu/jhove
- The National Archives. The Technical Registry PRONOM. Retrieved from http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
4. Document History
Version | Date | Change | Author
0.1 | 09/27/11 | Draft created | Aurianne Steinman
0.2 | 09/29/11 | Formatted | Aurianne Steinman