Scholars Portal URI and File Naming Policy
1. Policy Statement
URIs created by Scholars Portal
- Scholars Portal uses a systematic convention to generate unambiguously unique identification for digital objects within its repository. This convention will create a stable name or reference to an object that can be permanently associated with that object, regardless of future changes to organizational structure or to digital access protocols.
- This is in conformance with section 4.2.4 of Metrics for Digital Repository Audit and Certification (CCSDS, June 2009), which states that a compliant repository “shall have and use a convention that generates persistent, unique identifiers for all AIPs” and “its components.”
- This convention will ensure that “each AIP can be unambiguously found in the future” and that “each AIP can be distinguished from all other AIPs in the repository.”
2. Implementation
2.1. Journal articles
2.1.1. Scholars Portal URIs are consistently constructed in the following manner:
- /<ISSN>/v<volume number>i<issue number padded to four digits>/<article hash>
- The article hash is generated by concatenating the starting page number of the article, an underscore character, the first letter of each of the first six words in the article title, and the first letter of each of the last six words in the article title. When the title has too few words to meet this specification, the first letter of each word in the title is used instead.
Examples:
- Article: DNA-Directed Self-Assembly of Gold Nanoparticles onto Nanopatterned Surfaces: Controlled Placement of Individual Nanoparticles into Regular Arrays - ACS Nano (October 2010), 4 (10), pg. 6153-6161
- URI: /19360851/v04i0010/6153_dsognooinira
- Article: A possible chemical burn to the scalp following hair highlights - Burns (June 2005), 31 (4), pg. 530-531
- URI: /03054179/v31i0004/530_apcbttsfhh
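The construction rule above can be sketched in Python. This is an illustrative sketch, not the Scholars Portal loader code: the function names `article_hash` and `article_uri` are hypothetical, and several details are assumptions inferred from the examples rather than stated in the policy (hyphens are stripped from the ISSN, the volume number is padded to at least two digits, a hyphenated word counts as one word, and "too few words" is taken to mean fewer than twelve).

```python
def article_hash(start_page: str, title: str) -> str:
    """Build '<start page>_<initials>' from an article title."""
    initials = []
    for word in title.split():
        # Assumption: the "first letter" is the first alphanumeric
        # character of the word, lowercased; words without one are skipped.
        for ch in word:
            if ch.isalnum():
                initials.append(ch.lower())
                break
    if len(initials) >= 12:
        # First six words plus last six words.
        letters = initials[:6] + initials[-6:]
    else:
        # Assumption: fewer than twelve words means "not enough words",
        # so every word contributes its first letter.
        letters = initials
    return f"{start_page}_{''.join(letters)}"


def article_uri(issn: str, volume, issue, start_page: str, title: str) -> str:
    """Assemble /<ISSN>/v<volume>i<issue>/<article hash>."""
    issn_digits = issn.replace("-", "")  # assumption: hyphen removed
    # Assumption: volume is padded to two digits, as in 'v04' and 'v31'.
    return (
        f"/{issn_digits}/v{int(volume):02d}i{int(issue):04d}/"
        f"{article_hash(start_page, title)}"
    )
```

Called with the Burns example, `article_uri("0305-4179", 31, 4, "530", "A possible chemical burn to the scalp following hair highlights")` yields the URI listed above.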
In the case of a collision, the generated URI is suffixed with an underscore (_) and a sequential number beginning with one. This number increments for each further duplicate URI.
Example:
- Article: Book Review - Journal of the Franklin Institute (September 1944), 238 (3), pg. 224-224
- URI: /00160032/v238i0003/224_br
- Article: Book Review - Journal of the Franklin Institute (September 1944), 238 (3), pg. 224-224
- URI: /00160032/v238i0003/224_br_1
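The collision rule can be illustrated with a short Python sketch. The function name `resolve_collision` and the use of an in-memory set to stand in for the repository's record of assigned URIs are assumptions made for illustration; the policy does not describe how existing URIs are looked up.

```python
def resolve_collision(uri: str, existing: set) -> str:
    """Return a URI that is not yet in `existing`.

    `existing` is a hypothetical stand-in for the repository's
    record of URIs that have already been assigned.
    """
    if uri not in existing:
        return uri
    n = 1  # the sequential number begins with one
    while f"{uri}_{n}" in existing:
        n += 1
    return f"{uri}_{n}"


assigned = set()
for _ in range(3):
    # Three articles that all generate the same base URI.
    assigned.add(resolve_collision("/00160032/v238i0003/224_br", assigned))
```

After the loop, `assigned` holds the base URI plus the `_1` and `_2` variants.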
In the case of a replacement article, the new copy of an article will supersede the old and claim the original identifier. The old copy of the article will retain the original identifier with “_old1” appended to the end. In the case of subsequent replacements, the identifier of the replaced article will have “_old<X>” appended, where <X> is the next available integer.
Example:
- Article: The Myth of the Unicorn - Diogenes (September 1982), 30 (119), pg. 1-23
- URI: /03921921/v30i0119/1_tmotu
- Old Article: The Myth of the Unicorn - Diogenes (September 1982), 30 (119), pg. 1-23
- URI: /03921921/v30i0119/1_tmotu_old1
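A minimal sketch of the replacement rule follows. The function name `replace_article` and the dictionary used as a stand-in for the repository are hypothetical; the sketch only shows how the "_old<X>" suffix could be chosen, not how records are actually moved.

```python
def replace_article(uri: str, store: dict, new_record) -> None:
    """Store `new_record` under `uri`, archiving any existing copy.

    `store` is a hypothetical mapping of URI -> record standing in
    for the repository.
    """
    if uri in store:
        n = 1
        # Find the next available integer for the "_old<X>" suffix.
        while f"{uri}_old{n}" in store:
            n += 1
        store[f"{uri}_old{n}"] = store[uri]
    # The new copy claims the original identifier.
    store[uri] = new_record


repo = {"/03921921/v30i0119/1_tmotu": "first copy"}
replace_article("/03921921/v30i0119/1_tmotu", repo, "second copy")
replace_article("/03921921/v30i0119/1_tmotu", repo, "third copy")
```

After the two replacements, the first copy sits under `_old1`, the second under `_old2`, and the third copy holds the original identifier.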
The URI is used not only as the unique identifier for the item, but also as a path to the item's file in MarkLogic on both the search index and the preservation database. It therefore provides a link between the preservation engine and the Scholars Portal search engine. The MarkLogic path takes the form of:
http://journals2.scholarsportal.info/details.xqy?uri=/03054179/v31i0004/530_apcbttsfhh.xml
2.1.2. URIs for individual files and events created by Scholars Portal
Scholars Portal URIs for individual files and events are consistently constructed in the following manner:
- The URI of the parent object and a hash generated from the current date/time are concatenated. PDF files are identified by adding “pdf_fulltext” after the parent object URI, followed by the current date/time hash. XML files are identified in the same way by adding “xml_fulltext” after the parent object URI, followed by the current date/time hash. In this way, there should be no chance of collisions.
Examples:
- Parent object: A C. elegans LSD1 Demethylase Contributes to Germline Immortality by Reprogramming Epigenetic Memory - Cell (April 2009), 137 (2), pg. 308-320
- Parent object URI: /00928674/v137i0002/308_aceldcgibrem
- PDF Fulltext: /00928674/v137i0002/308_aceldcgibrem/pdf_fulltext/1303399300907
- XML Fulltext: /00928674/v137i0002/308_aceldcgibrem/xml_fulltext/130339930294
- Other files: /00928674/v137i0002/308_aceldcgibrem/1303399302709
- Events: /00928674/v137i0002/308_aceldcgibrem/1303399314628
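The file and event URI pattern can be sketched as follows. The policy does not define the date/time hash; most of the example values resemble Unix time in milliseconds, so this sketch assumes that format. The function name `child_uri` and its `kind` and `now_ms` parameters are hypothetical.

```python
import time


def child_uri(parent_uri: str, kind=None, now_ms=None) -> str:
    """Build a URI for a file or event belonging to `parent_uri`.

    kind:   "pdf" or "xml" for fulltext files, None for other
            files and events (hypothetical parameter).
    now_ms: timestamp to use; defaults to the current time.
            Assumption: the date/time hash is epoch milliseconds.
    """
    if kind not in (None, "pdf", "xml"):
        raise ValueError(f"unknown kind: {kind!r}")
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    parts = [parent_uri]
    if kind is not None:
        parts.append(f"{kind}_fulltext")
    parts.append(str(now_ms))
    return "/".join(parts)


parent = "/00928674/v137i0002/308_aceldcgibrem"
pdf_uri = child_uri(parent, "pdf", now_ms=1303399300907)
event_uri = child_uri(parent, now_ms=1303399314628)
```

With the timestamps fixed as above, `pdf_uri` and `event_uri` reproduce the PDF Fulltext and Events examples. Uniqueness under this scheme depends on no two children of the same parent being created within the same millisecond.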
2.1.3. File system path structure for content files
- Content objects are stored in the same directory structure in which they arrived, which is transferred to a filesystem specific to the collection (e.g., ejournals1), in a directory specified for the publisher and named at the time a loader script is written. Both the filesystem name and the publisher directory name are followed by a sequential identifier, starting with 1. Publisher directories are limited in space to 200 GB, so when the directory reaches that size, a new one is created and the sequential identifier is incremented (e.g., kluwer1, kluwer2, kluwer3).
- When the current filesystem reaches 2 TB in size, it is unmounted and remounted as a read-only volume. At this time, a new filesystem is created and the filesystem's sequential identifier is incremented (e.g., ejournals1, ejournals2). Any publisher directories that were on the old filesystem are considered closed, and the sequential identifier will be incremented for the next directory created for that publisher.
Please see Move Event Diagram
Example paths:
- /mnt/pillar/ejournals2/wiley5/
- /mnt/pillar/ejournals3/wiley6/
- /mnt/pillar/ejournals3/ieee2/
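The rollover rules for publisher directories and filesystems can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the limits are interpreted as decimal units (200 GB = 200 × 10^9 bytes, 2 TB = 2 × 10^12 bytes), and the `/mnt/pillar` root is taken from the example paths rather than from the policy text.

```python
PUBLISHER_LIMIT_BYTES = 200 * 10**9   # assumption: decimal gigabytes
FILESYSTEM_LIMIT_BYTES = 2 * 10**12   # assumption: decimal terabytes


def next_location(fs_seq: int, pub_seq: int, fs_bytes: int, pub_bytes: int):
    """Return the (filesystem, publisher) sequence numbers to load into next."""
    if fs_bytes >= FILESYSTEM_LIMIT_BYTES:
        # The full filesystem becomes read-only; its publisher directories
        # are closed, so the publisher sequence advances as well.
        return fs_seq + 1, pub_seq + 1
    if pub_bytes >= PUBLISHER_LIMIT_BYTES:
        # Only the publisher directory is full.
        return fs_seq, pub_seq + 1
    return fs_seq, pub_seq


def content_path(collection: str, fs_seq: int, publisher: str, pub_seq: int,
                 root: str = "/mnt/pillar") -> str:
    """Assemble <root>/<collection><n>/<publisher><m>/."""
    return f"{root}/{collection}{fs_seq}/{publisher}{pub_seq}/"


# wiley5 on ejournals2, at the moment ejournals2 reaches its size limit.
fs_seq, pub_seq = next_location(2, 5, fs_bytes=2 * 10**12, pub_bytes=10**9)
new_path = content_path("ejournals", fs_seq, "wiley", pub_seq)
```

Under these assumptions, `new_path` matches the second example path above.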
3. References
3.1. Metrics for Digital Repository Audit and Certification (2009) CCSDS XXX.0-R-1. Red Book. Issue 1. June 2009.
4. Document History
Version | Date | Change | Author
0.1 | 08/23/11 | Draft created | Aurianne Steinman
0.2 | 09/01/11 | Draft formatted | Aurianne Steinman
0.3 | 09/20/11 | Suggested edits | Aurianne Steinman