An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI) scheme.[1]

Archival Resource Key
Acronym: ARK
Organisation: ARK Alliance
Introduced: 2001
No. issued: 8.2 billion
No. of digits: variable
Check digit: NCDA, optional
Example: ark:/53355/cl010066723
Website: arks.org

A URL that is an ARK is distinguished by the label ark: after the URL's hostname. The label sets the expectation that, when the URL is submitted to a web browser with '?' appended, it returns a brief metadata record, and with '??' appended, it returns metadata that includes a commitment statement from the current service provider. The ARK and its inflections ('?' and '??') thus provide access to three facets of a provider's ability to provide persistence.
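
The relationship between an ARK and its two inflections can be sketched in a few lines of Python. This is only an illustration: the hostname example.org and the function names are assumptions made for the example, and the service labels ("access", "description", "policy") are borrowed from the generic services described later in this article.

```python
def inflections(ark_url: str) -> dict[str, str]:
    """Return the plain ARK URL and its '?' and '??' inflected forms."""
    base = ark_url.rstrip("?")  # drop any inflection already present
    return {
        "access": base,              # the object or a surrogate for it
        "description": base + "?",   # brief metadata record
        "policy": base + "??",       # metadata with a commitment statement
    }


def requested_service(url: str) -> str:
    """Classify a URL by its trailing inflection."""
    if url.endswith("??"):
        return "policy"
    if url.endswith("?"):
        return "description"
    return "access"


# Hypothetical hostname; the ARK itself is the example from the infobox.
urls = inflections("https://example.org/ark:/53355/cl010066723")
for service, url in urls.items():
    print(service, url)
```

Checking for '??' before '?' matters, because every URL that ends in '??' also ends in '?'.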

Implicit in the design of the ARK scheme is that persistence is purely a matter of service and not a property of a naming syntax. Moreover, a "persistent identifier" cannot be born persistent; an identifier from any scheme can only be proved persistent over time. The inflections provide information with which to judge an identifier's likelihood of persistence.

History


Throughout the 1990s, the Internet Engineering Task Force and other organizations developed standards for persistent identifiers for web resources, including URN, PURL, Handle, and DOI. In each of these standards, indirect identifiers would resolve to URLs, which themselves changed over time. Many believed that such systems would contribute to the persistence of web resources over time.[2]

In 2001, John Kunze of the University of California and R. P. Channing Rodgers of the United States National Library of Medicine released the first draft of “The ARK Persistent Identifier Scheme,” designed in response to the needs of their two organizations, as an IETF working document.[3] In explaining their motivations for creating a new system, Kunze later wrote that “each [persistent identifier] system had specific problems.” In contrast to the decentralized structure of the web, with many independent publishers, Handle and DOI were related centralized systems which charged for inclusion; they were “antithetical,” according to Kunze, “to an implicit principle that Internet standards must not endorse control by any one entity, over access to the networked resources of another entity.” URNs were free, but lacked a resolver discovery service, and, wrote Kunze, “it seemed to me that the IETF community lost interest in creating a whole new Internet indirection infrastructure that would add little to existing web and DNS mechanisms, especially in light of the small part that indirection plays in keeping links from breaking.”[2]

In contrast to these other systems, the ARK scheme proposed that “persistence is purely a matter of service,… neither inherent in an object nor conferred on it by a particular naming syntax.” The most an identifier could do to solve the problem of persistence, then, was to indicate an organization’s commitment. Accordingly, in the ARK standard, identifiers would refer not only to a web resource, but also to “a promise of stewardship” and metadata about the resource. If a web server was queried with an ARK, it should return the resource itself or some surrogate for it, such as “a table of contents instead of a large complex document.” If a question mark was appended to the ARK, though, it should return a description (metadata) instead, which “must at minimum answer the who, what, when, and where questions concerning an expression of the object.” (The scheme also included a guide to Electronic Resource Citations, a simple format for structuring this metadata.) If two question marks were appended, the server should return the provider’s policies regarding “object persistence, object naming, object fragment addressing, and operational service support.”[3]

California Digital Library began using ARKs in 2002, and released the Noid (Nice Opaque IDentifiers) software for managing ARKs and other identifiers in 2004. Other early adopters of ARKs included Portico, the Internet Archive, and the Bibliothèque nationale de France, the first of several francophone institutions to adopt the scheme.

In 2018, the California Digital Library and DuraSpace announced a collaboration, initially named ARKs-in-the-Open and then the ARK Alliance, to build an international community around ARKs and their use in open scholarship. By 2021, over 800 institutions had registered to use ARKs.[2]

Structure

https://NMA/ark:/NAAN/Name[Qualifier]
  • NAAN: Name Assigning Authority Number - mandatory unique identifier of the organization that originally named the object
  • NMA: Name Mapping Authority - optional and replaceable hostname of an organization that currently provides service for the object
  • Qualifier: optional string that extends the base ARK to support access to individual hierarchical subcomponents of an object,[4] and to variants (versions, languages, formats) of components.[5]
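
As a rough illustration of the structure above, the following Python sketch splits an ARK into its NMA, NAAN, Name, and Qualifier. It rests on simplifying assumptions that are not part of the scheme as described here: the Name is taken to end at the first slash or dot (the characters that begin hierarchy and variant qualifiers), and the `Ark` type, the `parse_ark` function, and the hostname and qualifier in the usage lines are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Ark(NamedTuple):
    nma: Optional[str]  # hostname of the current service provider, if given
    naan: str           # Name Assigning Authority Number
    name: str           # name assigned by that authority
    qualifier: str      # "" or a string starting with "/" or "."


def parse_ark(text: str) -> Ark:
    """Split 'https://NMA/ark:/NAAN/Name[Qualifier]' into its parts."""
    nma = None
    if "://" in text:
        parts = urlsplit(text)
        nma = parts.netloc
        text = parts.path.lstrip("/")  # path excludes any '?' inflection
    if not text.startswith("ark:"):
        raise ValueError("no ark: label found")
    rest = text[len("ark:"):].lstrip("/")
    naan, sep, tail = rest.partition("/")
    if not (naan and sep and tail):
        raise ValueError("expected ark:/NAAN/Name")
    # Assumption: the Name ends at the first "/" or "." in the remainder.
    hits = [i for i in (tail.find("/"), tail.find(".")) if i != -1]
    cut = min(hits, default=len(tail))
    return Ark(nma, naan, tail[:cut], tail[cut:])


# Hypothetical hostname and qualifier around the infobox example.
print(parse_ark("https://example.org/ark:/53355/cl010066723/part2.fr"))
print(parse_ark("ark:/53355/cl010066723"))
```

Because the NMA is optional and replaceable, the sketch returns it separately and accepts an ARK with no hostname at all.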

A complete NAAN registry[6] is maintained by the ARK Alliance and replicated at the Bibliothèque nationale de France and the US National Library of Medicine. It contained 530 entries in June 2018, 633 in July 2020, and 754 in April 2021.

Application


ARKs may be assigned to anything digital, physical, or abstract. Below are examples, as reported (2020) to the ARK Alliance by the linked organizations.

Generic services


Three generic ARK services have been defined. They are described below in protocol-independent terms. The services may be delivered through many possible methods, given available technology (today's or future).

Access service (access, location)

  • Returns (a copy of) the object or a redirect to the same, although a sensible object proxy may be substituted (for instance a table of contents instead of a large document).
  • May also return a discriminated list of alternate object locators.
  • If access is denied, returns an explanation of the object's current (perhaps permanent) inaccessibility.

Policy service (permanence, naming, etc.)

  • Returns declarations of policy and support commitments for given ARKs.
  • Declarations are returned in either a structured metadata format or a human readable text format; sometimes one format may serve both purposes.
  • Policy subareas may be addressed in separate requests, but the following areas should be covered:
    • object permanence,
    • object naming,
    • object fragment addressing, and
    • operational service support.

Description service

  • Returns a description of the object. Descriptions are returned in either a structured metadata format or a human readable text format; sometimes one format may serve both purposes.
  • A description must at a minimum answer the who, what, when, and where questions concerning an expression of the object.
  • Standalone descriptions should be accompanied by the modification date and source of the description itself.
  • May also return discriminated lists of ARKs that are related to the given ARK.
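
The minimum requirement for a description (answers to who, what, when, and where) lends itself to a simple completeness check. The sketch below assumes a hypothetical plain-text layout of one "label: value" pair per line; that layout, the function names, and the sample values are illustrative assumptions, not a format defined in this article.

```python
REQUIRED = ("who", "what", "when", "where")


def parse_record(text: str) -> dict[str, str]:
    """Read 'label: value' lines into a dict (hypothetical layout)."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if sep:
            record[label.strip().lower()] = value.strip()
    return record


def missing_elements(record: dict[str, str]) -> list[str]:
    """List the required questions the description leaves unanswered."""
    return [key for key in REQUIRED if not record.get(key, "").strip()]


# Made-up sample values, for illustration only.
sample = """\
who: Example Author
what: Example Title
when: 2001
"""
print(missing_elements(parse_record(sample)))
```

A service could run such a check before returning a description, and report the missing elements instead of silently serving an incomplete record.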

See also


Notes and references

  1. "Uniform Resource Identifier (URI) Schemes".
  2. Meyerl, Jordan (September 14, 2021). "ARK Alliance: An Interview with John Kunze". bloggERS. Society of American Archivists Electronic Resources Section.
  3. Kunze, J.; Rodgers, R. P. C. (March 8, 2001). "The ARK Persistent Identifier Scheme". IETF Datatracker. Internet Engineering Task Force.
  4. Hierarchy qualifiers begin with a slash character.
  5. Variant qualifiers begin with a dot character.
  6. Name Assigning Authority Number registry