06. OAI PMH endpoint

Matthias Vandermaesen edited this page Mar 19, 2019 · 1 revision

What is OAI-PMH?

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers make OAI-PMH requests to harvest that metadata. The Datahub acts as a Data Provider and supports OAI-PMH out of the box.

For more details on the protocol, its implementation, and its uses, see the OAI-PMH website.

OAI-PMH Endpoint

The OAI-PMH endpoint for the Datahub is available at /oai:

foo@bar:~$ curl -i https://organisation.org/oai/?verb=Identify
HTTP/1.1 200 OK
Server: nginx
Content-Type: text/xml; charset=utf8
Connection: keep-alive
Date: Thu, 07 Feb 2019 06:39:29 GMT

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2019-02-07T06:39:29Z</responseDate>
  <request verb="Identify">https://organisation.org</request>
  <Identify>
    <repositoryName>Your organisation name</repositoryName>
    <baseURL>http://organisation.org</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>[email protected]</adminEmail>
    <earliestDatestamp>2019-02-07T06:39:29Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>
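
A harvester usually reads the Identify response before anything else, for example to learn the timestamp granularity and whether deleted records are tracked. The following is a minimal sketch using only the Python standard library. The function name `parse_identify` and the trimmed sample document are illustrative assumptions, not part of the Datahub.

```python
import xml.etree.ElementTree as ET

# Namespace used by every element in an OAI-PMH 2.0 response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_identify(xml_bytes):
    """Return the children of <Identify> as a dict of element name -> text.

    Repeatable elements (such as adminEmail) keep only their last value here.
    """
    root = ET.fromstring(xml_bytes)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response has no <Identify> element")
    info = {}
    for child in identify:
        # ElementTree prefixes tag names with "{namespace}"; strip that part.
        name = child.tag.split("}", 1)[-1]
        info[name] = (child.text or "").strip()
    return info


# Trimmed copy of the response shown above (illustrative sample data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2019-02-07T06:39:29Z</responseDate>
  <request verb="Identify">https://organisation.org</request>
  <Identify>
    <repositoryName>Your organisation name</repositoryName>
    <baseURL>http://organisation.org</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>
"""

info = parse_identify(SAMPLE)
```

Here `info["granularity"]` tells the harvester which timestamp format to use for the from and until parameters.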

Harvesting records via OAI-PMH

OAI-PMH defines six verbs (request types), each invoked over HTTP:

  • Identify - used to retrieve information about the repository.
  • ListIdentifiers - used to retrieve record headers from the repository.
  • ListRecords - used to harvest full records from the repository.
  • ListSets - used to retrieve the set structure of the repository.
  • ListMetadataFormats - lists available metadata formats that the repository can disseminate.
  • GetRecord - used to retrieve an individual record from the repository.

Selective harvesting can be performed by the use of accompanying parameters. Available parameters are:

  • identifier - specifies the identifier of a single record.
  • metadataPrefix - specifies the metadata format that the records will be returned in.
  • set - specifies the set that returned records must belong to.
  • from - specifies that returned records must have been created, updated, or deleted on or after this date.
  • until - specifies that returned records must have been created, updated, or deleted on or before this date.
  • resumptionToken - a token previously provided by the server to resume a request where it last left off.
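
Large result sets are returned in pages, and the resumptionToken parameter is how a harvester asks for the next one. The sketch below shows one way to follow tokens until the list is complete. The names `harvest_identifiers`, `http_get` and `fake_fetch`, the canned pages, and the `example:` identifiers are all assumptions made for illustration. The demonstration runs offline against the canned pages rather than a live Datahub.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Fetch a URL and return the raw response body (for live harvesting)."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(fetch, base_url, metadata_prefix, **selectors):
    """Yield record identifiers, following resumption tokens until exhausted.

    fetch is any callable that takes a URL and returns the response body.
    selectors may carry the optional set, from and until parameters.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    params.update(selectors)
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        path = f"{OAI}ListIdentifiers/{OAI}resumptionToken"
        token = (root.findtext(path) or "").strip()
        if not token:
            break  # a missing or empty token marks the last page
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Offline demonstration with two made-up pages.
_PAGE_1 = (b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
           b'<header><identifier>example:1</identifier></header>'
           b'<resumptionToken>page-2</resumptionToken>'
           b'</ListIdentifiers></OAI-PMH>')
_PAGE_2 = (b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
           b'<header><identifier>example:2</identifier></header>'
           b'<resumptionToken/>'
           b'</ListIdentifiers></OAI-PMH>')

requested = []


def fake_fetch(url):
    """Stand-in for http_get that records each URL and serves canned pages."""
    requested.append(url)
    return _PAGE_2 if "resumptionToken=page-2" in url else _PAGE_1


identifiers = list(
    harvest_identifiers(fake_fetch, "https://organisation.org/oai/", "oai_lido")
)
```

Against a real endpoint, pass `http_get` instead of `fake_fetch`. Because `from` is a reserved word in Python, a date filter has to be passed as `**{"from": "2011-06-01T00:00:00Z"}`.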

The verbs and parameters can be combined to issue requests to the service such as:

  • https://organisation.org/oai/?verb=Identify
  • https://organisation.org/oai/?verb=ListIdentifiers&metadataPrefix=oai_lido
  • https://organisation.org/oai/?verb=ListRecords&from=2011-06-01T00:00:00Z&metadataPrefix=oai_lido
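
Request URLs like the ones above can also be assembled programmatically, which avoids quoting mistakes in timestamps and set names. This is a small sketch. The helper name `build_oai_url` and the trailing-underscore convention for `from_` are assumptions of this example, not part of the Datahub or the protocol.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
VERBS = {
    "Identify",
    "ListIdentifiers",
    "ListRecords",
    "ListSets",
    "ListMetadataFormats",
    "GetRecord",
}


def build_oai_url(base_url, verb, **params):
    """Combine a verb and its parameters into a request URL.

    A trailing underscore is stripped from parameter names so callers can
    write from_=... (from is a reserved word in Python).
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            query[name.rstrip("_")] = value
    # urlencode percent-encodes reserved characters such as ":" in timestamps.
    return base_url + "?" + urlencode(query)


url = build_oai_url(
    "https://organisation.org/oai/",
    "ListRecords",
    from_="2011-06-01T00:00:00Z",
    metadataPrefix="oai_lido",
)
```

The resulting URL matches the third request above, except that the colons in the timestamp are percent-encoded as `%3A`, which servers decode to the same value.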

Available Metadata Formats

The OAI-PMH endpoint supports the same metadata formats as the Media Types supported by the REST API. The ListMetadataFormats verb returns the list of available formats. The metadataPrefix parameter is mandatory for the GetRecord, ListRecords and ListIdentifiers verbs, and its value must be one of the listed prefixes.

foo@bar:~$ curl -i https://organisation.org/oai/?verb=ListMetadataFormats
HTTP/1.1 200 OK
Server: nginx
Content-Type: text/xml; charset=utf8
Connection: keep-alive
Date: Thu, 07 Feb 2019 06:43:00 GMT

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2019-02-07T06:43:00Z</responseDate>
  <request verb="ListMetadataFormats">https://organisation.org</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_lido</metadataPrefix>
      <schema>http://www.lido-schema.org/schema/v1.0/lido-v1.0.xsd</schema>
      <metadataNamespace>http://www.lido-schema.org/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
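
Before harvesting, a client may want to confirm that the format it needs is actually offered. The sketch below extracts the prefixes from a ListMetadataFormats response with the Python standard library. The function name `metadata_prefixes` is an assumption of this example, and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def metadata_prefixes(xml_bytes):
    """Return the metadataPrefix of every <metadataFormat> in the response."""
    root = ET.fromstring(xml_bytes)
    return [
        fmt.findtext(OAI + "metadataPrefix")
        for fmt in root.iter(OAI + "metadataFormat")
    ]


# Trimmed copy of the ListMetadataFormats response (illustrative sample data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_lido</metadataPrefix>
      <schema>http://www.lido-schema.org/schema/v1.0/lido-v1.0.xsd</schema>
      <metadataNamespace>http://www.lido-schema.org/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

prefixes = metadata_prefixes(SAMPLE)
supports_lido = "oai_lido" in prefixes
```

A harvester could stop early with a clear message when `supports_lido` is false, rather than waiting for the server to reject the metadataPrefix.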