DSpace Ref+Rioxx by 4Science

This repository contains the integration addon for REF+RIOXX

The original work is explained at https://github.com/atmire/RIOXX/blob/master/README.md

Note that the general concepts are the same, but 4Science simplified the installation procedure. The essential parts to read are the new patch installation procedure and how to disable REF.

Introduction

This documentation will help you deploy and configure the RIOXXv2 Application Profile for DSpace 5.10 and 6.3. The patch has been implemented in a generic way, using a maven artifact. This means that changes to your existing DSpace installation could be overridden by this procedure.

Areas of DSpace affected by the RIOXX patch

The following areas of the DSpace codebase are affected by the RIOXX patch:

  • Metadata Registries: a new RIOXX metadata registry will be added with a number of new fields. This does not affect your existing metadata schemas or items.
  • OAI Endpoint: a new RIOXX endpoint will become available in your OAI-PMH interface, in order to allow external harvesters to harvest your repository metadata in RIOXX compliant format.
  • SWORD V2 Endpoint: The SWORD V2 ingest will be improved to allow for RIOXX compliant SWORD V2 ingests into DSpace.

It is important to realize that your existing item metadata and item display pages will NOT be modified as part of the RIOXX patch.

Areas of DSpace that have to be manually configured after applying the patch

Submission forms: the configuration file that defines your submission forms, input-forms.xml, needs to be extended with a number of new entry options.

Because the vast majority of institutions make at least small tweaks to the submission forms, there is no opportunity to apply a patch to a standardized file. A template submission form file where the new REF and RIOXX fields are highlighted can be found on Github:

https://github.com/4Science/rioxxintegration/tree/master/example/input-forms.xml
https://github.com/4Science/rioxxintegration/tree/master/example/item-submission.xml

See rioxxterms.* fields + value pairs + dcterms.dateAccepted

Please note that we have also provided examples of REF only and RIOXX only input forms and submission configuration.

REF only:
https://github.com/4Science/rioxxintegration/tree/master/example/input-forms-ref.xml
https://github.com/4Science/rioxxintegration/tree/master/example/item-submission-ref.xml

RIOXX only:
https://github.com/4Science/rioxxintegration/tree/master/example/input-forms-rioxx.xml
https://github.com/4Science/rioxxintegration/tree/master/example/item-submission-rioxx.xml

Compatibility with the REF patch

The RIOXX patch also contains the REF patch. Both are installed at the same time, but you can choose to disable the REF feature.

To disable the REF feature, set the following configuration properties to false:

  1. https://github.com/4Science/rioxxintegration/tree/master/example/modules/rioxx.cfg#L9
  2. https://github.com/4Science/rioxxintegration/tree/master/example/modules/item-compliance.cfg#L32

Manually change this configuration:

  1. From https://github.com/4Science/rioxxintegration/tree/master/example/input-forms.xml remove the metadata: refterms.panel and refterms.dateFirstOnline
  2. From https://github.com/4Science/rioxxintegration/tree/master/example/item-submission.xml remove the step: REFExceptionStep and REFComplianceStep
  3. If you enabled the XML Workflow please remove https://github.com/4Science/rioxxintegration/tree/master/example/workflow.xml#L38 and https://github.com/4Science/rioxxintegration/tree/master/example/workflow.xml#L56

Metadata mapping

Before diving into the process of installing the RIOXX patch, it is crucial that you take note of the specific DSpace=>RIOXX metadata mapping that this patch implements. Your use of the different dc and dcterms fields in DSpace may be different from a standard installation, in which case you may need to do some additional activities before or after applying the patch.

The following table lists the different metadata elements, according to the order specified in http://rioxx.net/v2-0-final/.
The DSpace metadata column indicates where the corresponding RIOXX elements are stored in the DSpace metadata.

Existing fields from the dc and dcterms namespace were used where possible. A number of new fields were added in a dedicated rioxxterms metadata registry.

General DSpace to RIOXXTERMS metadata mapping

| DSpace metadata | RIOXX element | Example DSpace value | Example RIOXX value |
|---|---|---|---|
| Bitstream metadata | ali:free_to_read | See separate table with bitstream derived metadata below. | See separate table with bitstream derived metadata below. |
| dc.rights.uri | ali:license_ref | http://creativecommons.org/licenses/by/3.0/igo/ | <ali:license_ref start_date="2015-01-20">http://creativecommons.org/licenses/by/3.0/igo/</ali:license_ref> |
| dc.date.issued | ali:license_ref:startdate | 2015-01-20 | <ali:license_ref start_date="2015-01-20">http://creativecommons.org/licenses/by/3.0/igo/</ali:license_ref> |
| dc.coverage | dc:coverage | Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W | <dc:coverage>Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W</dc:coverage> |
| dc.description.abstract | dc:description | example item | <dc:description>example item</dc:description> |
| Bitstream metadata | dc:format | See separate table with bitstream derived metadata below. | See separate table with bitstream derived metadata below. |
| Bitstream metadata | dc:identifier | See separate table with bitstream derived metadata below. | See separate table with bitstream derived metadata below. |
| rioxxterms.openaccess.uri | dc:identifier | See separate table with bitstream derived metadata below. | See separate table with bitstream derived metadata below. |
| dc.language.iso | dc:language | en-GB | <dc:language>en-GB</dc:language> |
| dc.publisher | dc:publisher | PLOS ONE | <dc:publisher>PLOS ONE</dc:publisher> |
| dc.relation.uri | dc:relation | http://datadryad.org/resource/doi:10.5061/dryad.tg469 | <dc:relation>http://datadryad.org/resource/doi:10.5061/dryad.tg469</dc:relation> |
| dc.identifier.isbn | dc:source | 0-14-020652-3 | <dc:source>0-14-020652-3</dc:source> |
| dc.identifier.issn | dc:source | 1456-2979 | <dc:source>1456-2979</dc:source> |
| dc.subject | dc:subject | example | <dc:subject>example</dc:subject> |
| dc.title | dc:title | Title:Subtitle | <dc:title>Title:Subtitle</dc:title> |
| dcterms.dateAccepted | dcterms:dateAccepted | 2015-02-10 | <dcterms:dateAccepted>2015-02-10</dcterms:dateAccepted> |
| rioxxterms.apc | rioxxterms:apc | paid | <rioxxterms:apc>paid</rioxxterms:apc> |
| dc.contributor.author (first) | rioxxterms:author (+ attribute "first-named-author=true") | Lawson, Gerry | <rioxxterms:author id="http://orcid.org/0000-0002-1395-3092" first-named-author="true">Lawson, Gerry</rioxxterms:author> |
| dc.contributor.author (others) | rioxxterms:author | Lawson, Gerry | <rioxxterms:author id="http://orcid.org/0000-0002-1395-3092" first-named-author="false">Lawson, Gerry</rioxxterms:author> |
| dc.contributor.* (non authors) | rioxxterms:contributor | Lawson, Gerry | <rioxxterms:contributor id="http://orcid.org/0000-0002-1395-3092">Lawson, Gerry</rioxxterms:contributor> |
| rioxxterms.identifier.project | rioxxterms:project | EP/K023195/1 | <rioxxterms:project rioxxterms:funder_name="Engineering and Physical Sciences Research Council" rioxxterms:funder_id="http://dx.doi.org/10.13039/501100000266">EP/K023195/1</rioxxterms:project> |
| rioxxterms.funder | rioxxterms:project | Engineering and Physical Sciences Research Council | <rioxxterms:project rioxxterms:funder_name="Engineering and Physical Sciences Research Council" rioxxterms:funder_id="http://dx.doi.org/10.13039/501100000266">EP/K023195/1</rioxxterms:project> |
| rioxxterms.funder.project | rioxxterms:project | d90b33fb16bbac756120dd85cbad3940 (NOTE: This is an internal key identifying the project; it will be hidden by default for non-admin users) | <rioxxterms:project rioxxterms:funder_name="Engineering and Physical Sciences Research Council" rioxxterms:funder_id="http://dx.doi.org/10.13039/501100000266">EP/K023195/1</rioxxterms:project> |
| dc.date.issued | rioxxterms:publication_date | 2015-02-15 | <rioxxterms:publication_date>2015-02-15</rioxxterms:publication_date> |
| rioxxterms.type with dc.type fallback | rioxxterms:type | Book | <rioxxterms:type>Book</rioxxterms:type> |
| rioxxterms.version | rioxxterms:version | AO | <rioxxterms:version>AO</rioxxterms:version> |
| rioxxterms.versionofrecord | rioxxterms:version_of_record | http://dx.doi.org/10.1006/jmbi.1995.0238 | <rioxxterms:version_of_record>http://dx.doi.org/10.1006/jmbi.1995.0238</rioxxterms:version_of_record> |

RIOXX metadata derived from DSpace Bitstream metadata

Because DSpace supports multiple files attached to a single metadata record, there is a split between information stored in the metadata record and information stored with the bitstreams.

For the three fields shown in the table below, data is retrieved from the bitstream metadata of the bitstream indicated as the "primary bitstream" ("embargo" information is currently retrieved from all bitstreams of the ORIGINAL bundle). The logic for selecting the primary bitstream has been improved by adding the option of identifying it by matching the filename against a regex pattern (see https://github.com/4Science/rioxxintegration/tree/master/example/modules/oai.cfg#L120). By default, if the user manually selected a bitstream as primary, the RIOXX OAI context will honour that. Otherwise, the bitstream filenames are compared to the regex and, if a match is found, that bitstream is designated the primary. If there is no regex match, the primary bitstream defaults to the first in the list of uploaded files.
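The selection order described above can be sketched as follows. This is a minimal illustration, not the patch's actual code: the Bitstream class, the select_primary function and the example pattern are hypothetical names, and treating the regex as a partial match (re.search) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Bitstream:
    """Hypothetical stand-in for a DSpace bitstream in the ORIGINAL bundle."""
    name: str
    marked_primary: bool = False  # True when the user manually chose it


def select_primary(bitstreams: Sequence[Bitstream],
                   pattern: Optional[str] = None) -> Optional[Bitstream]:
    """Pick the primary bitstream: manual choice, then regex, then first file."""
    if not bitstreams:
        return None
    # 1. A manually selected primary bitstream always wins.
    for bitstream in bitstreams:
        if bitstream.marked_primary:
            return bitstream
    # 2. Otherwise compare filenames to the configured regex (assumed partial match).
    if pattern:
        regex = re.compile(pattern)
        for bitstream in bitstreams:
            if regex.search(bitstream.name):
                return bitstream
    # 3. No match: fall back to the first uploaded file.
    return bitstreams[0]
```

With files named cover.jpg, article_accepted.pdf and data.csv and a pattern such as accepted.*\.pdf$, the sketch selects article_accepted.pdf; without a pattern it falls back to cover.jpg.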

| DSpace bitstream | RIOXX element | Example DSpace value | Example RIOXX value |
|---|---|---|---|
| format | dc:format | application/pdf | <dc:format>application/pdf</dc:format> |
| url | dc:identifier | https://example.com/dspace/bitstream/123456789/10/1/example.pdf | <dc:identifier>https://example.com/dspace/bitstream/123456789/10/1/example.pdf</dc:identifier> |
| embargo | ali:free_to_read | 2015-08-27 | <ali:free_to_read start_date="2015-08-27"></ali:free_to_read> |

The RIOXX patch relies on the activation of the standard DSpace embargo functionality, and will read the date for ali:free_to_read from the resource policy set on the bitstream.
Currently, there is no specific support for end_date, on the assumption that once access is open, there is no specific use case for closing it again.

dc:source mandatory where applicable

The RIOXX specification states that dc:source is mandatory where applicable. The DSpace RIOXX patch does not currently enforce this: ISSN and ISBN are merely provided in the crosswalk when they are filled out.
In the standard DSpace submission form, ISBN and ISSN can be provided in a field for identifiers, which has a dropdown where the user first needs to select the identifier type.

If you are primarily collecting materials for which an ISSN applies, it is recommended to use a separate, custom field for ISSN that fills dc.identifier.issn, and make that field mandatory.

dc:type fallback for rioxxterms:type

There is a substantial overlap between the vocabulary for rioxxterms:type and the standard list for dc.type. To ensure all of the rioxxterms types are available to your submitters, it is recommended to put a specific rioxxterms.type field in place that uses the specific vocabulary.

However, in case rioxxterms.type is absent in your items, the OAI-PMH crosswalk provides a basic mapping between dc.type and rioxxterms:type for those types that can be unambiguously mapped:

| DSpace type | RIOXX type |
|---|---|
| Article | Journal Article/Review |
| Book | Book |
| Book chapter | Book chapter |
| Technical Report | Technical Report |
| Thesis | Thesis |
| Working Paper | Working paper |
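The fallback above amounts to a simple lookup. The sketch below is illustrative only: the function name resolve_rioxx_type is hypothetical, and the exact (case-sensitive) string comparison is an assumption about how the crosswalk matches dc.type values.

```python
from typing import Optional

# Mapping taken from the table above: dc.type value -> rioxxterms:type value.
DC_TYPE_TO_RIOXX_TYPE = {
    "Article": "Journal Article/Review",
    "Book": "Book",
    "Book chapter": "Book chapter",
    "Technical Report": "Technical Report",
    "Thesis": "Thesis",
    "Working Paper": "Working paper",
}


def resolve_rioxx_type(rioxx_type: Optional[str],
                       dc_type: Optional[str]) -> Optional[str]:
    """Prefer an explicit rioxxterms.type; otherwise map dc.type if possible."""
    if rioxx_type:
        return rioxx_type
    if dc_type:
        # Types that cannot be mapped unambiguously yield no value.
        return DC_TYPE_TO_RIOXX_TYPE.get(dc_type)
    return None
```

An item with only dc.type "Article" would be exposed as "Journal Article/Review", while a dc.type outside the table yields no rioxxterms:type in this sketch.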

fundref id for funders and orcid id for authors

Fundref DOIs for funders and ORCID iDs for authors are NOT stored in the actual metadata value for the fields above. The metadata values only contain the string representations of funders and authors.
The RIOXX OAI-PMH crosswalk retrieves the ORCID iDs for authors and Fundref IDs for funders from the DSpace SOLR Authority cache. This feature was added in DSpace 5, but was backported to DSpace 3.x and 4.x as part of the RIOXX patch.

Right now, this only affects institutions that use the XMLUI, since the JSPUI has no web UI yet for working with this authority cache. However, JSPUI institutions are still compliant with RIOXX as the string representations of funder and author are included in the RIOXX OAI-PMH crosswalk.

multiple funders and project

The item submission has been updated with a new step called projects. This step allows the submitter to associate their submission with one or more projects. Each of these projects is associated with a funder.

Using the 'Lookup Project' button, the submitter can look up projects that are already associated with another submission. When a project is selected, the associated funder will be automatically filled out as well.

If a project was not entered before, the submitter can create a new project. The new project's identifier must be filled out in the project input field, and a funder to associate with the new project must be selected by using the 'Lookup Funder' button.

It is not possible to create a new funder during the submission, only existing funders can be selected. Refer to section 5. Load Fundref authority data in the Patch Installation procedures to learn how to load funder data into DSpace.

configuration

The behaviour of this new submission step can be configured in dspace/config/modules/rioxx.cfg.

Property submission.funder.required is used to configure if at least one project funder pair should be filled in before continuing to the next submission step.

submission.funder.required = true

Property submission.funder.enableDcSponsorship is used to enable the addition of sponsorships or other sources of funding that do not provide a formal project or grant ID. If this property is enabled a free text field will be available in the project step. This free text field is not authority controlled.

It is also possible to configure a default project funder pair to be used when the submitter did not select any project funder pairs before finishing the projects step. Properties authority.default.funder, authority.default.funderID and authority.default.project must all three be filled in for the default project funder pair to be added automatically.

authority.default.funder = Default funder
authority.default.funderID = 10.99999/999999999
authority.default.project = Default project

authority.default.funder is the name of the default funder. authority.default.funderID is the ID of the default funder. authority.default.project is the name of the default project.
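The "all three must be filled in" rule can be sketched as below. This is an illustration under stated assumptions: the function name and the plain dictionary standing in for rioxx.cfg are hypothetical, and treating whitespace-only values as empty is an assumption.

```python
from typing import Mapping, Optional, Tuple

# The three properties that together define the default project funder pair.
DEFAULT_KEYS = (
    "authority.default.project",
    "authority.default.funder",
    "authority.default.funderID",
)


def default_project_funder_pair(
        config: Mapping[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (project, funder, funder ID), or None if any property is empty."""
    values = [(config.get(key) or "").strip() for key in DEFAULT_KEYS]
    if not all(values):
        # A partially configured default is ignored entirely.
        return None
    project, funder, funder_id = values
    return project, funder, funder_id
```

With the three example properties above, the sketch returns ("Default project", "Default funder", "10.99999/999999999"); removing any one of them disables the default.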

warning messages during submission

As described in the multiple funders and projects configuration, different combinations of settings determine whether a funder is required and what values to use as a default. Depending on which combination is configured, a specific warning message will be shown.

The following rules are currently in place to set these warning messages:

| Combination | Warning message |
|---|---|
| Project and funder combination is required, and a default funder/project is configured | Caution: Without manually selecting a project or funder, this submission will receive project ID "{0}" and funder "{1}". NOTE: {0} and {1} are the project and funder configured in dspace/config/modules/rioxx.cfg |
| Project and funder combination is required, no defaults configured | Caution: Without manually selecting a project or funder, this submission will not receive the required project ID and funder. Please make sure to complete these fields using the provided lookup. |
| Project and funder combination not required, no defaults configured | Caution: Without manually selecting a project or funder, this submission will not receive a project ID or funder. If this submission is desired to be RIOXX compliant, please make sure to complete these fields using the provided lookup. |
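The rules in the table can be read as a small decision function. The sketch below is hypothetical (function and constant names are not from the patch). The table does not cover the case where a funder is not required but defaults are configured, so returning no warning for that case is an assumption.

```python
from typing import Optional

WARN_DEFAULT_USED = (
    'Caution: Without manually selecting a project or funder, this submission '
    'will receive project ID "{0}" and funder "{1}".'
)
WARN_REQUIRED_NO_DEFAULT = (
    "Caution: Without manually selecting a project or funder, this submission "
    "will not receive the required project ID and funder. Please make sure to "
    "complete these fields using the provided lookup."
)
WARN_OPTIONAL_NO_DEFAULT = (
    "Caution: Without manually selecting a project or funder, this submission "
    "will not receive a project ID or funder. If this submission is desired to "
    "be RIOXX compliant, please make sure to complete these fields using the "
    "provided lookup."
)


def submission_warning(required: bool,
                       default_project: Optional[str],
                       default_funder: Optional[str]) -> Optional[str]:
    """Choose the warning for the configured combination (None if not listed)."""
    has_default = bool(default_project and default_funder)
    if required and has_default:
        return WARN_DEFAULT_USED.format(default_project, default_funder)
    if required:
        return WARN_REQUIRED_NO_DEFAULT
    if not has_default:
        return WARN_OPTIONAL_NO_DEFAULT
    return None  # combination not described in the table above (assumption)
```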

Edit funders page

Additional functionality has been added to edit the funding of an already archived item. This enables users with the proper rights to add or remove project and funder pairs from the item.

This can be accessed on the "Edit item" page present in the user's context. This page contains a new "Item Funding" tab for adding and removing project/funder pairs. It does not, however, assume that a default funder and project should be used when no project/funder pair is given. It is the end user's responsibility to ensure the integrity of the item's metadata.

The rest of this new page is used in the same way as the normal "ProjectStep" during submission. A user can select a project using the provided lookup button, which will also autocomplete the appropriate funder. If a user wants to enter a new project, they can enter one manually and add a funder using the lookup. (Empty values are prohibited during this addition, as the default project/funder is disabled.)

license reference ali:license_ref

The input forms customisations provide an input field to specify the license reference that is exposed by RIOXX. This input field uses metadata field rioxxterms.licenseref.uri to store the license reference.

The Creative Commons license submission step has been enabled to provide a fallback for the custom rioxxterms.licenseref.uri field. The license selected in this step is stored in metadata field dc.rights.uri.

If a DSpace item does not have a rioxxterms.licenseref.uri value, the dc.rights.uri value is used as fallback.

A DSpace item will not be available in RIOXX if both metadata fields rioxxterms.licenseref.uri and dc.rights.uri are empty.
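The fallback order for the license reference can be sketched as follows. The function name and the dictionary-of-lists representation of item metadata are hypothetical. Only the field names and their precedence come from the text above.

```python
from typing import List, Mapping, Optional

# Fields checked in order of precedence for ali:license_ref.
LICENSE_FIELDS = ("rioxxterms.licenseref.uri", "dc.rights.uri")


def resolve_license_ref(metadata: Mapping[str, List[str]]) -> Optional[str]:
    """Return the license URI to expose, or None if both fields are empty."""
    for field in LICENSE_FIELDS:
        for value in metadata.get(field, []):
            if value and value.strip():
                return value.strip()
    # No license reference: the item would not be available in RIOXX.
    return None
```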

date completion

The RIOXX specification requires dates to be in format YYYY-MM-DD. When a DSpace metadata field contains a shorter date in format YYYY-MM or YYYY, the RIOXX crosswalks will complete the date into the full format required by RIOXX.

Examples:

  • dc.date.issued "2015" in DSpace becomes "2015-01-01" when it is exposed in RIOXX as ali:license_ref:start_date.
  • dcterms.dateAccepted "2014-05" in DSpace becomes "2014-05-01" when it is exposed in RIOXX as dcterms:dateAccepted.
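A minimal sketch of this date completion is shown below. The function name complete_date is hypothetical, and rejecting values that are not YYYY, YYYY-MM or YYYY-MM-DD with an error is an assumption. The text above only specifies that missing parts are filled with "01".

```python
import re

# Accepts YYYY, YYYY-MM or YYYY-MM-DD.
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def complete_date(value: str) -> str:
    """Pad a partial date to the YYYY-MM-DD format required by RIOXX."""
    match = _PARTIAL_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    year, month, day = match.groups()
    # Missing month and day default to the first of the period.
    return f"{year}-{month or '01'}-{day or '01'}"
```

For the two examples above, complete_date("2015") gives "2015-01-01" and complete_date("2014-05") gives "2014-05-01".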

SWORD V2 configuration

The DSpace SWORD V2 interface is designed to work optimally with a specially designed XML schema allowing for unambiguous transmission of information such as licensing and funding.

An example XML input file can be found on https://github.com/jisc-services/Public-Documentation/blob/master/PublicationsRouter/sword-out/DSpace-RIOXX-XML.md.

The configuration for the RIOXX SWORD V2 mapping can be found in dspace/config/modules/swordv2-server.cfg.

The RIOXX metadata mapping configuration in this file can be recognized by the 'simplerioxx' prefix. This prefix is a reference to the Simple RIOXX ingester which is added to DSpace by the RIOXX patch to allow RIOXX compliant SWORD V2 ingests.

SWORD V2 RIOXX Mapping overview

simplerioxx.dcterms.description = dc.description
simplerioxx.dcterms.publisher = dc.publisher
simplerioxx.dcterms.title = dc.title
simplerioxx.rioxxterms.type = rioxxterms.type
simplerioxx.dcterms.language = dc.language.iso
simplerioxx.dcterms.abstract = dc.description.abstract
simplerioxx.rioxxterms.version_of_record = rioxxterms.versionofrecord, dc.identifier.doi
simplerioxx.dcterms.subject = dc.subject
simplerioxx.dcterms.dateAccepted = dcterms.dateAccepted
simplerioxx.rioxxterms.publication_date = dc.date.issued
simplerioxx.pubr.author = dc.contributor.author
simplerioxx.pubr.contributor = dc.contributor
simplerioxx.ali.license_ref = dc.rights.uri
simplerioxx.dcterms.rights = dc.rights
simplerioxx.pubr.embargo_date = dc.rights.embargodate
simplerioxx.rioxxterms.project = workflow.newfunderprojectpair
simplerioxx.rioxxterms.version = rioxxterms.version
simplerioxx.pubr.sponsorship = dc.description.sponsorship
simplerioxx.pubr.openaccess_uri = rioxxterms.openaccess.uri

Please note that if you are already using the simpledc mappings from the same configuration file for your SWORD deposits, those mappings will still be applied unless they conflict with the simplerioxx mappings: if the same metadata field is involved in both a simpledc mapping and a simplerioxx mapping, the simplerioxx mapping has priority.

SWORD V2 Project/Funder ingestion

The XML schema allows for project/funder information to be supplied in two XML elements:

<rioxxterms:project funder_name="Some Funder Name" funder_id="Identifier-URL">Project/Grant-Number</rioxxterms:project>
<pubr:sponsorship>Funder: Some Funder Name, Grant no: Project/Grant-Number, Funder ID:  Identifier-URL </pubr:sponsorship>

rioxxterms:project
The RIOXX patch will attempt to match the rioxxterms:project details against funders in the fundref-registry (see 5. Load Fundref authority data), first on funder_id and, as a fallback, on funder_name. If a match is found, the DSpace metadata fields rioxxterms.identifier.project, rioxxterms.funder and rioxxterms.funder.project will be filled with, respectively, the project/grant number, the funder name, and the internal key registered in DSpace for this project. If no match can be found, the metadata field rioxxterms.newfunderprojectpair will be filled with the full details, with the intention that these are curated by a repository manager/reviewer and the funder details are manually added to the registry.
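The matching order (funder_id first, funder_name as fallback) can be illustrated with the sketch below. The Funder class and match_funder function are hypothetical, the registry is modelled as a plain list, and exact string comparison is an assumption about how the patch compares values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Funder:
    """Hypothetical stand-in for a funder loaded into the authority cache."""
    funder_id: str
    name: str


def match_funder(registry: Iterable[Funder],
                 funder_id: Optional[str],
                 funder_name: Optional[str]) -> Optional[Funder]:
    """Match on funder_id first, then fall back to funder_name."""
    funders = list(registry)
    if funder_id:
        for funder in funders:
            if funder.funder_id == funder_id:
                return funder
    if funder_name:
        for funder in funders:
            if funder.name == funder_name:
                return funder
    # No match: the details would be kept for manual curation.
    return None
```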

rioxxterms.newfunderprojectpair
This metadata field is created when no funder is found in the DSpace Funder Registry that matches that supplied via SWORD in the rioxxterms:project XML element. The value of this field is filled in with the information received via SWORD in this format:

funder-ID::funder-name::project-code

Note that it is possible to receive a project-code without any funder details. In that case this would be presented as:

null::null::project-code
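For curation scripts, the value format shown above is easy to build and parse. The sketch below is illustrative: the FunderProjectPair type and both function names are hypothetical, and it assumes that none of the three parts contains the "::" separator.

```python
from typing import NamedTuple, Optional

SEPARATOR = "::"


class FunderProjectPair(NamedTuple):
    funder_id: Optional[str]
    funder_name: Optional[str]
    project_code: str


def format_pair(pair: FunderProjectPair) -> str:
    """Build 'funder-ID::funder-name::project-code', using 'null' for gaps."""
    parts = (pair.funder_id, pair.funder_name, pair.project_code)
    return SEPARATOR.join(part if part else "null" for part in parts)


def parse_pair(value: str) -> FunderProjectPair:
    """Split a stored value back into its three parts."""
    parts = value.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"expected three '::'-separated parts: {value!r}")
    funder_id, funder_name, project_code = (
        None if part == "null" else part for part in parts
    )
    if project_code is None:
        raise ValueError("project code is missing")
    return FunderProjectPair(funder_id, funder_name, project_code)
```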

pubr:sponsorship
The contents of the pubr:sponsorship element are always added to the DSpace metadata field dc.description.sponsorship as a textual description of the funding (without modification by the ingester) and will never be exposed on the RIOXX OAI endpoint.

SWORD V2 Author attributes

The XML schema allows for additional author attributes to be supplied in the pubr:contributor and pubr:author XML elements:

<pubr:contributor id="http://orcid.org/0000-0002-8257-7777" email="[email protected]">Smith, John </pubr:contributor>
<pubr:author id="http://orcid.org/0000-0002-8257-4088" email="[email protected]">Vernoux, Teva </pubr:author>

Please note that these attributes will not be stored as metadata in the ingested item itself. However, if the corresponding DSpace fields (see SWORD V2 RIOXX Mapping overview) are defined in your repository in the authority core, they will be stored as attributes of the author record within the SOLR core. This means that these attributes are available when using the author lookup in a manual submission, for example.

SWORD V2 Example Ingestion with Curl

Step 1 : Ingest metadata

curl -v -i <*your DSpace repository*>/swordv2/collection/<*collection in which the item should be ingested*> --data-binary "@<*your xml MD file*>" -H "Content-Type: application/atom+xml" -H "In-Progress: true" --user "<*e-mail address submitter*>"

Step 2 : Ingest bitstreams

curl -v -i <*your DSpace repository*>/edit-media/3 --data-binary "@<*zip file containing the bitstreams*>" -H "Content-Type: application/zip" -H "Packaging: http://purl.org/net/sword/package/SimpleZip" -H "Content-Disposition:filename=<*zip file containing the bitstreams*>"  --user "<*e-mail address submitter*>"

Step 3 : Finish submission

curl -X POST -v -i <*your DSpace repository*>/edit/3 -H "In-Progress: false" -H "content-length: 0" --user "<*e-mail address submitter*>"

Patch Installation Procedures

Prerequisites

4Science has released a new, simplified addon built as a Maven module. The artifacts for DSpace 5.10 and DSpace 6.3 are publicly available in the 4Science Nexus repository.

Important note: if you use the default DSpace 5.10 or DSpace 6.3 versions, you have to IMMEDIATELY UPGRADE to the latest line of development (5.11-SNAPSHOT or 6.4-SNAPSHOT). This is required because these versions may malfunction due to dependencies (e.g. upgrades of the Bower, JRuby and SASS dependencies) or changed third-party policies (e.g. the GeoLite database feature for geolocation points).

To be able to install the patch, you will need the following prerequisites:

  • A running DSpace 5.10 or 6.3 instance
  • Git should be installed on the machine to apply the prerequisite patch.

RIOXX Integration addon

After upgrading your DSpace to 5.11-SNAPSHOT or 6.4-SNAPSHOT (the release date of the stable versions is not yet known), you can install the patch, which downloads the RIOXX Integration during the default DSpace build procedure. The patch updates the Maven POM files to retrieve and install the RIOXX code customizations.

1. Run the pre-requisite Git command.

Run the following command where <patch file> needs to be replaced with the name of the patch:

git apply --check <patch file>

This command will return whether it is possible to apply the patch to your installation. This should pose no problems where DSpace is not customized or where not many customizations are present.

If the check is successful, the patch can be installed without any problems. Otherwise, you will have to merge some changes manually.

To apply the patch, the following command should be run where <patch file> is replaced with the name of the patch file.

git apply --whitespace=nowarn --reject <patch file>

This command tells git to apply the patch and ignore harmless whitespace issues. The --reject flag instructs the command to continue when conflicts are encountered and to save the problematic code chunks to .rej files so you can review and apply them manually later on. Before continuing to the next step, you have to resolve all merge conflicts indicated by the .rej files. After resolving the merge conflicts, remove all the .rej files.

For example, to install the 5.x RIOXX Integration plugin on top of a 5.10 DSpace version, run:

git checkout dspace-5.10
git apply --whitespace=nowarn --reject <jisc-from-5_10-to-5_11-patch.diff>
git apply --whitespace=nowarn --reject <jisc-5_11-SNAP-patch.diff>

2. Rebuild and redeploy your repository

After the patch has been applied, the repository will need to be rebuilt.
DSpace repositories are typically built using Maven and deployed using Ant.

Important note: RIOXX and REF Integration are fully supported only with XMLUI Mirage2 and XML Workflow.

To build the application, use:

mvn clean -U package -Dmirage2.on=true

The new RIOXX integration will be downloaded automatically from the 4Science Nexus repository and installed by the build procedure (Maven/Ant).

The new metadata fields needed by RIOXX integration are automatically installed during the startup of the application thanks to the Flyway utility.

If you are not seeing the fields in your registry, you can import the rioxx fields manually by executing:

dspace/bin/dspace dsrun org.dspace.administer.MetadataImporter -f <dspace.dir>/config/registries/rioxxterms-types.xml -u

3. Restart Tomcat

After the repository has been rebuilt and redeployed, Tomcat will need to be restarted to bring the changes to production.

4. Populate the RIOXX OAI-PMH end point

To populate the RIOXX end point, used for harvesting, run the following command:

[dspace]/bin/dspace oai import -c

This will populate the RIOXX OAI endpoint, which will be available at:

<server-url>/oai/rioxx?verb=ListRecords&metadataPrefix=rioxx

If you want to avoid multiple manual executions of this script during testing, you can always add it to your scheduled tasks (crontab), and have it execute every hour or every 15 minutes.

Do note that the more items your repository contains, the more resource intensive this task is. Be careful scheduling this task frequently on production systems! On production systems we still highly recommend a daily frequency.

5. Load Fundref authority data

From DSpace 5 there is a new SOLR based infrastructure for authority control, originally used for storing authority data from ORCID. For RIOXX, this infrastructure was used to hold Fundref authority data.
Even though the SOLR core with authority data can be enabled for JSPUI, there is no support yet for lookup in this registry through the submission forms in JSPUI.

As the source, DSpace relies on the RDF file published by Crossref at:
http://dx.doi.org/10.13039/fundref_registry

More information about this file is available at:
http://help.crossref.org/fundref-registry

Download this file to your DSpace server.

Important note: since the file is large, you may need to give more memory to your "dspace" script. Open the "dspace" file for editing and find the "-Xmx" parameter. Set it to "-Xmx2G" to give the script 2 gigabytes of memory.

The "PopulateFunderAuthorityFromXML" script will add new funders as authorities for inclusion in rioxxterms.project, where funder and project id are exposed.
If you are executing the script for the first time, your SOLR authority cache will be loaded with all funders present in the fundref export.
After that, you can use the same script when there is a new release of the fundref export. In this case, new funders will be added and information for previously added funders will be updated.

To run the script:

./dspace dsrun org.dspace.scripts.PopulateFunderAuthorityFromXML -f {funder-authority-rdf}

arguments:
-f: The RDF XML file containing the funder authorities
-t: Test if the script works correctly. No changes will be applied.

Note: Using the above PopulateFunderAuthorityFromXML script is the only way to create funders in DSpace. If an item is ingested into DSpace, for example by using SWORD V2, and this item contains a funder project pair with a funder that does not yet exist in DSpace, then DSpace will not attempt to create this funder but will instead store the project funder pair in metadata field workflow.newfunderprojectpair.

Configure Submission forms or other metadata ingest mechanisms

Now that the new fields are present in your metadata schemas, you have to ensure that these fields can be filled. If your institution is relying on manual entry using the DSpace submission forms, you can go over the template input-forms.xml file on Github to see how the different new RIOXX fields can be included. If you are relying on automated ingests using SWORD or integrations with your CRIS system, you will likely need to customize the mapping and integration with those systems. This is beyond the scope of the patch and this documentation.

Note that simply adding the new RIOXX fields to the existing DSpace fields may create confusion for your end users. For example, the DSpace default "sponsor" field is similar to the RIOXX specific project and funder linking. Likewise, the "File Description" field that DSpace offers in the file upload dialog has a similar purpose to the RIOXX "version" field. It is recommended to go over your submission forms entirely to verify that it is clear for your end users which fields are used for which purpose. You may want to remove or repurpose existing DSpace default fields.

Verification

RIOXX Metadata Registry

As an administrator, navigate to the standard DSpace administrator page "Registries >> Metadata".
On this page, you should be able to see the new RIOXX metadata schema. When clicking on the link, you should see the different fields in the metadata schema. This new registry shouldn't be empty.

Submission forms based on template

This verification assumes that you have modified your input-forms.xml based on:

https://github.com/4Science/rioxxintegration/tree/master/example/input-forms.xml

Start the submission of a new item in a DSpace collection that uses our custom submission form config.
After the collection selection, a custom step is included to support adding multiple funders and project IDs. In this step you should be able to add this field:

  • rioxxterms:project: Funder lookup and project field

In the first screen of the next step, you should be able to find the following new fields:

  • ali:license_ref: license URI and License start date
    • The RIOXX spec supports the provision of multiple license refs and dates. In DSpace, we currently support only a single license URL and a single date. If multiple usage licenses apply, it is recommended to pick the most open one.
  • dcterms.dateAccepted
  • rioxxterms:version
  • rioxxterms:version_of_record (DOI)

Note that the template input-forms.xml does not add every single field defined in RIOXX. For many of the fields declared as optional, you will need to modify the submission forms yourself. The standard DSpace submission forms already have an excess of different fields, which is why not all RIOXX optional fields were added by default. Even though these fields are not yet in the submission form, they ARE being taken into account for the RIOXX OAI-PMH mapping. Please refer to the documentation of the mapping before enabling these fields in the submission form.

The following fields have to be included manually in the submission forms:

  • dc:coverage
  • dc:relation
  • rioxxterms:apc

Continue the submission and don't forget to attach a file in order to create your first RIOXX test item and verify that it is completely "archived" in the repository. You can check this by verifying if the item now appears in the list of "Recent Submissions" on the repository homepage.

OAI-PMH endpoint

Immediately after a new test item is available in the repository, it is NOT YET available in your OAI-PMH SOLR index.
Normally, you have a nightly scheduled task (cron job) that synchronizes the archived items in the repository, with the OAI-PMH index.

For your testing purposes, you will want to verify new test items immediately. To do this, you need to manually trigger the OAI indexing task that populates the RIOXX OAI-PMH endpoint, as described in step 4 of the installation process.

After you have done this, you should be able to see your newly archived RIOXX test item through the link:

<server-url>/oai/rioxx?verb=ListRecords&metadataPrefix=rioxx

If you don't see your item there, check the corresponding troubleshooting section below.
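If you prefer to script this check, a sketch along the following lines builds the harvesting URL and lists the record identifiers found in a ListRecords response. The function names are hypothetical. Only the URL pattern comes from this documentation, and the XML namespace is the standard OAI-PMH 2.0 namespace. Fetching the response (for example with urllib.request.urlopen) is left out so the example stays self-contained.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def rioxx_list_records_url(server_url: str) -> str:
    """Build <server-url>/oai/rioxx?verb=ListRecords&metadataPrefix=rioxx."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "rioxx"})
    return f"{server_url.rstrip('/')}/oai/rioxx?{query}"


def record_identifiers(response_xml: str) -> List[str]:
    """Extract the OAI identifiers of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = (f".//{{{OAI_NS}}}record/{{{OAI_NS}}}header/"
            f"{{{OAI_NS}}}identifier")
    return [element.text for element in root.findall(path)]
```

An empty list from record_identifiers means the endpoint did not expose any item in that response.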

Rioxxterms:project

There is a discrepancy between the examples listed in http://rioxx.net/v2-0-final/ and the XSD definition for the exposure of funder_name and funder_id at http://rioxx.net/schema/v2.0/rioxx/rioxxterms_.html#project

In the DSpace RIOXX OAI-PMH endpoint, we have chosen to follow the XSD and to expose the rioxxterms: namespace for the funder_name and funder_id attributes.

Troubleshooting

RIOXX test items are not visible in OAI-PMH endpoint

The RIOXX OAI-PMH endpoint has been developed in such a way that it only exposes items that are RIOXX compliant. An item will not appear there unless all of the following mandatory fields are present in the item:

  • ali:license_ref
  • dc:identifier that directly links to the attached bitstream (This can be both the bitstream as provided to DSpace or a URI to the full text publication hosted elsewhere)
  • dc:language
  • dc:title
  • dcterms:dateAccepted
  • rioxxterms:author
  • rioxxterms:project
  • rioxxterms:type
  • rioxxterms:version

According to the specification, dc:source is mandatory where applicable (ISSN or ISBN). Currently, DSpace does not enforce this in the OAI-PMH endpoint and will just expose ISSN or ISBN when they are present in the metadata.
Aside from these metadata fields, make sure that the item contains a bitstream (file), or a value in rioxxterms.openaccess.uri that links to the full text publication hosted elsewhere. Metadata records without a bitstream or open access URI will not be exposed through the RIOXX OAI-PMH endpoint.
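When troubleshooting, it can help to check an item's RIOXX view against the mandatory list above. The sketch below is a hypothetical helper, not part of the patch: it assumes you have already collected the item's RIOXX elements into a dictionary of value lists (dc:identifier being present only when a bitstream or open access URI exists).

```python
from typing import Dict, List

# Mandatory elements as listed in this troubleshooting section.
MANDATORY_ELEMENTS = (
    "ali:license_ref",
    "dc:identifier",
    "dc:language",
    "dc:title",
    "dcterms:dateAccepted",
    "rioxxterms:author",
    "rioxxterms:project",
    "rioxxterms:type",
    "rioxxterms:version",
)


def missing_elements(record: Dict[str, List[str]]) -> List[str]:
    """Return the mandatory RIOXX elements that have no non-empty value."""
    return [
        name
        for name in MANDATORY_ELEMENTS
        if not any(value and value.strip() for value in record.get(name, []))
    ]
```

An empty result suggests the item has every mandatory element. Any names returned are the first fields to fill in before re-running the OAI import.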