Allow creating a record on Dataverse directly from GitHub #24

Open
EdoardoCostantini opened this issue Nov 19, 2024 · 0 comments

At the university where I work, we are receiving more and more requests from researchers who would like to automatically create a Dataverse record based on the state of their GitHub repository. Currently, the dataverse-uploader GitHub action requires the researcher to:

  1. Create a data record on Dataverse and fill in the metadata manually
  2. Copy the DOI from Dataverse and add it to the workflow.yml file
  3. Run the GitHub workflow
  4. Go back to Dataverse and submit for review

This workflow makes the researcher go back and forth between GitHub and Dataverse many times. I would like to propose the addition of a feature to create a data record upon running the GitHub action for the first time. This would reduce the number of times the researcher needs to switch between Dataverse and GitHub, and it might also help the researcher by automatically filling in as much metadata as possible based on the information in the GitHub repository.

Proposed user interface

The DATAVERSE_DATASET_DOI field could be defined to be optional (not required).

  • If the DOI is provided, the action works as it currently does.
  • If the DOI is not provided, a new Dataverse record is created using as much information as possible from the GitHub repository to define the metadata, and then the action proceeds as always.
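The two cases above could be handled by a small helper like the one below. This is only a sketch: `resolve_dataset_doi` and the `create_dataset` callback are hypothetical names, not part of the current action, and the environment mapping is passed in explicitly so the logic is easy to test.

```python
from typing import Callable, Mapping


def resolve_dataset_doi(
    env: Mapping[str, str],
    create_dataset: Callable[[], str],
) -> str:
    """Return the user-supplied DOI, or create a new record when none is set.

    `create_dataset` is a hypothetical callback that creates the Dataverse
    record and returns its DOI; it is only called when the DOI is missing.
    """
    doi = env.get("DATAVERSE_DATASET_DOI", "").strip()
    if doi:
        # DOI provided: behave exactly as the action does today.
        return doi
    # DOI missing or blank: create a new record and use its DOI instead.
    return create_dataset()
```

In the action itself, `env` would be `os.environ` and `create_dataset` would wrap the API request described under "Proposed implementation".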

Proposed implementation

For example, the action could follow these steps:

  1. Create and populate a temporary "metadata.json" file based on the GitHub context (which is needed to create a record with given metadata)
  2. Make the API request to create a new record based on a given "metadata.json" file. Something like:
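import os
import requests

# SERVER_URL is assumed to be a bare host name (no scheme); PARENT is the
# alias of the Dataverse collection that will hold the new record.
server_url = os.getenv('SERVER_URL', '')
parent = os.getenv('PARENT', '')

headers = {
    'X-Dataverse-key': os.getenv('API_TOKEN', ''),
    'Content-Type': 'application/json',
}

# Read the metadata prepared in step 1.
with open('metadata.json', 'rb') as f:
    data = f.read()

# Use HTTPS so the API token is not sent in clear text.
response = requests.post(
    f'https://{server_url}/api/dataverses/{parent}/datasets',
    headers=headers,
    data=data,
    timeout=30,
)
response.raise_for_status()  # stop early if the record was not created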
  3. Extract the DOI of the newly generated Dataverse record from the response object and use it as if it had been provided by the user in the DATAVERSE_DATASET_DOI field.
  4. Maybe it would even be possible to have the workflow replace the missing value of the DATAVERSE_DATASET_DOI field with the newly created DOI. This would ensure that no new Dataverse record is created if the action is rerun.
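
Steps 1 and 3 could look roughly like the sketch below. It assumes the dataset JSON layout used by the Dataverse native API (a `citation` metadata block with `title`, `author`, `datasetContact`, `dsDescription` and `subject` fields) and a response body that carries the DOI under `data.persistentId`. The helper names, the `contact_email` parameter, and the default subject "Other" are illustrative assumptions; the GitHub context does not expose a contact email, so it would have to come from an extra input.

```python
import json
from typing import Any, Dict, Mapping


def primitive(type_name: str, value: str) -> Dict[str, Any]:
    """Build a single-valued primitive metadata field."""
    return {"typeName": type_name, "typeClass": "primitive",
            "multiple": False, "value": value}


def compound(type_name: str, children: Mapping[str, str]) -> Dict[str, Any]:
    """Build a compound field holding one entry made of primitive children."""
    entry = {name: primitive(name, value) for name, value in children.items()}
    return {"typeName": type_name, "typeClass": "compound",
            "multiple": True, "value": [entry]}


def build_metadata(env: Mapping[str, str], contact_email: str,
                   subject: str = "Other") -> Dict[str, Any]:
    """Derive minimal dataset metadata from GitHub's default environment variables."""
    repository = env.get("GITHUB_REPOSITORY", "")  # e.g. "owner/repo"
    owner, _, name = repository.partition("/")
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    fields = [
        primitive("title", name),
        compound("author", {"authorName": owner}),
        compound("datasetContact", {"datasetContactEmail": contact_email}),
        compound("dsDescription",
                 {"dsDescriptionValue": f"Snapshot of {server}/{repository}"}),
        {"typeName": "subject", "typeClass": "controlledVocabulary",
         "multiple": True, "value": [subject]},
    ]
    citation = {"displayName": "Citation Metadata", "fields": fields}
    return {"datasetVersion": {"metadataBlocks": {"citation": citation}}}


def extract_doi(response_json: Mapping[str, Any]) -> str:
    """Pull the persistent identifier out of the create-dataset response body."""
    if response_json.get("status") != "OK":
        raise RuntimeError(f"Dataverse did not create the record: {response_json}")
    return response_json["data"]["persistentId"]
```

The result of `build_metadata(os.environ, contact_email)` could be written to "metadata.json" with `json.dump`, and `extract_doi(response.json())` would then give the value to use in place of DATAVERSE_DATASET_DOI. The repository owner is only a placeholder for the author name; researchers would probably still want to review the metadata on Dataverse before submitting for review.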