Metadata template & download should be consistent #2979

Open
saracarl opened this issue Feb 3, 2022 · 1 comment · May be fixed by #4273

@saracarl
Collaborator

saracarl commented Feb 3, 2022

Could we make the metadata template and the metadata export the same spreadsheet? Or at least have the same fields in the same order as much as possible?

see also #2715

@benwbrum
Owner

I'm making this the master issue for tracking the problems between metadata spreadsheet uploads, the download metadata csv report, and the potential conflicts between work.original_metadata and other metadata-like fields in the system.

As background, metadata about works is stored in four places:

  • Fields like title, description, author, etc. are stored as attributes of the Work object. Some of these attributes drive behavior; others exist only to be displayed (in the About tab of a work) and exported in CSV files or IIIF manifests. Currently these fields are populated either through the UI in the Work Settings screen or during import from the metadata.yml files which project owners place in folders of images uploaded as zip files.

  • More free-form metadata about a work is stored as an array of label/value pairs in work.original_metadata.
    This may be populated in several ways:

    • Importing a IIIF manifest into FromThePage essentially copies the metadata stanza of the manifest into the original_metadata attribute of the work.
    • Uploading a metadata spreadsheet converts column headers to labels and cells to values, then replaces the entire original_metadata array with the result.
    • No UI exists to edit the contents of original_metadata.

    The original_metadata is exposed in four ways:

    • IIIF manifests expose this as JSON
    • The Work Metadata CSV export converts this into columns/values
    • Many CSV exports like table/field CSV exports expose this as columns/values similar to Work Metadata CSV
    • The TEI-XML export produces label/value pairs as XML elements in a xenodata tag.
  • User-collected metadata about a work lives in a separate JSON attribute on the Work object, metadata_description. This is configured using the same code as TranscriptionField, so the data types and labels are set by project owners.

    • Values are added to metadata_description through the UI by end users as part of the transcribe/describe flow.
    • These values are kept separate from original_metadata in the database and in the UI. We assume that original_metadata was created by the project owner or project staff, while metadata_description was part of a crowdsourcing process, and our customers want to keep these two processes separate.
    • Values in metadata_description are exposed in the Work Metadata CSV export and in the IIIF Contributions API
  • The data behind a large feature for browsing works by metadata is stored in metadata_facets, which is populated from original_metadata and may be considered a derivative.
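
To make the label/value structure concrete, here is a minimal sketch of how a spreadsheet row could map to the pairs held in original_metadata and back to columns for a CSV export. The helper names, the 'label' and 'value' hash keys, and the choice to skip empty cells are all assumptions for illustration, not FromThePage's actual code.

```ruby
# Hypothetical helpers. The 'label'/'value' keys are assumed, modeled on the
# shape of a IIIF metadata stanza.

# Turn one spreadsheet row (header => cell) into label/value pairs.
# Skipping empty cells is an assumption of this sketch.
def row_to_pairs(row)
  row.reject { |_header, cell| cell.nil? || cell.to_s.strip.empty? }
     .map { |header, cell| { 'label' => header, 'value' => cell.to_s } }
end

# Turn label/value pairs back into a header => cell hash for a CSV export.
def pairs_to_row(pairs)
  pairs.each_with_object({}) { |pair, row| row[pair['label']] = pair['value'] }
end

row = { 'Creator' => 'Jane Doe', 'Date' => '1862', 'Notes' => '' }
pairs = row_to_pairs(row)
columns = pairs_to_row(pairs)
```

If the upload and the export both went through a pair of inverse functions like these, the two spreadsheets would share the same columns by construction.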

From #3003 (WWP and FDP):
Project owners who have uploaded their works as PDFs, etc., want to be able to export the Work Metadata CSV file, make modifications to it, then upload it via the collection settings -> upload metadata UI.

From #3138 (ARI):
Project owners who upload work metadata spreadsheets want to see that reflected on all of the (relevant) work attributes as well as the original_metadata fields. In addition, they want to use metadata spreadsheet uploads to manage free-form metadata fields.

Problems with the current functionality:

  • The metadata upload UI gives project owners a template to download and modify. This template does not include all values from original_metadata (and probably does not include all metadata attributes of Work either). As a result, users with metadata columns A, B, C want to update B, so they visit the metadata upload screen, download a template (which only contains title, description, and work ID), add a B column, fill it in and upload it.
    • They expect metadata field B to be updated
    • Instead, B is updated, but A and C are deleted.
  • Metadata fields stored as attributes of Work (other than title and description) cannot be modified via a spreadsheet upload. If a project owner uploads a spreadsheet with an author column, a new author label will appear in the relevant work.original_metadata block, but the attribute work.author will remain unchanged.
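
The A, B, C scenario can be sketched as follows. The first method mimics the replace behavior described above; the second is a hypothetical merge that would keep A and C. Both method names and the 'label'/'value' keys are assumptions, not the existing implementation.

```ruby
# Behavior as described in this issue: the upload replaces the whole array.
def replace_metadata(_existing, uploaded)
  uploaded
end

# Hypothetical alternative: update matching labels, append new labels,
# and leave every other pair untouched.
def merge_metadata(existing, uploaded)
  merged = existing.map(&:dup) # copy so the caller's array is not mutated
  uploaded.each do |pair|
    current = merged.find { |m| m['label'] == pair['label'] }
    if current
      current['value'] = pair['value']
    else
      merged << pair.dup
    end
  end
  merged
end

existing = [
  { 'label' => 'A', 'value' => 'a1' },
  { 'label' => 'B', 'value' => 'b1' },
  { 'label' => 'C', 'value' => 'c1' }
]
uploaded = [{ 'label' => 'B', 'value' => 'b2' }]

replaced = replace_metadata(existing, uploaded) # only B survives
merged   = merge_metadata(existing, uploaded)   # A and C kept, B updated
```

A merge like this would have no way to delete a field through an upload, so it trades one surprise for another; pre-populating the template avoids that trade-off.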

A lot of this could be fixed by pre-populating the template used for work metadata spreadsheet uploads with all current values from work.original_metadata and the work attributes. A more invasive solution would eliminate most work attributes and just use work.original_metadata for most functionality (which would require an editing UI).
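
A rough sketch of the pre-populated template idea, using only Ruby's standard csv library. WorkStub is a hypothetical stand-in for the Work model, and the header names work_id, title, and description are assumptions; the real template's column names may differ.

```ruby
require 'csv'

# Hypothetical stand-in for the Work model.
WorkStub = Struct.new(:id, :title, :description, :original_metadata)

# Assumed names for the columns the current template already contains.
FIXED_HEADERS = %w[work_id title description].freeze

# Build a CSV template whose columns are the fixed headers plus every label
# found in any work's original_metadata, filled with the current values.
def prepopulated_template(works)
  labels = works.flat_map { |w| w.original_metadata.map { |p| p['label'] } }.uniq
  CSV.generate do |csv|
    csv << FIXED_HEADERS + labels
    works.each do |work|
      values = work.original_metadata.to_h { |p| [p['label'], p['value']] }
      csv << [work.id, work.title, work.description] + labels.map { |l| values[l] }
    end
  end
end

works = [
  WorkStub.new(1, 'Diary', 'A diary', [{ 'label' => 'A', 'value' => 'a1' },
                                       { 'label' => 'B', 'value' => 'b1' }]),
  WorkStub.new(2, 'Ledger', 'A ledger', [{ 'label' => 'C', 'value' => 'c2' }])
]
csv_text = prepopulated_template(works)
```

With a template like this, a project owner who edits only column B and uploads the file would send A and C back unchanged, so the replace-on-upload behavior would no longer delete them.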

From #2715 (UT) (mostly covered by requests for #3138):

  • I have been opening the metadata export in Excel and then saving it as a csv file, which the platform rejects when I try to ingest it. I have to open the Excel-generated csv in notepad, download the template, copy and paste from the former to the latter, and save the latter to be able to ingest it. I don't know if this is an issue with how Excel formats the csv file. I'm not sure if y'all need to add further guidelines on this or if it can be fixed automatically. I am attaching the 'problem' file.
    • We could handle this by using the Roo library to handle .xlsx or .xls files and not forcing users to convert Excel to CSV
  • Is it possible to replace the FTP title with a new title in the ingested metadata csv?
    • If a metadata upload contains a header with the name of a work attribute, we should update that attribute as well as the value in original_metadata
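
The last proposal, updating a work attribute as well as original_metadata when a header names that attribute, might look roughly like this. WORK_ATTRIBUTES is an assumed list, split_row is a hypothetical helper, and the case-insensitive header match is also an assumption.

```ruby
# Assumed list of Work attributes that a spreadsheet may update; the real
# set would come from the Work model.
WORK_ATTRIBUTES = %w[title description author].freeze

# Split one uploaded row (header => cell) into attribute updates and
# label/value pairs. A header that names a work attribute feeds both.
def split_row(row)
  attributes = row.select { |header, _cell| WORK_ATTRIBUTES.include?(header.downcase) }
                  .transform_keys(&:downcase)
  pairs = row.map { |header, cell| { 'label' => header, 'value' => cell } }
  [attributes, pairs]
end

attributes, pairs = split_row('Author' => 'Jane Doe', 'Shelfmark' => 'MS 12')
# attributes would be applied to the work itself (e.g. work.author),
# pairs would become the new original_metadata.
```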
