Create technical metadata audit mechanism #515
Comments
I'm aware of two tricky cases to watch out for:

1. Filename on disk does not match filename in current version Cocina (filename has changed)

How does this happen? When a filename is changed, the Moab does not receive a new copy of the file. Instead, the Moab manifests are updated to associate the existing file on disk with the new filename.

Example druid: https://argo.stanford.edu/view/druid:yw479qv6748

Some of the files in this druid were renamed after Moab version 1. Take these 3 files:
These were deposited in Moab version 1 with different names before a later update modified the names (but not the content):
The technical metadata service generated the techMD for those three files and associated it with the names on disk:
This is not wrong, but it does mean that the techMD is not associated with the current filenames.

2. Filename in Cocina is not present on disk (file is a duplicate)

How does this happen? When duplicate files are accessioned into the same Moab, the Moab stores only one copy of the file on disk. The Moab manifest records the filename of the other copy and associates that name with the single copy stored on disk.

I don't have a current example of this condition because it's hard to find druids with duplicate files. The last time this came up was the issue that motivated #485. We had a set of druids where there was techMD for the specific file stored on disk but not for the filename (in the Cocina) of the duplicate copy/copies of the files not stored on disk.
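To make the two cases concrete, here is a minimal sketch of how an audit might tell them apart from techMD that is missing altogether. It is not the service's actual code. `FileRecord`, `audit`, and the bucket names are hypothetical, and it assumes that both the Cocina and the techMD records expose a content digest (MD5 here) alongside the filename. The idea is to fall back to matching on the digest when the filename has no match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record: a filename plus the digest of its content."""
    filename: str
    md5: str


def audit(cocina_files, techmd_files):
    """Classify each Cocina file by how it relates to the existing techMD.

    ok            : techMD exists for this exact filename and content
    name_mismatch : techMD exists for the content, but under another filename
                    (covers both the renamed-file and duplicate-file cases)
    missing       : no techMD exists for this content at all
    """
    techmd_pairs = {(f.filename, f.md5) for f in techmd_files}
    techmd_digests = {f.md5 for f in techmd_files}

    report = {"ok": [], "name_mismatch": [], "missing": []}
    for f in cocina_files:
        if (f.filename, f.md5) in techmd_pairs:
            report["ok"].append(f.filename)
        elif f.md5 in techmd_digests:
            report["name_mismatch"].append(f.filename)
        else:
            report["missing"].append(f.filename)
    return report


# Made-up filenames and digests, for illustration only.
cocina = [
    FileRecord("a.txt", "d1"),
    FileRecord("renamed.txt", "d2"),    # case 1: on disk under an older name
    FileRecord("duplicate.txt", "d1"),  # case 2: same content as a.txt
    FileRecord("no_techmd.txt", "d3"),
]
techmd = [FileRecord("a.txt", "d1"), FileRecord("original.txt", "d2")]
report = audit(cocina, techmd)
```

With a check like this, only the `missing` bucket would point to files that need techMD generated. The `name_mismatch` bucket would hold files whose techMD exists but is attached to a different filename.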
As revealed by #485 and #510, it is possible for the techmd service to get out of sync with an item (most notably, missing files). To assist with the remediation of these cases, it would be useful to be able to audit the techmd system.
One approach to this might be: