Semantics of reprocessing data #33

Open
dpwrussell opened this issue Aug 24, 2018 · 0 comments

@dpwrussell
Contributor

There are several use-cases that warrant reprocessing of data:

  • Failure during the scan stage to identify a fileset, which might be fixed in a new version of the scanner.
  • Failure during the extract stage to extract a fileset, which might be fixed in a new version of the extractor.
  • Failure during the scan/extract stage due to an unexpected server-side error that has since been resolved.
  • Even when the extract stage completes successfully, the extracted metadata or images might be suboptimal, and reprocessing the fileset could improve them.

The exact semantics of reprocessing need to be defined before we settle on an implementation strategy.

Questions:

  • Is the original import entirely replaced by the reprocessed one?
  • Is the original fileset entirely replaced by the reprocessed one?
  • If reprocessed imports/filesets do not replace the originals, what happens to the originals and how do we record this in the database?