Orphaned identifiers result in duplicate objects #331

Open
hancush opened this issue Mar 27, 2020 · 0 comments

hancush commented Mar 27, 2020

Related to #295.

Deleting objects does not delete the identifiers associated with them. When orphaned identifiers are left behind, any subsequent object imported with the same identifier is created as a duplicate. This becomes a problem when data is removed by accident, or when it's desirable to remove data before a scrape can correct it, e.g., to prevent the spread of erroneous information.

A practical example: We scrape events from the Legistar API and use the unique event ID as an identifier for events. This week, we needed to remove a batch of test events, some with errors, and rely on the scrape to repopulate the events that did not contain errors. This resulted in a duplicate of every correct event that was removed, for each scrape we ran.

Something like hooking into delete signals for the top-level models in python-opencivicdata and removing any associated identifiers on removal might work, though it wouldn't cover removing data at the database level, since signals wouldn't fire. A database trigger implemented in a migration could cover data removal at the ORM or database level, though that would be less obvious to the end user.
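To make the trigger idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `event` and `identifier` tables, their columns, and the trigger name are simplified, hypothetical stand-ins rather than the project's actual schema, and SQLite is used only so the example is self-contained; in PostgreSQL the trigger would need a separate trigger function, and in a real project the SQL could plausibly be shipped through a `RunSQL` migration operation.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: identifiers point at objects by ID,
# with no foreign key constraint that would cascade the delete.
conn.executescript("""
CREATE TABLE event (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE identifier (
    identifier TEXT NOT NULL,
    object_id  TEXT NOT NULL
);

-- Remove identifiers whenever their event is deleted, no matter
-- whether the delete came from an ORM or from raw SQL.
CREATE TRIGGER delete_event_identifiers
AFTER DELETE ON event
FOR EACH ROW
BEGIN
    DELETE FROM identifier WHERE object_id = OLD.id;
END;
""")

conn.execute("INSERT INTO event VALUES ('ocd-event/1', 'Council meeting')")
conn.execute("INSERT INTO identifier VALUES ('legistar-1234', 'ocd-event/1')")

# A delete issued directly against the database still fires the trigger.
conn.execute("DELETE FROM event WHERE id = 'ocd-event/1'")

remaining = conn.execute("SELECT COUNT(*) FROM identifier").fetchone()[0]
print(remaining)  # 0: no orphaned identifier is left behind
```

Because the cleanup lives in the database, it applies to deletes made outside the ORM as well, which is the case a signal handler would miss. The trade-off noted above still holds: the behavior is not visible from the model code.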

In the meantime, this issue can be mitigated by ensuring identifiers are sufficiently unique and by deleting data carefully, but I think it would be worth addressing in a future release.

As ever, thanks for your work!
