Orphaned identifiers result in duplicate objects #331

hancush · 2020-03-27T21:53:29Z

Related to #295.

Deleting objects does not delete the identifiers associated with them. When orphaned identifiers are left behind, any subsequently imported object with the same identifier is created as a duplicate. This is a problem when, for example, data is removed by accident, or when it's desirable to remove data before a scrape can correct it, e.g., to prevent the spread of erroneous information.

A practical example: We scrape events from the Legistar API and use the unique event ID as an identifier for events. This week, we needed to remove a batch of test events, some with errors, and rely on the scrape to repopulate the events that did not contain errors. This resulted in a duplicate of every correct event that was removed, for each scrape we ran.

Something like hooking into delete signals for the top-level models in python-opencivicdata and removing any associated identifiers on removal might work, though it wouldn't cover removing data at the database level, since signals wouldn't fire. A database trigger implemented in a migration could cover data removal at the ORM or database level, though that would be less obvious to the end user.
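To make the trigger idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the `event` and `event_identifier` tables, their columns, and the trigger name are hypothetical and are not the actual python-opencivicdata schema, and a real deployment would need the equivalent trigger written for its own database and shipped in a migration. The point is only to show that a trigger fires on a raw SQL delete, where a Django signal would not.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical event/identifier schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical tables; not the real python-opencivicdata schema.
        CREATE TABLE event (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE event_identifier (
            id INTEGER PRIMARY KEY,
            event_id INTEGER NOT NULL,
            identifier TEXT NOT NULL
        );

        -- Remove identifiers whenever their event row is deleted,
        -- regardless of whether the delete came from an ORM or raw SQL.
        CREATE TRIGGER delete_event_identifiers
        AFTER DELETE ON event
        FOR EACH ROW
        BEGIN
            DELETE FROM event_identifier WHERE event_id = OLD.id;
        END;
        """
    )
    return conn


def count_identifiers(conn):
    """Return the number of identifier rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM event_identifier").fetchone()[0]


conn = build_demo_db()
conn.execute("INSERT INTO event (id, name) VALUES (1, 'Test event')")
conn.execute(
    "INSERT INTO event_identifier (event_id, identifier) VALUES (1, 'legistar-1234')"
)
before = count_identifiers(conn)

# A database-level delete: no application code runs, only the trigger.
conn.execute("DELETE FROM event WHERE id = 1")
after = count_identifiers(conn)
```

The trade-off mentioned above still applies: the cleanup lives in the database rather than in application code, so it is easy to overlook unless it is documented alongside the models.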

In the meantime, this issue can be mitigated by ensuring identifiers are sufficiently unique and by deleting data carefully, but I think it would be worth addressing in a future release.

As ever, thanks for your work!
