Investigate making uuids deterministic/reproducible #238
Comments
The UUID, right now, is a random way of identifying a record. Pupa also has a separate, deterministic way of identifying a record, using However, sometimes, the resolution is incorrect. Two records sometimes need to be merged or unmerged. We can add new properties (like birth date) to disambiguate two people, for example. The nice thing about the random way is that it is guaranteed to be unique. If we make UUIDs deterministic, then we introduce a new problem: we can have the UUIDs be the same when the records aren't the same. Now we need to disambiguate both the UUID and the full record. I don't see how UUIDs can be made deterministic without introducing a new way for incorrect collisions to occur.
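To make the collision concern concrete, here is a minimal sketch using Python's standard `uuid` module. It is not how Pupa generates IDs; the namespace, the `deterministic_id` helper, the choice of fields to hash, and the sample records are all assumptions made up for illustration.

```python
import uuid

# Hypothetical namespace, invented for this example (not part of Pupa).
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/people")


def random_id():
    """Random UUID (version 4): independent of the record's contents."""
    return str(uuid.uuid4())


def deterministic_id(name, jurisdiction):
    """Name-based UUID (version 5): same inputs always give the same ID."""
    return str(uuid.uuid5(NAMESPACE, f"{jurisdiction}|{name}"))


# Two different people who happen to share a name and a jurisdiction.
person_a = {"name": "John Smith", "jurisdiction": "example-city", "birth_date": "1950-01-01"}
person_b = {"name": "John Smith", "jurisdiction": "example-city", "birth_date": "1982-06-15"}

id_a = deterministic_id(person_a["name"], person_a["jurisdiction"])
id_b = deterministic_id(person_b["name"], person_b["jurisdiction"])

# The records differ, but the deterministic IDs are identical.
print(id_a == id_b)
# Random IDs for the same two records would differ.
print(random_id() == random_id())
```

Adding `birth_date` to the hashed fields would separate these two records, but the ID would then change whenever a birth date is added or corrected, which is the same merge/unmerge problem in a different place.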
agree with James M. this seems like a mistake and not what the UUIDs are
On May 25, 2016 12:21 PM, "James McKinney" wrote:
Ah, ok, gotcha. I misunderstood the short-term nature of the UUIDs. I knew the cdn scrapers were treating them as very transient, but I didn't realize that was so purposeful. Thanks all
I think the pain point from the Slack conversation is best resolved by not clearing the DB before each scrape. However, if we stop clearing the DB in the scrapers for Represent, that creates a new issue: Pupa has no way of automatically setting an end date on the memberships of representatives who were in a past scrape but not the current scrape. Since Represent doesn't care about UUIDs, clearing the DB is fine for its use case. For a single-jurisdiction project (like a Councilmatic instance), Represent's pain point is not really felt, since it's easy to manually set an end date on a past membership within a single jurisdiction, and the need will arise rarely. Represent deals with 100 jurisdictions, so anything manual is a major maintenance burden: things that are rare in one jurisdiction become common when you are managing 100.
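For reference, the missing step described above could look roughly like the sketch below. This is not Pupa code or a Pupa API; the `close_missing_memberships` helper, the dict-based membership records, and the `(person, organization)` key are hypothetical, chosen only to show the logic of ending memberships that are absent from the current scrape.

```python
from datetime import date


def close_missing_memberships(existing, scraped_keys, today):
    """Set an end date on open memberships that the current scrape did not return.

    existing: list of dicts with 'person', 'organization', 'end_date' (hypothetical shape)
    scraped_keys: set of (person, organization) tuples seen in the current scrape
    today: date to record as the end date
    Returns the memberships that were closed.
    """
    closed = []
    for membership in existing:
        key = (membership["person"], membership["organization"])
        # Only touch memberships that are still open and were not scraped again.
        if not membership["end_date"] and key not in scraped_keys:
            membership["end_date"] = today.isoformat()
            closed.append(membership)
    return closed


existing = [
    {"person": "Alice", "organization": "City Council", "end_date": ""},
    {"person": "Bob", "organization": "City Council", "end_date": ""},
]
scraped_keys = {("Alice", "City Council")}  # Bob is no longer listed

closed = close_missing_memberships(existing, scraped_keys, date(2016, 5, 25))
```

A rule like this treats any absence as a departure, so a scraper that fails partway through would wrongly close memberships; that risk is one reason the step is hard to automate across 100 jurisdictions.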
Context: https://opencivicdata.slack.com/archives/pupa/p1464187522000022