Improve document registration to ensure entity records are shared for the same external ID #97

Open
dlongley opened this issue Jan 10, 2023 · 0 comments

The entity and document registration records are optimistically inserted concurrently to achieve important performance gains.

However, it's possible for multiple entity records (with different internal IDs) to be created for two different registration records that have the same externalIdHash. In fact, with the current implementation, this is the expected outcome.

On document registration, a check is performed to find a registration record that matches both an externalIdHash and a documentHash, i.e., a record that would match the exact document being registered. However, if a document with the same externalIdHash but a different documentHash were to be registered, no matching registration record would be found and, therefore, the entity associated with any existing registration (and externalIdHash) would not be found and reused.
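The following is a minimal in-memory sketch of the behavior described above, not the project's actual code. The `Registry` and `Registration` names, the `find_exact` and `register` methods, and the use of a random hex string as the internal ID are all assumptions made for illustration; only the lookup rule (match on both externalIdHash and documentHash) comes from the description.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    external_id_hash: str
    document_hash: str
    internal_id: str  # ID of the entity this registration points at


class Registry:
    """Hypothetical stand-in for the registration store."""

    def __init__(self):
        self.registrations = []

    def find_exact(self, external_id_hash, document_hash):
        # Matches only when BOTH hashes are equal, as in the current check.
        for record in self.registrations:
            if (record.external_id_hash == external_id_hash
                    and record.document_hash == document_hash):
                return record
        return None

    def register(self, external_id_hash, document_hash):
        existing = self.find_exact(external_id_hash, document_hash)
        if existing is not None:
            return existing
        # No exact match: a brand new internal ID is minted, even if another
        # registration already exists for the same external_id_hash.
        record = Registration(external_id_hash, document_hash, uuid.uuid4().hex)
        self.registrations.append(record)
        return record


registry = Registry()
first = registry.register("ext-1", "doc-A")
again = registry.register("ext-1", "doc-A")   # same document: record is reused
second = registry.register("ext-1", "doc-B")  # same external ID, new document

print(first.internal_id == again.internal_id)
print(first.internal_id == second.internal_id)
```

In this sketch, re-registering the same document reuses the record, while a second document with the same externalIdHash ends up with a different internal ID, which is the problem this issue describes.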

To properly support multiple documents with the same externalIdHash, the following changes are needed:

  1. The initial call to find a registration record should be changed to also find any registration record matching externalIdHash. Note that it should find just the earliest of these as it should provide the "canonical" internalId. However, we would still want to also find the specific registration record that matches the documentHash (if this is not the same record) -- as it is required to be returned in the API. The most efficient way to do this might be to just make two concurrent calls that may happen to return the same record.
  2. The call that creates a registration record may create one with the "wrong" internalId because the entity record with the "right" internalId may be inserted concurrently -- for degenerate cases where two documents are registered at the same time. Any registration record that is inserted must have its internalId checked against the earliest registration record's internalId value -- and be updated if it does not match. Note: Using the created timestamp on records may not be sufficient to make this distinction -- as multiple records could share the same value, in which case a decision about which internalId to use would still need to be made in the face of asynchronous / competing processes. The best way to resolve this is TBD.
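The two changes above could be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not a proposed implementation: the `Registry` class, its method names, and the `entity-N` ID format are hypothetical. In particular, the strictly increasing `sequence` field used to pick the "earliest" record is an assumption made so the example has an unambiguous tie-break; the issue leaves the real tie-breaking mechanism TBD.

```python
import asyncio
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Registration:
    external_id_hash: str
    document_hash: str
    internal_id: str
    sequence: int  # ASSUMPTION: strict insertion order; real tie-break is TBD


class Registry:
    """Hypothetical in-memory stand-in for the registration store."""

    def __init__(self):
        self.records = []
        self._sequence = count()
        self._entity_ids = count(1)

    async def find_exact(self, ext, doc) -> Optional[Registration]:
        await asyncio.sleep(0)  # simulate an asynchronous database call
        return next((r for r in self.records
                     if r.external_id_hash == ext and r.document_hash == doc),
                    None)

    async def find_earliest(self, ext) -> Optional[Registration]:
        await asyncio.sleep(0)  # simulate an asynchronous database call
        matches = [r for r in self.records if r.external_id_hash == ext]
        return min(matches, key=lambda r: r.sequence, default=None)

    async def register(self, ext, doc) -> Registration:
        # Change 1: two concurrent lookups that may return the same record.
        exact, earliest = await asyncio.gather(
            self.find_exact(ext, doc), self.find_earliest(ext))
        if exact is not None:
            record = exact
        else:
            if earliest is not None:
                internal_id = earliest.internal_id  # reuse canonical entity
            else:
                internal_id = f"entity-{next(self._entity_ids)}"
            record = Registration(ext, doc, internal_id, next(self._sequence))
            self.records.append(record)
        # Change 2: after inserting, re-check against the earliest record
        # and update internal_id if a concurrent insert won the race.
        canonical = await self.find_earliest(ext)
        if canonical is not None and record.internal_id != canonical.internal_id:
            record.internal_id = canonical.internal_id
        return record


async def demo():
    registry = Registry()
    # Two documents with the same external ID registered at the same time.
    return await asyncio.gather(
        registry.register("ext-1", "doc-A"),
        registry.register("ext-1", "doc-B"))


a, b = asyncio.run(demo())
print(a.internal_id, b.internal_id)
```

Both registrations miss on their initial lookups and mint different internal IDs, and the post-insert check then brings the later record into line with the earliest one. A real store would need an equivalent ordering guarantee that holds across competing processes, which a single-process counter does not provide.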

Once these changes are implemented, the impact on any existing system with multiple registration records will need to be analyzed (in case it is upgraded to support the improved constraints).
