Smart repository proxy #5402
Scenario

An organization has been using a third-party metadata repository for some time. The repository only operates in standalone mode, but it does support export/import. The organization has installed one instance of the repository for its governance team to use. This is where common definitions (glossary terms, policies, etc.) are developed. These definitions are then loaded, using the export/import mechanism, into the instances of the repository located in each of the 3 business units. There is a copy of the common definitions in each repository, so each part of the business uses the same common definitions.

The organization then decides to use Egeria to connect the business unit metadata repositories together to share metadata. All would be well if the third-party metadata repository supported reference copies, but it does not. It treats the common definitions it has imported in the same way as any metadata defined through its UI. In Egeria terminology, all elements stored in its repository are considered part of its home metadata collection. When it shares the common definitions across the cohort, it identifies them as belonging to its metadata collection.

One of two things can happen when each repository shares its copy of the common definitions with the other members of the cohort, depending on how the unique identifiers (GUIDs) were assigned to the common definitions in each repository when they were imported.
It is the second situation that the smart repository proxy is aiming to support.
The smart repository proxy replaces the current repository proxy. It maintains a list of the metadata elements stored in its third-party metadata repository that should be treated as reference copies (i.e., as belonging to another metadata collection). It then performs the following services:
The smart repository proxy maintains its list of reference copies in an OMRS repository. This is supplied as a connection object passed to the repository proxy at startup. Its contents can be bootstrapped from an open metadata archive that is also loaded when the repository proxy starts. The list may be augmented with other reference copies that are received from the cohort and then passed on to the third-party metadata repository.
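As a rough illustration of the list described above, the following minimal sketch keeps reference copies in memory, bootstraps them from archive content, adds copies received from the cohort, and corrects the metadata collection id on instances coming out of the third-party repository. It is not Egeria code: `Instance`, `ReferenceCopyCache`, all method names and the example ids are hypothetical, and a real implementation would use an OMRS repository connector rather than a dictionary.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable


@dataclass(frozen=True)
class Instance:
    """Hypothetical, heavily simplified stand-in for an instance header."""
    guid: str
    metadata_collection_id: str


class ReferenceCopyCache:
    """Hypothetical list of instances that must be treated as reference copies."""

    def __init__(self, local_collection_id: str) -> None:
        self.local_collection_id = local_collection_id
        self._copies: Dict[str, Instance] = {}

    def load_archive(self, instances: Iterable[Instance]) -> None:
        # Bootstrap: every instance in the archive is a reference copy.
        for instance in instances:
            self._copies[instance.guid] = instance

    def save_from_cohort(self, instance: Instance) -> None:
        # Augment: only instances homed elsewhere are reference copies.
        if instance.metadata_collection_id != self.local_collection_id:
            self._copies[instance.guid] = instance

    def is_reference_copy(self, guid: str) -> bool:
        return guid in self._copies

    def correct_outbound(self, instance: Instance) -> Instance:
        # Replace the collection id reported by the third-party repository
        # with the true home collection id recorded in the cache.
        cached = self._copies.get(instance.guid)
        if cached is None:
            return instance
        return replace(instance,
                       metadata_collection_id=cached.metadata_collection_id)


# Usage: the business unit repository claims ownership of an imported term.
cache = ReferenceCopyCache(local_collection_id="bu1-collection")
cache.load_archive([Instance("term-001", "governance-collection")])
corrected = cache.correct_outbound(Instance("term-001", "bu1-collection"))
print(corrected.metadata_collection_id)
```

Instances whose GUID is not in the cache pass through unchanged, so locally created metadata is still shared as belonging to the local metadata collection.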
Returning to the scenario at the top of this issue... The simplest way to create the list of reference copy instances is by creating a metadata archive for the common definitions. This can be created by:
The open metadata archive is then used to populate an embedded in-memory repository whenever the smart repository proxy is started. The common definition instances in the archive provide the list of instances that the smart repository proxy will monitor for and use when communicating with the cohort. The embedded in-memory repository acts as the cache of these metadata instances.
If the organization wants to store metadata from other cohort members in the third-party metadata repository, it uses a persistent repository connector for the cache so it can keep track of all reference copies that it is dynamically storing in the third-party metadata repository. Open metadata archives can still be used to represent content that is logically a set of reference copies but was added to the third-party metadata repository via other mechanisms. They can also be used to load new common definitions into the third-party metadata repository.
The smart repository proxy may use its store to assemble open metadata elements together before storing them in the third-party metadata repository if the third-party metadata repository has coarser-grained elements.
Implementation

The smart repository proxy can be implemented as two new connectors that run in the existing repository proxy OMAG Server. There are two specialist connectors configured in the repository proxy that are responsible for communicating with the third-party repository:
The smart repository proxy runs two additional connectors that wrap the third-party ones. These connectors (shown in grey) are completely generic and can run with any third-party metadata repository connectors.
Administration

Although the Smart Repository Proxy's connectors require no changes to the Egeria runtime to operate, it would help users if the admin services were enhanced to build the nested connection objects required to configure the nested connectors used in the smart repository connector, as well as to set up the configuration properties that control the behaviour of the Smart Repository Proxy's connectors.
All looks great -- just a question on this statement:
Deletes, or purges? I would have thought a delete would be a protocol violation, as neither connector is the home repository and therefore neither should be able to handle a soft-delete of the instance (?) I was expecting that only a purgeReferenceCopy would be handled against the cache, though perhaps this would need to be passed onwards as a purgeInstance call against the Third Party Metadata Repository Connector (?)
Good question: There are three forms of delete in the events
In all three cases, reference copies are removed from both repositories. However, you are right that a delete or purge of a reference copy through the API is a protocol violation.
@mandy-chessell Mandy, thinking about the scenario of 3 third-party repositories containing metadata instances loaded with the same import script, and therefore the same GUIDs: you mention that the "governance third party metadata repository" would not be connected to the cohort and can be used to build the metadata archive. What happens when new instances are added to the governance third-party repository? How do the other reference repositories have their caches updated?
The rule does not change: the owner of the metadata is the originator. This is reflected in the metadata collection id in the header of the instance. In the scenario described above, the governance metadata repository is the owner of the common definitions since its metadata collection id is in the header of each element in the archive.

Until the governance metadata repository joins the cohort, no member of the cohort can change the content since all copies imported via the archive are reference copies. Updates to the common definitions from the governance metadata repository are introduced through a new archive/import file.

If/when the governance metadata repository joins the cohort, it continues to be the owner of the common definitions. The difference is that changes to its instances are distributed to the other cohort members immediately through the cohort mechanisms.

If the governance metadata repository is to be decommissioned, then it is possible to change ownership of its elements using the rehome commands. The metadata collection id is set to that of the repository that is taking over responsibility for maintaining the common definitions.

This is all standard cohort operation that was defined in the original OMRS spec. All the smart repository proxy adds is mitigation when the third-party repository does not support reference copies.
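The rehoming step can be pictured with a small sketch. This is only an illustration of the ownership change described in this comment, not the OMRS rehome API: `InstanceHeader`, `rehome` and the collection ids are hypothetical names, and the sketch assumes that rehoming changes nothing but the metadata collection id.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class InstanceHeader:
    """Hypothetical, simplified instance header."""
    guid: str
    metadata_collection_id: str


def rehome(instances: List[InstanceHeader],
           retiring_collection_id: str,
           new_home_collection_id: str) -> List[InstanceHeader]:
    """Return copies of the instances with ownership moved to the new home.

    Instances owned by any other metadata collection are left untouched.
    """
    return [
        replace(i, metadata_collection_id=new_home_collection_id)
        if i.metadata_collection_id == retiring_collection_id
        else i
        for i in instances
    ]


# Usage: the governance repository is decommissioned and BU1 takes over.
definitions = [
    InstanceHeader("term-001", "governance-collection"),
    InstanceHeader("asset-9", "bu2-collection"),
]
rehomed = rehome(definitions, "governance-collection", "bu1-collection")
print([i.metadata_collection_id for i in rehomed])
```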
Thank you Mandy @mandy-chessell I wonder if there is a way to indicate to all the Smart Proxies that a particular member of the cohort is the "owner" of duplicate GUIDs that can be found in the source system, and that they should therefore be treated as "reference" copies everywhere. That way we might avoid the archive import process to prime the proxies altogether.
The archive import is precisely that. It acts as the mechanism to tell the smart repository proxies which instances in their third-party metadata repository have incorrect metadata collection ids, and what the metadata collection ids should be, so that each proxy can communicate with the other cohort members in a compliant way. The third-party metadata repository is providing incorrect metadata collection ids in its instances because it does not support reference copies and so has not stored what the metadata collection id should be.

The other aspect of reference copies that needs to be supported is that they should not be changed by the third-party metadata repository (or repositories). Having complete instances in the archive means that the smart repository proxy can send out the completely correct version of the reference copy to the cohort. If there is an event mapper, the smart repository proxy can detect updates to these reference copy instances and record the violation in the audit log. If the repository connector is sophisticated enough, it can restore the correct values in the third-party metadata repository as well.

The archive is only needed when the common definitions are being maintained by imports. The smart repository proxy can dynamically build the list of reference copies coming from other members of the cohort. For that it needs a persistent repository. If this persistent repository supports history, then the smart repository proxy can support historical queries on behalf of its third-party metadata repository as well.

With the smart proxy, we can take a repository, such as Apache Atlas, that does not support reference copies or historical queries and enable a compliant two-way exchange of metadata with it.
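The violation check described in this comment might look roughly like the sketch below. It assumes the proxy holds the complete, correct properties of each reference copy (for example, from the archive) and is told by an event mapper which properties the third-party repository now holds. The function name, the dictionary-based cache, the list-based audit log and the optional `restore` callback are all hypothetical stand-ins, not Egeria interfaces.

```python
from typing import Callable, Dict, List, Optional

Properties = Dict[str, str]


def check_reference_copy(
    guid: str,
    observed: Properties,
    master_copies: Dict[str, Properties],
    audit_log: List[str],
    restore: Optional[Callable[[str, Properties], None]] = None,
) -> bool:
    """Return True if a reference copy was changed in the third-party repository.

    A violation is recorded in the audit log and, if a restore callback is
    supplied, the correct values are written back.
    """
    expected = master_copies.get(guid)
    if expected is None or expected == observed:
        return False  # not a reference copy, or it is unchanged
    audit_log.append(
        f"Reference copy {guid} was modified in the third-party repository"
    )
    if restore is not None:
        restore(guid, dict(expected))
    return True


# Usage: a glossary term imported from the archive is edited locally.
masters = {"term-001": {"displayName": "Customer"}}
log: List[str] = []
third_party: Dict[str, Properties] = {"term-001": {"displayName": "Client"}}

violated = check_reference_copy(
    "term-001", third_party["term-001"], masters, log,
    restore=lambda g, props: third_party.__setitem__(g, props),
)
print(violated, third_party["term-001"])
```

Whether values can actually be restored depends on what the third-party repository connector supports, which is why the callback is optional in this sketch.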
The smart repository proxy enhances a third-party metadata repository that does not support reference copies to allow:
In addition, the caching capability of the smart repository proxy can be used to improve the performance of the metadata
repository's communication with the cohort.
Background reading: