-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of an Ownership factory #3072
base: main
Are you sure you want to change the base?
Conversation
❌ 120/128 passed, 2 flaky, 8 failed, 8 skipped, 3h56m29s total ❌ test_migrate_view: AssertionError: assert 3 == 4 (3m29.337s)
❌ test_move_views: databricks.sdk.errors.platform.BadRequest: [UC_COMMAND_NOT_SUPPORTED.WITH_RECOMMENDATION] The command(s): 3-layer (catalogName.schemaName.tableName) table name notation: dummy_cr7oi.dummy_sded4.dummy_t0ygh for view creation in HMS federation are not supported in Unity Catalog. Please use schemaName.tableName or tableName instead. SQLSTATE: 0AKUC (7.49s)
❌ test_alias_tables: databricks.sdk.errors.platform.BadRequest: [UC_COMMAND_NOT_SUPPORTED.WITH_RECOMMENDATION] The command(s): 3-layer (catalogName.schemaName.tableName) table name notation: dummy_c8nof.dummy_svmkt.dummy_t1w3j for view creation in HMS federation are not supported in Unity Catalog. Please use schemaName.tableName or tableName instead. SQLSTATE: 0AKUC (17.126s)
❌ test_table_migration_job_refreshes_migration_status[regular-migrate-tables]: AssertionError: No destination schema found for TableType.VIEW hive_metastore.migrate_tg8j1.dummy_tflv9 (8m13.55s)
❌ test_table_migration_job_publishes_remaining_tables[regular]: AssertionError: assert 'hive_metasto...o.dummy_tixvk' == 'hive_metasto...o.dummy_tvptw' (7m44.698s)
❌ test_migration_job_ext_hms[regular]: AssertionError: dummy_tkilh not found in dummy_ceqpe.migrate_wt7yf (14m0.357s)
❌ test_move_tables: databricks.sdk.errors.platform.BadRequest: [UC_COMMAND_NOT_SUPPORTED.WITH_RECOMMENDATION] The command(s): 3-layer (catalogName.schemaName.tableName) table name notation: dummy_cjfaj.dummy_sulzs.dummy_tf74k for view creation in HMS federation are not supported in Unity Catalog. Please use schemaName.tableName or tableName instead. SQLSTATE: 0AKUC (15.915s)
❌ test_table_migration_for_managed_table[managed-migrate-tables]: AssertionError: dummy_tmqvn not found in dummy_cfxoh.managed_dwkma (7m26.806s)
Flaky tests:
Running from acceptance #7214 |
# Conflicts: # src/databricks/labs/ucx/contexts/application.py # src/databricks/labs/ucx/hive_metastore/table_migration_status.py # src/databricks/labs/ucx/hive_metastore/tables.py
## Changes `LegacyQueryOwnership` takes a `str` as record type which obfuscates the meaning of that str and creates risk of collision in the ownership factory This PR fixes that by introducing a `LegacyQueryPath` data class as a wrapper around the `str` ### Linked issues None ### Functionality None ### Tests - [x] ran unit tests Co-authored-by: Eric Vergnaud <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make dependencies explicit, not implicit
@cached_property | ||
def ownership_factory(self) -> Callable[[type], Ownership]: | ||
# ensure registration of Ownerships | ||
names_with_ownership = [name for name in dir(GlobalContext) if "ownership" in name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explicitly init components - we're doing that for permissions migration already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
_factories: dict[type, Ownership] = {} | ||
|
||
@classmethod | ||
def for_record_type(cls, record_type: type) -> Ownership[Record]: | ||
return cast(Ownership[Record], cls._factories[record_type]) | ||
|
||
def __init__(self, administrator_locator: AdministratorLocator, record_type: type) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this control flow confuses pylint
and it will fail to detect hidden future bugs.wouldn't it be more clear if we add is_applicable
abstract method
def is_applicable(self, record: Any) -> bool:
return isinstance(record, Table)
this will:
- avoid changing every constructor at the cost of adding new method
- allow for one ownership to work with more than one record class
- allow for an explicit injectable facade:
class AnyOwnership(Ownership[Any]):
def __init__(self, ownerships: list[Ownership]): ...
def owner_of(self, record: Any) -> str:
for o for self._ownerships: if o.is_applicable(record): return o.owner_of(record)
return self._administrator_locator....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments.
One thing I wonder: if there a context where the ownership instance that we need is not static in nature and tied to the type?
If not, I think we could replace things with a dictionary of type: ownership_instance
in the application context. (Right now we wouldn't even need to deal with specialisation.)
I don't know about the context or any discussions that lead to this PR, so this is just based on the context provided here alone. (To be clear, I do understand the desire to avoid the proliferation of properties we have on the application context.)
@abstractmethod | ||
def is_applicable_to(self, record: Any) -> bool: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a design perspective, I think I'd prefer this to take the type of the record as an argument rather than an instance thereof. The reason for this is that it would then encourage the caller to get the encoder and re-use it for a collection, which is the usual use-case. As it stands the current API encourages the lookup for each record within a loop over a collection, and the lookups are reasonably expensive due to the iteration required.
I'm on the fence about whether it should be a class method or not, although leaning towards a class method being more accurate. (For singletons, which these are, I agree that the distinction is more academic than practical.)
Assuming the current API, I think a default implementation is possible (and preferable) here:
- A cached property, initialised via
typing.get_args(self)
, to get the generic type of the instance. - Use
isinstance()
as in the current implementations against the cached property.
(If we switch to accepting the type rather than an instance then the details obviously change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this comment indicating that we want to allow instances to support multiple types.
That said, I don't fully follow the reasoning: type-based lookups can also support multiple instance types (A | B
) and encourage a single lookup rather than the (current) situation that encourages callers to implicitly linear probe the registry within a loop. I think I'm missing something here, and hope that @nfx can help me understand what he has in mind.
@@ -169,8 +171,21 @@ def get_workspace_administrator(self) -> str: | |||
class Ownership(ABC, Generic[Record]): | |||
"""Determine an owner for a given type of object.""" | |||
|
|||
_ownerships: set[Ownership] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the type annotation here is quite correct: it needs to be ClassVar[…]
? (Without this I believe it is considered by the type tooling to be an instance property.)
@classmethod | ||
def for_record(cls, record: Any) -> Ownership[Record]: | ||
for ownership in cls._ownerships: | ||
if ownership.is_applicable_to(record): | ||
return ownership | ||
raise ValueError(f"Ownership not implemented or not registered for {type(record).__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere I've commented on this API, and whether it should take the type of the record instead of an instance.
I'd really like this method to have a docstring. The current design allows for some ambiguous corner cases like multiple registered instances supporting a given record. (This remains even if we switch to type-based lookups.)
@cached_property | ||
def ownership_factory(self) -> Callable[[Record], Ownership]: | ||
# ensure registration of Ownerships | ||
_ = [ | ||
self.directfs_access_ownership, | ||
self.grant_ownership, | ||
self.legacy_query_ownership, | ||
self.table_migration_ownership, | ||
self.table_ownership, | ||
self.udf_ownership, | ||
self.workspace_path_ownership, | ||
] | ||
return Ownership.for_record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As things stand during tests we re-create the application context for each test that needs one, and this is going to lead to the registry accumulating instances for each test that hits this property.
For this reason (and others) in the past we've preferred to avoid global singletons (which the class-hosted registry is), and move the cache to be owned and managed outside of it so that the lifecycle of the cache is tied to the application context.
Changes
Every asset has an owner and our implementation requires that the owner be retrieved using a specialized
Ownership
instance.There is also a requirement that such
Ownership
be retrieved from theGlobalContext
, which in complex scenarios may result in a proliferation of constructor parameters.This PR implements an
Ownership
factory that makes it easy to retrieve the appropriatexxxOwnership
from theGlobalContext
, simply using theRecord
type being considered.Linked issues
Progresses #1415
Functionality
None
Tests