Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cli command to create directfs to table mapping #3427

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

pritishpai
Copy link
Contributor

Changes

The directfs access in the code that can be replaced by table would need a mapping to be reviewed/updated by the user. Introduces a cli command that creates the mapping file from the directfs usages.

Linked issues

Introduces #392

Functionality

  • added relevant user documentation
  • added new CLI command

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction looks good. Just thinking again about the idea of merging this with the table_mapping instead of a separate mapping. Did we log our decision about this somewhere?

src/databricks/labs/ucx/hive_metastore/directfs_mapping.py Outdated Show resolved Hide resolved
raise ValueError(msg)
if not directfs_snapshot:
msg = "No directfs references found in code"
raise ValueError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not find any directfs references then we just raise the error and return

for table in tables_snapshot:
for directfs_record in directfs_snapshot:
if table.location:
if directfs_record.path in table.location:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about dfsa's that do not have a match with tables? We want to include those too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wont be having a mapping with a table to replace them and ignore them right?

@pritishpai
Copy link
Contributor Author

pritishpai commented Dec 11, 2024

The direction looks good. Just thinking again about the idea of merging this with the table_mapping instead of a separate mapping. Did we log our decision about this somewhere?

Table mapping is used entirely for table migration, while this is for code migration. I think we should keep this separate. But no we have not logged the decision anywhere.

@JCZuurmond
Copy link
Member

Table mapping is used entirely for table migration, while this is for code migration. I think we should keep this separate. But no we have not logged the decision anywhere.

  • The table mapping is maybe also used for code migration, or is that done with the able migration status based of the table properties?
  • The DFSA mapping is not just for code migration, it is also for table migration to create the tables for the DFSAs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Blocked/Hold
Development

Successfully merging this pull request may close these issues.

2 participants