
Implement migration sequencing (phase 2) #3009

Open
ericvergnaud wants to merge 21 commits into main from migration-sequencing-phase-2

Conversation

@ericvergnaud (Contributor) commented Oct 17, 2024

Changes

Implement migration steps for notebooks and Python files

  • Test cycles with notebooks referencing other notebooks
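
Not part of the PR itself: below is a minimal sketch of how cycles among notebooks that reference each other can be detected with a depth-first search, so the sequencer can report (or refuse) circular migration orderings. All names are hypothetical and do not reflect the PR's actual sequencer or graph types.

```python
from collections import defaultdict


def find_cycle(references: dict[str, list[str]]) -> list[str] | None:
    """Return one cycle of notebook paths, or None if the reference graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / fully explored
    state: dict[str, int] = defaultdict(lambda: WHITE)
    parent: dict[str, str] = {}

    def dfs(node: str) -> list[str] | None:
        state[node] = GREY
        for child in references.get(node, []):
            if state[child] == GREY:  # back-edge: child is already on the current path
                cycle, current = [child], node
                while current != child:
                    cycle.append(current)
                    current = parent[current]
                return cycle[::-1]
            if state[child] == WHITE:
                parent[child] = node
                found = dfs(child)
                if found is not None:
                    return found
        state[node] = BLACK
        return None

    for start in list(references):
        if state[start] == WHITE:
            found = dfs(start)
            if found is not None:
                return found
    return None


# find_cycle({"nb_a": ["nb_b"], "nb_b": ["nb_a"]}) -> ["nb_b", "nb_a"] (one cycle found)
# find_cycle({"nb_a": ["nb_b"]}) -> None (no cycle)
```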

Linked issues

Progresses #1415

Functionality

None

Tests

  • added unit tests

@ericvergnaud ericvergnaud requested a review from a team as a code owner October 17, 2024 17:39
@ericvergnaud ericvergnaud marked this pull request as draft October 17, 2024 17:40
    def register_workflow_job(self, job: jobs.Job) -> MigrationNode:
-        job_node = self._nodes.get(("JOB", str(job.job_id)), None)
+        job_node = self._nodes.get(("WORKFLOW", str(job.job_id)), None)
ericvergnaud (Contributor, Author):
align with ObjectInfo type enum
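
For readers outside the diff view: the lookup above is the get-or-create half of node registration, keyed by an (object type, object id) tuple. The following is illustrative only and is not the PR's actual MigrationNode or registry code; the constructor fields in particular are assumptions.

```python
# Illustrative sketch: register a workflow (job) node once, keyed by ("WORKFLOW", job_id).
def register_workflow_job(self, job: jobs.Job) -> MigrationNode:
    key = ("WORKFLOW", str(job.job_id))
    node = self._nodes.get(key)
    if node is not None:  # already registered: reuse the existing node
        return node
    node = MigrationNode(  # field names are assumptions
        object_type="WORKFLOW",
        object_id=str(job.job_id),
        object_name=job.settings.name if job.settings else str(job.job_id),
    )
    self._nodes[key] = node
    return node
```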

github-actions bot commented Oct 17, 2024

❌ 73/77 passed, 4 failed, 4 skipped, 1h49m41s total

❌ test_migration_sequencing_job_with_task_referencing_non_existing_cluster: TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator' (224ms)
TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator'
[gw0] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_migration_sequencing_job_with_task_referencing_cluster: TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator' (2.028s)
TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator'
[gw7] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_running_real_assessment_job_ext_hms: databricks.sdk.errors.platform.InternalError: Failed to list databases due to Py4JSecurityException. Update or reinstall UCX to resolve this issue. (22m1.346s)
... (skipped 389772 bytes)
erify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS. SQLSTATE: 42704
13:36 INFO [databricks.labs.blueprint.parallel:guess_external_locations] listing tables 1/1, rps: 12.591/sec
13:36 INFO [databricks.labs.blueprint.parallel:guess_external_locations] Finished 'listing tables' tasks: 100% results available (1/1). Took 0:00:00.081199
13:36 INFO [databricks.labs.ucx.hive_metastore.tables:guess_external_locations] Finished scanning 0 tables
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:guess_external_locations] [hive_metastore.dummy_socng.tables] found 0 new records for tables
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:guess_external_locations] [hive_metastore.dummy_socng.external_locations] found 0 new records for external_locations
13:36 INFO [databricks.labs.ucx:crawl_grants] UCX v0.50.1+3320241126132137 After job finishes, see debug logs at /Workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.xRGH/logs/assessment/run-984582436726243-0/crawl_grants.log
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.grants] fetching grants inventory
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_socng`.`grants`
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.grants] crawling new set of snapshot data for grants
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.tables] fetching tables inventory
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_socng`.`tables`
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.tables] crawling new set of snapshot data for tables
13:36 INFO [databricks.labs.ucx.hive_metastore.tables:crawl_grants] Scanning dummy_sx51i
13:36 DEBUG [databricks.labs.blueprint.parallel:crawl_grants] Starting 1 tasks in 16 threads
13:36 ERROR [databricks.labs.ucx.hive_metastore.tables:crawl_grants] Failed to list databases due to Py4JSecurityException. Update or reinstall UCX to resolve this issue.
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/databricks/labs/ucx/hive_metastore/tables.py", line 559, in _list_tables
    return list(self._iterator(self._external_catalog.listTables(database)))
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/databricks/labs/ucx/hive_metastore/tables.py", line 530, in _external_catalog
    return self._spark._jsparkSession.sharedState().externalCatalog()  # pylint: disable=protected-access
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o405.sharedState. Trace:
py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.internal.SharedState org.apache.spark.sql.SparkSession.sharedState() is not whitelisted on class class org.apache.spark.sql.SparkSession
	at py4j.security.WhitelistingPy4JSecurityManager.checkCall(WhitelistingPy4JSecurityManager.java:473)
	at py4j.Gateway.invoke(Gateway.java:305)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.base/java.lang.Thread.run(Thread.java:840)


13:36 INFO [databricks.labs.blueprint.parallel:crawl_grants] listing tables 1/1, rps: 164.204/sec
13:36 INFO [databricks.labs.blueprint.parallel:crawl_grants] Finished 'listing tables' tasks: 100% results available (1/1). Took 0:00:00.008948
13:36 INFO [databricks.labs.ucx.hive_metastore.tables:crawl_grants] Finished scanning 0 tables
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.tables] found 0 new records for tables
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.udfs] fetching udfs inventory
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_socng`.`udfs`
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.udfs] crawling new set of snapshot data for udfs
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][execute] USE CATALOG `hive_metastore`;
13:36 DEBUG [databricks.labs.ucx.hive_metastore.udfs:crawl_grants] [hive_metastore.dummy_sx51i] listing udfs
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW USER FUNCTIONS FROM `hive_metastore`.`dummy_sx51i`;
13:36 WARNING [databricks.labs.ucx.hive_metastore.udfs:crawl_grants] Schema hive_metastore.dummy_sx51i no longer existed
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.udfs] found 0 new records for udfs
13:36 DEBUG [databricks.labs.blueprint.parallel:crawl_grants] Starting 4 tasks in 16 threads
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON CATALOG `hive_metastore`
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON ANY FILE 
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON ANONYMOUS FUNCTION 
13:36 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON DATABASE `hive_metastore`.`dummy_sx51i`
13:36 ERROR [databricks.labs.ucx.hive_metastore.grants:crawl_grants] Couldn't fetch grants for object DATABASE hive_metastore.dummy_sx51i: An error occurred while calling o405.sql.
: org.apache.spark.SparkSecurityException: Database(dummy_sx51i,Some(hive_metastore)) does not exist.
	at com.databricks.sql.acl.AclCommand.$anonfun$mapIfExists$1(commands.scala:79)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.sql.acl.AclCommand.mapIfExists(commands.scala:79)
	at com.databricks.sql.acl.AclCommand.mapIfExists$(commands.scala:75)
	at com.databricks.sql.acl.ShowPermissionsCommand.mapIfExists(commands.scala:226)
	at com.databricks.sql.acl.ShowPermissionsCommand.run(commands.scala:244)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$2(commands.scala:84)
	at org.apache.spark.sql.execution.SparkPlan.runCommandWithAetherOff(SparkPlan.scala:181)
	at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:192)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:84)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:81)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:94)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$5(QueryExecution.scala:387)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$4(QueryExecution.scala:387)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:193)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$3(QueryExecution.scala:387)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$10(SQLExecution.scala:453)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:738)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:334)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1273)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:205)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:675)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$2(QueryExecution.scala:383)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1125)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:379)
	at org.apache.spark.sql.execution.QueryExecution.withMVTagsIfNecessary(QueryExecution.scala:329)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$eagerlyExecute$1(QueryExecution.scala:377)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$8$1.applyOrElse(QueryExecution.scala:431)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$8$1.applyOrElse(QueryExecution.scala:426)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:505)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:85)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:505)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:379)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:375)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:481)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$8(QueryExecution.scala:426)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:436)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:426)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:288)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:285)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:383)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:132)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1273)
	at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1280)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1280)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:123)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:969)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1273)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:933)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:992)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.base/java.lang.Thread.run(Thread.java:840)

13:36 INFO [databricks.labs.blueprint.parallel:crawl_grants] listing grants for hive_metastore 4/4, rps: 0.422/sec
13:36 INFO [databricks.labs.blueprint.parallel:crawl_grants] Finished 'listing grants for hive_metastore' tasks: 100% results available (4/4). Took 0:00:09.473927
13:36 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_socng.grants] found 0 new records for grants
13:36 INFO [databricks.labs.ucx:crawl_permissions] UCX v0.50.1+3320241126132137 After job finishes, see debug logs at /Workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.xRGH/logs/assessment/run-984582436726243-0/crawl_permissions.log
13:36 INFO [databricks.labs.ucx.assessment.workflows:crawl_permissions] Skipping permission crawling as legacy permission migration is disabled.
13:36 INFO [databricks.labs.ucx.installer.workflows] ---------- END REMOTE LOGS ----------
13:37 INFO [databricks.labs.ucx.install] Deleting UCX v0.50.1+3320241126132137 from https://DATABRICKS_HOST
13:37 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_socng
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=912131732210626, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1044596945528103, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=102486256825231, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1080503767718745, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=312950414260697, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=755610085695314, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=891769156958438, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=704517162920537, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=996598579838925, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=336046930042195, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1070326180627272, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=410882524779222, as it is no longer needed
13:37 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=644127437362640, as it is no longer needed
13:37 INFO [databricks.labs.ucx.install] Deleting cluster policy
13:37 INFO [databricks.labs.ucx.install] Deleting secret scope
13:37 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw9] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_migration_sequencing_simple_job: TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator' (2.627s)
TypeError: MigrationSequencer.__init__() missing 1 required positional argument: 'administrator_locator'
[gw9] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
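
All three TypeError failures above share one cause: the integration tests still construct MigrationSequencer without the newly required administrator_locator argument. A sketch of the kind of test update that would resolve them follows; the import paths, fixture names, and constructor argument order are assumptions, not the repository's verified layout.

```python
# Sketch only: build the sequencer with an administrator locator.
from databricks.labs.ucx.framework.owners import AdministratorLocator  # assumed import path
from databricks.labs.ucx.sequencing.sequencing import MigrationSequencer  # assumed import path


def test_migration_sequencing_simple_job(ws, simple_dependency_resolver, mock_path_lookup):
    admin_locator = AdministratorLocator(ws)  # ws: WorkspaceClient fixture
    sequencer = MigrationSequencer(ws, admin_locator)  # previously constructed without the locator
    ...  # create the job, register it, and assert on the generated migration steps as before
```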

Running from acceptance #7539

@ericvergnaud ericvergnaud changed the title Migration sequencing phase 2 Implement migration sequencing (phase 2) Oct 18, 2024
@ericvergnaud ericvergnaud marked this pull request as ready for review October 18, 2024 13:11
            ws_path = WorkspacePath(self._ws, object_id)
            object_owner = WorkspacePathOwnership(self._admin_locator, self._ws).owner_of(ws_path)
        else:
            raise ValueError(f"{object_type} not supported yet!")
Collaborator:
Where is this exception caught? It'll crash the assessment workflow if unhandled.
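
One hedged way to address this concern is to catch the error at the registration site and degrade gracefully, so an unexpected object type cannot take down the assessment workflow. This is an illustrative sketch only, not the PR's code; the calling method's name is an assumption, while the raised ValueError mirrors the snippet above.

```python
import logging

logger = logging.getLogger(__name__)


# Sketch: tolerate unsupported object types instead of letting the ValueError bubble up.
def _register_node_owner(self, object_type: str, object_id: str) -> str | None:
    try:
        return self._owner_of(object_type, object_id)  # the ownership lookup shown above (name assumed)
    except ValueError as e:
        logger.warning(f"Skipping ownership for {object_type} {object_id}: {e}")
        return None  # sequencing can proceed without an owner for this node
```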

@@ -78,8 +78,8 @@ def as_message(self) -> str:


 class WorkflowTask(Dependency):
-    def __init__(self, ws: WorkspaceClient, task: jobs.Task, job: jobs.Job):
-        loader = WrappingLoader(WorkflowTaskContainer(ws, task, job))
+    def __init__(self, ws: WorkspaceClient, task: jobs.Task, job: jobs.Job, cache: WorkspaceCache | None = None):
Collaborator:
Don't add `| None` constructor dependencies - they lead to non-deterministic logic and subtle bugs that are harder to diagnose later.
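
To illustrate the reviewer's point: rather than accepting cache: WorkspaceCache | None and branching on it internally, the dependency can be made explicit so every caller has to supply one. A hedged sketch, reusing the names from the diff above; the container accepting the cache and the elided remainder of __init__ are assumptions.

```python
# Sketch: require the cache explicitly instead of `cache: WorkspaceCache | None = None`.
# An optional collaborator means some call sites silently run uncached and behave
# differently from others; a required parameter fails loudly at construction time.
class WorkflowTask(Dependency):
    def __init__(self, ws: WorkspaceClient, task: jobs.Task, job: jobs.Job, cache: WorkspaceCache):
        loader = WrappingLoader(WorkflowTaskContainer(ws, task, job, cache))  # container taking the cache is assumed
        ...  # remainder of __init__ as in the existing class
```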

@JCZuurmond JCZuurmond force-pushed the migration-sequencing-phase-1 branch from f39bf38 to ba8b4c6 Compare November 1, 2024 07:49
Base automatically changed from migration-sequencing-phase-1 to main November 1, 2024 16:59
@JCZuurmond JCZuurmond force-pushed the migration-sequencing-phase-2 branch from 3c34640 to eb79746 Compare November 26, 2024 13:13
@JCZuurmond JCZuurmond requested a review from a team as a code owner November 26, 2024 13:13
Labels: None yet
Projects: None yet
3 participants