Use-cases
We would like our data team to continue creating Databricks jobs using the methods they are comfortable with: defining jobs in a development workspace and committing the YAML file into a version-controlled Git repository. Existing CI/CD processes should then be able to take over and deploy through the various environments.
However, the Terraform provider seems to only allow the job to be defined in HCL, which leaves us either decoding the YAML file, using templates, or manually rewriting the YAML as HCL. All of these approaches are error-prone and overly complex when the job has already been written in YAML.
The alternative is for data developers to write the job in HCL from the start, which would be a substantial change to the way they develop and test Databricks jobs.
Attempted Solutions
Using Terraform functions such as yamldecode and templatefile, we can attempt to read the YAML file into Terraform variables and use the result to build the configuration in Terraform. This is difficult, messy and incredibly verbose. It is also hard to cover every possible job structure, so whether it works reliably every time is debatable.
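As an illustration, here is a minimal sketch of that approach, assuming a job.yml with a top-level name and a flat list of notebook tasks (real job specs nest much deeper, which is where the verbosity comes from):

```hcl
# Sketch only: read the job definition produced by the data team and map it
# onto the databricks_job schema by hand. Every additional field the job uses
# (clusters, libraries, schedules, dependencies, ...) needs its own mapping.
locals {
  job_spec = yamldecode(file("${path.module}/job.yml"))
}

resource "databricks_job" "from_yaml" {
  name = local.job_spec.name

  dynamic "task" {
    for_each = local.job_spec.tasks
    content {
      task_key = task.value.task_key

      notebook_task {
        notebook_path = task.value.notebook_task.notebook_path
      }
    }
  }
}
```

Each nested block has to be translated explicitly like this, which is why the approach does not scale well across teams and job shapes.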
Terraform does have an import configuration-generation feature, but it is still experimental, so it cannot be used for production deployments. It does seem to work, so it could be a viable solution in the future; however, it still adds an extra step to a process where one doesn't seem necessary.
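For reference, that experimental workflow looks roughly like this, assuming the job already exists in the workspace (the job ID and output file name below are placeholders):

```hcl
# Declare where the existing job should land in state (Terraform 1.5+).
import {
  to = databricks_job.ingest
  id = "123456789" # placeholder job ID
}

# Then have Terraform generate the HCL for it:
#   terraform plan -generate-config-out=generated_jobs.tf
```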
The other solution is Databricks Asset Bundles (DABs). These do utilise Terraform under the hood, but it appears to be a non-standard use of Terraform (no obvious way to secure the state file, etc.). Because of this lack of state file management, it's unlikely to fit well with CI/CD processes (how do build/deployment agents reference the state file for future deployments?). DABs seem to be focused on local/interactive deployments.
Proposal
The suggestion is to allow the already created job.yml file to be used in lieu of the HCL configuration (not a replacement, but an either/or option). So the databricks_job resource block would look something like this:
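A rough sketch of the idea; the yaml_file argument is purely illustrative and does not exist in the provider today:

```hcl
# Hypothetical: point the resource at the job spec the data team already
# maintains, instead of restating it in HCL.
resource "databricks_job" "from_yaml" {
  yaml_file = "${path.module}/job.yml" # illustrative argument name only
}
```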
This allows data developers to carry on working in a way they are comfortable with, and because the job is only defined once it leads to a consistent deployment. It also allows platform teams to deploy jobs using tooling and processes they already have, without having to re-invent the wheel. Importantly, the SMEs in each team are doing what they know best, with no re-engineering.
Thanks
That's precisely why DABs exist - to support development workflows, and simplify deployments (local for development, CI/CD for staging/prod).
If developers create workflows in the UI, you can use the Terraform exporter to generate HCL files from them.
P.S. IMHO, there is a very low probability that this feature request will be fulfilled.
Ok, thanks for the reply.
However, the problem is state file management with DABs; specifically, it's not clear how the state is stored and secured. As DABs use Terraform, ownership of that state will likely fall on the platform team, which raises questions about where it lives and how it's protected. This doesn't seem, to me at least, to be a simpler deployment.
As for the Terraform exporter, it is an experimental feature, and very few platform teams are going to allow an experimental feature to be used to deploy production services.
DABs right now store state in the workspace, and it's governed by the standard workspace permissions.
OK, thanks for the confirmation on that. I'll take that to our platform team and get their view. I can see occasions where, if there are issues with state or other Terraform errors, it would fall on the platform team to resolve, so it's unlikely they'll support this approach.
Using terraform plan -generate-config-out=... would be the obvious answer to this; it's just unfortunate that it's experimental.