
[FEATURE] New feature request - add an option to 'import' a databricks job using the .yml file #4295

Open
andrewCluey opened this issue Dec 4, 2024 · 4 comments
Labels
feature New feature or request

Comments

andrewCluey commented Dec 4, 2024

Use-cases

We would like our data team to continue to create Databricks jobs using the methods they are comfortable with: defining jobs in a development workspace and committing the YAML file into a version-controlled Git repository. Existing CI/CD processes should then be able to take over and deploy through the various environments.

However, the Terraform provider appears to only allow the job to be defined in HCL. This leaves us either decoding the YAML file, using templates, or manually rewriting the YAML as HCL. All of these options are error-prone and overly complex when the job has already been written in YAML.

The alternative is for data developers to write the job in HCL from the start, which would be a substantial change to the way they develop and test Databricks jobs.

Attempted Solutions

Using Terraform's yamldecode() and templatefile() functions, we can attempt to read the YAML file and use the result to build the job configuration in Terraform. This is difficult, messy, and incredibly verbose. It's also hard to cover every scenario, so the likelihood of this working every time is debatable.
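For illustration, a minimal sketch of this approach, assuming a hypothetical multi_job.yml with top-level name and tasks keys (the real job schema has many more fields, which is exactly where this gets verbose):

locals {
  # Read the job definition the data team committed (hypothetical path).
  job = yamldecode(file("${path.module}/artifacts/multi_job.yml"))
}

resource "databricks_job" "main" {
  name        = local.job.name
  description = try(local.job.description, null)

  # Every nested structure in the YAML needs its own dynamic block,
  # and every optional attribute its own try()/can() guard.
  dynamic "task" {
    for_each = local.job.tasks
    content {
      task_key = task.value.task_key

      dynamic "notebook_task" {
        for_each = can(task.value.notebook_task) ? [task.value.notebook_task] : []
        content {
          notebook_path = notebook_task.value.notebook_path
        }
      }
    }
  }
}

Even this only covers notebook tasks; clusters, libraries, schedules, and the rest of the job schema would each need the same treatment.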

Terraform does have a configuration-generation feature for imports, but it is still experimental, so it cannot be used for production deployments. It does seem to work, so it could be a viable solution in the future. However, it still adds a step to a process where one doesn't seem necessary.
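For reference, a rough sketch of that workflow (the job ID below is hypothetical); the import block itself is standard Terraform 1.5+ syntax, and it is the HCL generation via -generate-config-out that is still experimental:

# Declare the existing job so Terraform can generate configuration for it.
import {
  to = databricks_job.main
  id = "1234567890" # hypothetical job ID taken from the workspace URL
}

# Then running:
#   terraform plan -generate-config-out=generated.tf
# writes the job's configuration as HCL into generated.tf.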

The other solution is Databricks Asset Bundles (DABs). These do use Terraform under the hood, but it appears to be a non-standard use of Terraform (no obvious way to secure the state file, etc.). Because of this lack of state-file management, it's unlikely to fit well with CI/CD processes (how do build/deployment agents reference the state file for future deployments?). DABs seem to be focused on local/interactive deployments.
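For contrast, "standard" state management here means an explicit, access-controlled remote backend along these lines (the bucket and table names are hypothetical), which build agents can reference on every run; it is this piece that is not obviously configurable with DABs:

terraform {
  backend "s3" {
    # Hypothetical names - the point is the platform team controls
    # where state lives, who can read it, and how it is locked.
    bucket         = "platform-terraform-state"
    key            = "databricks/jobs/terraform.tfstate"
    region         = "eu-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}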

Proposal

The suggestion is to allow the already created job.yml file to be used in lieu of the HCL configuration (not a replacement, but an either/or option). So the databricks_job resource block would look something like this:

resource "databricks_job" "main" {
  name        = "Job with multiple tasks"
  description = "This job executes multiple tasks"
  import {
    type       = "yaml"
    content  = "${path.module}/artifacts/multi_job.yml"
  }
}

This allows data developers to carry on working in a way they are comfortable with; the job is only defined once, which leads to consistent deployments. It also allows platform teams to deploy jobs using the tooling and processes they already have, without having to reinvent the wheel. Importantly, the SMEs in each team are doing what they know best, with no re-engineering.

Thanks


andrewCluey added the feature (New feature or request) label Dec 4, 2024
alexott (Contributor) commented Dec 4, 2024

That's precisely why DABs exist: to support development workflows and simplify deployments (local for development, CI/CD for staging/prod).

If developers create workflows in the UI, you can use Terraform exporter to generate HCL files from them.

P.S. IMHO, there's a very low probability that this feature request will be fulfilled.

andrewCluey (Author)

Ok, thanks for the reply.

However, the problem is with state-file management in DABs: specifically, it's not clear how the state is stored and secured. Since DABs use Terraform, ownership of the state will likely fall on the platform team, which raises questions about where it lives and how it is protected. This doesn't seem, to me at least, like a simpler deployment.

As for the Terraform exporter, it is an experimental feature, so very few platform teams are going to allow an experimental feature to be used to deploy production services.

alexott (Contributor) commented Dec 4, 2024

DABs right now store state in the workspace, and it's governed by the standard workspace permissions.

andrewCluey (Author)

OK, thanks for confirming that. I'll take it to our platform team and get their view. I can see occasions where, if there are issues with state or other Terraform errors, it would fall on the platform team to resolve, so they're unlikely to support this approach.

Using terraform plan -generate-config-out=.... would be the obvious answer to this; it's just unfortunate that it's experimental.
