Use-cases
We would like our data team to continue creating Databricks jobs using the methods they are comfortable with: defining jobs in a development workspace and committing the YAML file into a version-controlled Git repository. Existing CI/CD processes should then be able to take over and deploy through the various environments.
However, the Terraform provider seems to only allow the job to be defined in HCL, which leaves us either decoding the YAML file, using templates, or manually rewriting the YAML as HCL. All of these approaches are error-prone and overly complex when the job has already been written in YAML.
The alternative is for data developers to write the job in HCL from the start, which would be a substantial change to the way they develop and test Databricks jobs.
Attempted Solutions
Using Terraform functions such as yamldecode and templatefile, we can attempt to read the YAML file into Terraform variables and use the result to build the configuration in Terraform. This is difficult, messy and incredibly verbose. It is also hard to cover every possible job structure, so whether it works reliably every time is debatable.
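As an illustration, here is a minimal sketch of that approach, assuming a job.yml with a top-level name and a flat list of notebook tasks (real job specs nest much deeper, which is where the verbosity comes from):

```hcl
# Sketch only: read the job definition produced by the data team and map it
# onto the databricks_job schema by hand. Every additional field the job uses
# (clusters, libraries, schedules, dependencies, ...) needs its own mapping.
locals {
  job_spec = yamldecode(file("${path.module}/job.yml"))
}

resource "databricks_job" "from_yaml" {
  name = local.job_spec.name

  dynamic "task" {
    for_each = local.job_spec.tasks
    content {
      task_key = task.value.task_key

      notebook_task {
        notebook_path = task.value.notebook_task.notebook_path
      }
    }
  }
}
```

Each nested block has to be translated explicitly like this, which is why the approach does not scale well across teams and job shapes.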
Terraform does have an import configuration-generation feature, but it is still experimental, so it cannot be used for production deployments. It does seem to work, so it could be a viable solution in the future; however, it still adds an extra step to a process where one doesn't seem necessary.
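For reference, that experimental workflow looks roughly like this, assuming the job already exists in the workspace (the job ID and output file name below are placeholders):

```hcl
# Declare where the existing job should land in state (Terraform 1.5+).
import {
  to = databricks_job.ingest
  id = "123456789" # placeholder job ID
}

# Then have Terraform generate the HCL for it:
#   terraform plan -generate-config-out=generated_jobs.tf
```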
The other solution is Databricks Asset Bundles (DABs). These do utilise Terraform under the hood, but it appears to be a non-standard use of Terraform (no obvious way to secure the state file, etc.). Because of this lack of state file management, it's unlikely to fit well with CI/CD processes (how do build/deployment agents reference the state file for future deployments?). DABs seem to be focused on local/interactive deployments.
Proposal
The suggestion is to allow the already created job.yml file to be used in lieu of the HCL configuration (not a replacement, but an either/or option). So the databricks_job resource block would look something like this:
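A rough sketch of the idea; the yaml_file argument is purely illustrative and does not exist in the provider today:

```hcl
# Hypothetical: point the resource at the job spec the data team already
# maintains, instead of restating it in HCL.
resource "databricks_job" "from_yaml" {
  yaml_file = "${path.module}/job.yml" # illustrative argument name only
}
```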
This allows data developers to carry on working in a way they are comfortable with, and because the job is only defined once it leads to a consistent deployment. It also allows platform teams to deploy jobs using tooling and processes they already have, without having to re-invent the wheel. Importantly, the SMEs in each team are doing what they know best, with no re-engineering.
Thanks
That's precisely why DABs exist - to support development workflows, and simplify deployments (local for development, CI/CD for staging/prod).
If developers create workflows in the UI, you can use the Terraform exporter to generate HCL files from them.
P.S. IMHO, there is a very low probability that this feature request will be fulfilled.
Ok, thanks for the reply.
However, the problem is state file management with DABs; specifically, it's not clear how the state is stored and secured. As DABs use Terraform, ownership of that state will likely fall on the platform team, which raises questions about where it lives and how it's protected. This doesn't seem, to me at least, to be a simpler deployment.
As for the Terraform exporter, it is an experimental feature, and very few platform teams are going to allow an experimental feature to be used to deploy production services.
DABs right now store state in the workspace, and it's governed by the standard workspace permissions.
OK, thanks for the confirmation on that. I'll take that to our platform team and get their view. I can see occasions where, if there are issues with state or other Terraform errors, it would fall on the platform team to resolve, so it's unlikely they'll support this approach.
Using terraform plan -generate-config-out=... would be the obvious answer to this; it's just unfortunate that it's experimental.