Bump python sdk version #827

Open
wants to merge 17 commits into
base: 1.10.latest

Conversation

eric-wang-1990
Collaborator

@eric-wang-1990 eric-wang-1990 commented Oct 14, 2024

Upgrade the Python SDK version from 0.17.0 to 0.36.0.
Create DatabricksCredentialManager to manage credentials. Under the hood it uses Config from databricks.sdk.core, which comes from the Python SDK, to handle the auth part.
For dbt we support 4 different auth modes:

  • token: when the user explicitly passes in token=xxxx
  • external_browser: when the user passes in auth_type=oauth with clientid=xxx and no client secret
  • oauth-m2m: when the user passes in clientid=xxx and clientsecret=yyy
  • azure-client-secret: when the user passes in azure_client_id=xxx and azure_client_secret=yyy and the host is an Azure host
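
The mapping from supplied credentials to auth mode listed above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name select_auth_mode and its parameter names are hypothetical, and the sketch leaves out the auth_type=oauth and Azure-host checks mentioned in the list.

```python
from typing import Optional


def select_auth_mode(
    token: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    azure_client_id: Optional[str] = None,
    azure_client_secret: Optional[str] = None,
) -> str:
    """Return the name of the auth mode implied by the supplied credentials.

    Hypothetical helper: simplified from the PR description, with no
    host or auth_type checks.
    """
    if token:
        # An explicit token always wins.
        return "token"
    if azure_client_id and azure_client_secret:
        return "azure-client-secret"
    if client_id and client_secret:
        # Client id plus secret means machine-to-machine OAuth.
        return "oauth-m2m"
    if client_id:
        # Client id without a secret falls back to the browser flow.
        return "external_browser"
    raise ValueError("no usable credentials supplied")
```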

For oauth-m2m and azure-client-secret we set a preferred auth sequence based on whether the clientSecret starts with "dose", so that we make as few failed authentication attempts as possible.
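
A minimal sketch of that ordering, assuming (as the description implies) that a "dose" prefix marks a secret meant for oauth-m2m. The helper name preferred_auth_order is hypothetical and the real implementation in this PR may differ.

```python
from typing import List

# Prefix named in the PR description; treated here as the only signal.
DATABRICKS_SECRET_PREFIX = "dose"


def preferred_auth_order(client_secret: str) -> List[str]:
    """Return the auth modes to try, most likely one first."""
    if client_secret.startswith(DATABRICKS_SECRET_PREFIX):
        return ["oauth-m2m", "azure-client-secret"]
    # Any other secret is tried as an Azure client secret first.
    return ["azure-client-secret", "oauth-m2m"]
```

Both modes stay in the list either way, so a wrong guess costs one extra attempt rather than a hard failure.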

Remove the set/get sharedPassword part, since the Python SDK already handles that, although with one defect for which I have already raised a PR: databricks/databricks-sdk-py#823

One thing that will no longer work with this change: customers using a non-Databricks OAuth client for the U2M case, since the Python SDK does not support modifying redirectUrl/scopes.

Description

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@benc-db
Collaborator

benc-db commented Nov 18, 2024

Looks like something is off in the python path, with those E2E tests failing.

@benc-db benc-db changed the base branch from main to 1.10.latest November 26, 2024 22:35
@stevenayers-bge

Hi @benc-db & @eric-wang-1990, can this PR be applied to versions older than 1.10 too? At the moment, I have to have dbt as a separate repo purely to work around the conflict with databricks-connect==15.4.2.

I'm using dbt-databricks<1.80 right now, but I can upgrade to 1.8.x if you don't want to go that far back.

    elif self.azure_client_id and self.azure_client_secret:
        self._config = self.authenticate_with_azure_client_secret()
    elif not self.client_secret:
        self._config = self.authenticate_with_external_browser()

I love this, it will make it so much easier to add new authentication types!

I wonder if that could be made even simpler by just having a common generic method to build the Databricks SDK for all authentication types except the legacy ones.

The logic would then be reversed: we first check whether the authentication method used is legacy and otherwise just run something like: self._config = self.databricks_config

    @property
    def databricks_config(self) -> Config:
        return Config(
            host=self.host,
            client_id=self.client_id,
            client_secret=self.client_secret,
            auth_type=self.auth_type,
            azure_client_id=self.azure_client_id,
            azure_client_secret=self.azure_client_secret,
            token=self.token,
        )

I believe the databricks.sdk.config.Config class would handle the None values just fine by ignoring them. A major benefit of this approach is that it would then be straightforward to pass through all of the attributes, thus enabling any possible authentication option.
It would also be easier to document and less confusing, as you could just refer to Databricks' Unified Authentication / Python SDK authentication documentation. In addition, it could provide more flexibility by exposing additional configuration options.
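
As a rough sketch of the pass-through idea, the credential attributes could be collected into keyword arguments with unset values dropped, so the approach does not depend on how Config treats None. The Credentials dataclass and the config_kwargs helper are hypothetical stand-ins, not part of dbt-databricks or the SDK, and only a few of the attributes are shown.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Credentials:
    """Hypothetical stand-in for the adapter's credentials object."""

    host: Optional[str] = None
    token: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    auth_type: Optional[str] = None
    azure_client_id: Optional[str] = None
    azure_client_secret: Optional[str] = None


def config_kwargs(creds: Credentials) -> Dict[str, Any]:
    """Collect every attribute that is set, skipping None values."""
    return {
        f.name: getattr(creds, f.name)
        for f in fields(creds)
        if getattr(creds, f.name) is not None
    }
```

The resulting dictionary could then be handed to the SDK as Config(**config_kwargs(creds)), and supporting another attribute would only mean adding a field to the dataclass.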

For reference here is the full list of these attributes as of now:

  • host
  • account_id
  • token
  • username
  • password
  • client_id
  • client_secret
  • profile
  • config_file
  • google_service_account
  • google_credentials
  • azure_workspace_resource_id
  • azure_use_msi
  • azure_client_secret
  • azure_client_id
  • azure_tenant_id
  • azure_environment
  • databricks_cli_path
  • auth_type
  • cluster_id
  • warehouse_id
  • serverless_compute_id
  • skip_verify
  • http_timeout_seconds
  • debug_truncate_bytes
  • debug_headers
  • rate_limit
  • retry_timeout_seconds
  • metadata_service_url
  • max_connection_pools
  • max_connections_per_pool
  • databricks_environment

5 participants