credentials moved to configuration, added configuration pages (#703)
* credentials moved to configuration, added configuration pages

* add description

* del description

* fix paths

* fix broken links

* refactored secrets and configs

* refactored config providers

* refactored config specs

* return "credentials" as a section slug

* refactor links

* updates config docs, requests changes

* add built in creds

* refactor specs

* add explanation for secrets and config

* refactor configuration

* refactor configuration

* move add creds to how to

* convert comments to text

* rename

* del imports

* Update docs/website/docs/walkthroughs/add_credentials.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/walkthroughs/add_credentials.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* add more details about secrets and config

* intro for providers

* intro for specs

* small changes

* spec examples with sources

* small changes

* delete link to name convention

* refactor

* refactor

* refactor

* refactor

* fix typo

* add info about home dir

* Update docs/website/docs/general-usage/credentials/configuration.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_providers.md

Co-authored-by: Anton Burnashev <[email protected]>

* Update docs/website/docs/general-usage/credentials/config_specs.md

Co-authored-by: Anton Burnashev <[email protected]>

* wip

* more info about Configuration classes

* more about tomls

* fix link

* fix layout

---------

Co-authored-by: Marcin Rudolf <[email protected]>
Co-authored-by: Anton Burnashev <[email protected]>
3 people authored Oct 31, 2023
1 parent 59258e1 commit ad64a30
Showing 13 changed files with 1,116 additions and 64 deletions.
8 changes: 4 additions & 4 deletions docs/technical/secrets_and_config.md
@@ -77,14 +77,14 @@ You should type your function signatures! The effort is very low and it gives `d

```python
@dlt.source
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpClientCredentialsWithDefault = dlt.secrets.value, only_strings: bool = False):
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpServiceAccountCredentials = dlt.secrets.value, only_strings: bool = False):
...
```
Now:
1. you are sure that you get a list of strings as `tab_names`
2. you will get actual google credentials (see `CredentialsConfiguration` later) and your users can pass them in many different forms.

In case of `GcpClientCredentialsWithDefault`
In case of `GcpServiceAccountCredentials`
* you may just pass the `service_json` as string or dictionary (in code and via config providers)
* you may pass a connection string (used in sql alchemy) (in code and via config providers)
* or default credentials will be used
@@ -331,7 +331,7 @@ It tells you exactly which paths `dlt` looked at, via which config providers and

## Working with credentials (and other complex configuration values)

`GcpClientCredentialsWithDefault` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It can also parse various native representations of the configuration. Credentials marked with the `WithDefaults` mixin can also instantiate themselves from the machine/user default environment, e.g. Google's `default()` or AWS `.aws/credentials`.
`GcpServiceAccountCredentials` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It can also parse various native representations of the configuration. Credentials marked with the `WithDefaults` mixin can also instantiate themselves from the machine/user default environment, e.g. Google's `default()` or AWS `.aws/credentials`.

As an example, let's use `ConnectionStringCredentials` which represents a database connection string.

@@ -421,7 +421,7 @@ In fact for each decorated function a spec is synthesized. In case of `google_sh
@configspec
class GoogleSheetsConfiguration:
    tab_names: List[str] = None  # mandatory
    credentials: GcpClientCredentialsWithDefault = None  # mandatory secret
    credentials: GcpServiceAccountCredentials = None  # mandatory secret
    only_strings: Optional[bool] = False
```

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/duckdb.md
@@ -84,7 +84,7 @@ p = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='ches

This destination accepts database connection strings in format used by [duckdb-engine](https://github.com/Mause/duckdb_engine#configuration).

You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials.md) (e.g. using a `secrets.toml` file)
You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials) (e.g. using a `secrets.toml` file)
```toml
destination.duckdb.credentials=duckdb:///_storage/test_quack.duckdb
```
4 changes: 0 additions & 4 deletions docs/website/docs/general-usage/configuration.md

This file was deleted.

146 changes: 146 additions & 0 deletions docs/website/docs/general-usage/credentials/config_providers.md
@@ -0,0 +1,146 @@
---
title: Configuration Providers
description: Configuration dlt Providers
keywords: [credentials, secrets.toml, secrets, config, configuration, environment
variables, provider]
---

# Configuration Providers


Configuration Providers in the context of the `dlt` library
refer to different sources from which configuration values
and secrets can be retrieved for a data pipeline.
These providers form a hierarchy, with each having its own
priority in determining the values for function arguments.

## The provider hierarchy

If a function signature has arguments that may be injected, `dlt` looks for their values in the
providers.

### Providers

1. **Environment Variables**: At the top of the hierarchy are environment variables.
If a value for a specific argument is found in an environment variable,
dlt will use it and will not proceed to search in lower-priority providers.

2. **Vaults (Airflow/Google/AWS/Azure)**: These are specialized providers that come
after environment variables. They can provide configuration values and secrets.
However, they typically focus on handling sensitive information.

3. **`secrets.toml` and `config.toml` Files**: These files are used for storing both
configuration values and secrets. `secrets.toml` is dedicated to sensitive information,
while `config.toml` contains non-sensitive configuration data.

4. **Default Argument Values**: These are the values specified in the function's signature.
They have the lowest priority in the provider hierarchy.

### Example

```python
@dlt.source
def google_sheets(
    spreadsheet_id=dlt.config.value,
    tab_names=dlt.config.value,
    credentials=dlt.secrets.value,
    only_strings=False
):
    sheets = build('sheets', 'v4', credentials=Services.from_json(credentials))
    tabs = []
    for tab_name in tab_names:
        data = sheets.get(spreadsheet_id, tab_name).execute().values()
        tabs.append(dlt.resource(data, name=tab_name))
    return tabs
```

In the case of `google_sheets()`, `dlt` will look
for `spreadsheet_id`, `tab_names` and `credentials`.

Each provider has its own key naming convention, and dlt is able to translate between them.

**The argument name is a key in the lookup**.

At the top of the hierarchy are Environment Variables, then `secrets.toml` and
`config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the Environment
provider but **before** TOML providers.

For example, if `spreadsheet_id` is found in environment variable `SPREADSHEET_ID`, `dlt` will not look in TOML files
and below.

The values passed in the code **explicitly** are the **highest** in provider hierarchy. The **default values**
of the arguments have the **lowest** priority in the provider hierarchy.

:::info
Explicit Args **>** ENV Variables **>** Vaults: Airflow etc. **>** `secrets.toml` **>** `config.toml` **>** Default Arg Values
:::
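
The priority order above can be sketched in plain Python. This is only an illustration of the lookup order, not how `dlt` implements it: the function `resolve_value`, its parameters, and the dictionaries standing in for providers are all hypothetical, and the environment key is simply the upper-cased argument name (no sections).

```python
import os

# Sentinel that distinguishes "not passed" from a legitimate None/False value.
_MISSING = object()


def resolve_value(key, explicit=_MISSING, default=_MISSING, *,
                  env=None, vaults=None, secrets_toml=None, config_toml=None):
    """Return (value, provider_name) from the first provider that has `key`.

    Hypothetical helper: plain dicts stand in for the real providers.
    """
    # 1. A value passed explicitly in code always wins.
    if explicit is not _MISSING:
        return explicit, "explicit argument"

    # 2-5. Providers are queried in priority order; the first hit stops the search.
    providers = [
        ("environment", os.environ if env is None else env, key.upper()),
        ("vault", vaults or {}, key),
        ("secrets.toml", secrets_toml or {}, key),
        ("config.toml", config_toml or {}, key),
    ]
    for name, store, lookup_key in providers:
        if lookup_key in store:
            return store[lookup_key], name

    # 6. The default from the function signature has the lowest priority.
    if default is not _MISSING:
        return default, "default argument value"
    raise KeyError(f"{key} was not found in any provider")


value, source = resolve_value(
    "spreadsheet_id",
    env={"SPREADSHEET_ID": "from-env"},
    config_toml={"spreadsheet_id": "from-toml"},
)
print(value, source)
```

In this sketch the value in `config_toml` is never consulted, because the environment provider already supplied `spreadsheet_id`.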

Secrets are handled only by the providers supporting them. Some providers support only
secrets (to reduce the number of requests done by `dlt` when searching sections).

1. `secrets.toml` and environment may hold both config and secret values.
1. `config.toml` may hold only config values, no secrets.
1. Vault providers hold only secrets; `dlt` skips them when looking for values that are not
secrets.
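
One way to picture these rules is a filter over the provider list. The sketch below is an assumption-laden illustration: the `ProviderInfo` class, its `supports_secrets` and `only_secrets` fields, and the `providers_for` helper are hypothetical names chosen for this example and are not taken from the `dlt` code base.

```python
from dataclasses import dataclass


@dataclass
class ProviderInfo:
    """Hypothetical description of what a provider may hold."""
    name: str
    supports_secrets: bool
    only_secrets: bool = False


# Listed in priority order, mirroring the three rules above.
PROVIDERS = [
    ProviderInfo("environment", supports_secrets=True),
    ProviderInfo("vault", supports_secrets=True, only_secrets=True),
    ProviderInfo("secrets.toml", supports_secrets=True),
    ProviderInfo("config.toml", supports_secrets=False),
]


def providers_for(is_secret: bool) -> list:
    """Return the names of providers that would be queried for a value."""
    if is_secret:
        # Secrets are looked up only in providers that can hold them.
        return [p.name for p in PROVIDERS if p.supports_secrets]
    # Secrets-only providers are skipped for plain config values.
    return [p.name for p in PROVIDERS if not p.only_secrets]


print(providers_for(is_secret=True))
print(providers_for(is_secret=False))
```

Skipping the vault for non-secret values is what saves the extra requests mentioned above.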

:::info
Context-aware providers activate only in the matching environments, e.g. on Airflow or on AWS/GCP virtual machines.
:::

## Provider key formats

### TOML vs. Environment Variables

Providers may use different formats for the keys. `dlt` will translate the standard format where
sections and key names are separated by "." into the provider-specific formats.

1. For TOML, names are case-sensitive and sections are separated with ".".
1. For Environment Variables, all names are upper-cased and sections are separated with a double
underscore "__".

Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find
the `private_key` for Google credentials. It will look

1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found,
1. in `secrets.toml` with key `my_section.gcp_credentials.private_key`.
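
The translation between the two key formats can be sketched as follows. The helper names `to_env_key` and `lookup_order` are hypothetical and exist only for this illustration; the sketch covers just the two providers named in the example.

```python
def to_env_key(key: str) -> str:
    """Translate a dotted key into the environment variable format.

    Sections are upper-cased and joined with a double underscore.
    """
    return "__".join(part.upper() for part in key.split("."))


def lookup_order(key: str) -> list:
    """List (provider, provider-specific key) pairs in the order they are tried."""
    return [
        ("environment", to_env_key(key)),
        ("secrets.toml", key),  # TOML keeps the dotted, case-sensitive form
    ]


for provider, provider_key in lookup_order("my_section.gcp_credentials.private_key"):
    print(f"{provider}: {provider_key}")
```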

### Environment provider

Looks for the values in the environment variables.

### TOML provider

The TOML provider in dlt utilizes two TOML files:

- `secrets.toml` - This file is intended for storing sensitive information, often referred to as "secrets".
- `config.toml` - This file is used for storing configuration values.

By default, the `.gitignore` file in the project prevents `secrets.toml` from being added to
version control and pushed. However, `config.toml` can be freely added to version control.

:::info
**The TOML provider always loads those files from the `.dlt` folder**, which is looked up **relative to the
current working directory**.
:::

Example: If your working directory is `my_dlt_project` and your project has the following structure:

```
my_dlt_project:
  |
  pipelines/
    |---- .dlt/secrets.toml
    |---- google_sheets.py
```

and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in
`my_dlt_project/.dlt/secrets.toml` and ignore the existing
`my_dlt_project/pipelines/.dlt/secrets.toml`.

If you change your working directory to `pipelines` and run `python google_sheets.py` it will look for
`my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected.

:::caution
It's worth mentioning that the TOML provider also has the capability to read files from `~/.dlt/`
(located in the user's home directory) in addition to the local project-specific `.dlt` folder.
:::
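
To check which files would be considered from a given working directory, the locations described above can be computed with `pathlib`. This is a sketch only: `candidate_secrets_paths` is a hypothetical helper, and the order in which the two locations are listed is an assumption, since the text does not say which one takes precedence.

```python
from pathlib import Path


def candidate_secrets_paths(cwd=None, home=None):
    """Return the `secrets.toml` locations described above.

    Hypothetical helper: `cwd` and `home` default to the real working
    directory and home directory, but can be overridden for illustration.
    """
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)
    return [
        cwd / ".dlt" / "secrets.toml",   # project-local, relative to the working directory
        home / ".dlt" / "secrets.toml",  # per-user folder in the home directory
    ]


# Running from the project root vs. from the `pipelines` folder:
print(candidate_secrets_paths(cwd="my_dlt_project")[0])
print(candidate_secrets_paths(cwd="my_dlt_project/pipelines")[0])
```

The location of the script itself (`pipelines/google_sheets.py`) plays no role here; only the working directory does.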