docs: adds DBT concept documentation #111

Merged
merged 5 commits into from
Jan 25, 2024
37 changes: 37 additions & 0 deletions docs/concepts/dbt.rst
@@ -0,0 +1,37 @@
.. _dbt:

data build tool (dbt)
*********************

dbt is an open-source command-line tool, maintained by `dbtlabs`_, for building and maintaining data transformations.

dbt allows engineers to transform data by writing ``SELECT`` statements that encode business logic. dbt then
materializes the results as tables and views that can be queried efficiently.
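
As a minimal sketch of how a materialization is chosen, a dbt project can set it per folder in its
``dbt_project.yml``. The project name ``my_project`` and the ``reporting`` folder below are hypothetical and are not
part of Aspects:

.. code-block:: yaml

   models:
     my_project:
       # Models in this project are built as views unless overridden.
       +materialized: view
       reporting:
         # Models under models/reporting/ are built as tables instead.
         +materialized: table

Individual models can also override these defaults in their own configuration.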

dbt also allows engineers to modularize and re-use their transformation code using "packages" that can be shared across
projects or organizations.

dbt in Aspects
##############

Aspects uses the `aspects-dbt`_ package to define the transforms used by the Aspects project. This package creates and
manages macros and materialized views for data tables stored in :ref:`Clickhouse`, and provides some tests.

Operators may create and install their own dbt packages; see :ref:`dbt-extensions` for details.

`tutor-contrib-aspects`_ also provides a "do" command to proxy running `dbt commands`_ against your deployment; run
``tutor [dev|local] do dbt --help`` for details.

References
##########

* `dbtlabs`_: dbt documentation
* `dbt-core`_: core dbt package
* `aspects-dbt`_: Aspects dbt transforms
* `tutor-contrib-aspects`_: Aspects Tutor plugin

.. _aspects-dbt: https://github.com/openedx/aspects-dbt/#aspects-dbt
.. _dbtlabs: https://docs.getdbt.com/
.. _dbt-core: https://github.com/dbt-labs/dbt-core
.. _dbt commands: https://docs.getdbt.com/reference/dbt-commands
.. _tutor-contrib-aspects: https://github.com/openedx/tutor-contrib-aspects
1 change: 1 addition & 0 deletions docs/concepts/index.rst
@@ -9,6 +9,7 @@ Concepts
xAPI <xapi_concepts>
Tracking Logs <tracking_logs>
Clickhouse <clickhouse>
dbt <dbt>
Ralph <ralph>
Vector <vector>
Pipelines <pipelines>
87 changes: 75 additions & 12 deletions docs/how-tos/dbt_extensions.rst
@@ -1,14 +1,77 @@
.. _dbt-extensions:

DBT extensions
**************

To extend the DBT project, you can use the following Tutor variables:

- **DBT_REPOSITORY**: A git repository URL to clone and use as the DBT project.
- **DBT_BRANCH**: The branch to use when cloning the DBT project.
- **DBT_PROJECT_DIR**: The directory to use as the DBT project.
- **EXTRA_DBT_PACKAGES**: A list of python packages for the DBT project to install.
- **DBT_ENABLE_OVERRIDE**: This variable determines whether the DBT project override feature
should be enabled or not. When enabled, it allows you to make changes to the **dbt_project.yml**
and **packages.yml** files using the tutor patches: `dbt-packages` and `dbt-project`.
Extending dbt
*************

As noted in :ref:`dbt`, you can install your own custom dbt package to apply your own transforms to the event data
in Aspects.

**Step 1. Create your dbt package**

Create a new dbt package using `dbt init`_.

Update the generated ``dbt_project.yml`` to use the ``aspects`` profile:

.. code-block:: yaml

   # This setting configures which "profile" dbt uses for this project.
   profile: 'aspects'

See `Building dbt packages`_ for more details, and `Writing data tests`_ for how to validate your transformations.
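
As a rough illustration of validating a transformation, dbt's built-in generic tests can be declared in a YAML
properties file alongside your models. The model name ``my_enrollment_summary`` and its columns are hypothetical
placeholders, not models provided by Aspects:

.. code-block:: yaml

   version: 2

   models:
     - name: my_enrollment_summary
       columns:
         - name: course_key
           tests:
             # Fail if any row has no course key.
             - not_null
         - name: enrollment_id
           tests:
             # Fail if an enrollment appears more than once.
             - not_null
             - unique

These tests run as part of ``dbt test``.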

**Step 2. Link to aspects-dbt**

Aspects charts depend on the transforms in `aspects-dbt`_, so your dbt package must install the same version of
`aspects-dbt`_ that your Aspects Tutor plugin uses.

To do this, add a ``packages.yml`` file at the top level of your dbt package, where:

* the ``git`` URL matches the default value of ``DBT_REPOSITORY`` in `tutor-contrib-aspects plugin.py`_
* the ``revision`` matches the default value of ``DBT_BRANCH`` in `tutor-contrib-aspects plugin.py`_

.. code-block:: yaml

   packages:
     - git: "https://github.com/openedx/aspects-dbt.git"
       revision: v2.2

Contributor

This should note that it will also need to contain all of the contents of the aspects-dbt ``packages.yml``, I believe.

Contributor

@Ian2012, Jan 24, 2024

Is that true? I don't think so. I've been using it with only aspects-dbt in two other projects and have seen no issues. I think dependencies are installed recursively.

Contributor Author

@bmtcril I think @Ian2012 is correct here.

I couldn't find docs that verify it, which I would expect to find if dbt dependencies didn't work like other package dependencies.

I also dug through the dbt packages on dbt hub to find an example. This demo project depends on dbt-codegen, which in turn depends on dbt-utils, but dbt-utils need not be present in the parent's packages.yml.

**Step 3. Install and run your dbt package**

Update the following Tutor variables to use your package instead of the Aspects default.

- ``DBT_REPOSITORY``: A git repository URL to clone and use as the dbt project.

  Set this to the URL for your custom dbt package.

  Default: ``https://github.com/openedx/aspects-dbt``
- ``DBT_BRANCH``: The branch to use when cloning the dbt project.

  Set this to the hash/branch/tag of your custom dbt package that you wish to use.

  Default: varies between versions of Aspects.
- ``DBT_PROJECT_DIR``: The directory to use as the dbt project.

  Set this to the name of your dbt package repository.

  Default: ``aspects-dbt``
- ``EXTRA_DBT_PACKAGES``: Add any Python packages that your dbt project requires here.

  Default: ``[]``
- ``DBT_PROFILE_*``: Variables used in the Aspects ``dbt/profiles.yml`` file, including several Clickhouse connection settings.

Once your package is configured in Tutor, you can run dbt commands directly on your deployment; run ``tutor [dev|local] do dbt --help`` for details.
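
For illustration only, the settings above might look like this in your Tutor ``config.yml``. The repository URL,
branch, and directory name are hypothetical placeholders for your own package, and this sketch assumes the variables
are set under exactly the names listed above:

.. code-block:: yaml

   # Hypothetical custom dbt package; replace with your own repository.
   DBT_REPOSITORY: "https://github.com/example-org/my-aspects-dbt"
   # Branch, tag, or commit hash of the custom package to use.
   DBT_BRANCH: "main"
   # Name of the custom package's repository directory.
   DBT_PROJECT_DIR: "my-aspects-dbt"
   # Extra Python packages needed by the custom dbt project, if any.
   EXTRA_DBT_PACKAGES: []

The same values can also be set from the command line with ``tutor config save --set``.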

References
**********

* `Building dbt packages`_: dbt's guide to building packages
* `Writing data tests`_: dbt's guide to writing package tests
* `aspects-dbt`_: Aspects' dbt package
* `eduNEXT/dbt-aspects-unidigital`_: a custom dbt package running in a production Aspects deployment

.. _aspects-dbt: https://github.com/openedx/aspects-dbt
.. _dbt init: https://docs.getdbt.com/reference/commands/init
.. _eduNEXT/dbt-aspects-unidigital: https://github.com/eduNEXT/dbt-aspects-unidigital
.. _Building dbt packages: https://docs.getdbt.com/guides/building-packages
.. _Writing data tests: https://docs.getdbt.com/best-practices/writing-custom-generic-tests
.. _tutor-contrib-aspects plugin.py: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/plugin.py