dagster_and_r

This project explores how Dagster, a modern data orchestrator, and R, a statistical programming language, work together. It shows how business logic written in R can run inside the Dagster framework.

Key Features

  • Docker Integration: Execute R code in isolated environments using Docker container ops.
  • Dagster Pipes: Run R scripts within a subprocess, leveraging Dagster's experimental Pipes feature.
  • Reticulate Bridge: Utilize the {reticulate} R package to create a bridge between Python and R, enhancing interoperability.

Getting Started

To begin exploring the integration of Dagster and R:

  1. Clone the Repository

    git clone https://github.com/philiporlando/dagster-and-r.git
  2. Navigate to Directory

    cd dagster-and-r
  3. Install Dependencies: Use poetry to install the package and its dependencies:

    poetry install
  4. Set the RETICULATE_PYTHON Environment Variable: Determine the path to the Python binary associated with this project's poetry environment.

    poetry run which python
    # /home/user/.cache/pypoetry/virtualenvs/dagster-and-r-kS5e8P_l-py3.10/bin/python

Create a new .Renviron file at the root of the project and set the RETICULATE_PYTHON variable to this path.
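
The .Renviron step can also be scripted. The following is a minimal sketch, assuming it is run with `poetry run python` from the project root so that `sys.executable` is the poetry environment's interpreter. The helper name `write_renviron` is hypothetical and not part of this project.

```python
import sys
from pathlib import Path


def write_renviron(project_root: Path, python_path: str = sys.executable) -> Path:
    """Write (or update) RETICULATE_PYTHON in <project_root>/.Renviron.

    Hypothetical helper for illustration; not part of this project.
    """
    renviron = project_root / ".Renviron"
    kept = []
    if renviron.exists():
        # Keep existing entries, dropping any earlier RETICULATE_PYTHON line.
        kept = [
            line
            for line in renviron.read_text().splitlines()
            if not line.startswith("RETICULATE_PYTHON=")
        ]
    kept.append(f"RETICULATE_PYTHON={python_path}")
    renviron.write_text("\n".join(kept) + "\n")
    return renviron
```

Calling `write_renviron(Path.cwd())` from the project root leaves other entries in an existing .Renviron untouched and replaces only the RETICULATE_PYTHON line.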

  5. Launch the Dagster UI: Start the Dagster web server:

    poetry run dagster dev

Access the UI at http://localhost:3000 in your browser.

[Screenshot: Dagster UI, assets never materialized]

  6. Materialize Assets: Click the "Materialize all" button in the top right of the UI. Every asset in this project should materialize without error.

[Screenshot: Dagster UI, assets materialized]

  7. Inspect the Run: Click the "Runs" tab and open the latest run of the pipeline to see detailed information, including custom logs, asset checks, and environment variables passed from an external R session.

[Screenshot: Dagster UI, run details]

  8. Create Assets: Begin writing assets in dagster_and_r/assets.py. They are automatically loaded into the Dagster code location.

Then start the Dagster web server, naming the dagster_and_r module explicitly:

poetry run dagster dev -m dagster_and_r

Open http://localhost:3000 with your browser to see the project.

Current Integrations

Dagster Pipes

  • Pass logs between an external R session and Dagster
  • Pass environment variables and context between an external R session and Dagster
  • Asset checks defined in R
  • In-memory data passing
  • Pass markdown metadata between R and Dagster (e.g. head() of a data.frame)
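
For orientation, here is a rough Python sketch of what an external session has to do to take part in Pipes. It rests on assumptions about the Pipes protocol that should be checked against the Dagster Pipes documentation. The first is that Dagster hands over bootstrap parameters in the DAGSTER_PIPES_CONTEXT and DAGSTER_PIPES_MESSAGES environment variables as base64-encoded, zlib-compressed JSON. The second is that the external process reports back by writing one JSON message per line. The helper names and message field names below are illustrative, not taken from this project. An R session would need to do the equivalent with its own base64, compression, and JSON tools.

```python
import base64
import json
import zlib


def decode_param(value: str) -> dict:
    """Decode a bootstrap parameter (assumed encoding: base64 -> zlib -> JSON)."""
    raw = zlib.decompress(base64.b64decode(value))
    return json.loads(raw.decode("utf-8"))


def encode_param(obj: dict) -> str:
    """Inverse of decode_param; useful for experimenting without Dagster."""
    compressed = zlib.compress(json.dumps(obj).encode("utf-8"))
    return base64.b64encode(compressed).decode("utf-8")


def log_message(text: str, level: str = "INFO") -> str:
    """Build one JSON-lines log message (field names are assumptions)."""
    return json.dumps(
        {
            "__dagster_pipes_version": "0.1",
            "method": "log",
            "params": {"message": text, "level": level},
        }
    )
```

The round trip `decode_param(encode_param(x)) == x` makes it possible to exercise the decoding logic in isolation before wiring it to a real run.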

Docker Container Op

  • Execute external R code from a Docker container op.
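
To illustrate what running R in a container amounts to, the sketch below builds a `docker run` command line that mounts a directory of scripts and runs one of them with Rscript. The image name `r-base`, the mount point `/scripts`, and the helper name are assumptions chosen for the example, not this project's configuration. A Dagster container op would receive its image and command through its own config rather than through a helper like this.

```python
from pathlib import Path


def docker_rscript_command(
    script_dir: Path, script_name: str, image: str = "r-base"
) -> list[str]:
    """Build an argv for `docker run` that executes one R script.

    Hypothetical helper; the image and mount point are illustrative.
    """
    mount = f"{script_dir.resolve()}:/scripts:ro"  # read-only bind mount
    return [
        "docker", "run",
        "--rm",  # remove the container when it exits
        "-v", mount,
        image,
        "Rscript", f"/scripts/{script_name}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where Docker is installed. Building the command as a list avoids shell quoting problems with paths that contain spaces.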

Development Guide

Adding Python Dependencies

To add new Python packages to the project:

poetry add <pkg-name>

Unit Testing

Unit tests help keep the code reliable. Run the existing tests with pytest:

poetry run pytest dagster_and_r_tests

Note

Unit tests are a work in progress.

Schedules and Sensors

Schedules and sensors only run while the Dagster daemon is active. The dagster dev command starts the daemon alongside the web server:

poetry run dagster dev

With the Daemon running, you can start using schedules and sensors for your jobs.

Contributions

Contributions to enhance or expand the project are welcome! Feel free to fork the repository, make changes, and submit a pull request.