Merge pull request #311 from dlt-hub/enh/new_readme
Update README, add contributor's guide
matthauskrzykowski authored Apr 29, 2023
2 parents acf623b + 34f736b commit 80fdb0c
Showing 2 changed files with 156 additions and 91 deletions.
104 changes: 104 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,104 @@
# Contributing to dlt

Thank you for considering contributing to **dlt**! We appreciate your help in making dlt better. This document will guide you through the process of contributing to the project.

## Table of Contents

1. [Getting Started](#getting-started)
2. [Submitting Changes](#submitting-changes)
3. [Linting](#linting)
4. [Testing](#testing)
5. [Local Development](#local-development)
6. [Publishing (Maintainers Only)](#publishing-maintainers-only)
7. [Resources](#resources)

## Getting Started

To get started, follow these steps (a combined command sketch follows the list):

1. Fork the `dlt` repository and clone it to your local machine.
2. Install `poetry` with `make install-poetry` (or follow the [official instructions](https://python-poetry.org/docs/#installation)).
3. Run `make dev` to install all dependencies including dev ones.
4. Start working in the `poetry` shell by executing `poetry shell`.
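
Put together, the setup might look like the following sketch (the fork URL placeholder stands in for your own GitHub handle):

```shell
# clone your fork of dlt and enter the repository
git clone https://github.com/<your-username>/dlt.git
cd dlt

# install poetry (run outside of any virtualenv), then all dependencies including dev ones
make install-poetry
make dev

# work inside the poetry-managed virtual environment
poetry shell
```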

## Submitting Changes

When you're ready to contribute, follow these steps (an example workflow is sketched below the list):

1. Create an issue describing the feature, bug fix, or improvement you'd like to make.
2. Create a new branch in your forked repository for your changes. **Note:** for some special cases, you'd need to contact us to create a branch in the main repository.
3. Write your code and tests.
4. Lint your code by running `make lint` and test common modules with `make test-common`.
5. If you're working on destination code, contact us to get access to test destinations.
6. Create a pull request targeting the `devel` branch of the main repository.
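
A typical iteration might look like this sketch (the branch name is only illustrative):

```shell
# create a branch for your change in your fork
git checkout -b fix/my-change

# ...write code and tests...

# lint and run the common-module tests locally
make lint
make test-common

# push the branch and open a pull request targeting the devel branch of dlt-hub/dlt
git push -u origin fix/my-change
```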

## Linting

`dlt` uses `mypy` and `flake8` with several plugins for linting.

## Testing

dlt uses `pytest` for testing.

### Common Components

To test common components (which don't require external resources), run `make test-common`.

### Local Destinations

To test local destinations (`duckdb` and `postgres`), run `make test-local`.
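
These tests do not need external credentials; copying the example environment file should be enough:

```shell
# duckdb and postgres tests run without additional credentials
cp tests/.example.env tests/.env
make test-local
```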

### External Destinations

To test external destinations, use `make test`. You will need the following external resources:

1. `BigQuery` project
2. `Redshift` cluster
3. A `Postgres` instance. You can find a Docker Compose file for the Postgres instance [here](tests/load/postgres/docker-compose.yml). When run, the instance is configured to work with the tests:

```shell
cd tests/load/postgres/
docker-compose up --build -d
```

See `tests/.example.env` for the expected environment variables and a command line example for running the tests. Then create `tests/.env` from it. You configure the tests as you would configure a dlt pipeline.
We'll provide you with access to the resources above if you wish to test locally.
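
One possible sequence, assuming you already have credentials for the resources above:

```shell
# start the dockerized postgres instance used by the tests
cd tests/load/postgres/
docker-compose up --build -d
cd ../../..

# create your test configuration from the template and fill in the credentials
cp tests/.example.env tests/.env

# run the destination test suite
make test
```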

## Local Development

Use Python 3.8 for development, as it's the lowest supported version for `dlt`. You'll need `distutils` and `venv`. You may also use `pyenv`, as suggested by [poetry](https://python-poetry.org/docs/managing-environments/).
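
On Debian/Ubuntu, for example, the required packages can be installed like this:

```shell
sudo apt-get install python3.8
sudo apt-get install python3.8-distutils
sudo apt install python3.8-venv
```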

# Publishing (Maintainers Only)

This section is intended for project maintainers who have the necessary permissions to manage the project's versioning and publish new releases. If you're a contributor, you can skip this section.

## Project Versioning

`dlt` follows semantic versioning with the [`MAJOR.MINOR.PATCH`](https://peps.python.org/pep-0440/#semantic-versioning) pattern. Currently, we are using **pre-release versioning**, with the major version being 0.

- `minor` version change means breaking changes
- `patch` version change means new features that should be backward compatible
- any suffix change, e.g., `a10` -> `a11`, is considered a patch

Before publishing a new release, make sure to bump the project's version accordingly (see the commands below):

1. Use `poetry version prerelease` to bump the patch version.
2. Run `make build-library` to apply the changes to the project.
3. The source of the version is `pyproject.toml`, and we use `poetry` to manage it.
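
For example:

```shell
# bump the pre-release version recorded in pyproject.toml
poetry version prerelease

# apply the version change to the project
make build-library
```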

## Publishing to PyPI

Once the version has been bumped, follow these steps to publish the new release to PyPI (a command sketch follows the list):

1. Ensure that you are on the `devel` branch and have the latest code that has passed all tests on CI.
2. Verify the current version with `poetry version`.
3. Obtain a PyPI access token and configure it with `poetry config pypi-token.pypi your-api-token`.
4. Run `make publish-library` to publish the new version.
5. Create a release on GitHub, using the version and git tag as the release name.
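
A condensed sketch of the release flow (the token value is a placeholder):

```shell
# make sure you are on devel with the latest, CI-green code
git checkout devel
git pull

# check the version that will be published
poetry version

# configure your PyPI token and publish
poetry config pypi-token.pypi <your-api-token>
make publish-library

# finally, create a GitHub release named after the version / git tag
```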

## Resources

- [dlt Docs](https://dlthub.com/docs)
- [Poetry Documentation](https://python-poetry.org/docs/)

If you have any questions or need help, don't hesitate to reach out to us. We're here to help you succeed in contributing to `dlt`. Happy coding!
143 changes: 52 additions & 91 deletions README.md
@@ -1,19 +1,33 @@
![](https://github.com/dlt-hub/dlt/raw/devel/docs/DLT-Pacman-Big.gif)

<h1 align="center">
<strong>data load tool (dlt) — the open-source Python library for data loading</strong>
</h1>
<p align="center">
Be it a Google Colab notebook, AWS Lambda function, an Airflow DAG, your local laptop,<br/>or a GPT-4 assisted development playground—<strong>dlt</strong> can be dropped in anywhere.
</p>

[![PyPI version](https://badge.fury.io/py/dlt.svg)](https://pypi.org/project/dlt/)
[![LINT Badge](https://github.com/dlt-hub/dlt/actions/workflows/lint.yml/badge.svg)](https://github.com/dlt-hub/dlt/actions/workflows/lint.yml)
[![TEST COMMON Badge](https://github.com/dlt-hub/dlt/actions/workflows/test_common.yml/badge.svg)](https://github.com/dlt-hub/dlt/actions/workflows/test_common.yml)
[![TEST DESTINATIONS Badge](https://github.com/dlt-hub/dlt/actions/workflows/test_destinations.yml/badge.svg)](https://github.com/dlt-hub/dlt/actions/workflows/test_destinations.yml)
[![TEST BIGQUERY Badge](https://github.com/dlt-hub/dlt/actions/workflows/test_destination_bigquery.yml/badge.svg)](https://github.com/dlt-hub/dlt/actions/workflows/test_destination_bigquery.yml)
[![TEST DBT Badge](https://github.com/dlt-hub/dlt/actions/workflows/test_dbt_runner.yml/badge.svg)](https://github.com/dlt-hub/dlt/actions/workflows/test_dbt_runner.yml)
<div align="center">
<a target="_blank" href="https://join.slack.com/t/dlthub-community/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g" style="background:none">
<img src="https://img.shields.io/badge/slack-join-dlt.svg?labelColor=191937&color=6F6FF7&logo=slack" />
</a>
<a target="_blank" href="https://pypi.org/project/dlt/" style="background:none">
<img src="https://img.shields.io/pypi/v/dlt?labelColor=191937&color=6F6FF7">
</a>
<a target="_blank" href="https://pypi.org/project/dlt/" style="background:none">
<img src="https://img.shields.io/pypi/pyversions/dlt?labelColor=191937&color=6F6FF7">
</a>
</div>

## Installation

dlt supports Python 3.8+.

# data load tool (dlt)
```bash
pip install dlt
```

## Quick Start

**[Colab Demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing)**
Load chess game data from the chess.com API and save it in DuckDB:

```python
import dlt
# … intermediate lines collapsed in the diff view (@@ -33,96 +47,43 @@) …
data = chess(['magnuscarlsen', 'rpragchess'], start_month='2022/11', end_month=...)  # end_month value truncated in the diff view
pipeline.run(data)
```

**data load tool (dlt)** is an open source Python library that makes data loading easy

- Automatically turn the JSON returned by any API into a live dataset stored wherever you want it
- `pip install dlt` and then include `import dlt` to use it in your Python loading script
- The **dlt** library is licensed under the Apache License 2.0, so you can use it for free forever

Read more about it on the [dlt Docs](https://dlthub.com/docs)

# semantic versioning

`dlt` will follow semantic versioning with the [`MAJOR.MINOR.PATCH`](https://peps.python.org/pep-0440/#semantic-versioning) pattern. Currently we do **pre-release versioning**, with the major version being 0.

- `minor` version change means breaking changes
- `patch` version change means new features that should be backward compatible
- any suffix change, i.e. `a10` -> `a11`, is a patch

# development

`dlt` uses `poetry` to manage, build and version the package. It also uses `make` to automate tasks. To start:

```sh
make install-poetry # will install poetry, to be run outside virtualenv
```

then
Try it out in our **[Colab Demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing)**

```sh
make dev # will install all deps including dev
```

Executing `poetry shell` and working inside it is currently the most convenient way to develop.

## python version

Use Python 3.8 for development, which is the lowest supported version for `dlt`. You'll need `distutils` and `venv`:

```shell
sudo apt-get install python3.8
sudo apt-get install python3.8-distutils
sudo apt install python3.8-venv
```

You may also use `pyenv` as [poetry](https://python-poetry.org/docs/managing-environments/) suggests.

## bumping version
## Features

Please use `poetry version prerelease` to bump the patch version and then `make build-library` to apply the changes. The source of the version is `pyproject.toml`, and we use poetry to manage it.
- **Automatic Schema:** Data structure inspection and schema creation for the destination.
- **Data Normalization:** Consistent and verified data before loading.
- **Seamless Integration:** Colab, AWS Lambda, Airflow, and local environments.
- **Scalable:** Adapts to growing data needs in production.
- **Easy Maintenance:** Clear data pipeline structure for updates.
- **Rapid Exploration:** Quickly explore and gain insights from new data sources.
- **Versatile Usage:** Suitable for ad-hoc exploration to advanced loading infrastructures.
- **Start in Seconds with CLI:** Powerful CLI for managing, deploying and inspecting local pipelines.
- **Incremental Loading:** Load only new or changed data and avoid loading old records again.
- **Open Source:** Free and Apache 2.0 Licensed.

## testing and linting
## Ready to use Pipelines and Destinations

`dlt` uses `mypy` and `flake8` with several plugins for linting. We do not reorder imports or reformat code.
Explore ready to use pipelines (e.g. Google Sheets) in the [Pipelines docs](https://dlthub.com/docs/pipelines/chess) and supported destinations (e.g. DuckDB) in the [Destinations docs](https://dlthub.com/docs/destinations/bigquery).

`pytest` is used as the test harness. `make test-common` will run the tests of common components and does not require any external resources.
## Documentation

### testing destinations
For detailed usage and configuration, please refer to the [official documentation](https://dlthub.com/docs).

To test destinations, use `make test`. You will need the following external resources:
## Examples

1. `BigQuery` project
2. `Redshift` cluster
3. A `Postgres` instance. You can find a Docker Compose file for the Postgres instance [here](tests/load/postgres/docker-compose.yml). When run, the instance is configured to work with the tests.
You can find examples for various use cases in the [examples](docs/examples) folder.

```shell
cd tests/load/postgres/
docker-compose up --build -d
```

See `tests/.example.env` for the expected environment variables and a command line example for running the tests. Then create `tests/.env` from it. You configure the tests as you would configure a dlt pipeline.
We'll provide you with access to the resources above if you wish to test locally.

To test local destinations (`duckdb` and `postgres`), run `make test-local`. You can run these tests without additional credentials (just copy `.example.env` into `.env`).

## publishing

1. Make sure that you are on the `devel` branch and have the newest code that passed all tests on CI.
2. Verify the current version with `poetry version`.
3. You'll need a PyPI access token; configure it with `poetry config pypi-token.pypi your-api-token`, then run:

```
make publish-library
```
## Get Involved

4. Make a release on GitHub, using the version and git tag as the release name.
The dlt project is quickly growing, and we're excited to have you join our community! Here's how you can get involved:

## contributing
- **Connect with the Community**: Join other dlt users and contributors on our [Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g).
- **Report issues and suggest features**: Please use the [GitHub Issues](https://github.com/dlt-hub/dlt/issues) to report bugs or suggest new features. Before creating a new issue, make sure to search the tracker for possible duplicates and add a comment if you find one.
- **Contribute Pipelines**: Pipelines are data processing steps that help move and transform data between various sources and destinations. Contribute your custom pipelines to the [dlt-hub/pipelines](https://github.com/dlt-hub/pipelines) repository to help other folks handle their data tasks.
- **Contribute code**: Check out our [contributing guidelines](CONTRIBUTING.md) for information on how to make a pull request.
- **Improve documentation**: Help us enhance the dlt documentation.

To contribute via pull request:
## License

1. Create an issue with your idea for a feature, bug fix, or improvement.
2. Write your code and tests.
3. Lint your code with `make lint`. Test the common modules with `make test-common`.
4. If you work on destination code, contact us to get access to test destinations.
5. Create a pull request.
DLT is released under the [Apache 2.0 License](LICENSE.txt).
