Building better data structures, APIs and configuration systems for scientific software using Pydantic
Presented by Axel Donath and Nick Langellier
This tutorial is an introduction to Pydantic, a library for data validation and settings management using Python type annotations. Using a semi-realistic ML and / or scientific software pipeline scenario we demonstrate how Pydantic can be used to support type validations for scientific data structures, APIs and configuration systems. We show how the use of Pydantic in scientific and ML software leads to a more pleasant user experience as well as more robust and easier to maintain code. A minimum knowledge of Python type annotations, class definitions and data structures will be helpful for beginners but not required.
Prerequisites: A minimum knowledge of Python type annotations, class definitions and data structures will be helpful for beginners but not required. Some minimal knowledge of Pandas and Numpy for the hands-on exercises and Jupyter notebooks.
Prior Python Programming Level of Knowledge Expected: Advanced beginners
To execute the notebooks and examples from this tutorial, you are expected to have a working Python >=3.9 installation of your choice and git
installed. First clone this repository:
git clone https://github.com/adonath/scipy-2023-pydantic-tutorial
And change to the cloned folder:
cd scipy-2023-pydantic-tutorial
We recommend to use a virtual environment to install the required dependencies for this tutorial. For this we provide two options:
-
System Python: If you use a normal system Python installation you can create and activate the environment using:
python -m venv scipy-2023-pydantic-tutorial source ./scipy-2023-pydantic-tutorial/bin/activate python -m pip install -r requirements.txt
-
Conda / Mamba: If you prefer to use a package manager such as conda or mamba we also provide an
environment.yaml
file. In this case you can create and activate the environment using:conda env create -f environment.yaml conda activate scipy-2023-pydantic-tutorial
Once the environment is created and activated execute the following script to check your setup:
python check-setup.py
If everything is installed correctly you should see the following message:
******************************************************************
* Congratulations! You are ready to begin the Pydantic tutorial! *
******************************************************************
To execute the notebooks along with the presentation, start the Jupyter server:
jupyter notebook
If you have cloned the repository before July 10, 2023, please make sure to update it to the latest version. In case you have local changes, we would recommend to store those in a separate branch before updating the repository. To do so, change to the cloned folder and execute the following commands:
git checkout -b my-local-changes
git add .
git commit -m "My local changes"
Then you can update the repository using:
git checkout main
git pull origin main
Now you should have the latest version of the repository.
In case you have cloned the repository before July 10, 2023, please make sure to update the environment to the latest version. To do so, change to the cloned folder and execute the following commands:
If you use a normal system Python installation you can update the environment using:
python -m pip install -r requirements.txt
If you use conda or mamba you can update the environment using:
conda env update -f environment.yaml
If you do not have a working installation (for whatever reason) before the tutorial starts, you can also execute the tutorial in the browser using Binder. Just click on the following badge to start the Binder service:
This will open a new tab with a clone of the repository and a running instance of Jupyter Lab in your browser. The loading of the environment might take some time.
Important: The Binder service might not work reliably if a large number of participants is using it the same time. Installing and executing the tutorials locally is the preferred option.
We will cover the following chapters:
- Introduction and Installation
- Basic Usage
- Advanced Usage I
- Advanced Usage II
- Serialisation and Deserialisation
- Summary and Conclusion
Here is a more complete overview.
You can find the rendered notebooks and smaple solutions to the exercises on the GitHub pages of this repository.