sicara/pycon-2022-dvc-streamlit

Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit

PyConDE

This repo provides the slides and materials for the talk I gave at PyConDE/PyDataBerlin 2022, on Tuesday, April 12th.

🎤 Watch the slides

I made the slides with Streamlit, so you need to install a few Python packages with pip before you can view them :).

1️⃣ Requirements

It works with Python 3.9.10 on my laptop. It should also work with Python >=3.6, but I have not tested that.
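
If you are unsure which interpreter your environment uses, a small check like the one below can help. It is only a sketch: the `is_supported` helper and the `MIN_VERSION` constant are illustrative names that do not exist in this repo, and the 3.6 lower bound is the untested one mentioned above.

```python
import sys

# Lower bound taken from the requirements note; only 3.9.10 has actually been tested.
MIN_VERSION = (3, 6)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported() else "too old"
    # str.format is used instead of an f-string so the script also parses on old interpreters.
    print("Python {}: {}".format(current, status))
```

Run it with the same interpreter you plan to use for `pip install`, so the check reflects the environment the slides will run in.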

2️⃣ Installation

pip install -r requirements.txt
pre-commit install  # You can skip this if you don't intend to make new commits

3️⃣ Pull the data

dvc pull -R .
dvc exp pull origin -A

4️⃣ Start the presentation

Just run:

streamlit run st_talk_slides.py

You should see the first slide, which shows the title of the talk.

From there, you can navigate through the slides with the menu in the left sidebar. Please open an issue if you have trouble with the slides 🙏.

🧑‍💻 About the code

I made the slides with Streamlit for several reasons:

  • to show the code and its execution in the slides, to avoid switching to a web browser during the presentation
  • to make the slides more interactive
  • because the talk was about Streamlit, kind of inception 🌀

I used streamlit-book for the page layout. Many thanks to sebastiandres for the awesome work 🙏 👍.

📂 Project Structure

| Path | Description |
| --- | --- |
| st_talk_slides.py | The main Streamlit script for the slides. |
| ./code_samples | Code samples that were run "as is" in the slides. |
| ./images | The images of the slides. |
| ./src | Source code for the training pipeline: no Streamlit here, only Python and DVC. |
| ./utils | Utility functions for the slides, e.g. displaying HTML and CSS, running command lines in Streamlit, etc. |

🧪 Running new experiments

  • 1️⃣ Add experiments to the queue. For instance, if you want to change the train seed:
dvc exp run --set-params train.seed=0106 --queue

➡️ You can find the available parameters in the params.yaml file.

  • 2️⃣ Run the experiments that are in the queue:
dvc exp run --run-all
  • 3️⃣ Check the results:
dvc exp show
  • 4️⃣ Save the experiments to the remote git server and data storage (requires forking this repo & setting up your own dvc remote):
git push
dvc exp push origin --rev HEAD
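
When you want to queue several values at once, steps 1 and 2 above can be scripted. The sketch below only builds the `dvc exp run` command lines shown above and prints them; the `queue_commands` helper and the seed values are hypothetical, and `train.seed` is simply the parameter from the example. Actually executing the commands (for instance with `subprocess.run`) assumes `dvc` is installed and that you are at the root of the repo.

```python
import shlex


def queue_commands(param, values):
    """Build one `dvc exp run --set-params <param>=<value> --queue` command per value."""
    return [
        ["dvc", "exp", "run", "--set-params", f"{param}={value}", "--queue"]
        for value in values
    ]


if __name__ == "__main__":
    # Hypothetical seeds, chosen only for illustration.
    for command in queue_commands("train.seed", [1, 2, 3]):
        print(shlex.join(command))
    # Once everything is queued, run the whole queue as in step 2.
    print(shlex.join(["dvc", "exp", "run", "--run-all"]))
```

Keeping each command as a list of arguments means it can be passed to `subprocess.run` unchanged, without going through a shell.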

⚠️ A note on DVC remote storage: the remote storage is Sicara's public S3 bucket (see the dvc config file). By default, you have permission to read (dvc pull) but not to write (dvc push). If you want to run experiments and save your results with dvc push, consider adding your own dvc remote.

About

PyCon Talks 2022 by Antoine Toubhans
