We use conda and MLflow to manage experiments, runs, and all Python dependencies.
Please install both tools before continuing.
Since October 2019, downloading the ImageNet dataset requires a registered account. To download the dataset, use the following form: http://www.image-net.org/download.php
To use an already existing ImageNet dataset, set the DATASET_PATH environment variable to its location:
export DATASET_PATH=/path/to/imagenet
# export DATASET_PATH=$PWD/input/imagenet
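As a quick sanity check before launching a run, the sketch below verifies that DATASET_PATH is set and looks like an ImageNet folder. The train and val subfolder names and the check_dataset_path helper are assumptions made for illustration, not something this repository defines; adjust them to your actual layout.

```shell
# Hypothetical helper: checks that DATASET_PATH is set and contains the
# expected split folders. The "train" and "val" names are an assumption.
check_dataset_path() {
    if [ -z "${DATASET_PATH:-}" ]; then
        echo "DATASET_PATH is not set" >&2
        return 1
    fi
    for split in train val; do
        if [ ! -d "$DATASET_PATH/$split" ]; then
            echo "Missing directory: $DATASET_PATH/$split" >&2
            return 1
        fi
    done
    echo "Dataset found at $DATASET_PATH"
}

# Usage (after exporting DATASET_PATH):
#   check_dataset_path || echo "fix DATASET_PATH before training"
```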
Set the MLflow output path:
export MLFLOW_TRACKING_URI=/path/to/output/mlruns
# e.g. export MLFLOW_TRACKING_URI=$PWD/output/mlruns
Create the "Trainings" experiment (only needed once):
mlflow experiments create -n Trainings
or check existing experiments:
mlflow experiments list
export MLFLOW_TRACKING_URI=/path/to/output/mlruns
# e.g. export MLFLOW_TRACKING_URI=$PWD/output/mlruns
mlflow run experiments/mlflow --experiment-name=Trainings -P config_path=configs/train/baseline_r50.py -P num_gpus=2
To visualize experiments and runs, start the MLflow dashboard:
mlflow server --backend-store-uri /path/to/output/mlruns --default-artifact-root /path/to/output/mlruns -p 6026 -h 0.0.0.0
# e.g. mlflow server --backend-store-uri $PWD/output/mlruns --default-artifact-root $PWD/output/mlruns -p 6026 -h 0.0.0.0
Experiments and runs can also be visualized with TensorBoard:
tensorboard --logdir /path/to/output/mlruns/1
# e.g. tensorboard --logdir $PWD/output/mlruns/1
where /1 is the ID of the "Trainings" experiment. If your ID differs, check it with mlflow experiments list.
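If you are unsure which ID was assigned to an experiment, the following sketch looks up its folder by name. It assumes the local file-based store layout, where each experiment folder under mlruns contains a meta.yaml file with a "name: ..." line; find_experiment_dir is a hypothetical helper, not part of MLflow.

```shell
# Hypothetical helper: prints the folder of the experiment whose meta.yaml
# declares the given name. Assumes a local file-based MLflow store.
find_experiment_dir() {
    mlruns_root="$1"
    experiment_name="$2"
    for meta in "$mlruns_root"/*/meta.yaml; do
        # Skip the unexpanded glob when no experiment folder exists.
        [ -f "$meta" ] || continue
        # -x matches the whole line, -F treats the name as a fixed string.
        if grep -qxF "name: $experiment_name" "$meta"; then
            dirname "$meta"
            return 0
        fi
    done
    return 1
}

# Usage:
#   tensorboard --logdir "$(find_experiment_dir "$PWD/output/mlruns" Trainings)"
```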
File tree description:
code
configs
experiments/mlflow : MLflow related files
notebooks
- conda.yaml: defines all Python dependencies needed for our experiments
- MLproject: defines the types of experiments we want to run, as "entry points":
  - main: starts the single-node multi-GPU training script
When we execute
mlflow run experiments/mlflow --experiment-name=Trainings -P config_path=configs/train/baseline_r50.py -P num_gpus=2
MLflow looks up the main entry point in MLproject and runs the command defined there, with the values passed through -P as its parameters.
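To illustrate the mechanism: the command of an MLproject entry point contains {parameter} placeholders, and MLflow fills them with the values passed through -P. The sketch below imitates that substitution in plain shell. The command template and the render_command helper are hypothetical; the real command is the one defined in experiments/mlflow/MLproject, and MLflow itself also handles parameter types and quoting.

```shell
# Hypothetical helper: fills {config_path} and {num_gpus} placeholders in a
# command template, imitating what MLflow does with -P values.
# Simplification: values must not contain "|" or "&" (sed special characters).
render_command() {
    template="$1"
    config_path="$2"
    num_gpus="$3"
    printf '%s\n' "$template" \
        | sed -e "s|{config_path}|$config_path|g" \
              -e "s|{num_gpus}|$num_gpus|g"
}

# Usage with a made-up template (not the repository's actual command):
#   render_command 'python train.py {config_path} --num_gpus={num_gpus}' \
#       configs/train/baseline_r50.py 2
```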