ASU: Justified Referral in AI Glaucoma Screening challenge.
- Log in to the lambda machine (SSH key authentication is preferred). To set this up:
  - Alias the lambda machine in your `~/.ssh/config` file (one directive per line):
    - `Host lambda`
    - `HostName 152.10.212.186`
    - `User <your-username>`
    - `Port 22`
  - Perform a key exchange with the lambda machine: `ssh-copy-id <your-username>@lambda`
  - You should now be able to log in to the lambda machine with the following command: `ssh <your-username>@lambda`
- Create a new Python virtual environment using `venv`. Ensure you use the command below to do so, as this will leverage the lambda-stack by default: `python3 -m venv just-raigs --system-site-packages`
- If you are not in a `bash` shell, switch to it: `bash`
- Activate the virtual environment: `source just-raigs/bin/activate`
- Note that the lambda-stack will have already installed most of the deep learning packages you require; you can verify this with: `pip list`
- I have created an environment for us to use, which is stored in the `requirements.txt` file. You can install all the required packages with the following command: `pip install -r requirements.txt`
- Any other packages you wish to install can be installed with `pip`: `pip install some-package`
- The lambda machine is powerful, but it is not a GPU cluster. There are two NVIDIA GeForce RTX 3090 GPUs and 126 GB of RAM that we must all share.
  - Please monitor resource consumption and utilization with the following commands:
    - Current GPU usage: `watch -d -n 0.5 nvidia-smi`
    - Memory usage: `htop`
  - If one of the GPUs is not in use, you can use it. If both are in use, please wait until one is free (a sketch of pinning TensorFlow to a single GPU is shown after this list). Communicate with your team members on our JustRAIGS Google Chat Space to coordinate experimentation and resource utilization.
  - Ensure you check WaB to see if someone is already running an experiment that would include the parameters you were going to run.
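If you claim a free GPU, one way (an assumption about your workflow, not something the codebase enforces) to keep TensorFlow off the busy GPU is to limit the devices visible to your process before any other TensorFlow calls:

```python
import tensorflow as tf

# List the physical GPUs TensorFlow can see (should report both RTX 3090s).
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    # Restrict this process to a single GPU (index 0 here; pick whichever one
    # `nvidia-smi` shows as free) and enable memory growth so the process does
    # not immediately reserve all of that GPU's memory.
    tf.config.set_visible_devices(gpus[0], "GPU")
    tf.config.experimental.set_memory_growth(gpus[0], True)
```

Alternatively, exporting `CUDA_VISIBLE_DEVICES=0` (or `1`) in your shell before launching achieves the same effect.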
- Data is stored in the `/usr/local/data/JustRAIGS` directory.
  - The raw compressed files are stored in `/usr/local/data/JustRAIGS/compressed`.
  - The extracted, uncompressed (unmodified) files are stored in `/usr/local/data/JustRAIGS/raw`.
    - Files in this directory are partitioned by the original dataset splits provided by the challenge organizers. For example, the training data stored in `/usr/local/data/JustRAIGS/raw/train/0` corresponds to the compressed file `JustRAIGS_Train_0.zip` provided directly from the challenge Zenodo website.
- I have provided a utility method :meth:`src.utils.datasets.load_datasets` which will load the training datasets from disk, perform preprocessing, rescaling, and normalization, and convert the result into TensorFlow Datasets for use downstream.
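For example, usage might look like the sketch below. The argument names, their values, and the exact return structure are assumptions for illustration only; check the docstring of :meth:`src.utils.datasets.load_datasets` for the real signature.

```python
import tensorflow as tf
from src.utils.datasets import load_datasets

# Hypothetical call; the keyword arguments shown here are placeholders.
train_ds, val_ds = load_datasets(batch_size=32, image_size=(224, 224))

# Whatever the exact signature is, the results are tf.data.Dataset objects,
# so the usual dataset operations apply downstream:
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
for images, labels in train_ds.take(1):
    print(images.shape, labels.shape)
```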
- Weights and Biases (WaB) is a tool that allows us to track and visualize our experiments. It is a great tool for collaboration, and is flexible enough to be customized however you see fit.
- This means you have the freedom to log arbitrary metrics, artifacts, models, and more.
- You can also integrate WaB with other deep learning frameworks and libraries. For instance, you could use Keras Tuner with WaB instead of the native WaB hyperparameter tuning framework.
- I have created a WaB project for us to use, which is located at: WaB: JustRAIGS.
- I have also provided comments in Restructured Text (RST) format in the codebase to help you understand how to integrate WaB with your code, and which documentation to reference when you get stuck.
- There is some high-level terminology you should know which will allow you to utilize WaB effectively:
  - `Organization`: This is a collection of `Projects`. It is a way to organize separate, distinct `Projects` within a particular organization/research group.
  - `Project`: This is the highest level of organization in WaB. It is a collection of `Experiments`.
  - An `Experiment` is a collection of `Runs`, `Metrics`, and `Artifacts`.
    - A `Run` is a single unit of computation logged by WaB. Consider a `Run` an atomic element of the whole project. A new `Run` should be initialized if you change a hyperparameter, use a different model, etc.
    - Within a `Run`, `Metrics` can be tracked across the training and validation datasets and visualized in the WaB dashboard. You could track the `accuracy`, `loss`, `precision`, `recall`, and more as `Metrics`.
    - Within a `Run`, `Artifacts` can also be logged. For example, you could log the model weights, the model architecture, the training dataset, the training logs, a matplotlib image, etc. A minimal logging sketch is shown after this list.
      - For more on logging artifacts, see the WaB Documentation on Artifacts.
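As a concrete illustration of these terms, a minimal `Run` that logs `Metrics` and an `Artifact` might look like the following sketch. The project name, metric values, and file path are placeholders, not the values this codebase actually uses:

```python
import wandb

# Initialize a new Run (placeholder project name and config values):
run = wandb.init(project="JustRAIGS", config={"learning_rate": 1e-3, "batch_size": 32})

# Log Metrics (e.g. once per epoch inside a training loop):
for epoch in range(3):
    wandb.log({"epoch": epoch, "loss": 0.3 - 0.1 * epoch, "accuracy": 0.80 + 0.05 * epoch})

# Log an Artifact (e.g. saved model weights); the file path is a placeholder:
artifact = wandb.Artifact(name="model-weights", type="model")
artifact.add_file("saved_model.keras")
run.log_artifact(artifact)

run.finish()
```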
- WaB uses `Reports` to organize `Runs`, embed visualizations, describe findings, and share content with collaborators.
- WaB leverages `Sweeps` to automate hyperparameter searching and perform experiment tracking. `Sweeps` allow you to leverage popular hyperparameter search methods such as Bayesian search, grid search, random search, hyperband search, and more. `Sweeps` require a `Sweep Configuration` to specify the overall hyperparameter search space and the method of search.
  - Behind the scenes, the WaB Sweep Controller manages the execution of the `Sweep`. You interact with the WaB Sweep Controller via the `wandb.agent` API.
- WaB `Sweeps` generate unique `Trials` based on the `Sweep Configuration` provided. Each `Trial` is a unique subset of the overall hyperparameter search space specified by the `Sweep Configuration`. In the way I leverage WaB, each `Trial` will have its own unique `Run`, and therefore its own `Metrics` and `Artifacts`. A sketch of a `Sweep Configuration` and agent invocation is shown after this list.
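For illustration, a hypothetical `Sweep Configuration` and `wandb.agent` invocation might look like the sketch below. The search method, hyperparameter names, and values are placeholders; the configuration actually used by this project lives in :mod:`src.sweepers.sweeper`:

```python
import wandb

# Hypothetical Sweep Configuration: the search method plus the hyperparameter space.
sweep_configuration = {
    "method": "random",  # could also be "grid" or "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-2, 1e-3, 1e-4]},
        "batch_size": {"values": [16, 32]},
    },
}

# Register the Sweep with the WaB Sweep Controller (placeholder project name):
sweep_id = wandb.sweep(sweep=sweep_configuration, project="JustRAIGS")

def run_trial():
    # Each invocation corresponds to one Trial; run.config holds the hyperparameter
    # subset the Sweep Controller selected for this Trial.
    run = wandb.init()
    learning_rate = run.config["learning_rate"]
    batch_size = run.config["batch_size"]
    # ... build, train, and evaluate a model here, logging metrics via wandb.log() ...
    run.finish()

# The agent requests Trials from the Sweep Controller and executes them:
wandb.agent(sweep_id, function=run_trial, count=5)
```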
- In Keras, a `Model` is a class that performs fitting (weight optimization) on a particular dataset. The model is in charge of minimizing a particular loss function, and is capable of making predictions on a particular dataset (i.e. performing inference). This class is also in charge of: resource management for the training process, the logging of metrics, the logging of losses, and the logging of artifacts to WaB.
- A `Hypermodel` is a class that is in charge of instantiating and managing `Models` for the hyperparameter tuning process. The `HyperModel` class is instantiated just once per `Sweep` and is responsible for creating a new `Model` for each `Trial` in the `Sweep`. A stripped-down sketch of this pattern is shown below.
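The following is a hypothetical, heavily simplified sketch of that division of labor (one hypermodel instance per `Sweep`, one freshly built `Model` per `Trial`). The class name and hyperparameters are placeholders; the method name mirrors :meth:`hypermodels.hypermodels.WaBHyperModel.construct_model_run_trial`, but this is not that implementation:

```python
import tensorflow as tf
import wandb

class ExampleHyperModel:
    """Hypothetical hypermodel: instantiated once per Sweep, builds one Model per Trial."""

    def __init__(self, train_ds, val_ds):
        self._train_ds = train_ds
        self._val_ds = val_ds

    def construct_model_run_trial(self):
        # Invoked once per Trial; run.config holds this Trial's hyperparameters.
        run = wandb.init()
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(run.config["num_units"], activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(run.config["learning_rate"]),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
        # Fit the Model for this Trial and log the resulting validation loss.
        history = model.fit(self._train_ds, validation_data=self._val_ds, epochs=1)
        wandb.log({"val_loss": history.history["val_loss"][-1]})
        run.finish()
```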
- In the `docs` directory, you will find the source files needed to build the Sphinx documentation. Sphinx is a static website generator that parses Python docstrings into HTML documentation. It is the tool leveraged by the official Python documentation, so it is worth your while to be somewhat familiar with it. Sphinx operates on docstrings written in Restructured Text (RST) form. Restructured Text serves a similar role to Markdown but is considerably more expressive. Since raw RST can be ugly to look at, I write docstrings in the Google Documentation Style and use the `sphinx.ext.napoleon` extension to parse them into RST, which Sphinx then utilizes to generate pretty-looking HTML documentation. You will most likely not need to know how this works. Just know that if you write good docstrings in the Google Style (use this example for reference), Sphinx will be able to generate readable documentation for you (almost) automagically. A sample Google-style docstring is shown after this list.
  - If you use PyCharm (which you should) for Python development, then PyCharm can build the Sphinx documentation for you. Additionally, you can configure PyCharm to lint your docstrings in Google Style. Ask me if you want to know how to do this.
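For reference, a hypothetical Google-style docstring that `sphinx.ext.napoleon` can convert to RST looks like this:

```python
def rescale_image(image, target_size):
    """Rescales an image to the provided target size.

    Args:
        image (numpy.ndarray): The input image as an array of pixel values.
        target_size (tuple): The desired ``(height, width)`` of the output image.

    Returns:
        numpy.ndarray: The rescaled image.

    Raises:
        ValueError: If ``target_size`` contains a non-positive dimension.
    """
    ...
```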
- In the `src` directory, you will find the following relevant subdirectories:
  - `sweepers`: This is the main entry point for the program. The :mod:`src.sweepers.sweeper` module is the file you should modify to either change the hyperparameters that are experimented with, or to change the method of the hyperparameter search itself (i.e. random search, grid search, hyperband, etc.). Note that if you do change the hyperparameters, you will also need to change the `Hypermodel` itself to be able to handle the new hyperparameters.
  - `hypermodels`: Contains an example :class:`hypermodels.hypermodels.WaBHyperModel` class which is instantiated by the WaB agent just once for a particular `Sweep`. This class is responsible for creating a new `Model` for each `Trial` (i.e. unique subset of hyperparameters). Specifically, the :meth:`hypermodels.hypermodels.WaBHyperModel.construct_model_run_trial` method is invoked once per `Trial` and is in charge of creating a new `Model` for the `Trial` and fitting the `Model` for the `Trial`. You will need to modify this method if you modify the hyperparameters in the sweep configuration.
  - `models`: Contains an example :class:`models.models.WaBModel` class which is instantiated by the `Hypermodel` once for every `Trial`. This class is separate from the hypermodel, as the hypermodel could theoretically instantiate separate `Model` subclasses for each `Trial`. Additionally, the :class:`models.models.WaBModel` class provides an example of how to perform custom model serialization and deserialization in TensorFlow 2.0. If you wish to use a non-sequential model, or a model that requires custom serialization/deserialization, this class will serve as a useful reference.
  - `metrics`: This file is used to house custom metrics that are not available by default in the Keras API. Note that the use of custom metrics will result in a custom model, which means you will have to modify the serialization and deserialization methods in the :class:`src.models.models.WaBModel` class.
  - `utils`: This directory houses utility functions which are leveraged by the various classes above. For instance, the :meth:`utils.datasets.load_datasets` method will load the training dataset from disk, perform preprocessing, rescaling, and normalization, and convert the result into TensorFlow Datasets for use downstream.
  - `layers`: This directory houses the :mod:`src.layers.custom` module which provides an example of how to create custom layers in TensorFlow 2.0. This is not used in the current codebase, but is provided as a reference in case you wish to use custom layers in your model. Note that using a custom layer will result in a custom model, which will require you to modify the serialization and deserialization methods in the :class:`src.models.WaBModel` class. A minimal custom-serialization sketch is shown after this list.
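To illustrate what custom serialization/deserialization means in TensorFlow 2.x, here is a hypothetical custom layer (not the actual :mod:`src.layers.custom` or :class:`models.models.WaBModel` code) that overrides `get_config` so Keras can save and reload it:

```python
import tensorflow as tf

@tf.keras.utils.register_keras_serializable(package="example")
class ScaledDense(tf.keras.layers.Layer):
    """Hypothetical custom layer: a Dense layer whose output is multiplied by a constant scale."""

    def __init__(self, units, scale=1.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.scale = scale
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs) * self.scale

    def get_config(self):
        # Serialize the constructor arguments so the layer can be rebuilt on load;
        # without this, saving and reloading a model containing this layer would fail.
        config = super().get_config()
        config.update({"units": self.units, "scale": self.scale})
        return config
```

A model containing such a layer can then be saved and reloaded with `tf.keras.models.load_model`, because the decorator registers the class with Keras' serialization machinery.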
- Less-relevant subdirectories:
  - `tuners`: This directory houses the :mod:`src.tuners.wab_kt_tuner` module which provides an example of how to integrate WaB with KerasTuner. This is not used in the current codebase, but is provided as a reference in case you wish to leverage KerasTuner for hyperparameter tuning directly, instead of WaB. This class uses KerasTuner as the driver, but still integrates with WaB for experiment tracking, versioning, and artifact retention. A hypothetical sketch of this pattern is shown below.
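For orientation only, a hypothetical (and heavily simplified) KerasTuner-driven search that still logs to WaB might look like the following; it does not reproduce :mod:`src.tuners.wab_kt_tuner`, and the data, hyperparameters, and project names are placeholders:

```python
import keras_tuner as kt
import numpy as np
import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

# Placeholder data standing in for the real JustRAIGS datasets.
x_train, y_train = np.random.rand(64, 8), np.random.randint(0, 2, size=(64, 1))
x_val, y_val = np.random.rand(16, 8), np.random.randint(0, 2, size=(16, 1))

def build_model(hp):
    # KerasTuner (not WaB) supplies the hyperparameters for each trial via `hp`.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("num_units", min_value=32, max_value=128, step=32),
                              activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3])),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# KerasTuner drives the search; WaB is used purely for experiment tracking.
wandb.init(project="JustRAIGS")  # placeholder project name
tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=3,
                        directory="kt_results", project_name="justraigs_kt")
tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=1,
             callbacks=[WandbCallback()])
wandb.finish()
```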
module which provides an example of how to integrate WaB with Kerastuner. This is not used in the current codebase, but is provided as a reference in case you wish to leverage KerasTuner for hyperparameter tuning directly, instead of WaB. This class uses KerasTuner as a driver, but still integrates with WaB for experiment tracking, versioning, and artifact retention.