The KAUST Visualization Core Lab (KVL) will host an Introduction to Machine Learning with Scikit-Learn workshop on Tuesday, 30 March 2021 from 1 to 5 pm AST. The workshop will largely follow Chapter 2 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, which walks through the process of developing an end-to-end machine learning project with Scikit-Learn.
- Working with Real Data
- Understanding the Big Picture
- Getting the Data
- Discovering and Visualizing the Data to Gain Insights
- Preparing the Data for Machine Learning Algorithms
- Selecting and Training a Model
- Fine Tuning Your Model
This hands-on lesson is part of the Introduction to Data Science Workshop Series being offered by KVL as part of our ongoing efforts to build capacity in core data science skills, both at KAUST and within the Kingdom of Saudi Arabia (KSA).
The workshop will be hosted on Zoom with the relevant details sent out to registered participants prior to the event. The workshop will be recorded with the recording released via the KVL YouTube Channel.
If you are interested in registering for this workshop, then please see the following links.
Project organization is based on ideas from Good Enough Practices for Scientific Computing.
- Put each project in its own directory, which is named after the project.
- Put external scripts or compiled programs in the `bin` directory.
- Put raw data and metadata in a `data` directory.
- Put text documents associated with the project in the `doc` directory.
- Put all Docker-related files in the `docker` directory.
- Install the Conda environment into an `env` directory.
- Put all notebooks in the `notebooks` directory.
- Put files generated during cleanup and analysis in a `results` directory.
- Put project source code in the `src` directory.
- Name all files to reflect their content or function.
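Assuming you are starting a new project from scratch, the layout above can be created with a few shell commands. This is a minimal sketch; the project name is a placeholder, and the `env` directory is omitted because Conda will create it later.

```bash
# Create the recommended directory layout for a new project.
# "my-project" is a placeholder; use your own project name.
mkdir my-project
cd my-project
mkdir bin data doc docker notebooks results src
```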
After adding any necessary dependencies for your project to the Conda `environment.yml` file (or the `requirements.txt` file), you can create the environment in a sub-directory of your project directory by running the following command.
```bash
ENV_PREFIX=$PWD/env
conda env create --prefix $ENV_PREFIX --file environment.yml --force
```
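For reference, a minimal `environment.yml` for a Scikit-Learn project might look like the following. This is a hypothetical example, not the project's actual file; adjust the channels and dependencies to suit your needs. The `pip` section pulls in any packages listed in `requirements.txt`.

```bash
# Hypothetical example: write a minimal environment.yml for the project.
# name is null because the environment is created with an explicit --prefix.
cat > environment.yml <<'EOF'
name: null

channels:
  - conda-forge
  - defaults

dependencies:
  - python=3.8
  - jupyterlab
  - scikit-learn
  - pip
  - pip:
    - -r file:requirements.txt
EOF
```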
Once the new environment has been created you can activate the environment with the following command.
```bash
conda activate $ENV_PREFIX
```
Note that the `ENV_PREFIX` directory is not under version control, as it can always be re-created as necessary.
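Assuming the project is tracked with Git, one simple way to keep the environment directory out of version control is to add it to `.gitignore`:

```bash
# Ignore the Conda environment directory (it can always be re-created).
echo "env/" >> .gitignore
```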
If you wish to use any JupyterLab extensions included in the `environment.yml` and `requirements.txt` files, then you need to activate the environment and rebuild the JupyterLab application using the following commands to source the `postBuild` script.
```bash
conda activate $ENV_PREFIX  # optional if environment already active
source postBuild
```
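For illustration, a `postBuild` script typically contains one `jupyter labextension install` command per extension, deferring the (slow) application build until the end. The extensions below are placeholders; substitute the ones your project actually uses.

```bash
# Hypothetical postBuild script: install each JupyterLab extension without
# triggering a rebuild, then build the application once at the end.
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
jupyter labextension install jupyterlab-plotly --no-build
jupyter lab build
```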
For your convenience these commands have been combined in a shell script, `./bin/create-conda-env.sh`. Running the shell script will create the Conda environment, activate the Conda environment, and build JupyterLab with any additional extensions. The script should be run from the project root directory as follows.
```bash
./bin/create-conda-env.sh
```
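The script itself is essentially the commands above combined into one file. A sketch of what it might contain, assuming a Bash login shell so that `conda activate` works inside the script:

```bash
#!/bin/bash --login
# Sketch of ./bin/create-conda-env.sh: create the Conda environment,
# activate it, and rebuild JupyterLab via the postBuild script.
set -e

export ENV_PREFIX=$PWD/env
conda env create --prefix $ENV_PREFIX --file environment.yml --force
conda activate $ENV_PREFIX
source postBuild
```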
The explicit dependencies for the project are listed in the `environment.yml` file. To see the full list of packages installed into the environment, run the following command.
```bash
conda list --prefix $ENV_PREFIX
```
If you add (remove) dependencies to (from) the `environment.yml` file or the `requirements.txt` file after the environment has already been created, then you can re-create the environment with the following command.
```bash
conda env create --prefix $ENV_PREFIX --file environment.yml --force
```
If you add any additional JupyterLab extensions to the `postBuild` file, then you should run the environment creation script again.
```bash
./bin/create-conda-env.sh
```
In order to build Docker images for your project and run containers, you will need to install Docker and Docker Compose. Detailed instructions for using Docker to build an image and launch containers can be found in `docker/README.md`.
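As a rough illustration of the typical workflow, the commands below build an image and launch a container. The image name, tag, port, and Compose file location are placeholders, not the project's actual values; see `docker/README.md` for the project-specific instructions.

```bash
# Hypothetical examples: build an image from the project's Dockerfile and
# run a container from it; names, tags, and ports are placeholders.
docker image build --file docker/Dockerfile --tag my-project:latest .
docker container run --rm --publish 8888:8888 my-project:latest

# Alternatively, if a Compose file is provided in the docker directory:
docker-compose --file docker/docker-compose.yml up --build
```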