This package provides scripts to set up TensorFlow, installed either with pip or without pip, with GPU support on Casper without using any outside CUDA modules.
The key steps for setting up TensorFlow are to first install the correct versions of the cudatoolkit and cudnn packages using the Casper conda module. 'tensorrt' is also installed for the pip version.
Second, environment variables need to be set correctly to point TensorFlow to the conda-based CUDA installation and associated libraries. These are set during the activation of each conda environment.
Third, the XLA_FLAGS environment variable needs to be set to include the path to the conda environment.
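The environment settings described above can be checked from Python once an environment is active. The following is a minimal sketch, not part of the setup scripts: the helper name check_cuda_env is hypothetical, and it assumes that LD_LIBRARY_PATH is among the variables set on activation and that it should contain the environment's lib directory. It only verifies that XLA_FLAGS mentions the conda prefix; it does not assume the exact flag used.

```python
import os


def check_cuda_env(environ):
    """Report whether key variables mention the active conda environment.

    `environ` is any mapping of environment variables (e.g. os.environ).
    Checking LD_LIBRARY_PATH is an assumption; the setup scripts may
    rely on other variables.
    """
    prefix = environ.get("CONDA_PREFIX", "")
    if not prefix:
        # No conda environment is active, so nothing can point to it.
        return {
            "conda_active": False,
            "xla_flags_ok": False,
            "ld_library_path_ok": False,
        }
    lib_dir = os.path.join(prefix, "lib")
    ld_entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "conda_active": True,
        # XLA_FLAGS should include the path to the conda environment.
        "xla_flags_ok": prefix in environ.get("XLA_FLAGS", ""),
        # Assumed: the environment's lib directory is on the library path.
        "ld_library_path_ok": lib_dir in ld_entries,
    }


if __name__ == "__main__":
    for name, ok in check_cuda_env(os.environ).items():
        print(f"{name}: {ok}")
```

Running this inside an activated environment prints one line per check; a False value suggests the activation step did not set that variable as expected.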
- (Optional) Install MiniConda or MambaForge to your local machine if not running on Casper.
- cd to tf_pip or tf_nopip, depending on whether you want TensorFlow installed with pip or installed from the conda-forge channel.
- Run sh setup_conda_tfXXX.sh, where XXX is the version number in tf_pip or tf_nopip. This creates a conda environment with TensorFlow and the appropriate libraries and environment variables.
- Start a batch job on a GPU node. You can start a 30-minute testing job with
  execcasper -l select=1:ncpus=1:mem=20GB:ngpus=1 --gpu_type=v100 -q gpudev -A $PROJECT_ID
- Activate the environment with module load conda and conda activate tfXXXgpu.
- Run python test_simple_nn.py to test that the GPU is detected correctly and that a simple neural net will train on the GPU.
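For a quick manual check of the same two things (GPU detection and a short training run), something like the sketch below could be used. It is not the repository's test_simple_nn.py: the helper gpu_report and the tiny regression model are illustrative assumptions. TensorFlow is imported only when the file is run as a script, so the helper remains importable without it.

```python
def gpu_report(list_devices):
    """Summarize GPUs given a callable that lists devices of a type.

    `list_devices` is expected to behave like
    tf.config.list_physical_devices, which accepts a device type string.
    """
    gpus = list_devices("GPU")
    if not gpus:
        return "No GPU detected"
    names = ", ".join(getattr(d, "name", str(d)) for d in gpus)
    return f"{len(gpus)} GPU(s) detected: {names}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not importable; activate the environment first.")
    else:
        import numpy as np

        print(gpu_report(tf.config.list_physical_devices))

        # Tiny synthetic regression problem: predict the sum of 8 inputs.
        x = np.random.rand(256, 8).astype("float32")
        y = x.sum(axis=1, keepdims=True)
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        # Two short epochs are enough to confirm that training runs.
        model.fit(x, y, epochs=2, batch_size=32, verbose=2)
```

If the report says no GPU was detected on a GPU node, the environment variables set during activation are the first thing to inspect.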
The pip installation method provided in this repository is the one recommended by Google, which provides TensorFlow. The nopip version is community supported and distributed primarily through conda-forge.
In general, pure conda installations are easier to maintain than pip-based installations. It is relatively easy to break dependency requirements when mixing pip and conda installs. Nonetheless, the pip version of TensorFlow does include support for the tensorrt framework, since the Python Package Index distributes the pip version with this functionality. Additionally, the pip version includes instruction sets for CPU operations up to AVX, while the conda version provides only up to SSE3.
Please consider these differences when choosing which version of TensorFlow to install.
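To judge whether the AVX versus SSE3 difference matters on a given machine, it can help to see which of these instruction sets the CPU reports. The sketch below is one way to do that on Linux, under the assumption that /proc/cpuinfo is available; the function names are hypothetical. Note that the Linux kernel lists SSE3 under the flag name pni.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; cores normally share it.
            return set(value.split())
    return set()


def cpu_features(cpuinfo_text):
    """Report SSE3 and AVX support based on the kernel's flag names."""
    flags = cpu_flags(cpuinfo_text)
    # Linux reports SSE3 as "pni" (not to be confused with "ssse3").
    return {"SSE3": "pni" in flags, "AVX": "avx" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(cpu_features(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system.")
```

On a CPU that reports AVX, the pip version can make use of it for CPU operations, whereas the conda version would be limited to SSE3 as described above.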