In this project, we develop a hand gesture recognition application using neural networks. It could be useful in many areas, such as assistive technology for people with disabilities and gesture-controlled games.
We're using the Sign Language MNIST dataset from Kaggle.
- The dataset format is patterned to match closely with the classic MNIST.
- Each training and test case represents a label (0-25) as a one-to-one map for each alphabetic letter A-Z. There are no cases for 9 = J or 25 = Z, because those letters require gesture motions.
- The training data (27,455 cases) and test data (7,172 cases) are approximately half the size of the standard MNIST but otherwise similar, with a header row of label, pixel1, pixel2, ..., pixel784. Each row represents a single 28x28 pixel image with grayscale values between 0 and 255.
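Given the CSV layout above (a label column followed by 784 pixel columns), loading the data could be sketched as follows. This is a minimal illustration, not the project's actual dataset code; the helper name `load_sign_mnist` and the in-memory demo rows are ours.

```python
# Sketch of parsing the CSV format described above (label, pixel1..pixel784).
# The function name and demo data are illustrative, not from the project.
import io
import numpy as np

def load_sign_mnist(csv_file):
    """Parse a Sign Language MNIST CSV into labels and 28x28 uint8 images."""
    data = np.genfromtxt(csv_file, delimiter=",", skip_header=1, dtype=np.uint8)
    labels = data[:, 0]                       # first column: label 0-25
    images = data[:, 1:].reshape(-1, 28, 28)  # remaining 784 columns: one 28x28 image
    return labels, images

# Tiny in-memory example with the same header layout as the real files:
header = "label," + ",".join(f"pixel{i}" for i in range(1, 785))
rows = ["3," + ",".join("0" for _ in range(784)),
        "7," + ",".join("255" for _ in range(784))]
labels, images = load_sign_mnist(io.StringIO("\n".join([header] + rows)))
print(labels.tolist(), images.shape)  # [3, 7] (2, 28, 28)
```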
To install the dependencies and download the dataset:

```sh
pip install -r requirements.txt
wget http://i13pc106.ira.uka.de/~tha/PNNProjects/sign-language-mnist.zip
mkdir data
unzip sign-language-mnist.zip -d data/
rm sign-language-mnist.zip
```
Before training, a combination of transformations and data augmentation is applied, including:
- random rotation
- random horizontal flip
- randomly changing the brightness, contrast and saturation
- random resized cropping
Original images:
Data augmentation:
Implementation of the models can be found in the models/ folder.
Configuration (including dataset path, hyperparameters, etc.) is defined in config.yaml.
To start training:

- Launch TensorBoard:

  ```sh
  tensorboard --logdir=runs/sign_languange
  ```

- In another terminal session, run:

  ```sh
  python train.py
  ```

  You can specify the model to train with the `-m` argument. For example, to train `simple_cnn`:

  ```sh
  python train.py -m simple_cnn
  ```

- Open a browser and navigate to http://localhost:6006 to monitor training.
After training, the best model will be saved in saved_models/.
To evaluate the trained model on the test set, run:

```sh
python test.py -m <trained-model>
```
To run inference with a trained model on an input image, run:

```sh
python infer.py <trained-model> <input-image>
```
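Conceptually, inference amounts to a forward pass followed by an argmax over the 26 class logits; a rough sketch of that step (the helper name and the stand-in model below are ours, not the contents of infer.py) might look like:

```python
# Hypothetical sketch of single-image inference; infer.py is the real entry point.
import string
import torch

# Labels 0-25 map one-to-one to A-Z (J and Z never occur, as noted above).
LETTERS = string.ascii_uppercase

def predict(model, image_tensor):
    """Run a single 1x28x28 image through the model and return the letter."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))  # add a batch dimension
        label = int(logits.argmax(dim=1))
    return LETTERS[label]

# Demo with a stand-in linear model instead of a trained checkpoint:
dummy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 26))
letter = predict(dummy_model, torch.rand(1, 28, 28))
print(letter)  # prints a single letter A-Z
```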
Training a simple neural network (3 convolutional layers + 3 fully connected (FC) layers) using SGD with momentum for 40 epochs:
After training, the model achieves 98.2432% accuracy on the test set.
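The "3 conv + 3 FC" network described above could be sketched in PyTorch roughly as follows; the channel widths, kernel sizes, and optimizer hyperparameters here are assumptions, not the exact values from models/ or config.yaml.

```python
# Illustrative sketch of a 3-conv + 3-FC classifier (sizes are assumptions).
import torch
from torch import nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),                  # 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
out = model(torch.rand(2, 1, 28, 28))  # batch of 2 grayscale 28x28 images
print(out.shape)  # torch.Size([2, 26])
```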
Our best model is saved under models/CNN_best.pt. It achieves the same accuracy on our test set and also generalizes well to the ASL Alphabet and American Sign Language datasets.