The goals / steps of this project are the following:
- Use the simulator to collect data of good driving behavior
- Build a convolutional neural network in Keras that predicts steering angles from images
- Train and validate the model with a training and validation set
- Test that the model successfully drives around track one without leaving the road
- Summarize the results with a written report
The animation above shows a scene from track 2 and how the trained neural network handles (somewhat) sharp turns, visualized from conv layer 4 (left) and conv layer 1 (right). The color overlay highlights the regions of the image that drove the model's decision; the numbers show the steering angle.
My submitted project includes the following files:
- model.py contains the script to create and train the model
- drive.py for driving the car in autonomous mode
- model.h5 containing a trained convolutional neural network
- track1_recording.mp4 and track2_recording.mp4, recordings of the model completing both tracks.
Other files in the repository:
- README.md, this file (earlier versions recorded the history of baby steps in training and validating the model).
- analyze_data.py analyzes the distribution of the recorded training data, e.g., histograms of steering angle against speed or throttle.
- cam.py maps a set of images (recorded or otherwise) to gradient activation mappings (GAM), which help show which part of the image the model looks at when making its steering decision. [Updated 2017-05: the newer cam_2.py generates GAM images for all layers. Details below.]
- fitgen_test.py dumps the images produced by the data generator, used as a sanity check that the changes behave as intended.
- preprocess.py turns recorded images and angles into HDF5 data files, with optional data augmentations such as image brightness changes, left/right image augmentation, and random shadow generation (a minimal sketch follows this list).
- quiver_test.py leverages the quiver_engine library to show the internals of all conv layers.
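For orientation, here is a minimal sketch of what the preprocess.py step does conceptually: read the simulator's driving_log.csv, crop/resize each frame, optionally flip it, and write everything to an HDF5 file. It is not the actual preprocess.py; the h5py library, the dataset names "images"/"angles", and the fixed -1/+1 flip are assumptions for illustration.

```python
# Minimal sketch (not the actual preprocess.py): write images and angles to HDF5.
import csv
import cv2
import h5py
import numpy as np

def build_h5(data_dir, out_file="data.h5", flip=True):
    images, angles = [], []
    with open(data_dir + "/driving_log.csv") as f:
        for row in csv.reader(f):
            img = cv2.imread(data_dir + "/" + row[0].strip())  # center camera frame
            if img is None:
                continue
            img = cv2.resize(img[56:160, :, :], (200, 66))     # crop sky, resize to 200x66
            angle = float(row[3])                              # steering angle column
            images.append(img)
            angles.append(angle)
            if flip:                                           # optional flip augmentation
                images.append(cv2.flip(img, 1))
                angles.append(-angle)
    with h5py.File(out_file, "w") as h5:
        h5.create_dataset("images", data=np.array(images, dtype=np.uint8))
        h5.create_dataset("angles", data=np.array(angles, dtype=np.float32))
```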
Using the Udacity-provided simulator (the earlier version, in which track 2 was a curvy dark road through black mountains) and my drive.py file, the car can be driven autonomously around the track by executing
python3 drive.py model.h5 [recording_dir]
If [recording_dir] is specified, frame images will be automatically saved for later analysis.
The model.py file contains the code for training and saving the convolutional neural network. The file shows the pipeline I used for training and validating the model, and it contains comments to explain how the code works.
python3 model.py <data.h5> <epoch_cnt>
The data.h5 file is prepared by preprocess.py in HDF5 format using the data augmentation techniques described below. Raw data is recorded into a directory using the Udacity self-driving car simulator. To generate the data.h5 file:
python3 preprocess.py <data_dir> [flip]
With recorded images in <img_dir>, kick off:
python3 cam_2.py model.h5 <img_dir>
A <img_dir>_cam directory will be generated, containing six sets of CAM-annotated images: *.layer[1-5].cam.jpg and *.layers.cam.jpg. The first five carry the CAM annotation up to the corresponding conv layer 1-5; the last one is the combined annotation of all layers.
The steps follow NVIDIA's paper Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car.
TODO: The deconvolution-based upsampling is not implemented; I simply use cv2.resize(), which likely causes problems -- many of the final all-layer CAM images end up with nothing left in the CAM. Still, it conveys the intent and works, as the gif above shows.
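For illustration, this is roughly what the cv2.resize() shortcut looks like: blow a low-resolution activation map up to the input size and blend it onto the frame as a heatmap. It is a sketch under assumed variable names, not the code from cam.py/cam_2.py.

```python
# Sketch of the cv2.resize() shortcut (not the deconvolution upsampling from the paper).
import cv2
import numpy as np

def overlay_cam(frame, heatmap, alpha=0.4):
    """frame: HxWx3 BGR image; heatmap: small 2-D float array from a conv layer."""
    h, w = frame.shape[:2]
    cam = cv2.resize(heatmap.astype(np.float32), (w, h))       # naive upsampling
    cam -= cam.min()
    cam /= cam.max() + 1e-8                                     # scale to [0, 1]
    cam = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
    return cv2.addWeighted(frame, 1.0 - alpha, cam, alpha, 0)   # blend the overlay
```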
To convert images to video:
python3 video.py <img_dir>
To convert images to gif (Ubuntu):
convert -delay 20 -loop 0 *.jpg cam_track2_updated.gif
My model started from the NVIDIA end-to-end neural network described in this paper (link4), with slight modifications.
It is a convolutional neural network with 5 convolution layers using 3x3 or 5x5 filters and depths between 36 and 96 (model.py lines 71-113). The convolution layers were widened during the fine-tuning process. The architecture of my network is shown in a section below.
The model uses ReLU activations to introduce nonlinearity at the output of every layer, and the data is normalized in preprocess.py (line 280) by a hand-written normalization function, normalize_color().
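The actual normalize_color() in preprocess.py may differ; a minimal sketch, assuming a simple zero-centered scaling of pixel values, looks like this:

```python
import numpy as np

def normalize_color(img):
    # Map [0, 255] pixel values to roughly [-0.5, 0.5]
    return img.astype(np.float32) / 255.0 - 0.5
```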
The model contains dropout layers to reduce overfitting (model.py lines 81, 85, 89, 93, 99, 103).
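A minimal Keras sketch of the architecture described above is shown below. It is not a copy of model.py: the exact depths, strides, dropout placement, and dense sizes are assumptions, chosen only to match the description (5x5 then 3x3 filters, depths between 36 and 96, ReLU everywhere, dropout between layers, a single steering-angle output).

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten

def build_model(input_shape=(66, 200, 3), drop=0.2):
    model = Sequential()
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu',
                     input_shape=input_shape))
    model.add(Dropout(drop))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Dropout(drop))
    model.add(Conv2D(64, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Dropout(drop))
    model.add(Conv2D(80, (3, 3), activation='relu'))
    model.add(Dropout(drop))
    model.add(Conv2D(96, (3, 3), activation='relu'))   # last conv map is 1x18x96 here
    model.add(Dropout(drop))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))                                # steering angle (regression)
    return model
```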
The model was trained and validated on different data sets to ensure it was not overfitting (model.py line 47) on the datasets collected from the two simulator tracks. The model was tested by running it through the simulator and verifying that the vehicle could stay on both tracks.
(Update, May 2017: the trained NN turned out to overfit badly when tried on yet another new track in the newer version of the car simulator. This is understandable given how limited the training dataset is; the back-annotated heatmaps also show that the trained NN does not look at the desired features, such as lane markings.)
The model used an Adam optimizer, so the learning rate was not tuned manually (model.py line 123).
Other optimizers were tried during the training process, including SGD and RMSprop, but I ended up with Adam.
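The compile step this implies is a one-liner; the sketch below is not copied from model.py, and `model` refers to the network from the architecture sketch above.

```python
from keras.optimizers import Adam

model.compile(loss='mse', optimizer=Adam())   # SGD() / RMSprop() were also tried
```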
Training data was chosen to keep the vehicle driving on the road. I used a combination of center lane driving, recovering from the left and right sides of the road ...
For details about how I created the training data, see the next section.
The overall strategy for deriving a model architecture was trial and error: analyze failures and improve.
NVIDIA model: My first step was to use a convolutional neural network similar to the one in the NVIDIA end-to-end paper. I thought this model might be appropriate because it is proven to work for real-world autonomous driving.
Image cropping: I started with only the dataset provided by Udacity. The NVIDIA network expects an input image size of 200x66, and because the upper 1/3-1/4 of the input image carries no information for determining the steering angle, I crop the upper part and then scale to 200x66 in preprocess.py.
import cv2

img = cv2.imread(DATA_DIR + "/" + row[0])     # row[0]: path of the center camera image
img_crop = img[56:160, :, :]                  # keep rows 56-159, dropping the sky
img_resize = cv2.resize(img_crop, (200, 66))  # scale to the NVIDIA input size 200x66
Started simple: The training process started with only three images: one with a right steering angle, one with a left steering angle, and one with an almost-zero angle. I verified that my initial Keras model could overfit these three images. This is a good sanity-check practice to confirm the model is wired correctly and is able to learn.
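A hedged sketch of that sanity check is below: if the model cannot drive the loss close to zero on three samples, something is wrong with the model or the training plumbing. The random arrays stand in for the three recorded frames, the angles are illustrative, and build_model() is the architecture sketch from earlier.

```python
import numpy as np

# Stand-ins for the three recorded frames (left turn, ~0 angle, right turn)
x_tiny = np.random.rand(3, 66, 200, 3).astype(np.float32)
y_tiny = np.array([-0.3, 0.0, 0.3], dtype=np.float32)

model = build_model()                          # from the architecture sketch above
model.compile(loss='mse', optimizer='adam')
model.fit(x_tiny, y_tiny, epochs=200, verbose=0)
print(model.evaluate(x_tiny, y_tiny, verbose=0))   # loss should approach 0
```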
Train/Validate split: In order to gauge how well the model was working, I split my image and steering angle data into a training and validation set. I found that my first model had a low mean squared error on the training set but a high mean squared error on the validation set. This implied that the model was overfitting.
Overfitting: To combat the overfitting, I modified the model by inserting dropout after layers, so that each layer learns "redundant" features: even when some are dropped by the dropout layer, the model can still predict the right angle. It worked.
Test: The final step was to run the simulator to see how well the car drove around track 1. There were a few spots where the vehicle fell off the track (e.g., the first left turn before the black bridge, the left turn after the bridge, and the right turn after that)... To improve the driving behavior in these cases, I deliberately recorded recovery behavior (from the curb side back to the center of the road) along the tracks. After that, the car could finish track 1 completely.
At the end of the process, the vehicle is able to drive autonomously around the track without leaving the road.
The final model architecture (model.py lines 18-24) consisted of a convolutional neural network with the following layers and layer sizes ...
Here is a visualization of the architecture.
Training data selection and preparation is a key first step that lays the foundation for everything that follows. Otherwise it is just garbage in, garbage out -- a waste of time.
When I collected data casually, it looked like this: left turns dominant and speed capped at the fastest pace. Such data does not cover enough cases for the model to learn good behavior in all scenarios, in the simulator or in the real world.
The Udacity-provided training data contains good driving behavior.
I started with Udacity's data using center image only. Here is an example image of center lane driving:
I also applied a scaling factor to the angles assigned to the left/right camera images, as below.
SMALL_SCALE = 0.9
LARGE_SCALE = 1.1

# Scale angle a bit for left/right images
if angle > 0:  # right turn
    l_angle = LARGE_SCALE * angle
    r_angle = SMALL_SCALE * angle
else:          # left turn
    l_angle = SMALL_SCALE * angle
    r_angle = LARGE_SCALE * angle
I then recorded the vehicle recovering from the left and right sides of the road back to the center so that the vehicle would learn to .... These images show what a recovery looks like starting from ... :
Then I repeated this process on track two in order to get more data points.
After the collection process, I had ~20,000 data points. I then preprocessed this data with data augmentation, for example modifying the image brightness histogram and adding random shadows. Here are some example images:
Random shadow augmentation (code copied from Vivek's blog, link5), which helps A LOT on track 2 with shadows on.
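Hedged sketches of the two augmentations mentioned above are shown below. They are not the exact code from preprocess.py or Vivek's blog: the brightness range, the shadow polygon, and the 0.5 darkening factor are illustrative assumptions.

```python
import cv2
import numpy as np

def random_brightness(img):
    # Randomly scale the V channel in HSV space to simulate lighting changes
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 2] *= np.random.uniform(0.4, 1.2)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2], 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def random_shadow(img):
    # Darken a random polygonal region to fake a cast shadow
    h, w = img.shape[:2]
    x1, x2 = np.random.randint(0, w, 2)           # random top/bottom x positions
    mask = np.zeros((h, w), dtype=np.uint8)
    poly = np.array([[x1, 0], [x2, h], [0, h], [0, 0]], dtype=np.int32)
    cv2.fillPoly(mask, [poly], 255)
    shaded = img.astype(np.float32)
    shaded[mask == 255] *= 0.5                    # darken the masked side
    return shaded.astype(np.uint8)
```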
I finally randomly shuffled the data set and put 20% of the data into a validation set.
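The shuffle and 20% split can be done in one call; the sketch below assumes the data lives in data.h5 with the dataset names from the earlier preprocessing sketch, and that scikit-learn is available (model.py may do this differently).

```python
import h5py
from sklearn.model_selection import train_test_split

with h5py.File("data.h5", "r") as h5:          # dataset names are assumptions
    images, angles = h5["images"][:], h5["angles"][:]

X_train, X_val, y_train, y_val = train_test_split(
    images, angles, test_size=0.2, shuffle=True, random_state=42)
```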
I used this training data for training the model. The validation set helped determine whether the model was over- or under-fitting. The ideal number of epochs was 4-5, as evidenced by the validation loss no longer decreasing. I used an Adam optimizer so that manually tuning the learning rate wasn't necessary.
During training I felt uneasy, as the model is almost a black box. Whenever it failed at a certain spot, it was very hard to tell what went wrong. Although my model passed both tracks, the process of trial and error and fiddling with different combinations of configurations was quite frustrating.
Quiver engine (github: link6) is a web-based tool for Python/Keras that exposes internal information about your neural network.
Below is an example page showing the activation output of my model's first convolution layer.
The quiver engine is helpful but not very straightforward, as there are too many filters at each convolution layer. Toward the end of my project, I found a very good blog post (link1) describing the idea of activation mapping. The blog post in turn refers to two papers: link2 and link3.
The whole idea is to use a heatmap to highlight the local areas that contribute most to the final decision. It was designed for classification, but with a slight change it can be applied to our steering angle predictions.
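A hedged sketch of that change for a regression output is below (not the exact cam.py/cam_2.py code): weight a conv layer's feature maps by the gradient of the predicted steering angle with respect to them, sum them into one heatmap, and ReLU/normalize it. It assumes the TF1-era Keras backend used at the time and an illustrative layer name 'conv4'; the resulting heatmap can then be upsampled and blended onto the frame as in the cv2.resize() sketch earlier.

```python
import numpy as np
from keras import backend as K

def steering_cam(model, img, layer_name='conv4'):
    """img: preprocessed frame of shape (66, 200, 3); returns a small 2-D heatmap."""
    conv_out = model.get_layer(layer_name).output             # (1, h, w, filters)
    steering = model.output[:, 0]                              # predicted angle
    grads = K.gradients(steering, conv_out)[0]                 # d(angle)/d(feature maps)
    fn = K.function([model.input], [conv_out, grads])
    fmap, grad = fn([img[np.newaxis]])
    weights = grad[0].mean(axis=(0, 1))                        # one weight per filter
    cam = np.maximum((fmap[0] * weights).sum(axis=-1), 0)      # weighted sum, then ReLU
    return cam / (cam.max() + 1e-8)                            # normalize to [0, 1]
```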
cam.py has the implementation. I ran out of time to verify it carefully, but it seems to work pretty well. Here's an example showing the output of convolution layer 4. (Layer 5 is of size 1x18x96 in my design, which has lost one spatial dimension, so the conv4 output is more appropriate.)
The image above shows a scene from track 2; the number 0.281566 indicates a (somewhat) sharp right turn, and the color overlay shows the regions of the image that led the model to this decision.
The images above show the GAM annotations for convolution layers 1-5 in sequence. The last image shows the combined result of all layers.
- The model doesn't generalize enough. I need to leverage the GAM information to investigate, if time allows.
- Derive my own class from the Keras data generator so that data augmentation can be done on the fly instead of via the current HDF5 tables.