In this project, we'll construct a fully convolutional neural network based on the VGGNet-16 architecture to perform semantic segmentation on a video captured from a front facing camera mounted on a vehicle dashboard to identify the drivable surface area. To achieve this task, we'll implement Fully Convolutional Networks for Semantic Segmentation, a paper published by Jonathan Long, Evan Shelhamer and Trevor Darrell from UC Berkeley that adapt contemporary classification network VGGNet into fully convolutional network and transfers its learned representations by fine-tuning to the segmentation task.
A pre-trained VGG-16 network was converted to a fully convolutional network by converting the final fully connected layer to a 1x1 convolution and setting the depth equal to the number of desired classes (in this case, two: road and not-road).
Run the following command to run the project:
python main.py
Note: If running this in Jupyter Notebook system messages, such as those regarding test status, may appear in the terminal rather than the notebook.
main.py
will check to make sure you are using GPU - if you don't have a GPU on your system, you can use AWS or another cloud computing platform.
Make sure you have the following is installed:
Download the Kitti Road dataset from here. Extract the dataset in the data
folder. This will create the folder data_road
with all the training a test images.