
Automatic Image Captioning

A neural network architecture that automatically generates captions for images. The architecture and hyperparameters are inspired by the work of Vinyals et al. [1] and Xu et al. [2].

This is my submission for the image captioning project in the Udacity Computer Vision Nanodegree.

My nanodegree certificate: https://confirm.udacity.com/SYAMKDHY

Requirements

The model was developed in a cloud-hosted JupyterLab environment, with some custom packages provided by Udacity. The requirements.txt file can help you get started reproducing the results. The trained model weights are included in the models/ folder. The weights are stored using Git LFS, which needs to be installed before checking out the repository.

Network Architecture

Figure: overview of the network architecture

Image Captioning Model

The model is based on an encoder-decoder architecture. The encoder is a ResNet-50 pre-trained on the ImageNet dataset. Its final layer is connected to an embedding layer, whose output serves as the initial input to the LSTM-based RNN decoder.

For the full model architecture, please refer to model.py.
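To make the encoder-decoder pairing concrete, here is a minimal PyTorch sketch of the two halves. The class names, layer sizes, and the teacher-forcing detail are illustrative assumptions, not the exact contents of model.py.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """ResNet-50 feature extractor followed by a learned embedding layer."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        # Drop the final classification layer; keep the pooled 2048-d features.
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images).flatten(1)  # (batch, 2048)
        return self.embed(features)                # (batch, embed_size)

class DecoderRNN(nn.Module):
    """LSTM decoder that consumes the image embedding as its first input."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: prepend the image features, drop the final token.
        embeddings = self.embed(captions[:, :-1])                   # (batch, T-1, embed)
        inputs = torch.cat([features.unsqueeze(1), embeddings], 1)  # (batch, T, embed)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)                                     # (batch, T, vocab)
```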

Training

I kept the pre-trained ResNet weights frozen during training. The embedding layer and the decoder were trained from scratch on the COCO dataset.

For details on hyperparameter choices, please refer to 2_Training.pdf.
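In that setup, freezing the backbone while training the new layers might look like the following sketch, reusing the classes from the architecture sketch above. The vocabulary size and learning rate are placeholders; the actual values are documented in 2_Training.pdf.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical sizes; see 2_Training.pdf for the real hyperparameters.
encoder = EncoderCNN(embed_size=256)
decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=9955)

# Freeze the pre-trained ResNet weights.
for param in encoder.resnet.parameters():
    param.requires_grad = False

# Train the encoder's embedding layer and the full decoder from scratch.
params = list(encoder.embed.parameters()) + list(decoder.parameters())
optimizer = optim.Adam(params, lr=1e-3)
criterion = nn.CrossEntropyLoss()
```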

Inference

Inference is implemented using greedy sampling: at each step of the RNN, the word with the highest softmax probability is selected as the output and, after passing through the embedding layer, is used as the input for the next step. Refer to 3_Inference.pdf for example outputs on the test dataset.
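As an illustration, a greedy sampling loop over the decoder sketched above could look like this. The maximum caption length and the index of the end-of-sentence token are assumptions; note that taking the argmax of the raw scores selects the same word as the argmax of the softmax.

```python
import torch

def greedy_sample(encoder, decoder, image, max_len=20, end_idx=1):
    """Greedy decoding: always pick the word with the highest softmax
    probability. `end_idx` is an assumed index for the <end> token."""
    with torch.no_grad():
        inputs = encoder(image.unsqueeze(0)).unsqueeze(1)  # (1, 1, embed)
        states = None
        caption = []
        for _ in range(max_len):
            hiddens, states = decoder.lstm(inputs, states)
            scores = decoder.fc(hiddens.squeeze(1))        # (1, vocab)
            word = scores.argmax(dim=1)                    # most likely word
            caption.append(word.item())
            if word.item() == end_idx:
                break
            # Feed the predicted word back in through the embedding layer.
            inputs = decoder.embed(word).unsqueeze(1)
    return caption
```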

References

  1. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and Tell: A Neural Image Caption Generator. arXiv:1411.4555, 2015.
  2. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044, 2016.
