Microsoft COCO Caption Evaluation (Python 3)

Evaluation code for MS COCO caption generation.

Requirements

java 1.8.0
python 3.2

Modifications

Converted to Python 3 with 2to3
Flush stdin after writing to Popen processes
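
The stdin flush matters because text written to a child process can sit in Python's buffer while the parent is already waiting for a reply. The following is a minimal sketch of the pattern, not the code from this repository: the child process is a hypothetical stand-in (a small Python echo script) for the Java tools, and `ask` is an illustrative helper name.

```python
import subprocess
import sys

# Hypothetical stand-in for an external scorer: a child process that
# echoes each input line back in upper case.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def ask(proc, text):
    """Send one line to the child process and read one line back."""
    proc.stdin.write(text + "\n")
    # Without this flush the line may stay in the parent's buffer,
    # and the readline() below would block waiting for a reply.
    proc.stdin.flush()
    return proc.stdout.readline().strip()


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # text mode pipes
)
reply = ask(proc, "a man riding a horse")
proc.stdin.close()  # signals end of input so the child exits
proc.wait()
print(reply)
```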

Files

./

cocoEvalCapDemo.py (demo script)

./annotation

captions_val2014.json (MS COCO 2014 caption validation set)
Visit the MS COCO download page for more details.

./results

captions_val2014_fakecap_results.json (an example of fake results for running the demo)
Visit the MS COCO format page for more details.
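
As a rough illustration of the results format, a caption results file is a JSON list with one object per image, each holding an `image_id` and a `caption`. The sketch below builds and checks such a list; the image ids, captions, and the `check_results` helper are hypothetical, and the MS COCO format page remains the authoritative description.

```python
import json

# Hypothetical image ids and captions, for illustration only.
results = [
    {"image_id": 1, "caption": "a man riding a horse on a beach"},
    {"image_id": 2, "caption": "a plate of food on a table"},
]


def check_results(entries):
    """Return True if every entry has an integer image_id and a string caption."""
    return all(
        isinstance(e.get("image_id"), int) and isinstance(e.get("caption"), str)
        for e in entries
    )


# Serialize the way a results file would be written, then read it back.
serialized = json.dumps(results)
loaded = json.loads(serialized)
print(check_results(loaded))
```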

./pycocoevalcap: The folder where all evaluation code is stored.

evals.py: Contains the COCOEvalCap class, which can be used to evaluate results on COCO.
tokenizer: Python wrapper for the Stanford CoreNLP PTBTokenizer
bleu: BLEU evaluation code
meteor: Meteor evaluation code
rouge: Rouge-L evaluation code
cider: CIDEr evaluation code
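
A typical evaluation run might look like the sketch below, which follows the pattern used by the demo script. Several details are assumptions rather than something this README confirms: the import paths `pycocotools.coco` and `pycocoevalcap.eval` (adjust the latter if the module is named differently in your checkout), the `params['image_id']` setting, and the `eval` attribute holding the scores. `evaluate_captions` is a hypothetical helper name.

```python
def evaluate_captions(annotation_file, result_file):
    """Return a dict mapping metric name to corpus-level score."""
    # Imports live inside the function so this sketch can be loaded
    # without the packages installed. Both import paths are assumptions.
    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    coco = COCO(annotation_file)          # ground-truth captions
    coco_res = coco.loadRes(result_file)  # generated captions

    coco_eval = COCOEvalCap(coco, coco_res)
    # Restrict evaluation to the images present in the results file.
    coco_eval.params['image_id'] = coco_res.getImgIds()
    coco_eval.evaluate()
    return dict(coco_eval.eval)


# Example call (requires Java and the annotation/result files):
# scores = evaluate_captions(
#     "annotation/captions_val2014.json",
#     "results/captions_val2014_fakecap_results.json",
# )
# for metric, score in scores.items():
#     print(metric, score)
```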

References

Microsoft COCO Captions: Data Collection and Evaluation Server
PTBTokenizer: We use the Stanford Tokenizer which is included in Stanford CoreNLP 3.4.1.
BLEU: BLEU: a Method for Automatic Evaluation of Machine Translation
Meteor: Project page with related publications. We use the latest version (1.5) of the code. Changes have been made to the source code to properly aggregate the statistics for the entire corpus.
Rouge-L: ROUGE: A Package for Automatic Evaluation of Summaries
CIDEr: CIDEr: Consensus-based Image Description Evaluation (http://arxiv.org/pdf/1411.5726.pdf)

Developers

Xinlei Chen (CMU)
Hao Fang (University of Washington)
Tsung-Yi Lin (Cornell)
Ramakrishna Vedantam (Virginia Tech)

Acknowledgement

David Chiang (University of Notre Dame)
Michael Denkowski (CMU)
Alexander Rush (Harvard University)