Generalized Contrastive Loss

Visual place recognition is a challenging task in computer vision and a key component of camera-based localization and navigation systems. Recently, Convolutional Neural Networks (CNNs) achieved high results and good generalization capabilities. They are usually trained using pairs or triplets of images labeled as either similar or dissimilar, in a binary fashion. In practice, the similarity between two images is not binary, but rather continuous. Furthermore, training these CNNs is computationally complex and involves costly pair and triplet mining strategies. We propose a Generalized Contrastive loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a siamese CNN. Furthermore, we propose three techniques for automatic annotation of image pairs with labels indicating their degree of similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes datasets. We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts. Our models trained on MSLS outperform the state-of-the-art methods, including NetVLAD, and generalize well on the Pittsburgh, TokyoTM and Tokyo 24/7 datasets. Furthermore, training a siamese network using the GCL function does not require any pair mining.

Paper and license

The code is licensed under the MIT License.

If you use our code please cite our paper

@article{leyvavallina2021gcl,
  title={Generalized Contrastive Optimization of Siamese Networks for Place Recognition}, 
  author={María Leyva-Vallina and Nicola Strisciuglio and Nicolai Petkov},
  journal={arXiv preprint arXiv:2103.06638},
  year={2021}
  url={https://arxiv.org/abs/2103.06638}
}

Network attention

Contact details

If you have any doubts please contact us at:

María Leyva-Vallina: m.leyva.vallina at rug dot nl
Nicola Strisciuglio: n.strisciuglio at utwente dot nl

How to use this library

Download the data

MSLS: The dataset is available on request here. For the new GT annotations, please register here.
Pittsburgh: The whole dataset is available on request here and the train val splits for Pitts30k are available here.
TokyoTM: The dataset is available on request here.
Tokyo 24/7: The dataset is available on request here.
TB-Places: The dataset is available here. For the new GT annotations, please register here.
7Scenes: The dataset is available here. For the new GT annotations, please register here.

Download the models

All our models can be downloaded from here.

Our results

MSLS

Backbone	Whitening	Pooling	Dimensions	Loss	R@1	R@5	R@10	mAP@1	mAP@5	mAP@10
VGG	No	GeM¹	512	TL	28	35	49	-	-	-
VGG	No	NetVLAD¹	32768	TL	30	40	44	-	-	-
VGG	No	NetVLAD¹	32768	TL	48	58	64	-	-	-
VGG	No	PatchNetVLAD²	4096	TL	48.1	57.6	60.5	-	-	-
ResNet50	No	avg	2048	CL	24.9	39.0	44.6	24.9	16.8	14.8
ResNet50	No	avg	2048	GCL	35.8	52.0	59.0	35.8	24.5	21.8
ResNet50	No	GeM	2048	CL	29.7	44.0	50.7	29.7	20.6	18.1
ResNet50	No	GeM	2048	GCL	43.3	59.1	65.0	43.3	30	26.8
ResNet152	No	avg	2048	CL	29.7	44.2	51.3	29.7	19.4	17.2
ResNet152	No	avg	2048	GCL	43.5	59.2	65.2	43.5	29.5	26.4
ResNet152	No	GeM	2048	CL	34.1	50.8	56.8	34.1	23.6	20.8
ResNet152	No	GeM	2048	GCL	45.7	62.3	67.9	45.7	31.4	28.3
ResNet50	Yes	GeM	2048	GCL	52.9	65.7	71.9	52.9	37.3	33.4
ResNet152	Yes	GeM	2048	GCL	57.9	70.7	75.7	57.9	40.7	36.6
ResNeXt-101-32x8d	Yes	GeM	1024	GCL	62.3	76.2	81.1	62.3	47	43.8

To reproduce them

Run the labeling/create_json_idx.py file to generate the necessary json index files for the dataset.

python3 labeling/create_json_idx.py --dataset msls --root_dir /mydir/MSLS/

Run the extract_predictions.py script to compute the map and query features, and the top-k prediction. For instance:

python3 extract_predictions.py --dataset MSLS --root_dir /mydir/MSLS/ --subset val --model_file models/MSLS/MSLS_resnet152_avg_480_GCL.pth --backbone resnet152 --pool avg --norm L2 --image_size 480,640 --batch_size 4

This will produce the file results/MSLS/val/MSLS_resnet152_avg_480_GCL_predictions.txt that you should use to evaluate the MSLS_resnet152_avg_480_GCL model in the MSLS repository.

TB-Places

Run the extract_predictions.py script to compute the map and query features, and the map-query distances. For instance:

python3 extract_predictions.py --dataset TB_Places --root_dir /mydir/TB_Places/ --subset W18_W17 --model_file models/TB_Places/resnet34_avg_GCL.pth --backbone resnet34 --pool avg --image_size 224 --batch_size 4 --query_idx_file /mydir/TB_Places/W18/W18.json --map_idx_file /mydir/TB_Places/W17/W17.json --f_length 512

python3 extract_predictions.py --dataset TB_Places --root_dir /mydir/TB_Places/ --subset W18_map_query --model_file models/TB_Places/resnet34_avg_GCL.pth --backbone resnet34 --pool avg --image_size 224 --batch_size 4 --query_idx_file /mydir/TB_Places/W18/W18_query.json --map_idx_file /mydir/TB_Places/W18/W18_map.json --f_length 512

For obtaining the top-k recall, run the script eval_recallatk.py. By default, the K values are 1,2,3,4,5,10,15,20,25.

python3 eval_recallatk.py --prediction_distance_file results/TB_Places/W18_W17/resnet34_avg_GCL_distances.npy --gt_file /mydir/TB_Places/W18_W17_gt.h5

python3 eval_recallatk.py --prediction_distance_file results/TB_Places/W18_map_query/resnet34_avg_GCL_distances.npy --gt_file /mydir/TB_Places/W18_map_query_gt.h5

7Scenes

Run the labeling/create_json_idx.py file to generate the necessary json index files for the dataset.

python3 labeling/create_json_idx.py --dataset 7scenes --root_dir /mydir/7Scenes/

Run the extract_predictions.py script to compute the map and query features, and the map-query distances. For instance:

python3 extract_predictions.py --dataset 7Scenes --root_dir /mydir/7Scenes/ --subset heads --model_file models/7Scenes/heads/resnet34_avg_GCL.pth --backbone resnet34 --pool avg --image_size 224 --batch_size 4 --query_idx_file /mydir/7Scenes/heads/test.json --map_idx_file /mydir/7Scenes/heads/train.json --f_length 512

This will produce the file results/7Scenes/heads/resnet34_avg_GCL_distances.npy, which we can use to evaluate the performance of the resnet34_avg_GCL model.

For obtaining the top-k recall, run the script eval_recallatk.py. By default, the K values are 1,2,3,4,5,10,15,20,25.

python3 eval_recallatk.py --prediction_distance_file results/7Scenes/heads/resnet34_avg_GCL_distances.npy --gt_file /mydir/7Scenes/heads_gt.h5

For obtaining the Average Precision, run the script eval_recallatk.py.

python3 eval_AP.py --prediction_distance_file results/7Scenes/heads/resnet34_avg_GCL_distances.npy --gt_file /mydir/7Scenes/heads_gt.h5

Train your own models

Coming soon

Define your own graded GT

Coming soon

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
labeling		labeling
.gitignore		.gitignore
README.md		README.md
attention.png		attention.png
datasets.py		datasets.py
eval_AP.py		eval_AP.py
eval_recallatk.py		eval_recallatk.py
extract_predictions.py		extract_predictions.py
factory.py		factory.py
license.md		license.md
networks.py		networks.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Generalized Contrastive Loss

Paper and license

Network attention

Contact details

How to use this library

Download the data

Download the models

Our results

MSLS

To reproduce them

TB-Places

7Scenes

Train your own models

Define your own graded GT

About

Releases

Packages

Languages

License

nicstrisc/generalized_contrastive_loss

Folders and files

Latest commit

History

Repository files navigation

Generalized Contrastive Loss

Paper and license

Network attention

Contact details

How to use this library

Download the data

Download the models

Our results

MSLS

To reproduce them

TB-Places

7Scenes

Train your own models

Define your own graded GT

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages