PyTorch implementation and pre-trained models for the paper APS: Asymmetric Patch Sampling for Contrastive Learning.
APS is a novel asymmetric patch sampling strategy for contrastive learning that increases the appearance asymmetry between views to learn better representations. APS significantly outperforms existing self-supervised methods on both the ImageNet-1K and CIFAR datasets, e.g., a 2.5% improvement in fine-tuning accuracy on CIFAR100. In addition, APS is more efficient than other self-supervised methods in both memory and computation during training.
conda create -n asp python=3.9
pip install -r requirements.txt
Torchvision provides the CIFAR10 and CIFAR100 datasets. Their root paths are set to ./dataset/cifar10 and ./dataset/cifar100, respectively. The ImageNet-1K dataset is placed at ./dataset/ILSVRC.
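The CIFAR layout described above can be sketched with torchvision's dataset classes. This is a minimal illustration, not the repository's own loading code: the helper name `load_cifar` and the `DATA_ROOTS` table are hypothetical, and whether the APS scripts download the data automatically is not stated here.

```python
from pathlib import Path

# Root paths as stated in this README.
DATA_ROOTS = {
    "cifar10": Path("./dataset/cifar10"),
    "cifar100": Path("./dataset/cifar100"),
}


def load_cifar(name: str, train: bool = True, download: bool = True):
    """Return a torchvision CIFAR dataset stored under the README's root path.

    Hypothetical helper for illustration; not part of the APS code base.
    """
    # Imported lazily so the path table above is usable without torchvision.
    from torchvision import datasets

    dataset_classes = {"cifar10": datasets.CIFAR10, "cifar100": datasets.CIFAR100}
    return dataset_classes[name](
        root=str(DATA_ROOTS[name]), train=train, download=download
    )
```

Calling `load_cifar("cifar100")` would fetch the training split into ./dataset/cifar100 on first use, assuming network access is available.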
To start APS pre-training, run main_pretrain.py with the following arguments.
- arch is the architecture of the model to pre-train; choose one of vit-tiny, vit-small, or vit-base.
- dataset is the dataset used for pre-training.
- data-root is the path to the dataset.
- nepoch is the number of pre-training epochs.
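As an illustration of how these four arguments fit together, here is a minimal `argparse` sketch. It is an assumption about the interface, not the parser actually used in main_pretrain.py: the function name `build_parser`, the help strings, and the choice to mark every flag as required are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="APS pre-training (illustrative sketch)")
    parser.add_argument(
        "--arch",
        choices=["vit-tiny", "vit-small", "vit-base"],
        required=True,
        help="architecture of the model to pre-train",
    )
    parser.add_argument("--dataset", required=True, help="dataset used for pre-training")
    parser.add_argument("--data-root", required=True, help="path to the dataset")
    parser.add_argument("--nepoch", type=int, required=True, help="number of pre-training epochs")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--arch=vit-small", "--dataset=cifar100", "--data-root=./dataset/cifar100", "--nepoch=1600"]
)
# argparse exposes --data-root as the attribute data_root.
print(args.arch, args.dataset, args.data_root, args.nepoch)
```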
Run APS with the ViT-Small/2 network on a single node on CIFAR100 for 1600 epochs with the following command.
python main_pretrain.py --arch='vit-small' --dataset='cifar100' --data-root='./dataset/cifar100' --nepoch=1600
To fine-tune ViT-Small/2 on CIFAR100, run the following command.
python main_finetune.py --arch='vit-small' --dataset='cifar100' --data-root='./dataset/cifar100' \
--pretrained-weights='./weight/pretrain/cifar100/small_1600ep_5e-4_100.pth'
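When several configurations need to be run, the pre-training and fine-tuning commands can be assembled programmatically. The sketch below is one possible way to do that with the standard library; the helper names are hypothetical, and the checkpoint path is passed in explicitly because the naming scheme of the saved weights is not documented here.

```python
import subprocess


def pretrain_command(arch: str, dataset: str, data_root: str, nepoch: int) -> list[str]:
    """Build the argv list for main_pretrain.py using the flags from this README."""
    return [
        "python", "main_pretrain.py",
        f"--arch={arch}",
        f"--dataset={dataset}",
        f"--data-root={data_root}",
        f"--nepoch={nepoch}",
    ]


def finetune_command(arch: str, dataset: str, data_root: str, pretrained_weights: str) -> list[str]:
    """Build the argv list for main_finetune.py using the flags from this README."""
    return [
        "python", "main_finetune.py",
        f"--arch={arch}",
        f"--dataset={dataset}",
        f"--data-root={data_root}",
        f"--pretrained-weights={pretrained_weights}",
    ]


def run_pipeline(arch: str, dataset: str, data_root: str, nepoch: int, pretrained_weights: str) -> None:
    """Run pre-training, then fine-tuning; check=True stops at the first failure."""
    subprocess.run(pretrain_command(arch, dataset, data_root, nepoch), check=True)
    subprocess.run(finetune_command(arch, dataset, data_root, pretrained_weights), check=True)
```

Passing the commands as argument lists avoids shell quoting issues, so the quotes around values in the shell commands are not needed.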
- CIFAR10 and CIFAR100
Dataset | Training (#Epochs) | ViT-Tiny/2 | ViT-Small/2 | ViT-Base/2 |
---|---|---|---|---|
CIFAR10 | Pretrain (1600) | download | download | download |
CIFAR10 | Finetune (100) | download | download | download |
CIFAR10 | Accuracy | 97.2% | 98.1% | 98.2% |
CIFAR10 | Pretrain (3200) | download | download | download |
CIFAR10 | Finetune (100) | download | download | download |
CIFAR10 | Accuracy | 97.5% | 98.2% | 98.3% |
CIFAR100 | Pretrain (1600) | download | download | download |
CIFAR100 | Finetune (100) | download | download | download |
CIFAR100 | Accuracy | 83.4% | 84.9% | 85.9% |
CIFAR100 | Pretrain (3200) | download | download | download |
CIFAR100 | Finetune (100) | download | download | download |
CIFAR100 | Accuracy | 83.4% | 85.3% | 86.0% |
- ImageNet-1K
Backbone | Pretrain (300 epochs) | Finetune (100 epochs) |
---|---|---|
ViT-S/16 | download | 82.1% (download) |
ViT-B/16 | download | 84.2% (download) |
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
@article{shen2025asymmetric,
title={Asymmetric Patch Sampling for Contrastive Learning},
author={Shen, Chengchao and Chen, Jianzhong and Wang, Shu and Kuang, Hulin and Liu, Jin and Wang, Jianxin},
journal={Pattern Recognition},
year={2025}
}