Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
Nanjing University, Tsinghua University, ARC Lab, Tencent PCG
This is the official repository for Index Backpropagation Quantization (IBQ), a vector quantization (VQ) method for training scalable, high-performance visual tokenizers.
- 🚀 Scalable Visual Tokenizers: IBQ enables scalable training of visual tokenizers, reaching a large codebook (262,144 codes) with high-dimensional embeddings (256-d) while maintaining high utilization.
- 💡 Innovative Approach: Conventional VQ methods update only the selected codebook entries at each step, which makes them prone to codebook collapse. IBQ instead applies a straight-through estimator to the categorical code distribution, jointly optimizing all codebook embeddings and the visual encoder for a consistent latent space.
- 🏆 Superior Performance: Demonstrates competitive results on ImageNet:
- Reconstruction: 1.00 rFID, outperforming Open-MAGVIT2 (1.17 rFID)
- Autoregressive Visual Generation: 2.05 gFID, outperforming previous vanilla autoregressive transformers.
This repository provides the scripts and checkpoints to replicate our results.
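The index-backpropagation idea above can be sketched in a few lines of PyTorch. This is a hypothetical, simplified helper (not the repository's actual API): code-selection logits are turned into a categorical distribution, the forward pass picks the hard one-hot code, and the straight-through trick routes gradients through the soft distribution so that every codebook row, not just the selected one, is updated.

```python
import torch
import torch.nn.functional as F

def ibq_quantize(z, codebook, tau=1.0):
    """Sketch of index-backprop quantization (hypothetical helper).

    z:        (N, D) encoder features
    codebook: (K, D) learnable codebook embeddings
    """
    logits = z @ codebook.t() / tau               # (N, K) code-selection logits
    soft = F.softmax(logits, dim=-1)              # categorical distribution over codes
    idx = logits.argmax(dim=-1)                   # hard code indices
    hard = F.one_hot(idx, codebook.shape[0]).type_as(soft)
    # Straight-through on the one-hot indices: the forward value equals
    # the hard selection, while gradients flow through `soft`, reaching
    # all K codebook rows (unlike nearest-neighbour VQ, which updates
    # only the selected entries).
    onehot_st = hard + soft - soft.detach()
    return onehot_st @ codebook, idx
```

In the forward pass the output is exactly the selected codebook row; the difference from standard VQ shows up only in the backward pass, where `codebook.grad` is dense.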
$256\times 256$ Tokenizer Training
```shell
bash scripts/train_tokenizer/IBQ/run_16384.sh MASTER_ADDR MASTER_PORT NODE_RANK
bash scripts/train_tokenizer/IBQ/run_262144.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
Method | #Tokens | Codebook Size | rFID | LPIPS | Codebook Utilization | Checkpoint |
---|---|---|---|---|---|---|
IBQ | 16 | 1024 | 2.24 | 0.2580 | 99% | Tokenizer-1024 |
IBQ | 16 | 8192 | 1.87 | 0.2437 | 98% | Tokenizer-8192 |
IBQ | 16 | 16384 | 1.37 | 0.2235 | 96% | Tokenizer-16384 |
IBQ | 16 | 262144 | 1.00 | 0.2030 | 84% | Tokenizer-262144 |
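The codebook-utilization column is typically computed as the fraction of codes selected at least once over the evaluation set. A minimal sketch under that assumption (hypothetical helper, not part of this repo's scripts):

```python
import numpy as np

def codebook_utilization(indices, codebook_size):
    """Fraction of distinct codes used at least once.

    indices: array of token ids emitted by the tokenizer over a dataset
    (assumes 'utilization' means distinct-code coverage, the common definition).
    """
    return np.unique(indices).size / codebook_size

# Toy example: 3 of 4 codes appear at least once -> 0.75
codebook_utilization(np.array([0, 1, 1, 3]), 4)
```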
```shell
bash scripts/evaluation/evaluation_256.sh
```
See scripts/train_autogressive/run.sh for the different model configurations.
```shell
bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
See scripts/train_autogressive/run.sh for the sampling hyper-parameters used at each model scale.
```shell
bash scripts/evaluation/sample_npu.sh Your_Total_Rank
# or
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank
```
Method | Params | #Tokens | FID | IS | Checkpoint |
---|---|---|---|---|---|
IBQ | 342M | 16 | 2.88 | 254.73 | AR_256_B |
IBQ | 649M | 16 | 2.45 | 267.48 | AR_256_L |
IBQ | 1.1B | 16 | 2.14 | 278.99 | AR_256_XL |
IBQ | 2.1B | 16 | 2.05 | 286.73 | AR_256_XXL |