IBQ: Taming Scalable Visual Tokenizer for Autoregressive Image Generation

Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
Nanjing University, Tsinghua University, ARC Lab Tencent PCG

arXiv 

This is the official repository for Index Backpropagation Quantization (IBQ), a novel vector quantization (VQ) method that improves the scalability and performance of visual tokenizers.

Highlights

  • 🚀 Scalable Visual Tokenizers: IBQ enables scalable training of visual tokenizers, achieving a large-scale codebook (262,144 codes) with high-dimensional embeddings (256-d) while maintaining high codebook utilization.
  • 💡 Innovative Approach: Unlike conventional VQ methods, which are prone to codebook collapse because only the selected code is updated at each step, IBQ applies a straight-through estimator to the categorical distribution over all codes, jointly optimizing every codebook embedding together with the visual encoder for a consistent latent space.
  • 🏆 Superior Performance: Demonstrates competitive results on ImageNet:
    • Reconstruction: 1.00 rFID, outperforming Open-MAGVIT2 (1.17 rFID)
    • Autoregressive visual generation: 2.05 gFID, outperforming previous vanilla autoregressive transformers
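The joint-optimization idea above can be sketched in a few lines. This is a minimal NumPy illustration of the forward pass based on our reading of the method, not the official implementation; all names and sizes are illustrative (the paper scales to a 262,144 × 256 codebook):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook_size, dim = 512, 64  # illustrative; far smaller than the paper's scale
codebook = rng.standard_normal((codebook_size, dim)).astype(np.float32)
z = rng.standard_normal((4, dim)).astype(np.float32)  # 4 encoder tokens

# Similarity of each token to every code defines a categorical distribution.
logits = z @ codebook.T
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)

# Forward: hard index selection via one-hot argmax.
idx = logits.argmax(-1)
one_hot = np.eye(codebook_size, dtype=np.float32)[idx]
quantized = one_hot @ codebook

# Backward (straight-through, PyTorch form):
#   quantized = ((one_hot - probs).detach() + probs) @ codebook
# so gradients flow through `probs` to ALL codebook rows and the encoder,
# not just the selected entry -- this is what avoids partial updating.
assert quantized.shape == z.shape
```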

This repository provides the scripts and checkpoints to replicate our results.

🔥 Quick Start

Stage I: Training of Visual Tokenizer

🚀 Training Scripts

  • $256\times 256$ Tokenizer Training

```shell
# Pass the rendezvous address/port and this node's rank as positional arguments.
bash scripts/train_tokenizer/IBQ/run_16384.sh MASTER_ADDR MASTER_PORT NODE_RANK
bash scripts/train_tokenizer/IBQ/run_262144.sh MASTER_ADDR MASTER_PORT NODE_RANK
```

🍺 Performance and Models

| Method | #Tokens | Codebook Size | rFID | LPIPS | Codebook Utilization | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| IBQ | 16 $\times$ 16 | 1024 | 2.24 | 0.2580 | 99% | Tokenizer-1024 |
| IBQ | 16 $\times$ 16 | 8192 | 1.87 | 0.2437 | 98% | Tokenizer-8192 |
| IBQ | 16 $\times$ 16 | 16384 | 1.37 | 0.2235 | 96% | Tokenizer-16384 |
| IBQ | 16 $\times$ 16 | 262144 | 1.00 | 0.2030 | 84% | Tokenizer-262144 |
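The "Codebook Utilization" column is commonly defined as the fraction of codebook entries selected at least once over the evaluation set; a small sketch of that metric (function name and data are illustrative, not from the repo):

```python
import numpy as np

def codebook_utilization(indices: np.ndarray, codebook_size: int) -> float:
    """Fraction of codebook entries selected at least once."""
    return np.unique(indices).size / codebook_size

# Toy example: 4 distinct codes used out of a codebook of 8.
token_ids = np.array([0, 3, 3, 7, 1])
print(codebook_utilization(token_ids, 8))  # 0.5
```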

🚀 Evaluation Scripts

```shell
bash scripts/evaluation/evaluation_256.sh
```

Stage II: Training of Auto-Regressive Models

🚀 Training Scripts

Please see scripts/train_autogressive/run.sh for the different model configurations.

```shell
bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK
```

🚀 Sample Scripts

Please see scripts/train_autogressive/run.sh for the sampling hyper-parameters used at each model scale.

```shell
# On NPU:
bash scripts/evaluation/sample_npu.sh Your_Total_Rank
# or on GPU:
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank
```

🍺 Performance and Models

| Method | Params | #Tokens | FID | IS | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| IBQ | 342M | 16 $\times$ 16 | 2.88 | 254.73 | AR_256_B |
| IBQ | 649M | 16 $\times$ 16 | 2.45 | 267.48 | AR_256_L |
| IBQ | 1.1B | 16 $\times$ 16 | 2.14 | 278.99 | AR_256_XL |
| IBQ | 2.1B | 16 $\times$ 16 | 2.05 | 286.73 | AR_256_XXL |
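For readers wondering where the 16 $\times$ 16 token count comes from: assuming a 16$\times$ spatial downsampling factor (consistent with the $256\times 256$ tokenizer above, though the factor itself is our assumption), each image maps to a 16 $\times$ 16 grid of discrete tokens:

```python
# Assumed relationship between image size, downsampling factor, and #Tokens.
image_size, downsample = 256, 16
grid = image_size // downsample      # tokens per side
num_tokens = grid * grid             # tokens per image
print(grid, num_tokens)  # 16 256
```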