OPEN-MAGVIT2: An Open-source Project Toward Democratizing Auto-Regressive Visual Generation

Zhuoyan Luo*, Fengyuan Shi*, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
ARC Lab Tencent PCG, Tsinghua University, Nanjing University

This is the official repository for Open-MAGVIT2, an open-source project re-implementing Google's MAGVIT-v2 tokenizer and democratizing autoregressive visual generation with a super large vocabulary (i.e., 2^18).

Highlights

🚀 Super-large Codebook: Re-implements the Lookup-Free Quantizer proposed by MAGVIT-v2 and achieves a super-large codebook (i.e., 2^18 codes) with strong reconstruction performance (1.17 rFID).
💡 Auto-Regressive Innovation: Introduces asymmetric token factorization and the next sub-token prediction paradigm, enabling efficient generation with a super-large vocabulary and enhanced sub-token interactions.
🚀 Scalability: Validates the scalability of plain auto-regressive models across various parameter sizes (300M to 1.5B).
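
To make the first two highlights concrete, here is a minimal sketch of lookup-free quantization and of splitting one large token index into two sub-tokens. It is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the bit ordering is arbitrary, and the 6-bit / 12-bit split is assumed for the asymmetric factorization. The only facts taken from this README are the 2^18 vocabulary and the idea of predicting sub-tokens.

```python
# Minimal sketch of lookup-free quantization (LFQ) and index factorization.
# Assumptions (not confirmed by this README): 18 latent channels, channel i
# carries bit weight 2**i, and a 6-bit / 12-bit sub-token split.

NUM_BITS = 18                    # log2 of the codebook size (2**18 = 262144)
LOW_BITS = 12                    # assumed size of the larger sub-vocabulary
HIGH_BITS = NUM_BITS - LOW_BITS  # the remaining 6 bits


def lfq_quantize(latent):
    """Quantize each channel to +1 or -1 by its sign; no codebook lookup."""
    return [1 if x > 0 else -1 for x in latent]


def code_to_index(code):
    """Read the +1/-1 code as a binary number (channel i has weight 2**i)."""
    return sum(1 << i for i, c in enumerate(code) if c > 0)


def factorize(index):
    """Split one 18-bit index into (high 6-bit, low 12-bit) sub-tokens."""
    return index >> LOW_BITS, index & ((1 << LOW_BITS) - 1)


def merge(high, low):
    """Inverse of factorize: recombine two sub-tokens into the full index."""
    return (high << LOW_BITS) | low


latent = [0.3, -1.2, 0.8] + [-0.5] * 15  # a toy 18-channel latent vector
code = lfq_quantize(latent)
index = code_to_index(code)
high, low = factorize(index)
print(index, high, low)  # prints: 5 0 5
```

With such a split, an autoregressive model would only need output heads of size 2^6 and 2^12 rather than a single head over 2^18 entries, which is the efficiency argument behind predicting sub-tokens.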

This repository provides the scripts and checkpoints to replicate our results.

🎤 TODOs

[ ✔ ] Better image tokenizer with scale-up training.
[ ✔ ] Finalize the training of the autoregressive model.
[ ] Video tokenizer and the corresponding autoregressive model.

🤗 Open-MAGVIT2 is still at an early stage and under active development. Stay tuned for updates!

🔥 Quick Start

Stage I: Training of Visual Tokenizer

🚀 Training Scripts

$128\times 128$ Tokenizer Training

bash scripts/train_tokenizer/Open-MAGVIT2/run_128_L.sh MASTER_ADDR MASTER_PORT NODE_RANK

$256\times 256$ Tokenizer Training

bash scripts/train_tokenizer/run_256_L.sh MASTER_ADDR MASTER_PORT NODE_RANK

🚀 Evaluation Scripts

$128\times 128$ Tokenizer Evaluation

bash scripts/evaluation/evaluation_128.sh

$256\times 256$ Tokenizer Evaluation

bash scripts/evaluation/evaluation_256.sh

🍺 Performance and Models

Tokenizer

Method	Token Type	#Tokens	Train Data	Codebook Size	rFID	PSNR	Codebook Utilization	Checkpoint
Open-MAGVIT2-20240617	2D	16 $\times$ 16	256 $\times$ 256 ImageNet	262144	1.53	21.53	100%	-
Open-MAGVIT2-20240617	2D	16 $\times$ 16	128 $\times$ 128 ImageNet	262144	1.56	24.45	100%	-
Open-MAGVIT2	2D	16 $\times$ 16	256 $\times$ 256 ImageNet	262144	1.17	21.90	100%	IN256_Large
Open-MAGVIT2	2D	16 $\times$ 16	128 $\times$ 128 ImageNet	262144	1.18	25.08	100%	IN128_Large
Open-MAGVIT2*	2D	32 $\times$ 32	128 $\times$ 128 ImageNet	262144	0.34	26.19	100%	IN128_Large (same as above)

(*) denotes results obtained by direct inference with the model trained at $128 \times 128$ resolution, without fine-tuning.
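
The relationship between resolution and token count in the table can be checked with a little arithmetic. The sketch below is an inference from the table rather than something the README states: it assumes square inputs and a fixed spatial downsampling factor per tokenizer, and the helper names are hypothetical.

```python
# Sketch: derive the downsampling factor implied by each table row.
# Assumption: each tokenizer downsamples square inputs by a fixed factor.

CODEBOOK_BITS = 18  # log2(262144), from the "Codebook Size" column


def downsample_factor(resolution, tokens_per_side):
    """Spatial reduction implied by a resolution and a token-grid side."""
    return resolution // tokens_per_side


def input_resolution(tokens_per_side, factor):
    """Input side length that would yield the given token-grid side."""
    return tokens_per_side * factor


factor_256 = downsample_factor(256, 16)  # 256x256 tokenizer
factor_128 = downsample_factor(128, 16)  # 128x128 tokenizer
# If the starred row reuses the 128x128 tokenizer unchanged, a 32x32 grid
# would correspond to this input side length (an inference, not a stated fact):
starred_input = input_resolution(32, factor_128)
bits_per_image = 16 * 16 * CODEBOOK_BITS  # bits in a 16x16 token grid

print(factor_256, factor_128, starred_input, bits_per_image)
# prints: 16 8 256 4608
```

Under these assumptions, the starred row would correspond to running the 128-resolution tokenizer on larger inputs, so its lower rFID is not directly comparable to the 16 $\times$ 16 rows.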

Stage II: Training of Auto-Regressive Models

🚀 Training Scripts

See scripts/train_autogressive/run.sh for the available model configurations.

bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK

🚀 Sample Scripts

See scripts/train_autogressive/run.sh for the sampling hyper-parameters used at each model scale.

bash scripts/evaluation/sample_npu.sh Your_Total_Rank   # on NPUs
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank   # on GPUs

🍺 Performance and Models

Method	Params	#Tokens	FID	IS	Checkpoint
Open-MAGVIT2	343M	16 $\times$ 16	3.08	258.26	AR_256_B
Open-MAGVIT2	804M	16 $\times$ 16	2.51	271.70	AR_256_L
Open-MAGVIT2	1.5B	16 $\times$ 16	2.33	271.77	AR_256_XL
