Zhuoyan Luo*, Fengyuan Shi*, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
ARC Lab, Tencent PCG; Tsinghua University; Nanjing University
- 🚀 Super-large Codebook: Re-implements the advanced Lookup-Free Quantizer proposed in MAGVITv2 and achieves a super-large codebook (i.e., $2^{18}$ codes) with strong reconstruction performance (1.17 rFID).
- 💡 Auto-Regressive Innovation: Introduces asymmetric token factorization and the next sub-token prediction paradigm, enabling efficient generation with a super-large vocabulary and enhanced sub-token interactions.
- 🚀 Scalability: Validates the scalability of plain auto-regressive models across various parameter sizes (300M to 1.5B).
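The core idea behind the super-large codebook is lookup-free quantization (LFQ): instead of a nearest-neighbor search over learned embeddings, each latent channel is simply binarized, and the resulting bit string *is* the codebook index, so an 18-dim latent implies a $2^{18}$-entry codebook with no embedding table. The following is a minimal illustrative sketch of that mechanism, not the repository's actual implementation (which adds entropy and commitment losses during training):

```python
# Minimal sketch of lookup-free quantization (LFQ), as introduced in MAGVITv2.
# Each latent channel is binarized to {-1, +1}; the bits form the code index,
# so a D-dim latent yields an implicit codebook of size 2^D.
import numpy as np

def lfq_quantize(z):
    """Binarize a latent vector; return (quantized vector, integer code index)."""
    bits = (z > 0).astype(np.int64)                          # 1 where positive, else 0
    q = 2 * bits - 1                                         # map {0, 1} -> {-1, +1}
    index = int((bits * (2 ** np.arange(len(bits)))).sum())  # bit string -> integer id
    return q, index

# Toy 3-dim latent -> 2^3 = 8 possible codes (the paper uses 18 dims -> 2^18).
q, idx = lfq_quantize(np.array([0.3, -1.2, 0.7]))
print(q, idx)  # [ 1 -1  1] 5
```

Because no embedding table is stored or searched, the codebook can be scaled far beyond what classic VQ lookup allows, which is what enables the 262144-entry vocabulary reported below.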
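A $2^{18}$-way softmax is impractical for a plain autoregressive model, which motivates the token factorization above: each large code index is split into smaller sub-tokens that are predicted sequentially (next sub-token prediction). Here is a hedged sketch of the bookkeeping; the 6-bit/12-bit split is illustrative only, chosen to show an *asymmetric* factorization, and the helper names are not from the repository:

```python
# Illustrative sketch of asymmetric token factorization: an 18-bit code index is
# split into two sub-tokens of unequal width (6 + 12 bits here, purely as an
# example), so the AR model predicts two small vocabularies (2^6 and 2^12)
# instead of one 2^18-way distribution.
def factorize(index, widths=(6, 12)):
    """Split an integer index into sub-tokens, low bits first."""
    subs = []
    for w in widths:
        subs.append(index & ((1 << w) - 1))  # take the lowest w bits
        index >>= w                          # drop them and continue
    return subs

def defactorize(subs, widths=(6, 12)):
    """Recombine sub-tokens into the original integer index."""
    index, shift = 0, 0
    for s, w in zip(subs, widths):
        index |= s << shift
        shift += w
    return index

idx = 123456                                 # some code in [0, 2^18)
subs = factorize(idx)
assert defactorize(subs) == idx              # lossless round trip
print(subs)
```

The round trip is lossless by construction, so the tokenizer's vocabulary is unchanged; only the prediction problem is decomposed.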
This repository provides the scripts and checkpoints to replicate our results.
- [x] Better image tokenizer with scale-up training.
- [x] Finalize the training of the autoregressive model.
- [ ] Video tokenizer and the corresponding autoregressive model.
🤗 Open-MAGVIT2 is still at an early stage and under active development. Stay tuned for updates!
- $128\times 128$ Tokenizer Training

```shell
bash scripts/train_tokenizer/Open-MAGVIT2/run_128_L.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
- $256\times 256$ Tokenizer Training

```shell
bash scripts/train_tokenizer/run_256_L.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
- $128\times 128$ Tokenizer Evaluation

```shell
bash scripts/evaluation/evaluation_128.sh
```
- $256\times 256$ Tokenizer Evaluation

```shell
bash scripts/evaluation/evaluation_256.sh
```
Tokenizer
| Method | Token Type | #Tokens | Train Data | Codebook Size | rFID | PSNR | Codebook Utilization | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Open-MAGVIT2-20240617 | 2D | 16 $\times$ 16 | 256 $\times$ 256 | 262144 | 1.53 | 21.53 | 100% | - |
| Open-MAGVIT2-20240617 | 2D | 16 $\times$ 16 | 128 $\times$ 128 | 262144 | 1.56 | 24.45 | 100% | - |
| Open-MAGVIT2 | 2D | 16 $\times$ 16 | 256 $\times$ 256 | 262144 | 1.17 | 21.90 | 100% | IN256_Large |
| Open-MAGVIT2 | 2D | 16 $\times$ 16 | 128 $\times$ 128 | 262144 | 1.18 | 25.08 | 100% | IN128_Large |
| Open-MAGVIT2* | 2D | 32 $\times$ 32 | 128 $\times$ 128 | 262144 | 0.34 | 26.19 | 100% | above |
(*) denotes that the results are from direct inference using the model trained with $128\times 128$ resolution.
See `scripts/train_autogressive/run.sh` for the different model configurations.

```shell
bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
See `scripts/train_autogressive/run.sh` for the sampling hyper-parameters of each model scale.

```shell
bash scripts/evaluation/sample_npu.sh Your_Total_Rank  # on NPU
# or
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank  # on GPU
```
| Method | Params | #Tokens | FID | IS | Checkpoint |
|---|---|---|---|---|---|
| Open-MAGVIT2 | 343M | 16 $\times$ 16 | 3.08 | 258.26 | AR_256_B |
| Open-MAGVIT2 | 804M | 16 $\times$ 16 | 2.51 | 271.70 | AR_256_L |
| Open-MAGVIT2 | 1.5B | 16 $\times$ 16 | 2.33 | 271.77 | AR_256_XL |