## What is PanoLlama
- New Paradigm: A novel framework that redefines panoramic image generation as a next-token prediction task, demonstrating clear advantages over diffusion-based methods.
- Speed Up: A training-free autoregressive strategy built on the pre-trained LlamaGen architecture, achieving high-quality panorama generation at arbitrary sizes.
- Versatile Applications: Beyond text-to-panorama generation, it also supports multi-scale, multi-layout, and multi-guidance generation tasks.
- Comprehensive Evaluation: We evaluate our method across a range of baselines and metrics, ensuring the reliability of our experimental results.
For more details, please visit our paper page.
## Configuration

Set up the environment by installing the required packages:

```shell
pip install -r requirements.txt
```
## Pre-trained Models

Download the pre-trained models from LlamaGen and place them in the `/models` folder under the corresponding modules:
| module | model | params | tokens | weight |
|---|---|---|---|---|
| text encoder | FLAN-T5-XL | 3B | / | flan-t5-xl |
| image tokenizer | VQVAE | 72M | 16x16 | vq_ds16_t2i.pt |
| token generator | LlamaGen-XL | 775M | 32x32 | t2i_XL_stage2_512.pt |
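Before sampling, it can help to confirm the downloaded weights are actually in place. Below is a minimal Python sketch of such a check; it is a hypothetical helper, not part of the repo, and the assumption that each weight sits directly under `models/` with the names from the table above should be adjusted to your actual layout.

```python
from pathlib import Path

# Expected weight files/directories per module, taken from the table above.
# Placing them flat under models/ is an assumption about the layout.
EXPECTED = {
    "text encoder": "flan-t5-xl",                 # FLAN-T5-XL weights directory
    "image tokenizer": "vq_ds16_t2i.pt",          # VQVAE checkpoint
    "token generator": "t2i_XL_stage2_512.pt",    # LlamaGen-XL checkpoint
}

def missing_weights(models_dir="models"):
    """Return the expected weight names missing under models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED.values() if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        print("All pre-trained models found.")
```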
## Generation

We support panorama expansion in the vertical, horizontal, and both directions. Try the following command to generate a horizontal panorama:
```shell
python -m token_generator.sample \
    --seed -1 \
    --times 12 \
    --addit-cols 24 \
    --lam 1 \
    --gen-mode h \
    --n 1
```
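Since the same entry point handles the other expansion directions, you can sweep the modes programmatically. The sketch below builds the command lines only; the `v` (vertical) and `hv` (both) mode values are assumptions inferred from the `h` example above, so check the CLI's accepted values before running.

```python
# Sketch: build token_generator.sample command lines per expansion direction.
# Flag names mirror the horizontal example in the README; the "v" and "hv"
# gen-mode values are assumed, not confirmed by the repo docs.
def build_cmd(gen_mode, times=12, addit_cols=24, lam=1, n=1, seed=-1):
    """Return the argv list for one sampling run in the given direction."""
    return [
        "python", "-m", "token_generator.sample",
        "--seed", str(seed),
        "--times", str(times),
        "--addit-cols", str(addit_cols),
        "--lam", str(lam),
        "--gen-mode", gen_mode,
        "--n", str(n),
    ]

for mode in ("h", "v", "hv"):
    print(" ".join(build_cmd(mode)))
```

Each argv list can be passed to `subprocess.run` directly, which avoids shell-quoting issues when scripting batches of panoramas.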
## Citation

If you find our work helpful, please consider citing:
```bibtex
@article{zhou2024panollama,
  title={PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs},
  author={Zhou, Teng and Zhang, Xiaoyu and Tang, Yongchuan},
  journal={arXiv preprint arXiv:2411.15867},
  year={2024}
}
```