- [12/06] The models and evaluation code are now available; manuscript v2 will be posted on arXiv in two days.
- [11/06] The initial version of the manuscript was uploaded to arXiv.
conda create -n infmllm python=3.9
conda activate infmllm
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
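After installation, a quick check like the one below (a minimal sanity check of our own, not part of the official setup) should report the pinned package versions and a visible GPU.

```python
# Optional sanity check: confirm the pinned PyTorch build sees the GPU.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 2.1.0
print("torchvision:", torchvision.__version__)  # expected: 0.16.0
print("CUDA available:", torch.cuda.is_available())  # should be True on a CUDA 12.1 machine
```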
Both the multitask and instruction-tuned models are now available on Hugging Face!
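If you prefer to fetch the weights ahead of time rather than at first use, a snippet like the following works for any Hugging Face repository (this assumes `huggingface_hub` is installed, which ships with `transformers`; the repository id shown is a placeholder, so substitute the actual id from the project's Hugging Face page).

```python
# Optional: pre-download the released weights from the Hugging Face Hub.
# The repo_id below is a placeholder; use the repository id linked from this README.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="<org>/<InfMLLM-model-repo>")
print("Model weights downloaded to:", local_dir)
```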
We evaluated the InfMLLM-7B multitask model on five VQA (Visual Question Answering) datasets and three visual grounding datasets. The instruction-tuned InfMLLM-7B-Chat model was assessed on four VQA datasets and six multi-modal benchmarks. For detailed evaluation procedures, please refer to Evaluation.
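As background for the VQA numbers, the sketch below shows the commonly used soft-accuracy formula from benchmarks such as VQAv2, where an answer gets full credit if at least 3 of the 10 human annotators gave it. This is only an illustration; the repository's own evaluation scripts are the authoritative reference and may apply additional answer normalization.

```python
# Illustration of the standard VQA soft-accuracy metric (not the repo's exact eval code).
def vqa_accuracy(prediction: str, gt_answers: list[str]) -> float:
    """Credit is proportional to how many of the 10 annotators gave the answer, capped at 1."""
    prediction = prediction.strip().lower()
    matches = sum(ans.strip().lower() == prediction for ans in gt_answers)
    return min(matches / 3.0, 1.0)

# Example: 2 of 10 annotators answered "blue" -> accuracy ~0.67
print(vqa_accuracy("blue", ["blue", "blue", "navy"] + ["dark blue"] * 7))
```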
Trying InfMLLM-7B-Chat is straightforward: we provide a demo script that runs on the example image below.
CUDA_VISIBLE_DEVICES=0 python demo.py
The generated conversation is shown below.
@misc{zhou2023infmllm,
      title={InfMLLM: A Unified Framework for Visual-Language Tasks},
      author={Qiang Zhou and Zhibin Wang and Wei Chu and Yinghui Xu and Hao Li and Yuan Qi},
      year={2023},
      eprint={2311.06791},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
This work wouldn't be possible without the incredible open-source code of these projects. Huge thanks!