A collection of strong multimodal models for building the best multimodal agents
- 07/04/2024: OmAgent is now open-sourced. 🌟 Dive into our Multi-modal Agent Framework for complex video understanding. Read more in our paper.
- 06/09/2024: OmChat has been released. 🎉 Discover the capabilities of our multimodal language models, featuring robust video understanding and support for context lengths of up to 512K tokens. More details are in the technical report.
- 03/12/2024: OmDet is now open-sourced. 🚀 Experience our fast and accurate Open-Vocabulary Detection (OVD) model, which achieves 100 FPS. Learn more in our paper.
Here are the various projects we've worked on at OmLab:
⭐️ OmAgent
Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
⭐️ OmDet
Fast and Accurate Open-Vocabulary End-to-End Object Detection (a usage sketch follows this list)
⭐️ OmChat
Multimodal Language Models with Strong Long Context and Video Understanding
⭐️ OVDEval
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection
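To make the open-vocabulary idea concrete, here is a minimal detection sketch. It assumes the Hugging Face Transformers integration of OmDet-Turbo and the `omlab/omdet-turbo-swin-tiny-hf` checkpoint; the class names, post-processing arguments, and result keys below follow the Transformers documentation as we recall it and may differ across library versions, so treat this as an illustration rather than the canonical OmDet API.

```python
# Illustrative open-vocabulary detection sketch (not the canonical OmDet API).
# Assumes the Hugging Face Transformers OmDet-Turbo integration and the
# omlab/omdet-turbo-swin-tiny-hf checkpoint; argument names and result keys
# may vary across transformers versions.
import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection

processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

image = Image.open("street.jpg")
# Open-vocabulary: the class list is free-form text supplied at inference
# time, so new categories need no retraining.
classes = ["delivery robot", "traffic cone", "e-scooter"]

inputs = processor(image, text=classes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw model outputs to thresholded detections in pixel coordinates.
results = processor.post_process_grounded_object_detection(
    outputs, classes=classes, target_sizes=[image.size[::-1]], score_threshold=0.3
)[0]
for score, label, box in zip(results["scores"], results["classes"], results["boxes"]):
    print(f"{label}: {score:.2f} at {[round(v) for v in box.tolist()]}")
```

Because the vocabulary is plain text, swapping in new categories is a one-line change; this is exactly the generalization property that the OVDEval benchmark above is designed to stress-test.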
Here are the research papers published by OmLab:
🏷️ How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
Published in: AAAI, 2024
🏷️ OmDet: Large-Scale Vision-Language Multi-Dataset Pre-Training with Multimodal Detection Network
Published in: IET Computer Vision, 2024
🏷️ Real-Time Transformer-Based Open-Vocabulary Detection with Efficient Fusion Head
Published in: arXiv, 2024
🏷️ OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Published in: arXiv, 2024
🏷️ OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Published in: arXiv, 2024
For more information, feel free to reach out to us at [email protected].
Thank you for visiting OmModel's repository. We hope you find our projects and papers insightful and useful!