cross-modal-pretraining

Here are 2 public repositories matching this topic...

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

llama large-language-models video-language-pretraining vision-language-pretraining cross-modal-pretraining blip2 minigpt4 multi-modal-chatgpt

Updated Jun 4, 2024
Python

JacobYuan7 / RLIP

[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph Generation.

relation detection-model relation-detection hoi-detection cross-modal-pretraining

Updated May 26, 2024
Python
