Skip to content

Latest commit

 

History

History
91 lines (69 loc) · 2.76 KB

README.md

File metadata and controls

91 lines (69 loc) · 2.76 KB

Overview

ViTX is a Vision-TeXt representation learning framework on top of PyTorch Lightning and HuggingFace Transformers for Cross-Modal Perception research. It is designed to be readable and easily extensible, to allow users to quickly run and experiment with their own ideas.

Currently Supported Methods

Installation

Dependencies

ViTX requires Python 3.6+.

  • hydra-core>=1.0.0
  • numpy>=1.22.4
  • rich>=12.4.4
  • pytorch>=1.12.0
  • torchvision>=0.13.0
  • pytorch-lightning
  • transformers
  • wandb>=0.12.19

Manual Installation

It is strongly recommend that you install ViTX in a dedicated virtualenv, to avoid conflicting with your system packages.

git clone https://github.com/naderAsadi/ViTX.git
cd ViTX
pip install -e .

Code structure

  • configs/: Directory to store experiment configurations, all in .yaml format.
  • examples/: Directory for example files showcasing the usage of the codebase.
  • scripts/: General purpose, single file scripts.
  • vitx/: The main source code
    • config/: Data Classes and utility function to parse and sanity check YAML configs.
    • data/: Classes for reading the data, transforming, and tokenizing it.
    • losses/: Classes as nn.Modules holding the implementation of several loss functions.
    • methods/: Classes that contain the pipelines for each method, e.g. training and inference functions.
    • models/: Model classes. e.g. CLIP, SLIP.
    • utils: Utility functions.

How To Use

With vitx you can use uni-modal and multi-modal self-supervised methods in a modular way using the full power of PyTorch. Experiment with different backbones, models and loss functions. The framework has been designed to be easy to use from the ground up.

Quick Start

Usage of main.py file:

python main.py \
        data=<data_config_name> \
        model/vision_model=<vision_model_config_name> \
        model/text_model=<text_model_config_name> \
        +arg=<value> \
        ...

Example:

python main.py \
        data="coco" \
        model/vision_model="vit-b" \
        model/text_model="vit-b"

Reading The Commits

Here is a reference to what each emoji in the commits means:

  • 📎 : Some basic updates.
  • ♻️ : Refactoring.
  • 💩 : Bad code, needs to be revised!
  • 🐛 : Bug fix.
  • 💡 : New feature.
  • ⚡ : Performance Improvement.