Meta-Transformer_Unified_Multimodal_Encoder

@invictus717 invictus717 released this 24 Jul 10:21

We release the weights of the Unified Multimodal Encoder. Load them into a stack of timm transformer blocks as follows:

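import torch
import torch.nn as nn
from timm.models.vision_transformer import Block

# Load the released encoder checkpoint (a state dict for the block stack).
ckpt = torch.load("Meta-Transformer_base_patch16_encoder.pth")

# Rebuild the matching architecture: 12 transformer blocks, width 768, 12 heads.
encoder = nn.Sequential(*[
    Block(
        dim=768,
        num_heads=12,
        mlp_ratio=4.,
        qkv_bias=True,
        norm_layer=nn.LayerNorm,
        act_layer=nn.GELU
    )
    for _ in range(12)])

# strict=True raises an error if any key is missing or unexpected.
encoder.load_state_dict(ckpt, strict=True)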