
MMPreTrain Release v1.0.0rc8: Multi-Modality Support

Pre-release
@mzr1996 released this on 23 May 03:45
· 100 commits to main since this release
4dd8a86

Highlights

  • Support multiple multi-modal algorithms and inferencers. You can explore these features in the Gradio demo!
  • Add EVA-02, DINOv2, ViT-SAM and GLIP backbones.
  • Register torchvision transforms into MMPretrain; you can now easily use torchvision's data augmentations in MMPretrain pipelines.
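
As a sketch of the torchvision integration, a data pipeline might mix native and torchvision transforms as below. The `torchvision/` type prefix and the transform argument names are assumptions; check the MMPretrain transform registry for the exact registered names.

```python
# Hedged sketch: mixing MMPretrain and torchvision transforms in one pipeline.
# The 'torchvision/' prefix and argument names are assumptions; verify them
# against the transform registry of your installed MMPretrain version.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    # torchvision transforms, addressed through the registry prefix.
    dict(type='torchvision/RandomResizedCrop', size=224),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    # A native MMPretrain transform to pack inputs for the model.
    dict(type='PackInputs'),
]
```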

New Features

  • Support Chinese CLIP. (#1576)
  • Add ScienceQA metrics. (#1577)
  • Support multiple multi-modal algorithms and inferencers. (#1561)
  • Add EVA-02 backbone. (#1450)
  • Support DINOv2 backbone. (#1522)
  • Support some downstream classification datasets. (#1467)
  • Support GLIP. (#1308)
  • Register torchvision transforms into MMPretrain. (#1265)
  • Add the ViT backbone of SAM. (#1476)
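
To illustrate how one of the new backbones might be dropped into a classifier config, here is a minimal sketch. The registered type name `ViTSAM` and all argument values are assumptions; check the configs shipped in the MMPretrain model zoo for the exact spelling.

```python
# Hedged sketch: a classifier config fragment using the new ViT-SAM backbone.
# 'ViTSAM' and its arguments are assumptions based on the release notes;
# verify against the configs shipped with MMPretrain.
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ViTSAM',    # ViT backbone introduced alongside SAM support
        arch='base',
        img_size=1024,
        patch_size=16,
        out_channels=256,
    ),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(type='LinearClsHead', num_classes=1000, in_channels=256),
)
```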

Improvements

  • [Refactor] Support freezing channel reduction and add a layer-decay function. (#1490)
  • [Refactor] Support resizing pos_embed while loading checkpoints, and format the output. (#1488)
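
The layer-decay refactor is typically driven from the optimizer wrapper config; a minimal sketch follows. The constructor name and the `layer_decay_rate` key are assumptions to verify against MMPretrain's optimizer documentation.

```python
# Hedged sketch: layer-wise learning-rate decay via the optimizer wrapper.
# The constructor name and paramwise_cfg keys are assumptions; consult the
# MMPretrain optimizer documentation for the exact spelling.
optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=5e-4, weight_decay=0.05),
    constructor='LearningRateDecayOptimWrapperConstructor',
    paramwise_cfg=dict(
        layer_decay_rate=0.65,  # each earlier layer's lr is scaled by 0.65
    ),
)
```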

Bug Fixes

  • Fix ScienceQA. (#1581)
  • Fix the config of BEiT. (#1528)
  • Fix incorrect stage freezing in the RIFormer model. (#1573)
  • Fix DDP bugs caused by out_type. (#1570)
  • Fix a potential bug in the multi-task head loss. (#1530)
  • Support BCE loss without batch augmentations. (#1525)
  • Fix a CLIP generator initialization bug. (#1518)
  • Fix a bug in the binary cross-entropy loss. (#1499)

Docs Update

  • Update the PoolFormer citation to the CVPR version. (#1505)
  • Refine the inference documentation. (#1489)
  • Add documentation on using the confusion matrix. (#1513)
  • Update the MMagic link. (#1517)
  • Fix the example_project README. (#1575)
  • Add an NPU support page. (#1481)
  • Remove an old description from the train cfg docs. (#1473)
  • Fix a typo in the MultiLabelDataset docstring. (#1483)

Contributors

A total of 12 developers contributed to this release.

@XiudingCai @Ezra-Yu @KeiChiTse @mzr1996 @bobo0810 @wangbo-zhao @yuweihao @fangyixiao18 @YuanLiuuuuuu @MGAMZ @okotaku @zzc98