MMPreTrain Release v1.0.0rc8: Multi-Modality Support
Pre-release
Highlights
- Support multiple multi-modal algorithms and inferencers. You can explore these features via the Gradio demo (see the inferencer sketch after this list)!
- Add EVA-02, DINOv2, ViT-SAM and GLIP backbones.
- Register torchvision transforms into MMPreTrain; you can now easily integrate torchvision's data augmentations into your pipelines.
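As a taste of the new inferencer API, the sketch below captions a single image. This is a minimal sketch, not the canonical usage: the `blip-base_3rdparty_caption` model name and the `pred_caption` result key are assumptions based on the model zoo conventions, so check `mmpretrain.list_models()` for the names actually available.

```python
from mmpretrain import ImageCaptionInferencer

# Build a captioning inferencer from a model-zoo name; weights are
# fetched automatically on first use. The model name is an assumption,
# see `mmpretrain.list_models()` for the checkpoints actually shipped.
inferencer = ImageCaptionInferencer('blip-base_3rdparty_caption')

# The inferencer accepts image paths (or arrays) and returns one result
# dict per input; `pred_caption` holds the generated caption.
result = inferencer('demo/cat-dog.png')[0]
print(result['pred_caption'])
```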
New Features
- Support Chinese CLIP. (#1576)
- Add ScienceQA metrics. (#1577)
- Support multiple multi-modal algorithms and inferencers. (#1561)
- Add EVA-02 backbone. (#1450)
- Support DINOv2 backbone. (#1522)
- Support some downstream classification datasets. (#1467)
- Support GLIP (#1308)
- Register torchvision transforms into MMPreTrain; see the pipeline sketch after this list. (#1265)
- Add the ViT backbone of SAM. (#1476)
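With torchvision transforms registered, a data pipeline can mix them with MMPreTrain's own transforms through the `torchvision/` prefix. Below is a minimal sketch of a training pipeline under the assumption that `NumpyToPIL`/`PILToNumpy` conversion transforms bridge MMPreTrain's numpy images and torchvision's PIL inputs; the transform arguments follow torchvision's own API.

```python
# Sketch of a training pipeline mixing MMPreTrain and torchvision
# transforms; the `torchvision/` prefix resolves to the registered
# torchvision classes, and their arguments follow torchvision's API.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    # torchvision transforms expect PIL images, so convert first
    dict(type='NumpyToPIL', to_rgb=True),
    dict(type='torchvision/RandomResizedCrop', size=176),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    # convert back to BGR numpy arrays for the rest of the pipeline
    dict(type='PILToNumpy', to_bgr=True),
    dict(type='PackInputs'),
]
```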
Improvements
- [Refactor] Support freezing channel reduction and add a layer-decay function. (#1490)
- [Refactor] Support resizing `pos_embed` while loading checkpoints, and format output. (#1488)
Bug Fixes
- Fix ScienceQA. (#1581)
- Fix the config of BEiT. (#1528)
- Fix incorrect stage freezing in the RIFormer model. (#1573)
- Fix DDP bugs caused by `out_type`. (#1570)
- Fix a potential bug in the multi-task head loss. (#1530)
- Support BCE loss without batch augmentations. (#1525)
- Fix CLIP generator initialization bug. (#1518)
- Fix a bug in binary cross-entropy loss. (#1499)
Docs Update
- Update PoolFormer citation to CVPR version (#1505)
- Refine Inference Doc (#1489)
- Add doc for usage of confusion matrix (#1513)
- Update MMagic link (#1517)
- Fix example_project README (#1575)
- Add NPU support page (#1481)
- Remove outdated description of `train_cfg`. (#1473)
- Fix typo in MultiLabelDataset docstring (#1483)
Contributors
A total of 12 developers contributed to this release.
@XiudingCai @Ezra-Yu @KeiChiTse @mzr1996 @bobo0810 @wangbo-zhao @yuweihao @fangyixiao18 @YuanLiuuuuuu @MGAMZ @okotaku @zzc98