feat(ml): export clip models to ONNX and host models on Hugging Face #4700
Conversation
refactored export code
…t, general refactoring
This is really good work! Looking forward to finally removing the large PyTorch dependency. Even before that happens, removing the dependency on clip-as-a-service is great! The fewer deps, the better :-)
I really like the new export functionality/image.
Could you add a short README for the exporter? What it does and how to use it for an example model.
I'm trying to ask for permission to check whether creating another download source for the models is a problem in terms of copyright. From the license conditions and from what I understand there shouldn't be a problem, but I'd rather ask anyway. If you know the answer to my question, I would love to hear it.
feat(ml): export clip models to ONNX and host models on Hugging Face (immich-app#4700)

* export clip models
* export to hf
* refactored export code
* export mclip, general refactoring
* cleanup
* updated conda deps
* do transforms with pillow and numpy, add tokenization config to export, general refactoring
* moved conda dockerfile, re-added poetry
* minor fixes
* updated link
* updated tests
* removed `requirements.txt` from workflow
* fixed mimalloc path
* removed torchvision
* cleaner np typing
* review suggestions
* update default model name
* update test
The model used in my Smart Search is "M-CLIP/XLM-Roberta-Large-Vit-B-16Plus". Do I need to modify this name? For example, to "immich-app/XLM-Roberta-Large-Vit-B-16Plus"? Thank you.
It ignores anything before the slash, so you should be fine.
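In other words, only the part after the slash is used to locate the model, roughly like this (a hypothetical illustration, not the actual Immich code):

```python
# Hypothetical illustration of the behavior described above:
# anything before the slash in the configured model name is ignored.
name = "M-CLIP/XLM-Roberta-Large-Vit-B-16Plus"
model_id = name.split("/")[-1]  # "XLM-Roberta-Large-Vit-B-16Plus"
```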
My "model-cache\clip" directory originally had a folder named "M-CLIP_XLM-Robertsa-Large Vit-B-16Plus". After upgrading to 1.84, a folder named "XLM-Robertsa-Large Vit-B-16Plus" appeared. Can I delete the "M-CLIP_XLM-Robertsa-Large Vit-B-16Plus" folder? |
Yes, that's a stale folder at this point.
@mertalev Write a script that does this.

The export code is available here: https://github.com/immich-app/immich/tree/main/machine-learning/export. It downloads the OpenCLIP model, traces it to TorchScript, and exports the TorchScript model to ONNX.
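For anyone who wants to roll their own, that flow boils down to something like the sketch below; the model name, input size, and file name are illustrative choices, and the real exporter in `machine-learning/export` handles more cases:

```python
# Sketch: OpenCLIP model -> TorchScript trace -> ONNX export.
# "ViT-B-32"/"openai" and "visual.onnx" are illustrative, not the exporter's exact names.
import open_clip
import torch

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
model.eval()

visual = model.visual  # export the image tower; the text tower is exported the same way
dummy = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(visual, dummy)

torch.onnx.export(
    traced,
    dummy,
    "visual.onnx",
    input_names=["image"],
    output_names=["embedding"],
    dynamic_axes={"image": {0: "batch"}, "embedding": {0: "batch"}},
)
```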
Description
We currently use clip-as-service for downloading CLIP models. The motivation for using it was to avoid exporting models ourselves, as well as to have models ready to use after downloading, without exporting to ONNX at runtime. However, this has caused a number of issues, particularly due to the hosting server being intermittently unavailable.
This PR transitions away from clip-as-service so that we handle model exporting and hosting ourselves. The full ONNX catalog of clip-as-service is supported for feature parity and backwards compatibility; models are downloaded into a different cache structure than before, but the change is a drop-in replacement that should not require any manual intervention.
Exported models are uploaded to a new set of Hugging Face repos under a new organization. Relevant model repos are downloaded at runtime and are completely self-contained, with all the files they need.
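For anyone curious what that looks like in practice, fetching one of these self-contained repos amounts to something along these lines; the repo id and cache path here are illustrative:

```python
# Sketch: download a self-contained model repo from Hugging Face at runtime.
# The repo id and cache directory below are illustrative examples.
from huggingface_hub import snapshot_download

model_dir = snapshot_download("immich-app/ViT-B-32__openai", cache_dir="/cache/clip")
print(model_dir)  # local folder with the ONNX files plus tokenizer/preprocessing configs
```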
The CLIP implementation in the ML service has been refactored to integrate with these repos. Moreover, all dependence on PyTorch has been removed from this section of the code: preprocessing is now exclusively done in Pillow and NumPy. This paves the way for shrinking the image size considerably, leaving the image classification code as the only remaining reliance on PyTorch.
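As a rough illustration of the torch-free preprocessing, here is a sketch using only Pillow and NumPy; the size and normalization constants are the common OpenAI CLIP defaults and would in practice come from the model's preprocessing config:

```python
# Sketch of CLIP-style image preprocessing with Pillow + NumPy only (no torchvision).
# SIZE/MEAN/STD are the usual OpenAI CLIP defaults; real values come from the model config.
import numpy as np
from PIL import Image

SIZE = 224
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(img: Image.Image) -> np.ndarray:
    img = img.convert("RGB")
    # resize so the short side is SIZE, then center-crop to SIZE x SIZE
    scale = SIZE / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.BICUBIC)
    left, top = (img.width - SIZE) // 2, (img.height - SIZE) // 2
    img = img.crop((left, top, left + SIZE, top + SIZE))
    # HWC uint8 -> normalized NCHW float32 with a batch dimension
    arr = (np.asarray(img, dtype=np.float32) / 255.0 - MEAN) / STD
    return np.expand_dims(arr.transpose(2, 0, 1), 0)
```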
While this PR is focused on CLIP, using our own Hugging Face repos for models enables many exciting possibilities in the future. This is just the start.
How has this been tested?
Every model listed here has been tested with Postman for both image and text. Additionally, I tested text search with `ViT-B-32__openai` before running an Encode CLIP job, confirming the results were relevant (i.e. the model outputs are correct and compatible with existing embeddings). The Encode CLIP job ran successfully as well, as did changing the model to `XLM-Roberta-Large-Vit-L-14` (an M-CLIP model that is handled differently than OpenAI and OpenCLIP models).

Fixes #4117