Supercharging YOLOv5: How I Got 182.4 FPS Inference Without a GPU #8151
dnth started this conversation in Show and tell
Replies: 1 comment
-
Hi, I want to know whether these steps will work for real-time object detection with YOLOv5 on an RTSP stream. I get very few detections when I simply use YOLOv5. Will following these steps improve the detection results?
-
Anyone can train a YOLOv5 model nowadays, thanks to the devs of this repo.
But deploying it on a CPU is still a PAIN.
The pain ends here.
In this post I'll show you how I got insane speeds (180+ FPS) running YOLOv5 on a consumer CPU using only 4 cores 🤯
🔥 P/S: I use open-source tools by Neural Magic
--
💡Motivation
CPUs are far more common than GPUs in a production environment.
Can we leverage the availability of CPUs and run object detection models in real time?
Now, we can confidently say, YES we can 🤗
--
👉 By the end of this post, you'll learn how to:
⬩ Fine-tune existing sparse models on your dataset.
⬩ Sparsify the YOLOv5 model using SparseML (see the sketch after this list).
⬩ Run the model using the DeepSparse engine at insane speeds on CPUs.
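To make the sparsification step concrete, here's a minimal sketch (assuming SparseML's `ScheduledModifierManager` API) of how a recipe can be applied to a plain PyTorch training loop. The recipe path and the `load_yolov5_checkpoint` / `build_dataloader` / `compute_yolov5_loss` helpers are hypothetical placeholders for illustration, not code from the repo:

```python
# A minimal sketch (not the exact code from the repo) of applying a SparseML
# recipe to a PyTorch training loop. Model, dataloader and loss are placeholders.
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

model = load_yolov5_checkpoint("yolov5s.pt")      # placeholder: your YOLOv5 nn.Module
train_loader = build_dataloader("pistols.yaml")   # placeholder: your dataset
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The recipe is a YAML/markdown file describing when and how much to prune/quantize.
manager = ScheduledModifierManager.from_yaml("recipes/yolov5s.pruned_quantized.md")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(manager.max_epochs):
    for images, targets in train_loader:
        loss = compute_yolov5_loss(model(images), targets)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

manager.finalize(model)  # remove sparsification hooks before exporting to ONNX
```

In practice you won't write this loop yourself: the accompanying repo wires the recipe into YOLOv5's train.py, so you pass a recipe (and optionally pre-sparsified weights from the SparseZoo) on the command line instead.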
--
🤷‍♂️ Sparsification?
First the basics. What exactly is sparsification?
Sparsification is the process of removing redundant information from a model.
It is done by Pruning, Quantization or both.
In general,
✂ Pruning - Removing redundant weights from the model.
🔮 Quantization - Storing the model's weights in a lower-precision format, e.g. going from 32-bit floating point (FP32) to 8-bit integers (INT8).
Used together or separately, they result in a smaller and faster model.
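If the two terms still feel abstract, here's a toy NumPy illustration (not SparseML code) of what pruning and quantization do to a weight tensor:

```python
# Toy illustration of pruning and quantization on a small weight matrix.
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)

# Pruning: zero out the smallest-magnitude weights (here, 50% of them).
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

# Quantization: map FP32 values to INT8 with a simple symmetric scale.
scale = np.abs(pruned).max() / 127.0
quantized = np.clip(np.round(pruned / scale), -128, 127).astype(np.int8)

print(f"non-zero weights: {np.count_nonzero(pruned)} / {pruned.size}")
print(f"storage: {weights.nbytes} bytes (FP32) -> {quantized.nbytes} bytes (INT8)")
```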
--
🔫 Dataset
The recent gun violence news had me thinking deeply about how we can prevent incidents like this from happening again. It was the worst gun violence incident since 2012, and 21 innocent lives were lost.
My heart goes out to all victims of the violence and their loved ones.
I’m not a lawmaker, so there is little I can do there.
But, I think I know something in computer vision that might help.
That’s when I came across the Pistols Dataset from Roboflow.
--
For the baseline model, running inference with the saved PyTorch checkpoint on my CPU (i9-11900) using all 8 cores 👇
• Average FPS : 21.91
• Average inference time (ms) : 45.58
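For context, a baseline number like this can be reproduced with a simple timing loop around the PyTorch checkpoint. The sketch below assumes the YOLOv5 torch.hub interface; the checkpoint and image paths are placeholders, and exact numbers will depend on your CPU:

```python
# Rough sketch of measuring baseline throughput: repeated forward passes of the
# PyTorch YOLOv5 checkpoint on the CPU. Paths below are placeholders.
import time

import torch
from PIL import Image

torch.set_num_threads(8)  # use all 8 cores, matching the baseline setup

model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")
model.cpu().eval()

image = Image.open("data/images/test.jpg")  # any test image
runs = 100

model(image)  # warm-up so one-time overhead doesn't skew the timing

start = time.perf_counter()
for _ in range(runs):
    model(image)
elapsed = time.perf_counter() - start

print(f"Average inference time (ms): {1000 * elapsed / runs:.2f}")
print(f"Average FPS: {runs / elapsed:.2f}")
```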
--
In this post I'll walk you through the steps from the baseline to the fastest model 🚀
• Average FPS : 101.52
• Average inference time (ms) : 9.84
Using free tools by Neural Magic and Ultralytics.
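On the DeepSparse side, running the sparsified, ONNX-exported model looks roughly like the sketch below. It assumes the `deepsparse` package with its YOLO pipeline installed; the model path is a placeholder for your exported model or a SparseZoo stub:

```python
# Sketch of running the sparsified, ONNX-exported model with the DeepSparse engine.
# The model_path is a placeholder: point it at your exported ONNX file or a zoo: stub.
from deepsparse import Pipeline

yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path="runs/train/exp/weights/best.onnx",
    # num_cores=4,  # optionally pin the engine to a subset of CPU cores
)

results = yolo_pipeline(images=["data/images/test.jpg"])
print(results.boxes[0][:3])   # a few detected boxes for the first image
print(results.labels[0][:3])  # their class labels
print(results.scores[0][:3])  # their confidence scores
```

Neural Magic also ships a benchmarking CLI (deepsparse.benchmark) that reports raw engine throughput, which is the easier way to compare FPS figures like the ones above.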
--
Gone are the days when we needed GPUs to run models in real time.
⚡ With DeepSparse you can get GPU-class performance on CPUs.
Link to Twitter thread 👉 https://twitter.com/dicksonneoh7/status/1534395572022480896
LinkedIn post 👉 https://www.linkedin.com/posts/dickson-neoh_anyone-can-train-a-yolov5-ultralytics-nowadays-activity-6940225158120914944-5u3c?utm_source=linkedin_share&utm_medium=android_app
Link to blog post 👉 https://dicksonneoh.com/portfolio/supercharging_yolov5_180_fps_cpu/
Link to GitHub codes 👉 https://github.com/dnth/yolov5-deepsparse-blogpost