Supercharging YOLOv5: How I Got 182.4 FPS Inference Without a GPU #8151
dnth started this conversation in Show and tell
Replies: 1 comment
-
Hi, I want to know whether these steps will work for real-time object detection with YOLOv5 on an RTSP stream. I get very few detections when I simply use YOLOv5. Will following these steps improve the detection results?
-
Anyone can train a YOLOv5 model nowadays, thanks to the devs of this repo.
But deploying it on a CPU is still a PAIN.
The pain ends here.
In this post I'll show you how I got insane speeds (180+ FPS) running YOLOv5 on a consumer CPU using only 4 cores 🤯
🔥 P/S: I use open-source tools by Neural Magic
--
💡Motivation
CPUs are far more common than GPUs in a production environment.
Can we leverage the availability of CPUs and run object detection models in real time?
Now, we can confidently say, YES we can 🤗
--
👉 By the end of this post, you'll learn how to:
⬩ Fine-tune existing sparse models on your dataset.
⬩ Sparsify the YOLOv5 model using SparseML (see the sketch after this list).
⬩ Run the model using the DeepSparse engine at insane speeds on CPUs.
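To make the sparsification step concrete, here's a minimal sketch (assuming SparseML's `ScheduledModifierManager` API) of how a recipe can be applied to a plain PyTorch training loop. The recipe path and the `load_yolov5_checkpoint` / `build_dataloader` / `compute_yolov5_loss` helpers are hypothetical placeholders for illustration, not code from the repo:

```python
# A minimal sketch (not the exact code from the repo) of applying a SparseML
# recipe to a PyTorch training loop. Model, dataloader and loss are placeholders.
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

model = load_yolov5_checkpoint("yolov5s.pt")      # placeholder: your YOLOv5 nn.Module
train_loader = build_dataloader("pistols.yaml")   # placeholder: your dataset
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The recipe is a YAML/markdown file describing when and how much to prune/quantize.
manager = ScheduledModifierManager.from_yaml("recipes/yolov5s.pruned_quantized.md")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(manager.max_epochs):
    for images, targets in train_loader:
        loss = compute_yolov5_loss(model(images), targets)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

manager.finalize(model)  # remove sparsification hooks before exporting to ONNX
```

In practice you won't write this loop yourself: the accompanying repo wires the recipe into YOLOv5's train.py, so you pass a recipe (and optionally pre-sparsified weights from the SparseZoo) on the command line instead.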
--
🤷‍♂️ Sparsification?
First the basics. What exactly is sparsification?
Sparsification is the process of removing redundant information from a model.
It is done by Pruning, Quantization or both.
In general,
✂ Pruning - Removing redundant weights from the model.
🔮 Quantization - Storing the model's weights in a lower-precision format, e.g. going from 32-bit floating point (FP32) to 8-bit integers (INT8).
Used together or separately, they result in a smaller and faster model.
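If the two terms still feel abstract, here's a toy NumPy illustration (not SparseML code) of what pruning and quantization do to a weight tensor:

```python
# Toy illustration of pruning and quantization on a small weight matrix.
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)

# Pruning: zero out the smallest-magnitude weights (here, 50% of them).
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

# Quantization: map FP32 values to INT8 with a simple symmetric scale.
scale = np.abs(pruned).max() / 127.0
quantized = np.clip(np.round(pruned / scale), -128, 127).astype(np.int8)

print(f"non-zero weights: {np.count_nonzero(pruned)} / {pruned.size}")
print(f"storage: {weights.nbytes} bytes (FP32) -> {quantized.nbytes} bytes (INT8)")
```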
--
🔫 Dataset
The recent gun violence news had me thinking deeply about how we can prevent incidents like this from happening again. It was the worst gun violence incident since 2012, and 21 innocent lives were lost.
My heart goes out to all victims of the violence and their loved ones.
I’m not a lawmaker, so there is little I can do there.
But, I think I know something in computer vision that might help.
That’s when I came across the Pistols Dataset from Roboflow.
--
For the baseline model, running inference with the saved PyTorch checkpoint on my CPU (i9-11900) using all 8 cores 👇
• Average FPS : 21.91
• Average inference time (ms) : 45.58
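For context, a baseline number like this can be reproduced with a simple timing loop around the PyTorch checkpoint. The sketch below assumes the YOLOv5 torch.hub interface; the checkpoint and image paths are placeholders, and exact numbers will depend on your CPU:

```python
# Rough sketch of measuring baseline throughput: repeated forward passes of the
# PyTorch YOLOv5 checkpoint on the CPU. Paths below are placeholders.
import time

import torch
from PIL import Image

torch.set_num_threads(8)  # use all 8 cores, matching the baseline setup

model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")
model.cpu().eval()

image = Image.open("data/images/test.jpg")  # any test image
runs = 100

model(image)  # warm-up so one-time overhead doesn't skew the timing

start = time.perf_counter()
for _ in range(runs):
    model(image)
elapsed = time.perf_counter() - start

print(f"Average inference time (ms): {1000 * elapsed / runs:.2f}")
print(f"Average FPS: {runs / elapsed:.2f}")
```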
--
In this post I'll walk you through the steps from the baseline to the fastest model 🚀
• Average FPS : 101.52
• Average inference time (ms) : 9.84
Using free tools by Neural Magic and Ultralytics.
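On the DeepSparse side, running the sparsified, ONNX-exported model looks roughly like the sketch below. It assumes the `deepsparse` package with its YOLO pipeline installed; the model path is a placeholder for your exported model or a SparseZoo stub:

```python
# Sketch of running the sparsified, ONNX-exported model with the DeepSparse engine.
# The model_path is a placeholder: point it at your exported ONNX file or a zoo: stub.
from deepsparse import Pipeline

yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path="runs/train/exp/weights/best.onnx",
    # num_cores=4,  # optionally pin the engine to a subset of CPU cores
)

results = yolo_pipeline(images=["data/images/test.jpg"])
print(results.boxes[0][:3])   # a few detected boxes for the first image
print(results.labels[0][:3])  # their class labels
print(results.scores[0][:3])  # their confidence scores
```

Neural Magic also ships a benchmarking CLI (deepsparse.benchmark) that reports raw engine throughput, which is the easier way to compare FPS figures like the ones above.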
--
Gone are the days when we needed GPUs to run models in real time.
⚡ With DeepSparse you can get GPU-class performance on CPUs.
Link to Twitter thread 👉 https://twitter.com/dicksonneoh7/status/1534395572022480896
LinkedIn post 👉 https://www.linkedin.com/posts/dickson-neoh_anyone-can-train-a-yolov5-ultralytics-nowadays-activity-6940225158120914944-5u3c?utm_source=linkedin_share&utm_medium=android_app
Link to blog post 👉 https://dicksonneoh.com/portfolio/supercharging_yolov5_180_fps_cpu/
Link to GitHub codes 👉 https://github.com/dnth/yolov5-deepsparse-blogpost