SCRFD is an efficient, high-accuracy face detection approach, initially described on arXiv and accepted at ICLR 2022.
Precision, FLOPs, and inference time are all evaluated at VGA resolution (640x480).
Method | Backbone | Easy | Medium | Hard | #Params(M) | #Flops(G) | Infer(ms) |
---|---|---|---|---|---|---|---|
DSFD (CVPR19) | ResNet152 | 94.29 | 91.47 | 71.39 | 120.06 | 259.55 | 55.6 |
RetinaFace (CVPR20) | ResNet50 | 94.92 | 91.90 | 64.17 | 29.50 | 37.59 | 21.7 |
HAMBox (CVPR20) | ResNet50 | 95.27 | 93.76 | 76.75 | 30.24 | 43.28 | 25.9 |
TinaFace (Arxiv20) | ResNet50 | 95.61 | 94.25 | 81.43 | 37.98 | 172.95 | 38.9 |
- | - | - | - | - | - | - | - |
ResNet-34GF | ResNet50 | 95.64 | 94.22 | 84.02 | 24.81 | 34.16 | 11.8 |
SCRFD-34GF | Bottleneck Res | 96.06 | 94.92 | 85.29 | 9.80 | 34.13 | 11.7 |
ResNet-10GF | ResNet34x0.5 | 94.69 | 92.90 | 80.42 | 6.85 | 10.18 | 6.3 |
SCRFD-10GF | Basic Res | 95.16 | 93.87 | 83.05 | 3.86 | 9.98 | 4.9 |
ResNet-2.5GF | ResNet34x0.25 | 93.21 | 91.11 | 74.47 | 1.62 | 2.57 | 5.4 |
SCRFD-2.5GF | Basic Res | 93.78 | 92.16 | 77.87 | 0.67 | 2.53 | 4.2 |
Method | Backbone | Easy | Medium | Hard | #Params(M) | #Flops(G) | Infer(ms) |
---|---|---|---|---|---|---|---|
RetinaFace (CVPR20) | MobileNet0.25 | 87.78 | 81.16 | 47.32 | 0.44 | 0.802 | 7.9 |
FaceBoxes (IJCB17) | - | 76.17 | 57.17 | 24.18 | 1.01 | 0.275 | 2.5 |
- | - | - | - | - | - | - | - |
MobileNet-0.5GF | MobileNetx0.25 | 90.38 | 87.05 | 66.68 | 0.37 | 0.507 | 3.7 |
SCRFD-0.5GF | Depth-wise Conv | 90.57 | 88.12 | 68.51 | 0.57 | 0.508 | 3.6 |
X64 CPU Performance of SCRFD-0.5GF:
Test-Input-Size | CPU Single-Thread | Easy | Medium | Hard |
---|---|---|---|---|
Original-Size(scale1.0) | - | 90.91 | 89.49 | 82.03 |
640x480 | 28.3ms | 90.57 | 88.12 | 68.51 |
320x240 | 11.4ms | - | - | - |
Precision and inference time are evaluated on an AMD Ryzen 9 3950X, using simple PyTorch CPU inference with OMP_NUM_THREADS=1 (no mkldnn).
Please refer to mmdetection for installation.
- Install mmcv (mmcv-full==1.2.6 and 1.3.3 were tested).
- Install build requirements and then install mmdet.
  pip install -r requirements/build.txt
  pip install -v -e .  # or "python setup.py develop"
- Download the WIDERFace dataset and put it under data/retinaface.
- Download the annotation files from gdrive and put them under data/retinaface/
  data/retinaface/
      train/
          images/
          labelv2.txt
      val/
          images/
          labelv2.txt
          gt/
              *.mat
Please refer to labelv2.txt for details of the annotation format.
For each image:
# <image_path> image_width image_height
bbox_x1 bbox_y1 bbox_x2 bbox_y2 (<keypoint,3>*N)
...
...
# <image_path> image_width image_height
bbox_x1 bbox_y1 bbox_x2 bbox_y2 (<keypoint,3>*N)
...
...
Keypoints can be omitted if only bbox annotations are available.
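As an illustration of the format above, here is a minimal parsing sketch. It is not the loader used by this repo: the function name `parse_labelv2` and the sample text are hypothetical, fields are assumed to be whitespace-separated, and image width and height are assumed to be integers. Each keypoint is kept as an opaque triple of three numbers, since the description above does not spell out what the three values mean.

```python
from typing import Dict, List


def parse_labelv2(text: str) -> List[Dict]:
    """Parse labelv2-style annotation text into one record per image."""
    records: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: "# <image_path> image_width image_height"
            path, width, height = line[1:].strip().rsplit(None, 2)
            current = {"path": path, "width": int(width),
                       "height": int(height), "faces": []}
            records.append(current)
            continue
        if current is None:
            raise ValueError("face line found before any image header")
        values = [float(v) for v in line.split()]
        # 4 bbox values, optionally followed by N keypoints of 3 values each.
        if len(values) < 4 or (len(values) - 4) % 3 != 0:
            raise ValueError(f"unexpected number of fields: {line!r}")
        bbox = tuple(values[:4])
        keypoints = [tuple(values[i:i + 3]) for i in range(4, len(values), 3)]
        current["faces"].append({"bbox": bbox, "keypoints": keypoints})
    return records


# Hypothetical sample: paths and numbers are made up for illustration.
SAMPLE = """\
# 0--Parade/example.jpg 1024 768
10 20 110 140 30.0 50.0 1.0 80.0 50.0 1.0
200 220 260 300
# 1--Handshaking/other.jpg 640 480
5 6 50 60
"""

for record in parse_labelv2(SAMPLE):
    print(record["path"], len(record["faces"]), "face(s)")
```

A face line with only four numbers yields an empty keypoint list, which matches the bbox-only case mentioned above.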
Example training command, with 4 GPUs:
CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_1g.py 4
We use a pure-Python evaluation script, so Matlab is not required.
GPU=0
GROUP=scrfd
TASK=scrfd_2.5g
CUDA_VISIBLE_DEVICES="$GPU" python -u tools/test_widerface.py ./configs/"$GROUP"/"$TASK".py ./work_dirs/"$TASK"/model.pth --mode 0 --out wouts
Name | Easy | Medium | Hard | FLOPs | Params(M) | Infer(ms) | Link |
---|---|---|---|---|---|---|---|
SCRFD_500M | 90.57 | 88.12 | 68.51 | 500M | 0.57 | 3.6 | download |
SCRFD_1G | 92.38 | 90.57 | 74.80 | 1G | 0.64 | 4.1 | download |
SCRFD_2.5G | 93.78 | 92.16 | 77.87 | 2.5G | 0.67 | 4.2 | download |
SCRFD_10G | 95.16 | 93.87 | 83.05 | 10G | 3.86 | 4.9 | download |
SCRFD_34G | 96.06 | 94.92 | 85.29 | 34G | 9.80 | 11.7 | download |
SCRFD_500M_KPS | 90.97 | 88.44 | 69.49 | 500M | 0.57 | 3.6 | download |
SCRFD_2.5G_KPS | 93.80 | 92.02 | 77.13 | 2.5G | 0.82 | 4.3 | download |
SCRFD_10G_KPS | 95.40 | 94.01 | 82.80 | 10G | 4.23 | 5.0 | download |
mAP, FLOPs and inference latency are all evaluated on VGA resolution.
_KPS means the model also predicts 5 facial keypoints.
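When choosing among the pretrained models, one simple approach is to take the highest hard mAP that fits a latency budget. The sketch below only illustrates that idea: the helper `pick_model` is hypothetical and not part of this repo, and the numbers are copied from the table above, so they apply to VGA resolution and the authors' test setup only.

```python
from typing import List, Optional, Tuple

# (name, hard mAP, inference latency in ms), copied from the table above.
MODELS: List[Tuple[str, float, float]] = [
    ("SCRFD_500M", 68.51, 3.6),
    ("SCRFD_1G", 74.80, 4.1),
    ("SCRFD_2.5G", 77.87, 4.2),
    ("SCRFD_10G", 83.05, 4.9),
    ("SCRFD_34G", 85.29, 11.7),
    ("SCRFD_500M_KPS", 69.49, 3.6),
    ("SCRFD_2.5G_KPS", 77.13, 4.3),
    ("SCRFD_10G_KPS", 82.80, 5.0),
]


def pick_model(max_latency_ms: float, need_keypoints: bool = False) -> Optional[str]:
    """Return the model with the best hard mAP within the latency budget."""
    candidates = [
        (hard, name)
        for name, hard, latency in MODELS
        if latency <= max_latency_ms
        and (name.endswith("_KPS") or not need_keypoints)
    ]
    # max() compares the hard mAP first; None means nothing fits the budget.
    return max(candidates)[1] if candidates else None


print(pick_model(5.0))
print(pick_model(5.0, need_keypoints=True))
```

Latency on other hardware will differ, so the budget is best re-measured on the target device before relying on such a selection.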
Please refer to tools/scrfd2onnx.py to export a model to ONNX.
The generated ONNX model accepts dynamic input shapes by default.
You can also fix the input shape by passing --shape 640 640; the output ONNX model can then be optimized with onnx-simplifier.
Please refer to tools/scrfd.py, which uses onnxruntime to do inference.
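If a model is exported with a fixed shape such as --shape 640 640, input images have to be brought to that size before inference, and detected boxes mapped back afterwards. The sketch below shows one common way to do this with an aspect-preserving resize. It is an assumption for illustration, not necessarily what tools/scrfd.py does, and both function names are hypothetical.

```python
from typing import Tuple


def fit_to_input(img_w: int, img_h: int,
                 in_w: int = 640, in_h: int = 640) -> Tuple[int, int, float]:
    """Return (new_w, new_h, scale) so the image fits inside in_w x in_h
    while keeping its aspect ratio; the remainder would be padded."""
    scale = min(in_w / img_w, in_h / img_h)
    new_w = max(1, int(round(img_w * scale)))
    new_h = max(1, int(round(img_h * scale)))
    return new_w, new_h, scale


def to_original(box: Tuple[float, float, float, float],
                scale: float) -> Tuple[float, ...]:
    """Map a box from resized-image coordinates back to the original image.
    Assumes the resized image sits at the top-left corner of the padded input."""
    return tuple(v / scale for v in box)


new_w, new_h, scale = fit_to_input(1280, 960)
print(new_w, new_h, scale)
print(to_original((100.0, 50.0, 200.0, 150.0), scale))
```

Padding at the top-left keeps the inverse mapping a single division; centering the image instead would also require subtracting the padding offset.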
For the two-step search described in the paper, we select the best candidate models by their hard mAP.
An example of searching for SCRFD-2.5GF in this repo is given below.
- For searching backbones:
  python search_tools/generate_configs_2.5g.py --mode 1
  Here mode==1 means searching the backbone only. For other parameters, please check the code.
- After step 1 is done, there will be configs/scrfdgen2.5g/scrfdgen2.5g_1.py to configs/scrfdgen2.5g/scrfdgen2.5g_64.py if num_configs is set to 64.
- Train every generated config for 80 epochs; please check search_tools/search_train.sh.
- Test WIDERFace precision for every generated config, using search_tools/search_test.sh.
- Select the most accurate config as the base template (assume the 10th config is the best), then do the overall network search.
  python search_tools/generate_configs_2.5g.py --mode 2 --template 10
- Test these newly generated configs again and select the most accurate one(s).
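The selection step between the two search stages can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the hard-mAP scores are made-up placeholders, and how scores are collected from the output of search_tools/search_test.sh is not covered here. The config paths and the step-2 command follow the naming shown above.

```python
from typing import Dict, List


def generated_config_paths(num_configs: int = 64,
                           group: str = "scrfdgen2.5g") -> List[str]:
    """List the config files expected after the backbone search step."""
    return [f"configs/{group}/{group}_{i}.py" for i in range(1, num_configs + 1)]


def best_template(hard_map: Dict[int, float]) -> int:
    """Return the index of the config with the highest hard mAP."""
    return max(hard_map, key=hard_map.get)


def step2_command(template: int) -> str:
    """Build the command for the overall network search from a template index."""
    return ("python search_tools/generate_configs_2.5g.py "
            f"--mode 2 --template {template}")


# Placeholder scores (config index -> hard mAP); not real results.
scores = {9: 76.8, 10: 77.9, 11: 77.1}

paths = generated_config_paths()
print(paths[0], "...", paths[-1])
print(step2_command(best_template(scores)))
```

With these placeholder scores the 10th config wins, so the printed command matches the `--template 10` example above.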
We thank nihui for the excellent mobile-phone demo.