Skip to content

Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Notifications You must be signed in to change notification settings

sunwei925/RQ-VQA

Repository files navigation

RQ-VQA

visitors Pytorch License arXiv

🏆 🥇 Winner solution for NTIRE 2024 Short-form UGC Video Quality Assessment Challenge at the NTIRE 2024 workshop @ CVPR 2024

Official Code for Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

TODO

  • release the test code for a single video
  • release the training code for other VQA datasets

Introduction

In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model to handle complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. Through concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.

Model

Model Figure

Performance

Performance on social media video quality assessment datasets

Model Performance

Performance on NTIRE Challenge

Team SRCC PLCC RANK1 RANK2 Scores
SJTU MMLab (ours) 0.9361 0.9359 0.7792 0.8284 0.9228
IH-VQA (WeChat) 0.9298 0.9325 0.7013 0.8284 0.9145
TVQE (Tecent) 0.9268 0.9312 0.6883 0.8284 0.9120
BDVQAGroup (ByteDance) 0.9275 0.9211 0.7489 0.8462 0.9116
VideoFusion (Zhejiang University) 0.9026 0.9071 0.7186 0.8580 0.8932
  • for more results on the NTIRE challenge, please refer to the challenge report.

Usage

Environments

Dataset

Download the KVQ dataset

Train RQ-VQA

  • Frame extraction
python frame_extraction/extract_frame_NTIREVideo_384p.py --filename_path data/train_data.csv --videos_dir /data/sunwei_data/ntire_video --save_folder /data/sunwei_data/ntire_video/test_image_384p
python frame_extraction/extract_frame_NTIREVideo_original.py --filename_path data/train_data.csv --videos_dir /data/sunwei_data/ntire_video --save_folder /data/sunwei_data/ntire_video/test_image_original # for extracting LIQE and Q-Align features
  • SlowFast feature extraction
CUDA_VISIBLE_DEVICES=0 python -u feature_extraction/extract_SlowFast_feature_VQA.py \
--database NTIREVideoTest \
--resize 224 \
--feature_save_folder  /data/sunwei_data/ntire_video/NTIREVideo_Train_SlowFast_feature/ \
--datainfo_test data/train_data.csv \
--videos_dir /data/sunwei_data/ntire_video
  • LIQE features extraction

Download the model weights

CUDA_VISIBLE_DEVICES=0 python -u feature_extraction/extract_LIQE_feature_KVQ.py --videos_dir_test /data/sunwei_data/ntire_video/test_image_original --feature_save_folder /data/sunwei_data/ntire_video/LIQE_feature/ --datainfo_test data/train_data.csv
  • FASTVQA features extraction

You should put the path of data csvfile in Line 18 (data/train_data.csv), data path in Line 19, and pretrained FAST_VQA_B_1*4.pth path in Line 50 in options/fast-b_NTIRE_UGC.yml

cd features/FastVQA_feature
CUDA_VISIBLE_DEVICES=0 python extract_fastvqa_feature.py \
--opt options/fast-b_NTIRE_UGC.yml \
--save_path /data/sunwei_data/ntire_video/FASTVQA/sampled/
  • Q-Align features extraction
cd feature_extraction/Q-Align
read the readme.txt for feature extraction 

To facilitate the reproduction of the experiments, we provide SlowFast, FASTVQA, LIQE and Q-Align features for the KVQ training, validation, and test sets.

  • Train the model

Download the pre-trained model on LSVQ

  CUDA_VISIBLE_DEVICES=0,1 python -u train.py \
 --database NTIREVideo \
 --model_name RQ_VQA \
 --pretrained_path /home/sunwei/code/VQA/SimpleVQA/ckpts/Swin_b_384_in22k_SlowFast_Fast_LSVQ.pth \
 --multi_gpu \
 --motion \
 --conv_base_lr 0.00001 \
 --epochs 30 \
 --train_batch_size 6 \
 --print_samples 400 \
 --num_workers 6 \
 --ckpt_path ckpts \
 --decay_ratio 0.9 \
 --decay_interval 10 \
 --loss_type plcc \
 --random_seed 10 \
 --n_exp 10 \
 --resize 384 \
 --crop_size 384 \
 >> logs/train.log

For computational efficiency, you can simply train the base model, which does not require extracting FASTVQA, LIQE, and Q-Align features.

  CUDA_VISIBLE_DEVICES=0,1 python -u train_base_model.py \
 --database NTIREVideo \
 --model_name RQ_VQA_base_model \
 --pretrained_path /home/sunwei/code/VQA/SimpleVQA/ckpts/Swin_b_384_in22k_SlowFast_Fast_LSVQ.pth \
 --multi_gpu \
 --motion \
 --conv_base_lr 0.00001 \
 --epochs 30 \
 --train_batch_size 6 \
 --print_samples 400 \
 --num_workers 6 \
 --ckpt_path ckpts \
 --decay_ratio 0.9 \
 --decay_interval 10 \
 --loss_type plcc \
 --random_seed 10 \
 --n_exp 10 \
 --resize 384 \
 --crop_size 384 \
 >> logs/train.log

Test RQ-VQA

  • Download the model weighs trained on KVQ.
  • Extract video frames, SlowFast features, FASTVQA features, LIQE features, and Q-Align features of KVQ validation and test sets.
  • Run the code
CUDA_VISIBLE_DEVICES=0 python -u test.py \
--save_file results.csv \
--n_exp 10 \
--resize 384 \
--crop_size 384 \
--pretrained_path  \ # put the folder of trained model here
>> test.log

Citation

If you find this code is useful for your research, please cite:

@article{sun2024enhancing,
  title={Enhancing Blind Video Quality Assessment with Rich Quality-aware Features},
  author={Sun, Wei and Wu, Haoning and Zhang, Zicheng and Jia, Jun and Zhang, Zhichao and Cao, Linhan and Chen, Qiubo and Min, Xiongkuo and Lin, Weisi and Zhai, Guangtao},
  journal={arXiv preprint arXiv:2405.08745},
  year={2024}
}

Acknowledgement

  1. https://github.com/zwx8981/LIQE
  2. https://github.com/VQAssessment/FAST-VQA-and-FasterVQA
  3. https://github.com/Q-Future/Q-Align