FFmpeg QSV Multi GPU Selection on Linux
The FFmpeg command line has a number of options to select a GPU in the multi-device case. Appropriate usage of these options can be tricky. This article summarizes the most typical use cases, paying attention to the tricky points.
In the examples given in this article we will use input content which can be obtained as follows (stream resolution is 176x144):
wget https://fate-suite.libav.org/h264-conformance/AUD_MW_E.264
ffmpeg -i AUD_MW_E.264 -c:v rawvideo -pix_fmt yuv420p -y AUD_MW_E.yuv
We will consider a system with 2 Intel GPUs and provide command lines which will schedule all GPU operations within the pipeline on the specified device.
This article does not focus on achieving the best performance; the example command lines are simplified as much as possible to explore scheduling options only. Refer to other materials on how to achieve better quality and performance with ffmpeg-qsv.
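The render node paths used in the examples (/dev/dri/renderD128, /dev/dri/renderD129) are system specific. As a quick check (these are standard Linux DRM paths; the output differs per system), the available nodes and their PCI bus mapping can be listed with:

ls -l /dev/dri/          # each GPU exposes a renderD12x render node
ls -l /dev/dri/by-path/  # symlinks map PCI devices to DRM nodes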
Transcoding with QSV decoding and QSV encoding; the device is selected with -qsv_device. To schedule on /dev/dri/renderD128:
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264
To schedule on /dev/dri/renderD129:
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264
Transcoding with software decoding and QSV encoding; the encoder device is selected with -init_hw_device. To schedule on /dev/dri/renderD128:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va -c:v h264 -i AUD_MW_E.264 -c:v h264_qsv -y out.264
To schedule on /dev/dri/renderD129:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va -c:v h264 -i AUD_MW_E.264 -c:v h264_qsv -y out.264
Decoding with QSV and downloading the decoded frames to system memory. To schedule on /dev/dri/renderD128:
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 \
  -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p -y out.yuv
To schedule on /dev/dri/renderD129:
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 \
  -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p -y out.yuv
Encoding raw input with QSV; the encoder device is selected with -init_hw_device. To schedule on /dev/dri/renderD128:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -c:v h264_qsv -y out.h264
To schedule on /dev/dri/renderD129:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -c:v h264_qsv -y out.h264
Encoding raw input with QSV using HW upload of the input frames; the device is selected with -filter_hw_device. To schedule on /dev/dri/renderD128:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va -filter_hw_device hw \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -vf hwupload=extra_hw_frames=64,format=qsv -c:v h264_qsv -y out.h264
To schedule on /dev/dri/renderD129:
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va -filter_hw_device hw \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -vf hwupload=extra_hw_frames=64,format=qsv -c:v h264_qsv -y out.h264
| Option | Applies to |
| --- | --- |
| -hwaccel qsv | QSV decoders |
| -hwaccel_device | Does not apply to QSV |
| -qsv_device | -hwaccel |
| -init_hw_device | QSV encoders |
| -filter_hw_device | QSV filters, QSV HW upload |
In general, the -hwaccel option actually applies to the input stream, i.e. to the 1st component in the pipeline. If the first component is a QSV decoder, the option will apply to it, and the QSV encoder (2nd in the pipeline) will pick the same device through the frames context. But if the 1st component is not a QSV decoder (as in the encoding examples, where the 1st component is rawvideo), the option will simply be ignored (by rawvideo), the encoder won't be able to pick a device from the frames context (since the frames will be raw system memory frames) and will work on the default device.
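To illustrate the point, here is a hypothetical command (not one of the examples above), assuming the behavior just described: with raw input, -hwaccel qsv and -qsv_device affect nothing in the pipeline, so the QSV encoder silently runs on the default device rather than renderD129:

ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv \
  -c:v h264_qsv -y out.h264
# -qsv_device has no effect here: the 1st component is rawvideo,
# so the encoder works on the default device, not renderD129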
The -hwaccel_device option is supposed to select the device for the -hwaccel option, but it is not actually implemented for QSV. Instead, we need to use -qsv_device.
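A minimal sketch of the difference (assuming, per the above, that -hwaccel_device is not wired up for QSV device selection):

# Does NOT select the QSV device (not implemented for QSV):
ffmpeg -hwaccel qsv -hwaccel_device /dev/dri/renderD129 \
  -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264

# Works: -qsv_device selects the device for -hwaccel qsv:
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 \
  -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264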
-init_hw_device initializes a HW device and adds it to the global list. QSV encoders are capable of picking devices from this list, but QSV decoders are not (historically, QSV encoders were written later and implemented this path, while QSV decoders use their own ad-hoc path).
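This is why the encoding examples above first create a VAAPI device bound to a concrete DRM node and then derive a QSV device from it. A commented sketch of the pattern (same flags as in the examples; the comments are explanatory only):

# Create a VAAPI device named "va" bound to a specific DRM render node,
# then derive a QSV device named "hw" from it. The QSV encoder picks the
# derived device up from the global device list.
ffmpeg \
  -init_hw_device vaapi=va:/dev/dri/renderD129 \
  -init_hw_device qsv=hw@va \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv \
  -c:v h264_qsv -y out.h264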
Finally, -filter_hw_device allows specifying a device for the filters and covers the HW upload case (which, in the case of QSV, uses the GPU to copy frames from system memory to video memory).
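As a sketch of the filter case (assuming a build that includes the scale_qsv filter; this command is not among the examples above), the device named by -filter_hw_device serves both the HW upload and the QSV scaling filter:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va -filter_hw_device hw \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv \
  -vf hwupload=extra_hw_frames=64,format=qsv,scale_qsv=w=88:h=72 \
  -c:v h264_qsv -y out.h264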
Having explored the device selection options in the command lines above, let's consider how to check that the selected GPU is actually loaded. For that we can use the Linux perf tool, which can show whether tasks are running on the engines of Intel GPUs.
The list of available GPU engines is GPU dependent. To get it, execute:
$ sudo perf list | grep i915 | grep busy
  i915/bcs0-busy/                          [Kernel PMU event]
  i915/rcs0-busy/                          [Kernel PMU event]
  i915/vcs0-busy/                          [Kernel PMU event]
  i915/vecs0-busy/                         [Kernel PMU event]
  i915_0000_03_00.0/bcs0-busy/             [Kernel PMU event]
  i915_0000_03_00.0/rcs0-busy/             [Kernel PMU event]
  i915_0000_03_00.0/vcs0-busy/             [Kernel PMU event]
  i915_0000_03_00.0/vcs1-busy/             [Kernel PMU event]
  i915_0000_03_00.0/vecs0-busy/            [Kernel PMU event]
In this example we have a system with 2 enabled Intel GPU devices. The first GPU is an Intel integrated GPU with 4 engines (its events follow the pattern i915/<engine>-<event>). The second GPU is an Intel discrete GPU with 5 engines (event pattern i915_<pci>/<engine>-<event>). The engine names map to fixed functions: bcs is the copy (blit) engine, rcs the render engine, vcs the video decode/encode engine, and vecs the video enhancement engine. For the purpose of this article we will use only the "busy" events, which effectively give the times during which the engines were actually executing tasks.
Now we can run the following script to monitor both GPUs' activity:
events="" # events for first GPU events+=i915/bcs0-busy/, events+=i915/rcs0-busy/, events+=i915/vcs0-busy/, events+=i915/vecs0-busy/, # events for second GPU events+=i915_0000_03_00.0/bcs0-busy/, events+=i915_0000_03_00.0/rcs0-busy/, events+=i915_0000_03_00.0/vcs0-busy/, events+=i915_0000_03_00.0/vcs1-busy/, events+=i915_0000_03_00.0/vecs0-busy/ # sudo perf stat -a -I 1000 -e $events \ /bin/bash -c "while :; do echo 'Press [CTRL+C] to stop..'; sleep 1; done"
The script will print both GPUs' engine utilization (in nanoseconds during which the engines were running tasks) every second. For example:
9.003608840                  0 ns   i915/bcs0-busy/
9.003608840                  0 ns   i915/rcs0-busy/
9.003608840         33,478,100 ns   i915/vcs0-busy/
9.003608840                  0 ns   i915/vecs0-busy/
9.003608840            511,984 ns   i915_0000_03_00.0/bcs0-busy/
9.003608840        147,219,205 ns   i915_0000_03_00.0/rcs0-busy/
9.003608840         25,717,925 ns   i915_0000_03_00.0/vcs0-busy/
9.003608840                  0 ns   i915_0000_03_00.0/vcs1-busy/
9.003608840         40,396,895 ns   i915_0000_03_00.0/vecs0-busy/
Press [CTRL+C] to stop..
As you can see, in this example we have tasks running on both GPUs, but not all engines are actually busy.
Running this script in parallel with the ffmpeg examples given above, you can check onto which GPU each workload was actually scheduled.
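Alternatively, perf stat can wrap a single run and count system-wide for just that command's duration. A minimal sketch monitoring only the video engines (event names as listed above; adjust them to your system):

sudo perf stat -a \
  -e i915/vcs0-busy/,i915_0000_03_00.0/vcs0-busy/,i915_0000_03_00.0/vcs1-busy/ \
  ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264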
For more details on Linux perf usage refer to Performance monitoring and debug w/ Linux perf.