- Simple setup:
- On a Mac, you can run the setup script to install all requirements and skip the Requirements section that follows:
chmod +x setup_mac.sh
./setup_mac.sh
- Requirements:
- Clone external submodules:
git submodule update --init --recursive
- Set Python version to 3.10:
pyenv global 3.10
- Install Python requirements using pip:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- On a Mac, download and install VideoSnap (a macOS command-line tool for recording video and audio from any attached capture device):
wget https://github.com/matthutchinson/videosnap/releases/download/v0.0.9/videosnap-0.0.9.pkg
sudo installer -pkg videosnap-0.0.9.pkg -target /
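The requirements above can be sanity-checked with a short script before running any module. This is a minimal sketch and not part of the repository: the name `check_environment` is hypothetical, and it assumes the VideoSnap installer places a `videosnap` executable on the PATH.

```python
import platform
import shutil
import sys


def check_environment(required=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The setup steps pin Python 3.10 via pyenv.
    if tuple(sys.version_info[:2]) != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # VideoSnap is only needed on macOS.
    # Assumption: the installer puts a `videosnap` binary on the PATH.
    if platform.system() == "Darwin" and shutil.which("videosnap") is None:
        problems.append("videosnap not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated virtual environment; no output means both checks passed.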
- Contents:
- The audio and video capture module is in the capture directory
- AV synchronisation detection using Synchformer is in the av_sync_detection directory
- Stutter detection using MaxVQA and Essentia is in the stutter_detection directory
- Video quality assessment using Google UVQ is in the video_quality_assessment directory
- Run setup mode to check the input audio/video sources:
python capture/capture.py --setup-mode
- Run capture pipeline to generate AV files:
python capture/capture.py -a AUDIO_SOURCE -v VIDEO_SOURCE
- This captures audio and video in 10-second segments and saves them to the local directory output/capture/
- Halt capture by interrupting execution with
CTRL+C
usage: capture.py [-h] [-m] [-na] [-nv] [-s] [-a AUDIO] [-v VIDEO] [-o OUTPUT_PATH]
Capture audio and video streams from a camera/microphone and split into segments for processing.
options:
-h, --help show this help message and exit
-m, --setup-mode display video to be captured in setup mode with no capture/processing
-na, --no-audio do not include audio in captured segments
-nv, --no-video do not include video in captured segments
-s, --split-av-out output audio and video in separate files (WAV and MP4)
-a AUDIO, --audio AUDIO
index of input audio device
-v VIDEO, --video VIDEO
index of input video device
-o OUTPUT_PATH, --output-path OUTPUT_PATH
directory to output captured video segments to
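When the capture script is launched from another program, it can help to assemble the command line from the flags in the usage text above. The following is a rough sketch: `build_capture_command` is a hypothetical helper, not something the repository provides, and the device indices in the example are placeholders (use setup mode to find the real ones).

```python
import shlex
import sys


def build_capture_command(audio=None, video=None, output_path=None,
                          split_av_out=False, no_audio=False, no_video=False):
    """Build the argument list for capture/capture.py from the flags in its usage text."""
    cmd = [sys.executable, "capture/capture.py"]
    if audio is not None:
        cmd += ["-a", str(audio)]        # index of input audio device
    if video is not None:
        cmd += ["-v", str(video)]        # index of input video device
    if output_path is not None:
        cmd += ["-o", str(output_path)]  # directory for captured segments
    if split_av_out:
        cmd.append("-s")                 # separate WAV and MP4 files
    if no_audio:
        cmd.append("-na")
    if no_video:
        cmd.append("-nv")
    return cmd


if __name__ == "__main__":
    # Placeholder device indices; this only prints the command, it does not run it.
    print(shlex.join(build_capture_command(audio=0, video=1)))
```

The resulting list can be passed to `subprocess.run` from the repository root.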
- The complete build of the AV sync detection system uses Synchformer to predict AV offsets (as this was found to be the most accurate model during experimentation).
- Detection can be run on a single video file or on a directory of files.
- Streaming mode can also be enabled. It continuously checks a directory for new files and processes them as they are added, so it can be used alongside the capture system to perform AV sync detection in real time.
- Run inference on static files at PATH:
python AVSyncDetection.py PATH --plot
- Run in streaming mode on captured video segments:
python AVSyncDetection.py ../output/capture/segments/ -sxp
- If running on an Apple Silicon Mac:
python AVSyncDetection.py PATH -p --device mps
- If running on an NVIDIA (CUDA) GPU:
python AVSyncDetection.py PATH -p --device cuda
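The value for `--device` can be chosen automatically rather than by hand. The sketch below assumes the flag accepts PyTorch device strings (`mps` and `cuda` are PyTorch names) and that `cpu` is also a valid value; neither assumption is confirmed by the usage text. `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" depending on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to the CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```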
usage: AVSyncDetection.py [-h] [-p] [-s] [-i] [-d DEVICE] [-t TRUE_OFFSET] directory
Run Synchformer AV sync offset detection model over local AV segments.
positional arguments:
directory
options:
-h, --help show this help message and exit
-p, --plot plot sync predictions as generated by model
-s, --streaming real-time detection of streamed input by continuously locating & processing video segments
-i, --time-indexed-files
label output predictions with available timestamps of input video segments
-d DEVICE, --device DEVICE
hardware device to run model on
-t TRUE_OFFSET, --true-offset TRUE_OFFSET
known true av offset of the input video
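To illustrate what streaming mode does conceptually, here is a minimal directory-polling sketch. It is not the implementation in `AVSyncDetection.py`: the function names, the one-second polling interval, and the file suffixes are assumptions made for illustration.

```python
import time
from pathlib import Path


def find_new_segments(directory, seen, suffixes=(".mp4", ".wav")):
    """Return segment files in `directory` not yet in `seen`, oldest first, and record them."""
    candidates = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes and p not in seen
    ]
    # Process in the order the files were written.
    candidates.sort(key=lambda p: (p.stat().st_mtime, p.name))
    seen.update(candidates)
    return candidates


def watch(directory, process, poll_seconds=1.0):
    """Poll `directory` until interrupted, calling `process(path)` on each new segment."""
    seen = set()
    while True:
        for path in find_new_segments(directory, seen):
            process(path)
        time.sleep(poll_seconds)
```

A real watcher would also need to make sure a segment has finished being written before processing it; this sketch does not handle that.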
- Install ExplainableVQA dependencies:
git submodule update --init --recursive
pip install -r ExplainableVQA/requirements.txt
- Install open_clip:
On Mac:
sed -i "" "92s/return x\[0\]/return x/" ExplainableVQA/open_clip/src/open_clip/modified_resnet.py
pip install -e ExplainableVQA/open_clip
On Linux:
sed -i '92s/return x\[0\]/return x/' ExplainableVQA/open_clip/src/open_clip/modified_resnet.py
pip install -e ExplainableVQA/open_clip
- Install DOVER:
On Mac, first run:
sed -i "" "4s/decord/eva-decord/" ExplainableVQA/DOVER/requirements.txt
pip install -e ExplainableVQA/DOVER
mkdir ExplainableVQA/DOVER/pretrained_weights
wget https://github.com/VQAssessment/DOVER/releases/download/v0.1.0/DOVER.pth -P ExplainableVQA/DOVER/pretrained_weights/
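The `sed -i` commands above differ between macOS and Linux because BSD and GNU `sed` take the in-place flag differently. A platform-independent alternative is to do the same single-line substitution in Python. This is a sketch under the assumption that the target lines are still at the line numbers given above; `patch_line` is a hypothetical helper.

```python
from pathlib import Path


def patch_line(path, line_number, old, new):
    """Replace the first `old` with `new` on one 1-indexed line; return True if changed."""
    file_path = Path(path)
    lines = file_path.read_text().splitlines(keepends=True)
    index = line_number - 1
    # Leave the file untouched if the line is missing or does not contain `old`.
    if not 0 <= index < len(lines) or old not in lines[index]:
        return False
    lines[index] = lines[index].replace(old, new, 1)
    file_path.write_text("".join(lines))
    return True


# Equivalent of the sed commands above (run from the stutter detection directory):
# patch_line("ExplainableVQA/open_clip/src/open_clip/modified_resnet.py",
#            92, "return x[0]", "return x")
# patch_line("ExplainableVQA/DOVER/requirements.txt", 4, "decord", "eva-decord")  # Mac only
```

Like the `sed` version, the second substitution should only be run once, since `eva-decord` still contains `decord`.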
- Run inference on a directory or a video/audio file at PATH:
python StutterDetection.py PATH
- This will output a plot of the "motion fluency" over the course of the video (low fluency may indicate stuttering events) and/or a plot of audio stutter times detected in the waveform.
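One way to read a fluency plot is to look for stretches where the score drops below some level. The sketch below shows that idea only. It assumes per-frame scores where higher means more fluent, a known frame rate, and a threshold chosen by the user; `StutterDetection.py` is not known to expose its scores this way, and `low_fluency_ranges` is a hypothetical helper.

```python
def low_fluency_ranges(scores, fps, threshold):
    """Return (start_s, end_s) ranges where consecutive scores fall below `threshold`."""
    ranges = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold and start is None:
            start = i  # a low-fluency run begins at this frame
        elif score >= threshold and start is not None:
            ranges.append((start / fps, i / fps))  # the run ended before this frame
            start = None
    if start is not None:
        # The run continues to the end of the clip.
        ranges.append((start / fps, len(scores) / fps))
    return ranges
```

Any range it returns is only a candidate stutter event and should be checked against the video.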
usage: StutterDetection.py [-h] [-na] [-nv] [-c] [-t] [-i] [-f FRAMES] [-e EPOCHS]
[-d DEVICE]
directory
Run audio and video stutter detection algorithms over local AV segments.
positional arguments:
directory
options:
-h, --help show this help message and exit
-na, --no-audio Do not perform stutter detection on the audio track
-nv, --no-video Do not perform stutter detection on the video track
-c, --clean-video Testing on clean stutter-free videos (for experimentation)
-t, --true-timestamps
Plot known stutter times on the output graph, specified in
'true-stutter-timestamps.json'
-i, --time-indexed-files
Label batch of detections over video segments with their
time range (from filename)
-f FRAMES, --frames FRAMES
Number of frames to downsample video to
-e EPOCHS, --epochs EPOCHS
Number of times to repeat inference per video
-d DEVICE, --device DEVICE
Specify processing hardware