General Setup

  • Simple setup:
    • On macOS, you can run the setup script to install all requirements (and skip the following section):
    • chmod +x setup_mac.sh
    • ./setup_mac.sh
  • Requirements (a quick sanity check of the finished setup appears after this list):
    • Clone the external submodules: git submodule update --init --recursive
    • Set the Python version to 3.10: pyenv global 3.10
    • Install the Python requirements using pip:
      • python -m venv venv
      • source venv/bin/activate
      • pip install -r requirements.txt
    • If on Mac, download and install VideoSnap (a macOS command-line tool for recording video and audio from any attached capture device):
      • wget https://github.com/matthutchinson/videosnap/releases/download/v0.0.9/videosnap-0.0.9.pkg
      • sudo installer -pkg videosnap-0.0.9.pkg -target /
  • Contents:
    • The audio and video capture module is located in the capture directory
    • AV synchronisation detection using Synchformer is located in the av_sync_detection directory
    • Stutter detection using MaxVQA and Essentia is located in the stutter_detection directory
    • Video quality assessment using Google UVQ is located in the video_quality_assessment directory
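
To confirm the environment before moving on, you can run a quick sanity check (the -l flag for listing attached capture devices is taken from VideoSnap's own documentation; adjust if your installed version differs):

python --version    # expect Python 3.10.x from the pyenv/venv steps above
videosnap -l        # macOS only: list the capture devices VideoSnap can see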

AV Capture System

  • Run setup mode to check input audio/video sources: python capture/capture.py --setup-mode
  • Run the capture pipeline to generate AV files: python capture/capture.py -a AUDIO_SOURCE -v VIDEO_SOURCE
  • This captures audio and video in 10-second segments and saves them to the local directory output/capture/
  • Halt capture by interrupting execution with CTRL+C

General CLI

usage: capture.py [-h] [-m] [-na] [-nv] [-s] [-a AUDIO] [-v VIDEO] [-o OUTPUT_PATH]

Capture audio and video streams from a camera/microphone and split into segments for processing.

options:
  -h, --help            show this help message and exit
  -m, --setup-mode      display video to be captured in setup mode with no capture/processing
  -na, --no-audio       do not include audio in captured segments
  -nv, --no-video       do not include video in captured segments
  -s, --split-av-out    output audio and video in separate files (WAV and MP4)
  -a AUDIO, --audio AUDIO
                        index of input audio device
  -v VIDEO, --video VIDEO
                        index of input video device
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        directory to output captured video segments to
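
For example, once setup mode has reported the device indices, a typical capture run might look like the following (the indices 0 and 1 are placeholders; substitute the indices listed for your own devices):

python capture/capture.py -a 0 -v 1 -s -o output/capture/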

AV Synchronisation Detection

Complete Detection System

  • The complete build of the AV sync detection system uses Synchformer to predict AV offsets, as this was found to be the most accurate model during experimentation.
  • Detection can be run over a single video file or a directory of files.
  • You can also enable streaming mode, which continuously checks a directory for files and processes them as they are added. Used in conjunction with the capture system, this performs AV sync detection in real time (a minimal sketch of this polling pattern appears after this list).
  • Run inference on static files at PATH: python AVSyncDetection.py PATH --plot
  • Run in streaming mode on captured video segments: python AVSyncDetection.py ../output/capture/segments/ -sp
  • If running on an Apple Silicon Mac: python AVSyncDetection.py PATH -p --device mps
  • If running on a CUDA-capable GPU: python AVSyncDetection.py PATH -p --device cuda
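
The following is a minimal sketch of the polling pattern behind streaming mode, not the repository's implementation; the process callback is a hypothetical stand-in for whatever runs the sync model on each segment:

import time
from pathlib import Path

def watch_segments(directory, process, poll_interval=2.0):
    """Poll `directory` and hand each newly added AV segment to `process` once."""
    seen = set()
    while True:
        for segment in sorted(Path(directory).glob("*.mp4")):
            if segment not in seen:
                seen.add(segment)
                process(segment)  # e.g. score this segment with the sync model
        time.sleep(poll_interval)

# Hypothetical usage: print each new segment that would be scored.
# watch_segments("../output/capture/segments/", lambda p: print(f"new segment: {p}"))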

General CLI

usage: AVSyncDetection.py [-h] [-p] [-s] [-i] [-d DEVICE] [-t TRUE_OFFSET] directory

Run Synchformer AV sync offset detection model over local AV segments.

positional arguments:
  directory

options:
  -h, --help            show this help message and exit
  -p, --plot            plot sync predictions as generated by model
  -s, --streaming       real-time detection of streamed input by continuously locating & processing video segments
  -i, --time-indexed-files
                        label output predictions with available timestamps of input video segments
  -d DEVICE, --device DEVICE
                        hardware device to run model on
  -t TRUE_OFFSET, --true-offset TRUE_OFFSET
                        known true av offset of the input video
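
For example, to score a clip against a known ground-truth offset on Apple Silicon (the 0.2 value is illustrative and assumes the offset is given in seconds):

python AVSyncDetection.py PATH -p --device mps --true-offset 0.2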

Stutter Detection

Installing

Installing Video Stutter Module

  1. Install the ExplainableVQA dependencies:
git submodule update --init --recursive
pip install -r ExplainableVQA/requirements.txt
  2. Install open_clip:

On Mac:

sed -i "" "92s/return x\[0\]/return x/" ExplainableVQA/open_clip/src/open_clip/modified_resnet.py
pip install -e ExplainableVQA/open_clip

On Linux:

sed -i '92s/return x\[0\]/return x/' ExplainableVQA/open_clip/src/open_clip/modified_resnet.py
pip install -e ExplainableVQA/open_clip
  3. Install DOVER:

On Mac, first run this before continuing: sed -i "" "4s/decord/eva-decord/" ExplainableVQA/DOVER/requirements.txt

pip install -e ExplainableVQA/DOVER
mkdir ExplainableVQA/DOVER/pretrained_weights
wget https://github.com/VQAssessment/DOVER/releases/download/v0.1.0/DOVER.pth -P ExplainableVQA/DOVER/pretrained_weights/
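
As a quick check that the installation completed, confirm the pretrained weights landed in the directory used by the wget command above:

ls -lh ExplainableVQA/DOVER/pretrained_weights/DOVER.pth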

Running

  • Run inference on a directory or a video/audio file at PATH: python StutterDetection.py PATH
  • This outputs a plot of the "motion fluency" over the course of the video (low fluency may indicate stuttering events) and/or a plot of the audio stutter times detected in the waveform.

General CLI

usage: StutterDetection.py [-h] [-na] [-nv] [-c] [-t] [-i] [-f FRAMES] [-e EPOCHS]
                           [-d DEVICE]
                           directory

Run audio and video stutter detection algorithms over local AV segments.

positional arguments:
  directory

options:
  -h, --help            show this help message and exit
  -na, --no-audio       Do not perform stutter detection on the audio track
  -nv, --no-video       Do not perform stutter detection on the video track
  -c, --clean-video     Testing on clean stutter-free videos (for experimentation)
  -t, --true-timestamps
                        Plot known stutter times on the output graph, specified in
                        'true-stutter-timestamps.json'
  -i, --time-indexed-files
                        Label batch of detections over video segments with their
                        time range (from filename)
  -f FRAMES, --frames FRAMES
                        Number of frames to downsample video to
  -e EPOCHS, --epochs EPOCHS
                        Number of times to repeat inference per video
  -d DEVICE, --device DEVICE
                        Specify processing hardware
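
For example, to run both detectors over time-indexed capture segments on an NVIDIA GPU (the frame count of 64 is illustrative; omit -f to keep the default downsampling):

python StutterDetection.py ../output/capture/segments/ -i -f 64 -d cuda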