Integrate Whisper CPP and write a wrapper module in Aprapipes #324

Merged
merged 43 commits into from
Feb 28, 2024
8e23a6e
Add custom port vcpkg for whisper
joiskash Dec 22, 2023
fb29351
Add whisper stream
joiskash Dec 22, 2023
4ed21e2
Add whisper stream header
joiskash Dec 22, 2023
deab2d0
Add whisper cpp to Cmake list
joiskash Dec 22, 2023
9462f59
Add test frame type and minor changes
joiskash Dec 22, 2023
c55c40b
Add whisper to vcpkg
joiskash Dec 27, 2023
870862c
Add vcpkg custom overlay ports to thirdparty
joiskash Dec 27, 2023
ca4a6e4
Modify with whisper option
joiskash Dec 27, 2023
482f02c
Send whisper output as text frames
joiskash Dec 27, 2023
d12edf5
revert changes to sound record test
joiskash Dec 27, 2023
3275afd
Add whisper UT
joiskash Dec 27, 2023
acd8a1f
Fix PS to remove whisper from vcpkg json
joiskash Dec 27, 2023
9b18eb3
Revert changes to OPTIONS section, remove WHISPER option, rename Whis…
joiskash Dec 31, 2023
cf5d8a4
Move pcm to git lfs
joiskash Dec 31, 2023
5ad9157
Add pcm and model bin file to lfs
joiskash Dec 31, 2023
ded9a03
Fix UT name
joiskash Dec 31, 2023
ec0ca73
Throw AIP exception for unknown strategy
joiskash Dec 31, 2023
6d4528e
Revert sound_record_tests.cpp changes
joiskash Dec 31, 2023
91fe148
Revert changes to vcpkg indentation and remove Whisper option
joiskash Dec 31, 2023
2021355
Linux -> OFF to ON Windows ON -> OFF
joiskash Jan 3, 2024
80500ce
Add reserve statement for vector
joiskash Jan 9, 2024
42ca754
update submodule for pipeline to run
joiskash Jan 9, 2024
66cd4d8
Update whisper port with install fix
joiskash Jan 13, 2024
e817f98
update submodule
joiskash Jan 13, 2024
ce3d6e2
Update vcpkg version
joiskash Jan 13, 2024
f33644f
Add changes to handle props change
joiskash Jan 13, 2024
b6e20df
Improve UT and refactor for changing sample strategy during run time.
joiskash Jan 13, 2024
925e508
Add apt-get install libx11-dev libgles2-mesa-dev for libepoxy error
joiskash Jan 13, 2024
1d7bc11
Add memory type check in validate input pins and throw exception if m…
joiskash Feb 15, 2024
bc04e47
update submodule
joiskash Feb 15, 2024
25090aa
Merge branch 'main' into kj/whisper-asr
joiskash Feb 15, 2024
0c56895
update vcpkg mysys2
joiskash Feb 15, 2024
969e844
update submodule
joiskash Feb 15, 2024
9f58b90
Address nits
joiskash Feb 16, 2024
1e738f6
Export env variable overlay port for building in arm64
joiskash Feb 16, 2024
d478555
added fix-for-arm64.patch for whisper
kushaljain-apra Feb 23, 2024
67cbe9a
update fix-vcpkg-json.ps1
kushaljain-apra Feb 23, 2024
6ddd487
update CMakeLists.txt
Feb 23, 2024
dba812f
update vcpkg url for build
joiskash Feb 23, 2024
4716f25
update whisper tests threshold
kushaljain-apra Feb 26, 2024
f494d88
update code formatting
kushaljain-apra Feb 26, 2024
ad0977b
update whisper test
kushaljain-apra Feb 26, 2024
42df5de
added EOS for small buffer size
kushaljain-apra Feb 27, 2024
2 changes: 1 addition & 1 deletion .github/workflows/CI-Linux-ARM64.yml
@@ -19,7 +19,7 @@ jobs:
cuda: 'ON'
prep-cmd: 'echo skipping builder prep as I can not sudo'
cache-path: './none'
-cmake-conf-cmd: 'export VCPKG_FORCE_SYSTEM_BINARIES=1 && cmake -B . -DENABLE_ARM64=ON ../base'
+cmake-conf-cmd: 'export VCPKG_FORCE_SYSTEM_BINARIES=1 && export VCPKG_OVERLAY_PORTS=../thirdparty/custom-overlay && cmake -B . -DENABLE_ARM64=ON ../base'
nProc: 6
jetson-publish:
needs: jetson-build-test
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin-container.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
-default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin-wsl.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
-default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
-default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxdamage-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxdamage-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
16 changes: 9 additions & 7 deletions base/CMakeLists.txt
@@ -6,6 +6,8 @@ OPTION(ENABLE_ARM64 "Use this switch to enable ARM64" OFF)
OPTION(ENABLE_WINDOWS "Use this switch to enable WINDOWS" OFF)

set(VCPKG_INSTALL_OPTIONS "--clean-after-build")
set(VCPKG_OVERLAY_PORTS "${CMAKE_CURRENT_SOURCE_DIR}/../thirdparty/custom-overlay")

IF(ENABLE_CUDA)
add_compile_definitions(APRA_CUDA_ENABLED)
ENDIF(ENABLE_CUDA)
@@ -23,6 +25,7 @@ IF(ENABLE_ARM64)
add_compile_definitions(ARM64)
set(VCPKG_OVERLAY_PORTS ../vcpkg/ports/cudnn)
set(VCPKG_OVERLAY_TRIPLETS ../vcpkg/triplets/community/arm64-linux.cmake)
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
ENDIF(ENABLE_ARM64)

#use /MP only for language CXX (and not CUDA) and MSVC for both targets
@@ -38,8 +41,6 @@ project(APRAPIPES)
message(STATUS $ENV{PKG_CONFIG_PATH}">>>>>> PKG_CONFIG_PATH")

find_package(PkgConfig REQUIRED)


find_package(Boost COMPONENTS system thread filesystem serialization log chrono unit_test_framework REQUIRED)
find_package(JPEG REQUIRED)
find_package(OpenCV CONFIG REQUIRED)
@@ -50,6 +51,7 @@ find_package(FFMPEG REQUIRED)
find_package(ZXing CONFIG REQUIRED)
find_package(bigint CONFIG REQUIRED)
find_package(SFML COMPONENTS system window audio graphics CONFIG REQUIRED)
find_package(whisper CONFIG REQUIRED)

IF(ENABLE_CUDA)
if((NOT DEFINED CMAKE_CUDA_ARCHITECTURES) OR (CMAKE_CUDA_ARCHITECTURES STREQUAL ""))
@@ -280,10 +282,9 @@ SET(IP_FILES
src/OverlayFactory.h
src/OverlayFactory.cpp
src/TestSignalGeneratorSrc.cpp
src/AudioToTextXForm.cpp
)




SET(IP_FILES_H
include/HistogramOverlay.h
include/CalcHistogramCV.h
@@ -306,10 +307,9 @@ SET(IP_FILES_H
include/TextOverlayXForm.h
include/ColorConversionXForm.h
include/Overlay.h
include/AudioToTextXForm.h
)



SET(CUDA_CORE_FILES
src/apra_cudamalloc_allocator.cu
src/apra_cudamallochost_allocator.cu
@@ -561,6 +561,7 @@ SET(UT_FILES
test/mp4_dts_strategy_tests.cpp
test/overlaymodule_tests.cpp
test/testSignalGeneratorSrc_tests.cpp
test/audioToTextXform_tests.cpp
${ARM64_UT_FILES}
${CUDA_UT_FILES}
)
@@ -607,6 +608,7 @@ target_link_libraries(aprapipesut
liblzma::liblzma
bigint::bigint
sfml-audio
whisper::whisper
)

IF(ENABLE_WINDOWS)
4 changes: 4 additions & 0 deletions base/fix-vcpkg-json.ps1
@@ -8,6 +8,10 @@ if ($removeCUDA.IsPresent)
$v.dependencies |
Where-Object { $_.name -eq 'opencv4' } |
ForEach-Object { $_.features = $_.features -ne 'cuda' -ne 'cudnn' }

$v.dependencies |
Where-Object { $_.name -eq 'whisper' } |
ForEach-Object { $_.features = $_.features -ne 'cuda' }
}

if($removeOpenCV.IsPresent)
4 changes: 4 additions & 0 deletions base/fix-vcpkg-json.sh
@@ -21,6 +21,10 @@ if $removeCUDA; then
# Remove "cuda" and "cudnn" features for this "opencv4" instance
v=$(echo "$v" | jq ".dependencies[$index].features |= map(select(. != \"cuda\" and . != \"cudnn\"))")
fi
if [ "$name" == "whisper" ]; then
# Remove "cuda" features for this "whisper" instance
v=$(echo "$v" | jq ".dependencies[$index].features |= map(select(. != \"cuda\"))")
fi
done
fi

57 changes: 57 additions & 0 deletions base/include/AudioToTextXForm.h
@@ -0,0 +1,57 @@
#pragma once

#include "Module.h"

// size of audio to process should be a parameter.
// Cache variable to collect frames for processing

class AudioToTextXFormProps : public ModuleProps
{
public:
enum DecoderSamplingStrategy {
GREEDY,
BEAM_SEARCH
};

DecoderSamplingStrategy samplingStrategy;
std::string modelPath;
int bufferSize;

AudioToTextXFormProps(
DecoderSamplingStrategy _samplingStrategy,
std::string _modelPath,
int _bufferSize);
size_t getSerializeSize();


private:
friend class boost::serialization::access;

template <class Archive>
void serialize(Archive& ar, const unsigned int version);
};

class AudioToTextXForm : public Module
{

public:
AudioToTextXForm(AudioToTextXFormProps _props);
virtual ~AudioToTextXForm();
bool init();
bool term();
void setProps(AudioToTextXFormProps& props);
AudioToTextXFormProps getProps();

protected:
bool process(frame_container& frames);
bool processSOS(frame_sp& frame);
bool validateInputPins();
bool validateOutputPins();
void addInputPin(framemetadata_sp& metadata, string& pinId);
bool handlePropsChange(frame_sp& frame);

private:
void setMetadata(framemetadata_sp& metadata);
class Detail;
boost::shared_ptr<Detail> mDetail;
};
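
The header above only declares `DecoderSamplingStrategy`; the implementation file that consumes it is not shown here. As a rough illustration, the wrapper presumably has to translate that enum into whisper.cpp's `whisper_sampling_strategy` (`WHISPER_SAMPLING_GREEDY`, `WHISPER_SAMPLING_BEAM_SEARCH`) and reject anything else, which is what the commit "Throw AIP exception for unknown strategy" suggests. The sketch below is an assumption, not the PR's code: `WhisperStrategy` and `toWhisperStrategy` are hypothetical stand-ins so the snippet compiles without `whisper.h`, and `std::invalid_argument` stands in for the project's AIP exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for whisper.cpp's `whisper_sampling_strategy`, so the
// sketch builds without whisper.h.
enum class WhisperStrategy { Greedy, BeamSearch };

// Mirrors AudioToTextXFormProps::DecoderSamplingStrategy. The fixed underlying
// type is an addition of this sketch: it keeps out-of-range values (for example
// from deserialized props) well-defined.
enum DecoderSamplingStrategy : int { GREEDY, BEAM_SEARCH };

// Hypothetical helper: map the props enum to the decoder strategy and throw
// for any value that is not a known strategy.
WhisperStrategy toWhisperStrategy(DecoderSamplingStrategy strategy)
{
    switch (strategy)
    {
    case GREEDY:
        return WhisperStrategy::Greedy;
    case BEAM_SEARCH:
        return WhisperStrategy::BeamSearch;
    }
    // Stand-in for the AIP exception mentioned in the commit list.
    throw std::invalid_argument(
        "Unknown sampling strategy: " + std::to_string(static_cast<int>(strategy)));
}
```

Keeping the translation in one function means a runtime props change only has to call it again to pick up a new strategy.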
3 changes: 2 additions & 1 deletion base/include/FrameMetadata.h
@@ -50,7 +50,8 @@ class FrameMetadata {
HEVC_DATA, //H265
MOTION_VECTOR_DATA,
OVERLAY_INFO_IMAGE,
-FACE_LANDMARKS_INFO
+FACE_LANDMARKS_INFO,
+TEXT
};

enum MemType
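
With `TEXT` added to the frame types and a `bufferSize` property on `AudioToTextXFormProps`, the module appears to cache incoming audio until enough samples are available and then emit the transcription as a text frame (see the commits "Send whisper output as text frames" and "Add reserve statement for vector"). The following is a minimal sketch of that idea under those assumptions, not the PR's implementation: `SampleCache`, `OutputFrame`, the local `FrameType`, and the `Transcriber` callback are all hypothetical names, and the callback is a placeholder for the call into whisper.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for FrameMetadata::FrameType; only the values needed here.
enum class FrameType { AUDIO, TEXT };

// Hypothetical output record; the real module emits ApraPipes frames.
struct OutputFrame
{
    FrameType type;
    std::string payload;
};

// Hypothetical cache that collects PCM samples until bufferSize is reached.
class SampleCache
{
public:
    // Placeholder for the call into whisper.cpp.
    using Transcriber = std::function<std::string(const std::vector<float>&)>;

    SampleCache(std::size_t bufferSize, Transcriber transcribe)
        : mBufferSize(bufferSize), mTranscribe(std::move(transcribe))
    {
        assert(mBufferSize > 0);
        mSamples.reserve(mBufferSize); // avoid repeated reallocation while filling
    }

    // Appends samples. Returns true and fills `out` with a TEXT frame once at
    // least bufferSize samples are cached; otherwise returns false.
    bool push(const float* samples, std::size_t count, OutputFrame& out)
    {
        mSamples.insert(mSamples.end(), samples, samples + count);
        if (mSamples.size() < mBufferSize)
        {
            return false;
        }
        out = OutputFrame{FrameType::TEXT, mTranscribe(mSamples)};
        mSamples.clear(); // start collecting the next window
        return true;
    }

    std::size_t cached() const { return mSamples.size(); }

private:
    std::size_t mBufferSize;
    Transcriber mTranscribe;
    std::vector<float> mSamples;
};
```

Passing the transcriber as a callback keeps the buffering logic testable without a model file; a unit test can substitute a lambda that returns a fixed string.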