ISPA: Inter-Species Phonetic Alphabet for Transcribing Animal Sounds

ISPA (Inter-Species Phonetic Alphabet) is a precise, concise, and interpretable system for transcribing animal sounds into text. It is inspired by the IPA (International Phonetic Alphabet), which is used to transcribe human speech sounds. See our paper for more details.

This repository contains the code and pretrained models for running the transcription.

Installation

Create and activate a conda environment named 'ispa' with Python 3.8:

conda create -n ispa python=3.8
conda activate ispa

Install the dependencies and ISPA itself (in editable mode) from the repository root:

pip install -r requirements.txt
pip install -e .

Download the pretrained AVES model:

If you are using the feature-based ISPA with AVES, download the AVES-bio model and its config (TorchAudio version) from the AVES repository into the models directory:

wget https://storage.googleapis.com/esp-public-files/ported_aves/aves-base-bio.torchaudio.pt -P models
wget https://storage.googleapis.com/esp-public-files/ported_aves/aves-base-bio.torchaudio.model_config.json -P models
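Before constructing the AVES-based predictor, it can help to confirm that both downloads landed where the usage example expects them. The following is a minimal sketch, not part of the ISPA package: the helper name `missing_aves_files` is hypothetical, and the file names are taken from the wget commands above.

```python
from pathlib import Path

# File names taken from the wget commands above.
AVES_FILES = (
    "aves-base-bio.torchaudio.pt",
    "aves-base-bio.torchaudio.model_config.json",
)


def missing_aves_files(models_dir="models"):
    """Return the expected AVES files that are absent from models_dir.

    Hypothetical helper for illustration; it is not provided by ISPA.
    """
    root = Path(models_dir)
    return [name for name in AVES_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_aves_files()
    if missing:
        print("Missing AVES files:", ", ".join(missing))
    else:
        print("AVES model and config found.")
```

An empty list means both files are present in the models directory.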

Usage

See sample.py for detailed usage examples. The snippet below transcribes a sample recording with each method:

from ispa import utils
from ispa.acoustics import run_inference as ispa_a_run_inference
from ispa.features import FeatureBasedISPAPredictor

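# Load the sample recording as a waveform together with its sampling rate
waveform, sr = utils.load_waveform('1-38560-A-14.wav')
# Transcribe with the acoustics-based method (ispa.acoustics)
ispa_results = ispa_a_run_inference(waveform, sr)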
print("ISP-A results:")
print(ispa_results['text'])
print()

# Feature-based transcription using MFCC features
ispa_f_predictor = FeatureBasedISPAPredictor(
    feature_type='mfcc',
    kmeans_model='models/kmeans.mfcc.pkl',
    phoneme_map='models/c2p.mfcc.json')
print("ISP-F results (with MFCC):")
print("(raw):", ispa_f_predictor.predict(waveform, variation='raw'))
print("(seg):", ispa_f_predictor.predict(waveform, variation='seg'))
print("(phn):", ispa_f_predictor.predict(waveform, variation='phn'))
print()

# Feature-based transcription using AVES features (needs the downloaded AVES model and config)
ispa_f_predictor = FeatureBasedISPAPredictor(
    feature_type='aves',
    kmeans_model='models/kmeans.aves.pkl',
    phoneme_map='models/c2p.aves.json',
    aves_config_path='models/aves-base-bio.torchaudio.model_config.json',
    aves_model_path='models/aves-base-bio.torchaudio.pt')
print("ISP-F results (with AVES):")
print("(raw):", ispa_f_predictor.predict(waveform, variation='raw'))
print("(seg):", ispa_f_predictor.predict(waveform, variation='seg'))
print("(phn):", ispa_f_predictor.predict(waveform, variation='phn'))
print()

This code will generate the following output:

ISP-A results:
N6/32= N5/2= N5/8+2 U5/16-2 N6/16-1 N6/4= U5/16= U4/4= U5/16-2 U5/2= U5/8-2 N6/16= N5/16-2 R/8 R/4 R/2 Rx2

ISP-F results (with MFCC):
(raw): 26 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 ... (omitted)
(seg): 26 44; 44.. 36; 15; 44; 44. 10.. 10. 10.. 10.
(phn): n o~; o~.. 4; t_d_h; o~; o~. t_j.. t_j. t_j.. t_j.

ISP-F results (with AVES):
(raw): 26 26 26 8 31 26 31 31 31 31 22 2 31 31 31 33 33 ... (omitted)
(seg): 26: 31; 33. 22.. 22; 31. 20. 40. 20.. 40. 13, 20, 20.
(phn): kp_}: q; f. t_h.. t_h; q. v. b. v.. b. n, v, v.
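For downstream processing, the ISP-A text can be split into per-token fields. The sketch below is based only on the shape of the sample output above (a capital letter, optional digits, an optional "/<digits>" part, and a trailing suffix such as "=", "+2", or "x2"). The field names `head`, `digits`, `denominator`, and `suffix` are hypothetical labels, and the sketch makes no claim about what each field means; see the paper for the actual definitions.

```python
import re

# Pattern inferred from the sample output only (an assumption, not a spec):
# one capital letter, optional digits, optional "/<digits>", then any suffix.
TOKEN_RE = re.compile(r"^([A-Z])(\d*)(?:/(\d+))?(.*)$")


def split_token(token):
    """Split one ISP-A token into its surface fields (hypothetical names)."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognized token: {token!r}")
    head, digits, denominator, suffix = match.groups()
    return {
        "head": head,
        "digits": digits,
        "denominator": denominator or "",
        "suffix": suffix,
    }


def split_transcription(text):
    """Split a whitespace-separated ISP-A string into a list of field dicts."""
    return [split_token(token) for token in text.split()]
```

For example, `split_transcription(ispa_results['text'])` would return one dictionary per token in the transcription.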
