
SoundNet-tensorflow

TensorFlow implementation of "SoundNet" that learns rich natural sound representations.

Code for the paper "SoundNet: Learning Sound Representations from Unlabeled Video" by Yusuf Aytar, Carl Vondrick, and Antonio Torralba, NIPS 2016.

(Figure from soundnet.)

Prerequisites

  • Linux
  • NVIDIA GPU + CUDA 8.0 + cuDNN v5.1
  • Python 2.7 with numpy
  • TensorFlow 0.12.1
  • librosa

Getting Started

  • Clone this repo:
git clone git@github.com:eborboihuc/SoundNet-tensorflow.git
cd SoundNet-tensorflow
  • Pretrained Model

I provide pre-trained models ported from soundnet. You can download the 8-layer model here. Place the downloaded model at ./models/sound8.npy in your repo folder.
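
For a quick sanity check after downloading, the parameter file can be inspected with numpy. A minimal sketch, assuming sound8.npy is a pickled dict of per-layer parameter arrays (as in the ported soundnet release):

from __future__ import print_function
import numpy as np

# Load the ported parameter file; encoding/allow_pickle keep this working on
# newer numpy and Python 3 as well (harmless on the Python 2.7 setup above).
params = np.load('./models/sound8.npy', encoding='latin1', allow_pickle=True).item()

# Print each layer name and the shapes of its parameter arrays.
for name, p in sorted(params.items()):
    if isinstance(p, dict):
        print(name, {k: np.shape(v) for k, v in p.items()})
    else:
        print(name, np.shape(p))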

  • NOTE

If you find that audio with a start-offset value in FFmpeg causes a large difference between torch audio and librosa, convert it with the following command:

sox {input.mp3} {output.mp3} trim 0

After this, the result might be much better.
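
To see what librosa will actually feed the network for a given clip (and to compare its length against the torch7 loader), here is a minimal sketch, assuming the 22050 Hz mono input used by this repo's loader:

from __future__ import print_function
import librosa

# Load the clip the same way the extraction scripts do: 22050 Hz, mono.
sound, sr = librosa.load('./data/demo.mp3', sr=22050, mono=True)
print('sample rate:', sr)
print('num samples:', len(sound))
print('duration (s):', len(sound) / float(sr))

# If this length differs noticeably from the torch7 audio loader's output,
# re-encode the file with the sox command above and check again.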

Demo

To extract multiple features from the pretrained model using a sound track loaded with torch lua audio (the sound track ./data/demo.npy is equivalent to the torch version):

python extract_feat.py -m {start layer number} -x {end layer number} -s

Or extract features from the raw audio files listed in demo.txt (the demo file is located at ./data/demo.mp3):

python extract_feat.py -m {start layer number} -x {end layer number} -s -t demo.txt
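
For reference, a minimal sketch of turning a raw mp3 into a SoundNet-style network input; the x256 rescaling and the (batch, length, 1, 1) shape are my assumptions based on the ported model, and the actual preprocessing used here lives in this repo's loader:

from __future__ import print_function
import librosa
import numpy as np

def load_soundnet_input(path, sr=22050):
    # librosa returns float32 samples in [-1, 1]; rescale to roughly [-256, 256].
    sound, _ = librosa.load(path, sr=sr, mono=True)
    sound = sound * 256.0
    # Shape as (batch, length, 1, 1) for the 1-D convolutions.
    return sound.reshape(1, -1, 1, 1).astype(np.float32)

demo = load_soundnet_input('./data/demo.mp3')
print('input shape:', demo.shape)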

Feature Extraction

To extract multiple features from the pretrained model on a downloaded mp3 dataset:

python extract_feat.py -t {dataset_txt_name} -m {start layer number} -x {end layer number} -s -p extract

e.g., to extract layer 4 through layer 17 and save the outputs as ./sound_out/tf_fea%02d.npy:

python extract_feat.py -o sound_out -m 4 -x 17 -s -p extract
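
Once extraction finishes, the saved features can be read back with numpy. A minimal sketch, assuming %02d in the file name is filled with the layer index (4 through 17 in the example above):

from __future__ import print_function
import numpy as np

# Load every layer's features written by the example command above.
features = {i: np.load('./sound_out/tf_fea%02d.npy' % i) for i in range(4, 18)}

for layer_idx in sorted(features):
    print('layer %02d shape:' % layer_idx, features[layer_idx].shape)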

More details are in:

python extract_feat.py -h

Finetuning

To train from an existing model:

python main.py 

Training

To train from scratch:

python main.py -p train

To extract features:

python main.py -p extract -m {start layer number} -x {end layer number} -s

More details are in:

python main.py -h

TODOs

  • Change audio loader to soundnet format
  • Fix conv8 padding issue in training phase
  • Change all config into tf.app.flags
  • Change dummy distribution of scene and object to useful placeholder
  • Add sound and feature loader from Data section

Known issues

  • Loaded audio length is not consistent between torch7 audio and librosa. Here is the issue.
  • Training with short audio clips will make conv8 complain that the output size would be negative.

FAQs

  • Why is my loaded sound wave different between torch7 audio and librosa? Here is my wiki.

Acknowledgments

Code is ported from soundnet, and the Torch7-to-TensorFlow loader is from tf_videogan. Thanks for their excellent work!

Author

Hou-Ning Hu / @eborboihuc
