
SoundNet-tensorflow

TensorFlow implementation of "SoundNet" that learns rich natural sound representations.

Code for the paper "SoundNet: Learning Sound Representations from Unlabeled Video" by Yusuf Aytar, Carl Vondrick, and Antonio Torralba, NIPS 2016.

(Figure from soundnet.)

Prerequisites

  • Linux
  • NVIDIA GPU + CUDA 8.0 + cuDNN v5.1
  • Python 2.7 with numpy
  • TensorFlow 0.12.1
  • librosa

Getting Started

  • Clone this repo:
git clone git@github.com:eborboihuc/SoundNet-tensorflow.git
cd SoundNet-tensorflow
  • Pretrained Model

I provide pre-trained models ported from soundnet. You can download the 8-layer model here. Place the downloaded model at ./models/sound8.npy in your repo folder.
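
For a quick sanity check after downloading, the parameter file can be inspected with numpy. A minimal sketch, assuming sound8.npy is a pickled dict of per-layer parameter arrays (as in the ported soundnet release):

from __future__ import print_function
import numpy as np

# Load the ported parameter file; encoding/allow_pickle keep this working on
# newer numpy and Python 3 as well (harmless on the Python 2.7 setup above).
params = np.load('./models/sound8.npy', encoding='latin1', allow_pickle=True).item()

# Print each layer name and the shapes of its parameter arrays.
for name, p in sorted(params.items()):
    if isinstance(p, dict):
        print(name, {k: np.shape(v) for k, v in p.items()})
    else:
        print(name, np.shape(p))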

  • NOTE

If you find that audio with a start-offset value in FFmpeg causes a large difference between torch audio and librosa, convert it with the following command:

sox {input.mp3} {output.mp3} trim 0

After this, the result might be much better.
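
To see what librosa will actually feed the network for a given clip (and to compare its length against the torch7 loader), here is a minimal sketch, assuming the 22050 Hz mono input used by this repo's loader:

from __future__ import print_function
import librosa

# Load the clip the same way the extraction scripts do: 22050 Hz, mono.
sound, sr = librosa.load('./data/demo.mp3', sr=22050, mono=True)
print('sample rate:', sr)
print('num samples:', len(sound))
print('duration (s):', len(sound) / float(sr))

# If this length differs noticeably from the torch7 audio loader's output,
# re-encode the file with the sox command above and check again.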

Demo

To extract multiple features from the pretrained model using a sound track loaded with torch lua audio (the sound track ./data/demo.npy is equivalent to the torch version):

python extract_feat.py -m {start layer number} -x {end layer number} -s

Or extract features from the raw audio files listed in demo.txt (the demo file is located at ./data/demo.mp3):

python extract_feat.py -m {start layer number} -x {end layer number} -s -t demo.txt
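
For reference, a minimal sketch of turning a raw mp3 into a SoundNet-style network input; the x256 rescaling and the (batch, length, 1, 1) shape are my assumptions based on the ported model, and the actual preprocessing used here lives in this repo's loader:

from __future__ import print_function
import librosa
import numpy as np

def load_soundnet_input(path, sr=22050):
    # librosa returns float32 samples in [-1, 1]; rescale to roughly [-256, 256].
    sound, _ = librosa.load(path, sr=sr, mono=True)
    sound = sound * 256.0
    # Shape as (batch, length, 1, 1) for the 1-D convolutions.
    return sound.reshape(1, -1, 1, 1).astype(np.float32)

demo = load_soundnet_input('./data/demo.mp3')
print('input shape:', demo.shape)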

Feature Extraction

To extract multiple features from the pretrained model on a downloaded mp3 dataset:

python extract_feat.py -t {dataset_txt_name} -m {start layer number} -x {end layer number} -s -p extract

e.g., to extract layer 4 through layer 17 and save the outputs as ./sound_out/tf_fea%02d.npy:

python extract_feat.py -o sound_out -m 4 -x 17 -s -p extract
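
Once extraction finishes, the saved features can be read back with numpy. A minimal sketch, assuming %02d in the file name is filled with the layer index (4 through 17 in the example above):

from __future__ import print_function
import numpy as np

# Load every layer's features written by the example command above.
features = {i: np.load('./sound_out/tf_fea%02d.npy' % i) for i in range(4, 18)}

for layer_idx in sorted(features):
    print('layer %02d shape:' % layer_idx, features[layer_idx].shape)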

More details are in:

python extract_feat.py -h

Finetuning

To train from an existing model:

python main.py 

Training

To train from scratch:

python main.py -p train

To extract features:

python main.py -p extract -m {start layer number} -x {end layer number} -s

More details are in:

python main.py -h

TODOs

  • Change audio loader to soundnet format
  • Fix conv8 padding issue in training phase
  • Change all config into tf.app.flags
  • Change dummy distribution of scene and object to useful placeholder
  • Add sound and feature loader from Data section

Known issues

  • Loaded audio length is not consistent between torch7 audio and librosa. Here is the issue.
  • Training with short audio clips will make conv8 complain that the output size would be negative.

FAQs

  • Why is my loaded sound wave different between torch7 audio and librosa? Here is my wiki.

Acknowledgments

Code is ported from soundnet, and the Torch7-to-TensorFlow loader is from tf_videogan. Thanks for their excellent work!

Author

Hou-Ning Hu / @eborboihuc
