Kapre 0.3.1 (#86)
* add magphase

* add test for magphase

* add pad_begin

* add istft, composed - PR stft + istft, their value test, etc

* remove redundant audio loading in test. add phase test doc.

* bump version

* update docstrings, add constant values for data format strings

* update readme

* update readme

Co-authored-by: keunwoochoi <[email protected]>
keunwoochoi and keunwoochoi authored Aug 21, 2020
1 parent 8cdbb16 commit 29721cb
Showing 9 changed files with 741 additions and 169 deletions.
95 changes: 50 additions & 45 deletions README.md
@@ -1,45 +1,17 @@
# Kapre
Keras Audio Preprocessors.

"Why bother to save STFT/melspectrograms to your storage? Just do it on-the-fly on-GPU."

## News

* 15 Aug 2020
- 0.3.0
- Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed.

* 29 Jul 2020
- 0.2.0
- Change melspectrogram filterbank from `norm=1` to `norm='slaney'` (w.r.t. Librosa) due to the update from Librosa ([#77](https://github.com/keunwoochoi/kapre/issues/77)).
This would change the behavior of melspectrogram slightly.
- Bump librosa version to 0.7.2 or higher.
* 17 Mar 2020
- 0.1.8
- added `utils.Delta` layer
* 20 Feb 2020
- Kapre ver 0.1.7
- No vanilla Keras dependency
- Tensorflow >= 1.15 only
- Not tested on Python 2.7 anymore; only on Python 3.6 and 3.7 locally (by `tox`) and 3.6 on Travis

* 20 Feb 2019
- Kapre ver 0.1.4
- Fixed amplitude-to-decibel error as raised in [#46](https://github.com/keunwoochoi/kapre/issues/46)

* March 2018
- Kapre ver 0.1.3
- Kapre is on Pip again
- Add unit tests
- Remove `Datasets`
- Remove some codes while adding more dependency on Librosa to make it cleaner and more stable
- and therefore `htk` option enabled in `Melspectrogram`

* 9 July 2017
- Kapre ver 0.1.1, aka 'pretty stable' with a [benchmark paper](https://arxiv.org/abs/1706.05781)
- Remove STFT, python3 compatible
- A full documentation in this readme.md
- pip version is updated
Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.

Tested on Python 3.3, 3.6, and 3.7.

## Why?
- Kapre enables you to optimize DSP parameters and makes model deployment simpler with fewer dependencies.
- Kapre layers are consistent with 1D/2D tensorflow batch shapes.
- Kapre layers are compatible with `'channels_first'` and `'channels_last'`.
- Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) *trickier* than you think.
- Kapre layers have extended APIs from the default `tf.signal` implementation.
- Kapre provides a perfectly invertible `STFT` and `InverseSTFT` pair.
- You save the time of implementing and testing all of these yourself.
- Kapre is available on pip with versioning; hence you keep your code reproducible.

## Installation

@@ -51,9 +23,20 @@ pip install kapre
### Layers

Audio preprocessing layers
* `STFT`, `Magnitude`, `Phase`, `MagnitudeToDecibel`, `ApplyFilterbank`, `Delta` in [time_frequency.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/time_frequency.py)
* melspectrogram and log-frequency STFT are composed using time-frequency layers as in [composed.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/composed.py).
See `get_melspectrogram_layer` and `get_log_frequency_spectrogram_layer`.
* Basic layers in [time_frequency.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/time_frequency.py)
- `STFT`
- `Magnitude`
- `Phase`
- `MagnitudeToDecibel`
- `ApplyFilterbank`
- `Delta`
* Complicated layers are composed using time-frequency layers as in [composed.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/composed.py).
- `kapre.composed.get_perfectly_reconstructing_stft_istft()`
- `kapre.composed.get_stft_mag_phase()`
- `kapre.composed.get_melspectrogram_layer()`
- `kapre.composed.get_log_frequency_spectrogram_layer()`.

(Note: Official documentation is coming soon)

## One-shot example

@@ -113,4 +96,26 @@ Please cite this paper if you use Kapre for your work.
year={2017},
organization={ICML}
}
```
```

## News

* 15 Aug 2020
- 0.3.0
- Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed.

* 29 Jul 2020
- 0.2.0
- Change melspectrogram filterbank from `norm=1` to `norm='slaney'` (w.r.t. Librosa) due to the update from Librosa ([#77](https://github.com/keunwoochoi/kapre/issues/77)).
This would change the behavior of melspectrogram slightly.
- Bump librosa version to 0.7.2 or higher.
* 17 Mar 2020
- 0.1.8
- added `utils.Delta` layer
* 20 Feb 2020
- Kapre ver 0.1.7
- No vanilla Keras dependency
- Tensorflow >= 1.15 only
- Not tested on Python 2.7 anymore; only on Python 3.6 and 3.7 locally (by `tox`) and 3.6 on Travis

..and more at `news.md`.
2 changes: 1 addition & 1 deletion kapre/__init__.py
@@ -1,4 +1,4 @@
__version__ = '0.3.0'
__version__ = '0.3.1'
VERSION = __version__

from . import time_frequency
79 changes: 56 additions & 23 deletions kapre/backend.py
Original file line number Diff line number Diff line change
@@ -1,22 +1,55 @@
"""Backend operations of Kapre.
This module summarizes operations and functions that are used in Kapre layers.
Attributes:
CH_FIRST_STR (str): 'channels_first', a pre-defined string.
CH_LAST_STR (str): 'channels_last', a pre-defined string.
CH_DEFAULT_STR (str): 'default', a pre-defined string.
"""
from tensorflow.keras import backend as K
import tensorflow as tf
import numpy as np
import librosa

CH_FIRST_STR = 'channels_first'
CH_LAST_STR = 'channels_last'
CH_DEFAULT_STR = 'default'


def validate_data_format_str(data_format):
"""A function that validates the data format string."""
if data_format not in (CH_DEFAULT_STR, CH_FIRST_STR, CH_LAST_STR):
raise ValueError(
'data_format should be one of {}'.format(
str([CH_FIRST_STR, CH_LAST_STR, CH_DEFAULT_STR])
)
+ ' but we received {}'.format(data_format)
)
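
As a quick illustration of how this check behaves, the sketch below repeats the three constants and the validator from this diff so that it runs on its own. The misspelled `'channels_fist'` input is a made-up example, not something taken from the Kapre test suite.

```python
CH_FIRST_STR = 'channels_first'
CH_LAST_STR = 'channels_last'
CH_DEFAULT_STR = 'default'


def validate_data_format_str(data_format):
    """Copy of the validator shown in this diff, repeated so the sketch is self-contained."""
    if data_format not in (CH_DEFAULT_STR, CH_FIRST_STR, CH_LAST_STR):
        raise ValueError(
            'data_format should be one of {}'.format(
                str([CH_FIRST_STR, CH_LAST_STR, CH_DEFAULT_STR])
            )
            + ' but we received {}'.format(data_format)
        )


# The three supported strings pass silently (the function returns None).
accepted = []
for candidate in (CH_FIRST_STR, CH_LAST_STR, CH_DEFAULT_STR):
    validate_data_format_str(candidate)
    accepted.append(candidate)

# A misspelled value (hypothetical input) is rejected with a ValueError.
try:
    validate_data_format_str('channels_fist')
    message = None
except ValueError as err:
    message = str(err)
```

Using the module-level constants instead of repeating string literals means a typo in calling code surfaces as a `NameError` or a `ValueError` rather than silently selecting the wrong layout.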


def magnitude_to_decibel(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
"""
"""A function that converts magnitude to decibel scaling.
In essence, it runs `10 * log10(x)`, but with some other utility operations.
Similar to `librosa.amplitude_to_db` with `ref=1.0` and `top_db=dynamic_range`
Args:
x (tensor): float tensor. Can be batch or not. Something like magnitude of STFT.
ref_value (float): an input value that would become 0 dB in the result.
x (`Tensor`): float tensor. Can be batch or not. Something like magnitude of STFT.
ref_value (`float`): an input value that would become 0 dB in the result.
For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output be around zero
if the input audio was in [-1, 1].
amin (float): the noise floor of the input. An input smaller than `amin` is converted to `amin`.
dynamic_range (float): range of the resulting value. E.g., if the maximum magnitude is 30 dB,
amin (`float`): the noise floor of the input. An input smaller than `amin` is converted to `amin`.
dynamic_range (`float`): range of the resulting value. E.g., if the maximum magnitude is 30 dB,
the noise floor of the output would become (30 - dynamic_range) dB
Returns:
log_spec (`Tensor`): a decibel-scaled version of `x`.
Notes:
In many deep-learning-based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled
(=logarithmically mapped) for better performance.
"""

def _log10(x):
@@ -46,16 +79,16 @@ def filterbank_mel(
"""A wrapper for librosa.filters.mel that additionally does transpose and tensor conversion
Args:
sample_rate (int): sample rate of the input audio
n_freq (int): number of frequency bins in the input STFT magnitude.
n_mels (int): the number of mel bands
f_min (float): lowest frequency that is going to be included in the mel filterbank (Hertz)
f_max (float): highest frequency that is going to be included in the mel filterbank (Hertz)
sample_rate (`int`): sample rate of the input audio
n_freq (`int`): number of frequency bins in the input STFT magnitude.
n_mels (`int`): the number of mel bands
f_min (`float`): lowest frequency that is going to be included in the mel filterbank (Hertz)
f_max (`float`): highest frequency that is going to be included in the mel filterbank (Hertz)
htk (bool): whether to use `htk` formula or not
norm: The default, 'slaney', would normalize the mel weights by the width of the mel band.
Return:
Mel filterbank tensor. Shape=(n_freq, n_mels)
Returns:
(`Tensor`): mel filterbanks. Shape=`(n_freq, n_mels)`
"""
filterbank = librosa.filters.mel(
sr=sample_rate,
@@ -70,23 +103,23 @@ def filterbank_mel(


def filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125):
"""Approximate a constant-Q filter bank for a fixed-window STFT.
"""A function that returns an approximation of constant-Q filter banks for a fixed-window STFT.
Each filter is a log-normal window centered at the corresponding frequency.
Note: `logfrequency` in librosa 0.4 (deprecated), so copy-and-pasted,
`tuning` was removed, `n_freq` instead of `n_fft`.
Note:
The code was originally copied from `logfrequency` in librosa 0.4 (now deprecated).
The `tuning` parameter was removed, and `n_freq` is used instead of `n_fft`.
Args:
sample_rate (int): audio sampling rate
n_freq (int): number of the input frequency bins. E.g., `n_fft / 2 + 1`
n_bins (int): number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
bins_per_octave (int): number of bins per octave. Defaults to 12 (semitones).
f_min (float): lowest frequency that is going to be included in the log filterbank. Defaults to `C1 ~= 32.70`
spread (float): spread of each filter, as a fraction of a bin.
sample_rate (`int`): audio sampling rate
n_freq (`int`): number of the input frequency bins. E.g., `n_fft / 2 + 1`
n_bins (`int`): number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
bins_per_octave (`int`): number of bins per octave. Defaults to 12 (semitones).
f_min (`float`): lowest frequency that is going to be included in the log filterbank. Defaults to `C1 ~= 32.70`
spread (`float`): spread of each filter, as a fraction of a bin.
Returns:
log-frequency filterbank tensor. Shape=(n_freq, n_bins)
(`Tensor`): log-frequency filterbanks. Shape=`(n_freq, n_bins)`
"""

if f_min is None:
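
The decibel scaling described in the `magnitude_to_decibel` docstring earlier in this diff can be sketched in plain Python. The function body is not visible in this diff, so this is only an approximation built from the docstring: the name `magnitude_to_decibel_sketch` is hypothetical, it works on a flat list of floats rather than a (batched) tensor, and the order of the clamping steps is an assumption.

```python
import math


def magnitude_to_decibel_sketch(values, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Decibel-scale a flat list of non-negative magnitudes (illustrative only)."""
    # Inputs below the noise floor `amin` are raised to `amin` before the log.
    log_spec = [10.0 * math.log10(max(v, amin)) for v in values]

    # Shift so that an input equal to `ref_value` maps to 0 dB.
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [v - ref_db for v in log_spec]

    # Limit the output to `dynamic_range` dB below its own maximum.
    floor_db = max(log_spec) - dynamic_range
    return [max(v, floor_db) for v in log_spec]


# Default settings: the tiny third value is clamped to `amin` first.
default_out = magnitude_to_decibel_sketch([1.0, 0.1, 1e-12])

# A narrower dynamic range raises the output floor relative to the maximum.
narrow_out = magnitude_to_decibel_sketch([1.0, 0.1, 1e-12], dynamic_range=30.0)
```

In this sketch `amin` limits how small an input can get before the logarithm, while `dynamic_range` limits how far the output may fall below its own peak, so the two parameters act at different stages.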