Kapre 0.3.1 (#86)

* add magphase
* add test for magphase
* add pad_begin
* add istft, composed - PR stft + istft, their value test, etc
* remove redundant audio loading in test. add phase test doc.
* bump version
* update docstrings, add constant values for data format strings
* update readme
* update readme

Co-authored-by: keunwoochoi <[email protected]>
keunwoochoi · Aug 21, 2020 · 29721cb
1 parent 8cdbb16
commit 29721cb
Showing 9 changed files with 741 additions and 169 deletions.
diff --git a/README.md b/README.md
@@ -1,45 +1,17 @@
 # Kapre
-Keras Audio Preprocessors.
-
-"Why bother to save STFT/melspectrograms to your storage? Just do it on-the-fly on-GPU."
-
-## News
-
-* 15 Aug 2020
-  - 0.3.0
-    - Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed.
-
-* 29 Jul 2020
-  - 0.2.0
-    - Change melspectrogram filterbank from `norm=1` to `norm='slaney'` (w.r.t. Librosa) due to the update from Librosa ([#77](https://github.com/keunwoochoi/kapre/issues/77)). 
-    This would change the behavior of melspectrogram slightly.
-    - Bump librosa version to 0.7.2 or higher.
-* 17 Mar 2020
-  - 0.1.8
-    - added `utils.Delta` layer
-* 20 Feb 2020
-  - Kapre ver 0.1.7
-    - No vanilla Keras dependency
-    - Tensorflow >= 1.15 only
-    - Not tested on Python 2.7 anymore; only on Python 3.6 and 3.7 locally (by `tox`) and 3.6 on Travis 
-
-* 20 Feb 2019
-  - Kapre ver 0.1.4
-    - Fixed amplitude-to-decibel error as raised in [#46](https://github.com/keunwoochoi/kapre/issues/46)
-
-* March 2018
-  - Kapre ver 0.1.3
-    - Kapre is on Pip again
-    - Add unit tests
-    - Remove `Datasets`
-    - Remove some codes while adding more dependency on Librosa to make it cleaner and more stable
-      - and therefore `htk` option enabled in `Melspectrogram`
-
-* 9 July 2017
-  - Kapre ver 0.1.1, aka 'pretty stable' with a [benchmark paper](https://arxiv.org/abs/1706.05781)
-    - Remove STFT, python3 compatible
-    - A full documentation in this readme.md
-    - pip version is updated
+Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.
+
+Tested on Python 3.3, 3.6, and 3.7.
+
+## Why?
+- Kapre enables you to optimize DSP parameters and makes model deployment simpler with fewer dependencies.  
+- Kapre layers are consistent with 1D/2D tensorflow batch shapes.
+- Kapre layers are compatible with `'channels_first'` and `'channels_last'`.
+- Kapre layers are tested against Librosa (stft, decibel, etc), which is (trust me) *trickier* than you think.
+- Kapre layers have extended APIs from the default `tf.signal` implementation.
+- Kapre provides a perfectly invertible `STFT` and `InverseSTFT` pair.
+- You save the time of implementing and testing all of these.
+- Kapre is available on pip with versioning; hence you keep your code reproducible.   
 
 ## Installation
 
@@ -51,9 +23,20 @@ pip install kapre
 ### Layers
 
 Audio preprocessing layers
-* `STFT`, `Magnitude`, `Phase`, `MagnitudeToDecibel`, `ApplyFilterbank`, `Delta` in [time_frequency.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/time_frequency.py)
-* melspectrogram and log-frequency STFT are composed using time-frequency layers as in [composed.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/composed.py).
-See `get_melspectrogram_layer` and `get_log_frequency_spectrogram_layer`. 
+* Basic layers in [time_frequency.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/time_frequency.py)
+  - `STFT`
+  - `Magnitude`
+  - `Phase`
+  - `MagnitudeToDecibel`
+  - `ApplyFilterbank`
+  - `Delta` 
+* Composite layers are built from the time-frequency layers, as in [composed.py](https://github.com/keunwoochoi/kapre/blob/master/kapre/composed.py).
+  - `kapre.composed.get_perfectly_reconstructing_stft_istft()`
+  - `kapre.composed.get_stft_mag_phase()`
+  - `kapre.composed.get_melspectrogram_layer()`
+  - `kapre.composed.get_log_frequency_spectrogram_layer()`
+
+(Note: Official documentation is coming soon)
 
 ## One-shot example
 
@@ -113,4 +96,26 @@ Please cite this paper if you use Kapre for your work.
   year={2017},
   organization={ICML}
 }
-```
+```
+
+## News
+
+* 15 Aug 2020
+  - 0.3.0
+    - Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed.
+
+* 29 Jul 2020
+  - 0.2.0
+    - Change melspectrogram filterbank from `norm=1` to `norm='slaney'` (w.r.t. Librosa) due to the update from Librosa ([#77](https://github.com/keunwoochoi/kapre/issues/77)). 
+    This would change the behavior of melspectrogram slightly.
+    - Bump librosa version to 0.7.2 or higher.
+* 17 Mar 2020
+  - 0.1.8
+    - added `utils.Delta` layer
+* 20 Feb 2020
+  - Kapre ver 0.1.7
+    - No vanilla Keras dependency
+    - Tensorflow >= 1.15 only
+    - Not tested on Python 2.7 anymore; only on Python 3.6 and 3.7 locally (by `tox`) and 3.6 on Travis 
+
+...and more at `news.md`.
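
The README change above advertises a perfectly invertible `STFT` and `InverseSTFT` pair. As a rough illustration of what that property means, here is a minimal NumPy sketch of a windowed STFT and its overlap-add inverse. It is not Kapre's implementation and does not use Kapre's API; the function names `stft_sketch` and `istft_sketch`, the padding scheme, and all parameter values are assumptions made for this example.

```python
import numpy as np


def stft_sketch(x, n_fft=64, hop=16):
    """Toy STFT: pad, frame, apply a periodic Hann window, take the real FFT."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    pad = n_fft - hop  # pad so that edge samples are covered by several frames
    x = np.pad(x, (pad, pad))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)


def istft_sketch(spec, length, n_fft=64, hop=16):
    """Toy inverse STFT: windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    pad = n_fft - hop
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * window
    n_frames = frames.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    out = out / np.maximum(norm, 1e-12)  # guard against division by zero at the edges
    return out[pad : pad + length]  # drop the padding added in stft_sketch


rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
spec = stft_sketch(signal)
reconstructed = istft_sketch(spec, length=len(signal))
print(np.allclose(reconstructed, signal))
```

In this sketch the round trip is exact up to floating-point error because every kept sample is covered by at least one frame with a nonzero window weight, so dividing by the summed squared window undoes the analysis and synthesis windowing.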
diff --git a/kapre/__init__.py b/kapre/__init__.py
@@ -1,4 +1,4 @@
-__version__ = '0.3.0'
+__version__ = '0.3.1'
 VERSION = __version__
 
 from . import time_frequency

diff --git a/kapre/backend.py b/kapre/backend.py
@@ -1,22 +1,55 @@
+"""Backend operations of Kapre.
+
+This module summarizes operations and functions that are used in Kapre layers.
+
+Attributes:
+    CH_FIRST_STR (str): 'channels_first', a pre-defined string.
+    CH_LAST_STR (str): 'channels_last', a pre-defined string.
+    CH_DEFAULT_STR (str): 'default', a pre-defined string.
+
+"""
 from tensorflow.keras import backend as K
 import tensorflow as tf
 import numpy as np
 import librosa
 
+CH_FIRST_STR = 'channels_first'
+CH_LAST_STR = 'channels_last'
+CH_DEFAULT_STR = 'default'
+
+
+def validate_data_format_str(data_format):
+    """A function that validates the data format string."""
+    if data_format not in (CH_DEFAULT_STR, CH_FIRST_STR, CH_LAST_STR):
+        raise ValueError(
+            'data_format should be one of {}'.format(
+                str([CH_FIRST_STR, CH_LAST_STR, CH_DEFAULT_STR])
+            )
+            + ' but we received {}'.format(data_format)
+        )
+
 
 def magnitude_to_decibel(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
-    """
+    """A function that converts magnitude to decibel scaling.
+    In essence, it runs `10 * log10(x)`, but with some other utility operations.
+
     Similar to `librosa.amplitude_to_db` with `ref=1.0` and `top_db=dynamic_range`
 
     Args:
-        x (tensor): float tensor. Can be batch or not. Something like magnitude of STFT.
-        ref_value (float): an input value that would become 0 dB in the result.
+        x (`Tensor`): float tensor. Can be batch or not. Something like magnitude of STFT.
+        ref_value (`float`): an input value that would become 0 dB in the result.
             For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output be around zero
             if the input audio was in [-1, 1].
-        amin (float): the noise floor of the input. An input that is smaller than `amin`, it's converted to `amin.
-        dynamic_range (float): range of the resulting value. E.g., if the maximum magnitude is 30 dB,
+        amin (`float`): the noise floor of the input. An input that is smaller than `amin` is converted to `amin`.
+        dynamic_range (`float`): range of the resulting value. E.g., if the maximum magnitude is 30 dB,
             the noise floor of the output would become (30 - dynamic_range) dB
 
+    Returns:
+        log_spec (`Tensor`): a decibel-scaled version of `x`.
+
+    Notes:
+        In many deep learning based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled
+        (=logarithmically mapped) for better performance.
     """
 
     def _log10(x):
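
The docstring in this hunk describes `magnitude_to_decibel` in terms of `ref_value`, `amin`, and `dynamic_range`. The following NumPy sketch shows one way those three parameters could interact, based only on that description. The name `magnitude_to_decibel_sketch` is hypothetical, and the sketch clips against a single global maximum, which is a simplification and may differ from how the real function treats batched inputs.

```python
import numpy as np


def magnitude_to_decibel_sketch(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Toy decibel scaling that follows the docstring, not the TensorFlow code."""
    x = np.asarray(x, dtype=np.float64)
    # Values below the noise floor `amin` are raised to `amin` before the log.
    log_spec = 10.0 * np.log10(np.maximum(x, amin))
    # Shift so that an input equal to `ref_value` maps to 0 dB.
    log_spec = log_spec - 10.0 * np.log10(np.maximum(amin, ref_value))
    # Keep only `dynamic_range` dB below the maximum (assumption: one global maximum).
    return np.maximum(log_spec, log_spec.max() - dynamic_range)


example = magnitude_to_decibel_sketch([1.0, 0.1, 1e-12], dynamic_range=30.0)
print(example)
```

With these inputs, `1e-12` is first raised to `amin` (which corresponds to -50 dB) and then clipped to 30 dB below the maximum, so the three outputs are 0, -10, and -30 dB.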
@@ -46,16 +79,16 @@ def filterbank_mel(
     """A wrapper for librosa.filters.mel that additionally does transpose and tensor conversion
 
     Args:
-        sample_rate (int): sample rate of the input audio
-        n_freq (int): number of frequency bins in the input STFT magnitude.
-        n_mels (int): the number of mel bands
-        f_min (float): lowest frequency that is going to be included in the mel filterbank (Hertz)
-        f_max (float): highest frequency that is going to be included in the mel filterbank (Hertz)
+        sample_rate (`int`): sample rate of the input audio
+        n_freq (`int`): number of frequency bins in the input STFT magnitude.
+        n_mels (`int`): the number of mel bands
+        f_min (`float`): lowest frequency that is going to be included in the mel filterbank (Hertz)
+        f_max (`float`): highest frequency that is going to be included in the mel filterbank (Hertz)
         htk (bool): whether to use `htk` formula or not
         norm: The default, 'slaney', would normalize the mel weights by the width of the mel band.
 
-    Return:
-        Mel filterbank tensor. Shape=(n_freq, n_mels)
+    Returns:
+        (`Tensor`): mel filterbanks. Shape=`(n_freq, n_mels)`
     """
     filterbank = librosa.filters.mel(
         sr=sample_rate,
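
According to the docstring, `filterbank_mel` returns a filterbank of shape `(n_freq, n_mels)`, i.e. the transpose of the `(n_mels, n_freq)` matrix that `librosa.filters.mel` produces. The sketch below illustrates why that orientation is convenient: a magnitude array whose last axis is frequency can be right-multiplied by the filterbank directly. The triangular filters here are a toy construction on a linear frequency axis, not a real mel filterbank, and all names and sizes are assumptions for illustration.

```python
import numpy as np

n_freq, n_mels, n_time = 9, 3, 5

# Toy triangular filters on a linear axis (NOT a mel scale, illustration only).
edges = np.linspace(0, n_freq - 1, n_mels + 2)  # left edge, centers, right edge
bins = np.arange(n_freq)
filterbank = np.zeros((n_freq, n_mels))
for m in range(n_mels):
    left, center, right = edges[m], edges[m + 1], edges[m + 2]
    rising = (bins - left) / (center - left)
    falling = (right - bins) / (right - center)
    filterbank[:, m] = np.maximum(0.0, np.minimum(rising, falling))

# A flat magnitude spectrogram with shape (time, frequency).
magnitude = np.ones((n_time, n_freq))

# With the frequency axis last, applying the filterbank is a plain matrix product.
banded = magnitude @ filterbank  # shape: (n_time, n_mels)
print(banded.shape)
```

Had the filterbank been kept in the `(n_mels, n_freq)` orientation, the same operation would need an explicit transpose at every call site.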
@@ -70,23 +103,23 @@ def filterbank_mel(
 
 
 def filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125):
-    """Approximate a constant-Q filter bank for a fixed-window STFT.
-
+    """A function that returns an approximation of constant-Q filter banks for a fixed-window STFT.
     Each filter is a log-normal window centered at the corresponding frequency.
 
-    Note: `logfrequency` in librosa 0.4 (deprecated), so copy-and-pasted,
-        `tuning` was removed, `n_freq` instead of `n_fft`.
+    Note:
+        The code was copied from `logfrequency` in librosa 0.4 (deprecated).
+        The `tuning` parameter was removed, and `n_freq` is used instead of `n_fft`.
 
     Args:
-        sample_rate (int): audio sampling rate
-        n_freq (int): number of the input frequency bins. E.g., `n_fft / 2 + 1`
-        n_bins (int): number of the resulting log-frequency bins.  Defaults to 84 (7 octaves).
-        bins_per_octave (int): number of bins per octave. Defaults to 12 (semitones).
-        f_min (float): lowest frequency that is going to be included in the log filterbank. Defaults to `C1 ~= 32.70`
-        spread (float): spread of each filter, as a fraction of a bin.
+        sample_rate (`int`): audio sampling rate
+        n_freq (`int`): number of the input frequency bins. E.g., `n_fft / 2 + 1`
+        n_bins (`int`): number of the resulting log-frequency bins.  Defaults to 84 (7 octaves).
+        bins_per_octave (`int`): number of bins per octave. Defaults to 12 (semitones).
+        f_min (`float`): lowest frequency that is going to be included in the log filterbank. Defaults to `C1 ~= 32.70`
+        spread (`float`): spread of each filter, as a fraction of a bin.
 
     Returns:
-        log-frequency filterbank tensor. Shape=(n_freq, n_bins)
+        (`Tensor`): log-frequency filterbanks. Shape=`(n_freq, n_bins)`
     """
 
     if f_min is None: