From 2ed6b07ed76e6eafb2ddc294c47446e7660f6b0e Mon Sep 17 00:00:00 2001
From: Alain Riou <alain.riou14000@yahoo.com>
Date: Tue, 3 Oct 2023 17:06:18 +0200
Subject: [PATCH 1/4] update readme

---
 README.md | 51 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 46 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index 68a6605..f784f9c 100644
--- a/README.md
+++ b/README.md
@@ -142,19 +142,60 @@ for x, sr in ...:
 Note that when passing a list of files to `pesto.predict_from_files(...)` or the CLI directly, the model  is loaded only
 once so you don't have to bother with that in general.
 
-## Benchmark
+## Performance
 
 On [MIR-1K]() and [MDB-stem-synth](), PESTO outperforms other self-supervised baselines.
-Its performances are close to CREPE's ones, that has 800x more parameters and was trained in a supervised way on a huge dataset containing MIR-1K and MDB-stem-synth, among others.
+Its performances are close to CREPE's ones, that has 800x more parameters and was trained in a supervised way on a huge 
+dataset containing MIR-1K and MDB-stem-synth, among others.
 
 <p align="center">
   <img width="360" src="https://github.com/SonyCSLParis/pesto/blob/master/images/results.png?raw=true">
 </p>
 
-## Speed
+## Speed benchmark
 
-TODO
+PESTO is a very lightweight model, and is therefore very fast at inference time.
+As CQT frames are processed independently, the actual speed of the pitch estimation process mostly depends on the 
+granularity of the predictions, that can be controlled with the `--step_size` parameter (10ms by default).
+
+Here is a comparison speed between CREPE and PESTO, averaged over 10 runs on the same machine.
+
+- Audio file: `wav` format, 2m51s
+- Hardware: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 8 cores
+
+Note that the *y*-axis is in log-scale: with a step size of 10ms (the default),
+PESTO performs pitch estimation of the file in 13 seconds (~12 times faster than real-time), while CREPE takes 12 minutes!
+It is therefore more suited to applications that need very fast pitch estimation without relying on GPU resources.
 
 ## Cite
 
-If you want to cite this work, 
\ No newline at end of file
+If you want to use this work, please cite:
+```
+TODO
+```
+
+## Credits
+
+- [multipitch-architectures](https://github.com/christofw/multipitch_architectures) for the original architecture of the model
+- [nnAudio](https://github.com/KinWaiCheuk/nnAudio) for the original CQT implementation
+
+```
+@ARTICLE{9174990,
+    author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
+    journal={IEEE Access}, 
+    title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks}, 
+    year={2020},
+    volume={8},
+    number={},
+    pages={161981-162003},
+    doi={10.1109/ACCESS.2020.3019084}}
+@ARTICLE{9174990,
+    author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
+    journal={IEEE Access}, 
+    title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks}, 
+    year={2020},
+    volume={8},
+    number={},
+    pages={161981-162003},
+    doi={10.1109/ACCESS.2020.3019084}}
+```
\ No newline at end of file

From 80207eec8364b1763c103ccf5e13de1608c28627 Mon Sep 17 00:00:00 2001
From: Alain Riou <alain.riou14000@yahoo.com>
Date: Tue, 3 Oct 2023 17:15:42 +0200
Subject: [PATCH 2/4] update readme

---
 README.md | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index f784f9c..cf4a42f 100644
--- a/README.md
+++ b/README.md
@@ -2,26 +2,25 @@
 
 **tl;dr**: Fast and powerful pitch estimator based on machine learning
 
+This repository contains the implementation of the [PESTO paper](https://arxiv.org/abs/2309.02265),
+which has been accepted at [ISMIR 2023](https://ismir2023.ismir.net/).
+
 **Disclaimer:** This repository contains minimal code and should be used for inference only.
 If you want full implementation details or want to use PESTO for research purposes, take a look at ~~[this repository](https://github.com/aRI0U/pesto-full)~~ (work in progress).
 
-
 ## Installation
 
 ```shell
 pip install pesto
 ```
-
-### Common issues
-
-- When 
+That's it!
 
 ### Dependencies
 
 This repository is implemented in [PyTorch](https://pytorch.org/) and has the following additional dependencies:
-- `matplotlib` and `numpy` for basic I/O and plotting operations
+- `numpy` for basic I/O operations
 - [torchaudio](https://pytorch.org/audio/stable/) for audio loading
-- [nnAudio](https://github.com/KinWaiCheuk/nnAudio) for computing the Constant-Q Transform (CQT)
+- `matplotlib` for exporting pitch predictions as images (optional)
 
 ## Usage
 
@@ -171,13 +170,19 @@ It is therefore more suited to applications that need very fast pitch estimation
 
 If you want to use this work, please cite:
 ```
-TODO
+@inproceedings{PESTO,
+    author = {Riou, Alain and Lattner, Stefan and Hadjeres, Gaëtan and Peeters, Geoffroy},
+    booktitle = {Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023},
+    publisher = {International Society for Music Information Retrieval},
+    title = {PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective},
+    year = {2023}
+}
 ```
 
 ## Credits
 
-- [multipitch-architectures](https://github.com/christofw/multipitch_architectures) for the original architecture of the model
 - [nnAudio](https://github.com/KinWaiCheuk/nnAudio) for the original CQT implementation
+- [multipitch-architectures](https://github.com/christofw/multipitch_architectures) for the original architecture of the model
 
 ```
 @ARTICLE{9174990,
@@ -189,13 +194,13 @@ TODO
     number={},
     pages={161981-162003},
     doi={10.1109/ACCESS.2020.3019084}}
-@ARTICLE{9174990,
-    author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
-    journal={IEEE Access}, 
-    title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks}, 
-    year={2020},
-    volume={8},
+@ARTICLE{9865174,
+    author={Weiß, Christof and Peeters, Geoffroy},
+    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
+    title={Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings}, 
+    year={2022},
+    volume={30},
     number={},
-    pages={161981-162003},
-    doi={10.1109/ACCESS.2020.3019084}}
+    pages={2814-2827},
+    doi={10.1109/TASLP.2022.3200547}}
 ```
\ No newline at end of file

From 3540b45d69144a1d76a10bfdd23ae185deca8ad7 Mon Sep 17 00:00:00 2001
From: Alain Riou <36546630+aRI0U@users.noreply.github.com>
Date: Tue, 3 Oct 2023 17:59:26 +0200
Subject: [PATCH 3/4] add images to README.md

---
 README.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index cf4a42f..4fb72fa 100644
--- a/README.md
+++ b/README.md
@@ -58,9 +58,8 @@ This structure is voluntarily the same as in [CREPE](https://github.com/marl/cre
 Alternatively, one can choose to save timesteps, pitch, confidence and activation outputs as a `.npz` file.
 
 Finally, you can also visualize the pitch predictions by exporting them as a `png` file. Here is an example:
-<p align="center">
-  <img src="https://github.com/SonyCSLParis/pesto/blob/master/examples/example.f0.png?raw=true">
-</p>
+![example f0](https://github.com/SonyCSLParis/pesto/assets/36546630/2ad82c86-136a-4125-bf47-ea1b93408022)
+
 Multiple formats can be specified after the `-e` option.
 
 #### Batch processing
@@ -82,9 +81,7 @@ Additionally, audio files can have any sampling rate, no resampling is required.
 By default, the model returns a probability distribution over all pitch bins.
 To convert it to a proper pitch, by default we use Argmax-Local Weighted Averaging as in CREPE:
 
-<p align="center">
-  <img width="360" src="https://github.com/SonyCSLParis/pesto/blob/master/images/alwa.png?raw=true">
-</p>
+![image](https://github.com/SonyCSLParis/pesto/assets/36546630/38a6f405-f591-4960-81d3-6fcc551d91e8)
 
 Alternatively, one can use basic argmax of weighted average with option `-r`/`--reduction`.
 
@@ -147,9 +144,8 @@ On [MIR-1K]() and [MDB-stem-synth](), PESTO outperforms other self-supervised ba
 Its performances are close to CREPE's ones, that has 800x more parameters and was trained in a supervised way on a huge 
 dataset containing MIR-1K and MDB-stem-synth, among others.
 
-<p align="center">
-  <img width="360" src="https://github.com/SonyCSLParis/pesto/blob/master/images/results.png?raw=true">
-</p>
+![results](https://github.com/SonyCSLParis/pesto/assets/36546630/2fd0e46a-f9ac-4a7e-beb7-95b6f8f030fb)
+
 
 ## Speed benchmark
 
@@ -158,6 +154,7 @@ As CQT frames are processed independently, the actual speed of the pitch estimat
 granularity of the predictions, that can be controlled with the `--step_size` parameter (10ms by default).
 
 Here is a comparison speed between CREPE and PESTO, averaged over 10 runs on the same machine.
+![speed](https://github.com/SonyCSLParis/pesto/assets/36546630/c5ca72be-1c8a-4cbd-bc96-80fbe0d1096f)
 
 - Audio file: `wav` format, 2m51s
 - Hardware: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 8 cores
@@ -203,4 +200,4 @@ If you want to use this work, please cite:
     number={},
     pages={2814-2827},
     doi={10.1109/TASLP.2022.3200547}}
-```
\ No newline at end of file
+```

From e36cce3baab0d1a955d0e4151eedff6276a0ca9b Mon Sep 17 00:00:00 2001
From: Alain Riou <36546630+aRI0U@users.noreply.github.com>
Date: Tue, 3 Oct 2023 18:01:10 +0200
Subject: [PATCH 4/4] Update README.md

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4fb72fa..31f04b0 100644
--- a/README.md
+++ b/README.md
@@ -80,8 +80,7 @@ Additionally, audio files can have any sampling rate, no resampling is required.
 
 By default, the model returns a probability distribution over all pitch bins.
 To convert it to a proper pitch, by default we use Argmax-Local Weighted Averaging as in CREPE:
-
-![image](https://github.com/SonyCSLParis/pesto/assets/36546630/38a6f405-f591-4960-81d3-6fcc551d91e8)
+![alwa](https://github.com/SonyCSLParis/pesto/assets/36546630/7d06bf85-585c-401f-a3c2-f2fab90dd1a7)
 
 Alternatively, one can use basic argmax of weighted average with option `-r`/`--reduction`.