Replies: 9 comments 4 replies
-
I created a complete pipeline to encode inputs into a latent representation using a GON-based approach and proceeded to apply this method to both the sperm whale and the macaque datasets. It seems to work well for low-resolution input spectrograms, but as soon as I increase the resolution beyond some threshold, I run into memory allocation errors. I'll definitely have to look into this more deeply to diagnose the issue. With that being said, it seems like this method is off to a good start!
-
I've been working on the GON-based technique, and so far I've implemented it in both PyTorch and TensorFlow using MNIST (as a proof of concept), sperm whales (for simplicity), and macaques. The procedure and pipeline are similar to the GAIA approach, but it seems like an interesting method for constructing a latent representation. Right now, I'm training the model on MNIST data, and then I'll extract the "encoder" (which essentially amounts to one gradient descent step, starting from a latent vector initialized at the origin). I originally completed this entire task/pipeline in TensorFlow (in order to circumvent the memory errors I was experiencing in PyTorch), but the results are looking better in PyTorch for some reason. The only catch is that training takes a while, so it's still running now. After MNIST, I'll hopefully apply this completed PyTorch framework to the macaque dataset.
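For anyone curious, a minimal sketch of that single-gradient-step "encoder" looks roughly like this in PyTorch. The names (`decoder`, `latent_dim`, the batch `x`) are placeholders rather than the toolbox's actual API, and this is the generic GON idea, not necessarily my exact implementation:

```python
import torch
import torch.nn.functional as F

def gon_step(decoder, x, latent_dim):
    # 1) Initialise the latent at the origin and track gradients w.r.t. it.
    z0 = torch.zeros(x.size(0), latent_dim, requires_grad=True, device=x.device)
    inner_loss = F.mse_loss(decoder(z0), x)

    # 2) One gradient step away from the origin gives the "encoded" latent.
    grad = torch.autograd.grad(inner_loss, z0, create_graph=True)[0]
    z = -grad

    # 3) Decode the inferred latent; the decoder parameters are trained on
    #    this outer reconstruction loss.
    outer_loss = F.mse_loss(decoder(z), x)
    return z, outer_loss
```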
-
The experimental (not in the GitHub repo) GON notebook is essentially complete, so I'll plan to write it up as another module in the toolbox. I've also included several metrics to try to quantify the goodness of our representations. Finally, here is the ALBERT paper that I mentioned: https://arxiv.org/pdf/1909.11942.pdf. Looks like it might be super relevant, so this could be a good next step/problem to tackle.
-
I finalized the GON module and added it to the repo. As I mentioned in the previous post, I added a few metrics in an attempt to assess the various representations based on the quality of clustering in the respective feature spaces. It might be worth exploring these more thoroughly, but in the meantime, it seems like the ALBERT approach might be a useful tool to investigate.
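To give a concrete idea of what I mean by clustering-quality metrics, here is a rough sketch using scikit-learn. The `latents` matrix and `labels` array are placeholders for whatever representation and ground-truth classes are being evaluated, and these two scores are examples rather than the exact metrics in the module:

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

def clustering_quality(latents: np.ndarray, labels: np.ndarray) -> dict:
    # Higher silhouette (closer to 1) and lower Davies-Bouldin indicate
    # tighter, better-separated clusters in the feature space.
    return {
        "silhouette": silhouette_score(latents, labels),
        "davies_bouldin": davies_bouldin_score(latents, labels),
    }
```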
-
I've started to explore the S3PRL toolkit and to look into integrating it into our framework. I also found this paper (https://link.springer.com/article/10.1007/s00521-018-3626-7) on wavelet transforms, so I worked on developing the CWT module in the toolbox. The paper mentions that this representation might lead to improved results when generalizing to other datasets not used during model training, so I figured it might be worthwhile to dive into this approach. Right now, I've applied this technique to Watkins sperm whale data as well as macaque data, and I've trained a baseline classifier model on the macaque identification task, yielding a validation accuracy of ~91%. There seem to be quite a few options for constructing the representation, so I can continue investigating some of these choices.
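As a rough sketch of the CWT-based representation, something like the following works with PyWavelets; the scale range and Morlet wavelet here are illustrative choices rather than the module's exact settings, and `audio`/`sr` are assumed to be a 1-D waveform and its sample rate:

```python
import numpy as np
import pywt

def cwt_scalogram(audio: np.ndarray, sr: int, num_scales: int = 64):
    scales = np.arange(1, num_scales + 1)
    coeffs, freqs = pywt.cwt(audio, scales, "morl", sampling_period=1.0 / sr)
    # Log-magnitude scalogram, shaped (num_scales, num_samples), which can be
    # fed to the classifier just like a spectrogram image.
    return np.log1p(np.abs(coeffs)), freqs
```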
-
I experimented by applying the various approaches in the toolkit to the sperm whale click dataset. I started with the conventional spectrogram approach and then implemented the HHT and TFR approaches, which ended up yielding an accuracy increase from 92% to 99%. I also worked on modifying the representation goodness metric, which is leading to promising results. I look forward to discussing these findings tomorrow and sharing the resulting images (since I'm not sure about posting them here). It's pretty awesome because this technique allows for super fast prototyping: I can go from a spectrogram-based model to an HHT-based model by changing one line of code!
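For reference, the HHT representation boils down to something like the sketch below, assuming PyEMD for the empirical mode decomposition and SciPy's Hilbert transform; how the per-IMF envelopes and instantaneous frequencies get binned into an image for the classifier is simplified away here:

```python
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert

def hht_features(audio: np.ndarray, sr: int):
    imfs = EMD().emd(audio)                  # intrinsic mode functions, (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=-1)        # analytic signal per IMF
    amplitude = np.abs(analytic)             # instantaneous amplitude (envelope)
    phase = np.unwrap(np.angle(analytic), axis=-1)
    inst_freq = np.diff(phase, axis=-1) * sr / (2.0 * np.pi)
    return amplitude, inst_freq
```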
-
I've been applying the above techniques to the macaque dataset in an effort to explore the effect of representation on classification performance. It's really great because the framework is super modular, so it's relatively easy to switch between datasets, representations, architectures, etc. I ran into a minor issue with the GCP VM instance I had created, but I ended up using the AI Platform notebooks, which was very straightforward. Other than that, I've also been looking into other approaches for visualizing activations and feature maps and constructing custom training loops to gain insight into how these networks are using the various representations to carry out the classification tasks.
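One simple way to peek at intermediate feature maps in PyTorch is a forward hook, sketched below; `model` and the layer name `conv1` are placeholders for whichever classifier/representation pairing is being inspected:

```python
import torch

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Stash the layer's output so it can be plotted after a forward pass.
        activations[name] = output.detach().cpu()
    return hook

# e.g. model.conv1.register_forward_hook(save_activation("conv1"))
# After running a batch through the model, activations["conv1"] holds the
# feature maps, which can be visualized channel-by-channel alongside the input.
```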
-
I was able to diagnose and resolve the issue with the macaque HHT representation. At first, I tried bumping up the GCP compute to no avail, so I dove into the custom HHT architecture, and I found that for certain samples, the method was getting stuck while computing the EMD (using PyEMD). There are a couple of other packages that offer EMD methods, but I just used a timeout decorator to handle the samples that would get stuck. With that being said, it didn't seem as though the HHT representation was as successful for the tonal macaque calls as it was for the broadband transient sperm whale clicks. However, I ended up using a spectrogram-based approach to yield a final error of 0.00134. Also, as per our last conversation, I ended up training a network to "compute" a spectrogram from raw audio. This involved (1) a decoder-like head to output a spectrogram-like image from a raw audio input and (2) a U-Net to output a TFR spectrogram-like image. I then added a couple of conv layers and dense layers to construct a CNN-based classifier, which yielded a final accuracy of ~96%. This was a pretty neat approach, and I think there is a ton of room for experimentation and improvement!
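The timeout guard was along these lines: a SIGALRM-based decorator, so it only works on Unix in the main thread, and the `max_seconds` value and skip-the-sample fallback are illustrative rather than the exact settings I used:

```python
import signal
from functools import wraps

def with_timeout(max_seconds: int):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            def _raise_timeout(signum, frame):
                raise TimeoutError(f"{fn.__name__} exceeded {max_seconds}s")
            signal.signal(signal.SIGALRM, _raise_timeout)
            signal.alarm(max_seconds)
            try:
                return fn(*args, **kwargs)
            except TimeoutError:
                return None  # skip samples where the EMD gets stuck
            finally:
                signal.alarm(0)
        return wrapper
    return decorator

# Usage sketch:
# @with_timeout(10)
# def compute_emd(audio): ...
```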
-
I revised the model constructed to learn how to generate spectrograms by editing the architecture and experimenting with some custom losses. For instance, I tried a standard L2 reconstruction loss as well as an ImageNet/VGG-based perceptual loss. Interestingly, though the "spectrograms" generated using the perceptual loss seemed to look better, they ended up performing worse in the downstream whale ID classification task. Ultimately, the optimal model involved training both the decoder-like spectrogram generator and the U-Net TFR generator with only the L2 loss - this approach yielded a final holdout test accuracy of 97.2%, which was worse than the final accuracy using TFR spectrograms as inputs but better than with the original spectrograms. Other than this, I worked on the dolphin dataset, and I've managed to decipher the annotations and extract whistles from the recordings.
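The perceptual loss was in the spirit of the sketch below, assuming a recent torchvision; the choice of feature layer (up to index 8 of `vgg16.features`), the 1-to-3 channel repeat for single-channel spectrograms, and the omission of ImageNet normalization are all illustrative simplifications:

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer_index: int = 8):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.features = vgg[: layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # frozen feature extractor

    def forward(self, pred, target):
        # Spectrogram "images" are single-channel, so repeat to 3 channels
        # before passing through the ImageNet-pretrained VGG features.
        pred3, target3 = pred.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        return F.mse_loss(self.features(pred3), self.features(target3))
```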
-
Today I added commentary to the deep-learned representation module in the toolbox. I really like this approach and think that it offers a number of advantages that we can explore more thoroughly in other applications. I then applied the techniques described in this section to the 'Best Of Cuts' in the Watkins database with limited success - with that being said, it might simply be due to an insufficient amount of training data in this relatively small portion of the total available Watkins database. To address this, I started investigating several methods for augmenting acoustic datasets - the Google AI Blog describes a few interesting ideas (https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html), but I also experimented with neural style transfer to apply a background noise "style" to the "content" spectrograms. I'm definitely looking forward to discovering better ways to augment small datasets.
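For the SpecAugment side of this, a minimal sketch with torchaudio's masking transforms looks like the following; the mask widths are arbitrary example values and time warping is omitted:

```python
import torch
import torchaudio.transforms as T

def spec_augment(spec: torch.Tensor) -> torch.Tensor:
    # spec: (channel, freq, time) log-mel or linear spectrogram.
    augment = torch.nn.Sequential(
        T.FrequencyMasking(freq_mask_param=15),  # mask a random band of frequency bins
        T.TimeMasking(time_mask_param=35),       # mask a random span of time frames
    )
    return augment(spec)
```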