Dolphins, Learned Transforms, Neural Architectures, and Cocktail Party Problem #21

pcbermant · 2020-08-27T01:37:25Z

pcbermant
Aug 27, 2020

Firstly, I experimented with the dolphin stranding inference runs, and given that we've been working on this problem for less than a week, I am cautiously optimistic. I'm also wondering whether we are planning to continue with this project?

I also continued to explore learned transforms (time-frequency representations). After trying a number of techniques to improve the results, I decided to look more deeply at the fundamentals of the neural network architectures used in acoustic ML problems. Over the past few weeks, I've been considering how to construct an optimal neural network model, and today, after reading some intro resources on the cocktail party problem (CPP), I experimented with a new type of model with surprisingly promising results: the simple CNN-based toy model achieved 96.9% accuracy on the macaque dataset, while the new model achieved 98.9%. I have a few ideas for modifying the architecture in the hope that it could help address a number of general bioacoustics research problems, in addition to the CPP-related problems of sequential and simultaneous integration (and segregation) of acoustic signals.

aza · 2020-08-28T04:36:08Z

aza
Aug 28, 2020
Maintainer

Interesting! What's the new model type (and what's the intuition behind why it is better)?

A couple thoughts:

On the dolphin stranding project: my hunch is that we should treat it like a side project that you work on when stuck on the main problem. One problem can often give both distance from and perspective on a second problem. This is a raw thought, though; I haven't thought through the implications. Curious what @bs, @radekosmulski, and @kzacarian think?
The ESP Library work that @oliver-adams-b is doing: adding the Egyptian Bat data from the Yossi Yovel data sets. We should make sure that it also acts like a new ESP-MNIST, as it seems we are starting to max out using macaques as a benchmark. We should discuss what kinds of tasks those would be on top of the bat data set.

2 replies

oliver-adams-b Aug 28, 2020

Would love a link to the model architecture you are working with @pcbermant! The bat data might be interesting to play with for the CPP, since the annotations contain labeled 'emitters' and 'addressees'. The size and quality of the bat data are also quite impressive, so there's a lot of potential for novel insights. For a first romp with the bat data, I plan on spinning up a resnet to predict the 'context' of a vocalization. I'll make a post about accessing the bat data soon instead of spelling it out here!

pcbermant Aug 28, 2020
Author

OK, that sounds great! As I've only been working on it for about a day, the model is super experimental (rough!) so I'm still modifying the topology. The departure from CNNs is based on a number of observations, including (1) the properties of spectrogram 'images' differ significantly from visual images (I have a notebook in the works addressing this), (2) the performance improvement when using ImageNet-pretrained models versus simple toy CNN models is marginal (or even nonexistent), and (3) training generative models by minimizing a perceptual loss (or perceptual + L2 loss) leads to better-looking spectrograms that ultimately yield a decreased accuracy in downstream tasks relative to training solely on L2 loss. I suppose I am wondering why we are using a machine learning model biologically inspired by the visual sensory system to carry out auditory processing tasks.

With that being said, it's completely hypothetical (so I could be totally off course!), and as @aza mentioned, it might be better to confront a more challenging task to investigate the performances of the various models. I'm very curious to see your work with the bat data!

pcbermant · 2020-08-31T22:37:13Z

pcbermant
Aug 31, 2020
Author

I continued to explore the CPP by compiling a list of resources and datasets, and I drafted an initial writeup describing the problem and the plan of attack. I’m not sure what the best way to share this document is since I can’t seem to find a way to upload it to the ‘Projects’ section on GitHub.

As I dove into the literature, I found an interesting paper that implemented wavelet scattering transforms, which yield translation-invariant time-frequency representations of input data. I experimented with this technique and ended up with ~99% accuracy on the macaque dataset using a CNN-based architecture. It might be worth adding this to the representation toolbox, especially since it seems that a CNN-based approach might be well-suited for this representation as opposed to a usual spectrogram representation.
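To make the translation-invariance point concrete, here is a minimal NumPy sketch of a first-order scattering-style transform: band-pass filtering, complex modulus, then low-pass averaging. It is not the implementation from the paper or from my experiments; the function names, the Gaussian filter shapes, the octave spacing, and the `lowpass_width` value are all assumptions chosen for illustration, and a real scattering transform would use proper wavelets, a second order, and subsampling.

```python
import numpy as np

def bandpass_bank(n, num_filters=6):
    """Gaussian band-pass filters defined in the Fourier domain.

    Illustrative assumption: centers one octave apart, bandwidth
    proportional to the center frequency.
    """
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    centers = 0.4 * 2.0 ** -np.arange(num_filters, dtype=float)
    widths = centers / 4.0
    return np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / widths[:, None]) ** 2)

def scattering_order1(x, num_filters=6, lowpass_width=0.005):
    """First-order coefficients: low-pass average of |x * psi_j| per channel."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    freqs = np.fft.fftfreq(n)
    psi = bandpass_bank(n, num_filters)
    phi = np.exp(-0.5 * (freqs / lowpass_width) ** 2)  # Gaussian low-pass
    # Band-pass each channel (circular convolution via FFT), then take the modulus.
    u = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * psi, axis=1))
    # Averaging the envelopes is what makes the output stable to small shifts.
    return np.fft.ifft(np.fft.fft(u, axis=1) * phi[None, :], axis=1).real

def relative_change(a, b):
    """Norm of the difference relative to the norm of the first argument."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

# Toy signal: a windowed tone, and the same signal shifted by 8 samples.
t = np.arange(1024)
x = np.sin(2 * np.pi * 0.1 * t) * np.exp(-0.5 * ((t - 512) / 100.0) ** 2)
x_shifted = np.roll(x, 8)

raw_change = relative_change(x, x_shifted)
scat_change = relative_change(scattering_order1(x), scattering_order1(x_shifted))
print(f"raw: {raw_change:.3f}, scattering: {scat_change:.3f}")
```

The shift changes the raw waveform a lot (the tone moves by most of a cycle), while the averaged envelopes barely move. The invariance is only approximate and only holds for shifts that are small relative to the averaging window.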

pcbermant · 2020-09-02T22:33:17Z

pcbermant
Sep 2, 2020
Author

I dove deeper into the literature (which is very extensive!!) with a focus on both biology- and computer science-related publications. I think this is an interesting approach, since a number of papers have explored the possibility of constructing biology/physiology-inspired computational models to address the CPP. Thus, it seems to make sense to understand both the biology and the computer science of the problem at a fundamental level. As Britt suggested during our call with respect to the timeline of research, I'm thinking it might be worthwhile to do some sort of visualization to show the existing body of research. However, before getting to this, I think it might be reasonable to continue parsing the literature for inspiration and to select a handful of papers to explore more deeply - I'm planning to add something like a bookshelf to the CPP project/repo.

As I touched on during our call, I've formulated a draft roadmap for addressing the CPP, so I'll make sure to push this to the appropriate repo before our next sync!

pcbermant · 2020-09-09T23:35:15Z

pcbermant
Sep 9, 2020
Author

For the CPP project, I expanded the bookshelf with a UMAP visualization of publication abstract embeddings (which isn't too insightful), and I added a list of the particularly relevant resources and concepts. I started to explore the Asteroid package, but I wanted to confirm with everyone that we agree that this is a reasonable/appropriate first step in accordance with our roadmap.
I worked on the scattering wavelet transform representation module for the representation toolkit - it's pretty much ready to be added to the toolkit, but I think it might be useful to go over some of the ideas during our AI sync.
I explored the humpback 'solo caller' data, and it seems to be relatively limited. I'm happy to discuss more thoroughly, but I'm not too sure what we can do in terms of ML-based analysis with a dataset with five or six recordings ranging from ~5 seconds to just over a minute.
I explored some other techniques for analyzing our datasets. For instance, I worked on a VAE-based anomaly detector with some promising results. I included a new approach for normalizing inputs which I'm tempted to continue to investigate.
Lastly, I continued to look into neuroscience/bio resources in order to improve the cochlea-based NN architecture.
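
On the VAE-based anomaly detector mentioned above: the scoring step can be sketched roughly as below. The thread doesn't say how scores were computed, so this assumes a common choice, the per-example negative ELBO with a unit-variance Gaussian decoder and a standard-normal prior. The function names, the percentile threshold, and the idea that the encoder/decoder outputs are already available as arrays are all hypothetical; the input normalization approach is not covered here.

```python
import numpy as np

def vae_anomaly_score(x, x_recon, mu, log_var):
    """Per-example negative ELBO, up to an additive constant.

    Assumptions (not from the thread): unit-variance Gaussian decoder,
    diagonal-Gaussian encoder q(z|x) = N(mu, exp(log_var)), prior N(0, I).
    All arrays have shape (batch, features) or (batch, latent_dim).
    """
    # Reconstruction term: squared error summed over input features.
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL divergence between q(z|x) and the standard-normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return recon + kl

def flag_anomalies(scores, reference_scores, percentile=99.0):
    """Flag scores above a percentile of scores from 'normal' reference data."""
    threshold = np.percentile(reference_scores, percentile)
    return scores > threshold
```

A higher score means the model reconstructs the example poorly or has to move its latent code far from the prior, which is the usual rationale for treating it as anomalous. The threshold would need tuning on held-out data.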

pcbermant · 2020-09-15T00:34:41Z

pcbermant
Sep 15, 2020
Author

I've started to explore the Asteroid package. It's actually super cool and very comprehensive - the only thing is the recipes take a very long time to run, but I think that understanding the implementation of the techniques included will provide a ton of insight into how best to address the CPP in bioacoustics applications.
