dolphin non-stranding / pre-stranding whistles #16
Replies: 6 comments
-
Hey Radek! Thanks, this is an awesome update! As I understand it, the date is contained in the name of the recording - for instance, .....170114....wav corresponds to the date 01_14_17. The next numbers correspond to the time (hhmmss), if I recall correctly. Lastly, I'm pretty sure that the time stamps ('Begin Time' and 'End Time') in the annotations are the number of seconds counted from the first recording of the day. Does this seem to make sense? I'm still not 100% sure this is the right approach, but I was able to find whistles in the recordings using this method.
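If that is right, a filename can be turned into a start datetime along these lines (a minimal sketch, assuming the `<identifier>.<YYMMDD><hhmmss>.wav` convention described above; `805339167.170114002319.wav` is an example filename from the dataset):

```python
from datetime import datetime
from pathlib import Path

def parse_recording_start(path):
    """Split a filename like 805339167.170114002319.wav into its identifier
    and start datetime, assuming the <identifier>.<YYMMDD><hhmmss>.wav convention."""
    identifier, stamp, _ext = Path(path).name.split(".")
    return identifier, datetime.strptime(stamp, "%y%m%d%H%M%S")

# parse_recording_start("805339167.170114002319.wav")
# -> ('805339167', datetime.datetime(2017, 1, 14, 0, 23, 19))
```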
-
One other piece of information - it seems the recordings are just under 20 minutes long, taken every hour (we have 24 recordings per day - 5 non-stranding days and 14 pre-stranding days).

Regarding the labels, thank you very much for the information @pcbermant! I rechecked everything to make sure I implemented the logic you outlined, and it turns out I got a little unlucky in picking the annotations I was working with. Here are the first two annotations from 011417: and here is the third one, where one can finally make something out on the spectrogram: still no discernible signal to be identified by listening. I guess part of the issue might be that these are high frequencies that could be outside my hearing range / not too good to listen to on my laptop speakers. I tried using headphones and am able to hear the whistles much better. I wonder whether the first two examples contain whistles at all - I am not sure they do. If not, that might indicate we have some mislabeled examples in the dataset - might be worth taking a closer look at.
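Coming back to the timing question: if the 'Begin Time (s)' values really are cumulative seconds counted across a day's recordings in chronological order, a minimal sketch for locating an annotation could look like this (the function and argument names are mine, just for illustration):

```python
def locate_annotation(begin_time_s, files_with_durations):
    """Map a Raven 'Begin Time (s)' value to (file, offset_within_file_s),
    assuming the offset counts cumulative recorded seconds across the day's
    recordings in chronological order (the hypothesis discussed above).

    files_with_durations: list of (path, duration_s) sorted by start time.
    """
    elapsed = 0.0
    for path, duration_s in files_with_durations:
        if begin_time_s < elapsed + duration_s:
            return path, begin_time_s - elapsed
        elapsed += duration_s
    raise ValueError(f"begin time {begin_time_s}s is past the end of the day's audio")
```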
-
What a 'nice' example looks like: (I wonder if we are actually hearing two dolphins speaking here?)
-
Hey @pcbermant! Thanks for sharing your code! I really like the use of the low- and high-pass filters and that you jumped straight to playing with a CAM model! Given the problem, I also feel that figuring out what the model fits to is going to be very important (it might also give us a clue as to why the strandings occur!). I was curious, as I do not fully understand the notebook with the CAM model - what did you find out? Is the model fitting to the shape of the whistle, or not necessarily? I do not want to cause extra work for you, but if you would be so kind as to share a couple of examples in our next AI sync, I think that could be very interesting and useful! 🙂
-
I created a way of getting hold of the examples, very much like a PyTorch dataset. I am taking a more narrative approach to the work, as evidenced in the initial notebook (it contains a quick look at the data and encapsulates the machinery for working with Raven annotations and the naming convention adopted by the researchers). My thinking is that by organizing the work for legibility, we might have an easier time going back to our partners to discuss progress. It could also work as a good way to demonstrate the value we can provide to potential partners.

@kzacarian this is all very preliminary - would love to hear what you think and whether there is anything else you would like us to do on this front. I also brought myself up to speed on the fantastic work @pcbermant is doing and shared some thoughts above. I am beginning to work on a very simple method of detecting whistles. Not sure how well it will work, if at all. Should it work, I think it could be complementary to the approach Peter is taking at the moment.
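For illustration, what I mean by "very much like a PyTorch dataset" is roughly the following (a rough sketch with assumed column and file names, not the actual code from the notebook):

```python
import pandas as pd
import soundfile as sf

SAMPLE_RATE = 72_000  # the recordings are sampled at 72 kHz


class WhistleAnnotationDataset:
    """Annotation-indexed access to audio clips.

    Assumes each annotation has already been resolved to a concrete
    recording and an offset within it (e.g. with the helpers sketched in
    the comments above); the 'file', 'offset_in_file_s' and 'label'
    columns are assumptions made for this sketch.
    """

    def __init__(self, annotations_path):
        # Raven selection tables are tab-separated text files
        self.annotations = pd.read_csv(annotations_path, sep="\t")

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        row = self.annotations.iloc[idx]
        duration_s = row["End Time (s)"] - row["Begin Time (s)"]
        start = int(row["offset_in_file_s"] * SAMPLE_RATE)
        frames = int(duration_s * SAMPLE_RATE)
        audio, _sr = sf.read(row["file"], start=start, frames=frames)
        return audio, row["label"]
```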
-
Upon learning that @pcbermant got the adapted click detector to work well with the dolphin data, I ditched my idea of creating a detector that would exploit the fact that the vocalizations we are after are high frequency. Instead, I decided to take a different approach. The question I would like to answer is this: if there is something in the dataset that is indicative / predictive of a stranding occurring - be that sonar, some other vocalizations, loud waves - would we be able to find it?

I already have the starter code for it here and am now trying to figure out how to train on this. There is some narrative in the notebook where I attempt to explain my reasoning. I am not convinced this will work, but I think it is worth a try, and it should also give us a better understanding of the data we have at hand. If training on entire files gave high accuracy, that would probably indicate the model is fitting to some ambient conditions. What those conditions might be would be interesting in itself, and it could also be very useful for the approach of training on only the vocalizations. Anyhow, this is a WIP at this point.
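To make the idea a bit more concrete, the setup I have in mind is roughly this (a sketch only - the window length, features and names are placeholders, not what is in the starter notebook):

```python
import torch
import torchaudio

SAMPLE_RATE = 72_000


def file_to_examples(path, label, window_s=10):
    """Chop a whole recording into fixed-length windows and give every window
    the file-level label (0 = non-stranding, 1 = pre-stranding), so we can test
    whether anything in the files at all is predictive of a stranding."""
    audio, sr = torchaudio.load(path)
    if sr != SAMPLE_RATE:
        audio = torchaudio.functional.resample(audio, sr, SAMPLE_RATE)
    to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_fft=2048)
    window = window_s * SAMPLE_RATE
    for start in range(0, audio.shape[-1] - window + 1, window):
        yield to_mel(audio[..., start:start + window]), torch.tensor(label)
```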
-
I started looking at the dolphin data that we have in this notebook.
The data is divided into non-stranding and pre-stranding categories. There are a total of 120 non-stranding and 336 pre-stranding recordings. This translates to just under 40 hrs of non-stranding and just under 96 hrs of pre-stranding audio. All audio is recorded at a sample rate of 72 kHz.
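For reference, these totals can be recomputed straight from the files with something along these lines (the directory layout below is a placeholder, not how the dataset is actually organized):

```python
import soundfile as sf
from pathlib import Path

# "data/<category>" is a placeholder layout
for category in ["non_stranding", "pre_stranding"]:
    files = sorted(Path("data", category).glob("*.wav"))
    hours = sum(sf.info(str(f)).duration for f in files) / 3600
    print(f"{category}: {len(files)} recordings, {hours:.1f} h")
```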
We have just 160 non-stranding and 1079 pre-stranding annotations. They are in Raven output format. Reading the annotation data is not a problem (it is just a tab-separated text file), but unfortunately going from an annotation to the specified portion of the recording is non-trivial. Raven uses the 'Begin Time (s)' column to match a given annotation to a portion of the recording, but that piece of information (an offset in seconds) is only useful if we know how to arrange the recordings in order.
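For example, loading a selection table is just the following (the filename is a placeholder; the column names are as they appear in the tables):

```python
import pandas as pd

annotations = pd.read_csv("example_selections.txt", sep="\t")
print(annotations[["Begin Time (s)", "End Time (s)"]].head())
```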
AFAICT the recordings are not named in a way aligned with Raven defaults. Specifically, the Raven default is `<yyyy><ll><dd>_<hh><mm><ss>`, while a recording in the dataset is named, for example, `805339167.170114002319.wav`. The first portion of the filename (up to the first dot) seems to be some identifier, we then have six digits for the date, and I assumed that the last portion of the filename is the time we are after. Unfortunately, I was unable to make use of this information, neither in a Jupyter notebook nor in Raven. It might be that we are missing some recordings, or that the data follows some different formatting scheme.
This is all very new to me - I only started looking at the data today. A lot to take in and consider. I have some ideas for how we can start working with the data (assuming we can understand the labeling convention). The data seems quite sparse, so maybe a detector would make for a good initial step (just some tentative thoughts).
@pcbermant would love to know your thoughts 🙂 Did you have any luck decoding the file format? I created a repo for this work - maybe we could put all our work there?
I still want to think about this a bit more, but assuming we run into issues making use of the annotations, maybe it would be good to get on a call with our partners and ask them to show us how they work with the data in Raven. That way we could grab the formatting string they use for the file names, and it would probably be good to learn anyhow what they look for when creating the bounding boxes around the calls and how precise they try to be.
These are just some preliminary thoughts - I might be missing some context here, and with additional information, should it be shared with us, some of these ideas may turn out to be obsolete.