The Egyptian Fruit Bat Datasets #24
@oliver-adams-b, this is so great! Detecting a signal down to the call-unit level, and scaling up annotations, would be an incredibly valuable ML application for the conservation biology community. I wonder if this dataset of ~300k vocalizations, ~90k of which are annotated, could be an interesting example? As context, a common ask from biologists is how to scale up annotations, since doing so by hand or with Raven is very labor-intensive. For example, one biologist just shared with me, "I am now currently burning my last remaining operative budget at the university to annotate and index ALL our recordings. This will be necessary before embarking on a full (semi)automated ML protocol for us to be able to gauge the ML accuracy."
-
Hey All!
Aza scouted the Egyptian Fruit Bat dataset, and for the past week I've been curating it into chunks and uploading them to archive.org. The full dataset contains ~300k vocalization samples, ~90k of which are annotated. I've split the data into three sets: one containing everything from the source (found here), another containing just the annotations (found here), and a third containing a small subset of the annotated data (found here). The two larger sets are quite large even as zipped archives (~100GiB and ~40GiB) and have to be extracted with 7zip (since the files were 7zipped from the Figshare source):
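# Run in a Jupyter/IPython notebook: lines starting with "!" are shell commands,
# and {dest_path} is filled in from the Python variable below.
dest_path = "/path/to/the/datas/new/home/"  # keep the trailing slash
!wget -P {dest_path} https://archive.org/download/egyptian_fruit_bat_annotated/egyptian_fruit_bat_annotated.zip
!7z x {dest_path}egyptian_fruit_bat_annotated.zip -o{dest_path}egyptian_fruit_bat_annotated/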
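If you'd rather run this from a plain Python script than a notebook, here is a rough sketch of the same two steps using `subprocess`. It assumes `wget` and `7z` are on your PATH. The helper names `build_commands` and `fetch_and_extract` are made up for illustration, and the `-c` (resume) flag is my addition, not part of the commands above; it just seemed handy for a download this size.

```python
import subprocess
from pathlib import Path

# URL taken from the commands above.
ARCHIVE_URL = (
    "https://archive.org/download/egyptian_fruit_bat_annotated/"
    "egyptian_fruit_bat_annotated.zip"
)


def build_commands(dest_path, url=ARCHIVE_URL):
    """Return the wget and 7z argument lists for one archive (hypothetical helper)."""
    dest = Path(dest_path)
    archive = dest / url.rsplit("/", 1)[-1]   # e.g. .../egyptian_fruit_bat_annotated.zip
    out_dir = dest / archive.stem             # extract into a folder named after the archive
    # -c resumes a partial download (an addition, not in the original command).
    download = ["wget", "-c", "-P", str(dest), url]
    # 7z expects the output directory glued directly to the -o switch.
    extract = ["7z", "x", str(archive), f"-o{out_dir}"]
    return download, extract


def fetch_and_extract(dest_path, url=ARCHIVE_URL):
    """Download, then extract; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(dest_path, url):
        subprocess.run(cmd, check=True)


# Example (not run here, since it starts a very large download):
# fetch_and_extract("/path/to/the/datas/new/home/")
```

Building the argument lists separately from running them makes it easy to print and check the commands before committing to a multi-hour download.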
Downloading and unzipping both of these sets onto my GCP instance took around 10 hours, so I made the tiny subsample to keep the amount of data a little less monstrous. You should be able to download the tiny subsample of the annotated dataset using:
untar_data("https://archive.org/download/egyptian_fruit_bat_annotated_tiny/egyptian_fruit_bat_annotated_tiny.zip")
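Once any of the archives is extracted, a quick sanity check is to count the files by extension. This is only a generic sketch: it makes no assumption about how the dataset is laid out on disk, and `count_by_extension` is a hypothetical helper, not something shipped with the data.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count files under root (recursively), keyed by lowercased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return counts


# Example: point it at wherever the archive was extracted.
# print(count_by_extension("/path/to/the/datas/new/home/egyptian_fruit_bat_annotated"))
```

If the counts look far off from what you expect, the download or extraction was probably interrupted.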
I'll be putting out a quick little notebook soon that showcases working with the data! If anyone has any questions or comments on this, bring em up here!