Commit 62095da: Merge branch 'datasets'

sadda committed Oct 31, 2024, 2 parents 1423e32 + a9fb2e5
Showing 88 changed files with 4,088 additions and 9,607 deletions.
12 changes: 10 additions & 2 deletions README.md
@@ -35,12 +35,20 @@

The aim of the project is to provide a comprehensive overview of datasets for wildlife individual re-identification and an easy-to-use package for developers of machine learning methods. The core functionality includes:

- - overview of 36 publicly available wildlife re-identification datasets.
+ - overview of 41 publicly available wildlife re-identification datasets.
- utilities to mass download and convert them into a unified format and fix some wrong labels.
- default splits for several machine learning tasks, including the ability to create additional splits.
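The unified format and the splitting functionality described above can be sketched as follows. This is a hypothetical miniature, not the package's actual implementation: the column names follow the package convention of a metadata DataFrame with image paths and identity labels, but the rows and the split logic here are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the unified metadata format: each dataset is
# converted to a DataFrame with (at least) an image path and an identity label.
df = pd.DataFrame({
    "image_id": [0, 1, 2, 3, 4, 5],
    "identity": ["a", "a", "b", "b", "c", "c"],
    "path": [f"images/{i}.jpg" for i in range(6)],
})

# A simple identity-disjoint split for illustration: whole individuals go to
# either the training or the test side, so no identity appears in both.
train_ids = {"a", "b"}
train = df[df["identity"].isin(train_ids)]
test = df[~df["identity"].isin(train_ids)]
```

An identity-disjoint split like this models the open-set setting, where the individuals seen at test time were never seen during training.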

An introductory example is provided in a [Jupyter notebook](notebooks/introduction.ipynb). The package provides a natural synergy with [Wildlife tools](https://github.com/WildlifeDatasets/wildlife-tools), which provides our [MegaDescriptor](https://huggingface.co/BVRA/MegaDescriptor-L-384) model and tools for training neural networks.

## Changelog

- [08/10/2024] Added AmvrakikosTurtles, ReunionTurtles, ZakynthosTurtles (sea turtles), ELPephants (elephants) and Chicks4FreeID (chickens).
- [13/06/2024] Added WildlifeReID-10k (unification of multiple datasets).
- [09/05/2024] Added CatIndividualImages (cats), CowDataset (cows) and DogFaceNet (dogs).
- [28/02/2024] Added MPDD (dogs), PolarBearVidID (polar bears) and SeaStarReID2023 (sea stars).
- [04/01/2024] Received **Best paper award** at WACV 2024.

## Summary of datasets

An overview of the provided datasets is available in the [documentation](https://wildlifedatasets.github.io/wildlife-datasets/datasets/), while a more numerical summary is located in a [Jupyter notebook](notebooks/dataset_descriptions.ipynb). Due to its size, it may be necessary to view it via [nbviewer](https://nbviewer.org/github/WildlifeDatasets/wildlife-datasets/blob/main/notebooks/dataset_descriptions.ipynb).
@@ -88,7 +96,7 @@ dataset.df
The dataset also contains basic metadata, including the number of individuals, time span, licence, and publication year.

```
- dataset.metadata
+ dataset.summary
```
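To make the metadata sentence above concrete, here is a hedged sketch of the kind of mapping such a summary attribute might return. The keys and values below are illustrative assumptions, not the package's actual fields:

```python
# Hypothetical per-dataset summary metadata; the real package exposes a
# similar mapping, but these keys and values are made up for illustration.
summary = {
    "animals": {"macaques"},
    "n_individuals": 34,
    "licenses": "Other",
    "year": 2020,
}

# Reading fields defensively, since not every dataset fills in every key.
license = summary.get("licenses", "unknown")
year = summary.get("year")
```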

<picture>
4 changes: 2 additions & 2 deletions baselines/analyze_wildlife_reid_10k.ipynb
@@ -554,7 +554,7 @@
"source": [
"summary_datasets = {}\n",
"for name, df_red in df.groupby('dataset'):\n",
- "    metadata = eval(f'datasets.{name}.metadata')\n",
+ "    metadata = eval(f'datasets.{name}.summary')\n",
" if 'licenses' in metadata:\n",
" license = metadata['licenses']\n",
" else:\n",
@@ -1039,7 +1039,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": "venv",
"language": "python",
"name": "python3"
},
311 changes: 0 additions & 311 deletions baselines/prepare_wildlife_reid_10k.py

This file was deleted.

4 changes: 2 additions & 2 deletions baselines/utils.py
@@ -35,8 +35,8 @@ def rename_index(df):
rename = {}
for dataset_name in df.index:
try:
-            metadata = eval(f'datasets.{dataset_name}.metadata')
-            citation = " \cite{" + metadata['cite'] + "}"
+            summary = eval(f'datasets.{dataset_name}.summary')
+            citation = " \cite{" + summary['cite'] + "}"
except:
citation = ''
rename[dataset_name] = dataset_name + citation