Data candy #29

nevrome · 2021-08-27T12:59:09Z

With the growing data collection in this repository and the tools we wrote to access it, we could fairly easily set up automatic pipelines that construct useful derived data products. One way to do this would be a separate GitHub repo that a GitHub Action updates automatically whenever the master branch in published_data changes.

Some ideas:

Pairwise-distance matrices with multiple distance measures. This is especially important, given that many individuals are represented multiple times in this dataset. So far we do not offer a workflow to remove duplicates (or biologically related individuals).
An MDS with all ancient individuals.
Various data-quality and completeness indices.
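
To make the first two ideas concrete, here is a minimal sketch of a pairwise-distance matrix followed by a classical MDS. It is an illustration under stated assumptions, not the tooling used in this repository: the genotype encoding (allele counts 0/1/2 with NaN for missing calls), the mean-mismatch distance, and the function names `pairwise_mismatch` and `classical_mds` are all hypothetical choices made for the example.

```python
import numpy as np


def pairwise_mismatch(genotypes):
    """Mean absolute genotype difference over sites observed in both individuals.

    genotypes: (n_individuals, n_sites) array of allele counts (0/1/2),
    with np.nan marking missing calls. This encoding is an assumption.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Only compare sites that are called in both individuals.
            shared = ~np.isnan(g[i]) & ~np.isnan(g[j])
            if shared.any():
                d = np.abs(g[i, shared] - g[j, shared]).mean()
            else:
                d = np.nan  # no overlapping sites; must be handled before MDS
            dist[i, j] = dist[j, i] = d
    return dist


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS on a complete distance matrix without NaN."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigval, eigvec = np.linalg.eigh(gram)
    top = np.argsort(eigval)[::-1][:k]
    # Clip small negative eigenvalues that arise from non-Euclidean distances.
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0, None))


# Toy data: individuals 0 and 1 agree at every site they share.
toy = np.array([
    [0, 1, 2, np.nan],
    [0, 1, 2, 2],
    [2, 1, 0, 0],
])
dist = pairwise_mismatch(toy)
coords = classical_mds(dist, k=2)
```

In this toy example the distance between the first two rows is zero, which is the kind of signal that could flag an individual represented more than once. Any real threshold for duplicates or relatives would need to be chosen with care.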

Theoretically we could also produce figures and interactive toys, and then the sky is the limit. I would suggest sticking to the basic necessities, though.

stschiff · 2021-08-30T06:31:59Z

Love it! Of course, things like all-pairwise distances likely require high-performance computing environments, so I'd be curious how we can set this up in practice. But the general idea of keeping a separate GitHub repo with such results is super nice!

nevrome · 2021-09-20T12:34:10Z

I'm pretty optimistic here. Pruning and pairwise distance calculation for 3000 individuals takes about 2 (!!) seconds on the MPI-EVA cluster. Calculating distances for 12000 individuals is 16 times more work (four times as many individuals, and the number of pairs grows quadratically), and GitHub Actions only provides limited computing power, but even so this could still be possible.
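
The factor of 16 can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes runtime scales linearly with the number of pairs and that the hardware is the same, neither of which is guaranteed (the pruning step may scale differently, and a GitHub Actions runner is likely slower than the cluster). The helper name `pair_count` is made up for the example.

```python
def pair_count(n):
    """Number of unordered pairs among n individuals."""
    return n * (n - 1) // 2


small, large = 3000, 12000
ratio = pair_count(large) / pair_count(small)

# Assumption: 2 seconds for the small set, as reported above,
# and runtime proportional to the number of pairs.
estimated_seconds = 2 * ratio

print(f"pairs: {pair_count(small)} -> {pair_count(large)}")
print(f"work ratio: {ratio:.2f}, estimated cluster time: {estimated_seconds:.0f} s")
```

Under these assumptions the larger run stays well under a minute on the cluster, which leaves a lot of headroom for slower runners.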

stschiff · 2021-09-24T15:36:57Z

Yes indeed. This would be very nice.

stschiff added the enhancement label (New feature or request) on Dec 1, 2021
