With the growing data collection in this repository and the tools we wrote to access it, we could relatively easily set up automatic pipelines that construct useful derived data products. One way to do this would be to create a separate GitHub repo that is updated automatically by a GitHub Action whenever the master branch in published_data changes.
Some ideas:
Pairwise-distance matrices with multiple distance measures. This is especially important, given that many individuals are represented multiple times in this dataset. So far we do not offer a workflow to remove duplicates (or biologically related individuals).
An MDS with all ancient individuals.
Various data quality and completeness indices.
Theoretically we could also produce figures and interactive toys - then the sky is the limit. I would suggest sticking to the basic necessities, though.
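To make the first idea a bit more concrete, here is a minimal sketch of one possible distance measure and a simple duplicate check. Everything in it is an assumption for illustration: the genotype encoding (0/1/2 with NaN for missing calls), the function names `pairwise_mismatch` and `near_duplicates`, and the mean-allele-difference distance itself. It is not the method used by any existing tool in this project.

```python
import numpy as np

def pairwise_mismatch(genotypes):
    """Mean allele difference over SNPs covered in both individuals.

    genotypes: (n_individuals, n_snps) array with values 0/1/2 and
    np.nan for missing calls (hypothetical encoding).
    Returns a symmetric (n, n) matrix scaled to [0, 1]; pairs without
    any shared SNP stay NaN.
    """
    g = np.asarray(genotypes, dtype=float)
    valid = ~np.isnan(g)
    n = g.shape[0]
    dist = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            shared = valid[i] & valid[j]
            if shared.any():
                d = np.abs(g[i, shared] - g[j, shared]).mean() / 2.0
                dist[i, j] = dist[j, i] = d
    return dist

def near_duplicates(dist, threshold):
    """Index pairs (i, j), i < j, whose distance is at or below threshold."""
    n = dist.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if not np.isnan(dist[i, j]) and dist[i, j] <= threshold]

# Toy data: individuals 0 and 1 agree on every SNP they share.
toy = np.array([[0, 2, np.nan],
                [0, 2, 1],
                [2, 0, 1]])
toy_dist = pairwise_mismatch(toy)
toy_dups = near_duplicates(toy_dist, threshold=0.05)
```

The threshold for calling two entries duplicates is a free parameter here and would need to be chosen from real data; related individuals would also need a more careful measure than this one.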
Love it! Of course, things like all-pairwise distances likely require high-performance computing environments, so I'd be curious how we can practically set this up. But the general idea to keep a separate GitHub repo with such results is super nice!
I'm pretty optimistic here. Pruning and pairwise distance calculation for 3000 individuals takes about 2 (!!) seconds on the MPI-EVA cluster. Calculating distances for 12000 individuals is about 16 times more work (four times as many individuals, and the number of pairs grows quadratically), and GitHub Actions only provide limited computing power, but even so this could still be possible.
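As a rough back-of-the-envelope check of that estimate, the sketch below scales the measured runtime by the number of individual pairs. It assumes the cost is dominated by the pairwise step, that the SNP count stays the same, and that I/O is negligible. The `slowdown` factor for a GitHub Actions runner relative to the cluster is a purely hypothetical placeholder, not a measured value.

```python
def n_pairs(n):
    """Number of unordered pairs among n individuals."""
    return n * (n - 1) // 2

def scaled_runtime(t_ref, n_ref, n_new, slowdown=1.0):
    """Extrapolate runtime assuming cost is proportional to the pair count.

    slowdown is a hypothetical factor for slower hardware (1.0 = same machine).
    """
    return t_ref * n_pairs(n_new) / n_pairs(n_ref) * slowdown

ratio = n_pairs(12000) / n_pairs(3000)        # roughly 16
same_hw = scaled_runtime(2.0, 3000, 12000)    # roughly 32 seconds
# Assumed (not measured) 20x slower runner, to see whether it stays feasible:
slow_hw = scaled_runtime(2.0, 3000, 12000, slowdown=20.0)
print(f"work ratio: {ratio:.2f}, same hardware: {same_hw:.0f} s, slow runner: {slow_hw:.0f} s")
```

Even with a generous slowdown factor, the extrapolated runtime stays in the range of minutes under these assumptions.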