ATAC-seq Data Integration #14
Related to hubmapconsortium/portal-ui#1334:
@mruffalo How can we figure out which genome build was used to process these datasets?
@ngehlenborg What exactly do you mean by "figure out"? What type of answer do you have in mind? Me answering in a comment on this GitHub issue? (GRCh38 with GENCODE v32 annotations for all processed ATAC-seq datasets.) Storing a mapping of pipeline versions (commit hashes? tags? both?) to genome and annotation versions in this repository or somewhere else appropriate? Or a programmatic way to obtain the annotations for a derived data set, given the pipeline version that was used to produce that data set? Something like this could be automated by examining a derived data set, obtaining the pipeline commit that produced that data set, and getting supplementary data from the appropriate Docker image:
This would allow accessing the actual genome annotations in BED format. Does something like this seem useful enough to make more convenient?
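One way to make a mapping of pipeline versions to genome and annotation versions parse-able is a small lookup table. This is a hypothetical sketch: `ReferenceInfo`, `PIPELINE_REFERENCES`, `reference_for_pipeline`, and the version key `sc-atac-seq-v1` are invented for illustration. The only grounded values are GRCh38 and GENCODE v32, as stated above for all processed ATAC-seq datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceInfo:
    """Genome build and gene annotation used by one pipeline version."""
    genome_build: str
    annotation: str


# Hypothetical registry: keys stand in for pipeline commit hashes or tags.
# Only "GRCh38" / "GENCODE v32" come from the discussion; the key is a placeholder.
PIPELINE_REFERENCES: dict[str, ReferenceInfo] = {
    "sc-atac-seq-v1": ReferenceInfo("GRCh38", "GENCODE v32"),
}


def reference_for_pipeline(pipeline_version: str) -> ReferenceInfo:
    """Return the reference info recorded for a pipeline version.

    Raises KeyError for unregistered versions so an unknown pipeline
    never silently falls back to a default build.
    """
    try:
        return PIPELINE_REFERENCES[pipeline_version]
    except KeyError:
        raise KeyError(
            f"no genome build recorded for pipeline {pipeline_version!r}"
        ) from None
```

Raising on unknown versions, rather than defaulting to GRCh38, keeps a future pipeline that switches builds from being mislabeled.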
Sorry, that wasn't very clear. I am wondering how we can figure out programmatically which genome build was used. We should probably expose that for each pipeline through an API or in a well-defined location in the CWL file? I am not sure what is best, but I would rather not have to write code that checks file names on disk.
I agree with @ngehlenborg: ideally, this information would live somewhere that is eminently parseable (say some sort of …)
We need to agree with the IEC and the CMU TC on a location for the genome build of a given dataset. Added to the portal call agenda.
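If the genome build is stored in a well-defined metadata field, reading it becomes a lookup rather than a file-name check. A minimal sketch, assuming a JSON record whose field names (`pipeline_version`, `genome_build`, `annotation`) and dataset ID are hypothetical placeholders, since the actual location is still to be agreed:

```python
import json

# Hypothetical metadata record for one derived dataset. Field names and the
# dataset ID are illustrative only, not an agreed HuBMAP schema.
EXAMPLE_RECORD = """
{
  "dataset_id": "EXAMPLE-0001",
  "pipeline_version": "sc-atac-seq-v1",
  "genome_build": "GRCh38",
  "annotation": "GENCODE v32"
}
"""


def genome_build_from_metadata(record_json: str) -> str:
    """Read the genome build from a well-defined metadata field.

    Raises ValueError if the record is not a JSON object or lacks a
    non-empty "genome_build" string, instead of guessing from file names.
    """
    record = json.loads(record_json)
    if not isinstance(record, dict):
        raise ValueError("metadata record must be a JSON object")
    build = record.get("genome_build")
    if not isinstance(build, str) or not build:
        raise ValueError("metadata record has no 'genome_build' field")
    return build
```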
From the 1/21/2022 minutes:
cc @mruffalo: Please update here if that isn't correct.
Matt posted on hive-developers on February 9:
Nils responded:
Ilan says:
Cell by bin
Visualize in HiGlass
Cell by peaks (in BED + snap files)
Annotated peaks (genomic intervals) per cell
Genome-wide (not necessarily tied to a gene)
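To make the two matrix types concrete, here is a minimal, hypothetical sketch of cell-by-bin and cell-by-peak counting from per-cell fragments. It is not necessarily how the pipelines (or snap files) compute these matrices: the fixed 5 kb bin size, assigning each fragment to a bin by its start position, and the `Fragment`/`Peak` types are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Fragment(NamedTuple):
    """One ATAC-seq fragment: cell barcode plus a 0-based, half-open interval."""
    barcode: str
    chrom: str
    start: int
    end: int


class Peak(NamedTuple):
    """One peak, as in the first three columns of a BED file."""
    chrom: str
    start: int
    end: int


def cell_by_bin(fragments: Iterable[Fragment], bin_size: int = 5000) -> Counter:
    """Count fragments per (cell, chrom, bin index), binned by fragment start."""
    counts: Counter = Counter()
    for frag in fragments:
        counts[(frag.barcode, frag.chrom, frag.start // bin_size)] += 1
    return counts


def cell_by_peak(fragments: Iterable[Fragment], peaks: list[Peak]) -> Counter:
    """Count fragments overlapping each peak, per (cell, peak index)."""
    counts: Counter = Counter()
    for frag in fragments:
        for i, peak in enumerate(peaks):
            # Half-open intervals overlap iff each starts before the other ends.
            if (frag.chrom == peak.chrom
                    and frag.start < peak.end
                    and peak.start < frag.end):
                counts[(frag.barcode, i)] += 1
    return counts
```

The peak overlap here is a naive scan over every peak for every fragment; a real implementation would sort peaks per chromosome or use an interval tree.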
Our Pipelines:
TMC:
Outstanding Issues:
Notes:
We have many visualization options: