Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: moving all variant coordinates to GnomAD #566

Merged
merged 13 commits into from
May 15, 2024

Conversation

DSuveges
Copy link
Contributor

@DSuveges DSuveges commented Apr 4, 2024

✨ Context

The variant mapping of the curated GWAS Catalog dataset is done via the REST API of Ensembl. This process uses +1 based numbering for indels, however GnomAD uses 0 based numbering. As we don't have complete allele set for the curated data, we only know if an associated variant is an indel if it is grounded to GnomAD.

So far the current implementation contained logic to permanently change the indel coordinates to Ensembl of the GnomAD variant annotation. This also affected the LD index generation. All this causing a series of complications across the board. More context here: #3274

🛠 What does this PR implement

  • Removing gnomadVariantId column from variant annotation. gnomadVariantId is the variant id.
  • utils.convert_gnomad_position_to_ensembl moved to datasource/gwas_catalog/associations.py as that function is no longer universal. (doctest updated)
  • Logic to change coordinates in the variant annotation step removed.
  • Logic to change coordinates in the LD index step removed.

🙈 Missing

These steps were not yet re-run. Also the LD index generation needs to be refactored to not do liftover but use the provided variant b38 mapping.

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@DSuveges DSuveges changed the title refactor (gnomad): moving all variant coordinates to GnomAD refactor: moving all variant coordinates to GnomAD Apr 18, 2024
@DSuveges DSuveges requested a review from d0choa April 24, 2024 08:22
Copy link
Collaborator

@d0choa d0choa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks more clear than the alternative coordinate system.
We shouldn't forget about the "missing" section regarding LD index. cc @Daniel-Considine
Thanks

@@ -256,6 +299,7 @@ def _map_to_variant_annotation_variants(
"alternateAllele",
"chromosome",
"position",
# ensemblPosition column is dropped. Only the GnomAD position is kept.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# ensemblPosition column is dropped. Only the GnomAD position is kept.

@DSuveges DSuveges merged commit bb9f9c6 into dev May 15, 2024
4 checks passed
@DSuveges DSuveges deleted the ds_3274_convert_gnomad branch June 30, 2024 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore impact of ambigious usage of Ensembl and Gnomad based coordinates for indels
2 participants