feat: add the step class for fine-mapping #554

Merged
merged 8 commits into from
Apr 2, 2024

Contributor

@addramir addramir commented Mar 21, 2024

✨ Context

This PR introduces the fine-mapper step, including a function that converts SuSiE output to a StudyLocus object.

🛠 What does this PR implement

See above.

🙈 Missing

Documentation will be updated later.

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@addramir addramir marked this pull request as ready for review March 21, 2024 13:39
@addramir addramir requested a review from DSuveges March 22, 2024 13:58
Contributor

@DSuveges DSuveges left a comment

I only have a few small comments that don't affect the logic, so they are not a deal breaker for merging. My comments are stylistic; if you want to address them, you can do so before the merge.

cred_set.tolist(),
["variantId", "posteriorProbability", "logBF", "beta"],
)
.join(
Contributor

This join uses the default inner mode. Is that expected?

Contributor Author

Yes, the two sides should match one to one, with the same size and order.
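
To make the one-to-one expectation explicit, the row counts could be compared around the join. The following is a minimal, Spark-free sketch in plain Python, not the PR's code: the helper `inner_join_on` and the sample records are hypothetical and only illustrate that an inner join drops unmatched rows without raising an error.

```python
def inner_join_on(left, right, key):
    """Inner-join two lists of dicts on `key` (hypothetical helper for illustration)."""
    right_by_key = {row[key]: row for row in right}
    # Rows of `left` without a partner in `right` are dropped, as in an inner join.
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Made-up records standing in for the credible set and the variant information.
cred_set = [
    {"variantId": "1_100_A_G", "posteriorProbability": 0.9},
    {"variantId": "1_200_C_T", "posteriorProbability": 0.1},
]
variant_info = [
    {"variantId": "1_100_A_G", "beta": 0.25},
    {"variantId": "1_200_C_T", "beta": -0.1},
]

joined = inner_join_on(cred_set, variant_info, "variantId")
# If the inputs really match one to one, the inner join loses no rows.
assert len(joined) == len(cred_set)

# With a missing partner, the result shrinks silently.
partial = inner_join_on(cred_set, variant_info[:1], "variantId")
```

A count check like the assertion above would surface a mismatch that the default inner mode would otherwise hide.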

session (Session): Spark session
_studyId (str): study ID
_region (str): region
_join (DataFrame): DataFrame with variant information
Contributor

I would rename the _join variable to something more intuitive.

Contributor Author

Done

order_creds.sort(key=lambda x: x[1], reverse=True)
cred_sets = None
counter = 0
for i, value in order_creds:
Contributor

i is fine, as it is the canonical name for an index variable; value, however, is not very telling. At row 110 it is hard to tell what value refers to.

Contributor Author

Renamed it to cs_lbf_value.
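
For illustration, the loop could read as follows with descriptive names. This is a sketch under assumptions: the pairs in `order_creds` are made-up (index, log Bayes factor) values, `cs_index` and `kept` are hypothetical names, and skipping credible sets below `cs_lbf_thr` is an assumed use of the threshold rather than the PR's confirmed logic.

```python
# Hypothetical (credible set index, log Bayes factor) pairs.
order_creds = [(0, 1.5), (1, 7.2), (2, 3.4)]
cs_lbf_thr = 2  # same value as in the test call shown in this PR

# Highest log Bayes factor first, as in the reviewed snippet.
order_creds.sort(key=lambda pair: pair[1], reverse=True)

kept = []
for cs_index, cs_lbf_value in order_creds:
    # Assumption: credible sets below the threshold are skipped.
    if cs_lbf_value < cs_lbf_thr:
        continue
    kept.append(cs_index)
```

Naming both elements of the tuple makes it clear at the point of use which one is the index and which one is the log Bayes factor.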

susie_output (dict[str, Any]): SuSiE-inf output dictionary
session (Session): Spark session
_studyId (str): study ID
_region (str): region
Contributor

Do we have a consistent way of representing regions? If so, the args description could say, e.g.:

_region (str): finemapped region in chr:start-end format

Contributor Author

No, we don't have it now. But agree, we need to think about the standard.
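
As a starting point for that discussion, a chr:start-end convention could be enforced with a small parser. This is only a sketch of one possible standard: `Region`, `parse_region`, and `format_region` are hypothetical names that do not exist in the codebase, and the rule that start must not exceed end is an assumption.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic region (hypothetical type, not part of the PR)."""

    chromosome: str
    start: int
    end: int


# Matches strings such as "1:1000-2000" or "X:5-10".
_REGION_PATTERN = re.compile(r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> Region:
    """Parse a chr:start-end string, raising ValueError on malformed input."""
    match = _REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"Region {region!r} is not in chr:start-end format")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"Region {region!r} has start greater than end")
    return Region(match["chromosome"], start, end)


def format_region(region: Region) -> str:
    """Render a Region back into the chr:start-end string form."""
    return f"{region.chromosome}:{region.start}-{region.end}"
```

Validating the string once at the boundary would let the docstring state the format with confidence.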

if cred_sets is None:
    cred_sets = cred_set
else:
    cred_sets = cred_sets.union(cred_set)
Contributor

I would use unionByName rather than union, because union concatenates columns by position instead of by name. I think it is fine in this case, as the order of columns is defined in row 120, but still.

Contributor Author

Done.
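
The difference between the two modes can be shown without Spark. The sketch below is a plain-Python illustration, not PySpark code: `union_by_position` and `union_by_name` are hypothetical helpers that mimic the positional behaviour of `union` and the name-based behaviour of `unionByName` on made-up rows.

```python
def union_by_position(cols_a, rows_a, cols_b, rows_b):
    """Stack rows as they come, keeping the first table's column names."""
    return cols_a, rows_a + rows_b


def union_by_name(cols_a, rows_a, cols_b, rows_b):
    """Reorder the second table's values to the first table's column order, then stack."""
    order = [cols_b.index(name) for name in cols_a]
    return cols_a, rows_a + [tuple(row[i] for i in order) for row in rows_b]


# Same columns, different order (made-up example data).
cols_a = ["variantId", "posteriorProbability"]
rows_a = [("1_100_A_G", 0.9)]
cols_b = ["posteriorProbability", "variantId"]
rows_b = [(0.1, "1_200_C_T")]

_, by_position = union_by_position(cols_a, rows_a, cols_b, rows_b)
_, by_name = union_by_name(cols_a, rows_a, cols_b, rows_b)

# Positional stacking puts 0.1 under "variantId"; name-based stacking does not.
```

When both inputs are built with the same column order, as here in the PR, the two modes give the same result; the name-based one simply stays correct if that order ever changes.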

_join=gwas_df,
cs_lbf_thr=2,
)
assert isinstance(L1, StudyLocus), "L1 is not an instance of StudyLocus"
Contributor

I'm not sure what the test dataset looks like, but it would be great to assert that the number of credible sets is what you expect, and to validate that the locus object is healthy. However, I understand if that is not a high priority for now.

Contributor Author

I think we can create a more meaningful test for a bigger function that will use this converter at later stages.

@addramir addramir merged commit d76ebbe into dev Apr 2, 2024
4 checks passed
@addramir addramir deleted the ytdc_susie_finemapper_2 branch April 2, 2024 13:52