Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(coloc): handle cases when the bayes factors are null #556

Merged
merged 4 commits into from
Mar 22, 2024

Conversation

ireneisdoomed
Copy link
Contributor

@ireneisdoomed ireneisdoomed commented Mar 21, 2024

✨ Context

Testing COLOC's behaviour when Bayes Factors aren't present I observed that null values of logBF weren't filled with 0:

+----------------+-----------------+----------+------------+------------+
|leftStudyLocusId|rightStudyLocusId|chromosome|tagVariantId|  statistics|
+----------------+-----------------+----------+------------+------------+
|               1|                2|         1|         snp|{null, null}|
+----------------+-----------------+----------+------------+------------+

df.fillna(0, subset=["statistics.left_logBF", "statistics.right_logBF"]).withColumn(
    "sum_log_bf",
    f.col("statistics.left_logBF") + f.col("statistics.right_logBF"),
).show()
+----------------+-----------------+----------+------------+------------+----------+
|leftStudyLocusId|rightStudyLocusId|chromosome|tagVariantId|  statistics|sum_log_bf|
+----------------+-----------------+----------+------------+------------+----------+
|               1|                2|         1|         snp|{null, null}|      null|
+----------------+-----------------+----------+------------+------------+----------+

The outcome of this is that Coloc.colocalise crashes because get_logsum can't handle nulls.

I don't think this had implications in the outputs of #530, because all data had Bayes Factors.

🛠 What does this PR implement

  • The fix is simple. fillna is a function that operates in columns that are not nested. Unnesting the BFs fields fixes the problem.
  • I've added a semantic test that makes sure that if no credible sets have a BF, Coloc.colocalise won't crash and the results will basically show no signs of colocalisation.

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@github-actions github-actions bot added bug Something isn't working size-S Method labels Mar 21, 2024
@addramir
Copy link
Contributor

addramir commented Mar 21, 2024

I discussed it with @xyg123 before - it is not really correct to use 0 instead of NAN in overlapping object.
We should do the following:

  1. We should assume that each missing value corresponds to very small PIP, lets set it to 1e-6
  2. For each study (left and right) separately:
  • Select one SNP with non null pip and LBF and calculate the following: delta=logBF-log(PIP). Actually this value is equal for all non null SNP within the study (but differnt between the studies!)
  • For missing null snps assign logBF as logBF=log(1e-6)+delta

Why it is important? For cases where signals are strong, assigning 0 for logBF can make the coloc very conservative because real value of the logBF will be >>0.

I'm happy to implement these changes if needed.

@ireneisdoomed
Copy link
Contributor Author

@addramir Thanks a lot for looking at this. It's a good point, we could be downweighting associations in L2G because of this.
I understand that delta represents some baseline likelihood of being causal within the study? Im curious about hte interpretation of substracting PIP to the Bayesian Factor.

I tested some examples and deltas intra study can vary in the hundredths, I guess this is not a problem.

In any case, I would implement this in another PR. Independently of the value we decide as baseline, the fixes in this PR are necessary.

@addramir
Copy link
Contributor

Ok, agree, lets implement it later, I will provide more context.

@ireneisdoomed ireneisdoomed merged commit 8f9d268 into dev Mar 22, 2024
4 checks passed
@ireneisdoomed ireneisdoomed deleted the il-fix-coloc branch July 15, 2024 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Method size-S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants