fix(coloc): handle cases when the bayes factors are null #556

ireneisdoomed · 2024-03-21T17:12:06Z

✨ Context

While testing COLOC's behaviour when Bayes factors aren't present, I observed that null logBF values weren't being filled with 0:

+----------------+-----------------+----------+------------+------------+
|leftStudyLocusId|rightStudyLocusId|chromosome|tagVariantId|  statistics|
+----------------+-----------------+----------+------------+------------+
|               1|                2|         1|         snp|{null, null}|
+----------------+-----------------+----------+------------+------------+

df.fillna(0, subset=["statistics.left_logBF", "statistics.right_logBF"]).withColumn(
    "sum_log_bf",
    f.col("statistics.left_logBF") + f.col("statistics.right_logBF"),
).show()
+----------------+-----------------+----------+------------+------------+----------+
|leftStudyLocusId|rightStudyLocusId|chromosome|tagVariantId|  statistics|sum_log_bf|
+----------------+-----------------+----------+------------+------------+----------+
|               1|                2|         1|         snp|{null, null}|      null|
+----------------+-----------------+----------+------------+------------+----------+

As a result, Coloc.colocalise crashes because get_logsum can't handle nulls.

I don't think this affected the outputs of #530, because all of that data had Bayes factors.

🛠 What does this PR implement

The fix is simple. fillna only operates on columns that are not nested, so the logBF fields inside the statistics struct were left untouched. Unnesting the BF fields fixes the problem.
I've added a semantic test that makes sure that, if no credible sets have a BF, Coloc.colocalise doesn't crash and the results show essentially no signs of colocalisation.

🙈 Missing

🚦 Before submitting

Do these changes cover one single feature (one change at a time)?
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes?
Did you make sure there is no commented out code in this PR?
Did you follow conventional commits standards in PR title and commit messages?
Did you make sure the branch is up-to-date with the dev branch?
Did you write any new necessary tests?
Did you make sure the changes pass local tests (make test)?
Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

addramir · 2024-03-21T20:07:10Z

I discussed this with @xyg123 before: it is not really correct to use 0 instead of NaN in the overlapping object.
We should do the following:

Assume that each missing value corresponds to a very small PIP; let's set it to 1e-6.
For each study (left and right) separately:

Select one SNP with non-null PIP and logBF and calculate delta = logBF - log(PIP). This value is the same for all non-null SNPs within a study (but different between studies!).
For SNPs with missing values, assign logBF = log(1e-6) + delta.

Why is this important? When signals are strong, assigning 0 to logBF can make coloc very conservative, because the real value of logBF will be >> 0.
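
The steps above could be sketched in plain Python as follows. This is only an illustration of the arithmetic for a single study: the function name, the dictionary keys and the toy values are hypothetical, and a real implementation would operate on the Spark overlap object instead.

```python
import math

PIP_FLOOR = 1e-6  # assumed PIP for SNPs with missing values, as proposed above


def impute_log_bf(variants):
    """Fill missing logBF values for ONE study as log(PIP_FLOOR) + delta.

    `variants` is a list of dicts with the hypothetical keys "pip" and "logBF".
    """
    # Pick any SNP with both a positive PIP and a logBF to anchor delta.
    reference = next(
        (v for v in variants if v["logBF"] is not None and v["pip"]), None
    )
    if reference is None:
        # Nothing to anchor on (no SNP has both values): return copies unchanged.
        return [dict(v) for v in variants]

    # delta = logBF - log(PIP), assumed constant within the study.
    delta = reference["logBF"] - math.log(reference["pip"])
    imputed_value = math.log(PIP_FLOOR) + delta

    return [
        {**v, "logBF": v["logBF"] if v["logBF"] is not None else imputed_value}
        for v in variants
    ]


# Toy study with one strong signal and one SNP lacking both PIP and logBF.
study = [
    {"pip": 0.9, "logBF": 40.0},
    {"pip": None, "logBF": None},
]
imputed = impute_log_bf(study)
```

With a strong signal like the toy one, the imputed logBF comes out well above 0, which is the point made above about a fill value of 0 being too conservative.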

I'm happy to implement these changes if needed.

ireneisdoomed · 2024-03-22T14:34:53Z

@addramir Thanks a lot for looking at this. It's a good point; we could be downweighting associations in L2G because of this.
I understand that delta represents some baseline likelihood of being causal within the study? I'm curious about the interpretation of subtracting log(PIP) from the log Bayes factor.

I tested some examples, and within a study the deltas can vary in the hundredths; I guess this is not a problem.

In any case, I would implement this in another PR. Independently of the value we decide as baseline, the fixes in this PR are necessary.

addramir · 2024-03-22T17:26:51Z

OK, agreed, let's implement it later. I will provide more context.

ireneisdoomed added 2 commits March 21, 2024 13:05

fix(coloc): fillna doesnt fill nested data

c833873

test(coloc): added test_coloc_no_logbf (semantic)

b843791

ireneisdoomed requested a review from d0choa March 21, 2024 17:12

github-actions bot added the bug (Something isn't working), size-S and Method labels Mar 21, 2024

revert(ecaviar): revert accidental changes

fe552a4

Merge branch 'dev' into il-fix-coloc

db8ba84

addramir approved these changes Mar 22, 2024

ireneisdoomed merged commit 8f9d268 into dev Mar 22, 2024
4 checks passed

ireneisdoomed deleted the il-fix-coloc branch July 15, 2024 14:46
