feat: the FineMapper function for one locus #564

Merged
merged 57 commits into from
Apr 5, 2024
Commits
632ff14
test: adding test for pairwiseLD
DSuveges Nov 13, 2023
e057a8c
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Jan 14, 2024
76cda39
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Jan 17, 2024
557fade
feat: adding ld matrix extraction
Daniel-Considine Jan 18, 2024
5b96d2e
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Jan 18, 2024
c952898
chore: merge from dev
Daniel-Considine Jan 18, 2024
70f7a01
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Jan 18, 2024
cbedc3d
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Jan 18, 2024
83427d8
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Jan 22, 2024
51f66eb
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Jan 22, 2024
82107ec
feat: index and block matrix extraction for studyLocus
Daniel-Considine Feb 1, 2024
4df3dfc
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Feb 1, 2024
bce30d2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2024
4d09a60
chore: updating some test files to gentropy
Daniel-Considine Feb 1, 2024
f7dfba1
chore: updating tests
Daniel-Considine Feb 1, 2024
3a629ef
chore: updating pairwise_ld_schema for tests
Daniel-Considine Feb 2, 2024
6ff6b1c
chore: updating pairwise_ld tests
Daniel-Considine Feb 2, 2024
3fc9cfa
chore: fix ld_pairwise tests
Daniel-Considine Feb 2, 2024
0d68020
chore: fix pairwise_ld tests
Daniel-Considine Feb 2, 2024
07260a7
chore: fix tests
Daniel-Considine Feb 2, 2024
65cea4e
chore: fix tests
Daniel-Considine Feb 2, 2024
aa19f9f
chore: fixing typing for tests
Daniel-Considine Feb 2, 2024
e24cac1
chore: fixing tests
Daniel-Considine Feb 2, 2024
4ec36b9
chore: fixing ld tests
Daniel-Considine Feb 2, 2024
0e93adb
Update src/gentropy/dataset/study_index.py
Daniel-Considine Feb 5, 2024
cd4cddf
feat: moving functions to their appropriate locations and improving l…
Daniel-Considine Feb 7, 2024
08c63d0
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Feb 7, 2024
228c66b
fix: optimise conversion of BM to NumPy
Daniel-Considine Feb 7, 2024
b7dea93
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Feb 8, 2024
b666764
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Feb 8, 2024
06a9e96
feat: updating get_locus_index to allow for just chromosome and posit…
Daniel-Considine Feb 9, 2024
99fbf48
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Feb 9, 2024
798692c
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Feb 12, 2024
c39597f
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Mar 6, 2024
61e682d
Merge branch 'dev' of https://github.com/opentargets/gentropy into ds…
Daniel-Considine Mar 13, 2024
dfaa3e3
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 18, 2024
ebdd983
fix: suggested changes
Daniel-Considine Mar 18, 2024
8019c4c
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 18, 2024
faebdea
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 21, 2024
1dc08e3
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 21, 2024
a2232b5
Update study_index.py
Daniel-Considine Mar 21, 2024
70c7fc0
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 21, 2024
b9b817d
fix: changes to datasource/gnomad/ld.py
Daniel-Considine Mar 21, 2024
170dd09
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Mar 28, 2024
37668d1
feat: add the draft of finemapper fucntion
addramir Apr 2, 2024
33b6a51
feat: updated method for ld_index extraction
Daniel-Considine Apr 2, 2024
f72338d
Merge branch 'dev' into ds_pairwise_ld
Daniel-Considine Apr 2, 2024
67f7f36
fix: changing input
addramir Apr 2, 2024
1dc6701
fix: adding fillter by studyId
addramir Apr 2, 2024
2a2afbe
fix: sorting idx in hail
Daniel-Considine Apr 2, 2024
48592e7
Merge remote-tracking branch 'origin/ds_pairwise_ld' into ytdc_finema…
addramir Apr 3, 2024
58f3eb9
feat: add fine-mapping of one study_locus_row
addramir Apr 3, 2024
6ec36cc
Merge branch 'dev' into ytdc_finemapper_v3
addramir Apr 3, 2024
5ee023a
fix: small fix in majpop
addramir Apr 3, 2024
3b0ddda
fix: small fixes in function
addramir Apr 3, 2024
3f8e20a
fix: using more spark before converting to pandas
Daniel-Considine Apr 4, 2024
d462368
fix: fix in test
addramir Apr 5, 2024
139 changes: 132 additions & 7 deletions src/gentropy/susie_finemapper.py
@@ -5,11 +5,17 @@
from typing import Any

import numpy as np
import pandas as pd
import pyspark.sql.functions as f
-from pyspark.sql import DataFrame, Window
+from pyspark.sql import DataFrame, Row, Window
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

from gentropy.common.session import Session
from gentropy.dataset.study_index import StudyIndex
from gentropy.dataset.study_locus import StudyLocus
from gentropy.dataset.summary_statistics import SummaryStatistics
from gentropy.datasource.gnomad.ld import GnomADLDMatrix
from gentropy.method.susie_inf import SUSIE_inf


class SusieFineMapperStep:
@@ -19,12 +25,129 @@ class SusieFineMapperStep:
    In the future this step will be refactored and moved to the methods module.
    """

    @staticmethod
    def susie_finemapper_one_studylocus_row(
        GWAS: SummaryStatistics,
        session: Session,
        study_locus_row: Row,
        study_index: StudyIndex,
        window: int = 1_000_000,
        L: int = 10,
    ) -> StudyLocus:
        """Susie fine-mapper function that uses Summary Statistics, chromosome and position as inputs.

        Args:
            GWAS (SummaryStatistics): GWAS summary statistics
            session (Session): Spark session
            study_locus_row (Row): StudyLocus row
            study_index (StudyIndex): StudyIndex object
            window (int): window size for fine-mapping
            L (int): number of causal variants

        Returns:
            StudyLocus: StudyLocus object with fine-mapped credible sets
        """
        # PLEASE DO NOT REMOVE THIS LINE
        # (Presumably a compatibility shim: pandas 2.0 removed DataFrame.iteritems,
        # which older PySpark versions still call when converting pandas DataFrames.)
        pd.DataFrame.iteritems = pd.DataFrame.items

        chromosome = study_locus_row["chromosome"]
        position = study_locus_row["position"]
        studyId = study_locus_row["studyId"]

        study_index_df = study_index._df
        study_index_df = study_index_df.filter(f.col("studyId") == studyId)
        major_population = study_index_df.select(
            "studyId",
            f.array_max(f.col("ldPopulationStructure"))
            .getItem("ldPopulation")
            .alias("majorPopulation"),
        ).collect()[0]["majorPopulation"]

        region = (
            chromosome
            + ":"
            + str(int(position - window / 2))
            + "-"
            + str(int(position + window / 2))
        )

        gwas_df = (
            GWAS.df.withColumn("z", f.col("beta") / f.col("standardError"))
            .withColumn("chromosome", f.split(f.col("variantId"), "_")[0])
            .withColumn("position", f.split(f.col("variantId"), "_")[1])
            .filter(f.col("studyId") == studyId)
            .filter(f.col("z").isNotNull())
        )

        ld_index = (
            GnomADLDMatrix()
            .get_locus_index(
                study_locus_row=study_locus_row,
                window_size=window,
                major_population=major_population,
            )
            .withColumn(
                "variantId",
                f.concat(
                    f.lit(chromosome),
                    f.lit("_"),
                    f.col("`locus.position`"),
                    f.lit("_"),
                    f.col("alleles").getItem(0),
                    f.lit("_"),
                    f.col("alleles").getItem(1),
                ).cast("string"),
            )
        )

        # Filtering out the variants that are not in the LD matrix, we don't need them
        gwas_index = gwas_df.join(
            ld_index.select("variantId", "alleles", "idx"), on="variantId"
        ).sort("idx")

        gnomad_ld = GnomADLDMatrix.get_numpy_matrix(
            gwas_index, gnomad_ancestry=major_population
        )

        pd_df = gwas_index.toPandas()
        z_to_fm = np.array(pd_df["z"])
        ld_to_fm = gnomad_ld

        susie_output = SUSIE_inf.susie_inf(z=z_to_fm, LD=ld_to_fm, L=L)

        schema = StructType(
            [
                StructField("variantId", StringType(), True),
                StructField("chromosome", StringType(), True),
                StructField("position", IntegerType(), True),
            ]
        )
        pd_df["position"] = pd_df["position"].astype(int)
        variant_index = session.spark.createDataFrame(
            pd_df[
                [
                    "variantId",
                    "chromosome",
                    "position",
                ]
            ],
            schema=schema,
        )

        return SusieFineMapperStep.susie_inf_to_studylocus(
            susie_output=susie_output,
            session=session,
            studyId=studyId,
            region=region,
            variant_index=variant_index,
        )
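
The data-preparation steps in `susie_finemapper_one_studylocus_row` can be illustrated without Spark or Hail. The sketch below is a plain-Python analogue under stated assumptions, not the PR's implementation: `LdEntry`, `build_region`, `build_variant_id` and `align_z_to_ld` are hypothetical names, and a dictionary and a list stand in for the Spark DataFrames. It shows how the region string is formed, how the `variantId` join key is assembled, and why the joined rows are sorted by `idx`, so that the z-score vector follows the same order as the rows of the LD matrix.

```python
from typing import NamedTuple, Optional


class LdEntry(NamedTuple):
    """Hypothetical stand-in for one row of the LD index."""

    variant_id: str
    idx: int  # row/column of this variant in the LD matrix


def build_region(chromosome: str, position: int, window: int = 1_000_000) -> str:
    """Format 'chromosome:start-end' centred on position, mirroring the diff."""
    half = window / 2
    # Like the expression in the diff, this does not clamp the start at 0,
    # so a position smaller than window / 2 gives a negative start.
    return f"{chromosome}:{int(position - half)}-{int(position + half)}"


def build_variant_id(chromosome: str, position: int, alleles: list[str]) -> str:
    """Assemble the chromosome_position_ref_alt key used for the join."""
    return f"{chromosome}_{position}_{alleles[0]}_{alleles[1]}"


def align_z_to_ld(
    gwas: dict[str, tuple[Optional[float], Optional[float]]],
    ld_index: list[LdEntry],
) -> list[tuple[int, str, float]]:
    """Inner-join on variant ID, drop unusable z-scores, sort by LD index.

    `gwas` maps variant ID to (beta, standard error); this layout is an
    assumption made for the sketch.
    """
    rows = []
    for entry in ld_index:
        stats = gwas.get(entry.variant_id)
        if stats is None:
            continue  # variant missing from the summary statistics
        beta, standard_error = stats
        if beta is None or not standard_error:
            continue  # null or zero standard error: no usable z-score
        rows.append((entry.idx, entry.variant_id, beta / standard_error))
    # Sorting by idx keeps the z-scores in the same order as the LD rows.
    rows.sort(key=lambda row: row[0])
    return rows
```

In this sketch the `idx` values of the returned rows are what one would use to subset the LD matrix, so that `z` and `LD` passed to the fine-mapping routine have matching dimensions and ordering.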

    @staticmethod
    def susie_inf_to_studylocus(
        susie_output: dict[str, Any],
        session: Session,
-        _studyId: str,
-        _region: str,
+        studyId: str,
+        region: str,
        variant_index: DataFrame,
        cs_lbf_thr: float = 2,
    ) -> StudyLocus:
@@ -33,8 +156,8 @@ def susie_inf_to_studylocus(
        Args:
            susie_output (dict[str, Any]): SuSiE-inf output dictionary
            session (Session): Spark session
-            _studyId (str): study ID
-            _region (str): region
+            studyId (str): study ID
+            region (str): region
            variant_index (DataFrame): DataFrame with variant information
            cs_lbf_thr (float): credible set logBF threshold, default is 2

@@ -44,6 +167,7 @@ def susie_inf_to_studylocus(
        variants = np.array(
            [row["variantId"] for row in variant_index.select("variantId").collect()]
        ).reshape(-1, 1)

        PIPs = susie_output["PIP"]
        lbfs = susie_output["lbf_variable"]
        mu = susie_output["mu"]
@@ -74,6 +198,7 @@ def susie_inf_to_studylocus(
            win = Window.rowsBetween(
                Window.unboundedPreceding, Window.unboundedFollowing
            )

            cred_set = (
                session.spark.createDataFrame(
                    cred_set.tolist(),
@@ -104,8 +229,8 @@ def susie_inf_to_studylocus(
                .limit(1)
                .withColumns(
                    {
-                        "studyId": f.lit(_studyId),
-                        "region": f.lit(_region),
+                        "studyId": f.lit(studyId),
+                        "region": f.lit(region),
                        "credibleSetIndex": f.lit(counter),
                        "credibleSetlog10BF": f.lit(cs_lbf_value * 0.4342944819),
                        "finemappingMethod": f.lit("SuSiE-inf"),
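
The constant `0.4342944819` in the `credibleSetlog10BF` column matches log10(e), which suggests that the credible-set `lbf` values are natural-log Bayes factors being converted to base 10. The sketch below assumes that reading; `LOG10_E` and `ln_bf_to_log10_bf` are hypothetical names that do not appear in the PR.

```python
import math

# log10(e) = 1 / ln(10); the diff hard-codes this value as 0.4342944819.
LOG10_E = math.log10(math.e)


def ln_bf_to_log10_bf(ln_bf: float) -> float:
    """Convert a natural-log Bayes factor to a log10 Bayes factor."""
    return ln_bf * LOG10_E
```

Under the same assumption, the default `cs_lbf_thr` of 2 is expressed in natural-log units and corresponds to a log10 Bayes factor of roughly 0.87.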
5 changes: 3 additions & 2 deletions tests/gentropy/method/test_susie_inf.py
@@ -68,11 +68,12 @@ def test_SUSIE_inf_convert_to_study_locus(
            est_tausq=False,
        )
        gwas_df = sample_summary_statistics._df.limit(21)

        L1 = SusieFineMapperStep.susie_inf_to_studylocus(
            susie_output=susie_output,
            session=session,
-            _studyId="sample_id",
-            _region="sample_region",
+            studyId="sample_id",
+            region="sample_region",
            variant_index=gwas_df,
            cs_lbf_thr=2,
        )