chore: trigger release process #595

Merged
merged 95 commits into from
May 17, 2024
95 commits
ba956c7
chore: small updates to accomodate GWAS Catalog for feb release
DSuveges Feb 27, 2024
655a5f3
fix: updating cohort parser to new GWAS Catalog format
DSuveges Feb 28, 2024
d4dd97e
fix: updating study index generation from gwas flat files
DSuveges Mar 4, 2024
97bf3f0
Merge branch 'dev' into ds_gwas_update
ireneisdoomed Mar 6, 2024
c2e3e91
chore: resolve merge commit
DSuveges Mar 7, 2024
9b64cf8
chore: bumping for 24.03 release
DSuveges Mar 7, 2024
9ee712a
fix: reverting some changes
DSuveges Mar 7, 2024
6e62c4b
chore: add tuqtl features to L2Gfeaturematrix schema
ireneisdoomed Mar 8, 2024
fda650b
chore: add sqtl, tuqtl, pqtl features to l2g inclusion list
ireneisdoomed Mar 8, 2024
4dcf542
Merge pull request #529 from opentargets/il-l2g-config-changes
DSuveges Mar 8, 2024
02dc66f
feat: extract credible sets and studies from all eQTL Catalogue finem…
ireneisdoomed Mar 6, 2024
0b0db5e
fix: fixing merge conflict
DSuveges Mar 8, 2024
8f91f3a
revert: make setup dev not to update pre-commits (#524)
d0choa Mar 7, 2024
af16331
chore: update ruff pre-commit and rules (#522)
d0choa Mar 7, 2024
f6e0ad0
fix: pr labeller patterns (#523)
d0choa Mar 7, 2024
8257ea3
fix: fixing merge conflict
DSuveges Mar 8, 2024
d70a8ef
feat(coloc): single SNP case (#511)
xyg123 Mar 7, 2024
4d72946
fix: sorting out merge conflicts
DSuveges Mar 8, 2024
45750c5
fix: removing fsspec from dependencies
DSuveges Mar 8, 2024
4da9462
Merge branch 'dev' into ds_gwas_update
DSuveges Mar 8, 2024
50af4d2
fix: fixing lock file
DSuveges Mar 8, 2024
a7c919a
fix: updating gwas catalog test files following schema changes
DSuveges Mar 8, 2024
fa938fd
Merge pull request #507 from opentargets/ds_gwas_update
DSuveges Mar 8, 2024
8f0c6d5
build(deps-dev): bump ruff from 0.2.0 to 0.3.2 (#531)
dependabot[bot] Mar 11, 2024
317a09c
build(deps-dev): bump mypy from 1.8.0 to 1.9.0 (#532)
dependabot[bot] Mar 11, 2024
5cd4602
fix: pin version of commitlint (#533)
louwenjjr Mar 11, 2024
f12d4ea
Merge branch 'dev' into ds_3238_update_ETL_DAG
DSuveges Mar 11, 2024
06c6117
fix: read function for thurman data to include first line (#534)
louwenjjr Mar 12, 2024
43dd479
feat(sumstat imputation): adding class for sumstat imputation (#490)
addramir Mar 12, 2024
088895a
Merge branch 'dev' into ds_3238_update_ETL_DAG
DSuveges Mar 12, 2024
db73c23
chore: apply suggestions from code review
DSuveges Mar 13, 2024
6fa81a6
refactor: tidying up the ETL DAG
DSuveges Mar 13, 2024
8b312ef
ci: bugfix in PR labeller (#537)
d0choa Mar 13, 2024
1095c67
chore: pre-commit autoupdate (#539)
pre-commit-ci[bot] Mar 13, 2024
cd3324a
Merge branch 'dev' into ds_3238_update_ETL_DAG
DSuveges Mar 13, 2024
cae2e40
fix(dag): sorting out the checks for the existence of GCP folders
DSuveges Mar 14, 2024
d4f2792
Merge branch 'ds_3238_update_ETL_DAG' of https://github.com/opentarge…
DSuveges Mar 14, 2024
4dcecc3
Merge pull request #528 from opentargets/ds_3238_update_ETL_DAG
DSuveges Mar 14, 2024
01abe96
build(deps-dev): bump deptry from 0.12.0 to 0.14.0 (#548)
dependabot[bot] Mar 18, 2024
eaefc98
build(deps-dev): bump mkdocstrings-python from 1.8.0 to 1.9.0 (#549)
dependabot[bot] Mar 18, 2024
b3a5664
chore(l2g): log annotated gold standards in w&b (#546)
ireneisdoomed Mar 18, 2024
77976b5
perf(l2g): streamline feature generation (#544)
ireneisdoomed Mar 18, 2024
923c622
chore: pre-commit autoupdate (#550)
pre-commit-ci[bot] Mar 19, 2024
160051c
feat(l2g): distance features based on weighted score (#545)
ireneisdoomed Mar 19, 2024
ad50c15
perf(clump): refactored window based clumping (#492)
d0choa Mar 20, 2024
650bb2e
feat: notebook to run qc metrics each release (#541)
xyg123 Mar 20, 2024
dee3085
fix: small fixes in susie defaults (#552)
addramir Mar 21, 2024
512a80a
test(method): improved performance in coloc tests (#536)
xyg123 Mar 21, 2024
232b1e0
perf(l2g): optimise extraction of features from colocalisation result…
ireneisdoomed Mar 22, 2024
8f9d268
fix(coloc): handle cases when the bayes factors are null (#556)
ireneisdoomed Mar 22, 2024
255c42d
fix(sumstats): correct study id for dir of finngen studies (#551)
louwenjjr Mar 26, 2024
d76ebbe
feat: add the step class for fine-mapping (#554)
addramir Apr 2, 2024
56067e7
feat: LD index and block matrix extraction for a studyLocus (#463)
Daniel-Considine Apr 3, 2024
e1d20f3
feat: the FineMapper function for one locus (#564)
addramir Apr 5, 2024
b76fd07
feat: susie_finemapper_ss_gathered() (#567)
Daniel-Considine Apr 5, 2024
86600b0
feat: add FM step with carma and sumstat imputation (#568)
addramir Apr 9, 2024
dc4e367
fix: adding deduplication for GWAS in locus (#573)
addramir Apr 10, 2024
a5b62f2
feat: add benchmarking for fine-mapping using Alzheimer as example (#…
addramir Apr 10, 2024
ecd8063
fix: removing all duplicated variants in sumstats for finemapping fun…
Daniel-Considine Apr 10, 2024
7ed4703
feat: adding notebook for mapping EFOs for the FinnGen study index (#…
addramir Apr 14, 2024
900dd64
feat: adding init to finemapping step (#577)
Daniel-Considine Apr 23, 2024
78fcf1b
feat: dockerise gentropy python package (#579)
ireneisdoomed Apr 23, 2024
82b8a7c
feat: updating step config file (#580)
Daniel-Considine Apr 23, 2024
b0c4530
fix: update error in config.py (#583)
Daniel-Considine Apr 23, 2024
28a067c
feat: changing locus window to locus radius to be consistent with oth…
Daniel-Considine Apr 23, 2024
e2f8e87
fix: minor updates and bug fixes (#543)
DSuveges Apr 24, 2024
cf184f8
fix: updating config.py argument for finemapper (#584)
Daniel-Considine Apr 24, 2024
bcc9a36
feat(sumstat qc): adding methods for QC of summary statistics (#455)
addramir Apr 24, 2024
05d21bc
feat: susie_finemapper_one_studylocus_row_v3_dev_ss_gathered (#586)
addramir Apr 25, 2024
a88f16c
feat: functionality added to StudyLocus.find_overlaps() for finding w…
Daniel-Considine Apr 26, 2024
da2c75d
feat: github action to upload docker image to registry (#588)
d0choa Apr 26, 2024
df75870
feat: lighter dockerfile (#585)
d0choa Apr 26, 2024
b83a8aa
fix: docker action fixes (#589)
d0choa Apr 26, 2024
c89cd57
fix: docker action fixes (#590)
d0choa Apr 26, 2024
70e5e26
fix: docker action fixes v3 (#591)
d0choa Apr 26, 2024
d163215
build(deps-dev): bump deptry from 0.14.0 to 0.16.1 (#570)
dependabot[bot] Apr 29, 2024
1c341c2
build(deps-dev): bump lxml from 5.1.0 to 5.2.1 (#569)
dependabot[bot] Apr 29, 2024
dce499f
chore: pre-commit autoupdate (#561)
pre-commit-ci[bot] Apr 29, 2024
e58ad7f
build(deps-dev): bump pre-commit from 3.6.0 to 3.7.0 (#559)
dependabot[bot] Apr 29, 2024
60c0003
build(deps-dev): bump pytest-cov from 4.1.0 to 5.0.0 (#560)
dependabot[bot] Apr 29, 2024
c468fe0
build(deps-dev): bump python-semantic-release from 9.1.0 to 9.4.1 (#571)
dependabot[bot] Apr 29, 2024
23a5fae
chore: pre-commit autoupdate (#593)
pre-commit-ci[bot] Apr 30, 2024
690d158
feat: add purity qc metrics to fine-mapping (#592)
addramir Apr 30, 2024
962b79c
feat: check for if no overlapping variants in LD index (#594)
Daniel-Considine May 1, 2024
85c3aa8
chore: pre-commit autoupdate (#601)
pre-commit-ci[bot] May 7, 2024
a9ef7c0
build(deps-dev): bump pymdown-extensions from 10.7 to 10.8.1 (#600)
dependabot[bot] May 7, 2024
d34b7d7
build(deps-dev): bump interrogate from 1.5.0 to 1.7.0 (#599)
dependabot[bot] May 7, 2024
7af1cbc
build(deps-dev): bump pytest from 8.1.0 to 8.2.0 (#598)
dependabot[bot] May 7, 2024
9f6e6b9
build(deps-dev): bump pytest-xdist from 3.5.0 to 3.6.1 (#597)
dependabot[bot] May 8, 2024
d89ec86
build(deps-dev): bump ruff from 0.3.2 to 0.4.3 (#596)
dependabot[bot] May 8, 2024
bc1a112
feat(airflow): include COLOC as a node in the DAG (#530)
ireneisdoomed May 15, 2024
1576cf4
build(deps): bump typing-extensions from 4.10.0 to 4.11.0 (#602)
dependabot[bot] May 15, 2024
8bc09c8
build(deps-dev): bump ipython from 8.22.1 to 8.24.0 (#603)
dependabot[bot] May 15, 2024
c066327
build(deps-dev): bump mypy from 1.9.0 to 1.10.0 (#604)
dependabot[bot] May 15, 2024
bb9f9c6
refactor: moving all variant coordinates to GnomAD (#566)
DSuveges May 15, 2024
2 changes: 1 addition & 1 deletion .github/labeler.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
version: "1"
version: 1
labels:
- label: "size-XS"
size:
Expand Down
39 changes: 39 additions & 0 deletions .github/workflows/artifact.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
name: Build and Push to Artifact Registry

"on":
push:
branches: ["dev"]

env:
PROJECT_ID: open-targets-genetics-dev
REGION: europe-west1
GAR_LOCATION: europe-west1-docker.pkg.dev/open-targets-genetics-dev
IMAGE_NAME: gentropy-app

jobs:
build-push-artifact:
runs-on: ubuntu-latest
steps:
- name: "Checkout"
uses: "actions/checkout@v3"

- name: "auth"
uses: "google-github-actions/auth@v2"
with:
credentials_json: "${{ secrets.SERVICE_ACCOUNT_KEY }}"

- name: "Set up Cloud SDK"
uses: "google-github-actions/setup-gcloud@v2"

- name: "Use gcloud CLI"
run: "gcloud info"

- name: "Docker auth"
run: |-
gcloud auth configure-docker ${{ env.REGION }}-docker.pkg.dev --quiet

- name: Build image
run: docker build . --tag "${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}/gentropy:${{ github.ref_name }}"

- name: Push image
run: docker push "${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}/gentropy:${{ github.ref_name }}"
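The build and push steps above use the same image reference, assembled from `GAR_LOCATION`, `IMAGE_NAME` and `github.ref_name`. As a minimal sketch of how that string is put together (the helper name `image_reference` is hypothetical and not part of the workflow):

```python
def image_reference(gar_location: str, image_name: str, ref_name: str) -> str:
    """Compose the image reference used by both `docker build` and `docker push`.

    Mirrors "${GAR_LOCATION}/${IMAGE_NAME}/gentropy:${ref_name}" from the workflow.
    """
    return f"{gar_location}/{image_name}/gentropy:{ref_name}"


if __name__ == "__main__":
    # Values copied from the workflow's `env` block; "dev" is the only branch
    # that triggers the workflow.
    print(
        image_reference(
            "europe-west1-docker.pkg.dev/open-targets-genetics-dev",
            "gentropy-app",
            "dev",
        )
    )
```

Because the workflow only runs on pushes to `dev`, the tag is always `dev`. A branch name containing `/` would not be a valid Docker tag, so widening the trigger would need a sanitising step.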
14 changes: 7 additions & 7 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ ci:
skip: [poetry-lock]
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.0
rev: v0.4.3
hooks:
- id: ruff
args:
Expand All @@ -15,7 +15,7 @@ repos:
files: ^((gentropy|utils|tests)/.+)?[^/]+\.py$

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
rev: v4.6.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
Expand Down Expand Up @@ -59,14 +59,14 @@ repos:
exclude: "CHANGELOG.md"

- repo: https://github.com/alessandrojcm/commitlint-pre-commit-hook
rev: v9.11.0
rev: v9.16.0
hooks:
- id: commitlint
additional_dependencies: ["@commitlint/config-conventional"]
additional_dependencies: ["@commitlint/config-conventional@18.6.3"]
stages: [commit-msg]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: "v1.8.0"
rev: "v1.10.0"
hooks:
- id: mypy
args:
Expand All @@ -82,7 +82,7 @@ repos:
- "--disallow-untyped-defs"

- repo: https://github.com/econchick/interrogate
rev: 1.5.0
rev: 1.7.0
hooks:
- id: interrogate
args: [--verbose]
Expand All @@ -104,7 +104,7 @@ repos:
- id: pydoclint

- repo: https://github.com/python-poetry/poetry
rev: "1.8.2"
rev: "1.8.0"
hooks:
- id: poetry-check
- id: poetry-lock
Expand Down
33 changes: 33 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
FROM python:3.10-bullseye


RUN apt-get update && \
apt-get install -y openjdk-11-jdk && \
apt-get clean && rm -rf /var/lib/apt/lists/*

RUN java -version

# Set environment variables for Java
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
ENV PATH=$PATH:$JAVA_HOME/bin

RUN pip install poetry==1.7.1

ENV POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_IN_PROJECT=1 \
POETRY_VIRTUALENVS_CREATE=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app

COPY pyproject.toml poetry.lock ./
RUN touch README.md

RUN poetry config installer.max-workers 10
RUN poetry install --without dev,docs,tests --no-root --no-interaction --no-ansi -vvv && rm -rf $POETRY_CACHE_DIR

COPY src ./src

RUN poetry install --without dev,docs,tests

ENTRYPOINT ["poetry", "run", "gentropy"]
8 changes: 4 additions & 4 deletions config/datasets/ot_gcp.yaml
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# Release specific configuration:
release_version: "24.01"
release_version: "24.03"
dev_version: XX.XX
release_folder: gs://genetics_etl_python_playground/releases/${datasets.release_version}

inputs: gs://genetics_etl_python_playground/input
static_assets: gs://genetics_etl_python_playground/static_assetss
static_assets: gs://genetics_etl_python_playground/static_assets
outputs: gs://genetics_etl_python_playground/output/python_etl/parquet/${datasets.dev_version}

## Datasets:
Expand Down Expand Up @@ -36,9 +36,9 @@ anderson: ${datasets.static_assets}/andersson2014/enhancer_tss_associations.bed
javierre: ${datasets.static_assets}/javierre_2016_preprocessed
jung: ${datasets.static_assets}/jung2019_pchic_tableS3.csv
thurman: ${datasets.static_assets}/thurman2012/genomewideCorrs_above0.7_promoterPlusMinus500kb_withGeneNames_32celltypeCategories.bed8.gz
target_index: ${datasets.release_folder}/targets # OTP 23.12 data
target_index: ${datasets.static_assets}/targets # OTP 23.12 data
gene_interactions: ${datasets.static_assets}/interaction # OTP 23.12 data

gene_interactions: ${datasets.release_folder}/interaction # OTP 23.12 data
finngen_finemapping_results_path: ${datasets.inputs}/Finngen_susie_finemapping_r10/full
finngen_finemapping_summaries_path: ${datasets.inputs}/Finngen_susie_finemapping_r10/Finngen_susie_credset_summary_r10.tsv

Expand Down
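The dataset paths in `ot_gcp.yaml` are built from `${datasets.…}` references, so the `static_assetss` typo and the move of `target_index` to `static_assets` change every path derived from them. The sketch below resolves such references by hand against a nested dict. It is a simplified stand-in written for illustration, not the configuration library the project uses, and the `resolve` helper is hypothetical.

```python
import re

# Matches ${some.dotted.key}
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, config: dict, max_depth: int = 10) -> str:
    """Expand ${a.b} references in `value` by looking them up in `config`."""

    def lookup(match: re.Match) -> str:
        node = config
        for key in match.group(1).split("."):
            node = node[key]  # raises KeyError for an unknown key
        return str(node)

    depth = 0
    while _REF.search(value):
        if depth == max_depth:
            raise ValueError(f"unresolved or cyclic reference in {value!r}")
        value = _REF.sub(lookup, value)
        depth += 1
    return value


# A small subset of the values shown in the diff (post-change side).
config = {
    "datasets": {
        "release_version": "24.03",
        "release_folder": "gs://genetics_etl_python_playground/releases/${datasets.release_version}",
        "static_assets": "gs://genetics_etl_python_playground/static_assets",
        "target_index": "${datasets.static_assets}/targets",
    }
}

if __name__ == "__main__":
    for name in ("release_folder", "target_index"):
        print(name, "->", resolve(config["datasets"][name], config))
```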
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ defaults:
credible_set_path: ${datasets.credible_set}
study_index_path: ${datasets.study_index}
coloc_path: ${datasets.colocalisation}
colocalisation_method: Coloc
7 changes: 7 additions & 0 deletions config/step/ot_colocalisation_ecaviar.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
defaults:
- colocalisation

credible_set_path: ${datasets.credible_set}
study_index_path: ${datasets.study_index}
coloc_path: ${datasets.colocalisation}
colocalisation_method: ECaviar
2 changes: 1 addition & 1 deletion config/step/ot_variant_index.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@ defaults:
- variant_index

variant_annotation_path: ${datasets.variant_annotation}
credible_set_path: ${datasets.study_locus}
credible_set_path: ${datasets.credible_set}
variant_index_path: ${datasets.variant_index}
28 changes: 28 additions & 0 deletions docs/python_api/methods/sumstat_imputation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
---
title: Summary Statistics Imputation
---

Summary statistics imputation leverages linkage disequilibrium (LD) information to compute Z-scores of missing SNPs from neighbouring observed SNPs.

We implemented the basic model from the RAISS (Robust and Accurate Imputation from Summary Statistics) package (see the original [paper](https://academic.oup.com/bioinformatics/article/35/22/4837/5512360)).

The full repository for the RAISS package can be found [here](https://gitlab.pasteur.fr/statistical-genetics/raiss).

The original model was suggested in 2014 by Bogdan Pasaniuc et al. [here](https://pubmed.ncbi.nlm.nih.gov/24990607/).

The model is described by the following formula:

E(z_i|z_t) = M\_{i,t} \cdot (M\_{t,t})^{-1} \cdot z_t

Where:

- E(z_i|z_t) represents the expected z-score of SNP 'i' given the observed z-scores at known SNP indexes 't'.

- M\_{i,t} represents the LD (Linkage Disequilibrium) matrix between SNP 'i' and the known SNPs at indexes 't'.

- (M\_{t,t})^{-1} represents the inverse of the LD matrix of the known SNPs at indexes 't'.

- z_t represents the vector of observed z-scores at the known SNP indexes 't'.

:::gentropy.method.sumstat_imputation.SummaryStatisticsImputation
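The imputation formula in `sumstat_imputation.md` can be sketched in a few lines of NumPy. This is an illustration of the formula only, not the `SummaryStatisticsImputation` implementation: the function name `impute_z_scores` and the optional `ridge` term (added to the diagonal of M_{t,t} to keep the solve stable) are assumptions made for this example.

```python
import numpy as np


def impute_z_scores(ld_it, ld_tt, z_t, ridge: float = 0.0) -> np.ndarray:
    """Compute E(z_i | z_t) = M_{i,t} (M_{t,t})^{-1} z_t.

    ld_it: LD between the SNPs to impute (rows) and the observed SNPs (columns).
    ld_tt: LD matrix among the observed SNPs.
    z_t:   observed z-scores.
    ridge: hypothetical regularisation added to the diagonal of ld_tt.
    """
    ld_it = np.asarray(ld_it, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    z_t = np.asarray(z_t, dtype=float)

    regularised = ld_tt + ridge * np.eye(ld_tt.shape[0])
    # Solve M_tt x = z_t rather than forming the inverse explicitly.
    return ld_it @ np.linalg.solve(regularised, z_t)


if __name__ == "__main__":
    ld_tt = [[1.0, 0.5], [0.5, 1.0]]  # two observed SNPs
    ld_it = [[0.5, 0.25]]             # one SNP to impute
    z_t = [2.0, -1.0]
    print(impute_z_scores(ld_it, ld_tt, z_t))
```

Using `np.linalg.solve` avoids an explicit matrix inverse, which is generally the more numerically stable choice when M_{t,t} is close to singular.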
18 changes: 18 additions & 0 deletions docs/python_api/methods/sumstat_quality_controls.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
title: QC of GWAS Summary Statistics
---

This class implements general quality control checks for GWAS with full summary statistics.
The following checks are included:

1. Genomic control lambda (median of the observed Chi2 statistics divided by the expected median of a Chi2 distribution with df=1). Lambda should be reasonably close to 1, and ideally not bigger than 2.

2. P-Z check: the linear regression between log10 of reported p-values and log10 of p-values inferred from betas and standard errors. Intercept of the regression should be close to 0, slope close to 1.

3. Mean beta check: mean of beta. Should be close to 0.

4. The N_eff check: estimates the ratio between the effective sample size and the expected one and checks its distribution. It can be run only if the effective allele frequency is provided in the study. The median ratio is always close to 1; the standard error should be close to 0.

5. Number of SNPs and number of significant SNPs.

:::gentropy.method.sumstat_quality_controls.SummaryStatisticsQC
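The first three checks listed in `sumstat_quality_controls.md` can be sketched with the Python standard library alone. This is an illustration of the definitions, not the `SummaryStatisticsQC` implementation: all function names are hypothetical, and the P-Z regression direction (reported p-values regressed on inferred ones) is an assumption.

```python
from math import log10
from statistics import NormalDist, mean, median

_NORMAL = NormalDist()
# Median of a Chi2 distribution with df=1: the squared 75th percentile of N(0, 1).
CHI2_DF1_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_control_lambda(z_scores) -> float:
    """Median observed Chi2 statistic divided by the expected median (df=1)."""
    return median(z * z for z in z_scores) / CHI2_DF1_MEDIAN


def p_value_from_z(z: float) -> float:
    """Two-sided p-value; underflows to 0 for very large |z|."""
    return 2.0 * _NORMAL.cdf(-abs(z))


def pz_check(reported_p, betas, standard_errors):
    """Least-squares fit of log10(reported p) on log10(p inferred from beta/se).

    Returns (intercept, slope); expected to be close to (0, 1).
    """
    x = [log10(p_value_from_z(b / se)) for b, se in zip(betas, standard_errors)]
    y = [log10(p) for p in reported_p]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    betas = [0.10, 0.20, 0.30, -0.40]
    ses = [0.10, 0.10, 0.10, 0.10]
    reported = [p_value_from_z(b / s) for b, s in zip(betas, ses)]
    print("lambda:", genomic_control_lambda(b / s for b, s in zip(betas, ses)))
    print("P-Z (intercept, slope):", pz_check(reported, betas, ses))
    print("mean beta:", mean(betas))
```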
2 changes: 1 addition & 1 deletion docs/src_snippets/howto/python_api/c_applying_methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ def apply_class_method_clumping(summary_stats: SummaryStatistics) -> StudyLocus:
from gentropy.method.window_based_clumping import WindowBasedClumping

clumped_summary_statistics = WindowBasedClumping.clump(
summary_stats, window_length=500_000
summary_stats, distance=250_000
)
# --8<-- [end:apply_class_method_clumping]
return clumped_summary_statistics
Expand Down
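The snippet above replaces `window_length=500_000` with `distance=250_000`, which reads as a switch from a total window to a per-side distance around each lead variant. Under that assumption, greedy distance-based clumping can be sketched as follows. This is a plain-Python illustration, not the `WindowBasedClumping` implementation, and the `Variant` type and `clump_by_distance` function are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    p_value: float


def clump_by_distance(variants: Iterable[Variant], distance: int) -> List[Variant]:
    """Keep the most significant variants, dropping any variant that lies
    within `distance` bp (on either side) of an already selected lead."""
    leads: List[Variant] = []
    # Visit variants from most to least significant.
    for candidate in sorted(variants, key=lambda v: v.p_value):
        if all(
            candidate.chromosome != lead.chromosome
            or abs(candidate.position - lead.position) > distance
            for lead in leads
        ):
            leads.append(candidate)
    return leads


if __name__ == "__main__":
    variants = [
        Variant("1", 1_000_000, 1e-10),
        Variant("1", 1_200_000, 1e-8),  # within 250 kb of the first lead
        Variant("1", 1_300_000, 1e-9),  # 300 kb away, becomes a second lead
        Variant("2", 1_000_000, 1e-6),  # different chromosome
    ]
    for lead in clump_by_distance(variants, distance=250_000):
        print(lead)
```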