fix(finngen_study_index): improved tests for finngen study index #776

project-defiant · 2024-09-20T10:28:09Z

✨ Context

The main issue affecting the CI tests was the GET request to https://r11.finngen.fi/api/phenos.
In addition, I found a few more things worth resolving and improved the overall test coverage of the step.

🛠 What does this PR implement

This PR implements the following:

- a mock of the urllib.request.urlopen function to mimic GET requests for
  - https://r11.finngen.fi/api/phenos (FinnGen phenotypes)
  - https://raw.githubusercontent.com/opentargets/curation/24.09.1/mappings/disease/manual_string.tsv (EFO table)
- addition of a step_tests pytest mark so that integration tests can be isolated from unit tests
- refactor of the FinnGen study index generation tests to use mock request objects
- update of the FinnGen sampleSize to match the R11 release, and its extraction into a required parameter of the step function, so that it is updated each time we bump the release
- removal of default parameters from the downstream functions that generate the study index
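As a rough illustration of the first item, the sketch below shows one way to replace urllib.request.urlopen with a stand-in that returns a canned payload, so a test never touches the network. It is not the code from this PR: `fetch_phenotypes` and `fake_urlopen` are hypothetical helpers introduced only for this example, and the single-record payload is a trimmed version of the GLUCOSE record discussed later in this thread.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch

PHENOS_URL = "https://r11.finngen.fi/api/phenos"


def fetch_phenotypes(url):
    """Hypothetical helper: download and decode the phenotype JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_urlopen(payload):
    """Hypothetical helper: build a urlopen stand-in that serves `payload` as JSON bytes."""
    response = MagicMock()
    response.read.return_value = json.dumps(payload).encode("utf-8")
    # urlopen is used as a context manager, so __enter__ must hand back the response.
    response.__enter__.return_value = response
    return MagicMock(return_value=response)


# Trimmed sample record; a real response carries more fields.
sample = [
    {
        "phenocode": "GLUCOSE",
        "phenostring": "Glucose",
        "num_cases": 43764,
        "num_controls": 409969,
    }
]

# While the patch is active, no real HTTP request is made.
with patch("urllib.request.urlopen", fake_urlopen(sample)) as mocked:
    phenotypes = fetch_phenotypes(PHENOS_URL)

mocked.assert_called_once_with(PHENOS_URL)
```

Patching the attribute on the urllib.request module works here because the helper looks up `urllib.request.urlopen` at call time; code that does `from urllib.request import urlopen` would need the patch applied to the importing module instead.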

🙈 Missing

🚦 Before submitting

Do these changes cover one single feature (one change at a time)?
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes?
Did you make sure there is no commented out code in this PR?
Did you follow conventional commits standards in PR title and commit messages?
Did you make sure the branch is up-to-date with the dev branch?
Did you write any new necessary tests?
Did you make sure the changes pass local tests (make test)?
Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

project-defiant · 2024-09-20T10:40:57Z

@DSuveges I tagged you because I want to understand whether an unchanged sample size would affect much of the downstream processing. If so, note that the previous value came from R9 (R10 also used the R9 sample size in our DAGs).

DSuveges · 2024-09-23T09:34:24Z

> @DSuveges I tagged you because I want to understand whether an unchanged sample size would affect much of the downstream processing. If so, note that the previous value came from R9 (R10 also used the R9 sample size in our DAGs).

The sample size doesn't have any downstream application besides being shown in the UI. Relative sample sizes (ldPopulations) are used to get the LD set for mixed-ancestry GWASes, but this is not applicable to FinnGen.

DSuveges

Great test suite; however, it will likely be a pain to update when moving on to R12.

project-defiant · 2024-09-23T09:56:42Z

> Great test suite; however, it will likely be a pain to update when moving on to R12.

What do you mean exactly? What will be hard to update? I expect that the phenos that will eventually come from https://r12.finngen.fi/api/phenos will be similar to the ones in https://r10.finngen.fi/api/phenos and https://r11.finngen.fi/api/phenos. This would mean that we should eventually only need to update sampleSize and initialSampleSize.

DSuveges · 2024-09-23T12:55:21Z

Ah, OK, so these example datasets in the test:

            {
                "assoc_files": [
                    "/cromwell_root/pheweb/generated-by-pheweb/pheno_gz/GLUCOSE.gz"
                ],
                "category": "Glucose",
                "category_index": 28,
                "gc_lambda": {
                    "0.001": 1.1251,
                    "0.01": 1.062,
                    "0.1": 1.0531,
                    "0.5": 1.0599,
                },
                "num_cases": 43764,
                "num_cases_prev": 39231,
                "num_controls": 409969,
                "num_controls_prev": 372950,
                "num_gw_significant": 3,
                "num_gw_significant_prev": 3,
                "phenocode": "GLUCOSE",
                "phenostring": "Glucose",
            }

serve as mocks for modelling the schema.
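If the record is treated as a schema mock, a test can assert on field names and types rather than on release-specific values, which is what should keep it stable across releases. The sketch below illustrates that idea only; `REQUIRED_FIELDS` and `missing_or_mistyped` are hypothetical names, and which fields the study index step actually needs is an assumption.

```python
# Assumed subset of fields and their expected Python types (not taken from the PR).
REQUIRED_FIELDS = {
    "phenocode": str,
    "phenostring": str,
    "category": str,
    "num_cases": int,
    "num_controls": int,
}


def missing_or_mistyped(record, required=REQUIRED_FIELDS):
    """Return the sorted names of fields that are absent or have an unexpected type."""
    return sorted(
        name
        for name, expected_type in required.items()
        if not isinstance(record.get(name), expected_type)
    )


# Values copied from the example record above.
glucose = {
    "category": "Glucose",
    "num_cases": 43764,
    "num_controls": 409969,
    "phenocode": "GLUCOSE",
    "phenostring": "Glucose",
}

# A record that would no longer satisfy the assumed schema.
broken = {"phenocode": "GLUCOSE", "num_cases": "43764"}

problems_ok = missing_or_mistyped(glucose)
problems_broken = missing_or_mistyped(broken)
```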

project-defiant · 2024-09-23T13:32:18Z

> > @DSuveges I tagged you because I want to understand whether an unchanged sample size would affect much of the downstream processing. If so, note that the previous value came from R9 (R10 also used the R9 sample size in our DAGs).
>
> The sample size doesn't have any downstream application besides being shown in the UI. Relative sample sizes (ldPopulations) are used to get the LD set for mixed-ancestry GWASes, but this is not applicable to FinnGen.

@DSuveges just to clear things up.
Am I understanding correctly that neither ldPopulationStructure nor sampleSize is used downstream in the case of FinnGen? ldPopulationStructure actually relies on discoverySamples, which in turn relies on the provided (previously hardcoded) sampleSize.
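To make the dependency concrete, here is a small sketch of why the absolute sampleSize should not change a relative population structure for a single-ancestry cohort. It assumes that a relative sample size is simply each population's share of the total; `relative_sample_sizes` is a hypothetical function, the field names only mirror the ones mentioned in this thread, and the sample sizes are placeholder numbers rather than real FinnGen release values.

```python
def relative_sample_sizes(discovery_samples):
    """Compute each ancestry's share of the total sample size (assumed definition)."""
    total = sum(sample["sampleSize"] for sample in discovery_samples)
    return [
        {
            "ancestry": sample["ancestry"],
            "relativeSampleSize": sample["sampleSize"] / total,
        }
        for sample in discovery_samples
    ]


# Placeholder sizes standing in for an older and a newer release.
older_release = relative_sample_sizes([{"ancestry": "Finnish", "sampleSize": 100_000}])
newer_release = relative_sample_sizes([{"ancestry": "Finnish", "sampleSize": 200_000}])
```

Under that assumption, both calls yield the same structure, because a single population always accounts for the whole cohort.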

ireneisdoomed

This PR extensively improves the tests for generating a study index from FinnGen and updates the sample size values to match FinnGen R11. I expect that the tests won't fail when a new release is out, as long as the input format stays the same.
Approving, but please see my comments.

tests/gentropy/datasource/finngen/test_finngen_study_index.py

fix(finngen_study_index): improved tests for finngen study index

be0f308

github-actions bot added the bug, size-M, Step, and Datasource labels Sep 20, 2024

chore(tests): added pytest mark to be able to isolate step tests

7276c1f

project-defiant marked this pull request as ready for review September 20, 2024 10:37

project-defiant requested review from ireneisdoomed, d0choa and DSuveges and removed request for d0choa September 20, 2024 10:37

project-defiant added 2 commits September 20, 2024 13:24

Merge branch 'dev' into mock-finngen-study-index-input-api-call

d045eec

Merge branch 'dev' into mock-finngen-study-index-input-api-call

2b77829

DSuveges reviewed Sep 23, 2024

project-defiant added 2 commits September 23, 2024 19:28

Merge branch 'dev' into mock-finngen-study-index-input-api-call

739d1e2

Merge branch 'dev' into mock-finngen-study-index-input-api-call

b011329

ireneisdoomed approved these changes Sep 24, 2024

project-defiant and others added 3 commits September 24, 2024 23:40

Merge branch 'dev' into mock-finngen-study-index-input-api-call

0f36704

chore: pr comments

5b4064a

feat: revert mock.patch

03ea00d

project-defiant merged commit 6c4bdf5 into dev Sep 24, 2024
5 checks passed

project-defiant deleted the mock-finngen-study-index-input-api-call branch September 24, 2024 22:16
