Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(finngen_study_index): improved tests for finngen study index #776

Merged
merged 9 commits into from
Sep 24, 2024

Conversation

project-defiant
Copy link
Contributor

@project-defiant project-defiant commented Sep 20, 2024

✨ Context

This PR closes opentargets/issues#3484

The overall issue affecting CI tests was the get request to the https://r11.finngen.fi/api/phenos.
In addition to this, I have found a few more things worth resolving and improved the test coverage of the step overall.

🛠 What does this PR implement

This PR implements following things:

  • mock of urllib.request.urlopen function to mimic get requests for
  • addition of step_tests marks to the pytest to be able to isolate integration tests from unit tests
  • refactor of the finngen study index generation tests to use mock request objects
  • update of finngen sampleSize to match R11 release and extracted it as a required parameter for step function, so it gets updated each time we would bump the release
  • drop of default parameters from downstream functions that generete study index

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@github-actions github-actions bot added bug Something isn't working size-M Step Datasource labels Sep 20, 2024
@project-defiant project-defiant marked this pull request as ready for review September 20, 2024 10:37
@project-defiant project-defiant requested review from ireneisdoomed, d0choa and DSuveges and removed request for d0choa September 20, 2024 10:37
@project-defiant
Copy link
Contributor Author

@DSuveges tagged you as I want to understand if non changed sample size is going to affect much of downstream processing. If so, the previous size was used from R9 (R10 also used R9 sample size in our dags)

@DSuveges
Copy link
Contributor

@DSuveges tagged you as I want to understand if non changed sample size is going to affect much of downstream processing. If so, the previous size was used from R9 (R10 also used R9 sample size in our dags)

The sample size doesn't have any downstream application besides showing on the UI. Relative sample sizes (ldPopulations) are used to get LD set for mixed ancestry GWASes, but this is not applicable for FinnGen.

Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test suite, however it would likely to be a pain to actualise when moving on to R12.

@project-defiant
Copy link
Contributor Author

project-defiant commented Sep 23, 2024

Great test suite, however it would likely to be a pain to actualise when moving on to R12.

What do you mean exactly? What will be hard to actualise ? I expect that the phenos that will eventually come from https://r12.finngen.fi/api/phenos will be simillar to the ones in https://r10.finngen.fi/api/phenos and https://r11.finngen.fi/api/phenos. This would mean that we eventually should just update the sampleSize and initialSampleSize

@DSuveges
Copy link
Contributor

Ah, OK, so these example datasets in the test:

            {
                "assoc_files": [
                    "/cromwell_root/pheweb/generated-by-pheweb/pheno_gz/GLUCOSE.gz"
                ],
                "category": "Glucose",
                "category_index": 28,
                "gc_lambda": {
                    "0.001": 1.1251,
                    "0.01": 1.062,
                    "0.1": 1.0531,
                    "0.5": 1.0599,
                },
                "num_cases": 43764,
                "num_cases_prev": 39231,
                "num_controls": 409969,
                "num_controls_prev": 372950,
                "num_gw_significant": 3,
                "num_gw_significant_prev": 3,
                "phenocode": "GLUCOSE",
                "phenostring": "Glucose",
            }

Serves as mock for modelling the schema.

@project-defiant
Copy link
Contributor Author

@DSuveges tagged you as I want to understand if non changed sample size is going to affect much of downstream processing. If so, the previous size was used from R9 (R10 also used R9 sample size in our dags)

The sample size doesn't have any downstream application besides showing on the UI. Relative sample sizes (ldPopulations) are used to get LD set for mixed ancestry GWASes, but this is not applicable for FinnGen.

@DSuveges just to clear things up.
Am I understanding correctly , that none of ldPopulationStructure nor sampleSize are used downstream in case of finngen? The ldPopulationStructure actually relies on discoverySamples which relies on the provided (previously hardcoded) sampleSize.

Copy link
Contributor

@ireneisdoomed ireneisdoomed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR improves extensively the tests to generate a study index from Finngen, and updates the sample size values to match Finngen R11. I expect that tests won't fail when a new release is out as long as the input format is the same.
Approving, but please see my comments

@project-defiant project-defiant merged commit 6c4bdf5 into dev Sep 24, 2024
5 checks passed
@project-defiant project-defiant deleted the mock-finngen-study-index-input-api-call branch September 24, 2024 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Datasource size-M Step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make a mock of Finngen study index resource url.
3 participants