Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(method): improved performance in coloc tests #536

Merged
merged 9 commits into from
Mar 21, 2024
Merged

test(method): improved performance in coloc tests #536

merged 9 commits into from
Mar 21, 2024

Conversation

xyg123
Copy link
Contributor

@xyg123 xyg123 commented Mar 12, 2024

✨ Context

Prev. implemented coloc unit tests were taking ~140 seconds, now using pytest, the coloc unit test time is ~45 secs.

🛠 What does this PR implement

Read in coloc test data using conftest.py.

🙈 Missing

None.

🚦 Before submitting

  • [ x] Do these changes cover one single feature (one change at a time)?
  • [ x] Did you read the contributor guideline?
  • [ x] Did you make sure to update the documentation with your changes?
  • [ x] Did you make sure there is no commented out code in this PR?
  • [ x] Did you follow conventional commits standards in PR title and commit messages?
  • [x ] Did you make sure the branch is up-to-date with the dev branch?
  • [ x] Did you write any new necessary tests?
  • [ x] Did you make sure the changes pass local tests (make test)?
  • [ x] Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@xyg123 xyg123 requested a review from d0choa March 12, 2024 11:39
@d0choa d0choa changed the title fix: pytest for coloc unit tests test(method): improved performance in coloc tests Mar 12, 2024
@d0choa
Copy link
Collaborator

d0choa commented Mar 12, 2024

@ireneisdoomed do you mind looking at this one when you have time? Returning a list of dataframes looks a bit off.

@ireneisdoomed
Copy link
Contributor

@d0choa will do!

def sample_data_for_coloc(spark: SparkSession) -> list[Any]:
"""Sample data for Coloc tests."""
overlap_df = spark.read.parquet(
"tests/gentropy/data_samples/coloc_test_data.snappy.parquet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was this file generated? For semantic tests, it's easier to understand if you create a data subset in the testing module directly.
Instead of reading a file of 500 rows, create a dataframe with 2 overlapping variants, for example.

The same testing function can be parametrised for both scenarios: associations that overlap on multiple SNPs, and on a single SNP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was directly extracted from the test dataset from the R package

ireneisdoomed and others added 3 commits March 13, 2024 09:41
* test(coloc): define fixtures and parametrise coloc tests

* test(coloc): compare dfs with assert_frame_equal
@ireneisdoomed
Copy link
Contributor

@xyg123 Do we need to update the expected results or can this be merged?

@xyg123
Copy link
Contributor Author

xyg123 commented Mar 20, 2024

@xyg123 Do we need to update the expected results or can this be merged?

Yes! I think so!
Apologies, I assumed check_exact=False, check_dtype=True meant it only checking if data types were the same, I didn't realise the function was still doing a results comparison with a floating point tolerance.

@xyg123 xyg123 merged commit 512a80a into dev Mar 21, 2024
4 checks passed
@xyg123 xyg123 deleted the pytest_coloc branch March 21, 2024 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants