MAF munging #10

Open
bschilder opened this issue Nov 13, 2019 · 5 comments

Comments

@bschilder

munge_polyfun_sumstats.py

In line 62, I noticed that when the columns are renamed, freq and MAF seem to be treated the same. But couldn't these two be different in a summary stats file? One way to handle this would be to check whether freq > 0.5 (equivalently, 1-freq < 0.5): if so, the other allele is the minor one, and you can flip the ref/alt alleles and effect (though the specifics of this might depend on the particular file format).
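
The check described above could be sketched roughly as follows. This is only an illustration, not code from munge_polyfun_sumstats.py: the `Variant` class and its field names (`a1`, `a2`, `freq`, `beta`) are hypothetical, the site is assumed to be biallelic, and the effect is assumed to be on a scale where flipping alleles just negates it (a beta or log odds ratio, not a raw odds ratio).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    # All field names here are hypothetical, chosen for illustration only.
    snp: str
    a1: str      # effect allele
    a2: str      # other allele
    freq: float  # frequency of a1; may lie anywhere in [0, 1]
    beta: float  # effect size of a1 (assumed sign-symmetric, e.g. a beta)


def minor_allele_frequency(freq: float) -> float:
    """Return the MAF: the smaller of the two allele frequencies."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError(f"allele frequency out of range: {freq}")
    return min(freq, 1.0 - freq)


def align_to_minor_allele(v: Variant) -> Variant:
    """Return a copy of v in which a1 is the minor allele.

    If a1 is the major allele (freq > 0.5), swap a1 and a2, replace
    freq with 1 - freq, and negate beta. Ties at exactly 0.5 are kept.
    """
    if not 0.0 <= v.freq <= 1.0:
        raise ValueError(f"allele frequency out of range: {v.freq}")
    if v.freq <= 0.5:
        return v
    return replace(v, a1=v.a2, a2=v.a1, freq=1.0 - v.freq, beta=-v.beta)


# Example: a1 has frequency 0.8, so it is the major allele and gets flipped.
example = Variant(snp="rs1", a1="A", a2="G", freq=0.8, beta=0.12)
flipped = align_to_minor_allele(example)
```

A file that reports odds ratios rather than betas would need the effect inverted instead of negated, which is one of the format-specific details mentioned above.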

@omerwe
Owner

omerwe commented Nov 14, 2019

Good point! The code actually treats MAF as FREQ (it does not assume the value is <0.5), so the name is wrong but the functionality is correct. I'll keep this issue open to remind myself to change the MAF name...

@bschilder
Author

Got it, thanks for clarifying!

@bschilder
Author

bschilder commented Mar 16, 2022

Was just looking back at this and realized we actually have some ways of addressing this now.

MungeSumstats does some inference of what each column means (including MAF/FRQ) and standardizes them. Specifically, these internal functions.

Beyond this, the main exported function format_sumstats also has an arg that formats the sumstats to LDSC format automatically (format_sumstats(..., ldsc_format=TRUE)). We designed this pipeline to cover everything that mungesumstats.py does, and much more, so perhaps it would be worth mentioning MungeSumstats as an alternative?

@Al-Murphy

@omerwe
Owner

omerwe commented Mar 17, 2022

@bschilder @Al-Murphy thanks, this is awesome!

I kept this ticket open for too long because I was afraid that changing the MAF column to another name would mess things up. But I'm happy to recommend that people use your code if it's more robust and actively maintained.

Would you mind writing a shell-command snippet demonstrating how to use your package to replace PolyFun's internal munge_sumstats script? I can put this in the wiki as a recommendation.

@bschilder
Author

Great! Happy to put the shell script together, will share it as soon as it's ready.
