
Is there a comprehensive list of read/alignment filters? #914

Open
trev-f opened this issue Dec 10, 2024 · 2 comments

Comments

@trev-f

trev-f commented Dec 10, 2024

Is there a comprehensive list of default filters applied to reads/alignments? For example, I'm looking for something like the "Read filters" section of the GATK HaplotypeCaller man page.

I think I've gathered some of this (e.g., a candidate supported by only one read is not considered) from various places in the FAQs, the issues, and even the source code, but it would be helpful to have everything in one place, along with any arguments that can be set to change the defaults.

Thanks in advance, and sorry if this is already available somewhere and I missed it.

@AndrewCarroll
Collaborator

Hi @trev-f

There are three other relevant filters for reads:

min_mapping_quality - If the read's MAPQ is below a preset threshold, the read is excluded from all subsequent analysis. By default, the threshold is 5 for WGS and WES data aligned with BWA, 0 for vg-mapped files, and 1 for PacBio and ONT reads.

min_base_quality - If the base quality at the variant position under consideration is below a preset threshold, the read is excluded from the analysis of that specific position only (it can still be used at other positions where its base quality is higher). For SNPs, the base at the position itself is checked. For insertions, the behavior depends on other flags: in some cases the qualities of the inserted bases are averaged, in other cases the minimum base quality is used. By default, the threshold is 10.

downsampling - When coverage is very high, reads are randomly downsampled before being added to the pileup that the neural network sees.
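A rough sketch of how these three read filters interact, assuming gapless alignments and SNP positions only (the insertion handling, which depends on other flags, is not modeled). The `Read` type, the helper functions, and the `max_reads` cap are hypothetical illustrations, not DeepVariant's code; the default thresholds are the WGS/WES BWA values quoted above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical read record for illustration only; not DeepVariant's internal type.
@dataclass
class Read:
    name: str
    mapping_quality: int
    start: int                 # 0-based reference position of the first aligned base
    bases: str                 # assumes a gapless alignment (no indels or clipping)
    base_qualities: List[int]


def passes_mapq(read: Read, min_mapping_quality: int = 5) -> bool:
    """Per-read filter: a read below the MAPQ threshold is dropped everywhere."""
    return read.mapping_quality >= min_mapping_quality


def base_quality_at(read: Read, ref_pos: int) -> Optional[int]:
    """Base quality of the read at a reference position, or None if not covered."""
    offset = ref_pos - read.start
    if 0 <= offset < len(read.bases):
        return read.base_qualities[offset]
    return None


def reads_for_position(reads, ref_pos, min_mapping_quality=5, min_base_quality=10):
    """Reads usable at one SNP position: MAPQ is checked per read,
    base quality only at this position."""
    usable = []
    for read in reads:
        if not passes_mapq(read, min_mapping_quality):
            continue
        bq = base_quality_at(read, ref_pos)
        if bq is not None and bq >= min_base_quality:
            usable.append(read)
    return usable


def downsample(reads, max_reads, seed=0):
    """Randomly keep at most max_reads reads (max_reads is a hypothetical cap)."""
    reads = list(reads)
    if len(reads) <= max_reads:
        return reads
    return random.Random(seed).sample(reads, max_reads)


reads = [
    Read("r1", 60, 100, "ACGTA", [30, 30, 8, 30, 30]),  # low base quality at 102
    Read("r2", 3, 100, "ACGTA", [30] * 5),              # MAPQ below 5
    Read("r3", 60, 100, "ACTTA", [30] * 5),
]
print([r.name for r in reads_for_position(reads, 102)])  # ['r3']
print([r.name for r in reads_for_position(reads, 101)])  # ['r1', 'r3']
```

Note that `r1` is dropped at position 102 but kept at 101, mirroring the per-position behavior described for min_base_quality, while `r2` is dropped at every position because of its MAPQ.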

@pichuan
Collaborator

pichuan commented Dec 13, 2024

Hi @trev-f,
In https://github.com/google/deepvariant/blob/r1.8/deepvariant/make_examples_options.py , you can also see flags like

  • keep_duplicates
  • keep_supplementary_alignments
  • keep_secondary_alignments

You can find the default values in the link above.
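A minimal sketch of what these three flags control, assuming they correspond to the standard SAM FLAG bits (0x100 secondary, 0x400 duplicate, 0x800 supplementary) and assuming each defaults to False; check make_examples_options.py for the actual defaults. `keep_alignment` is a hypothetical helper, not DeepVariant code.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def keep_alignment(flag: int,
                   keep_duplicates: bool = False,               # assumed default
                   keep_supplementary_alignments: bool = False,  # assumed default
                   keep_secondary_alignments: bool = False) -> bool:  # assumed default
    """Return True if an alignment with this SAM FLAG survives the keep_* filters."""
    if flag & FLAG_DUPLICATE and not keep_duplicates:
        return False
    if flag & FLAG_SUPPLEMENTARY and not keep_supplementary_alignments:
        return False
    if flag & FLAG_SECONDARY and not keep_secondary_alignments:
        return False
    return True


print(keep_alignment(99))                                # True: properly paired primary
print(keep_alignment(99 | FLAG_DUPLICATE))               # False: duplicate dropped
print(keep_alignment(99 | FLAG_DUPLICATE, keep_duplicates=True))  # True
```

If you read BAMs with pysam, `AlignedSegment.flag` gives the integer to pass in. Each flag only relaxes its own filter, so a secondary alignment that is also marked as a duplicate still needs both `keep_secondary_alignments` and `keep_duplicates` to survive.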

In general, running the make_examples binary with --help is the most reliable way to get a complete list of flags.
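A small helper for narrowing that --help output down to the filter-related flags. It assumes the binary lives at `/opt/deepvariant/bin/make_examples` (adjust for your install) and that flag names appear as `--name` or `--[no]name`, as in absl-style help text; both the path and the name pattern are assumptions, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import List


def make_examples_help(binary: str = "/opt/deepvariant/bin/make_examples") -> str:
    """Run `<binary> --help` and return everything it printed.

    The binary path is an assumption. We do not assume a zero exit status
    after --help, so check=True is deliberately not used.
    """
    result = subprocess.run([binary, "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr


def list_flags(help_text: str, pattern: str = r"^(keep_|min_)") -> List[str]:
    """Return unique flag names (in order of appearance) that match pattern.

    Handles both `--name` and boolean `--[no]name` spellings.
    """
    names = re.findall(r"--(?:\[no\])?([A-Za-z0-9_]+)", help_text)
    return [name for name in dict.fromkeys(names) if re.search(pattern, name)]
```

For example, `print("\n".join(list_flags(make_examples_help())))` lists every `keep_*` and `min_*` flag the installed binary knows about; pass a different `pattern` to search for other flags.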
