Looking for coverage data #16

Open
Hypecoum opened this issue Jul 28, 2022 · 3 comments

@Hypecoum

Dear developers,

Thanks for a great software package to analyse epiGBS data efficiently.

I have recently been running the pipeline on one of my datasets and would like to calculate the read coverage for each assembled fragment, for downstream filtering of assembled loci. Could you please indicate where in the output I can find this information?

My first guess was that I would find it in the "alignment" directory; however, I could not find any documentation on the contents of this output directory. Could you please also explain what is in the .bam files in this directory?

I have been running the pipeline in reference mode.

Many thanks,
Yannick Woudstra

@MaartenPostuma
Collaborator

MaartenPostuma commented Aug 2, 2022

Hi Yannick,
The easiest way would be to look at the .vcf file that is output after SNP calling, or to load the methylation calling data with an R package such as methylKit. There you can find the coverage for each SNP / methylation site.

The BAM format is relatively complicated (see https://samtools.github.io/hts-specs/SAMv1.pdf for more info on the format); however, the program samtools can be used to extract all sorts of information from these files.

Furthermore, in reference mode fragments do not get assembled. Instead, reads are mapped directly onto the reference genome, so the pipeline will only output the coverage at each location on the reference genome.

Hope this helps,
Maarten
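
For the .vcf route, per-site depth can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the VCF records carry the standard `DP` key in the INFO column (not confirmed for this pipeline's output), and the function name `site_depths` and the example records are made up for illustration.

```python
def site_depths(vcf_lines):
    """Yield (chrom, pos, depth) for each VCF record with a DP entry in INFO.

    Assumption: depth is stored under the standard INFO key "DP".
    Records without it are skipped.
    """
    for line in vcf_lines:
        # Skip meta-information, the column header and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "DP" and value.isdigit():
                yield chrom, pos, int(value)
                break


# Hypothetical records, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t101\t.\tC\tT\t50\tPASS\tDP=24;AF=0.5\n",
    "chr1\t250\t.\tG\tA\t30\tPASS\tAF=0.1\n",
]

if __name__ == "__main__":
    for chrom, pos, depth in site_depths(EXAMPLE_VCF):
        print(chrom, pos, depth)
```

For a real file, passing an open file handle (for example `open("calls.vcf")`, with a placeholder file name) in place of `EXAMPLE_VCF` should work the same way, since the function only iterates over lines.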

@Hypecoum
Author

Hypecoum commented Aug 4, 2022

Dear Maarten,

Thanks so much for your helpful answer. I will certainly check the methylation calling data in R as you suggested.

The output you mentioned in your last comment about reference mode is exactly the data that I require. I wish to have the coverage of reads on each part of the genome that is covered by the epiGBS experiment. Could you please tell me where I find this information?

Many thanks again,
Yannick

@MaartenPostuma
Collaborator

Hi Yannick,
You can calculate it with
samtools coverage YOUR_BAM_FILE.bam
Greetings,
Maarten
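
The table that `samtools coverage` prints can then be used for the downstream filtering mentioned at the start of the thread. The sketch below assumes the default tab-separated output (columns `#rname`, `startpos`, `endpos`, `numreads`, `covbases`, `coverage`, `meandepth`, `meanbaseq`, `meanmapq`) was saved to a text file. The function name `well_covered`, the threshold of 10 and the example rows are hypothetical.

```python
import csv
import io


def well_covered(coverage_tsv, min_mean_depth=10.0):
    """Return (rname, meandepth) for rows at or above a mean-depth threshold.

    coverage_tsv: an open text handle on the default tabular output of
    `samtools coverage`. The threshold is an arbitrary illustration.
    """
    reader = csv.DictReader(coverage_tsv, delimiter="\t")
    keep = []
    for row in reader:
        mean_depth = float(row["meandepth"])
        if mean_depth >= min_mean_depth:
            keep.append((row["#rname"], mean_depth))
    return keep


# Made-up rows in the layout of the default `samtools coverage` table.
EXAMPLE_COVERAGE = (
    "#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq\n"
    "chr1\t1\t1000\t120\t400\t40\t12.5\t36.1\t42\n"
    "chr2\t1\t2000\t15\t150\t7.5\t0.9\t35.8\t40\n"
)

if __name__ == "__main__":
    print(well_covered(io.StringIO(EXAMPLE_COVERAGE)))
```

Note that `samtools coverage` summarises each reference sequence (or a region given with `-r`), so the mean depth is averaged over the whole sequence, including positions without reads. For depth at individual positions, `samtools depth` is the usual alternative.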
