This section will cover
- Removing chimeric sequences
- Removing unwanted taxa from the dataset
- Calculate the error rate using the Mock community
- Determine OTUs from the tutorial data at thr 97% sequence similarity level.
Note that we use uchime here, but the current mothur standard is Vsearch and is recommended for chimera checking. We use uchime since the biolinux version does not have Vsearch.
chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table)
remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)
summary.seqs(fasta=current, count=current)
remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table)
Why is this the correct way?
summary.seqs(fasta=current, count=current)
classify.seqs(fasta=current, count=current, taxonomy=trainset9_032012.pds.tax, reference=trainset9_032012.pds.fasta, cutoff=80)
remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloropast-Mitochondria-unknown-Archaea-Eukaryota)
summary.tax(taxonomy=current, count=current)
summary.tax(taxonomy=current, count=current)
At this point we have a clean dataset which can be used for generating OTUs and doing diversity studies. No matter what toolor pipeline you have used al of the above filtering/removal /screening steps have to be performed. Of course this is debatable.
##Assessing error rates In this section we use a mock community consisting of 32 unaligned fasta sequences, to analyze the error rate that we have based on the clean-up that we have performed.
get.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, groups=Mock)
seq.error(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, reference=HMP_MOCK.v35.fasta, aligned=F)
dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, cutoff=0.03)
cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, cutoff=0.03)
make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, label=0.03)
rarefaction.single(shared=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared)
This generates a file which you can us to check how the diversity is distributed in the dataset.
remove.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, groups=Mock)
dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, cutoff=0.20)
Why is the cutoff used here?
cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, method=average, cutoff=0.20)
Note: in the newest MiSeq SOP this method is replaced by the optiClust
algorithm.
make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, label=0.03)
classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, label=0.03)