Species-level 16S kraken2 database? #862
Comments
Kraken 2 / Bracken 16S rRNA indexes are available for Greengenes, RDP, and Silva: https://benlangmead.github.io/aws-indexes/k2 Does this help?
Hi @ChillarAnand, many thanks for your reply. Unfortunately, no, this doesn't help. The Kraken2 RDP and Silva databases are limited to genus level, while Greengenes has not been updated since, I think, 2016.
Hi @DntBScrdDv,
Many thanks for this, @Username-felix-is-not-available. I'm sorry, but could you explain a little what all the different files are? Is the .fa file the sequences? What's the giant .suf file? Thanks!
You are very welcome, @DntBScrdDv. For your purposes, you can ignore all files except "RDP16s28s.fa" (sequences) and "tax.RDP16s28s.txt" (taxonomy). The other files either provide metadata or are specific to the LAST alignment program, which I used for my project.
I hope my message will not send you down the rabbit hole, because Kraken2 handles taxonomy files very differently than I did. In my files, you can use the sequence ID in the FASTA file to find the matching taxonomic string in the taxonomy file. The string contains the full lineage. Kraken2 instead uses taxonomy IDs and splits the lineage into single taxa (see the names.dmp and nodes.dmp files in a Kraken2 database). For the special databases, it is best to assume that the taxonomy IDs are not identical to the NCBI taxonomy IDs (i.e. they are artificial).
I think translating my files to Kraken2 format could be very difficult. It may be easier to take the logic in my script and add it to Kraken2's build_rdp_taxonomy.pl. The logic is described here (Supplementary Methods 4.2) in more general terms. Nevertheless, I don't know what downstream effects this would have.
My automated approach to fixing the taxonomy is also not foolproof, and I am not a taxonomist by training, so there will be some room for improvement. If you come up with a better approach, please let me know.
Best,
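To illustrate the difference described above, here is a minimal Python sketch of splitting full lineage strings into single taxa with artificial taxonomy IDs, written out as names.dmp- and nodes.dmp-style lines. It rests on several assumptions that are not confirmed in this thread: the lineage strings are taken to be semicolon-delimited, ranks are assigned purely by position in the string, and the function names `lineages_to_taxonomy` and `to_dmp_lines` are hypothetical. It does not reproduce the taxonomy-fixing logic of the script mentioned above, nor Kraken2's build_rdp_taxonomy.pl.

```python
from typing import Dict, List, Tuple

# Assumption: ranks are assigned by position in the lineage string.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lineages_to_taxonomy(
    lineages: Dict[str, str], sep: str = ";"
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[int, str]], Dict[str, int]]:
    """Split lineage strings into single taxa with artificial taxonomy IDs.

    Returns (nodes, names, seq2tax):
      nodes   -> (taxid, parent_taxid, rank)
      names   -> (taxid, scientific_name)
      seq2tax -> sequence ID mapped to the taxid of its deepest taxon
    """
    taxid_of: Dict[Tuple[int, str], int] = {}
    nodes = [(1, 1, "no rank")]  # artificial root, its own parent
    names = [(1, "root")]
    seq2tax: Dict[str, int] = {}
    next_id = 2

    for seq_id, lineage in lineages.items():
        taxa = [t.strip() for t in lineage.split(sep) if t.strip()]
        parent = 1
        for depth, name in enumerate(taxa):
            # Key on (parent, name) so identical names under different
            # parents receive different IDs.
            key = (parent, name)
            if key not in taxid_of:
                rank = RANKS[depth] if depth < len(RANKS) else "no rank"
                taxid_of[key] = next_id
                nodes.append((next_id, parent, rank))
                names.append((next_id, name))
                next_id += 1
            parent = taxid_of[key]
        seq2tax[seq_id] = parent
    return nodes, names, seq2tax


def to_dmp_lines(nodes, names) -> Tuple[List[str], List[str]]:
    """Format records in the NCBI dump layout (tab-pipe-tab separated)."""
    node_lines = [f"{t}\t|\t{p}\t|\t{r}\t|\t-\t|" for t, p, r in nodes]
    name_lines = [f"{t}\t|\t{n}\t|\t-\t|\tscientific name\t|" for t, n in names]
    return node_lines, name_lines


# Made-up example input: sequence ID -> full lineage string.
example = {
    "seq1": "Bacteria;Firmicutes;Bacilli",
    "seq2": "Bacteria;Firmicutes;Clostridia",
}
nodes, names, seq2tax = lineages_to_taxonomy(example)
node_lines, name_lines = to_dmp_lines(nodes, names)
```

The IDs produced this way are artificial and will not match NCBI taxonomy IDs, which is consistent with the caveat above about the special databases. The sequence-to-taxid mapping would still need to be attached to the FASTA records before building a Kraken2 database, and that step is not shown here.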
Hi all,
I'm in need of a species-level 16S database. I had relied on the RDP database on other analysis platforms (e.g. FROGS), but the pre-built Kraken2 RDP database only goes to genus level.
I built a database from the RefSeq database, but it is missing many key taxa (e.g. Candidatus Omnitrophus).
Does anyone know of a species-level 16S database for Kraken2 that is broader than RefSeq, e.g. a species-level Silva or RDP?
Many thanks