Error in creating the standard Kraken2 database #412
Comments
Hi! I notice that other people are getting the same error too, but I could not find any solution for it.
@saras224 that's a connectivity issue. It may be that NCBI's FTP site is not working at the moment, so you might just have to wait, or you can try using
@Nathalia-Cavichiolli that's an issue I haven't seen before. Are you using the newest kraken2 version to download the files?
Thank you for responding, Jennifer! I tried --use-ftp with kraken2 version 2.0.8 and it did not work. After reading other issues I figured out that it was a version issue, so I downloaded and installed the latest kraken2 version, 2.1.1, and used the --use-ftp flag while building. This time it started to build, but while processing it gave this:
gzip: plasmid.3.1.genomic.fna.gz: invalid compressed data--format violated
gzip: plasmid.4.1.genomic.fna.gz: invalid compressed data--format violated
I gave this command:
kraken2-build --standard --threads 12 --db STANDARD-DB --use-ftp
Thanks
Thank you for your reply. I installed it using conda (conda install -c bioconda kraken2). The version is v2.1.1.
Just letting you guys know that I'm checking the standard build right now. I'll comment again if I figure out what is going on.
@Nathalia-Cavichiolli I'm not seeing the same issue with the plasmid files. It could be that they had messed up the plasmid download at the time. I would try running the standard download again, but I don't have a solution for you otherwise. Since it looks like it's only happening with plasmid, I wouldn't restart from scratch; instead, check to see which of these does not exist: and download that:
That should save you from redoing the full build.
@saras224 Can you retry downloading the viral and plasmid libraries? I'm sorry about all of the issues, but I suspect the plasmid errors should be fixed by now, as my standard build did not have those problems. The errors with downloading the viral library seem to have to do with the ftp-download script, which I'm still trying to debug at the moment. I'm not certain why, but it is having a problem downloading the files correctly and then unzipping them.
Hi, I'm having the same issue. As archaea, bacteria and viral were downloaded correctly, I tried with:
gzip: plasmid.1.1.genomic.fna.gz: invalid compressed data--format violated
Could it be that the original file at NCBI is corrupted?
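One way to tell a bad download from a problem in the build itself is to test the archives that are already on disk. The following is a minimal sketch in Python, not part of Kraken 2: it assumes the downloaded files sit in one directory and follow the plasmid.*.genomic.fna.gz naming seen in the error messages above. The function names `gzip_is_intact` and `find_corrupt_archives` are made up for illustration.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so that truncation and checksum errors surface.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzip file" (e.g. an HTML error page from a proxy),
        # EOFError covers truncated downloads, zlib.error covers corrupt data.
        return False
    return True


def find_corrupt_archives(directory, pattern="plasmid.*.genomic.fna.gz"):
    """List file names under `directory` matching `pattern` that fail the check.

    The default pattern is an assumption based on the file names in this thread.
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not gzip_is_intact(p)
    )
```

If only a few archives are reported, re-downloading just those files may be enough. If every archive fails, the download step itself (proxy, FTP script) is the more likely culprit than a corrupted file at NCBI.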
Thank you @saras224.
I'm having this problem with all downloads using kraken2. I suspect NCBI has changed something and the download script no longer works.
I have the same problem with the plasmid database with kraken 2.1.1. To debug, here is the output from the wget
This is with wget 1.14 behind a squid proxy (using ftp-over-http). My workaround, which is admittedly ugly:
Apologies, everyone. The rsync issue may be a temporary connection issue that MAY resolve if you try again at a later time. However, if not, there is some sort of server setting that is preventing your server from connecting to NCBI via rsync. In that case, the only option is to use the "--use-ftp" download option. However, we recently discovered an issue with the ftp-downloads that we are working to resolve alongside the NCBI staff.
Hi @jenniferlu717 Thanks
Hi @jenniferlu717
I hope there is a solution soon.
I didn't really explain my workaround very well, so here is a patch:
For me, the issue was that the script wasn't generating the list of plasmid files to download. the
I actually have the same problem as @saras224 for the bacteria and viral libraries, but not for the human database. Even though I can still use them, they are not complete (when I compare them to RefSeq). Thanks for all the effort!
This has just worked for me. I downloaded these libraries independently and then built the database. The build is still running, but it seems that the problem is solved. Thanks a lot!
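Downloading the libraries one by one, as described above, means a failed library can be retried without repeating the others. The sketch below only assembles the command lines and does not run anything. It assumes the `kraken2-build` options `--download-taxonomy`, `--download-library`, `--build`, `--db`, `--threads` and `--use-ftp` behave as described in the Kraken 2 manual, and that the standard database consists of the six libraries listed in `STANDARD_LIBRARIES` (my understanding of the manual, not something confirmed in this thread). The helper name `build_steps` is hypothetical.

```python
import shlex

# Assumption: the libraries that make up the standard Kraken 2 database.
STANDARD_LIBRARIES = ("archaea", "bacteria", "viral", "plasmid", "human", "UniVec_Core")


def build_steps(db, libraries=STANDARD_LIBRARIES, threads=1, use_ftp=False):
    """Return the kraken2-build invocations as argument lists, in order."""
    extra = ["--use-ftp"] if use_ftp else []
    steps = [["kraken2-build", "--download-taxonomy", "--db", db, *extra]]
    for library in libraries:
        # One command per library, so a single failure can be retried alone.
        steps.append(
            ["kraken2-build", "--download-library", library,
             "--db", db, "--threads", str(threads), *extra]
        )
    # The final build step runs once all libraries are in place.
    steps.append(["kraken2-build", "--build", "--db", db, "--threads", str(threads)])
    return steps


if __name__ == "__main__":
    for step in build_steps("STANDARD-DB", threads=12, use_ftp=True):
        print(shlex.join(step))
```

To retry only the plasmid library, something like `build_steps("STANDARD-DB", libraries=["plasmid"])` gives the reduced list. Each entry can be passed to `subprocess.run(step, check=True)` so that the sequence stops at the first failing command.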
Would it be possible to do a sanity check against the md5 hash they provide?
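A check like this can also be done by hand after the download. The following Python sketch is an illustration only: it assumes the published checksums come as md5sum-style text (one `<hash>  <file name>` pair per line), which may not match the exact layout NCBI uses for every directory. The function names are hypothetical and are not part of Kraken 2.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_listing(text):
    """Parse md5sum-style lines ('<hash>  <name>') into a {name: hash} dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name = parts[1].strip().lstrip("*")
        if name.startswith("./"):
            name = name[2:]  # tolerate './file' style paths
        expected[name] = parts[0].lower()
    return expected


def matches_expected(path, expected_md5):
    """Return True if the file's MD5 equals the published value."""
    return md5_of_file(path) == expected_md5.lower()
```

A mismatch points to a damaged transfer, so the file should be fetched again before it is unzipped.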
Hi, I met the same problem:
gzip: plasmid.10.1.genomic.fna.gz: invalid compressed data--format violated
gzip: plasmid.3.1.genomic.fna.gz: invalid compressed data--format violated
Here I have two questions:
**The first one: about kraken2-build --standard**
I read these comments and replies and tried again, but it takes a long time to run again (with --threads 84 it ran for more than 10 hours and then the process failed). Is there any way to avoid the full build if I already have the results from last time? By the way, --use-ftp doesn't work for me either:
Can I do something to fix these errors and run kraken2-build smoothly?
**The second one: about kraken2-build --download-library ; kraken2-build --build**
Any suggestions would mean a lot to me! Thank you very much!
Has anyone solved this issue? I have tried all the changes suggested here and in #529, but nothing works.
All databases are here
Thanks Sergey, this is really helpful.
You can try this; it worked for me:
After running into a lot of issues, I built a small tool to easily create any kraken2 database index with a single command. You can build a standard db with a single command.
You can stop the process at any point, and when you re-run it, it resumes from where it stopped. It is also way faster than
You can give it a try, and I will be happy to answer any questions.
Hello @ChillarAnand. Thanks for sharing this amazing tool! I am running it like this:
And I continue to get an error like this:
In line 23 of
Ah, I found the repo. Adding my question there.
Hello, I'm trying to create the standard Kraken 2 database with the following command:
kraken2-build --standard --threads 16 --db kraken-Sdb
I'm getting this:
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Downloading plasmid files from FTP...
gzip: plasmid.6.1.genomic.fna.gz: invalid compressed data--format violated
gzip: plasmid.7.1.genomic.fna.gz: invalid compressed data--format violated
gzip: plasmid.8.1.genomic.fna.gz: invalid compressed data--format violated
gzip: plasmid.9.1.genomic.fna.gz: invalid compressed data--format violated
Any thoughts on how I can solve this problem?
Thank you, Nathalia