[Feature request] Option to use Diamond instead of Blast #111

Open
jolespin opened this issue Feb 26, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@jolespin

Would it be possible to include the option to use diamond as an alternative to blast?

@vbrover
Contributor

vbrover commented Feb 26, 2023

We usually use BLAST for very strong matches where identity >= 90%.
For remote matches we do not use BLAST; we use HMMER instead.
(But generally, if the goal is to find the protein family, then BLAST is not the best tool.)
If you know of protein families that are incorrectly identified by AMRFinderPlus, please let us know.

@oschwengers

I think it's quite the opposite. It might be very interesting and advantageous to use Diamond instead of BLAST to significantly speed up the searches for >=90% hits using the --fast mode. We use Diamond in Bakta for these use cases with great results in terms of runtime.
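For readers who want to try this, here is a minimal sketch of what such a Diamond call might look like, written as a Python helper that only builds the command line and does not run it. The file names (proteins.faa, amr_db, hits.tsv) and the helper name are hypothetical, and this is not how AMRFinderPlus or Bakta actually invoke Diamond; the flags themselves (--fast, --id, --outfmt 6) are standard Diamond options.

```python
import shlex


def diamond_blastp_cmd(query_faa, db_prefix, out_tsv, min_identity=90, threads=1):
    """Build (but do not run) a DIAMOND blastp command for high-identity hits.

    All paths are placeholders chosen for illustration.
    """
    return [
        "diamond", "blastp",
        "--query", query_faa,          # protein FASTA to search with
        "--db", db_prefix,             # database built with `diamond makedb`
        "--out", out_tsv,              # output file
        "--outfmt", "6",               # BLAST-style tabular output
        "--id", str(min_identity),     # report only hits at or above this % identity
        "--fast",                      # fastest sensitivity mode, aimed at high-identity hits
        "--threads", str(threads),
    ]


# Hypothetical file names; print the command so it can be inspected or copied.
cmd = diamond_blastp_cmd("proteins.faa", "amr_db", "hits.tsv")
print(shlex.join(cmd))
```

The command could then be passed to subprocess.run(cmd, check=True) on a machine where Diamond is installed.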

@vbrover
Contributor

vbrover commented Feb 26, 2023

Could you post an example where BLASTP with identity >=90% and Diamond produce different results (alignments)?
Can Diamond replace BLASTP, BLASTN, BLASTX and TBLASTN?
How much faster is Diamond than BLAST?
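One way to look for such an example is to run both tools with tabular output (outfmt 6) and compare which query/subject pairs each reports at >=90% identity. The sketch below assumes the default 12-column tabular layout, where the first three columns are query ID, subject ID and percent identity. The function names and the toy rows are invented for illustration, and it compares hit pairs only, not full alignment coordinates.

```python
def parse_outfmt6(text, min_identity=90.0):
    """Return {(query, subject): best percent identity} from tabular output."""
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            key = (query, subject)
            hits[key] = max(pident, hits.get(key, 0.0))
    return hits


def diff_hits(blast_text, diamond_text, min_identity=90.0):
    """Split hit pairs into those found by one tool only and by both."""
    blast = set(parse_outfmt6(blast_text, min_identity))
    diamond = set(parse_outfmt6(diamond_text, min_identity))
    return {
        "blast_only": sorted(blast - diamond),
        "diamond_only": sorted(diamond - blast),
        "shared": sorted(blast & diamond),
    }


# Toy rows (invented data, not real search results).
blast_out = "q1\tblaTEM\t99.3\t286\nq2\tmecA\t91.0\t668\nq3\ttetM\t72.5\t640\n"
diamond_out = "q1\tblaTEM\t99.3\t286\nq3\ttetM\t72.5\t640\n"
report = diff_hits(blast_out, diamond_out)
print(report)
```

Any pair listed under blast_only or diamond_only would be a candidate to post in this thread.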

@oschwengers

oschwengers commented Feb 26, 2023

I think (based on our results and the Diamond publication) they should produce the same results for these highly similar hits, i.e. >=90% identity. According to the publication (figure above), Diamond blastp is ~2 orders of magnitude faster than blastp in default mode and >3 orders of magnitude faster in fast mode, which is suitable for >90% sequence identity hits. It also provides a blastx mode. As far as I know, blastn/tblastn is not possible. May I kindly refer you to https://github.com/bbuchfink/diamond/wiki/3.-Command-line-options#sensitivity-modes

@vbrover
Contributor

vbrover commented Feb 27, 2023

In AMRFinderPlus, tblastn on a 7 Mbp genome takes 90 sec. (using 1 core, 2500 MHz).
If that can be made faster, that will be a big improvement.

@evolarjun added the enhancement label on Feb 27, 2023

@oschwengers

Doesn't AMRFinderPlus also use blastp? I think this is where diamond could make a difference.

@evolarjun
Contributor

AMRFinderPlus does use blastp, but the way we use blastp parallelizes better and is faster than the tblastn step, which is currently the slowest step. That's why @vbrover brought it up. The blastp step does take some time though, so we'll check out your suggestion.

Thanks!

@evolarjun
Contributor

Well, we haven't tried Diamond yet, but this suggestion prompted us to spend some time optimizing the BLAST parameters. There is likely room for further improvements, but while being very conservative and careful to make sure we won't miss any alignments, we improved the time of combined runs by an average of over 50% in version 3.11.8. Unfortunately, one of the optimizations caused issues with bioconda. Note that performance and optimization are highly dependent on the input sequences.


4 participants