PacBio Assemblies stuck at "addNonORFcopy" #48

Open
olimat17 opened this issue Mar 23, 2023 · 7 comments

Comments

@olimat17

Hi!
I am running ISEScan on a large set of genome assemblies (primarily assemblies from Illumina data, plus a couple of PacBio genome assemblies). The tool runs great on the test data and my Illumina assemblies, but the PacBio assemblies seem to get stuck at the "addNonORFcopy" step. (On my server the other genomes took <30 seconds to finish successfully after the HMM step, but I kill the command on the PacBio assemblies after 30 minutes because there is no forward progress.)
For a little more context: We anticipate >1000 transposases, and we expect that the IS elements we observe are likely to be overlapping or nested.
Thank you for any help you can provide!
-O

@xiezhq
Owner

xiezhq commented Mar 27, 2023

Hi,

Sorry for the late reply.

ISEScan might count two overlapping or nested IS elements as one large IS element; it depends on the boundaries of the predicted ORF.

ISEScan works on any sequence file in FASTA format, with one or many sequences per file. Are there any special sequences in your PacBio assemblies?

Zhiqun Xie
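
Because a FASTA file may hold one or many sequences, one way to narrow down which sequence triggers a stall is to split a multi-sequence assembly into one file per record and run each file separately. The following is a minimal sketch, assuming plain-text (uncompressed) FASTA input; `split_fasta` and the `.fna` output naming are hypothetical choices for illustration and are not part of ISEScan.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to its own file; return the list of paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Use the first whitespace-delimited token of the header as the ID.
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else "unnamed"
                # Replace characters that are awkward in file names.
                safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in seq_id)
                # A running index keeps duplicate IDs from overwriting each other.
                path = out_dir / f"{len(written) + 1:04d}_{safe}.fna"
                handle = open(path, "w")
                written.append(path)
            if handle is not None:
                # Lines before the first header (if any) are skipped.
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

Each resulting file can then be passed to the tool on its own, which shows whether the stall is tied to one particular contig or to the assembly as a whole.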

@olimat17
Author

Thank you for your reply.
Update: I attempted to run the sequences through the tool over the weekend, and I killed the command after 7 hours stuck at the same "addNonORFcopy" step. The sequences are the same size as the E. coli genome in the paper on the tool, so I am not quite sure why it is getting stuck.
The sequences were assembled using Flye, and we were able to use other tools (e.g., CheckM, Prokka) on the genomes with normal running times. What do you mean by special sequences?
Thank you for your help.

@xiezhq
Owner

xiezhq commented Mar 28, 2023

Could you share the sequence file that triggers the 'addNonORFcopy' issue? I need to reproduce the reported behavior with your sequence file. Without reproducing it, it would be hard to figure out what the underlying issue is.

Xie

@olimat17
Author

Hi Xie,
Sorry for the delay in response here. I just sent an example sequence to your listed contact e-mail ([email protected]).
For me it doesn't ever give an error, it just gets stuck for hours and never finishes. I am not quite sure why.
Thank you for your help!
-Olivia
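
For anyone batch-processing many assemblies, a run that never errors and never finishes can block the whole batch. One option is to wrap each run in a time limit so stalled genomes are recorded and skipped. This is a minimal sketch using Python's standard `subprocess` module; `run_with_timeout` is a hypothetical helper, and the command list is whatever command line you normally use (no ISEScan options are assumed here).

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of strings). Return its exit code, or None if it timed out."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills and waits for the direct child before re-raising.
        # Worker processes spawned by that child may need separate cleanup.
        return None


if __name__ == "__main__":
    # Stand-in command for demonstration: a Python process that sleeps too long.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    status = run_with_timeout(slow, timeout_s=0.5)
    print("timed out" if status is None else f"exit code {status}")
```

A `None` result marks a genome as stalled, so the batch can move on and the stalled inputs can be collected for a bug report.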

@xiezhq
Owner

xiezhq commented Apr 3, 2023 via email

@xiezhq
Owner

xiezhq commented Aug 30, 2023

The internal algorithm produced IS element candidates with a large number of members for each candidate. This caused a huge computing cost when clustering the candidates and picking the representative for each cluster. The internal algorithm needs to be changed to solve this issue in the future.
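
To give a rough sense of why a large candidate population gets expensive: if clustering or picking a representative compares candidates pairwise (an assumption made purely for illustration, not a description of ISEScan's actual code), the number of comparisons grows quadratically with the number of candidates. The hypothetical helper `pairwise_comparisons` below just counts the pairs.

```python
from itertools import combinations


def pairwise_comparisons(n):
    """Number of unordered pairs among n candidates: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Sanity check against explicit enumeration for a small case.
    assert pairwise_comparisons(6) == len(list(combinations(range(6), 2)))

    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} candidates -> {pairwise_comparisons(n):>10} pairwise comparisons")
```

Under that assumption, going from 100 to 10,000 candidates multiplies the pair count by roughly 10,000, which would be consistent with runs going from seconds to hours.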

@cifuj

cifuj commented May 3, 2024

Hi Xie, I have come across the same issue as Olivia.
isescan.py has been stuck at the addNonORFcopy step for more than 1 hour.
I have also identified more than 1000 transposases in the genome with other tools.
Are there any updates that could fix this issue?

Best,
Jero
