Unable to run simulated human HiFi reads #138

Open
Oieswarya opened this issue Sep 10, 2024 · 17 comments

Comments

@Oieswarya

Oieswarya commented Sep 10, 2024

Hello,
I have been trying to run goldrush with simulated human HiFi reads at 10x coverage. I have used goldrush with several other simulated inputs and it ran fine. I also checked my fq file for non-ACGT characters and found none.

I have used this command:
goldrush run reads=Human_nonACTG_fq G=3120e6 track_time=1 m=10000 --debug

This is the .out file:

GNU Make 4.3
Built for x86_64-conda-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
Updating makefiles....
Updating goal targets....
File 'run' does not exist.
Must remake target 'run'.
mkdir -p goldrush_intermediate_files
cd goldrush_intermediate_files && ln -sf ../Human_nonACTG_fq.fq && goldrush run-in-dir reads=Human_nonACTG_fq G=3120e6 t=48 z=1000 track_time=1 k=22 w=16 tile=1000 b=10 u=5 a=1 o=0.1 x=10 h=3 s=1011011110110111101101 m=10000 M=5 r=0.9 P=15 d=5 span=2 dist=500 k_ntLink=40 w_ntLink=250 rounds=5 polisher=goldpolish polisher_mapper=minimap2 shared_mem=/dev/shm
GNU Make 4.3
Built for x86_64-conda-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
make[1]: Entering directory '/home/goldrush_intermediate_files'
Updating makefiles....
Updating goal targets....
File 'run-in-dir' does not exist.
File 'check-G' does not exist.
Must remake target 'check-G'.
Successfully remade target file 'check-G'.
File 'check-reads' does not exist.
Must remake target 'check-reads'.
Successfully remade target file 'check-reads'.
File 'clean' does not exist.
File 'goldrush_asm_golden_path.fa' does not exist.
File 'goldrush_asm_silver_path_all.fq' does not exist.
File 'goldrush_asm_silver_path_5.fq' does not exist.
Must remake target 'goldrush_asm_silver_path_5.fq'.
command time -v -o goldrush_asm_silver_path_5.fq.time goldrush-path -k 22 -w 16 -t 1000 -u 5 -a 1 -o 0.1 -p goldrush_asm_silver_path -i Human_nonACTG_fq.fq -h 3 -j 48 -x10 -P 15 -d 5 -s 1011011110110111101101 -g 3120e6 -b 10 -r 0.9 --silver_path -M 5 -m 10000 --verbose
make[1]: Leaving directory '/home/goldrush_intermediate_files'

This is the .err file:
make[1]: *** [/home/.conda/envs/goldrush_env/bin/goldrush.make:251: goldrush_asm_silver_path_5.fq] Error 127
make: *** [/home/.conda/envs/goldrush_env/bin/goldrush.make:203: run] Error 2

Can you kindly guide me as to where I am going wrong?
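
A note for readers who hit the same message: make's `Error 127` is the exit status of the failed recipe line, and 127 is the status a POSIX shell conventionally returns when it cannot find the command it was asked to run. The failing line starts with `command time -v`, which runs an external `time` program rather than the shell keyword. Whether a missing program is what happened in this run is not established in this thread. The sketch below only shows how one might check that the programs named on that line resolve on `PATH`; `on_path` is a made-up helper name, not part of goldrush.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an executable file named $1 exists in a
# PATH directory. Unlike `command -v`, this ignores shell keywords and
# builtins, so it also answers the question for an external `time` binary.
on_path() {
    old_ifs=$IFS
    IFS=:
    found=1
    for dir in $PATH; do
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            found=0
            break
        fi
    done
    IFS=$old_ifs
    return "$found"
}

# Where 127 comes from: the shell could not find the command it was given.
rc=0
sh -c 'no-such-command-for-demo' 2>/dev/null || rc=$?
echo "status for an unknown command: $rc"

# Check the two programs named at the start of the failing recipe line.
for prog in time goldrush-path; do
    if on_path "$prog"; then
        echo "found on PATH: $prog"
    else
        echo "not on PATH: $prog"
    fi
done
```

Running this inside the same job script and conda environment as the failing run matters, since `PATH` on a compute node can differ from an interactive login.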

@lcoombe
Member

lcoombe commented Sep 10, 2024

Hi @Oieswarya,

Is that the full standard out and error? After goldrush-path starts, there will be some messages about the parameters, etc., and I don't see those there.

Can you confirm you are using exactly the same environment and installation as past runs? Do you see the help page when you run goldrush-path --help?

Thank you for your interest in GoldRush!
Lauren

@th-of

th-of commented Sep 11, 2024

I am also having a similar issue trying to run goldrush. On Ubuntu 20.04, 22.04, and WSL2, all machines show the same error when goldrush is installed with conda.

ln: ./..: cannot overwrite directory
make: *** [/home/thomas-ws/miniconda3/bin/goldrush.make:203: run] Error 1

I haven't been able to build it from source yet because of missing shared libraries that I can't figure out.

@lcoombe
Member

lcoombe commented Sep 11, 2024

Hi @th-of,

This looks like a different error/issue - would you mind opening a new GitHub issue so we can keep our discussions separate? In particular, we would want to see your command and full log (standard out and error), as well as the result of running our assembly demo.

@th-of

th-of commented Sep 11, 2024

> Hi @th-of,
>
> This looks like a different error/issue - would you mind opening a new GitHub issue so we can keep our discussions separate? In particular, we would want to see your command and full log (standard out and error), as well as the result of running our assembly demo.

As I was reproducing my issue I found the problem: I was including the file extension in the reads name ("reads=reads.fastq"). However, fixing that only traded one problem for another (see below). I will spend some more time trying to fix it before I open an issue for this one.

SeqIndex::SeqIndex: Loading index from some/path/reads.fastq.index
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoul
goldrush/bin/goldrush.make:259: goldrush_asm_golden_path.goldpolish-polished.fa] Error 143
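
The make log earlier in this thread shows the pipeline appending `.fq` to the `reads` value (`reads=Human_nonACTG_fq` becomes `ln -sf ../Human_nonACTG_fq.fq`), which fits the observation that passing the extension causes trouble. A minimal sketch of deriving the prefix from a file name follows. `reads_prefix` is a hypothetical helper, and treating `.fastq` the same as `.fq` is an assumption that this thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: turn a read file name into the prefix passed as reads=.
# Rejects paths with a directory part, since the file is expected to sit in
# the current working directory.
reads_prefix() {
    case "$1" in
        */*)
            echo "error: $1 is not in the current directory" >&2
            return 1
            ;;
        *.fastq) echo "${1%.fastq}" ;;
        *.fq)    echo "${1%.fq}" ;;
        *)       echo "$1" ;;
    esac
}

reads_prefix Human_nonACTG_fq.fq   # prints Human_nonACTG_fq
reads_prefix reads.fastq           # prints reads
```

The result would then be used as `reads=$(reads_prefix Human_nonACTG_fq.fq)` on the goldrush command line.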

@lcoombe
Member

lcoombe commented Sep 11, 2024

Sounds good, @th-of. I'll look out for your fresh issue.
It's hard to say too much without more information from you, but just a reminder to test your installation using the assembly demo, and ensure that your input read file is in your current working directory.

@th-of

th-of commented Sep 11, 2024

> Sounds good, @th-of. I'll look out for your fresh issue. It's hard to say too much without more information from you, but just a reminder to test your installation using the assembly demo, and ensure that your input read file is in your current working directory.

All is working now! Although one of the steps in the goldrush pipeline (goldpolish?) appears to be incompatible with fastq files from Dorado. The formatting of the header line seems to be the problem. The fastq file causes an error with header:

@b4fe3a55-f963-4a43-88d1-35b23acdbdc7 st:Z:2024-03-14T09:22:04.350+00:00 RG:Z:ccf17720be1a9a9f8f33443ea90c42b6a7685e7f_dna_r10.4.1_e8.2_400bps_hac@v5.0.0 DS:Z:gpu:NVIDIA GeForce RTX 3090

If I rename all the headers in the fastq file to a single word, it runs without problems. The step probably doesn't account for a tab-separated list in a fastq header. Dorado generates this by default when basecalling ONT pod5 files to fastq.
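
The renaming workaround described above can be sketched with awk. This is an illustration, not a fix inside goldrush: it assumes strict 4-line FASTQ records (unwrapped sequence and quality lines), and `simplify_fastq_headers` is a made-up name. It keeps only the first whitespace-delimited token of each header, so the Dorado tags are dropped.

```shell
#!/bin/sh
# Keep only the first whitespace-delimited token of each FASTQ header line.
# Header lines are taken to be lines 1, 5, 9, ... so quality lines that
# happen to start with '@' are left untouched.
simplify_fastq_headers() {
    awk 'NR % 4 == 1 { print $1; next } { print }' "$@"
}

# Tiny demonstration with a tab-separated header and a quality line
# that begins with '@'.
printf '@read1\tst:Z:2024\tRG:Z:x\nACGT\n+\n@III\n' | simplify_fastq_headers
# prints:
# @read1
# ACGT
# +
# @III
```

On a real file this would be run as `simplify_fastq_headers reads_original.fq > reads.fq`, where both file names are placeholders.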

@lcoombe
Member

lcoombe commented Sep 11, 2024

Glad it's working for you now! Huh, strange - we've tested reads from Dorado before, but perhaps not with this header format. Thanks for that info, we'll take a look at fixing that.

@Oieswarya
Author

Hi @lcoombe, yes, strangely I have not changed anything, and goldrush-path --help shows me the full help page. I am using the same job script that I used to submit my previous jobs. I am using 370GB of memory, which should be more than enough, but do you think it is a memory issue?

I also checked the headers of my fastq file and they are single words like @1 and so on.
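
For anyone wanting to repeat the two input checks mentioned in this thread (non-ACGT characters and multi-word headers), a minimal sketch follows. It assumes strict 4-line FASTQ records, and both function names are hypothetical.

```shell
#!/bin/sh
# Count sequence lines (2, 6, 10, ...) containing anything other than
# A, C, G or T in either case.
count_non_acgt() {
    awk 'NR % 4 == 2 && /[^ACGTacgt]/ { n++ } END { print n + 0 }' "$@"
}

# Count header lines (1, 5, 9, ...) that contain a space or a tab.
count_multiword_headers() {
    awk 'NR % 4 == 1 && /[ \t]/ { n++ } END { print n + 0 }' "$@"
}

# Demonstration on two records: the second has an N and a two-word header.
sample='@1\nACGT\n+\nIIII\n@2 extra\nACNT\n+\nIIII\n'
printf "$sample" | count_non_acgt            # prints 1
printf "$sample" | count_multiword_headers   # prints 1
```

Both counts being 0 on the real file would match what is reported above.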

@lcoombe
Member

lcoombe commented Sep 11, 2024

Thanks for confirming, @Oieswarya!

Looking at your command, your target genome looks to be ~3Gbp, so yes that should be enough memory.

In the goldrush_intermediate_files directory, could you try directly running the command that looks to have failed?

command time -v -o goldrush_asm_silver_path_5.fq.time goldrush-path -k 22 -w 16 -t 1000 -u 5 -a 1 -o 0.1 -p goldrush_asm_silver_path -i Human_nonACTG_fq.fq -h 3 -j 48 -x10 -P 15 -d 5 -s 1011011110110111101101 -g 3120e6 -b 10 -r 0.9 --silver_path -M 5 -m 10000 --verbose

It would be super helpful to get more log messages from that command - it would be strange for it to fail immediately without writing any of its regular messages to the log, if the binary itself seems OK (as indicated by you seeing the help page just fine).
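
One way to collect the requested log messages is to save standard out, standard error and the exit status separately when running the command by hand. The wrapper below is a sketch; `run_logged` and the output file names are illustrative and not part of goldrush.

```shell
#!/bin/sh
# Run a command, writing stdout to PREFIX.out, stderr to PREFIX.err and
# the exit status to PREFIX.status. Returns the command's own status.
run_logged() {
    log_prefix=$1
    shift
    log_rc=0
    "$@" >"$log_prefix.out" 2>"$log_prefix.err" || log_rc=$?
    echo "$log_rc" >"$log_prefix.status"
    return "$log_rc"
}

# Demonstration with a stand-in command that writes to both streams
# and exits with status 3.
demo_dir=$(mktemp -d)
run_logged "$demo_dir/demo" sh -c 'echo hello; echo oops >&2; exit 3' || true
cat "$demo_dir/demo.out"      # prints hello
cat "$demo_dir/demo.err"      # prints oops
cat "$demo_dir/demo.status"   # prints 3
rm -rf "$demo_dir"
```

In `goldrush_intermediate_files`, the failing line from the log would go after the prefix, for example `run_logged silver_path command time -v ...` followed by the same arguments shown above.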

@Oieswarya
Author

@lcoombe I wanted to update you. I ran the command separately from goldrush_intermediate_files, and this is the log it generated:
Using preset spaced seed
with:
span: 22
weight: 16
Calculating 5 silver path(s)
Using:
tile length: 1000
block size: 10
seed patterns: 3
threshold: 10
base seed pattern: 1011011110110111101101
minimum unassigned tiles: 5
maximum assigned tiles: 1
expected hash space: 6442450944
minimum average phred quality score: 15
maximum average phred delta between first and second half of read: 5
occupancy: 0.1
jobs: 48
allocating bit vector
m_filterSize: 61146729472
finished allocating bit vector
in 2.2652
opening: Human_nonACTG_fq.fq
inserting bit vector
num_passed_reads: 1597712
num_reads: 3083048
num_reads - num_passed_reads: 1485336
num_reads - num_passed_reads / num_reads: 0.0000
num_reads_skipped_by_phred: 0
num_reads_skipped_by_delta: 0
num_reads_skipped_by_length: 1485336
Total reads skipped: 1485336
finished inserting bit vector
in 1391.7048
assigning tiles
processed 10000 reads
processed 20000 reads
processed 30000 reads
processed 40000 reads
processed 50000 reads
processed 60000 reads
processed 70000 reads
processed 80000 reads
processed 90000 reads
processed 100000 reads
processed 110000 reads
processed 120000 reads
processed 130000 reads
processed 140000 reads
processed 150000 reads
processed 160000 reads
processed 170000 reads
processed 180000 reads
processed 190000 reads
processed 200000 reads
processed 210000 reads
processed 220000 reads
processed 230000 reads
processed 240000 reads
processed 250000 reads
processed 260000 reads
processed 270000 reads
processed 280000 reads
processed 290000 reads
processed 300000 reads
processed 310000 reads
processed 320000 reads
processed 330000 reads
processed 340000 reads
processed 350000 reads
processed 360000 reads
processed 370000 reads
processed 380000 reads
processed 390000 reads
processed 400000 reads
processed 410000 reads
processed 420000 reads
processed 430000 reads
processed 440000 reads
processed 450000 reads
processed 460000 reads
processed 470000 reads
processed 480000 reads
processed 490000 reads
processed 500000 reads
processed 510000 reads
processed 520000 reads
processed 530000 reads
processed 540000 reads
processed 550000 reads
processed 560000 reads
processed 570000 reads
processed 580000 reads
processed 590000 reads
processed 600000 reads
processed 610000 reads
processed 620000 reads
processed 630000 reads
processed 640000 reads
processed 650000 reads
processed 660000 reads
processed 670000 reads
processed 680000 reads
processed 690000 reads
processed 700000 reads
processed 710000 reads
processed 720000 reads
processed 730000 reads
processed 740000 reads
processed 750000 reads
processed 760000 reads
processed 770000 reads
processed 780000 reads
processed 790000 reads
processed 800000 reads
processed 810000 reads
processed 820000 reads
processed 830000 reads
processed 840000 reads
processed 850000 reads
processed 860000 reads
processed 870000 reads
processed 880000 reads
processed 890000 reads
processed 900000 reads
processed 910000 reads
processed 920000 reads
processed 930000 reads
processed 940000 reads
processed 950000 reads
processed 960000 reads
processed 970000 reads
processed 980000 reads
processed 990000 reads
processed 1000000 reads
processed 1010000 reads
processed 1020000 reads
processed 1030000 reads
processed 1040000 reads
processed 1050000 reads
processed 1060000 reads
processed 1070000 reads
processed 1080000 reads
processed 1090000 reads
processed 1100000 reads
processed 1110000 reads
processed 1120000 reads
processed 1130000 reads
processed 1140000 reads
processed 1150000 reads
processed 1160000 reads
processed 1170000 reads
processed 1180000 reads
processed 1190000 reads
processed 1200000 reads
processed 1210000 reads
processed 1220000 reads
processed 1230000 reads
Visited 642632 reads to generate 1 silver paths
processed 1240000 reads
processed 1250000 reads
processed 1260000 reads
processed 1270000 reads
processed 1280000 reads
processed 1290000 reads
processed 1300000 reads
processed 1310000 reads
processed 1320000 reads
processed 1330000 reads
processed 1340000 reads
processed 1350000 reads
processed 1360000 reads
processed 1370000 reads
processed 1380000 reads
processed 1390000 reads
processed 1400000 reads
processed 1410000 reads
processed 1420000 reads
processed 1430000 reads
processed 1440000 reads
processed 1450000 reads
processed 1460000 reads
processed 1470000 reads
processed 1480000 reads
processed 1490000 reads
processed 1500000 reads
processed 1510000 reads
processed 1520000 reads
processed 1530000 reads
processed 1540000 reads
processed 1550000 reads
processed 1560000 reads
processed 1570000 reads
processed 1580000 reads
processed 1590000 reads
processed 1600000 reads
processed 1610000 reads
processed 1620000 reads
processed 1630000 reads
processed 1640000 reads
processed 1650000 reads
processed 1660000 reads
processed 1670000 reads
processed 1680000 reads
processed 1690000 reads
processed 1700000 reads
processed 1710000 reads
processed 1720000 reads
processed 1730000 reads
processed 1740000 reads
processed 1750000 reads
processed 1760000 reads
processed 1770000 reads
processed 1780000 reads
processed 1790000 reads
processed 1800000 reads
processed 1810000 reads
processed 1820000 reads
processed 1830000 reads
processed 1840000 reads
processed 1850000 reads
processed 1860000 reads
processed 1870000 reads
processed 1880000 reads
processed 1890000 reads
processed 1900000 reads
processed 1910000 reads
processed 1920000 reads
processed 1930000 reads
processed 1940000 reads
processed 1950000 reads
processed 1960000 reads
processed 1970000 reads
processed 1980000 reads
processed 1990000 reads
processed 2000000 reads
processed 2010000 reads
processed 2020000 reads
processed 2030000 reads
processed 2040000 reads
processed 2050000 reads
processed 2060000 reads
processed 2070000 reads
processed 2080000 reads
processed 2090000 reads
processed 2100000 reads
processed 2110000 reads
processed 2120000 reads
processed 2130000 reads
processed 2140000 reads
processed 2150000 reads
processed 2160000 reads
processed 2170000 reads
processed 2180000 reads
processed 2190000 reads
processed 2200000 reads
processed 2210000 reads
processed 2220000 reads
processed 2230000 reads
processed 2240000 reads
processed 2250000 reads
processed 2260000 reads
processed 2270000 reads
processed 2280000 reads
processed 2290000 reads
processed 2300000 reads
processed 2310000 reads
processed 2320000 reads
processed 2330000 reads
processed 2340000 reads
processed 2350000 reads
processed 2360000 reads
processed 2370000 reads
processed 2380000 reads
processed 2390000 reads
processed 2400000 reads
processed 2410000 reads
processed 2420000 reads
processed 2430000 reads
processed 2440000 reads
processed 2450000 reads
processed 2460000 reads
processed 2470000 reads
processed 2480000 reads
processed 2490000 reads
processed 2500000 reads
processed 2510000 reads
processed 2520000 reads
processed 2530000 reads
processed 2540000 reads
processed 2550000 reads
processed 2560000 reads
Visited 1330621 reads to generate 2 silver paths
processed 2570000 reads
processed 2580000 reads
processed 2590000 reads
processed 2600000 reads
processed 2610000 reads
processed 2620000 reads
processed 2630000 reads
processed 2640000 reads
processed 2650000 reads
processed 2660000 reads
processed 2670000 reads
processed 2680000 reads
processed 2690000 reads
processed 2700000 reads
processed 2710000 reads
processed 2720000 reads
processed 2730000 reads
processed 2740000 reads
processed 2750000 reads
processed 2760000 reads
processed 2770000 reads
processed 2780000 reads
processed 2790000 reads
processed 2800000 reads
processed 2810000 reads
processed 2820000 reads
processed 2830000 reads
processed 2840000 reads
processed 2850000 reads
processed 2860000 reads
processed 2870000 reads
processed 2880000 reads
processed 2890000 reads
processed 2900000 reads
processed 2910000 reads
processed 2920000 reads
processed 2930000 reads
processed 2940000 reads
processed 2950000 reads
processed 2960000 reads
processed 2970000 reads
processed 2980000 reads
processed 2990000 reads
processed 3000000 reads
processed 3010000 reads
processed 3020000 reads
processed 3030000 reads
processed 3040000 reads
processed 3050000 reads
processed 3060000 reads
processed 3070000 reads
processed 3080000 reads
WARNING: Expected 5 silver paths, but only 3 generated.
Possible reasons include:
- Input reads sorted by chromosome/position
- Genome size set too large
assigned
in 3501.5824
I have set the genome size according to the long reads file's total length which is 3120e6.
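
A quick recomputation from the counts in this log: the ratio line prints 0.0000, but the counts just above it give roughly 0.48, meaning almost half the reads were dropped by the length filter (m=10000). The second figure below is only a back-of-the-envelope estimate that assumes the 10x coverage and G=3120e6 stated at the top of the thread. The thread does not report the actual read-length distribution. The function names are made up for this sketch.

```shell
#!/bin/sh
# Ratio of two counts, to four decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a / b }'
}

# Mean read length implied by genome size, coverage and read count
# (assumption: total bases = genome size * coverage).
mean_read_length() {
    awk -v g="$1" -v cov="$2" -v n="$3" 'BEGIN { printf "%.0f\n", g * cov / n }'
}

# Counts copied from the log above.
num_reads=3083048
skipped_by_length=1485336

ratio "$skipped_by_length" "$num_reads"      # prints 0.4818
mean_read_length 3120e6 10 "$num_reads"      # prints 10120
```

An implied mean length close to 10 kbp would be consistent with a 10000 bp minimum removing about half of the reads, which leaves roughly half of the nominal coverage for building silver paths.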

@lcoombe
Member

lcoombe commented Sep 12, 2024

Hi @Oieswarya,

So that indicates that the run went just fine (I don't see any errors), so I'm unsure why you were getting that error before.
You could try re-launching the same command, which should now start after that goldrush-path step. You can confirm that by running the same command with the dry-run option (-n).

@Oieswarya
Author

@lcoombe shall I run this command now?

goldrush run reads=Human_nonACTG_fq G=3120e6 track_time=1 m=10000 --debug

@lcoombe
Member

lcoombe commented Sep 12, 2024

That's right! Fingers crossed it'll work - if so, it could have been a transient server issue.

@Oieswarya
Author

I am still getting the same error:
GNU Make 4.3
Built for x86_64-conda-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
Updating makefiles....
Updating goal targets....
File 'run' does not exist.
Must remake target 'run'.
mkdir -p goldrush_intermediate_files
cd goldrush_intermediate_files && ln -sf ../Human_nonACTG_fq.fq && goldrush run-in-dir reads=Human_nonACTG_fq G=3120e6 t=48 z=1000 track_time=1 k=22 w=16 tile=1000 b=10 u=5 a=1 o=0.1 x=10 h=3 s=1011011110110111101101 m=10000 M=5 r=0.9 P=15 d=5 span=2 dist=500 k_ntLink=40 w_ntLink=250 rounds=5 polisher=goldpolish polisher_mapper=minimap2 shared_mem=/dev/shm
GNU Make 4.3
Built for x86_64-conda-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
make[1]: Entering directory '/home/goldrush_intermediate_files'
Updating makefiles....
Updating goal targets....
File 'run-in-dir' does not exist.
File 'check-G' does not exist.
Must remake target 'check-G'.
Successfully remade target file 'check-G'.
File 'check-reads' does not exist.
Must remake target 'check-reads'.
Successfully remade target file 'check-reads'.
File 'clean' does not exist.
File 'goldrush_asm_golden_path.fa' does not exist.
File 'goldrush_asm_silver_path_all.fq' does not exist.
File 'goldrush_asm_silver_path_5.fq' does not exist.
Must remake target 'goldrush_asm_silver_path_5.fq'.
command time -v -o goldrush_asm_silver_path_5.fq.time goldrush-path -k 22 -w 16 -t 1000 -u 5 -a 1 -o 0.1 -p goldrush_asm_silver_path -i Human_nonACTG_fq.fq -h 3 -j 48 -x10 -P 15 -d 5 -s 1011011110110111101101 -g 3120e6 -b 10 -r 0.9 --silver_path -M 5 -m 10000 --verbose
make[1]: Leaving directory '/home/goldrush_intermediate_files'

When I ran the goldrush-path command, it ran, but I did not see any files in the folder, nor any of the soft links it usually produces.
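
make re-runs a recipe when its target file does not exist, which is what the debug line `File 'goldrush_asm_silver_path_5.fq' does not exist.` reports. Listing which targets are present in `goldrush_intermediate_files` is therefore a quick way to see why the goldrush-path step starts again. The sketch below uses the target names printed in the log; `report_targets` is a hypothetical helper.

```shell
#!/bin/sh
# Report, for each named file in a directory, whether it is present and
# non-empty, present but empty, or missing.
report_targets() {
    target_dir=$1
    shift
    for f in "$@"; do
        if [ -s "$target_dir/$f" ]; then
            echo "present: $f"
        elif [ -e "$target_dir/$f" ]; then
            echo "empty: $f"
        else
            echo "missing: $f"
        fi
    done
}

# Demonstration in a scratch directory standing in for
# goldrush_intermediate_files.
demo_dir=$(mktemp -d)
printf '@1\nACGT\n+\nIIII\n' >"$demo_dir/goldrush_asm_silver_path_5.fq"
: >"$demo_dir/goldrush_asm_silver_path_all.fq"
report_targets "$demo_dir" \
    goldrush_asm_silver_path_5.fq \
    goldrush_asm_silver_path_all.fq \
    goldrush_asm_golden_path.fa
# prints:
# present: goldrush_asm_silver_path_5.fq
# empty: goldrush_asm_silver_path_all.fq
# missing: goldrush_asm_golden_path.fa
rm -rf "$demo_dir"
```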

@lcoombe
Member

lcoombe commented Sep 12, 2024

Are you running that command in the same folder? It doesn't appear to be starting in the right place (i.e., it is re-running the goldrush-path command), but regardless, I can't really see any error there. Could you attach the full log files to GitHub?

In addition, could you re-run a fresh demo with your current set-up, just to make sure that nothing happened with your environment or the server that you're using?

@Oieswarya
Author

Yes I am running both the commands from the same environment where I installed my goldrush.

I will try to upload the file, but I am unsure whether I can, as it is a 61 GB file.

I will run goldrush with another file (which previously successfully ran) and see if there is something wrong with the installation somehow.

Thank you for your prompt responses!

@lcoombe
Member

lcoombe commented Sep 12, 2024

No worries!

Just to clarify - I was asking about you sharing your full log files from the failed run, not your reads :)

And I know you're using the same environment, but always good to do that fresh demo run as a sanity check - running a previously successful read set works too!
