Lauren Coombe edited this page Jan 30, 2020 · 18 revisions

Frequently Asked Questions

  1. I am getting an error that says Kmer::setLength(unsigned int): Assertion `length <= 128' failed
  2. My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)
  3. My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?
  4. My ABySS assembly fails and I get an error that says abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
  5. Why do I count more contigs than abyss-fac that are larger than 500 bp?
  6. Why does ABySS crash with a segmentation fault during ABYSS-P with Open MPI 3.x?
  7. How much memory does ABySS use?
  8. What are the lower case characters in my assembly?

1. I am getting an error that says Kmer::setLength(unsigned int): Assertion `length <= 128' failed

ABySS has a compile-time parameter for the maximum value of k. As of ABySS 2.0.0, the default maximum is 128. To run assemblies with a larger k, you must compile ABySS from source and pass the --enable-maxk option to the configure step, for example:

$ ./configure --enable-maxk=192
$ make
$ make install

The value of --enable-maxk should be a multiple of 32. ABySS needs to know the maximum value of k so that it can minimize the amount of memory it uses to represent the de Bruijn graph. If memory usage is not a concern, you may set --enable-maxk as high as you like.
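Since --enable-maxk should be a multiple of 32, one way to pick a value is to round the largest k you plan to use up to the next multiple of 32. The following is a minimal sketch; the helper name round_up_maxk is hypothetical and not part of ABySS.

```shell
#!/bin/sh
# Hypothetical helper: round a desired k up to the next multiple of 32.
round_up_maxk() {
    echo $(( ($1 + 31) / 32 * 32 ))
}

# Example: the largest k we intend to use is 250.
maxk=$(round_up_maxk 250)

# Print the configure command rather than running it.
echo "./configure --enable-maxk=$maxk"
```

For k=250 this prints ./configure --enable-maxk=256. A k that is already a multiple of 32 is left unchanged.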

2. My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)

Open MPI handles messages differently once their size exceeds a threshold called the eager send limit. In ABySS, message size depends directly on k, and when the eager send limit is exceeded, assembly jobs will deadlock.

The best workaround for this problem is to explicitly set the eager send limit. This can be done by setting an environment variable called mpirun in your cluster job script.

Example:

#!/bin/sh
PATH=/home/joe/abyss-1.3.7/maxk_96/bin:$PATH
export mpirun='mpirun --mca btl_sm_eager_limit 16000 --mca btl_openib_eager_limit 16000'
abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'

The values for the btl_sm_eager_limit and btl_openib_eager_limit are in bytes, and it is usually fine to set them both to the same value. The formula for determining the appropriate value is:

eager_limit >= (max_k/4 + 32) * 100
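As a worked example of the formula above, the smallest acceptable limit can be computed in the job script itself. This is a sketch only; the helper name eager_limit_for_maxk is hypothetical, and max_k is assumed to be the --enable-maxk value ABySS was compiled with.

```shell
#!/bin/sh
# Hypothetical helper: smallest eager send limit (in bytes) that satisfies
#   eager_limit >= (max_k/4 + 32) * 100
# The integer division is exact when max_k is a multiple of 32.
eager_limit_for_maxk() {
    echo $(( ($1 / 4 + 32) * 100 ))
}

# For a build with max_k=96: (96/4 + 32) * 100 = 5600 bytes.
limit=$(eager_limit_for_maxk 96)

# Use the same value for both limits, as suggested above.
export mpirun="mpirun --mca btl_sm_eager_limit $limit --mca btl_openib_eager_limit $limit"
```

Any larger value, such as the 16000 used in the job script above, also satisfies the inequality.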

3. My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?

The default parameters of Open MPI allocate a large amount of memory to communication buffers. The following options will reduce the amount of memory allocated to buffers.

mpirun --mca btl_openib_receive_queues X,128,256,192,128:X,4096,256,128,32:X,12288,256,128,32:X,65536,256,128,3

4. My ABySS assembly fails and I get an error that says abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.

During the contig and scaffold stages of an assembly, ABySS aligns the paired-end reads to the sequences that have been assembled so far (e.g. unitigs), so that it can link them into larger sequences (e.g. contigs). To do this, ABySS must be able to match up the two reads that belong to the same pair. If you are seeing this error, please check that either

  1. Both reads from a pair have identical FASTQ IDs (first word of line beginning with @), OR
  2. Both reads from a pair have identical FASTQ IDs followed by /1 and /2, respectively.

The sequences in the read 1 and read 2 files do not have to be sorted in the same order, but doing so is strongly recommended because it reduces the memory usage of abyss-fixmate. (In the majority of cases, the two files will already be sorted in the same order anyway.)
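The two ID conventions above can be checked before starting an assembly. The sketch below is not part of ABySS; the function names are hypothetical, and it assumes plain, uncompressed FASTQ files with exactly four lines per record. It reduces each header to its first word, strips a trailing /1 or /2, and counts IDs that occur in only one of the two files.

```shell
#!/bin/sh
# Hypothetical helper: print the pairing ID of every FASTQ record, i.e. the
# first word of each header line without the leading "@" and without a
# trailing "/1" or "/2". Assumes four lines per record.
fastq_pair_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: count IDs present in only one of the two files.
# A result of 0 means every read has a mate, regardless of file order.
count_unpaired_ids() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    fastq_pair_ids "$1" | sort > "$tmp1"
    fastq_pair_ids "$2" | sort > "$tmp2"
    # comm -3 keeps only the lines unique to either input.
    comm -3 "$tmp1" "$tmp2" | wc -l | tr -d ' '
    rm -f "$tmp1" "$tmp2"
}

# Usage (file names taken from the job script earlier on this page):
#   count_unpaired_ids read1.fastq read2.fastq
```

If the count equals the total number of reads in both files, no IDs match at all, which is the situation the abyss-fixmate error describes.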

5. Why do I count more contigs than abyss-fac that are larger than 500 bp?

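abyss-fac does not count Ns toward the 500 bp threshold, whereas samtools faidx counts all symbols. A contig that reaches 500 bp only because of its Ns is therefore counted when you use lengths from samtools faidx but not by abyss-fac. See the ABySS stats file format.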
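The effect of the two counting rules can be reproduced with a short awk sketch. It only imitates the behaviour described above and is not the implementation of abyss-fac or samtools; the function name count_long_seqs and its mode argument are hypothetical, and the threshold is assumed to be inclusive (length >= 500).

```shell
#!/bin/sh
# Hypothetical helper: count FASTA sequences with at least min_len symbols.
#   $1 = FASTA file, $2 = min_len, $3 = mode
#   mode "all"  counts every symbol
#   mode "no_n" ignores N and n when measuring length
count_long_seqs() {
    awk -v min="$2" -v mode="$3" '
        # Tally the sequence read so far if it meets the threshold.
        function tally() { if (seen && len >= min) count++ }
        /^>/ { tally(); seen = 1; len = 0; next }
        {
            s = $0
            if (mode == "no_n") gsub(/[Nn]/, "", s)
            len += length(s)
        }
        END { tally(); print count + 0 }
    ' "$1"
}

# Usage (contigs.fa is a placeholder file name):
#   count_long_seqs contigs.fa 500 all
#   count_long_seqs contigs.fa 500 no_n
```

A scaffold with 450 called bases and 60 Ns is counted in "all" mode but not in "no_n" mode, which is the kind of discrepancy this question is about.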

6. Why does ABySS crash with a segmentation fault during ABYSS-P with Open MPI 3.x?

With Open MPI 3.x, you may see a segmentation fault similar to this one:

[hpce705:162958] *** Process received signal ***
[hpce705:162958] Signal: Segmentation fault (11)
[hpce705:162958] Signal code:  (128)
[hpce705:162958] Failing at address: (nil)
[hpce705:162958] [ 0] /gsc/btl/linuxbrew/lib/libc.so.6(+0x33070)[0x7f7b4c627070]
[hpce705:162958] [ 1] /gsc/btl/linuxbrew/Cellar/open-mpi/3.1.0/lib/openmpi/mca_btl_vader.so(+0x4bde)[0x7f7b40b8fbde]
[hpce705:162958] [ 2] /gsc/btl/linuxbrew/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f7b4be5f60c]
[hpce705:162958] [ 3] /gsc/btl/linuxbrew/lib/libmpi.so.40(PMPI_Request_get_status+0x74)[0x7f7b4d2abb54]
[hpce705:162958] [ 4] ABYSS-P[0x40dcec]

Try using Open MPI 2.1.3. If that crashes as well, try using the shared-memory (sm) BTL rather than the default vader BTL by adding mpirun='/path/to/openmpi-2.1.3/mpirun --mca btl self,sm' to your abyss-pe command line.

7. How much memory does ABySS use?

The most memory intensive step of ABySS is the initial de Bruijn graph assembly step. Bloom filter ABySS (abyss-bloom-dbg) uses the amount of memory specified by the -b, --bloom-size option, plus some overhead.

Hash table ABySS (ABYSS and ABYSS-P) uses (8 + maxk / 4) · n bytes of RAM, plus some overhead, where n is the number of distinct k-mers. You may use ntCard to count the number of distinct k-mers in the data set, which is reported as F0. For example, if ntCard reports F0 3000000000 and ABySS was compiled with maxk=128:

(8 + maxk / 4) · n = (8 + 128 / 4) · 3e9 bytes = 120e9 bytes = 120 GB of RAM
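The same estimate can be scripted for other values of maxk and F0. This is a sketch of the formula only; the function name abyss_hash_mem_bytes is hypothetical, the overhead mentioned above is not included, and a shell with 64-bit integer arithmetic is assumed.

```shell
#!/bin/sh
# Hypothetical helper: estimated hash table memory in bytes,
#   (8 + maxk / 4) * n
# where $1 is maxk and $2 is the number of distinct k-mers (ntCard F0).
abyss_hash_mem_bytes() {
    echo $(( (8 + $1 / 4) * $2 ))
}

# Example from the text: maxk=128, F0=3000000000.
bytes=$(abyss_hash_mem_bytes 128 3000000000)
echo "$(( bytes / 1000000000 )) GB"
```

With the values from the example this prints 120 GB, matching the calculation above.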

8. What are the lower case characters in my assembly?

Lower case characters mark positions where ABySS is unsure of the precise sequence. The uncertainty could be due to heterozygous sequence or a collapsed repeat. Polishing your assembly with a tool such as Pilon, Racon, ntEdit or Unicycler-polish will refine the sequence at these uncertain loci.