Flat requirements for whole genome mode are wasteful #773

adamnovak · 2019-11-15T19:16:57Z

When running a whole genome construct run, I have chromosomes 1-22, X, Y, and then a bunch of different little unplaced/unlocalized contigs and decoys.

We're requesting e.g. 200 GB of memory to compute snarls for each of those chromosomes, including all the tiny contigs, but I don't observe the jobs using nearly that much. chr1 might take that much memory, as might a whole genome combined graph that we ask to index, but chr21 doesn't, to say nothing of all the little unlocalized bits and decoys.

We should have some way to scale our job requirements based on file size, and/or we should run a test run with Toil's stat collection on and cut limits down to closer to what is really needed now with current vg.
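As a rough illustration of the file-size-based scaling idea, here is a minimal sketch. The function name, the 20x multiplier, and the 4 GiB floor are all made-up placeholders, not measured values or anything that exists in the codebase; the 200 GiB ceiling just mirrors the current flat figure. Real numbers would have to come from a stats-collecting test run.

```python
GIB = 1024 ** 3

def scaled_memory_bytes(input_size_bytes,
                        bytes_per_input_byte=20,   # hypothetical multiplier, needs measuring
                        floor_bytes=4 * GIB,       # hypothetical minimum for tiny contigs
                        ceiling_bytes=200 * GIB):  # mirrors the current flat requirement
    """Estimate a job's memory requirement from the size of its input file.

    Scales linearly with input size, then clamps to [floor_bytes, ceiling_bytes]
    so tiny decoys get a small request and chr1 never exceeds the old limit.
    """
    estimate = input_size_bytes * bytes_per_input_byte
    return max(floor_bytes, min(ceiling_bytes, estimate))

# Example: a 1 GiB chromosome graph versus a tiny decoy contig graph.
print(scaled_memory_bytes(1 * GIB) // GIB)   # 20
print(scaled_memory_bytes(50_000) // GIB)    # 4
```

The result could then be passed as the `memory` requirement when the per-chromosome job is created, instead of a single flat value for every contig.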

This is currently causing me to waste most of our lab Kubernetes cluster capacity as unused-but-subscribed memory, and it is making my run very slow.
