Reduce storage used by intermediate files #14

Open
anoronh4 opened this issue Mar 7, 2023 · 1 comment · Fixed by #15
Labels
enhancement New feature or request

anoronh4 commented Mar 7, 2023

Currently forte creates large intermediate files, which nf-core's atomized modules require. I've identified a few places where BAM storage can be reduced:

  1. STAR_FOR_STARFUSION - We do not need this process to produce a BAM in order to run STARFUSION, and we should probably skip writing one by default. The nf-core version of star/align currently requires a BAM as an output. There's an easy fix: simply make the BAM output optional.
  2. Allow all STAR_* processes to accept multiple pairs of FASTQs. That way we don't have to use samtools/merge or cat/fastq to merge read data before or after alignment, and we can even maintain distinct read groups. STAR is capable of doing this, but nf-core's star/align was not written with this use case in mind.
  3. Merge the STAR_FOR_ARRIBA and ARRIBA processes together and don't output the BAM at the end of the process. If scratch=true or another cleanup mechanism is enabled, the BAM will be discarded. Currently nf-core does not have a combined process to run arriba.
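
To illustrate item 2, here is a minimal Python sketch of how the STAR arguments for several FASTQ pairs could be assembled. It relies on STAR's documented behavior: `--readFilesIn` takes comma-separated lists for mate 1 and mate 2, and `--outSAMattrRGline` takes one read group entry per input file, separated by a comma surrounded by spaces. The function name, the file names, and the choice of `ID`/`SM` tags are assumptions made for illustration; this is not how forte or nf-core's star/align actually builds the command.

```python
from typing import List, Sequence, Tuple


def star_multi_fastq_args(
    pairs: Sequence[Tuple[str, str]],
    read_group_ids: Sequence[str],
    sample: str,
) -> List[str]:
    """Build STAR arguments for several paired-end FASTQ files in one run.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not pairs:
        raise ValueError("at least one FASTQ pair is required")
    if len(pairs) != len(read_group_ids):
        raise ValueError("need exactly one read group ID per FASTQ pair")

    # STAR expects all mate 1 files as one comma-separated list,
    # followed by all mate 2 files in the same order.
    mate1 = ",".join(r1 for r1, _ in pairs)
    mate2 = ",".join(r2 for _, r2 in pairs)

    args = ["--readFilesIn", mate1, mate2, "--outSAMattrRGline"]
    # Read group entries are separated by a standalone "," token, so each
    # input file keeps its own read group in the output.
    for i, rg_id in enumerate(read_group_ids):
        if i > 0:
            args.append(",")
        args += [f"ID:{rg_id}", f"SM:{sample}"]
    return args
```

Passing the returned list to `shlex.join` gives a string that can be appended to the rest of a STAR command line, so no separate cat/fastq or samtools/merge step is needed to combine lanes.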

I think implementing all of these will let us eliminate up to 5 copies of unused read data per sample, which is a huge saving in storage and may reduce run time and compute as well. It may take some time for nf-core to adopt these changes, so it is best to go ahead with local changes for now.

@anoronh4 anoronh4 linked a pull request Mar 7, 2023 that will close this issue
@anoronh4 anoronh4 added the enhancement New feature or request label Mar 7, 2023
@anoronh4
Collaborator Author

Parts 1 and 2 have been addressed by #15, but part 3 is still unresolved.
