- Adds RNA downloads to PanCancer download tool gnos_pull.pl
- Hardening of external process handling in PCAP::Threaded
- Adds C version of diff_bams
- Significant speed-up of BAM generation under bwa_mem.pl by using a separate process to compress the mark-duplicates output and by streaming BAS generation. This cannot be done for CRAM in the same way.
- Reduce disk usage when running bwa_mem.pl
- Improve throughput via slightly unintuitive use of additional pipes
- Adds map_threads|mt option to bwa_mem.pl to allow more control of parallel processing in one-shot submission.
- Adds bwa_pl|l option to bwa_mem.pl to allow preload of different malloc libraries.
- Move from legacy kent bigwig manipulation code to cgpBigWig
  - Faster and handles the huge number of contigs in many new reference builds.
  - Resulting change to the underlying installed tools: bwcat is now bwjoin, to be more descriptive of its actual function.
- Handle recent changes to BioPerl structure
- Use BWA default for -T, previously hard coded to -T 0.
  - Can be passed through bwa_mem.pl with other args to bwa via the -b option.
- Fix bam2bedgraph compilation since changes to underlying libraries
- bamToBw.pl - expose read flag filters
- Drop dependency on Bio::DB::HTS INSTALL.pl, as it can't be fixed to a known good version.
- Added travis CI
- Add support for output directly to CRAM
- bwa version upgraded to 0.7.15
- Threading module now converts the currently running step to a bash script for the following reasons:
  - Changes logging to use file redirects instead of Capture::Tiny, preventing log bleed into the wrong files
  - Commands for failed jobs remain after shutdown for easy debug/testing
- Log and progress file names simplified so more portable.
- Modified reheadSQ to be more robust.
- Adds xam_coverage_bins.pl, which calculates the fraction of targets covered at various depths (BAM/CRAM), using BED/GFF3 as the target bait file.
- bwa_mem.pl
  - allow user to specify BWA mapping parameters
  - now accepts CRAM as input
- bamToBw.pl - now accepts CRAM as input.
- bam_stats - Adds 2 new stats:
  - #_mapped_pairs
  - #_inter_chr_pairs
- Dependency changes
  - WARNING: ensure all related tools handle these updates
  - samtools, now only uses htslib based versions (1.3+, handling deprecated use of sort)
  - Bio::DB::HTS htslib bindings replacing Bio::DB::Sam
- bwa_mem.pl - Option to disable duplicate marking
- bam_stats - Unit tests for C code
- bam_stats - Fix to median insert size calculation
- bam_stats - new rna switch to give more appropriate insert size stats
- bam_stats - more robust handling of optional RG header entries
- bam_stats - allows streaming IO (thanks to @jenniferliddle)
- bwa_mem.pl - Handle ' in RG header line/IDs
- Generally improved version handling and updated versions of some tools.
- Changed final log folder to include sample name and analysis type; prevents clashes when lots of data is written to the same output location.
- Fix bugs #52 and #53
- Modified bwa_mem.pl to accept multi-readgroup BAM as input
- Turns out BWA mem still requires fixmates to get proper isize distributions
- bumped biobambam to 0.0.191
- Switched to bam_stats C in bwa_mem.pl.
- Updates to bam_to_sra.pl to prevent bad SM values in unaligned BAM uploads.
- Adding local file mode for sites that cannot download from GNOS when the xml_to_bas.pl script runs
- gnos_pull.pl - see linked docs
- bam_stats C
  - Reference file parameter is now optional to replicate bam_stats.pl functionality.
  - Warnings added to the help and at runtime: when a CRAM file is given, the reference named in the header may not be found, in which case bam_stats will fail.
- bam_stats C - changed array for khash in insert size calculations in order to make code more robust.
- Header RG line reading now reads anything not a tab or newline as it should when determining what the values of tags are.
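
The RG header handling described above can be illustrated with a minimal Python sketch. This is not the bam_stats code (which is C); the function name `parse_rg_line` is hypothetical. It treats every character other than a tab or newline as part of a tag's value, so values containing spaces, colons or a `'` survive intact.

```python
def parse_rg_line(line):
    """Parse a SAM @RG header line into a dict of tag -> value.

    Hypothetical helper for illustration only, not the bam_stats code.
    """
    fields = line.rstrip("\r\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only: the value may itself contain
        # colons, spaces or quotes, anything except a tab or newline.
        tag, sep, value = field.partition(":")
        if not sep or len(tag) != 2:
            raise ValueError(f"malformed field: {field!r}")
        tags[tag] = value
    return tags


rg = parse_rg_line("@RG\tID:1:a'b\tSM:my sample\tPL:ILLUMINA\n")
print(rg["ID"], "|", rg["SM"])
```

Splitting on tabs first and then on only the first colon is what keeps an ID such as `1:a'b` in one piece.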
bamToBw.pl fixes
- Pull the actual binaries from jkent_util, not the associated HTML page
- process name corrections in bamToBw.pm command line args
- bam_stats c now has CRAM support.
- Also dropped need for samtools v1.x api as can be handled by htslib on its own.
- bamToBw.pl and new biobambam dep
No changes to old tools, just additions and prep for handling CRAM input.
bam_stats in C, less than 2 hours to generate stats on a sample level BAM file of ~120GB.
- bam_stats.pl is now multi-threaded, can get ~50% runtime reduction with 3-4 threads, memory still <500MB.
- Upgrades biobambam to 0.0.185 (and dependencies).
xml_to_bas.pl - detect readgroup id clashes and attempt to reconcile, #54
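
To show what detecting and reconciling readgroup ID clashes can look like, here is a minimal Python sketch. It is not the xml_to_bas.pl implementation (which is Perl); the function names and the `_2`, `_3` suffix scheme are assumptions chosen for illustration, and the real script may reconcile clashes differently.

```python
from collections import Counter


def find_rg_clashes(rg_ids):
    """Return the sorted list of readgroup IDs that occur more than once."""
    return sorted(i for i, n in Counter(rg_ids).items() if n > 1)


def reconcile_rg_ids(rg_ids):
    """Return IDs in the same order, made unique by suffixing repeats.

    The suffix scheme (_2, _3, ...) is an assumption for illustration.
    """
    seen = Counter()
    taken = set(rg_ids)  # never generate an ID that already exists
    result = []
    for rg_id in rg_ids:
        seen[rg_id] += 1
        if seen[rg_id] == 1:
            result.append(rg_id)
            continue
        n = seen[rg_id]
        candidate = f"{rg_id}_{n}"
        while candidate in taken:
            n += 1
            candidate = f"{rg_id}_{n}"
        taken.add(candidate)
        result.append(candidate)
    return result


ids = ["A", "B", "A"]
print(find_rg_clashes(ids))
print(reconcile_rg_ids(ids))
```

The first occurrence of each ID is left untouched, so files that did not clash keep their original readgroup IDs.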
Fixed bug in bwa_mem.pl when using '-f' option on paired fastq.
Makes xml_to_bas.pl more robust on AWS. Retrieved XML was being truncated on some network configurations.
Modifications made to the bwa_mem.pl code to split a lane of data into fragments to reduce failure recovery time. Primarily added to handle X10 data better.
Also updated samtools to 0.1.20, last version that is currently compatible with Bio::DB::Sam.
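
The idea of splitting a lane into fragments can be sketched as follows. This is a minimal Python illustration, not the bwa_mem.pl code; the function name `split_fastq`, the fragment size, and the assumption of strict 4-line FASTQ records are all hypothetical.

```python
from itertools import islice


def split_fastq(lines, reads_per_fragment):
    """Yield lists of FASTQ lines, each holding up to reads_per_fragment reads.

    Assumes 4 lines per read (no wrapped sequence lines); illustration only.
    """
    if reads_per_fragment < 1:
        raise ValueError("reads_per_fragment must be at least 1")
    it = iter(lines)
    lines_per_fragment = 4 * reads_per_fragment
    while True:
        chunk = list(islice(it, lines_per_fragment))
        if not chunk:
            return
        yield chunk


# Five tiny reads split into fragments of two reads each.
fastq = []
for i in range(5):
    fastq += [f"@read{i}", "ACGT", "+", "IIII"]

for n, fragment in enumerate(split_fastq(fastq, 2)):
    print(n, len(fragment) // 4, "reads")
```

Because each fragment can be processed independently, a failure only requires rerunning the affected fragment rather than the whole lane, which is the recovery-time benefit described above.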
Fix missing dependency and build a relocatable version of biobambam suitable for use in artifactory.
- Minor enhancement to bwa_mem.pl to automatically generate the *.bas file.
- Added xml_to_bas.pl for pancancer users, see the wiki for details.
- Fixed a few minor issues, #36, #37, #39
- Install biobambam 0.0.142 to prevent over-counting of duplicates when multiple libraries, also required libmaus 0.0.124.
- Improve install for those working with multiple perl installs.
- Improve version inheritance, less code
- Corrected issue from dynamic de-reference of hash, issue for pre 5.14 perl and potentially unstable in future.
- Added missing project code to cv terms.
- Bug-fixed upgrade path, still needs better solution.
- Cleaned up messaging in Threaded module.
- Upgrade install to pull biobambam 0.0.138
- fastqtobam option 'pairedfile' for where readnames don't have trailing '/1' or '/2'.
- fastqtobam option to relax qscore validation without turning off... careful
- Upgrade install to pull BWA 0.7.8
- performance improvements for short read alignment (100bp)
- Upgrade install to pull biobambam 0.0.135
- fastqtobam supports Casava v1.8
- bamsort supports NM/MD correction during sam->bam/merge process
- Minor enhancement to BAS reader module.
- Sample name from command line passed through to SM of RG header in bwa_mem.pl
- SRA.pm - check that rg id is unique within run of code (thanks to Junjun Zhang)
- Threads.pm - join interval is now configurable.
- bam_stats.pl actually installed now.
- Basic *.bas perl access module.
- Upgraded libmaus/biobambam to resolve patch and CentOS install issue.
- Reference implementations ensure unique RG:ID between files.
- Changes for the re-worked PanCancer submission SOP.
- Patch for libmaus issue as not going to be a release in time.
- Bug fix for *.info files (bam_to_sra_sub.pl).
- Added bam_stats.pl.
- Project is now defaulted when not provided (bam_to_sra_sub.pl).
- Updated biobambam version
- Documented additional dependencies
- Improved install implementation
- Updated module naming in preparation for publication to GitHub.
- Added license boiler plate.
- bam_to_sra_sub.pl generates valid XML for GNOS, some features disabled until modifications to GNOS can be made (warnings indicate this on execution)
- Pre release with basic SRA XML generation (GENOS)
- Updated requirements for biobambam of 0.0.120
- Tests update to reflect change in biobambam requirement
- Initial codebase for PanCancer alignment with BWA 0.6.2.