Releases: FunctionLab/selene
0.4.1
Updates:
- HDF5 support for in silico mutagenesis.
- Add Google Groups link to documentation.
Bug fixes:
- Predicting on sequences: write predictions to file for input less than batch size.
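For illustration, the class of bug fixed here (no output when the input is smaller than one batch) usually comes down to a missing flush of the final partial batch. The following is a minimal sketch of that pattern, not Selene's actual code; `predict_in_batches` and `predict_fn` are hypothetical names.

```python
def predict_in_batches(sequences, predict_fn, batch_size):
    """Apply predict_fn to sequences in batches, including a final partial batch.

    predict_fn is a hypothetical callable: list of inputs -> list of outputs.
    """
    outputs = []
    batch = []
    for seq in sequences:
        batch.append(seq)
        if len(batch) == batch_size:
            outputs.extend(predict_fn(batch))
            batch = []
    # Without this final flush, an input shorter than batch_size
    # (or any trailing partial batch) would never be written out.
    if batch:
        outputs.extend(predict_fn(batch))
    return outputs
```

With a batch size of 64 and only three input sequences, the loop never fills a batch, so all three predictions come from the final flush.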
0.4.0
Updates:
- Variant effect prediction: adjustments made to variant centering and strand-specific sequence handling so that the sequence context fetched for a variant matches the implementation for code associated with DeepSEA and SeqWeaver (https://hb.flatironinstitute.org/asdbrowser/help, https://github.com/FunctionLab/expecto)
- Predicting on sequences accepts BED file as input
- Add compatibility with Lua-trained DeepSEA and SeqWeaver models (converted to PyTorch) - models themselves will be officially released through the ASD browser on HumanBase in the coming weeks.
- Simplified the prediction handlers output for variant effect prediction - sequences where the reference allele doesn't match the reference genome are no longer diverted to a new file. Rather, a column `ref_match` has been added that denotes whether the allele matches or not.
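As a rough picture of what a `ref_match` column records, the sketch below compares each variant's stated reference allele against the bases found in the genome at that position. It is an assumption-laden illustration rather than Selene's implementation: `ref_match_column` and `fetch_reference` are hypothetical, and the coordinate conventions (1-based variant positions, 0-based end-exclusive fetches) are assumed.

```python
def ref_match_column(variants, fetch_reference):
    """Return one boolean per variant: does the stated ref allele match the genome?

    variants: iterable of (chrom, pos, ref, alt) tuples; pos is assumed 1-based.
    fetch_reference: hypothetical callable (chrom, start, end) -> str, using
    0-based, end-exclusive coordinates.
    """
    matches = []
    for chrom, pos, ref, _alt in variants:
        start = pos - 1  # convert the 1-based position to a 0-based start
        genome_ref = fetch_reference(chrom, start, start + len(ref))
        # Case-insensitive comparison, since genomes are often soft-masked.
        matches.append(genome_ref.upper() == ref.upper())
    return matches
```

Keeping mismatching variants in the same output with a flag, rather than in a separate file, lets downstream code filter on one column.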
Bug fixes:
- Predicting on sequences: previously did not output anything if the number of input sequences was less than the batch size
0.3.0
Selene version 0.3.0. Tested previously as a pre-release.
The updates to 0.3.0:
- Saving outputs for variant effect prediction to HDF5 or TSV files (used to only be TSV).
- Allowing users to set a write memory limit for how many predictions to store (for prediction, in silico mutagenesis, variant effect prediction) before writing them to a file.
- Major refactor for the `predict` module.
- Updating variant effect prediction sequence creation so that it matches how the sequences are created in ExPecto (that is, how the variant is centered in an N bp sequence).
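The write memory limit mentioned in the list above can be pictured as a buffer that is flushed to disk whenever it grows past a threshold. The following is a minimal sketch under stated assumptions: `BufferedWriter` is a hypothetical helper, the limit is counted in rows, and the output is tab-separated text; Selene's own handlers may measure and write differently.

```python
class BufferedWriter:
    """Collect rows in memory and write them out once a row limit is reached.

    Hypothetical helper; counting the limit in rows is an assumption.
    """

    def __init__(self, handle, max_rows):
        self.handle = handle      # any object with a write(str) method
        self.max_rows = max_rows  # flush once this many rows are buffered
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        # Write buffered rows as tab-separated lines, then clear the buffer.
        for row in self.rows:
            self.handle.write("\t".join(str(v) for v in row) + "\n")
        self.rows = []
```

A caller would invoke `flush()` once more at the end of a run so that rows below the limit are not lost.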
Bug fix:
- Loading model checkpoint in the `TrainModel` class.
0.2.0
Bug fixes
- `max_steps` typo in `TrainModel` (can now continue training from a model checkpoint)
- API ordering mismatch for `get_data_and_targets` between online samplers and file samplers (can now run `EvaluateModel` on both kinds of samplers)
Enhancements
- Significant improvements to the CLI/config file documentation: https://selene.flatironinstitute.org/overview/cli.html
- Allow callback handlers so that users can specify different kinds of metrics for training
- Support for training regression models using MatFileSampler: https://github.com/FunctionLab/selene/blob/master/tutorials/regression_mpra_example/regression_mpra_example.ipynb
- Allow saving new checkpoints after a certain number of steps in training (as opposed to overwriting the same one)
- Improved standard output logging for training
- Updated MatFileSampler so it no longer loads all data directly into memory if using an HDF5 file
- Allow users to have the option of loading the test set at the start of training, or (default) waiting until evaluation starts (if `ops: [train, evaluate]`).
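To illustrate the checkpointing enhancement in the list above (saving a new file every so many steps instead of overwriting one), here is a minimal sketch of how the output path might be chosen. The function name, parameter name, and file names are all hypothetical and are not taken from Selene's code.

```python
import os


def checkpoint_path(output_dir, step, save_new_every_n_steps=None):
    """Return the file path to write a checkpoint to at a given training step.

    If save_new_every_n_steps is set and the step is a multiple of it, the
    step number is embedded in the file name so earlier checkpoints survive.
    Otherwise a single rolling checkpoint file is overwritten.
    """
    if save_new_every_n_steps and step % save_new_every_n_steps == 0:
        name = "checkpoint-step{}.pth.tar".format(step)
    else:
        name = "checkpoint.pth.tar"
    return os.path.join(output_dir, name)
```

Embedding the step in the file name makes it possible to resume from, or compare, several points in a training run.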
Selene SDK release for preprint
IMPORTANT: For a manuscript submission, I have updated this tag with commits containing ONLY changes to some examples and READMEs. We will avoid making further forced updates to tags from now on (and forced updates will never happen if it is related to package code).
Minor fix to the setup specifications
The previous release of Selene had a bug where the tabix-indexed blacklist files could not be loaded for `selene_sdk.sequences.Genome` classes. This release should resolve that issue.
New multi-file samplers, adjusts the selene.samplers API
In addition to adding a sampler that loads in .mat or .bed files for sampling in training/testing/validation modes (`MultiFileSampler`), we have also updated `selene.sequences.Genome` to include an input of `blacklist_regions` in its constructor. This allows users to specify regions of the genome that should be ignored entirely (e.g. never get sampled when using an online sampler).
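Conceptually, ignoring blacklisted regions means rejecting any candidate interval that overlaps one. The sketch below shows that check with an in-memory dictionary as a stand-in for an indexed blacklist file; `overlaps_blacklist` and the data layout are hypothetical and do not reflect how `Genome` stores or queries its blacklist.

```python
def overlaps_blacklist(chrom, start, end, blacklist):
    """Return True if [start, end) overlaps any blacklisted interval on chrom.

    blacklist: dict mapping chromosome name -> list of (start, end) intervals,
    assumed 0-based and end-exclusive (as in BED files).
    """
    for b_start, b_end in blacklist.get(chrom, []):
        # Two half-open intervals overlap when each starts before the other ends.
        if start < b_end and b_start < end:
            return True
    return False
```

An online sampler could draw a candidate position, run this check, and redraw whenever it returns True.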
Updated release with revision to fix inconsistent string formatting
Minor revision that updates the EvaluateModel and NonStrandSpecific classes with the proper string formatting (`.format`).
First release of Selene
This release contains basic functionality to train, evaluate, and apply common sequence-level models. We used DeepSEA and use cases that build off of that model to determine what we should include in the first release. Please consult the tutorials for more information.