New CLI sockeye.prepare_data for preprocessing the training data only once before training,
potentially splitting large datasets into shards. At training time only one shard is loaded into memory at a time,
limiting the maximum memory usage.
Changed
Instead of using the --source and --target arguments, sockeye.train now accepts a --prepared-data argument pointing to the folder containing the preprocessed and sharded data. Training from the raw
data is still possible and now consumes less memory.
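The shard-at-a-time loading described above can be sketched in a few lines. This is not Sockeye's actual on-disk format; `write_shards` and `iterate_shards` are hypothetical helpers that illustrate why peak memory stays bounded by the shard size rather than the dataset size:

```python
import numpy as np

def write_shards(samples, shard_size, prefix):
    """Split a dataset into fixed-size shards stored on disk (here: .npy files)."""
    paths = []
    for i in range(0, len(samples), shard_size):
        path = "%s.shard%05d.npy" % (prefix, i // shard_size)
        np.save(path, np.array(samples[i:i + shard_size]))
        paths.append(path)
    return paths

def iterate_shards(paths):
    """Load one shard into memory at a time; previous shards can be freed."""
    for path in paths:
        shard = np.load(path)
        for sample in shard:
            yield sample
```

Because only one shard is resident at any point, the maximum memory usage is controlled by the shard size chosen at preparation time, independent of the total corpus size.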
[1.15.5]
Added
Optionally apply query, key and value projections to the source and target hidden vectors in the CNN model
before applying the attention mechanism. CLI parameter: --cnn-project-qkv.
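The optional query/key/value projections can be pictured as an extra linear layer on each side before the usual dot-product attention. The sketch below is a simplified single-head illustration (NumPy in place of MXNet; `projected_attention` is a hypothetical name, not Sockeye's API):

```python
import numpy as np

def projected_attention(source, target, w_q, w_k, w_v):
    """Dot-product attention with linear Q/K/V projections applied first.
    source: (src_len, d) encoder states; target: (tgt_len, d) decoder states;
    w_q, w_k, w_v: (d, d) learned projection matrices."""
    q = target @ w_q                      # queries from target hidden vectors
    k = source @ w_k                      # keys from source hidden vectors
    v = source @ w_v                      # values from source hidden vectors
    scores = q @ k.T / np.sqrt(k.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over source positions
    return weights @ v                    # (tgt_len, d) context vectors
```

Without --cnn-project-qkv, the hidden vectors would be used directly as queries, keys, and values; the projections add capacity at the cost of extra parameters.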
[1.15.4]
Added
A warning will be printed if the checkpoint decoder slows down training.
[1.15.3]
Added
Exposing the Xavier random number generator type through --weight-init-xavier-rand-type.
[1.15.2]
Added
Exposing MXNet's Nesterov Accelerated Gradient and Adadelta optimizers.
[1.15.1]
Added
A tool that initializes embedding weights with pretrained word representations, sockeye.init_embedding.
Fast decoding for transformer models. Caches keys and values of self-attention before softmax.
Changed decoding flag --bucket-width to apply only to source length.
[1.14.2]
Added
Gradient norm clipping (--gradient-clipping-type) and monitoring.
Changed
Changed --clip-gradient to --gradient-clipping-threshold for consistency.
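Clipping by global gradient norm (one of the standard --gradient-clipping-type options) rescales all gradients together when their combined L2 norm exceeds the threshold. A minimal NumPy sketch of the technique (`clip_global_norm` is an illustrative name, not Sockeye's function):

```python
import numpy as np

def clip_global_norm(grads, threshold):
    """Rescale all gradient arrays so their combined L2 norm is at most
    `threshold`; also return the pre-clipping norm for monitoring."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > threshold:
        scale = threshold / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm
```

Returning the unclipped norm is what makes the monitoring mentioned above possible: spikes in the logged norm reveal unstable updates even when clipping keeps training alive.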
[1.14.1]
Changed
Sorting sentences during decoding before splitting them into batches.
Default chunk size: when batching is enabled, the default chunk size during decoding is now batch_size * 500, so that users who
increase the batch size do not accidentally forget to increase the chunk size as well.
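The sort-then-batch scheme of the two entries above can be sketched as follows: sentences are grouped into chunks of batch_size * 500, each chunk is sorted by length so batches contain similarly long inputs (less padding), and original indices are kept so output order can be restored. `chunks_for_decoding` is a hypothetical helper, not Sockeye's API:

```python
def chunks_for_decoding(sentences, batch_size, chunk_multiplier=500):
    """Yield batches of (original_index, sentence) pairs. Each chunk of
    batch_size * chunk_multiplier sentences is sorted by length before
    being split into batches of batch_size."""
    chunk_size = batch_size * chunk_multiplier
    for start in range(0, len(sentences), chunk_size):
        chunk = list(enumerate(sentences[start:start + chunk_size], start))
        chunk.sort(key=lambda pair: len(pair[1]))   # shortest first
        for b in range(0, len(chunk), batch_size):
            yield chunk[b:b + batch_size]
```

Tying the default chunk size to the batch size guarantees each chunk spans many batches, so the length sorting actually pays off.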
[1.14.0]
Changed
Downscaled fixed positional embeddings for CNN models.
Renamed the --monitor-bleu flag to --decode-and-evaluate to reflect that it computes
other metrics in addition to BLEU.
Added
--decode-and-evaluate-use-cpu flag to use CPU for decoding validation data.
--decode-and-evaluate-device-id flag to use a separate GPU device for validation decoding. If not specified, the
default behavior is unchanged: the last GPU acquired for training is also used for validation decoding.
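The "downscaled fixed positional embeddings" changed above are the standard sinusoidal embeddings multiplied by a constant factor. A minimal sketch, assuming the usual sin/cos construction; the actual downscaling constant used for the CNN models is not stated here, so `scale` is a free parameter:

```python
import numpy as np

def positional_embeddings(max_len, d, scale=1.0):
    """Fixed sinusoidal position embeddings, multiplied by `scale`
    (a value < 1 downscales them relative to the token embeddings)."""
    pos = np.arange(max_len)[:, None]                # (max_len, 1) positions
    dim = np.arange(d // 2)[None, :]                 # (1, d/2) frequency indices
    angles = pos / np.power(10000.0, 2.0 * dim / d)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return scale * emb
```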
[1.13.2]
Added
A tool that extracts specified parameters from params.x into a .npz file for downstream applications or analysis.
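The essence of that extraction tool is selecting a subset of named arrays from a loaded checkpoint and writing them to a NumPy .npz archive. A minimal sketch (`extract_params` is an illustrative name; the real tool reads MXNet params.x checkpoints, which a plain dict stands in for here):

```python
import numpy as np

def extract_params(params, names, out_path):
    """Save only the requested parameter arrays (name -> array) to a
    .npz archive for downstream applications or analysis."""
    selected = {name: params[name] for name in names}
    np.savez(out_path, **selected)
    return selected
```

Downstream code can then load the subset with `np.load(out_path)` without depending on MXNet or the full checkpoint.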