Released by @fhieber on 18 Dec 16:39

[1.15.6]

Added

  • New CLI sockeye.prepare_data for preprocessing the training data only once before training,
    potentially splitting large datasets into shards. At training time only one shard is loaded into memory at a time,
    limiting the maximum memory usage.
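
  For example (file names are placeholders; see the tool's --help for the full set of options):

      python -m sockeye.prepare_data \
          --source train.de \
          --target train.en \
          --output prepared_train_data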

Changed

  • Instead of using the --source and --target arguments, sockeye.train now accepts a
    --prepared-data argument pointing to the folder containing the preprocessed and sharded data (see the
    example below). Using the raw training data is still possible and now consumes less memory.
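
  Training can then point at the prepared folder instead of the raw files, e.g.:

      python -m sockeye.train \
          --prepared-data prepared_train_data \
          --validation-source dev.de \
          --validation-target dev.en \
          --output model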

[1.15.5]

Added

  • Optionally apply query, key and value projections to the source and target hidden vectors in the CNN model
    before applying the attention mechanism. CLI parameter: --cnn-project-qkv.
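
  A sketch of enabling the projections when training a CNN model (assuming cnn is the encoder/decoder type
  name and that --cnn-project-qkv is a boolean switch; other required arguments elided):

      python -m sockeye.train \
          --encoder cnn \
          --decoder cnn \
          --cnn-project-qkv \
          ...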

[1.15.4]

Added

  • A warning will be printed if the checkpoint decoder slows down training.

[1.15.3]

Added

  • Exposing the Xavier initializer's random number generator type through --weight-init-xavier-rand-type.
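
  For instance, assuming the flag accepts MXNet's Xavier rand_type values (uniform or gaussian):

      python -m sockeye.train ... --weight-init xavier --weight-init-xavier-rand-type gaussian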

[1.15.2]

Added

  • Exposing MXNet's Nesterov Accelerated Gradient, Adadelta, and Adagrad optimizers.
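
  These are selected through the existing --optimizer argument; assuming MXNet's registry names apply here
  (nag, adadelta, adagrad):

      python -m sockeye.train ... --optimizer nag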

[1.15.1]

Added

  • A tool that initializes embedding weights with pretrained word representations, sockeye.init_embedding.
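
  A hypothetical invocation (the flag names below are assumptions, not the confirmed interface; consult
  python -m sockeye.init_embedding --help):

      python -m sockeye.init_embedding \
          -w embed-src.npy \
          -i vocab-src.json \
          -n source_embed_weight \
          -f params.init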

[1.15.0]

Added

[1.14.3]

Changed

  • Fast decoding for transformer models. Caches keys and values of self-attention before softmax.
    Changed decoding flag --bucket-width to apply only to source length.
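
  For example, at decoding time (assuming sockeye.translate exposes the flag under the same name):

      python -m sockeye.translate -m model --bucket-width 10 < input.de > output.en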

[1.14.2]

Added

  • Gradient norm clipping (--gradient-clipping-type) and monitoring.

Changed

  • Changed --clip-gradient to --gradient-clipping-threshold for consistency.
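
  Both options together might look like this (assuming norm is one of the accepted clipping types):

      python -m sockeye.train ... --gradient-clipping-type norm --gradient-clipping-threshold 1.0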

[1.14.1]

Changed

  • Sorting sentences during decoding before splitting them into batches.
  • Default chunk size: when batching is enabled, the default chunk size during decoding is now
    batch_size * 500, so users no longer need to remember to increase it manually.
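
  A batched decoding call consistent with these defaults (the explicit --chunk-size below merely restates
  the new default of batch_size * 500):

      python -m sockeye.translate -m model --batch-size 16 --chunk-size 8000 < input.de > output.en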

[1.14.0]

Changed

  • Downscaled fixed positional embeddings for CNN models.
  • Renamed the --monitor-bleu flag to --decode-and-evaluate to reflect that it computes
    other metrics in addition to BLEU.

Added

  • --decode-and-evaluate-use-cpu flag to use CPU for decoding validation data.
  • --decode-and-evaluate-device-id flag to use a separate GPU device for validation decoding. If not
    specified, the default behavior is unchanged: validation decoding runs on the last GPU acquired for training.
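
  Combining these flags (assuming --decode-and-evaluate takes the number of validation sentences to decode,
  which is an assumption here):

      python -m sockeye.train ... --decode-and-evaluate 500 --decode-and-evaluate-device-id 1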

[1.13.2]

Added

  • A tool that extracts specified parameters from params.x into a .npz file for downstream applications or analysis.
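
  The entry does not name the tool; assuming it is exposed as sockeye.extract_parameters, with illustrative
  (unconfirmed) flags:

      python -m sockeye.extract_parameters params.best -n "encoder.*" -o params.npz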

[1.13.1]

Added