@hplt-project

HPLT - High Performance Language Technologies

A space that combines petabytes of natural language data with large-scale model training

Pinned

  1. OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python 45 13

  2. OpusTrainer Public

    Curriculum training

    Python 15 4

Repositories

Showing 10 of 17 repositories
  • data-analytics-tool Public

    Data Analytics Tool

    JavaScript 6 1 0 0 Updated Jul 8, 2024
  • monotextor-slurm Public

    Set of scripts to run a monotextor-like pipeline on Slurm-managed HPC systems

    Rust 2 GPL-3.0 0 1 0 Updated Jul 3, 2024
  • warc2text-runner Public

    Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

    HTML 3 0 5 0 Updated Jun 27, 2024
  • OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

  • sacremoses Public

    Python port of Moses tokenizer, truecaser and normalizer

    Python 483 MIT 59 25 (2 issues need help) 5 Updated May 26, 2024
  • HPLT-WP4 Public

    Information and pipelines for WP4: language model training

    Python 1 CC0-1.0 0 0 0 Updated Apr 22, 2024
  • document-aligner Public

    TF-IDF-based document aligner from Bitextor

    C++ 0 Apache-2.0 0 0 1 Updated Mar 19, 2024
  • OPUS-MT-dashboard
    PHP 0 MIT 1 0 0 Updated Mar 9, 2024
  • HPLT-MT-Models Public

    Configuration and scripts for HPLT MT model releases.

    Python 4 0 1 0 Updated Mar 6, 2024
  • monolingual-multilingual-instruction-tuning Public

    Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

    Python 8 0 0 0 Updated Mar 6, 2024
