This repository applies a sequence-to-sequence (seq2seq) model to Vietnamese text summarization. Building on PhoBERT, the state-of-the-art RoBERTa-based language model for Vietnamese, I built a summarization architecture trained on the Vietnews (VNDS) dataset (reference 1).
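A minimal sketch of how such an encoder-decoder can be warm-started from PhoBERT checkpoints with Hugging Face transformers (following reference 2). The checkpoint name `vinai/phobert-base`, the example text, and the generation settings are assumptions for illustration, not the exact training or serving configuration of this repo:

```python
# Sketch: warm-starting a seq2seq summarizer from PhoBERT checkpoints
# (encoder-decoder warm-start in the spirit of Rothe et al., 2020).
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

# Both encoder and decoder are initialized from pre-trained PhoBERT weights;
# the decoder's cross-attention layers are added and trained from scratch.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "vinai/phobert-base", "vinai/phobert-base"
)

# Special tokens needed by the generation loop.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Example inference on a (word-segmented) Vietnamese article.
# Output is only meaningful after fine-tuning on a summarization dataset.
article = "Hà_Nội là thủ_đô của Việt_Nam ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=256)
summary_ids = model.generate(inputs["input_ids"], max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```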
- Step 1: Build the Docker image
docker build -f Dockerfile -t nlp-text-summarization:latest .
- Step 2: Run the Docker container
docker run -p 8501:8501 nlp-text-summarization:latest
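Once the container is running, the app should be reachable at http://localhost:8501 (port 8501 is Streamlit's default; this assumes the image serves a web front-end on that port, as the port mapping above suggests).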
On the same test set, the model outperforms the Fast-Abs baseline reported for this dataset in reference 1 across all ROUGE metrics.
Metric | Precision | Recall | F1-Score | F1-Score, Fast-Abs (Ref. 1) |
---|---|---|---|---|
ROUGE-1 | 0.64 | 0.61 | 0.61 | 0.55 |
ROUGE-2 | 0.31 | 0.30 | 0.30 | 0.23 |
ROUGE-L | 0.42 | 0.41 | 0.40 | 0.38 |
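The scores above are standard ROUGE precision/recall/F1 values. A minimal sketch of how such scores can be computed with the `rouge-score` package is shown below; the package choice and the example texts are assumptions, not the exact evaluation script used for the table:

```python
# Sketch: ROUGE-1/2/L precision, recall, and F1 for one generated summary
# against one reference summary, using the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)

reference = "Hà_Nội là thủ_đô của Việt_Nam ."   # gold summary (word-segmented)
generated = "Thủ_đô của Việt_Nam là Hà_Nội ."   # model output

scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(f"{name}: P={score.precision:.2f} R={score.recall:.2f} F1={score.fmeasure:.2f}")
```

In practice these metrics are averaged over all test-set articles to obtain figures like those in the table.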
- Nguyen, Van-Hau & Nguyen, Thanh-Chinh & Nguyen, Minh-Tien & Hoai, Nguyen. (2019). VNDS: A Vietnamese Dataset for Summarization. 375-380. 10.1109/NICS48868.2019.9023886.
- Rothe, Sascha & Narayan, Shashi & Severyn, Aliaksei. (2020). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics. 8. 264-280. 10.1162/tacl_a_00313.
- Nguyen, Dat Quoc & Nguyen, Anh. (2020). PhoBERT: Pre-trained language models for Vietnamese. 1037-1042. 10.18653/v1/2020.findings-emnlp.92.