From eb24e4fac5b755605586a3d931dac590cc2aa18f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=90=B4=E6=88=90=E4=BC=9F?= <45165355+wuchengwei0122@users.noreply.github.com> Date: Tue, 2 Jan 2024 11:56:01 +0800 Subject: [PATCH] Dev (#4) * lid * add new function * new functions * new functions * pipeline * pipeline * Update README.md * new functions * update requirements.txt * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update readMe * update requirements * update all2txt * update readMe * update readMe * update readMe * update readMe --------- Co-authored-by: wuchengwei --- README.md | 10 ++++------ README_zh.md | 10 ++++------ 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 5517b13..656f800 100644 --- a/README.md +++ b/README.md @@ -15,20 +15,18 @@ data is one of the basic elements in the development of artificial intelligence. FlagData supports the following features: -* it can be used with simple configuration after installation, and the custom feature can be realized with low code volume. +* Realize the high-quality content extraction of various original format data, and greatly reduce the processing cost. -* High-quality structured data can be quickly cleaned from the original html/text/pdf/epub, and sensitive information can be filtered to avoid the risk of privacy disclosure. +* Provide the function of fine-tuning data perspective for large models. -* Support massive text data de-duplication, and provide detailed multi-machine distributed data processing system deployment documents. - -* support data quality assessment and common data analysis. +* One-stop efficient distributed data processing function. 
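 The complete pipeline process and features such as
 ![pipeline](pipeline.png)
 
 ## News
 
-- [Dec 15st, 2023] FlagData v1.1.0 has been upgraded
+- [Dec 31st, 2023] FlagData v2.0.0 has been upgraded
 - [Jan 31st, 2023] FlagData v1.0.0 is online!
 
 --------------------------------------------------------------------------------
diff --git a/README_zh.md b/README_zh.md
index 493c665..29adc27 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -15,20 +15,18 @@
 
 FlagData supports the following features:
 
-* Ready to use after installation with simple configuration; custom features can be implemented with little code.
+* High-quality content extraction from data in a variety of raw formats, greatly reducing processing cost
 
-* Quickly cleans raw html/text/pdf/epub into high-quality structured data, with attention to filtering sensitive information to avoid the risk of privacy leaks.
+* Provides a fine-tuning data perspective function for large models
 
-* Supports deduplication of massive text data and provides detailed deployment documentation for a multi-machine distributed data processing system.
-
-* Supports data quality assessment and common data analysis.
+* One-stop, efficient distributed data processing
 
 The complete pipeline flow and features are shown in the figure below:
 ![pipeline](pipeline_zh.png)
 
 ## News
 
-- [Dec 15st, 2023] FlagData v1.1.0 upgraded
+- [Dec 31st, 2023] FlagData v2.0.0 upgraded
 - [Jan 31st, 2023] FlagData v1.0.0 is online!
 
 --------------------------------------------------------------------------------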