Dev (#4)
* lid

* add new function

* new functions

* new functions

* pipeline

* pipeline

* Update README.md

* new functions

* update requirements.txt

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update readMe

* update requirements

* update all2txt

* update readMe

* update readMe

* update readMe

* update readMe

---------

Co-authored-by: wuchengwei <[email protected]>
wuchengwei0122 and wuchengwei authored Jan 2, 2024
1 parent 7f3bc17 commit eb24e4f
Showing 2 changed files with 8 additions and 12 deletions.
10 changes: 4 additions & 6 deletions README.md
@@ -15,20 +15,18 @@ data is one of the basic elements in the development of artificial intelligence.

FlagData supports the following features:

* It can be used with simple configuration after installation, and custom features can be implemented with little code.
* Extracts high-quality content from data in a variety of original formats, greatly reducing processing cost.

* High-quality structured data can be quickly cleaned from the original html/text/pdf/epub, and sensitive information can be filtered out to avoid the risk of privacy disclosure.
* Provides a data perspective function for large-model fine-tuning data.

* Supports de-duplication of massive text data, and provides detailed deployment documentation for a multi-machine distributed data processing system.

* Supports data quality assessment and common data analysis.
* One-stop, efficient distributed data processing.

The complete pipeline process and its features are shown in the figure below:
![pipeline](pipeline.png)

## News

- [Dec 15th, 2023] FlagData v1.1.0 has been upgraded
- [Dec 31st, 2023] FlagData v2.0.0 has been upgraded
- [Jan 31st, 2023] FlagData v1.0.0 is online!

--------------------------------------------------------------------------------
10 changes: 4 additions & 6 deletions README_zh.md
@@ -15,20 +15,18 @@

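FlagData supports the following features:

* Ready to use with simple configuration after installation; custom features can be implemented with little code.
* Extracts high-quality content from data in a variety of original formats, greatly reducing processing cost.

* Quickly cleans raw html/text/pdf/epub into high-quality structured data, with emphasis on filtering out sensitive information to avoid the risk of privacy leaks.
* Provides a data perspective function for large-model fine-tuning data.

* Supports de-duplication of massive text data, and provides detailed deployment documentation for a multi-machine distributed data processing system.

* Supports data quality assessment and common data analysis.
* One-stop, efficient distributed data processing.

The complete pipeline process and its features are shown in the figure below: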
![pipeline](pipeline_zh.png)

## News

- [Dec 15th, 2023] FlagData v1.1.0 has been upgraded
- [Dec 31st, 2023] FlagData v2.0.0 has been upgraded
- [Jan 31st, 2023] FlagData v1.0.0 is online!

--------------------------------------------------------------------------------