diff --git a/Applications/DocXChain/README.md b/Applications/DocXChain/README.md
index 0288057..d2e5e70 100644
--- a/Applications/DocXChain/README.md
+++ b/Applications/DocXChain/README.md
@@ -10,7 +10,7 @@ DocXChain is a powerful open-source toolchain for document parsing, which can co
 
 DocXChain is designed and developed with the original aspiration of ***promoting the level of digitization and structurization for documents***. In the future, we will go beyond pure document parsing capabilities, to explore more possibilities, e.g., combining DocXChain with large language models (LLMs) to perform document information extraction (IE), question answering (QA) and retrieval-augmented generation (RAG).
 
-For more details, prelase refer to the [technical report](https://arxiv.org/abs/2310.12430) of DocXChain.
+For more details, please refer to the [technical report](https://arxiv.org/abs/2310.12430) of DocXChain.
 
 **Notice 1:** In this project, we adopt the ***broad concept of documents***, meaning DocXChain can support various kinds of documents, including regular documents (such as books, academic papers and business forms), street view photos, presentations and even screenshots.
 
diff --git a/README.md b/README.md
index f7bbde8..e19ab86 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Visit our [读光-Du Guang Portal](https://duguang.aliyun.com/) and [DocMaster](
 
 ## Recent Updates
 **2023.9 Release**
- - [**DocXChain**](./Applications/DocXChain/) (*DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond,* arXiv 2023. [report](https://arxiv.org/abs/2310.12430)): To **promote the level of digitization and structurization for documents**, we develop and release a open-source toolchain, called DocXChain, for precise and detailed document parsing. Currently, basic capabilities, including text detection, text recognition, table structure recognition, and layout analysis, are provided. Also, typical pipelines, i.e., text reading, table parsing and document structurization, are built to support more complicated applications related to documents. Most of the algorithmic models are from [ModelScope](https://github.com/modelscope/modelscope).
+ - [**DocXChain**](./Applications/DocXChain/) (*DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond,* arXiv 2023. [report](https://arxiv.org/abs/2310.12430)): To **promote the level of digitization and structurization for documents**, we develop and release an open-source toolchain, called DocXChain, for precise and detailed document parsing. Currently, basic capabilities, including text detection, text recognition, table structure recognition, and layout analysis, are provided. Also, typical pipelines, i.e., general text reading, table parsing, and document structurization, are built to support more complicated applications related to documents. Most of the algorithmic models are from [ModelScope](https://github.com/modelscope/modelscope).
  - [**LISTER**](./OCR/LISTER/) (*LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition,* ICCV 2023. [paper](https://arxiv.org/abs/2308.12774v1)): We propose a method called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the limitation regarding the **robustness to various text lengths**. Specifically, a Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix regardless of the text lengths. Besides, a Feature Enhancement Module is devised to model the long-range dependency with low computation cost, which is able to perform iterations with the neighbor decoder to enhance the feature map progressively..
  - [**VGT**](./DocumentUnderstanding/VGT/) (*Vision Grid Transformer for Document Layout Analysis,* ICCV 2023. [paper](https://arxiv.org/abs/2308.14978)): To **fully leverage multi-modal information and exploit pre-training techniques to learn better representation** for document layout analysis (DLA), we present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. In addition, a new benchmark for assessing document layout analysis algorithms, called [D^4LA](https://modelscope.cn/datasets/damo/D4LA/summary), is curated and released.
  - [**VLPT-STD**](./OCR/VLPT-STD/) (*Vision-Language Pre-Training for Boosting Scene Text Detectors,* CVPR 2022. [paper](https://arxiv.org/abs/2204.13867)): We adapt **vision-language joint learning for scene text detection**, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language. The pre-trained model is able to produce more informative representations with richer semantics, which could readily benefit existing scene text detectors (such as EAST and DB) in the down-stream text detection task.