
Commit

update README
yue kun committed Oct 23, 2023
1 parent 468ae79 commit c170e0c
Showing 2 changed files with 2 additions and 2 deletions.
Applications/DocXChain/README.md (2 changes: 1 addition & 1 deletion)
@@ -10,7 +10,7 @@ DocXChain is a powerful open-source toolchain for document parsing, which can co

DocXChain is designed and developed with the original aspiration of ***promoting the level of digitization and structurization for documents***. In the future, we will go beyond pure document parsing capabilities, to explore more possibilities, e.g., combining DocXChain with large language models (LLMs) to perform document information extraction (IE), question answering (QA) and retrieval-augmented generation (RAG).

-For more details, prelase refer to the [technical report](https://arxiv.org/abs/2310.12430) of DocXChain.
+For more details, please refer to the [technical report](https://arxiv.org/abs/2310.12430) of DocXChain.

**Notice 1:** In this project, we adopt the ***broad concept of documents***, meaning DocXChain can support various kinds of documents, including regular documents (such as books, academic papers and business forms), street view photos, presentations and even screenshots.

README.md (2 changes: 1 addition & 1 deletion)
@@ -15,7 +15,7 @@ Visit our [读光-Du Guang Portal](https://duguang.aliyun.com/) and [DocMaster](
## Recent Updates

**2023.9 Release**
-- [**DocXChain**](./Applications/DocXChain/) (*DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond,* arXiv 2023. [report](https://arxiv.org/abs/2310.12430)): To **promote the level of digitization and structurization for documents**, we develop and release a open-source toolchain, called DocXChain, for precise and detailed document parsing. Currently, basic capabilities, including text detection, text recognition, table structure recognition, and layout analysis, are provided. Also, typical pipelines, i.e., text reading, table parsing and document structurization, are built to support more complicated applications related to documents. Most of the algorithmic models are from [ModelScope](https://github.com/modelscope/modelscope).
+- [**DocXChain**](./Applications/DocXChain/) (*DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond,* arXiv 2023. [report](https://arxiv.org/abs/2310.12430)): To **promote the level of digitization and structurization for documents**, we develop and release an open-source toolchain, called DocXChain, for precise and detailed document parsing. Currently, basic capabilities, including text detection, text recognition, table structure recognition, and layout analysis, are provided. Also, typical pipelines, i.e., general text reading, table parsing, and document structurization, are built to support more complicated applications related to documents. Most of the algorithmic models are from [ModelScope](https://github.com/modelscope/modelscope).
- [**LISTER**](./OCR/LISTER/) (*LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition,* ICCV 2023. [paper](https://arxiv.org/abs/2308.12774v1)): We propose a method called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the limitation regarding the **robustness to various text lengths**. Specifically, a Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix regardless of the text lengths. Besides, a Feature Enhancement Module is devised to model the long-range dependency with low computation cost, which is able to perform iterations with the neighbor decoder to enhance the feature map progressively.
- [**VGT**](./DocumentUnderstanding/VGT/) (*Vision Grid Transformer for Document Layout Analysis,* ICCV 2023. [paper](https://arxiv.org/abs/2308.14978)): To **fully leverage multi-modal information and exploit pre-training techniques to learn better representation** for document layout analysis (DLA), we present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. In addition, a new benchmark for assessing document layout analysis algorithms, called [D^4LA](https://modelscope.cn/datasets/damo/D4LA/summary), is curated and released.
- [**VLPT-STD**](./OCR/VLPT-STD/) (*Vision-Language Pre-Training for Boosting Scene Text Detectors,* CVPR 2022. [paper](https://arxiv.org/abs/2204.13867)): We adapt **vision-language joint learning for scene text detection**, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language. The pre-trained model is able to produce more informative representations with richer semantics, which could readily benefit existing scene text detectors (such as EAST and DB) in the down-stream text detection task.
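The DocXChain entry in the diff above mentions ready-made pipelines (general text reading, table parsing, document structurization) built on ModelScope models. As a rough orientation only, here is a minimal sketch of how driving such a pipeline might look; the import path, the `DocumentStructurization` class name, and the call convention are assumptions for illustration, not the toolchain's confirmed API — consult the DocXChain repository for the actual interface.

```python
# Hypothetical sketch of running a DocXChain-style document structurization
# pipeline. The module path, class name, and call signature below are
# assumptions for illustration; see the DocXChain repository for the real API.
import json

from docxchain.pipelines import DocumentStructurization  # assumed import path


def parse_document(image_path: str, output_path: str) -> None:
    # Instantiate the pipeline; conceptually it chains layout analysis,
    # text detection/recognition, and table structure recognition.
    pipeline = DocumentStructurization()

    # Run the full pipeline on one document image.
    result = pipeline(image_path)

    # Persist the structured output (regions, text lines, tables) as JSON.
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    parse_document("sample_page.png", "sample_page_structure.json")
```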
