This repository is a curated collection of papers on document image processing, covering appearance enhancement, deshadowing, dewarping, deblurring, and binarization.
Document registration (also known as document alignment) aims to densely map two document images with the same content (such as a scanned and photographed version of the same document). It has important applications in automated data annotation and template-based dewarping tasks.
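To make "densely map" concrete, here is a minimal sketch of how a dense correspondence map could be used to resample one image onto the other. The map layout (`backward_map[y, x] = (src_x, src_y)`) and the function name are assumptions for illustration and are not taken from any listed method. Nearest-neighbour sampling is used only to keep the example dependency-free.

```python
import numpy as np

def warp_with_dense_map(source, backward_map):
    """Resample `source` so that it aligns with the target image.

    backward_map[y, x] = (src_x, src_y) gives, for each target pixel,
    the source coordinate it corresponds to (hypothetical convention).
    """
    h, w = source.shape[:2]
    # Round to the nearest source pixel and clamp to the image bounds
    src_x = np.clip(np.rint(backward_map[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(backward_map[..., 1]).astype(int), 0, h - 1)
    return source[src_y, src_x]

# Toy example: the target is the source shifted one pixel to the right
source = np.arange(12).reshape(3, 4)
ys, xs = np.mgrid[0:3, 0:4]
backward_map = np.stack([xs - 1, ys], axis=-1).astype(float)
aligned = warp_with_dense_map(source, backward_map)
```

The same lookup can, in principle, carry pixel-level annotations from one image to the other, which is the data-annotation use case mentioned above.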
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
DocAlign12K | 12K (10K/2K) | Synth | Example | Link |
Venue | Method | DocUNet (130) MS-SSIM↑ | DocUNet (130) AD↓ |
---|---|---|---|
Arxiv'23 | DocAligner | 0.8232 | 0.0445 |
Appearance enhancement (also known as illumination correction) is not limited to a specific degradation type; it aims to restore a clean appearance similar to that of a scanned image or a born-digital PDF file.
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
Doc3DShade | 90K | Synth | Example | Link |
DocProj | 2450 | Synth | Example | Link |
DocUNet from DocAligner | 130 | Real | Example | Link |
RealDAE | 600 (450/150) | Real | Example | Link |
Inv3D | 25K | Synth | Example | Link |
Venue | Method | Training data | DocUNet from DocAligner (130) SSIM | DocUNet from DocAligner (130) PSNR | RealDAE (150) SSIM | RealDAE (150) PSNR |
---|---|---|---|---|---|---|
- | - | - | 0.7195 | 13.09 | 0.8264 | 12.26 |
TOG'19 | DocProj | DocProj | 0.7098 | 14.71 | 0.8684 | 19.35 |
BMVC'20 | Das et al. | Doc3DShade | 0.7276 | 16.42 | 0.8633 | 19.87 |
MM'21 | DocTr | DocProj | 0.7067 | 15.78 | 0.7925 | 18.62 |
MM'22 | UDoc-GAN | DocProj | 0.6833 | 14.29 | 0.7558 | 16.43 |
TAI'23 | GCDRNet | RealDAE | 0.7658 | 17.09 | 0.9423 | 24.42 |
CVPR'24 | DocRes | | 0.7598 | 17.60 | 0.9219 | 24.65 |
Deshadowing aims to remove shadows, which are mainly caused by occlusion, to obtain shadow-free document images.
* indicates that the implementation is unofficial.
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
RDD | 4916 (4371/545) | Real | Example | Link |
Kligler et al. | 300 | Real | Example | Link |
FSDSRD | 14200 | Synth | Example | Link |
Jung et al. | 87 | Real | Example | Link |
OSR | 237 | Real | Example | Link |
WEZUT OCR | 176 | Real | Example | Link |
SD7K | 7620 (6479/760) | Real | Example | Link |
SynDocDS | 50K (40K/5K) | Synth | Link |
Venue | Method | Training data | Kligler et al. (300) | Jung et al. (87) | OSR (237) | RDD (545) | SD7K (760) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RMSE↓ | PSNR↑ | SSIM↑ | RMSE↓ | PSNR↑ | SSIM↑ | RMSE↓ | PSNR↑ | SSIM↑ | RMSE↓ | PSNR↑ | SSIM↑ | RMSE↓ | PSNR↑ | SSIM↑ | |||
CVPR'23 | BGShadowNet | RDD | 5.377 | 29.17 | 0.948 | 2.219 | 37.58 | 0.983 | |||||||||
ICCV'23 | FSENet | SD7K | 10.60 | 28.98 | 0.93 | 17.56 | 23.60 | 0.85 | 10.00 | 28.67 | 0.96 | ||||||
CVPR'24 | DocRes | 27.14 | 0.900 | 23.02 | 0.908 | 21.64 | 0.937 |
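For readers unfamiliar with the RMSE↓ and PSNR↑ columns, the following is a minimal sketch of the standard definitions, assuming 8-bit images with a peak value of 255. The function names are illustrative, and the listed papers may use different evaluation protocols (for example another colour space or value range), so this sketch is not guaranteed to reproduce the numbers in the table.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between two images of equal shape."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit data."""
    err = rmse(pred, target)
    if err == 0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(max_value / err))

# Toy example: a prediction that is uniformly 10 grey levels too bright
target = np.full((4, 4), 100, dtype=np.uint8)
pred = np.full((4, 4), 110, dtype=np.uint8)
print(rmse(pred, target))             # 10.0
print(round(psnr(pred, target), 2))   # 28.13
```

Casting to `float64` before subtracting avoids the wrap-around that would occur with unsigned 8-bit arithmetic.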
Dewarping, also referred to as geometric rectification, aims to rectify document images that suffer from curves, folds, crumples, perspective/affine deformation and other geometric distortions.
Dataset | Num. | Type | Example | Download/Codes |
---|---|---|---|---|
DocUNet | 130 | Real | Example | Link |
Doc3D | 100K | Synth | - | Link |
DIW | 5K | Real | Example | Link |
WarpDoc | 1020 | Real | Example | Link |
DIR300 | 300 | Real | Example | Link |
Inv3D | 25K | Synth | Example | Link |
Inv3DReal | 360 | Real | Example | Link |
DICP | - | Synth | - | Link |
DIF | - | Synth | - | Link |
Simulated Paper | 90K | Synth | - | Link |
DocReal | 200 | Real | Example | Link |
UVDoc | 20K | Synth | Example | Link |
WarpDoc-R | 840 | Real |
Venue | Method | DocUNet (130) | DIR300 (300) | DocReal (200) | UVDoc (50) | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
MS-SSIM↑ | LD↓ | AD↓ | MS-SSIM↑ | LD↓ | AD↓ | MS-SSIM↑ | LD↓ | MS-SSIM↑ | AD↓ | ||
ICCV'19 | DewarpNet | 0.474 | 8.39 | 0.426 | 0.492 | 13.94 | 0.331 | 0.589 | 0.193 | ||
DAS'20 | FCN-based | 0.448 | 7.84 | 0.434 | 0.503 | 9.75 | 0.331 | ||||
ICCV'21 | Piece-Wise | 0.492 | 8.64 | 0.468 | |||||||
ICDAR'21 | DDCP | 0.473 | 8.99 | 0.453 | 0.552 | 10.95 | 0.357 | 0.46 | 16.04 | 0.585 | 0.290 |
MM'21 | DocTr | 0.511 | 7.76 | 0.396 | 0.616 | 7.21 | 0.254 | 0.55 | 12.66 | 0.697 | 0.160 |
CVPR'22 | RDGR | 0.497 | 8.51 | 0.461 | 0.610 | 0.280 | |||||
MM'22 | Marior | 0.478 | 7.27 | 0.403 | |||||||
ECCV'22 | DocGeoNet | 0.504 | 7.71 | 0.380 | 0.638 | 6.40 | 0.242 | 0.55 | 12.22 | 0.706 | 0.168 |
SIGGRAPH'22 | PaperEdge | 0.473 | 7.81 | 0.392 | 0.583 | 8.00 | 0.255 | 0.52 | 11.46 | ||
Arxiv'22 | DocScanner-L | 0.518 | 7.45 | 0.334 | |||||||
ICCV'23 | Li et al. | 0.497 | 8.43 | 0.376 | 0.607 | 7.68 | 0.244 | ||||
WACV'23 | DocReal | 0.50 | 7.03 | 0.56 | 9.83 | ||||||
TCSVT'23 | DRNet | 0.51 | 7.42 | ||||||||
TMM'23 | DocTr++ | 0.51 | 7.54 | 0.45 | 19.88 | ||||||
Arxiv'23 | Polar-Doc | 0.605 | 7.17 | 0.206 | |||||||
Arxiv'23 | MetaDoc | 0.502 | 7.42 | 0.315 | 0.638 | 5.75 | 0.178 | ||||
SIGGRAPH'23 | UVDoc | 0.544 | 6.83 | 0.315 | 0.785 | 0.119 | |||||
ACM TOG'23 | LA-DocFlatten | 0.526 | 6.72 | 0.300 | 0.651 | 5.70 | 0.195 | ||||
CVPR'24 | DocRes | 0.626 | 6.83 | 0.241 | |||||||
IJDAR'24 | DocTLNet | 0.51 | 6.70 | 0.658 | 5.75 |
- Note that the 127th and 128th distorted images in the DocUNet benchmark are rotated by 180 degrees and therefore do not match their ground-truth documents. The performance reported here is based on the corrected data.
- Note that the UVDoc results reported in this repository are based on the full UVDoc benchmark dataset (as reported on the official GitHub page). The results in the paper used only half of the UVDoc benchmark.
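The 180-degree correction for the two DocUNet images can be sketched as follows. This is a minimal example under assumptions: the images are taken to be loaded as NumPy arrays in a dictionary keyed by their 1-based position in the benchmark, and both the dictionary layout and the function names are hypothetical rather than part of any official evaluation script.

```python
import numpy as np

def rotate_180(image):
    """Rotate an H x W (x C) image array by 180 degrees."""
    return np.rot90(image, k=2, axes=(0, 1))

def correct_benchmark(images, rotated_indices=(127, 128)):
    """Return a copy of `images` with the listed entries rotated back.

    images: dict mapping a 1-based benchmark index to an image array
    (hypothetical layout chosen for this sketch).
    """
    return {
        idx: rotate_180(img) if idx in rotated_indices else img
        for idx, img in images.items()
    }

# Toy example with tiny 2 x 3 "images"
demo = {126: np.arange(6).reshape(2, 3), 127: np.arange(6).reshape(2, 3)}
fixed = correct_benchmark(demo)
```

Only the entries named in `rotated_indices` are changed; all other images pass through untouched.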
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
TDD (text deblur dataset) | 67.6K (66K/1.6K) | Synth | Example | Link1, Link2 |
Coming Soon ...
Dataset | Num. | Type | Example | Download |
---|---|---|---|---|
DocEng 2019 | 15 | Real | Example | Link |
DocEng 2020 | 32 | Real | Example | Link |
DocEng 2021 | 222 | Real | Example | Link |
DocEng 2022 | 80 | Real | Example | Link |
DIBCO 2009 | 10 | Real | Example | Link |
H-DIBCO 2010 | 10 | Real | Example | Link |
DIBCO 2011 | 16 | Real | Example | Link |
H-DIBCO 2012 | 14 | Real | Example | Link |
DIBCO 2013 | 16 | Real | Example | Link |
H-DIBCO 2014 | 10 | Real | Example | Link |
H-DIBCO 2016 | 10 | Real | Example | Link |
DIBCO 2017 | 20 | Real | Example | Link |
DIBCO 2018 | 10 | Real | Example | Link |
DIBCO 2019 | 10 | Real | Example | Link |
Bickley-diary | 7 | Real | Example | Link |
Synchromedia Multispectral (MSI) | 240 | Real | Example | Link |
Persian Heritage Image Binarization (PHIBD) | 15 | Real | Example | Link |
Palm Leaf | 50 | Real | Example | Link |
NoisyOffice | 216 | Synth | Example | Link |
LRDE Document Binarization Dataset | 125 | Real | - | Link |
Shipping label dataset | 1082 | Real | Example | Link |
Coming Soon ...
Handwritten text erasure aims to remove handwritten text from document images.
Year | Venue | Title | Repo |
---|---|---|---|
2022 | PRCV | CHENet: Image to Image Chinese Handwriting Eraser | |
2023 | ICDAR | EnsExam: A Dataset for Handwritten Text Erasure on Examination Papers | Code |
2024 | IJDAR | Scene handwritten text erasure based on multi-scale feature fusion |
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
Baidu Netdisk AI Competition: Handwritten Text Erasure (百度网盘AI大赛:手写文字擦除) | 1281 (1081/200) | Real | Example | Link |
CH-dataset | 1623 (1423/200) | Real | ||
EnsExam | 545 (430/115) | Real | Example | Link |
SignaTR6K | 6257 (5169/558/530) | Real | Example | Link |
APP & Project & Tool | Developer | Platform |
---|---|---|
CamScanner (扫描全能王) | INTSIG | iOS, Android |
Quark (夸克扫描王) | Dongyue | iOS, Android, Web |
WPS Office | KINGSOFT OFFICE | iOS, Android |
Adobe Acrobat | Adobe | Windows |
Adobe Scan | Adobe | iOS, Android |
Lenovo Smart Scanner (联想扫描王) | Lenovo | iOS, Android |