ocr-ops

OCR-Ops is infrastructure for performing Optical Character Recognition (OCR) at scale on large numbers of images and videos. Built on top of the algo-ops framework, OCR-Ops is modular and extensible in its data processing operations.

Key Features:

Supports building an OCRPipeline that can use multiple popular OCR methods (e.g. PyTesseract, EasyOCR) and return the results in a structured, efficient form within a unified framework.
Supports multiple levels of OCR output detail (e.g. text only, bounding boxes).
Allows definition of an image pre-processing pipeline (before OCR) and a text-cleaning pipeline (after OCR) that cleans up detected but noisy text, for more robust OCR performance.
Provides several plug-and-play presets for these purposes.
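
To make the three-stage idea in the feature list concrete (pre-process the image, run OCR, clean the text), here is a minimal, standard-library-only sketch. It does not use the ocr-ops API: `SketchOCRPipeline`, `binarize`, `collapse_whitespace`, and `fake_ocr` are all hypothetical names invented for illustration, and the "image" is just a nested list of grayscale values.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class SketchOCRPipeline:
    """Hypothetical three-stage pipeline; NOT the ocr-ops OCRPipeline class."""

    ocr: Callable[[Any], str]  # any function mapping an image to raw text
    preprocess: List[Callable[[Any], Any]] = field(default_factory=list)
    clean: List[Callable[[str], str]] = field(default_factory=list)

    def run(self, image: Any) -> str:
        # 1. Image pre-processing steps, applied in order.
        for step in self.preprocess:
            image = step(image)
        # 2. OCR on the pre-processed image.
        text = self.ocr(image)
        # 3. Text-cleaning steps, applied in order.
        for step in self.clean:
            text = step(text)
        return text


def binarize(image: List[List[int]], threshold: int = 127) -> List[List[int]]:
    """Map each grayscale pixel to 0 or 255 (illustrative pre-processing step)."""
    return [[255 if px > threshold else 0 for px in row] for row in image]


def collapse_whitespace(text: str) -> str:
    """Squash runs of whitespace and trim the ends (illustrative cleaning step)."""
    return re.sub(r"\s+", " ", text).strip()


def fake_ocr(image: Any) -> str:
    """Stand-in OCR engine that returns fixed, noisy text for demonstration."""
    return "  Hello   World \n\x0c"


pipeline = SketchOCRPipeline(
    ocr=fake_ocr,
    preprocess=[binarize],
    clean=[collapse_whitespace],
)
result = pipeline.run([[0, 200], [130, 5]])
print(result)
```

In a real setup the `ocr` callable would wrap an actual engine, for example `pytesseract.image_to_string`, and the pre-processing steps would operate on real image arrays. The design choice shown here is only that each stage is a list of plain callables, so steps can be swapped or reordered independently.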
