Releases · itext/itext-pdfocr-dotnet

18 Nov 09:58

4.0.0

3167bc6

pdfOCR 4.0.0

pdfOCR is our add-on for iText Core to perform OCR on documents and images.

For this release the version number has been bumped for compatibility with iText Core 9.0 and License Key Library 4.2.0.

In addition, it includes a fix for CVE-2024-47554, which stems from the use of the Apache Commons IO library. This was resolved by updating Commons IO from version 2.11.0 to 2.14.0.

Bug fixes

Fix CVE-2024-47554 which comes from commons-io


07 Feb 14:31

StryhelskiAndrei

3.0.2

7657d96

pdfOCR 3.0.2

pdfOCR is our add-on for iText Core to perform OCR on documents and images.

In this release, pdfOCR gains the ability to recognize table data and convert it into the correct tag structure in the resulting PDF documents.

A bug that caused an incorrect font size to be selected for particularly small text was also fixed.

New features

Table recognition support

Bug fixes

Incorrect font size for small text in the PDFs generated with pdfOCR


25 Oct 15:08

StryhelskiAndrei

3.0.1

eead8fd

pdfOCR 3.0.1

pdfOCR is our add-on for iText Core to perform OCR on documents and images.

For this release, the artifact names have been changed to reflect the new naming structure. In addition, since Bouncy Castle is a dependency for tests, the .NET Bouncy Castle dependency has been updated to the latest version, 2.2.1.

Improvements

Updated .NET Bouncy Castle dependency to 2.2.1


10 May 12:43

AnhelinaM

3.0.0

394a1c6

pdfOCR 3.0.0

pdfOCR is our add-on for iText Core to perform OCR on documents and images.

This release is for compatibility with the iText Core version 8.x.x release.


25 Oct 10:02

introfog

2.0.2

34d0ba6

pdfOCR 2.0.2

Full Changelog: 2.0.1...2.0.2


11 Jan 13:29

ars18wrw

2.0.1

951ac4d

pdfOCR 2.0.1

This maintenance release updates the underlying glue to Tesseract (tess4j) to 4.5.5. There is not much to write home about, but we want to keep track of these underlying version updates so that we are ready when bigger changes come about.

Improvements

Upgrade tesseract up to 4.5.5


25 Oct 14:06

ars18wrw

2.0.0

6480e74

pdfOCR 2.0.0

The pdfOCR 2.0.0 release brings support for the new Unified License Mechanism, in line with the other products in the iText 7 Suite, and removes some deprecated API methods.

As the icing on the cake though, it benefits from all the improvements featured in iText 7 Core 7.2.0, such as the move to version 4.6.1 of the .NET Framework.

Breaking Changes

Removed deprecated methods from API
Bump to .NET Framework 4.6.1

New Features

Unified License Mechanism


05 Jul 08:59

ars18wrw

1.0.3

20eb242

pdfOCR 1.0.3

This is the first release of the pdfOCR add-on this year.
It brings more advanced image type detection. From now on, pdfOCR no longer relies on the file extension to determine the image type. Instead, it detects the image type from the file's content, which prevents errors in OCR processes.
This allows you to use files with unknown or incorrect extensions as input, provided they have the correct structure according to the format specification.
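
To illustrate the general idea of content-based detection, here is a minimal Java sketch that checks the well-known leading signature bytes of a few common image formats. It is not pdfOCR's actual implementation: the class name `ImageTypeSniffer`, its methods, and the set of formats covered are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the pdfOCR API.
final class ImageTypeSniffer {

    /** Returns a format name based on the leading signature bytes, or "UNKNOWN". */
    static String detect(byte[] header) {
        if (startsWith(header, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG";
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) return "JPEG";
        if (startsWith(header, 'G', 'I', 'F', '8')) return "GIF";
        if (startsWith(header, 'B', 'M')) return "BMP";
        // TIFF comes in little-endian ("II") and big-endian ("MM") variants.
        if (startsWith(header, 'I', 'I', 0x2A, 0x00)
                || startsWith(header, 'M', 'M', 0x00, 0x2A)) return "TIFF";
        return "UNKNOWN";
    }

    /** Reads only the first 8 bytes of the file; the file extension is ignored. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    /** Compares the start of data with the signature, treating bytes as unsigned. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A file with a misleading ".txt" extension that starts with the PNG signature.
        Path tmp = Files.createTempFile("scan", ".txt");
        Files.write(tmp, new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        System.out.println(detect(tmp)); // prints "PNG"
        Files.delete(tmp);
    }
}
```

A signature check like this only identifies the container format. It does not validate the rest of the file, which is why the input still needs the correct structure.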

Improvements

image type detection based on file content


22 Oct 07:27

ars18wrw

1.0.2

6386958

pdfOCR 1.0.2

pdfOCR 1.0.2 is already the third release of our newest project.

It brings some important improvements which allow you to process documents more precisely. These are:

Refinement of the symbol position based on the HOCR data that fixes output for Thai and some CJK fonts. This is especially important for our pdfCalligraph customers.

You can turn it on with: tesseract4OcrEngineProperties.setUseTxtToImproveHocrParsing(true);

Configurable image preprocessing. This allows smoothing out fluctuations in a document's brightness, which gives better results for images taken with a camera.
You can pass the parameters described at http://www.leptonica.org/binarization.html using tesseract4OcrEngineProperties.setImagePreprocessingOptions
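
To show why a tile size matters for unevenly lit images, here is a simplified Java sketch of per-tile thresholding. It is a stand-in for illustration only: it uses the mean of each tile as the threshold and does no smoothing between tiles, so it is neither the Leptonica algorithm nor what pdfOCR runs internally. The class `TileThreshold` and its method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of pdfOCR or Leptonica.
final class TileThreshold {

    /**
     * Binarizes a grayscale image (values 0..255) using one mean-based
     * threshold per tile. A pixel is marked true (ink) when it is darker
     * than the mean brightness of its own tile.
     */
    static boolean[][] binarize(int[][] gray, int tileWidth, int tileHeight) {
        int h = gray.length;
        int w = gray[0].length;
        boolean[][] ink = new boolean[h][w];
        for (int ty = 0; ty < h; ty += tileHeight) {
            for (int tx = 0; tx < w; tx += tileWidth) {
                int yEnd = Math.min(ty + tileHeight, h);
                int xEnd = Math.min(tx + tileWidth, w);
                long sum = 0;
                for (int y = ty; y < yEnd; y++) {
                    for (int x = tx; x < xEnd; x++) {
                        sum += gray[y][x];
                    }
                }
                double mean = (double) sum / ((yEnd - ty) * (xEnd - tx));
                for (int y = ty; y < yEnd; y++) {
                    for (int x = tx; x < xEnd; x++) {
                        ink[y][x] = gray[y][x] < mean;
                    }
                }
            }
        }
        return ink;
    }

    public static void main(String[] args) {
        // One row: a bright half (background 200, ink 120) and a shadowed half (background 90, ink 30).
        int[][] gray = {{200, 120, 90, 30}};
        // Two tiles, one per lighting region: ink is found in both halves.
        System.out.println(Arrays.toString(binarize(gray, 2, 1)[0])); // [false, true, false, true]
        // A single tile acts as a global threshold and confuses shadow with ink.
        System.out.println(Arrays.toString(binarize(gray, 4, 1)[0])); // [false, false, true, true]
    }
}
```

Smaller tiles follow local brightness more closely, while smoothing between neighbouring tile thresholds (not shown here) avoids visible seams at tile borders.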

Improvements

Combine HOCR and TXT outputs for more precise text recognition
Add possibility to set image preprocessing properties (adaptive threshold tile size, threshold smoothing)


21 Oct 12:07

Snipx

1.0.1

0400706

pdfOCR 1.0.1

Hot on the heels of our initial release, we're releasing 1.0.1 already!

We've made improvements to the way that the calculations for word bounding boxes are made, so that in languages where ligatures are required, we are able to properly detect the text and render each character correctly.

Improvements

Improvements in word bbox calculation
