Confidence value calculation (CC - WC - PC) - annotation extension #23

Open
ntra00 opened this issue Jul 1, 2014 · 10 comments
@ntra00
Member

ntra00 commented Jul 1, 2014

Submitter: CCS  ([email protected])
Submitted: 2013-02
Status: Discussion
Backwards compatible: Yes (Only Annotation)
To ALTO Version: ?

For the page, word and character confidence, the schema does not define how the values are calculated.
To make confidence values comparable, the idea is to share the calculation method and define a common rule for it.

Here are the calculation methods as used so far by CCS in docWorks.

Precondition detail:

ABBYY FineReader up to version 7.1: the character confidence range was defined from 28 (good) to 55 (bad)

ABBYY FineReader starting with version 8.0: the character confidence range was defined from 0 (good) to 100 (bad)

These ranges have to be transformed into the range defined by ALTO (0 to 9; see below). This transformation loses precision.

Because of that, CCS continued to calculate WC from the more precise ABBYY values (range 28 - 55 / 0 - 100). As a result, the WC values in ALTO can show rounding differences compared with a WC derived from the ALTO CC values!

CC:

The character confidence is defined in ALTO on a scale of "0" to "9" - "0" is best, "9" is worst.

Character Confidence is determined according to the ABBYY character confidence.
The results from the FineReader engines are normalized to the ALTO scale of 0 to 9 per character.
E.g. the word FAX, detected 100% ok by the OCR engine, will have a CC of 000 - one digit for every character.
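
The normalization to the ALTO scale could be sketched as follows. This is only an illustration: the issue does not state the exact mapping docWorks uses, so the linear scaling, the rounding rule and the function names `to_alto_cc` and `cc_string` are all assumptions.

```python
def to_alto_cc(raw: float, best: float, worst: float) -> int:
    """Map an engine confidence to an ALTO CC digit (0 = best, 9 = worst).

    ASSUMPTION: linear scaling with round-half-up. The source does not
    specify the mapping actually used by docWorks.
    """
    frac = (raw - best) / (worst - best)
    frac = min(max(frac, 0.0), 1.0)  # clamp values outside the engine range
    return int(frac * 9 + 0.5)


def cc_string(raw_values, best: float, worst: float) -> str:
    """Build the CC attribute: one digit per character of the word."""
    return "".join(str(to_alto_cc(v, best, worst)) for v in raw_values)


# FineReader >= 8.0 uses 0 (good) to 100 (bad); up to 7.1 it was 28 to 55.
print(cc_string([0, 0, 0], best=0, worst=100))  # word "FAX", all perfect
print(to_alto_cc(55, best=28, worst=55))        # worst value of the old range
```

Because ten digits cover the whole engine range, several distinct engine values collapse onto the same digit, which is where the loss of precision mentioned above comes from.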

WC:

Word Confidence is determined based on character level confidence.
The better the character confidence the better the word confidence.
In addition the word confidence is influenced by the dictionary verification.

If a word is found in the dictionary, it increases the word confidence value.
The longer the word, the higher the confidence value.
(Explanation: If a long word (e.g. with 15 characters) is found in the dictionary, it is almost certainly correct, because a misrecognized character is unlikely to produce another dictionary word by accident. Short words like 'fun' / 'fan' will both be found in the dictionary, so the dictionary check gives no additional guarantee that the right word was detected.)
For that reason, words with 2 or fewer characters are not checked against the dictionary.

The word confidence is normalized to an interval of "0.00" to "1.00" - "1.00" best, "0.00" worst.
Calculation:
double( (sum CC)/numChar )/1000.0 - normalization to (0,1)
Example:

                <String HPOS="5485" VPOS="4654" WIDTH="468" HEIGHT="109" CONTENT="quorum" WC="1.00" CC="211110"/>

                <SP HPOS="5953" VPOS="4762" WIDTH="104"/>

                <String HPOS="6057" VPOS="4606" WIDTH="524" HEIGHT="132" CONTENT="conliflmg" WC="0.89" CC="110121122"/>

                <SP HPOS="6581" VPOS="4762" WIDTH="61"/>

                <String HPOS="6643" VPOS="4592" WIDTH="128" HEIGHT="118" CONTENT="of" WC="0.93" CC="02"/>

                <SP HPOS="6770" VPOS="4762" WIDTH="52"/>

                <String HPOS="6822" VPOS="4635" WIDTH="61" HEIGHT="66" CONTENT="a" WC="0.85" CC="2"/>

                <SP HPOS="6883" VPOS="4762" WIDTH="71"/>

                <String HPOS="6954" VPOS="4597" WIDTH="468" HEIGHT="137" CONTENT="majority" WC="1.00" CC="12101111"/>

                <SP HPOS="7422" VPOS="4762" WIDTH="52"/>

                <String HPOS="7474" VPOS="4578" WIDTH="123" HEIGHT="113" CONTENT="of" WC="0.96" CC="01"/>

When a word is in the dictionary, the confidence is 1.0; otherwise it is computed, mainly as the average of all "reversed" CC values. For example, for "212": ((10-2) + (10-1) + (10-2)) / 3 = 25/3 = 8.33, which gives a WC of 0.83.

For short words (fewer than 3 characters), the risk of incorrect characters is higher, so the value is calculated differently (still pending).
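
The "reversed" CC average described above can be reproduced with a few lines. This is a sketch of the worked example only; the function name `wc_from_cc` is hypothetical, and the dictionary check and the pending rule for short words are not modeled.

```python
def wc_from_cc(cc: str) -> float:
    """Average of the "reversed" CC digits (10 - digit), scaled to 0..1.

    Follows the "212" example from the issue text. The dictionary boost
    and the special handling of short words are NOT included.
    """
    if not cc:
        raise ValueError("CC string must contain at least one digit")
    reversed_values = [10 - int(digit) for digit in cc]
    return sum(reversed_values) / len(reversed_values) / 10


print(f"{wc_from_cc('212'):.2f}")  # (8 + 9 + 8) / 3 = 8.33, i.e. 0.83
```

Applied to the CC strings in the XML sample, this will not always give the listed WC values, since CCS computes WC from the more precise engine values rather than from the rounded CC digits.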

Details:

FR9 (also FR8.1, FR10): the ABBYY character confidence range is 0-100.
The character confidence is normalized to (0,9). The word confidence is the sum of the character confidences divided by the number of characters, i.e. their average.

Before writing the WC attribute in ALTO, the word is checked against the ABBYY dictionary. Whenever the word is found in the dictionary, the confidence increases:
1000 - ((1000 - charConfLevel) / (chars.GetSize()*3));

Otherwise, if the word is not found in the ABBYY dictionary, the initially determined word confidence level is used and normalized to (0,1).

Note:
charConfLevel: word confidence - average confidence on character basis.
chars.GetSize(): number of characters in the word.
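
The dictionary formula above can be tried out with the sketch below. It assumes that charConfLevel lives on an internal 0 to 1000 scale (suggested by the constant 1000 and the division by 1000.0 in the WC calculation, but not stated explicitly) and that the division is done in floating point; the function name `dictionary_boost` is hypothetical.

```python
def dictionary_boost(char_conf_level: float, num_chars: int) -> float:
    """Raise the word confidence for a word found in the dictionary.

    Direct transcription of:
        1000 - ((1000 - charConfLevel) / (chars.GetSize() * 3))
    ASSUMPTIONS: 0..1000 scale (1000 = best), floating-point division.
    """
    return 1000 - ((1000 - char_conf_level) / (num_chars * 3))


# The remaining gap to 1000 shrinks with the word length, so longer
# dictionary words end up closer to full confidence.
for length in (3, 6, 15):
    boosted = dictionary_boost(850, length)
    print(length, round(boosted / 1000.0, 3))
```

With these assumptions a 3-character word at 850 moves to about 0.983, while a 15-character word moves to about 0.997, matching the statement that longer words get a higher confidence.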

PC:

The Page Confidence is calculated as the average dictionary confidence of all alphanumeric characters.
?
The page confidence is normalized to an interval of "0.00" to "1.00" - "1.00" best, "0.00" worst.

Details:
The confidence is calculated by adding up the confidences of all XMLTexts (sum of character confidence):

set confidenceSum [expr $confidenceSum + $noOfAlphaNumChars * $confidence ]
In the end the total page confidence is calculated with this formula:
return [ expr $confidenceSum/$pgNoOfAlphaNumChars ]

Note:

confidence: XMLText dictionary confidence

The total character confidence sum divided by the number of characters on the page (normalized in the end to (0,1)) determines the Page Confidence.

If there are zones but no OCR, the returned confidence value is 999, indicating a bad confidence level.
For blank pages the returned confidence value is 100, indicating full confidence.
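
The page-level calculation amounts to a character-weighted average over the text blocks. The sketch below mirrors the two Tcl expressions; the function name `page_confidence`, the tuple layout and the `None` return for pages without alphanumeric characters are assumptions, and the special values 999 and 100 are not modeled.

```python
def page_confidence(text_blocks):
    """Character-weighted average of per-block dictionary confidences.

    text_blocks: iterable of (num_alnum_chars, confidence) pairs, one per
    XMLText. Returns None when the page has no alphanumeric characters
    (ASSUMPTION: the source returns sentinel values in such cases).
    """
    confidence_sum = 0.0
    total_chars = 0
    for num_alnum_chars, confidence in text_blocks:
        confidence_sum += num_alnum_chars * confidence
        total_chars += num_alnum_chars
    if total_chars == 0:
        return None
    return confidence_sum / total_chars


# Two blocks: 10 characters at 0.9 and 30 characters at 0.5.
print(page_confidence([(10, 0.9), (30, 0.5)]))
```

Weighting by character count means a long, poorly recognized block pulls the page value down more than a short one does.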

@ntra00 ntra00 self-assigned this Jul 1, 2014
@ntra00
Member Author

ntra00 commented Jul 1, 2014

[email protected] said
at 11:41 am on Feb 21, 2013

Regarding the Character Confidence (CC):
I think the new Glyph would be a replacement and extension to the CC attribute. It would allow us to store additional information and use the same value range for the confidence as the WC and PC are using (0-1).
For backwards compatibility we should still support the CC attribute but define it as "deprecated" in the schema documentation.

@jukervin jukervin changed the title 2013-02 confidence value calculation (CC - WC - PC) - annotation extension Confidence value calculation (CC - WC - PC) - annotation extension Sep 10, 2014
@Jo-CCS Jo-CCS mentioned this issue Mar 23, 2016
@acpopat acpopat assigned acpopat and unassigned ntra00 Sep 20, 2017
@artunit artunit added the "high priority" label (Identified as high priority by Board) Sep 19, 2018
@urieli

urieli commented Feb 7, 2019

I am highly concerned with the attempt to standardize the relationship between GC, WC and PC.

A word is much more than the sum of its component glyphs, and its confidence can be affected by many factors other than the confidence of these component glyphs.
For example, word confidence can be affected by whether or not the word appears in a dictionary, how common the word is in a reference corpus, and which other words precede or follow it in an n-gram model.

Trying to standardize this would force ALTO output to ignore these factors, or else to force all OCR software to use the identical algorithm, regardless of what experiments indicate. In my opinion, ALTO should make an effort to be descriptive rather than prescriptive.

@artunit
Member

artunit commented Feb 7, 2019

This came up at the 2019-01-25 Board meeting. One idea was to use a sort of registry of algorithms if it was not possible to reconcile the metrics. On the other hand, there was also a feeling that the scale should be consistent.

@Jo-CCS
Member

Jo-CCS commented Feb 7, 2019

Hi Uriel,
thanks for your statement.
The demand came from the community to get references on how the values should be computed and to arrive at comparable values within the ALTO standard, i.e. across the tools which generate it.
That's why we published here the calculation of the confidences as done in docWORKS - the initial single tool generating ALTO output.
As you can see in the details, the "found in dictionary" condition is also considered there, AND it is even one of the most important points to clarify: how it is reflected in the values.
As Art outlined, we just discussed this on the Board again. The idea is to find a common definition, e.g. WHEN a value is set to 1.0 (if it is manually approved or corrected / born digital) or if the confidence is above 9.5 AND found in dictionary, for example.
Since the initial version, other attributes have been added which should also be considered for the WC value, e.g. to express elsewhere whether the content is born digital, in which case CC and WC are probably obsolete and should be omitted. Or should those be set to 1.0 everywhere?
So I think there are a lot of details to discuss before coming to a decision on the best approach and writing down the annotation as intended.

I would be glad to have you on board for the upcoming technical call we will then set up for it.
Regards,
jo

@urieli

urieli commented Feb 7, 2019

@Jo-CCS - please e-mail me with details regarding the technical call, and I'll see if I can attend. I've added my e-mail address to my GitHub profile. I remain convinced that standardizing the method for calculating WC is the wrong approach. In the past I've experimented with many different methods for calculating it, and the best method in one context (language, training corpus, evaluation corpus) isn't always the best method in another context.

What I mean by best is:
Let's say the OCR system produces n hypotheses for each word, and gives each hypothesis a confidence score. Different algorithms for calculating the score will rank the hypotheses differently. If we take the highest scoring hypothesis produced by each algorithm, we can then calculate the overall accuracy achieved by each algorithm on the evaluation corpus (in terms of letters and words).
Some algorithms provide better accuracy on one corpus, while others provide better accuracy on another corpus.

So far I've experimented with different ways of aggregating the glyph confidence scores (e.g. arithmetic vs. geometric mean), different ways of including the glyph splitting scores (as opposed to the glyph recognition scores), different ways of integrating whether or not a word is in the dictionary, and how often the word appears in a reference corpus. Because I deal with languages without standardized spelling, another factor to take into account is the word distance from dictionary entries. I would also like in the future to incorporate corpus-based n-gram models. There are really far too many possibilities to experiment with, and attempting to standardize would necessarily limit WC to methods that have been imagined up to now.
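
To make the point about aggregation concrete, here is a small sketch showing that two ways of combining glyph confidences can rank the same hypotheses differently. The hypotheses and their scores are invented for illustration, and the helper names are hypothetical; this is not the method of any particular OCR system.

```python
import math


def arithmetic_mean(scores):
    return sum(scores) / len(scores)


def geometric_mean(scores):
    return math.prod(scores) ** (1 / len(scores))


def best_hypothesis(hypotheses, aggregate):
    """Return the word whose aggregated glyph confidence is highest."""
    return max(hypotheses, key=lambda word: aggregate(hypotheses[word]))


# Invented glyph confidences (0..1, higher is better) for two hypotheses.
hypotheses = {
    "fan": [0.9, 0.9, 0.3],     # two strong glyphs, one weak
    "fun": [0.68, 0.68, 0.68],  # uniformly moderate
}

print(best_hypothesis(hypotheses, arithmetic_mean))
print(best_hypothesis(hypotheses, geometric_mean))
```

The arithmetic mean tolerates one weak glyph, while the geometric mean penalizes it, so the two aggregations pick different winners here. Which choice gives better accuracy would have to be measured on an evaluation corpus.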

@artunit
Member

artunit commented Mar 24, 2019

As agreed at Friday's Board Meeting, we will move ahead with a single topic meeting on this issue, open to anyone who is interested. @urieli: I will include your email in the poll; it is hoped we can have the meeting in April.

@artunit
Member

artunit commented Mar 24, 2019

I include the Doodle Poll link here for completeness.

@altomator

Information from ABBYY Support:
"Our developers confirmed that in fact the WC value is only useful as a relative one, to compare recognition quality of different variants of the same word. It is better and more correct to use the CharParams::IsSuspicious to estimate the OCR quality."

@Jo-CCS
Member

Jo-CCS commented Apr 3, 2019

Good discussion input.
WC in ALTO is not defined to be the WC of ABBYY. It should also be discussed whether the WC should reflect the character confidence or not.
I think the WC is indeed just a relative confidence, used to compare the word against other ALTERNATIVES and to make the supposed confidence difference available.

At CCS we have multiple OCR results that we compare to each other, and these unified confidence values (results of the different engines converted to the same value ranges) are the identifiers that determine which result is taken as the best.

@artunit
Member

artunit commented Jul 9, 2019

As per our 2019-07-08 Board Meeting and originating in the special single topic ALTO Meeting described above, we invite interested parties to explore and comment on the summary document on OCR Confidence put together by Board Member Ashok Popat of Google.

@cipriandinu cipriandinu removed the "high priority" label (Identified as high priority by Board) Aug 22, 2023