
A wrapper for passing Wikisource image OCR requests through to the Google Vision API text-recognition system, and retrieving the resulting text.


jayantanth/ws-google-ocr


Wikisource Google OCR tool

This is a simple wrapper service around the Google Cloud Vision API, enabling Wikisources to submit images for Optical Character Recognition and retrieve the resultant text.

This works with more languages than the alternative service at https://tools.wmflabs.org/phetools (used by e.g. https://wikisource.org/wiki/MediaWiki:OCR.js and similar scripts on other Wikisources).

Only images hosted on Wikimedia Commons can be submitted.

Usage

Send up to two parameters to api.php:

https://tools.wmflabs.org/ws-google-ocr/api.php?lang=[LANG_CODE]&image=[IMAGE_URL]
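As a rough illustration, the request URL above could be assembled as follows. This is only a sketch: the helper name `build_ocr_url` is hypothetical, and it assumes that `image` is the required parameter and `lang` the optional one, which the README does not state outright. The image URL in the usage line is a placeholder.

```python
from urllib.parse import urlencode

# Endpoint as given in this README.
API_ENDPOINT = "https://tools.wmflabs.org/ws-google-ocr/api.php"


def build_ocr_url(image_url, lang=None):
    """Return the api.php URL for one OCR request.

    Hypothetical helper. Assumes `image` is required and `lang` may be
    left out, in which case no language hint is sent at all.
    """
    params = {"image": image_url}
    if lang:
        # Only add the hint when the caller explicitly asks for one.
        params["lang"] = lang
    # urlencode percent-escapes the image URL so it survives as a single value.
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder Commons URL, for illustration only.
    print(build_ocr_url("https://upload.wikimedia.org/wikipedia/commons/x/xx/Example.jpg", lang="bn"))
```

Escaping the image URL matters because it contains `:` and `/` characters (and file names may contain `&` or spaces) that would otherwise be misread as part of the outer query string.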

The response is JSON with one of two top-level items set, 'text' on success or 'error' on failure:

{
  "text": "Lorem ipsum...",
  "error": {
    "code": "",
    "message": ""
  }
}
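A client might unpack that response along these lines. This is a minimal sketch, not the tool's own code: `OcrError` and `extract_text` are hypothetical names, and it assumes that an unused 'error' item is either absent or has empty 'code' and 'message' fields, as in the sample above.

```python
import json


class OcrError(Exception):
    """Hypothetical exception carrying the service's error code and message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def extract_text(body):
    """Return the recognised text from a JSON response body.

    Assumption: 'error' is an object with 'code' and 'message', and it
    counts as set only when at least one of those fields is non-empty.
    """
    data = json.loads(body)
    error = data.get("error") or {}
    if error.get("code") or error.get("message"):
        raise OcrError(error.get("code", ""), error.get("message", ""))
    return data.get("text", "")


if __name__ == "__main__":
    print(extract_text('{"text": "Lorem ipsum..."}'))
```

Raising an exception for the error case keeps callers from mistaking an empty 'text' value for a successful result.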

Note that you should only set the lang parameter for languages that require it. The Google Cloud Vision API documentation says the following about language hints:

In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

