Understanding speed of Tesserocr's image_to_text #269

parulsingh23 · 2021-08-26T06:13:50Z

The ReadMe file states that:

image_to_text can be used with threading to concurrently process multiple images which is highly efficient.

However, I'm curious how much faster this is. For example, when I run tesseract on 120 images, each around 100x30 pixels, the average time is 0.18 seconds per image.

How long would running Tesserocr's image_to_text on 120 images, each around 100x30 pixels (all in a thread), take?
Additionally, how would this time compare when using a computer's CPU versus a GPU (like those provided on Google Colab or AWS EC2 instances)?
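The per-image figure depends on the machine and the images, so the most reliable answer comes from measuring it directly. Below is a minimal timing sketch; `average_seconds` is a hypothetical helper (not part of tesserocr), and `ocr` stands for any callable that takes one image.

```python
import time


def average_seconds(ocr, images):
    """Return the average wall-clock seconds ocr(img) takes per image.

    `ocr` is any callable taking a single image; `images` must be non-empty.
    """
    start = time.perf_counter()
    for img in images:
        ocr(img)
    elapsed = time.perf_counter() - start
    return elapsed / len(images)
```

Assuming `imgs` is a list of PIL images, `average_seconds(tesserocr.image_to_text, imgs)` would give a number comparable to the 0.18 seconds quoted above.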


ichenjia · 2021-08-26T18:32:41Z

You shouldn't use image_to_text if you have multiple images. Loading the model and initializing the API takes time on every call. You are better off creating the API once and reusing it, something like this:

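import tesserocr

# Create the API once so the model is loaded a single time.
tess_api = tesserocr.PyTessBaseAPI()

for img in imgs:  # imgs: an iterable of PIL images
    tess_api.SetImage(img)
    text = tess_api.GetUTF8Text()

tess_api.End()  # release the API's resources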
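To combine this reuse with the threading mentioned in the README, one option is to give each worker thread its own API object. The following is only a sketch under assumptions: `ocr_all`, `api_factory`, and `workers` are hypothetical names, and it assumes an API object may be closed from a thread other than the one that created it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def ocr_all(images, api_factory, workers=4):
    """OCR every image, reusing one API object per worker thread.

    `api_factory` is a zero-argument callable returning an object with
    SetImage, GetUTF8Text and End methods (e.g. tesserocr.PyTessBaseAPI).
    Results are returned in the same order as `images`.
    """
    local = threading.local()  # holds each thread's own API object
    apis = []                  # every API created, so all can be closed
    lock = threading.Lock()

    def ocr(img):
        api = getattr(local, "api", None)
        if api is None:
            # First image handled by this thread: create its API once.
            api = local.api = api_factory()
            with lock:
                apis.append(api)
        api.SetImage(img)
        return api.GetUTF8Text()

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(ocr, images))
    finally:
        for api in apis:
            api.End()
```

With tesserocr installed this would be called as `ocr_all(imgs, tesserocr.PyTessBaseAPI)`. Whether it beats the single-threaded loop for small images like 100x30 pixels is something to measure rather than assume.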
