ETL filling up OCR queue? #345

mosea3 · 2021-02-12T11:06:37Z

In my company we only need full-text search on PDFs that were already scanned and converted into text PDFs, so no OCR is needed.
OCR was disabled in /etc/opensemantic/etl and the ETL service was restarted.

Still, something is filling the OCR queue and converting PDFs into images (possibly related to issue #343).

How can I trace where this activity comes from?

etl.txt

Mandalka · 2021-02-14T14:27:33Z

Is OCR still enabled in the web admin / config UI?

This UI writes /etc/opensemanticsearch/etl-webadmin, which overrides the settings in /etc/opensemanticsearch/etl.
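Because the web admin file takes precedence, a quick check is to list every OCR-related line in both files, in the order they take effect. The following is a minimal Python sketch, assuming both files are plain text that use `#` for comments; the helper name `find_ocr_settings` is hypothetical and not part of Open Semantic Search.

```python
from pathlib import Path

# Paths from the maintainer's reply. etl-webadmin overrides etl,
# so it is listed last: a setting found there takes precedence.
CONFIG_FILES = [
    "/etc/opensemanticsearch/etl",
    "/etc/opensemanticsearch/etl-webadmin",
]


def find_ocr_settings(paths):
    """Return (path, line_number, text) for each uncommented line mentioning OCR."""
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # file not present on this install
        lines = p.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # assumption: '#' marks a commented-out setting
            if "ocr" in stripped.lower():
                hits.append((path, number, stripped))
    return hits


if __name__ == "__main__":
    for path, number, text in find_ocr_settings(CONFIG_FILES):
        print(f"{path}:{number}: {text}")
```

Run it with permission to read `/etc/opensemanticsearch`. If the last matching line printed still turns OCR on, that file is the most likely reason OCR tasks keep being queued.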

schneipk · 2021-03-10T09:59:55Z

I've got a similar issue, but in my case OCR is turned on. Running enrich later causes an error (it seems to be deprecated), and some files/images from websites get OCRed while others don't. Thank you for this cool project and the good work you're doing.

phretor · 2021-03-10T13:01:02Z

Same issue here. I tried the Desktop VM as well as the latest Docker Compose file. Worst of all, I don't see any errors being thrown. @Mandalka, do you see the same issue with the latest build?

bmnnit · 2022-05-25T10:10:45Z

I have the Debian package installed:
`ii open-semantic-search 21.12.25 all Search engine`
and the problem is that I'm unable to disable OCR: whatever I do, files are always added to the OCR queue.
I'm adding files with, e.g.:
`opensemanticsearch-index-dir /home/opensemanticetl/mnt/Projekte/archiv/aktuelle_Projekte -v`

Mandalka self-assigned this Feb 14, 2021

Mandalka added the ocr and question labels Feb 14, 2021
