'enhance_extract_text_tika_server' error message #357

Open
RabbitJackTrade opened this issue Jul 19, 2021 · 2 comments

Comments

@RabbitJackTrade

Newbie here, so please pardon if I'm missing something:

I'm running the VM in Oracle VirtualBox under Windows 10 (all current versions).

I tried indexing a file (always a Microsoft Word document) using the browser (search-apps/files/create) - the response I get is

File or directory added to queue.

The file name shows up in the Newest documents tab, but the content is never indexed.

Trying the same thing using CLI

opensemanticsearch-index-dir /path/to/filename

gets this response

Indexing new file: /path/to/filename

but the indexing never takes place. When I run this again, the response this time is

Repeating indexing of unchanged file because critical plugin(s) ['enhance_extract_text_tika_server'] failed in former run: /path/to/filename

or, on occasion

Repeating indexing of unchanged file because (additional configured) plugin(s) or options ['enhance_extract_text_tika_server_ocr_enabled'] not runned yet: /path/to/filename

As I mentioned, all documents are in Microsoft Word format, so I'm not sure what OCR has to do with it.
I've seen references to the first error message but couldn't find a solution.
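
For anyone collecting these messages across many files, here is a minimal Python sketch that pulls the plugin names and the file path out of the two "Repeating indexing" lines quoted above. It is not part of Open Semantic Search: the function name `parse_repeat_message` and the pattern are hypothetical, and the message format is assumed only from the two lines in this report, so other wordings may not match.

```python
import ast
import re

# Assumed format, based only on the two messages quoted in this report:
#   Repeating indexing of unchanged file because <reason> [<plugins>] <status>: <path>
REPEAT_RE = re.compile(
    r"^Repeating indexing of unchanged file because "
    r".*?(?P<plugins>\[.*?\])\s*"
    r"(?P<status>failed in former run|not runned yet): (?P<path>.*)$"
)


def parse_repeat_message(line):
    """Return status, plugin names and path for a matching line, else None."""
    match = REPEAT_RE.match(line.strip())
    if match is None:
        return None
    return {
        "status": match.group("status"),
        # The plugin list is printed as a Python list literal, e.g. ['name'].
        "plugins": ast.literal_eval(match.group("plugins")),
        "path": match.group("path"),
    }


if __name__ == "__main__":
    sample = (
        "Repeating indexing of unchanged file because critical plugin(s) "
        "['enhance_extract_text_tika_server'] failed in former run: /path/to/filename"
    )
    print(parse_repeat_message(sample))
```

Grouping the results by plugin name would show whether every failure points at the same plugin or only some files are affected.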

Thanks.

@denispol

I confirm that this happens as well with .pdf and other Office formats (.xls, .xlsx), using the latest from master.

@AndreaPux

Same problem here. Honestly, Open Semantic Search seems like a wonderful tool, but it's quite a frustrating experience. I spent one week trying to install OSS on Ubuntu LTS, and the only solution was to use Debian instead, in spite of what is claimed in the docs. Now, on Debian, the tool is installed but it doesn't index the files' content, and what I gather here is that the problem has been known since 2021 and there's no proposed solution.
