Do not try to extract text for empty files #367

Open
mmoossen opened this issue Nov 1, 2021 · 0 comments

mmoossen commented Nov 1, 2021

If a file is empty, it is still sent to Tika, and Tika reports the following error:

Failed tasks while import & analysis (ETL):
X-TIKA:EXCEPTION:runtime
enhance_extract_text_tika_server
X-TIKA_EXCEPTION_runtime:
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes at

One can argue that it makes no sense to index empty files, but in our use case we want the file's metadata, for instance its name and path, to be indexed.

I think the best approach is to check the file size before sending the file to Tika.
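
A minimal sketch of such a check, assuming the extraction step can be wrapped in a plain Python function. The names `is_empty_file`, `extract_text_if_not_empty` and the `extract_text` callable are hypothetical and only illustrate the idea; they are not the project's actual API.

```python
import os


def is_empty_file(path):
    """Return True if path is a regular file with a size of zero bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == 0


def extract_text_if_not_empty(path, extract_text):
    """Collect file metadata and call extract_text(path) only for non-empty files.

    extract_text is a hypothetical callable standing in for the step that
    sends the file to Tika.
    """
    # Metadata such as name and path is recorded even for empty files.
    data = {"path": path, "filename": os.path.basename(path)}

    if is_empty_file(path):
        # Skip Tika entirely, which avoids the ZeroByteFileException.
        data["content"] = ""
        return data

    data["content"] = extract_text(path)
    return data
```

With a check like this, an empty file would still produce an index entry with its name and path, while the Tika request is skipped.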

@Mandalka Mandalka self-assigned this Dec 19, 2021