Problem with pdf, corrupt first page of all documents #353

Open
chrbratt opened this issue May 6, 2021 · 3 comments

Comments

@chrbratt

chrbratt commented May 6, 2021

Hello

All indexed PDFs, both multi-page and single-page, have a problem: the first page of every document (approx. 10,000 in total) is corrupted when the file is downloaded from the web interface. If I open the same document directly via the mounted SMB share, there are no problems.
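One way to confirm that the corruption happens on the way through the web server, rather than in the files on the share, is to compare a downloaded PDF with the original byte for byte. The following is a minimal Python sketch, assuming one downloaded copy has been saved locally. The script and its arguments are hypothetical and not part of Open Semantic Search.

```python
"""Compare a PDF on the SMB share with the copy downloaded from the web UI.

Hypothetical diagnostic helper; paths are passed on the command line.
"""
import hashlib
import sys
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(a: Path, b: Path, chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the byte offset where the files first differ, or None if identical."""
    offset = 0
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca == cb:
                if not ca:  # both files ended at the same point
                    return None
                offset += len(ca)
                continue
            # Find the first differing byte inside this chunk pair.
            for i, (x, y) in enumerate(zip(ca, cb)):
                if x != y:
                    return offset + i
            # No differing byte in the overlap: one file is a prefix of the other.
            return offset + min(len(ca), len(cb))


def main(argv: List[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} ORIGINAL_ON_SHARE DOWNLOADED_FROM_WEB", file=sys.stderr)
        return 2
    original, downloaded = Path(argv[1]), Path(argv[2])
    print(f"original:   {sha256_of(original)} ({original.stat().st_size} bytes)")
    print(f"downloaded: {sha256_of(downloaded)} ({downloaded.stat().st_size} bytes)")
    diff = first_difference(original, downloaded)
    if diff is None:
        print("files are byte-identical")
        return 0
    print(f"files differ starting at byte offset {diff}")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It could be run as python3 compare_pdf.py /mnt/share/doc.pdf ~/Downloads/doc.pdf, where the script name and both paths are placeholders. A difference near the start of the file would be consistent with a corrupted first page, while identical hashes would point at the viewer or preview step instead.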

The installation uses the default configuration, except that I added SMB shares to the file index with the help of this guide:
https://github.com/shunwatai/Open-Semantic-Search-setup-guide

I'm running Ubuntu 20.04.2 LTS and installed Open Semantic Search yesterday using the command: sudo apt install ./open-semantic-search-server-ubuntu-*.deb

Also all previews are corrupted.

Any help figuring out what might be wrong here would be greatly appreciated.

@dennirockz

Same issue for me :( although I cannot even see any file content.
Some files work for me, but most of them do not. OSS is installed on a server and accesses the files via an external drive.

@dennirockz

Nice, I think I found a solution. It looks like Apache is causing the issue here.
I followed the instructions here: https://superuser.com/questions/1483696/cifs-mounted-on-linux-from-windows-shows-corrupt-distorted-images

Now the documents load without corruption, as far as I can tell so far. I will come back to this topic if I spot anything different.
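For reference, a commonly cited fix for Apache serving corrupted files from CIFS/SMB mounts is to turn off its sendfile and memory-mapping delivery paths. The Apache documentation for EnableSendfile warns that with a network-mounted DocumentRoot (it lists NFS, SMB, CIFS and FUSE) the kernel may be unable to serve the network file through its own cache, and the EnableMMAP documentation shows disabling memory-mapping for NFS-mounted files. Assuming the linked answer comes down to these two directives (not confirmed here), a minimal config sketch could look like this; the file name network-share.conf is hypothetical:

```apache
# /etc/apache2/conf-available/network-share.conf  (hypothetical file name)
# Deliver files with ordinary reads instead of sendfile()/mmap(),
# which can misbehave when the files live on a network mount such as CIFS/SMB.
EnableSendfile Off
EnableMMAP Off
```

With Ubuntu's Apache packaging, this could be enabled with sudo a2enconf network-share followed by sudo systemctl reload apache2. Placing the two directives inside a <Directory> block for the mounted share path is a narrower alternative that leaves local files unaffected.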

@chrbratt
Author

chrbratt commented Jun 2, 2021

Many thanks for the follow-up and the possible solution, I will try it as soon as possible :)
