Multizim (suggestions) does not work at all #932

Open
kelson42 opened this issue Sep 14, 2024 · 22 comments

@kelson42
Contributor

@kelson42 cloned issue kiwix/libkiwix#479 on 2021-03-19:

If I search for suggestions on the welcome page, nothing is displayed.

I would like to see the results, and it would be great to have the logo of each ZIM beside its results, so we can see in which ZIM the content is available.

See kiwix/kiwix-tools#385 for the lack of scalability of multizim fulltext search.

@JensKorte commented on 2021-03-19:

If I search for suggestion in the welcome page, nothing is printed.

Strange, it works for me. I use:

kiwix-tools_linux-x86_64-3.1.2-4$ ./kiwix-serve -V
3.1.2

I don't use a library file; I start the server with "/path/kiwix-serve *zim".

The menu line is broken. There are German and English results. Hope it helps.

(Screenshot: kiwix-global-search)

@kelson42 commented on 2021-03-19:

@JensKorte This is the fulltext search, not the suggestions.

@JensKorte commented on 2021-03-19:

@JensKorte This is the fulltext search, not the suggestions

Ah, yes. When I try to use the suggestions, there are no results; there is not even any storage I/O.

@maneeshpm commented on 2021-03-25:

I tried to reproduce this bug with a single ZIM file. In this case, the error occurs because the request has an empty content argument, which makes the getIdForName() method fail. Hence we get a 404 page via the catch block.

https://github.com/kiwix/kiwix-lib/blob/803cb1c2c5b6c99b53bcc540bf6719b69d3552ad/src/server/internalServer.cpp#L395-L402

This is the generated request: http://localhost:8080/suggest?content=&term=berlin

The solution is to fix the faulty request so that it includes suitable content from which bookName can be extracted.
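To make the failure concrete, here is a small C++ sketch (not the actual internalServer.cpp code) that parses the generated query string and looks the book up by name in the way described above. `parseQuery`, `BookCatalog` and `statusForSuggest` are hypothetical names, and a plain std::map stands in for the library; only the "empty content, lookup throws, 404" flow comes from the comment.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: split "a=1&b=2" into key/value pairs (no URL decoding).
std::map<std::string, std::string> parseQuery(const std::string& query) {
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::size_t eq = pair.find('=');
        if (eq != std::string::npos) {
            params[pair.substr(0, eq)] = pair.substr(eq + 1);  // "content=" -> ""
        } else if (!pair.empty()) {
            params[pair] = "";
        }
        start = end + 1;
    }
    return params;
}

// Hypothetical stand-in for the library's name -> id lookup.
struct BookCatalog {
    std::map<std::string, std::string> idsByName;

    // Throws for unknown names, including the empty name sent by the welcome page.
    std::string getIdForName(const std::string& name) const {
        auto it = idsByName.find(name);
        if (it == idsByName.end()) throw std::out_of_range("no book named '" + name + "'");
        return it->second;
    }
};

// Mirrors the described control flow: any lookup failure becomes a 404.
int statusForSuggest(const BookCatalog& catalog, const std::string& query) {
    auto params = parseQuery(query);
    try {
        catalog.getIdForName(params["content"]);
        return 200;
    } catch (const std::out_of_range&) {
        return 404;
    }
}
```

With a catalog containing a hypothetical `wikipedia_en_all` book, the generated request `content=&term=berlin` yields 404, while `content=wikipedia_en_all&term=berlin` yields 200.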

@kelson42 commented on 2021-03-25:

@maneeshpm Sounds good, but we also need to think about scalability. How can we ensure a proper, timely response with 2000 ZIM files?

@JensKorte commented on 2021-03-25:

This reminds me a little of a meta search engine. A meta search engine queries several search engines and doesn't know when they will finish. In the past, some meta search engines provided an interface with a user-selectable timeout and a list from which search engines could be chosen, grouped by category or language.
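A rough C++ sketch of that meta-search pattern applied to ZIM files, assuming each ZIM's search can be wrapped as an independent callable (`searchWithDeadline`, `FanOutResult` and `Results` are hypothetical names, not libkiwix API): all sources are queried in parallel, and whatever has answered by the user-selected timeout is returned.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <vector>

using Results = std::vector<std::string>;

struct FanOutResult {
    Results hits;              // results from sources that answered in time
    std::size_t timedOut = 0;  // sources still running at the deadline
};

// Query every source concurrently and keep whatever arrives before the timeout.
FanOutResult searchWithDeadline(const std::vector<std::function<Results()>>& sources,
                                std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::vector<std::future<Results>> pending;
    for (const auto& source : sources) {
        std::packaged_task<Results()> task(source);
        pending.push_back(task.get_future());
        // Detached: late workers keep running, their results are simply dropped.
        std::thread(std::move(task)).detach();
    }

    FanOutResult out;
    for (auto& f : pending) {
        if (f.wait_until(deadline) == std::future_status::ready) {
            Results r = f.get();
            out.hits.insert(out.hits.end(), r.begin(), r.end());
        } else {
            ++out.timedOut;
        }
    }
    return out;
}
```

The workers are detached so the function really returns at the deadline; futures from std::async would instead block in their destructors until every search had finished. A real server would also need to bound the number of threads.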

To avoid a timeout between the HTTP server and the browser, the server could send a line containing a space every so often until the search is finished. If the search result page has an anchor in its URL, placed at the beginning of the results, these blank lines would be skipped when the page is displayed.
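The keep-alive idea could look roughly like this in C++. It is only a sketch: `streamWithKeepAlive` is a hypothetical helper, and a std::ostream stands in for the server's chunked HTTP response.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// While the search runs, write a line containing a single space at a fixed
// interval so the connection does not time out; then write the result page.
// Returns the number of keep-alive lines written.
int streamWithKeepAlive(std::ostream& out, std::future<std::string>& pendingPage,
                        std::chrono::milliseconds interval) {
    int keepAlives = 0;
    while (pendingPage.wait_for(interval) != std::future_status::ready) {
        out << " \n" << std::flush;  // in a server: send this as its own chunk
        ++keepAlives;
    }
    out << pendingPage.get();
    return keepAlives;
}
```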

Caching could help when several people run the same search, e.g. a school class searching during a lesson. It could also help a single user: the first search gets a short timeout, and when the search is repeated, the cache serves the full response. Maybe the timeout-avoiding spaces could be placed at the end of a fast search, and when the server finishes the search, the user gets a link saying "Reload to see all results".
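A minimal C++ sketch of such a shared cache, assuming the search itself is an opaque callable (`SearchCache` and `lookup` are hypothetical names). Identical queries share one running search; a lookup that times out returns std::nullopt, which the page would render as "Reload to see all results". Entries are never evicted here, which a real implementation would have to address.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <vector>

using Results = std::vector<std::string>;

class SearchCache {
public:
    explicit SearchCache(std::function<Results(const std::string&)> search)
        : search_(std::move(search)) {}

    // Waits at most `timeout`. Returns the full results if ready, otherwise
    // std::nullopt; the search keeps running and later lookups reuse it.
    std::optional<Results> lookup(const std::string& query, std::chrono::milliseconds timeout) {
        std::shared_future<Results> entry;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = entries_.find(query);
            if (it == entries_.end()) {
                it = entries_.emplace(query,
                        std::async(std::launch::async, search_, query).share()).first;
            }
            entry = it->second;
        }
        if (entry.wait_for(timeout) != std::future_status::ready) return std::nullopt;
        return entry.get();
    }

private:
    std::function<Results(const std::string&)> search_;
    std::mutex mutex_;
    std::map<std::string, std::shared_future<Results>> entries_;
};
```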

On the first browser request, the server could respond with a "dynamic" start page on which the languages the user has enabled in the browser (e.g. "DE(-ch), EN(-us)") are preselected. The user could then enter the search phrase and adjust the languages.
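The browser's language preferences arrive in the standard Accept-Language request header, so a first step could be extracting base language codes from it. A hypothetical C++ helper (the function name and the choice to drop region subtags such as "-CH" are assumptions):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Turn "de-CH, de;q=0.9, en-US;q=0.8" into base codes ordered by q: {"de", "en"}.
std::vector<std::string> preferredLanguages(const std::string& header) {
    struct Entry { std::string lang; double q; };
    std::vector<Entry> entries;
    std::stringstream ss(header);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const std::size_t semi = item.find(';');
        std::string tag = item.substr(0, semi);
        double q = 1.0;  // default weight when no q-value is given
        if (semi != std::string::npos) {
            const std::size_t qpos = item.find("q=", semi);
            if (qpos != std::string::npos) q = std::strtod(item.c_str() + qpos + 2, nullptr);
        }
        // Trim whitespace, keep the primary subtag ("de" from "de-CH"), lower-case it.
        tag.erase(std::remove_if(tag.begin(), tag.end(),
                                 [](unsigned char c) { return std::isspace(c); }),
                  tag.end());
        std::string base = tag.substr(0, tag.find('-'));
        std::transform(base.begin(), base.end(), base.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (!base.empty() && base != "*" && q > 0) entries.push_back({base, q});
    }
    // Highest q first; equal weights keep the header order.
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry& a, const Entry& b) { return a.q > b.q; });
    std::vector<std::string> langs;
    for (const auto& e : entries)
        if (std::find(langs.begin(), langs.end(), e.lang) == langs.end()) langs.push_back(e.lang);
    return langs;
}
```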

@maneeshpm commented on 2021-03-26:

According to this thread on Xapian, Xapian can search over multiple databases with very little overhead compared to a single-database search. For that, all the databases should be combined using the Xapian::Database::add_database() method, which is already implemented in libzim. IMO the real bottleneck is retrieving the indexes from the ZIM files. An improvement here would be to go async and load all the title indexes using multiple threads. This way, we might be able to set up a Xapian::Enquire object faster and let it handle the search. This is limited by the host machine's CPU, but it is a largely general solution. However, it must be done as soon as the library is loaded, since we can assume that the user is going to use search.

PS: I think ticket #418 is well written and captures the issue very well.
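Concurrent index loading could be sketched in C++ as below. `TitleIndex` and `loadTitleIndex` are hypothetical stand-ins for opening a ZIM and reading its Xapian index; in real code the loaded databases would then be combined with Xapian::Database::add_database(). The `maxParallel` bound is an added assumption so that 2000 ZIMs do not spawn 2000 threads at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-ZIM index; real code would hold a Xapian::Database.
struct TitleIndex {
    std::string zimPath;
    std::vector<std::string> titles;
};

// Hypothetical loader for the expensive part: reading the ZIM and its index.
TitleIndex loadTitleIndex(const std::string& zimPath) {
    return TitleIndex{zimPath, {zimPath + ":Main_Page"}};
}

// Load indexes in batches of `maxParallel` threads; output keeps input order.
std::vector<TitleIndex> loadAllIndexes(const std::vector<std::string>& zimPaths,
                                       std::size_t maxParallel) {
    if (maxParallel == 0) maxParallel = 1;
    std::vector<TitleIndex> combined;
    combined.reserve(zimPaths.size());
    for (std::size_t begin = 0; begin < zimPaths.size(); begin += maxParallel) {
        const std::size_t end = std::min(begin + maxParallel, zimPaths.size());
        std::vector<std::future<TitleIndex>> batch;
        for (std::size_t i = begin; i < end; ++i)
            batch.push_back(std::async(std::launch::async, loadTitleIndex, zimPaths[i]));
        for (auto& f : batch) combined.push_back(f.get());  // rethrows loader errors
    }
    return combined;
}
```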

As far as suggestions not working is concerned, I believe we need to fix that piece of code in kiwix-lib.

@kelson42 commented on 2021-03-26:

retrieving the indexes from the zim

What exactly do you mean here? The IO overhead? Or simply what is reported in #418?

@maneeshpm commented on 2021-03-26:

I meant the net cost of reading a ZIM, getting its index, and adding it to the databases object.

@maneeshpm commented on 2021-03-26:

I think this issue belongs in kiwix-lib rather than kiwix-tools, since that is where the bug is.

handle_search() and handle_suggest() are similar routines. Both first try to get a bookName from the request object inside a try/catch block. When searching from the input box on the welcome page, both functions rely on the request's content argument to get a bookName; this fails, and they enter the catch block. handle_search() does nothing in its catch block and falls back to all open local ZIMs via mp_library->filter(kiwix::Filter().local(true).valid(true)), so it raises no error. handle_suggest(), however, returns a 404 in its catch block, which causes this behavior. We can implement the same fallback in handle_suggest() to fix this issue.

I think that until the scaling issue is sorted out, we should hide this feature from the main page, as it hurts the user experience with a large number of ZIMs.
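A simplified C++ sketch of that fallback, with plain structs in place of kiwix::Book and kiwix::Library (`Book`, `localValidBooks` and `booksForSuggest` are hypothetical names). Design choice: it only falls back when `content` is empty, so a misspelled book name still gets a 404; mirroring handle_search() exactly would fall back on any lookup failure.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified book record.
struct Book {
    std::string name;
    bool local;
    bool valid;
};

// Stand-in for mp_library->filter(kiwix::Filter().local(true).valid(true)).
std::vector<std::string> localValidBooks(const std::vector<Book>& library) {
    std::vector<std::string> names;
    for (const auto& b : library)
        if (b.local && b.valid) names.push_back(b.name);
    return names;
}

// Returns the books to query for suggestions, or std::nullopt for a 404.
std::optional<std::vector<std::string>>
booksForSuggest(const std::vector<Book>& library, const std::string& content) {
    if (content.empty()) return localValidBooks(library);  // multizim fallback
    for (const auto& b : library)
        if (b.name == content) return std::vector<std::string>{b.name};
    return std::nullopt;  // unknown book name: keep the 404
}
```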

@stale[bot] commented on 2021-06-05:

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@maneeshpm commented on 2021-07-16:

@kelson42 We can take it as a fact that once a Xapian database is ready, searching it is quick (even on a huge Xapian DB), and that is not something we can improve on our side. Our main concerns now are how to make the DB ready the first time and how to keep it ready for further searches.

The answer to keeping it ready is caching, which we have already started looking into in #509.

Making it ready quickly the first time is a bit more complicated. Currently, on the libkiwix side, we create a zim::Searcher only after receiving a query (we create one for each query, hence the slowness). We could prepare a zim::Searcher as soon as the user opens a multizim, because we can expect them to do at least one search on it.

Now, what should we do while the zim::Searcher is being created? In the multizim case, extracting the Xapian entry from every ZIM takes time. We could show a message such as "Searcher is preparing" and offer a simpler, stripped-down search using the ZIM index (which is quick) until the searcher is ready.
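Preparing and reusing the searcher could be sketched like this in C++, keyed by the set of opened books. `Searcher` is a hypothetical stand-in for zim::Searcher and its build step is trivial here; only the "build once, eagerly, then reuse" idea comes from the comment.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for a searcher built over a set of ZIMs.
struct Searcher {
    std::set<std::string> books;
};

// Builds at most one searcher per set of books and keeps it for later queries.
class SearcherCache {
public:
    using Handle = std::shared_future<std::shared_ptr<const Searcher>>;

    // Call when the user opens a multizim view, before any query arrives.
    Handle prepare(const std::set<std::string>& books) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(books);
        if (it != cache_.end()) return it->second;
        Handle h = std::async(std::launch::async,
                              [this, books]() -> std::shared_ptr<const Searcher> {
                                  ++builds_;  // the slow build would happen here
                                  return std::make_shared<Searcher>(Searcher{books});
                              }).share();
        cache_.emplace(books, h);
        return h;
    }

    int builds() const { return builds_; }

private:
    std::mutex mutex_;
    // Declared before cache_ so it outlives the futures, whose destructors may
    // wait for a build that still increments it.
    std::atomic<int> builds_{0};
    std::map<std::set<std::string>, Handle> cache_;
};
```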
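A C++ sketch of the "Searcher is preparing" fallback: if the full searcher is not ready yet, a cheap prefix match over a sorted title list answers instead. The names are hypothetical, the full searcher is modeled as a std::function delivered through a std::shared_future, and the sorted vector only approximates the ZIM title index.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <vector>

struct Suggestions {
    std::vector<std::string> titles;
    bool fromFullSearcher;  // false: also show "Searcher is preparing"
};

// Prefix match on a sorted title list, standing in for the ZIM title index.
std::vector<std::string> titlePrefixSearch(const std::vector<std::string>& sortedTitles,
                                           const std::string& prefix, std::size_t limit) {
    std::vector<std::string> out;
    for (auto it = std::lower_bound(sortedTitles.begin(), sortedTitles.end(), prefix);
         it != sortedTitles.end() && out.size() < limit &&
         it->compare(0, prefix.size(), prefix) == 0;
         ++it)
        out.push_back(*it);
    return out;
}

using FullSearch = std::function<std::vector<std::string>(const std::string&)>;

// Use the full searcher if it is ready, otherwise answer from the title list.
Suggestions suggest(const std::shared_future<FullSearch>& searcher,
                    const std::vector<std::string>& sortedTitles,
                    const std::string& query) {
    if (searcher.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
        return {searcher.get()(query), true};
    return {titlePrefixSearch(sortedTitles, query, 10), false};
}
```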

@kelson42 commented on 2021-07-16:

The cold-start topic is already covered in #418, so I would keep it out of this ticket. That said, I still believe that if kiwix-serve has 2000 ZIM files open, a multizim search won't return an answer within a reasonable time and memory budget. IMO, this is mostly what this ticket is about.

@kelson42 commented on 2021-07-25:

Here is how I would propose to proceed. First of all, this is quite a large ticket, so I would propose splitting it into the following tasks:

  • The multizim search ABI design should be agreed upon/confirmed, and automated tests should be written for it.

  • The high-load situation should be discussed, and a solution should be provided to (1) give users proper feedback within a reasonable amount of time and (2) prevent the whole software from crashing/timing out due to lack of CPU/memory.

  • The kiwix-serve multizim REST API design should be adapted/checked/tested.

  • Kiwix-serve multizim search should be reintroduced (it was the default in the welcome page taskbar, but for a few weeks now the welcome page has had no taskbar... and it was not working most of the time anyway).

@maneeshpm @mgautierfr Do you agree? Do you have any comments?
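For point (2) of the high-load task, one possible shape is admission control: cap the number of multizim searches running at once and reject the rest immediately (e.g. with HTTP 503) instead of letting them exhaust CPU and memory. A hypothetical C++ sketch (`SearchGate` and `Ticket` are made-up names; the limit would need tuning):

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <utility>

// At most `limit` searches hold a ticket at once; the ticket releases its slot
// when destroyed, so a finished (or failed) search always frees capacity.
class SearchGate {
public:
    explicit SearchGate(int limit) : limit_(limit) {}

    class Ticket {
    public:
        explicit Ticket(SearchGate& gate) : gate_(&gate) {}
        Ticket(Ticket&& other) noexcept : gate_(other.gate_) { other.gate_ = nullptr; }
        Ticket(const Ticket&) = delete;
        Ticket& operator=(const Ticket&) = delete;
        Ticket& operator=(Ticket&&) = delete;
        ~Ticket() { if (gate_) gate_->active_.fetch_sub(1); }

    private:
        SearchGate* gate_;
    };

    // Returns a ticket, or std::nullopt when the server is already at capacity.
    std::optional<Ticket> tryEnter() {
        int current = active_.load();
        while (current < limit_) {
            if (active_.compare_exchange_weak(current, current + 1))
                return std::optional<Ticket>(std::in_place, *this);
        }
        return std::nullopt;
    }

    int active() const { return active_.load(); }

private:
    const int limit_;
    std::atomic<int> active_{0};
};
```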

@kelson42 commented on 2021-08-26:

Depends on #509

@kelson42 commented on 2021-12-26:

@maneeshpm Would you mind tackling the multizim problem until we fix the last details of #509? Maybe you have some feedback about my last comment?

@kelson42 commented on 2022-02-03:

@maneeshpm Any thoughts about the plan? Would you be ready to implement it?

@kelson42 commented on 2022-03-08:

@maneeshpm We need to move quickly on this now. Therefore, I have reassigned the ticket to @mgautierfr. I hope this is OK with you?

@kelson42 commented on 2022-04-29:

Multizim fulltext search is fixed by #731. The multizim suggestion work remains to be done.

@stale[bot] commented on 2022-07-10:

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@kelson42 commented on 2022-11-11:

I guess this is a ticket for openzim/libzim by now.

IMO, we should fix #734 first.

@kelson42 commented on 2024-08-15:

Moving to openzim/libzim where it belongs.

@kelson42 added this to the 9.3.0 milestone on Sep 14, 2024.