Consider migrating from codesearch to zoekt #68

Open
1 of 6 tasks
stapelberg opened this issue May 15, 2016 · 0 comments

stapelberg commented May 15, 2016

In initial tests, https://github.com/google/zoekt is 2-10x faster than codesearch and degrades much more gracefully for pathological queries (queries which have many potential matches).

For 1.4G of source code, zoekt writes a 1.7G index, which is a 1.21x blow-up. Our nodes currently have 22-24G used and 52-54G available, so disk-wise, we could actually switch to zoekt.
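As a back-of-the-envelope check of the disk estimate, here is a small Go sketch of the arithmetic. It assumes the index grows linearly with the amount of source, and it pessimistically treats all of the 24G currently used on a node as source code; both are assumptions for illustration, not measurements. The function names are made up for this sketch.

```go
package main

import "fmt"

// blowUp returns the ratio of index size to source size.
func blowUp(sourceG, indexG float64) float64 {
	return indexG / sourceG
}

// projectedIndex estimates the index size for sourceG of source code,
// ASSUMING the index grows linearly with the source (not verified).
func projectedIndex(sourceG, ratio float64) float64 {
	return sourceG * ratio
}

// fits reports whether an index of indexG fits into availG of free space.
func fits(indexG, availG float64) bool {
	return indexG <= availG
}

func main() {
	ratio := blowUp(1.4, 1.7) // figures from the first tests

	// ASSUMPTION: worst case, all 24G currently used is source code.
	idx := projectedIndex(24, ratio)

	fmt.Printf("blow-up: %.2fx\n", ratio)         // blow-up: 1.21x
	fmt.Printf("projected index: %.1fG\n", idx)   // projected index: 29.1G
	fmt.Printf("fits in 52G: %v\n", fits(idx, 52)) // fits in 52G: true
}
```

Even under that pessimistic assumption the projected index stays well below the 52G lower bound of free space, which matches the conclusion above.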

TODO list:

  • How can we keep our incremental indexing, i.e. could we store one zoekt shard per package, and/or could we merge the per-package shards into a single big shard?
    • zoekt by default writes one index file per repository, so if we treat one Debian package as one repository, we already get cheap updates.
  • Which features (query keywords) would we need to drop, and which could we keep with a compatibility layer?
  • Do we need to fork zoekt to get all the features our search result page has (context lines etc.)?
    • zoekt does not sort the results within a file, at least not within its own UI
    • there are no context lines around matches in zoekt
  • How do we get our own ranking into zoekt?
  • Could we use the repo/branch feature of zoekt for multiple Debian versions (e.g. sid, testing, …)?
    • How much extra disk space would adding other Debian versions need?
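For the two gaps noted above (results not sorted within a file, no context lines), one option that might avoid a fork is to post-process matches in our own layer. The following Go sketch shows the idea under that assumption. `Match`, `Snippet` and `withContext` are hypothetical names for illustration and are not part of zoekt's API; the sketch also assumes we have the file contents at hand when rendering results.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Match is a hypothetical stand-in for one search hit; it is NOT a zoekt type.
type Match struct {
	Line int // 1-based line number of the hit
}

// Snippet is a hit together with its surrounding context lines.
type Snippet struct {
	Line   int
	Before []string
	Hit    string
	After  []string
}

// withContext sorts matches by line number and attaches up to n lines of
// context before and after each hit, clamped to the file boundaries.
func withContext(content string, matches []Match, n int) []Snippet {
	if n < 0 {
		n = 0
	}
	lines := strings.Split(content, "\n")

	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]Match(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Line < sorted[j].Line })

	var out []Snippet
	for _, m := range sorted {
		idx := m.Line - 1
		if idx < 0 || idx >= len(lines) {
			continue // skip hits that point outside the file
		}
		lo := idx - n
		if lo < 0 {
			lo = 0
		}
		hi := idx + n + 1
		if hi > len(lines) {
			hi = len(lines)
		}
		out = append(out, Snippet{
			Line:   m.Line,
			Before: lines[lo:idx],
			Hit:    lines[idx],
			After:  lines[idx+1 : hi],
		})
	}
	return out
}

func main() {
	content := "a\nb\nc\nd\ne"
	// Hits arrive unsorted, as they might from the search backend.
	for _, s := range withContext(content, []Match{{Line: 4}, {Line: 2}}, 1) {
		fmt.Printf("%d: %v [%s] %v\n", s.Line, s.Before, s.Hit, s.After)
	}
}
```

This keeps sorting and context extraction independent of the search backend, at the cost of reading each matching file again at render time.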