Move vector search from IndexInput to RandomAccessInput #13938
Comments
Hi, I'm learning Lucene KNN and this seems like a workable PR for a beginner. I'm just curious about the motivation behind this change. Is it only for cleaner code, or are we also supposed to get a latency improvement from an absolute readFloats method compared to the current seek() + readFloats()?
I think this will be helpful since currently we cannot share these readers across threads -- they retain state about the current position. I'm not sure how much benefit that will be, though, since they must still typically maintain some local temporary storage to hold the value that is read.
Gotcha, the current usage of seek + readFloats requires the Reader to keep the seek position. When we change to RandomAccessInput, we expect the operation to have no side effects on the Reader, and thus Readers will be shareable.
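To make the distinction concrete, here is a minimal sketch that uses java.nio.ByteBuffer as a stand-in for the index input. It is not Lucene code: the class name, the DIM constant, and the helper methods are all made up for illustration. The stateful variant sets the position and then does relative reads, while the absolute variant passes the byte index on every call and leaves the position alone.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: ByteBuffer stands in for the input, names are hypothetical.
class SeekVsAbsoluteRead {
    static final int DIM = 4; // floats per vector (assumed for the example)

    // Stateful: like seek() followed by readFloats(); mutates the buffer's position.
    static float[] readStateful(ByteBuffer in, int ord) {
        float[] dst = new float[DIM];
        in.position(ord * DIM * Float.BYTES); // the "seek"
        for (int i = 0; i < DIM; i++) {
            dst[i] = in.getFloat(); // relative read, advances the position
        }
        return dst;
    }

    // Stateless: absolute reads take an explicit index and never move the position.
    static float[] readAbsolute(ByteBuffer in, int ord) {
        float[] dst = new float[DIM];
        int base = ord * DIM * Float.BYTES;
        for (int i = 0; i < DIM; i++) {
            dst[i] = in.getFloat(base + i * Float.BYTES); // absolute read
        }
        return dst;
    }

    // Builds numVectors vectors whose components are 0, 1, 2, ... in order.
    static ByteBuffer sampleData(int numVectors) {
        ByteBuffer buf = ByteBuffer.allocate(numVectors * DIM * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < numVectors * DIM; i++) {
            buf.putFloat(i);
        }
        buf.flip(); // position back to 0, limit at end of data
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer data = sampleData(3);
        readAbsolute(data, 2);
        System.out.println("position after absolute read: " + data.position());
        readStateful(data, 1);
        System.out.println("position after stateful read: " + data.position());
    }
}
```

Note that both variants still allocate a destination array per call here; a real caller would more likely reuse a scratch array, which is the temporary storage mentioned above.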
I looked at some implementations of RandomAccessInput, such as BufferedIndexInput. This particular class holds a single buffer for all reads, so it cannot be shared. If we use a temporary buffer (to make it shareable), then it kinda defeats the purpose of the single buffer, which is to avoid excessive temporary buffers and GC. So for this class, side effects in reads are unavoidable.
@jpountz is this really appropriate? RandomAccessInput is meant to reduce the overhead of tiny (not bulk) reads. It was added to help move from fieldcache to docvalues, where you need to read e.g. a single byte value at a specific location. It saves a bounds check for such tiny reads. For bulk reads it isn't useful. Basically, I think this is ok, as long as we remove the bulk readFloats() method along with it.
I was thinking of it differently, that
Description
Vector search currently loads vectors from disk by issuing a seek() followed by a readFloats(). We should instead:
- Add a readFloats() method to RandomAccessInput
- Use RandomAccessInput instead of IndexInput to read vectors from disk.
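As a rough sketch of what an absolute bulk read could look like, the example below defines a hypothetical PositionalInput interface with a default readFloats(pos, dst, offset, len) built on top of an absolute readInt(pos). None of these names or signatures come from Lucene; they are assumptions for illustration, and the byte-array-backed implementation exists only to make the example runnable.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustration only: every type and method here is hypothetical, not Lucene's API.
class AbsoluteReadFloatsDemo {

    // Stand-in for a random-access input that reads at explicit positions.
    interface PositionalInput {
        int readInt(long pos);

        // Assumed absolute bulk read: fills dst[offset .. offset+len) from byte position pos.
        default void readFloats(long pos, float[] dst, int offset, int len) {
            for (int i = 0; i < len; i++) {
                int bits = readInt(pos + (long) i * Float.BYTES);
                dst[offset + i] = Float.intBitsToFloat(bits);
            }
        }
    }

    // Simple in-memory implementation backed by a little-endian byte array.
    static final class ByteArrayInput implements PositionalInput {
        private final ByteBuffer buf;

        ByteArrayInput(byte[] bytes) {
            this.buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        }

        @Override
        public int readInt(long pos) {
            return buf.getInt(Math.toIntExact(pos)); // absolute get, position untouched
        }
    }

    // Encodes floats as little-endian bytes for the demo.
    static byte[] encode(float... values) {
        ByteBuffer b = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            b.putFloat(v);
        }
        return b.array();
    }

    public static void main(String[] args) {
        PositionalInput in = new ByteArrayInput(encode(1f, 2f, 3f, 4f, 5f, 6f));
        float[] scratch = new float[3];                 // caller-owned scratch space
        in.readFloats(3L * Float.BYTES, scratch, 0, 3); // read the second 3-float vector
        System.out.println(Arrays.toString(scratch));   // prints [4.0, 5.0, 6.0]
    }
}
```

The default method reads one element at a time, so it shows the shape of the call rather than any performance benefit; an implementation that wanted a real bulk read would need to override it.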