
[Discuss] Reducing allocations in HnswUtil::markRooted #14002

Open
viswanathk opened this issue Nov 19, 2024 · 6 comments
Comments

@viswanathk
Contributor

viswanathk commented Nov 19, 2024

Description

I was going through the nightly benchmark runs and came across this https://blunders.io/jfr-demo/indexing-1kb-vectors-2024.11.17.18.04.47/allocations-drill-down

Almost 7% of the allocations appear to come from the ArrayDeque usage and the Integer autoboxing required to build the stack in this code:

private static Component markRooted(
      HnswGraph hnswGraph,
      int level,
      FixedBitSet connectedNodes,
      FixedBitSet notFullyConnected,
      int maxConn,
      int entryPoint)
      throws IOException {
    // Start at entry point and search all nodes on this level
    // System.out.println("markRooted level=" + level + " entryPoint=" + entryPoint);
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(entryPoint);
    int count = 0;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      if (connectedNodes.get(node)) {
        continue;
      }
      count++;
      connectedNodes.set(node);
      hnswGraph.seek(level, node);
      int friendOrd;
      int friendCount = 0;
      while ((friendOrd = hnswGraph.nextNeighbor()) != NO_MORE_DOCS) {
        ++friendCount;
        stack.push(friendOrd);
      }
      if (friendCount < maxConn && notFullyConnected != null) {
        notFullyConnected.set(node);
      }
    }
    return new Component(entryPoint, count);
  }
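For context on the boxing cost: each `stack.push(friendOrd)` above goes through `Integer.valueOf`, which by default only caches a small range of values, so most node ordinals allocate a fresh `Integer` on every push. A minimal illustration (standalone demo, not Lucene code):

```java
public class BoxingDemo {
  public static void main(String[] args) {
    // Integer.valueOf caches values in [-128, 127] (JLS 5.1.7);
    // by default, anything outside that range is a fresh heap allocation.
    Integer small1 = Integer.valueOf(100);
    Integer small2 = Integer.valueOf(100);
    Integer big1 = Integer.valueOf(10_000);
    Integer big2 = Integer.valueOf(10_000);
    System.out.println(small1 == small2); // true: same cached object
    System.out.println(big1 == big2);     // false: two distinct allocations
  }
}
```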

Is this worth improving? It seems like a low-hanging fix. (I can contribute.)

If it's worth improving, then:

  1. If there's a theoretical max depth for the stack, it can be implemented with primitives, avoiding the autoboxing and reducing the allocation cost.
  2. If there's no reasonable max depth, then starting the ArrayDeque off with a higher initial capacity should still cut down on allocations. This code in ArrayDeque::grow looks quite wasteful in terms of allocations:
    private void grow(int needed) {
        int oldCapacity = this.elements.length;
        int jump = oldCapacity < 64 ? oldCapacity + 2 : oldCapacity >> 1;
        int newCapacity;
        if (jump < needed || (newCapacity = oldCapacity + jump) - 2147483639 > 0) {
            newCapacity = this.newCapacity(needed, jump);
        }

        Object[] es = this.elements = Arrays.copyOf(this.elements, newCapacity);
        if (this.tail < this.head || this.tail == this.head && es[this.head] != null) {
            int newSpace = newCapacity - oldCapacity;
            System.arraycopy(es, this.head, es, this.head + newSpace, oldCapacity - this.head);
            int i = this.head;

            for(int to = this.head += newSpace; i < to; ++i) {
                es[i] = null;
            }
        }

    }
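A primitive stack along the lines of option 1 could be sketched roughly like this. The class name and growth policy are illustrative, not existing Lucene API; the point is that growing copies a flat int[] with no per-element boxing:

```java
import java.util.Arrays;

// Hypothetical sketch of a growable primitive int stack (option 1).
final class IntStack {
  private int[] elements;
  private int size;

  IntStack(int initialCapacity) {
    elements = new int[Math.max(initialCapacity, 8)];
  }

  void push(int value) {
    if (size == elements.length) {
      // Grow by ~1.5x: a single int[] copy, no Integer allocations.
      elements = Arrays.copyOf(elements, size + (size >> 1) + 1);
    }
    elements[size++] = value;
  }

  int pop() {
    return elements[--size];
  }

  boolean isEmpty() {
    return size == 0;
  }
}
```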

Thoughts?

@viswanathk
Contributor Author

cc: @msokolov

@vigyasharma
Contributor

Interesting. The "theoretical" max depth of this stack would be the size of the graph. I suppose the stack does get large, which would explain the high number of ArrayDeque::grow calls. It would be interesting to see the impact of just setting the initial capacity to half the size of the graph.

@msokolov
Contributor

I do think it's worth improving. Another approach could be to measure the stack depth empirically - maybe it scales predictably with the total number of vectors? Then we could use that prediction as the initial size. We could also consider a primitive array that grows (implement our own primitive ArrayDeque).

@viswanathk
Contributor Author

> Interesting. The "theoretical" max depth of this stack would be the size of the graph. I suppose the stack does get large, which would explain the high number of ArrayDeque::grow calls. It would be interesting to see the impact of just setting the initial capacity to half the size of the graph.

Maybe sqrt would be a good start? After one grow it should be large enough, and it's probably a small enough starting point.
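For illustration, sizing the deque near the square root of the graph size would look like the following. The graph size here is a made-up number, not from the benchmark:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SqrtCapacityDemo {
  public static void main(String[] args) {
    int graphSize = 1_000_000; // hypothetical node count at this level
    // Start near sqrt(graphSize); ArrayDeque grows by ~50% once past 64
    // elements, so a few grow() calls reach typical depths from here.
    int initialCapacity = Math.max(16, (int) Math.sqrt(graphSize));
    Deque<Integer> stack = new ArrayDeque<>(initialCapacity);
    System.out.println(initialCapacity); // 1000
  }
}
```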

> I do think it's worth improving. Another approach could be to measure the stack depth empirically - maybe it scales predictably with the total number of vectors? Then we could use that prediction as the initial size. We could also consider a primitive array that grows (implement our own primitive ArrayDeque).

Should we be concerned about the chance of a humongous allocation (if the stack is really deep) with primitives? If not, I can give it a shot, running the same benchmarks with primitives and an initial capacity of sqrt of the graph size at that level.

@msokolov
Contributor

msokolov commented Dec 1, 2024

> Should we be concerned about the chance of a humongous allocation (if the stack is really deep) with primitives? If not, I can give it a shot, running the same benchmarks with primitives and an initial capacity of sqrt of the graph size at that level.

I don't think switching to primitives would be worse than what we do today (which would have the same humongous allocation, only bigger). So, +1 to try it!

@viswanathk
Contributor Author

There's a chance the primitive stack would give only negligible gains after #14022. I'll add it depending on the decision on #14022.
