When CPU profiling a test_fixtures::Simulator run transferring 10 MiB from a server to a client, the flamegraph shows that the majority of CPU time is spent in SentPackets::take_ranges (neqo/neqo-transport/src/recovery/sent.rs, lines 189 to 213 in f3d0191), more specifically in the two calls to BTreeMap::extend (lines 200 to 211 in f3d0191):
let mut packets = std::mem::take(&mut self.packets);
for range in acked_ranges {
    // For each acked range, split off the acknowledged part,
    // then split off the part that hasn't been acknowledged.
    // This order works better when processing ranges that
    // have already been processed, which is common.
    let mut acked = packets.split_off(range.start());
    let keep = acked.split_off(&(*range.end() + 1));
    self.packets.extend(keep);
    result.extend(acked.into_values().rev());
}
self.packets.extend(packets);
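For context on why those extend calls dominate: BTreeMap::split_off(&k) moves every entry with a key greater than or equal to k into the returned map, and extend then re-inserts entries one by one. A minimal, self-contained illustration of these semantics (plain u64 keys as a stand-in, not neqo code):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut packets: BTreeMap<u64, &str> = (0..5).map(|pn| (pn, "payload")).collect();

    // split_off(&2) moves every entry with key >= 2 into the returned map;
    // `packets` keeps only the keys below the split point.
    let upper = packets.split_off(&2);
    assert_eq!(packets.len(), 2); // keys 0 and 1 stayed behind
    assert_eq!(upper.len(), 3); // keys 2, 3 and 4 were moved out

    // extend() re-inserts the moved entries one by one; this is what shows up
    // in the profile when the moved side is large.
    packets.extend(upper);
    assert_eq!(packets.len(), 5);
}
```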
Adding some logging, the following seems to be the case:
- Assume we have two nodes, A and B, where A sends 10 MiB to B.
- A has 100 packets in flight.
- A receives an ACK from B, acknowledging the first 2 packets.
- At the end of the first loop iteration:
  - packets.len() will be 0, given the ACK from B acknowledges the first 2 packets.
  - acked.len() will be 2, containing the two newly acked SentPackets.
  - keep.len() will be 98, containing all remaining packets.
- We then execute self.packets.extend(keep), i.e. in the above scenario we re-add all remaining 98 packets to the now empty self.packets.
- In other words, we re-insert all remaining in-flight packets on each ACK (see the sketch below).
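That walkthrough can be reproduced outside of neqo with a plain BTreeMap<u64, ()> standing in for self.packets; a sketch of the scenario above, not the real SentPacket type:

```rust
use std::collections::BTreeMap;

fn main() {
    // 100 packets in flight, modeled as packet number -> unit payload.
    let mut self_packets: BTreeMap<u64, ()> = (0..100).map(|pn| (pn, ())).collect();
    let range = 0u64..=1u64; // ACK for the first two packets

    let mut packets = std::mem::take(&mut self_packets);
    let mut acked = packets.split_off(range.start());
    let keep = acked.split_off(&(*range.end() + 1));

    assert_eq!(packets.len(), 0); // nothing was below the range
    assert_eq!(acked.len(), 2); // the two newly acked packets
    assert_eq!(keep.len(), 98); // everything still in flight

    // This is the expensive part: all 98 surviving entries are re-inserted
    // into the (now empty) map, on every ACK.
    self_packets.extend(keep);
    self_packets.extend(packets);
    assert_eq!(self_packets.len(), 98);
}
```

So every ACK pays for re-inserting essentially the whole in-flight window, even though only two entries actually leave the map.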
Unfortunately, BTreeMap does not allow splitting out a full range. The closest to that is BTreeMap::extract_if, which is currently Nightly-only.
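For reference, a rough sketch of what that could look like; this is illustrative only and assumes the unstable btree_extract_if feature in its range-plus-predicate form, so the exact Nightly signature may differ:

```rust
// Nightly-only sketch; assumes the unstable `btree_extract_if` feature in its
// range-plus-predicate form. The exact unstable signature may differ.
#![feature(btree_extract_if)]

use std::collections::BTreeMap;

fn main() {
    let mut packets: BTreeMap<u64, ()> = (0..100).map(|pn| (pn, ())).collect();

    // Drain only the acknowledged packet numbers; every other entry stays in
    // place, so nothing has to be re-inserted afterwards.
    let acked: Vec<(u64, ())> = packets.extract_if(0u64..=1, |_pn, _pkt| true).collect();

    assert_eq!(acked.len(), 2);
    assert_eq!(packets.len(), 98);
}
```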
That said, I think the following change, optimizing for the scenario above, would get us a long way already:
modified neqo-transport/src/recovery/sent.rs
@@ -197,18 +197,17 @@ impl SentPackets {
     {
         let mut result = Vec::new();
         // Remove all packets. We will add them back as we don't need them.
-        let mut packets = std::mem::take(&mut self.packets);
         for range in acked_ranges {
-            // For each acked range, split off the acknowledged part,
-            // then split off the part that hasn't been acknowledged.
-            // This order works better when processing ranges that
-            // have already been processed, which is common.
-            let mut acked = packets.split_off(range.start());
-            let keep = acked.split_off(&(*range.end() + 1));
-            self.packets.extend(keep);
+            let mut packets = std::mem::take(&mut self.packets);
+
+            let mut keep = packets.split_off(&(*range.end() + 1));
+            let acked = packets.split_off(range.start());
+
+            keep.extend(packets);
+            self.packets = keep;
+
             result.extend(acked.into_values().rev());
         }
-        self.packets.extend(packets);
         result
     }
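To make the effect of the proposed change concrete, here is the same 100-packet scenario run through the new per-range logic, again on a plain BTreeMap<u64, ()> stand-in (an illustrative sketch, not the patched neqo code):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut self_packets: BTreeMap<u64, ()> = (0..100).map(|pn| (pn, ())).collect();
    let range = 0u64..=1u64; // ACK for the first two packets

    let mut packets = std::mem::take(&mut self_packets);

    // Split at end+1 first, then at start: `keep` holds everything above the
    // range, `acked` the acknowledged packets, `packets` everything below.
    let mut keep = packets.split_off(&(*range.end() + 1));
    let acked = packets.split_off(range.start());

    // Only the (here: zero) packets below the range are moved via `extend`.
    assert_eq!(packets.len(), 0);
    keep.extend(packets);
    self_packets = keep;

    assert_eq!(acked.len(), 2);
    assert_eq!(self_packets.len(), 98);
}
```

In the common case where the acked range starts at the oldest in-flight packet, packets is empty after the second split_off, so nothing is re-inserted; the 98 surviving entries stay in keep and simply become the new self.packets.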
CPU profiling once more, this change resolves the hot-spot on SentPackets::take_ranges.
Let me know if I am missing something, e.g. the above scenario not being something worth optimizing for.
See also past discussion: https://github.com/mozilla/neqo/pull/1886/files#r1591830138
I think this makes sense. Might want to add a bench for it first, to get a performance baseline for a number of different ACK patterns, before doing a fix PR?
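A possible shape for such a bench, as a sketch only: it assumes criterion as the harness and measures the current split/extend pattern on a plain BTreeMap<u64, ()> stand-in for a few ACK patterns, rather than driving the real SentPackets type; all names here are made up.

```rust
use std::collections::BTreeMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

/// The current `take_ranges` pattern, reduced to its BTreeMap operations.
fn take_range(packets: &mut BTreeMap<u64, ()>, start: u64, end: u64) -> Vec<u64> {
    let mut rest = std::mem::take(packets);
    let mut acked = rest.split_off(&start);
    let keep = acked.split_off(&(end + 1));
    packets.extend(keep);
    packets.extend(rest);
    acked.into_keys().collect()
}

fn bench_ack_patterns(c: &mut Criterion) {
    // Different ACK patterns against 1000 packets in flight.
    let patterns = [
        ("ack-oldest-2", (0u64, 1u64)),
        ("ack-middle-2", (499, 500)),
        ("ack-newest-2", (998, 999)),
        ("ack-all", (0, 999)),
    ];
    for (name, (start, end)) in patterns {
        c.bench_function(name, |b| {
            b.iter_batched(
                || (0..1_000).map(|pn| (pn, ())).collect::<BTreeMap<u64, ()>>(),
                |mut packets| black_box(take_range(&mut packets, start, end)),
                BatchSize::SmallInput,
            );
        });
    }
}

criterion_group!(benches, bench_ack_patterns);
criterion_main!(benches);
```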