
Optimize SentPackets::take_ranges #2242

Open
mxinden opened this issue Nov 21, 2024 · 2 comments · May be fixed by #2245

mxinden (Collaborator) commented Nov 21, 2024

When CPU profiling a test_fixtures::Simulator run transferring 10 MiB from a server to a client, I see the following flamegraph:

(flamegraph image)

The majority of CPU time is spent in SentPackets::take_ranges:

/// Take values from the specified ranges of packet numbers.
/// The values returned will be reversed, so that the most recent packet appears first.
/// This is because ACK frames arrive with ranges starting from the largest acknowledged
/// and we want to match that.
pub fn take_ranges<R>(&mut self, acked_ranges: R) -> Vec<SentPacket>
where
    R: IntoIterator<Item = RangeInclusive<PacketNumber>>,
    R::IntoIter: ExactSizeIterator,
{
    let mut result = Vec::new();
    // Remove all packets. We will add back the ones we don't take.
    let mut packets = std::mem::take(&mut self.packets);
    for range in acked_ranges {
        // For each acked range, split off the acknowledged part,
        // then split off the part that hasn't been acknowledged.
        // This order works better when processing ranges that
        // have already been processed, which is common.
        let mut acked = packets.split_off(range.start());
        let keep = acked.split_off(&(*range.end() + 1));
        self.packets.extend(keep);
        result.extend(acked.into_values().rev());
    }
    self.packets.extend(packets);
    result
}
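For context, BTreeMap::split_off(&k) moves every entry with a key at or above k into a new map, leaving the rest behind; the loop above relies on two such splits to carve out an inclusive range. A minimal standalone sketch (hypothetical packet numbers, plain BTreeMap instead of SentPackets):

```rust
use std::collections::BTreeMap;

// Split the inclusive range [start, end] out of `packets`, mimicking the two
// split_off calls in take_ranges. Returns (acked, keep).
fn split_range(
    packets: &mut BTreeMap<u64, &'static str>,
    start: u64,
    end: u64,
) -> (BTreeMap<u64, &'static str>, BTreeMap<u64, &'static str>) {
    let mut acked = packets.split_off(&start); // keys >= start
    let keep = acked.split_off(&(end + 1)); // keys > end
    (acked, keep)
}

fn main() {
    // Hypothetical packet numbers 0..=4 standing in for in-flight packets.
    let mut packets: BTreeMap<u64, &'static str> = (0..5).map(|n| (n, "pkt")).collect();
    let (acked, keep) = split_range(&mut packets, 1, 2);
    assert_eq!(packets.keys().copied().collect::<Vec<_>>(), vec![0]);
    assert_eq!(acked.keys().copied().collect::<Vec<_>>(), vec![1, 2]);
    assert_eq!(keep.keys().copied().collect::<Vec<_>>(), vec![3, 4]);
}
```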

More specifically, in the two calls to BTreeMap::extend:

let mut packets = std::mem::take(&mut self.packets);
for range in acked_ranges {
    // For each acked range, split off the acknowledged part,
    // then split off the part that hasn't been acknowledged.
    // This order works better when processing ranges that
    // have already been processed, which is common.
    let mut acked = packets.split_off(range.start());
    let keep = acked.split_off(&(*range.end() + 1));
    self.packets.extend(keep);
    result.extend(acked.into_values().rev());
}
self.packets.extend(packets);

Adding some logging, the following seems to be the case:

  • Assume we have two nodes, A and B, where A sends 10 MiB to B.
  • A has 100 packets in flight.
  • A receives an ACK from B, acknowledging the first 2 packets.
  • At the end of the first loop iteration:
    • packets.len() will be 0, since the ACK range starts at the lowest in-flight packet number, so split_off at range.start() moves every packet out.
    • acked.len() will be 2, containing the two newly acked SentPackets.
    • keep.len() will be 98, containing all remaining packets.
  • We then execute self.packets.extend(keep), i.e. in the above scenario, re-adding all 98 remaining packets to the now-empty self.packets.
  • In other words, we re-insert all remaining packets on each ACK.
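The walkthrough above can be reproduced with plain BTreeMaps (hypothetical numbers: 100 packets in flight, the first 2 acked):

```rust
use std::collections::BTreeMap;

fn main() {
    // 100 packets in flight, packet numbers 0..=99; the ACK covers 0..=1.
    let mut packets: BTreeMap<u64, ()> = (0..100).map(|n| (n, ())).collect();

    let mut acked = packets.split_off(&0); // range.start(): takes all 100
    let keep = acked.split_off(&2); // range.end() + 1: 98 unacked entries

    assert!(packets.is_empty()); // nothing left below the range...
    assert_eq!(acked.len(), 2);
    assert_eq!(keep.len(), 98);

    // ...so self.packets.extend(keep) re-inserts all 98 entries one by one,
    // which is the BTreeMap::extend cost visible in the flamegraph.
    let mut restored: BTreeMap<u64, ()> = BTreeMap::new();
    restored.extend(keep);
    assert_eq!(restored.len(), 98);
}
```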

Unfortunately, BTreeMap does not allow splitting out a full range in a single operation. The closest to that is BTreeMap::extract_if, which is currently Nightly-only.

That said, I think the following change, optimizing for the scenario above, would get us a long way already:

modified   neqo-transport/src/recovery/sent.rs
@@ -197,18 +197,17 @@ impl SentPackets {
     {
         let mut result = Vec::new();
         // Remove all packets. We will add them back as we don't need them.
-        let mut packets = std::mem::take(&mut self.packets);
         for range in acked_ranges {
-            // For each acked range, split off the acknowledged part,
-            // then split off the part that hasn't been acknowledged.
-            // This order works better when processing ranges that
-            // have already been processed, which is common.
-            let mut acked = packets.split_off(range.start());
-            let keep = acked.split_off(&(*range.end() + 1));
-            self.packets.extend(keep);
+            let mut packets = std::mem::take(&mut self.packets);
+
+            let mut keep = packets.split_off(&(*range.end() + 1));
+            let acked = packets.split_off(range.start());
+
+            keep.extend(packets);
+            self.packets = keep;
+
             result.extend(acked.into_values().rev());
         }
-        self.packets.extend(packets);
         result
     }
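A standalone sketch of the reordered splits, using the same hypothetical numbers as above (note that the inclusive range requires splitting at end + 1):

```rust
use std::collections::BTreeMap;

fn main() {
    // 100 packets in flight, packet numbers 0..=99; the ACK covers 0..=1.
    let mut packets: BTreeMap<u64, ()> = (0..100).map(|n| (n, ())).collect();

    // Reordered: carve off everything above the acked range first,
    // then the acked part itself.
    let mut keep = packets.split_off(&(1 + 1)); // keys > range end
    let acked = packets.split_off(&0); // the 2 acked packets

    // Only entries below the range remain in `packets`; in the common
    // "oldest packets acked" case there are none, so this extend is a
    // no-op instead of re-inserting 98 entries.
    assert!(packets.is_empty());
    keep.extend(packets);

    assert_eq!(acked.len(), 2);
    assert_eq!(keep.len(), 98);
}
```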

CPU profiling once more, this resolves the hot-spot on SentPackets::take_ranges:

(flamegraph image)

Let me know if I am missing something, e.g. whether the above scenario is not worth optimizing for.

See also past discussion: https://github.com/mozilla/neqo/pull/1886/files#r1591830138

larseggert (Collaborator):

I think this makes sense. We might want to add a bench for it first, to get a performance baseline for a number of different ACK patterns, before doing a fix PR?

martinthomson (Member):

Please include a pretty solid comment in the code when you do this. We'll need to remember why this is apparently backwards.
