msglist: Throttle fetchOlder retries #1050

Open: wants to merge 3 commits into base: main

PIG208 (Member) commented Nov 6, 2024

This approach is different from how a BackoffMachine is typically used,
because the message list doesn't send and retry requests in a loop; its
caller retries rapidly on scroll changes, and we want to ignore the
excessive requests.

The test drops irrelevant requests with connection.takeRequests
without checking, as we are only interested in verifying that no request
was sent.

Fixes: #945
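
To illustrate the description above, here is a minimal, self-contained sketch of the throttling idea. It is not the PR's code: `SketchBackoff` and `OlderFetcher` are hypothetical stand-ins (the app's real BackoffMachine may pick its delays differently), and the sketch swallows the request error for brevity.

```dart
import 'dart:async';
import 'dart:math' as math;

/// Hypothetical stand-in for a backoff machine: each wait() doubles the
/// delay, up to a ceiling. The values here are assumptions for the sketch.
class SketchBackoff {
  static const int firstMs = 100;
  static const int maxMs = 10000;
  int waitsCompleted = 0;

  Future<void> wait() async {
    final exponent = math.min(waitsCompleted, 10); // avoid overflow
    final ms = math.min(firstMs * (1 << exponent), maxMs);
    await Future<void>.delayed(Duration(milliseconds: ms));
    waitsCompleted++;
  }
}

/// Hypothetical model: the caller may invoke fetchOlder() as often as it
/// likes; calls made while fetching or cooling down are ignored.
class OlderFetcher {
  OlderFetcher(this.request);

  final Future<List<int>> Function() request;
  bool fetchingOlder = false;
  bool fetchOlderCoolingDown = false;
  int requestsSent = 0;
  SketchBackoff? _backoff;

  Future<void> fetchOlder() async {
    if (fetchingOlder || fetchOlderCoolingDown) return; // no-op
    fetchingOlder = true;
    try {
      requestsSent++;
      await request();
      _backoff = null; // success resets the backoff
    } catch (_) {
      // No retry is scheduled here; we only refuse new requests
      // until the backoff wait completes.
      fetchOlderCoolingDown = true;
      unawaited((_backoff ??= SketchBackoff()).wait().then((_) {
        fetchOlderCoolingDown = false;
      }));
    } finally {
      fetchingOlder = false;
    }
  }
}
```

With a request that always fails, calling `fetchOlder()` several times in quick succession sends one request; the later calls return immediately until the wait completes.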

PIG208 force-pushed the pr-storm branch 4 times, most recently from 567be0d to 4bf2482, on November 6, 2024 23:38
PIG208 added the "maintainer review" label (PR ready for review by Zulip maintainers) on Nov 6, 2024
chrisbobbe (Collaborator) commented:

(Rerunning CI.)

chrisbobbe (Collaborator) left a comment:

Thanks! Good to get this bug fixed. Comments below.

Also, this change breaks an invariant we had before: whenever the top of the message list is scrolled into view, we would show a "start marker", which was either a loading indicator or some text saying there aren't any older messages to load.

We currently only show the loading indicator when fetchingOlder is true, and that's false when we're in the new "cooldown" period after a failed fetch-older request. How about also showing the loading indicator during the cooldown period? When a request is failing over and over, the cooldown period will quickly become multiple seconds long, with fetchingOlder flickering to true in between for perhaps milliseconds at a time; possibly not long enough for a frame.

Then, either here or as a followup, we could give the UI an "error"/"problem" state that's distinct from the loading state. If the request has failed, and especially if it's failed several times, that's an important sign that maybe the user should stop expecting it to succeed. Maybe MessageListLoadingItem could get another param like bool problem, for which we pass true when the BackoffMachine.waitsCompleted exceeds a certain value, like 4. And in that case make the loading indicator look like

[image]

or something, instead of just

[image].
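
As a rough sketch of that suggestion: the enum, the function name, and the default threshold below are hypothetical, chosen only for illustration. A real widget would read these values from the model rather than take them as parameters.

```dart
/// Hypothetical states for the marker shown at the top of the message list.
enum StartMarker { none, loading, problem, noOlderMessages }

/// Picks the start marker from the model's state.
///
/// [waitsCompleted] is assumed to mirror the backoff machine's count of
/// completed waits, which goes back to zero after a successful fetch.
StartMarker startMarkerFor({
  required bool haveOldest,
  required bool fetchingOlder,
  required bool coolingDown,
  required int waitsCompleted,
  int problemThreshold = 4, // assumption: the "like 4" from the comment
}) {
  if (haveOldest) return StartMarker.noOlderMessages;
  if (fetchingOlder || coolingDown) {
    // Treat the cooldown like loading so the marker doesn't flicker,
    // and switch to the problem state after repeated failures.
    return waitsCompleted > problemThreshold
        ? StartMarker.problem
        : StartMarker.loading;
  }
  return StartMarker.none;
}
```

Treating the cooldown as part of the loading state keeps the indicator steady between retries, and the threshold only changes how it looks.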

Comment on lines 499 to 500
bool _waitBeforeRetry = false;
BackoffMachine? _backoffMachine;
chrisbobbe (Collaborator):

I think we can find a more fitting name than _waitBeforeRetry. That name might make it harder to see that no code actually schedules a retry after waiting for anything. It's also not clear from the name that it's about fetching the next batch of older messages, i.e., fetchOlder.

What if we:

  • Move this up near fetchingOlder, which is related and has a similar role
  • Make this public, with a name like fetchOlderCoolingDown
  • Give this a dartdoc and expand on fetchingOlder's dartdoc to make it clearer that they have similar roles

So, for example:

--- lib/model/message_list.dart
+++ lib/model/message_list.dart
@@ -92,9 +92,30 @@ mixin _MessageSequence {
   bool _haveOldest = false;
 
   /// Whether we are currently fetching the next batch of older messages.
+  ///
+  /// When this is true, [fetchOlder] is a no-op.
+  /// That method is called frequently by Flutter's scrolling logic,
+  /// and this field helps us avoid spamming the same request just to get
+  /// the same response each time.
+  ///
+  /// See also [fetchOlderCoolingDown].
   bool get fetchingOlder => _fetchingOlder;
   bool _fetchingOlder = false;
 
+  /// Whether [fetchOlder] had a request error recently.
+  ///
+  /// When this is true, [fetchOlder] is a no-op.
+  /// That method is called frequently by Flutter's scrolling logic,
+  /// and this field mitigates spamming the same request and getting
+  /// the same error each time.
+  ///
+  /// "Recently" is decided by a [BackoffMachine] that resets
+  /// when a [fetchOlder] request succeeds.
+  ///
+  /// See also [fetchingOlder].
+  bool get fetchOlderCoolingDown => _fetchOlderCoolingDown;
+  bool _fetchOlderCoolingDown = false;
+
   /// The parsed message contents, as a list parallel to [messages].
   ///
   /// The i'th element is the result of parsing the i'th element of [messages].

@@ -528,6 +545,7 @@ class MessageListView with ChangeNotifier, _MessageSequence {

_insertAllMessages(0, fetchedMessages);
_haveOldest = result.foundOldest;
_backoffMachine = null;
chrisbobbe (Collaborator):

Should we also do this in _reset?

@PIG208

This comment was marked as outdated.

This already holds for the existing callers.  Updating end markers
should only happen after the initial fetch.  During the initial fetch,
we have a separate loading indicator and no end markers.

Signed-off-by: Zixuan James Li <[email protected]>

PIG208 (Member, Author) commented Nov 12, 2024

The PR has been updated with the proposed changes. I also added a check for this.generation == generation, to avoid potential races now that the backoff callback calls _updateEndMarkers and notifyListeners.
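
For illustration, a minimal sketch of the kind of generation check described above. `SketchModel` and its members are hypothetical and much simpler than the real model: the point is only that a callback running after an `await` compares the generation it started in with the current one and does nothing if they differ.

```dart
import 'dart:async';

/// Hypothetical model with a generation counter that reset() bumps.
class SketchModel {
  int generation = 0;
  bool coolingDown = false;
  int notifications = 0; // stands in for notifyListeners() calls

  /// Discards the current state, e.g. when the view is reloaded.
  void reset() {
    generation++;
    coolingDown = false;
  }

  Future<void> startCooldown(Duration duration) async {
    final startedIn = generation;
    coolingDown = true;
    await Future<void>.delayed(duration);
    // If reset() ran while we were waiting, this callback belongs to an
    // old generation and must not touch the new state.
    if (generation != startedIn) return;
    coolingDown = false;
    notifications++;
  }
}
```

If `reset()` runs while a cooldown is pending, the stale callback returns without changing state or notifying anyone.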

I have also moved the backoff machine variable to _MessageSequence, right next to fetchOlderCoolingDown, because the two are closely related, and renamed it to _fetchOlderCooldownBackoffMachine (with "Cooldown" as a noun). The backoff machine stays private, with no public getter.

The loading indicator change might be out of scope for this PR, and we should work on that as a follow-up.

chrisbobbe (Collaborator) commented:

Looks like CI is failing, could you take a look?

PIG208 (Member, Author) commented Nov 13, 2024

Updated the PR. Thanks!

Labels: maintainer review (PR ready for review by Zulip maintainers)
Linked issue (may be closed when this pull request is merged): Retry storm on fetchOlder in message list