[ISSUE #8765] fix low performance of delay message when enable rocksdb consume queue #8766

Open
wants to merge 7 commits into
base: develop
Conversation

yuz10
Member

@yuz10 yuz10 commented Sep 27, 2024

Which Issue(s) This PR Fixes

Fixes #8765

Brief Description

How Did You Test This Change?

@codecov-commenter

codecov-commenter commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 53.57143% with 13 lines in your changes missing coverage. Please review.

Project coverage is 47.35%. Comparing base (daf3d1a) to head (8bd91e3).
Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
...ache/rocketmq/store/queue/RocksDBConsumeQueue.java 53.57% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #8766      +/-   ##
=============================================
- Coverage      47.52%   47.35%   -0.17%     
+ Complexity     11592    11560      -32     
=============================================
  Files           1282     1282              
  Lines          89848    89882      +34     
  Branches       11557    11565       +8     
=============================================
- Hits           42697    42567     -130     
- Misses         41927    42056     +129     
- Partials        5224     5259      +35     
Flag Coverage Δ
47.35% <53.57%> (-0.17%) ⬇️


@lizhanhui
Contributor

@yuz10 Are there profiling metrics verifying that prefetch actually improves performance?

@yuz10
Member Author

yuz10 commented Sep 29, 2024

> @yuz10 Are there profiling metrics verifying that prefetch actually improves performance?

The performance loss is not related to prefetching. Scheduled messages are delivered at only about 160 per second because each iteration returns at most 16 messages, and the delivery thread sleeps 100 ms after each iteration finishes. See org.apache.rocketmq.broker.schedule.ScheduleMessageService.DeliverDelayedMessageTimerTask#executeOnTimeUp
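
The 160/s figure follows from the two numbers in this comment: at most 16 messages per pass and a 100 ms pause between passes. A minimal sketch of that arithmetic is below. The class and method names are hypothetical and not taken from RocketMQ, and the time spent inside each pass is ignored, so the result is an upper bound rather than a measurement.

```java
final class DelayDeliveryThroughput {

    /**
     * Upper bound on messages delivered per second when each pass handles at most
     * batchSize messages and the thread then waits sleepMillis before the next pass.
     * Processing time inside a pass is ignored (assumption of this sketch).
     */
    static double maxMessagesPerSecond(int batchSize, long sleepMillis) {
        if (sleepMillis <= 0) {
            throw new IllegalArgumentException("sleepMillis must be positive");
        }
        return batchSize * 1000.0 / sleepMillis;
    }

    public static void main(String[] args) {
        // 16 messages per pass, 100 ms between passes
        System.out.println(maxMessagesPerSecond(16, 100)); // prints 160.0
    }
}
```

Under this model the ceiling rises only if a pass returns more entries or the pause is shortened, which matches the two remedies discussed in this thread.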

@yuz10
Member Author

yuz10 commented Sep 29, 2024

@lizhanhui I found no difference between batch gets and single gets of keys from RocksDB, so I will remove the prefetch code.
Batch:
QueryCQ iter 10489877 cost 20527
QueryCQ iter 10489877 cost 19496
QueryCQ iter 10489877 cost 19395

Single:
QueryCQ iter 10489877 cost 20313
QueryCQ iter 10489877 cost 19196
QueryCQ iter 10489877 cost 18945

@yuz10 yuz10 requested a review from lizhanhui October 17, 2024 02:08
@lizhanhui
Contributor

@yuz10 Got your update; I will review it tomorrow.

@lizhanhui
Contributor

> The performance loss is not related to prefetching. Scheduled messages are delivered at only about 160 per second because each iteration returns at most 16 messages, and the delivery thread sleeps 100 ms after each iteration finishes. See org.apache.rocketmq.broker.schedule.ScheduleMessageService.DeliverDelayedMessageTimerTask#executeOnTimeUp

  1. The original implementation uses a one-shot multi-get (at most 16 results) to simulate an iterator. The resulting iterator fails to return all results and therefore does not fit the use case mentioned above.
  2. Your change uses lazy single gets to iterate, with optional prefetching to accelerate.
  3. A third option is to wrap RocksIterator directly with a prefix.

It would be best to compare these options further in terms of performance (why is multi-get used at present?), code maintenance, and so on.
Once all pros and cons are clarified, we can finalize this pull request.

Another issue is that option 2, i.e. this pull request, changes the original behavior. We need to verify that the change does not affect the semantics of upper-layer code.
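
The difference between options 1 and 2 above can be illustrated without RocksDB. In the sketch below, `lookup` is a hypothetical stand-in for a single-key read of the consume queue that returns null past the last entry. None of the names come from RocketMQ, and this is not the pull request's implementation, only an illustration of why a one-shot batch of 16 truncates the iteration while a lazy iterator does not.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

final class ConsumeQueueIterationSketch {
    static final int BATCH_LIMIT = 16; // mirrors the "at most 16 results" limit

    /** Option 1: read up to BATCH_LIMIT entries up front, then iterate that fixed list. */
    static <T> Iterator<T> oneShotBatch(LongFunction<T> lookup, long startOffset) {
        List<T> batch = new ArrayList<>();
        for (long offset = startOffset; offset < startOffset + BATCH_LIMIT; offset++) {
            T value = lookup.apply(offset);
            if (value == null) {
                break; // reached the end of the queue
            }
            batch.add(value);
        }
        return batch.iterator();
    }

    /** Option 2: read one offset per step until the lookup returns null. */
    static <T> Iterator<T> lazySingleGet(LongFunction<T> lookup, long startOffset) {
        return new Iterator<T>() {
            private long offset = startOffset;
            private T pending; // entry fetched by hasNext() but not yet returned

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    pending = lookup.apply(offset);
                }
                return pending != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T result = pending;
                pending = null;
                offset++;
                return result;
            }
        };
    }

    /** Drains an iterator and returns how many entries it produced. */
    static <T> int count(Iterator<T> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical queue holding 100 entries at offsets 0..99.
        LongFunction<String> lookup = offset -> offset < 100 ? "entry-" + offset : null;
        System.out.println(count(oneShotBatch(lookup, 0)));  // prints 16
        System.out.println(count(lazySingleGet(lookup, 0))); // prints 100
    }
}
```

The lazy variant leaves it to the caller to decide when to stop, which is the behavior change that needs checking against upper-layer code.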

@yuz10
Member Author

yuz10 commented Oct 28, 2024

>   1. The original implementation uses a one-shot multi-get (at most 16 results) to simulate an iterator. The resulting iterator fails to return all results and therefore does not fit the use case mentioned above.
>   2. Your change uses lazy single gets to iterate, with optional prefetching to accelerate.
>   3. A third option is to wrap RocksIterator directly with a prefix.
>
> It would be best to compare these options further in terms of performance (why is multi-get used at present?), code maintenance, and so on. Once all pros and cons are clarified, we can finalize this pull request.
>
> Another issue is that option 2, i.e. this pull request, changes the original behavior. We need to verify that the change does not affect the semantics of upper-layer code.

I did not compare the performance of RocksIterator with the current solution. That can be optimized later; the current solution only addresses the delay message issue. Another solution is to not sleep 100 ms after each iteration.
As for the behavior, the default ConsumeQueue iterates over only one file, and the RocksDBConsumeQueue iterates over at most 16 items. So I think the number of items an iteration returns is not defined behavior, and the change will not affect upper-layer code.

Successfully merging this pull request may close these issues.

[Bug] Bad performace of delay message in rocksdb consumequeue
3 participants