Fix OOM caused by queue pressure #377

Closed

AdamKatzDev
Contributor

Fixes #307. This PR should eventually be superseded by #262 (but it might still be used to fix the Kafka sink).

I was able to get stable performance during snapshots and huge single-statement updates:

  1. Used LinkedBlockingQueue (which is also thread-safe) with a limit of 200,000 records instead of ConcurrentLinkedQueue. When the queue becomes full, the thread sleeps in a loop until space in the queue is freed. I didn't use @IlyaTsoi's solution since ConcurrentLinkedQueue::size is an O(n) operation (i.e. quite slow for our purpose).
  2. Moved records to a separate buffer with a fixed size of 100k to prevent the code from building a single batch indefinitely.
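The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `BoundedRecordQueue`, its method names, and the sleep interval are hypothetical, and the demo in `main` uses tiny limits in place of the 200,000 and 100k values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; names and parameters are assumptions, not the connector's API.
class BoundedRecordQueue<T> {
    private final BlockingQueue<T> queue;
    private final int maxBatchSize;
    private final long sleepMillis;

    BoundedRecordQueue(int capacity, int maxBatchSize, long sleepMillis) {
        // A capacity-bounded queue rejects offers once it is full.
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.maxBatchSize = maxBatchSize;
        this.sleepMillis = sleepMillis;
    }

    // Producer side: while the queue is full, sleep and retry.
    void enqueue(T record) throws InterruptedException {
        while (!queue.offer(record)) {
            Thread.sleep(sleepMillis);
        }
    }

    // Consumer side: move at most maxBatchSize records into a separate buffer,
    // so a single batch cannot keep growing while producers keep adding.
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    // LinkedBlockingQueue keeps an internal count, so this does not traverse the queue.
    int size() {
        return queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedRecordQueue<Integer> records = new BoundedRecordQueue<>(4, 2, 1L);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    records.enqueue(i); // waits whenever 4 records are pending
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int consumed = 0;
        while (consumed < 10) {
            List<Integer> batch = records.nextBatch(); // never more than 2 records
            consumed += batch.size();
            Thread.sleep(1L);
        }
        producer.join();
        System.out.println("consumed " + consumed + " records");
    }
}
```

With a bounded queue the producer is throttled to the consumer's pace, which is what keeps memory usage flat during a snapshot in this sketch.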

What I don't like here:

  1. Busy-waiting while attempting to put a record into an already full queue.
  2. The code that moves records to a separate buffer is quite ugly, and the implementation might not be optimal for some cases.
  3. Lots of magic constants and conditions are needed to make it stable.
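For the first point, one possible alternative to the sleep loop is to let the queue do the waiting. The sketch below is not what this PR does; it only shows the standard `BlockingQueue.put` and timed `offer` calls, and the class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of the connector.
class BlockingEnqueue {
    // Blocks the calling thread until the queue has free space; no sleep loop needed.
    static <T> void enqueueBlocking(BlockingQueue<T> queue, T record)
            throws InterruptedException {
        queue.put(record);
    }

    // Waits up to timeoutMillis for free space; returns false if the queue is still full.
    static <T> boolean enqueueWithTimeout(BlockingQueue<T> queue, T record, long timeoutMillis)
            throws InterruptedException {
        return queue.offer(record, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        enqueueBlocking(queue, "first");
        // The queue is full, so the timed offer gives up after 10 ms.
        boolean accepted = enqueueWithTimeout(queue, "second", 10L);
        System.out.println("second accepted: " + accepted);
    }
}
```

The timed variant lets the producer check a shutdown flag between attempts, which a plain `put` does not allow without interrupting the thread.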

I've just recently stumbled upon #262, which in theory should be a perfect solution for the problem and also fixes #342. Alas, that PR only solves the problem for the lightweight sink. That is not an issue for me, but it looks like the Kafka sink suffers from the same problem, since the problematic parts of the code are the same.

AdamKatzDev and others added 2 commits November 16, 2023 12:53
move records in a separate buffer with fixed size of 100k to prevent code from building a single batch indefinitely
limited records queue to 200k per topic to prevent records building up
@subkanthi closed this Apr 1, 2024
Development

Successfully merging this pull request may close these issues.

[sink-connector-lightweight] Initial snapshot slows down and then OOM