Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-33201][Connectors/Kafka] Fix memory leak in CachingTopicSelector #55

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

qbx2
Copy link
Contributor

@qbx2 qbx2 commented Oct 6, 2023

What is the purpose of the change

In the CachingTopicSelector, a memory leak may occur when the internal logic fails to check the cache size due to a race condition. (

)
This PR fixes the memory leak by modifying the logic to be more resilient to failure.

Brief change log

Fix memory leak in CachingTopicSelector that can be triggered by race condition.

Verifying this change

caching-topic-selector-heap-dump-analysis

  • By analyzing a Java heap dump, I identified a memory leak in the CachingTopicSelector. As in the screenshot, cache has 47,769 elements. If the internal logic were functioning correctly, the number of elements should be less than or equal to CACHE_RESET_SIZE (which is 5).
  • Since writing unit tests for this type of bug is challenging, I instead applied a hotfix to our production workload. Before applying the hotfix, the memory leak is observed in the workload in 7 days. After applying the patch, the issue is no longer observed.

@boring-cyborg
Copy link

boring-cyborg bot commented Oct 6, 2023

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@qbx2 qbx2 changed the title [FLINK-XXXX][Connectors/Kafka] Fix memory leak in CachingTopicSelector [FLINK-33201][Connectors/Kafka] Fix memory leak in CachingTopicSelector Oct 7, 2023
@qbx2 qbx2 force-pushed the fix-caching-topic-selector-memory-leak branch from f5ae323 to 112e8fa Compare October 7, 2023 05:48
@qbx2 qbx2 marked this pull request as ready for review October 7, 2023 05:48
@MartijnVisser
Copy link
Contributor

@qbx2 Can you please rebase your PR?

@qbx2 qbx2 force-pushed the fix-caching-topic-selector-memory-leak branch from 112e8fa to d024a40 Compare January 18, 2024 15:33
@qbx2
Copy link
Contributor Author

qbx2 commented Jan 18, 2024

@MartijnVisser Sure, I just rebased it.

@qbx2
Copy link
Contributor Author

qbx2 commented Feb 13, 2024

@MartijnVisser Could you please review?

@cnmckee13
Copy link

Hey All- We're experiencing some memory issues and wondering if this could be the cause. Will this PR be reviewed soon? @MartijnVisser @tzulitai

@MartijnVisser
Copy link
Contributor

@AHeise Can you take a look?

Copy link
Contributor

@AHeise AHeise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will run the test to make sure nothing happens.

@AHeise
Copy link
Contributor

AHeise commented Aug 22, 2024

Oh I see the tests can't be run because it's based on an old branch. I'm rebasing myself.

@AHeise AHeise self-assigned this Aug 22, 2024
@AHeise AHeise merged commit ea3a641 into apache:main Aug 22, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants