Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate direct buffers for multipart upload #559

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dopuskh3
Copy link

@dopuskh3 dopuskh3 commented Jun 6, 2024

Allow to allocate multipart upload buffers as direct buffer rather than on the heap.

Reasoning

We try to set a pretty large multipart upload part size on cluster to optimize throughput and reduce S3 requests.

At the same time, we try to keep kafka JVM heap size contained on most kafka installation in order to leave as much memory as possible for the page cache. As a matter of example, we will use 4GB heap size on machines with 64GB available memory.

The consequence of using pretty large multipart upload size on contained JVM heap size is that we can pretty easily run out of heap size if we suddenly have to upload many segments to tiered storage.

The strategy we propose is to allocate multipart buffers in direct memory so that we can more easily configure direct buffer budget.

Usage

We introduced s3.multipart.upload.direct.buffers boolean configuration flag (disabled by default).

@dopuskh3 dopuskh3 requested a review from a team as a code owner June 6, 2024 08:54
@dopuskh3 dopuskh3 changed the title [datadog] Allocate direct buffers for multipart upload Allocate direct buffers for multipart upload Jun 6, 2024
@funky-eyes
Copy link
Contributor

I think this feature is good.

@dopuskh3 dopuskh3 force-pushed the francois.visconte/allocate-direct-buffers branch 2 times, most recently from 8ffac2f to 05ef2ed Compare June 9, 2024 17:30
@dopuskh3
Copy link
Author

dopuskh3 commented Jun 9, 2024

@jeqo fixed tests, PTAL

Allow to allocate multipart upload buffers as direct buffer rather than
on the heap.

We try to set a pretty large multipart upload part size on cluster to
optimize throughput and reduce S3 requests.

At the same time, we try to keep kafka JVM heap size contained on most
kafka installation in order to leave as much memory as possible for the
page cache. As a matter of example, we will use 4GB heap size on
machines with 64GB available memory.

The consequence of using pretty large multipart upload size on contained
JVM heap size is that we can pretty easily run out of heap size if we
suddenly have to upload many segments to tiered storage.

The strategy we propose is to allocate multipart buffer in direct memory
so that we can more easily configure direct buffer budget.
@dopuskh3 dopuskh3 force-pushed the francois.visconte/allocate-direct-buffers branch from 05ef2ed to 9fcdda2 Compare June 10, 2024 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants