fix(s3): reducing array copying on S3 output stream #549
Conversation
The branch was force-pushed from d7f6530 to 54ae8de (compare).
LGTM
However, in Java, the close method of ByteArrayInputStream has no effect: the methods of this class can be called after the stream has been closed without generating an IOException. This is because a ByteArrayInputStream's data is stored in memory, unlike file or network streams that require actual resource cleanup, so leaving it unclosed should not cause a memory problem.
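A quick illustration of that documented behavior (a standalone snippet, not code from the PR):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

// ByteArrayInputStream.close() is a documented no-op: the data lives in
// memory, so reads still succeed after closing and never throw.
public class CloseNoOpDemo {
    public static void main(final String[] args) throws IOException {
        final var in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        in.close();
        System.out.println(in.read()); // prints 1, no exception
    }
}
```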
@funky-eyes good catch! I'm adding more changes on how bytes are passed to the request, using byte buffers only instead of array copying. PTAL
LGTM
```diff
-// TODO: get rid of this array copying
-partBuffer.put(source.array(), offset, transferred);
-processedBytes += transferred;
-source.position(source.position() + transferred);
+final ByteBuffer inputBuffer = ByteBuffer.wrap(b, off, len);
+while (inputBuffer.hasRemaining()) {
+    // copy batch to part buffer
+    final int inputLimit = inputBuffer.limit();
+    final int toCopy = Math.min(partBuffer.remaining(), inputBuffer.remaining());
+    final int positionAfterCopying = inputBuffer.position() + toCopy;
+    inputBuffer.limit(positionAfterCopying);
+    partBuffer.put(inputBuffer.slice());
```
Main change: remove the array copying and reuse the input buffer.
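To see the whole pattern in context, here is a minimal sketch of the slicing loop. Everything outside the diff above (the limit restore, the position advance, the field names, and the flushBuffer stub) is an assumption about the surrounding code, not the upstream implementation:

```java
import java.nio.ByteBuffer;

// Sketch of the slice-based write loop; field names and flushBuffer
// behavior are assumptions, not the exact code from the PR.
class PartBufferedStream {
    private final ByteBuffer partBuffer = ByteBuffer.allocate(5 * 1024 * 1024); // e.g. 5 MiB parts
    private long processedBytes = 0;

    public void write(final byte[] b, final int off, final int len) {
        final ByteBuffer inputBuffer = ByteBuffer.wrap(b, off, len);
        while (inputBuffer.hasRemaining()) {
            // copy batch to part buffer
            final int inputLimit = inputBuffer.limit();
            final int toCopy = Math.min(partBuffer.remaining(), inputBuffer.remaining());
            final int positionAfterCopying = inputBuffer.position() + toCopy;
            inputBuffer.limit(positionAfterCopying);
            partBuffer.put(inputBuffer.slice()); // bulk copy of the view, no temporary array
            // restore the limit and step past the batch just copied
            inputBuffer.limit(inputLimit);
            inputBuffer.position(positionAfterCopying);
            if (!partBuffer.hasRemaining()) {
                flushBuffer(); // in the PR this uploads the part; stubbed here
            }
        }
    }

    private void flushBuffer() {
        partBuffer.flip();
        processedBytes += partBuffer.remaining(); // counted on flush, per the discussion below
        partBuffer.clear();                       // stub: real code uploads the part first
    }
}
```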
```diff
-final ByteArrayInputStream in = new ByteArrayInputStream(partBuffer.array(), offset, actualPartSize);
-uploadPart(in, actualPartSize);
-partBuffer.clear();
+try (final InputStream in = new ByteBufferInputStream(buffer)) {
+    processedBytes += actualPartSize;
+    uploadPart(in, actualPartSize);
+} catch (final IOException e) {
+    throw new RuntimeException(e);
+}
```
Also, provide an InputStream directly from the buffer instead of creating a new byte array.
For reference, similar improvements have been implemented in Kafka core: apache/kafka#15589
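For illustration, a ByteBufferInputStream adapter can be as small as the following sketch, modeled on Kafka's org.apache.kafka.common.utils.ByteBufferInputStream; the class actually used in this PR may differ:

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Reads directly from the buffer's current position; no backing-array copy.
public class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buffer;

    public ByteBufferInputStream(final ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public int read() {
        return buffer.hasRemaining() ? buffer.get() & 0xFF : -1;
    }

    @Override
    public int read(final byte[] bytes, final int off, final int len) {
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1; // end of stream once the buffer is drained
        }
        final int toRead = Math.min(len, buffer.remaining());
        buffer.get(bytes, off, toRead);
        return toRead;
    }
}
```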
```diff
@@ -50,8 +50,8 @@ public class S3StorageTest extends BaseStorageTest {

     @BeforeAll
     static void setUpClass() {
-        final var clientBuilder = S3Client.builder();
-        clientBuilder.region(Region.of(LOCALSTACK.getRegion()))
+        s3Client = S3Client.builder()
```
nit: it probably doesn't matter for a test, but I think that s3Client needs to be closed in an @AfterAll method to clean up eventual connections/resources.
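Something like the following sketch would cover it (assuming the static s3Client field from the test diff above):

```java
import org.junit.jupiter.api.AfterAll;

// Close the client so pooled HTTP connections are released once the test class finishes.
@AfterAll
static void tearDownClass() {
    if (s3Client != null) {
        s3Client.close();
    }
}
```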
```diff
-try (final var out = s3OutputStream(key)) {
+final var out = s3OutputStream(key);
+try (out) {
     inputStream.transferTo(out);
-    return out.processedBytes();
 } catch (final IOException e) {
     throw new StorageBackendException("Failed to upload " + key, e);
 }
+return out.processedBytes();
```
I do not understand why you did this 😅
Added a comment to clarify a bit, but the main reason this is now outside the try-with-resources block is that processedBytes in the output stream is now counted as part of flushBuffer and no longer in the write method. This is more accurate, as it counts bytes after they are uploaded, whereas previously they were counted when written. Before, the processed bytes were known before closing; now, since the last flushBuffer may happen when the stream is closed, the result is not known until after closing, hence the change.
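Annotated, the ordering constraint looks like this (the same code as the diff above, with explanatory comments added):

```java
final var out = s3OutputStream(key);
try (out) {                        // Java 9+ try-with-resources on an existing variable
    inputStream.transferTo(out);   // may leave a final partial part in the buffer
} catch (final IOException e) {
    throw new StorageBackendException("Failed to upload " + key, e);
}                                  // close() runs the last flushBuffer(), updating processedBytes
return out.processedBytes();       // the count is only complete after close
```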
writeAllMessages is not validating the request bodies properly, the way writesTailMessages does.
Define two buffers: one for the input data being read, and the other for the part-sized upload. Slice them to pass data to I/O (part building and part upload).
looks good 👍
Use only ByteBuffer slices to pass bytes to S3 client operations and remove array copying.
Doing some benchmarks on the current implementation, Arrays.copyOfRange dominates the memory allocation. By switching to the ByteBuffer approach, this copying is removed.
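As a rough illustration of the difference (a hypothetical snippet, not the PR's actual benchmark): the copy-based path allocates a fresh array for every chunk, while the slice-based path only creates small view objects over the same backing array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class CopyVsSlice {
    // Old style: allocates (to - from) fresh bytes on every call.
    static byte[] copyBased(final byte[] src, final int from, final int to) {
        return Arrays.copyOfRange(src, from, to);
    }

    // New style: a zero-copy view sharing the backing array (Java 13+ overload).
    static ByteBuffer sliceBased(final ByteBuffer src, final int index, final int length) {
        return src.slice(index, length);
    }
}
```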