Commit f961971
Co-authored-by: Michael Graeb <[email protected]>
1 parent 5fe1c3b

Showing 19 changed files with 1,152 additions and 78 deletions.
@@ -0,0 +1,76 @@
The CRT S3 client was designed with throughput as a primary goal. As such, the client
scales resource usage, such as the number of parallel requests in flight, to achieve
the target throughput. The client creates buffers to hold the data it is sending or
receiving for each request, so scaling the number of requests in flight has a direct
impact on memory used. In practice, setting a high target throughput or a larger part
size can lead to high observed memory usage.
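For a rough sense of scale (illustrative numbers only, not the client's actual
scaling formula): keeping on the order of 100 part-sized requests in flight means
roughly 100 × 8 MiB ≈ 800 MiB of part buffers at the default part size, and
roughly 6.4 GiB if the part size is raised to 64 MiB.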
To mitigate high memory usage, memory reuse improvements were recently added to
the client, along with options to limit the maximum memory used. The following
sections go into more detail on aspects of those changes and how they affect the
client.
### Memory Reuse
At a basic level, the CRT S3 client starts with a meta request for an operation like
put or get, breaks it into smaller part-sized requests, and executes those in
parallel. The client used to allocate a part-sized buffer for each of those
requests and release it right after the request was done. That approach
resulted in a lot of very short-lived allocations and allocator thrashing,
overall leading to memory-use spikes considerably higher than what is needed. To
address that, the client is switching to a pooled buffer approach, discussed
below.
Note: the approach described below is a work in progress and concentrates on improving
the common cases (the default 8 MiB part size and part sizes smaller than 64 MiB).
Several observations about the client's usage of buffers:
- The client does not automatically switch to buffers above the default 8 MiB for
  uploads until the upload passes 10,000 parts (~80 GB); see the sketch after this list.
- Get operations always use either the configured part size or the default of 8 MiB.
  The part size for gets is not adjusted, since there is no 10,000-part limitation.
- Both put and get operations go through fill and drain phases. For example, for a put
  the client first schedules a number of reads to 'fill' the buffers from the source,
  and as those reads complete, the buffers are sent over to the networking layer and
  'drained'.
- Individual UploadPart or ranged GET requests typically have a similar
  lifespan (with some caveats). In practice, part buffers are acquired/released
  in bulk at the same time.
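As a rough sketch of the first point above (illustrative only, not the exact
heuristic the client uses), the effective part size for an upload can be thought of
as the configured part size, bumped up just enough to keep the part count within
S3's 10,000-part limit:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define S3_MAX_NUM_PARTS 10000ULL
#define DEFAULT_PART_SIZE (8ULL * 1024 * 1024) /* 8 MiB */

/* Smallest part size (starting from the configured one) that keeps the
 * upload within 10,000 parts. Illustrative only. */
static uint64_t effective_part_size(uint64_t content_length, uint64_t configured_part_size) {
    uint64_t part_size = configured_part_size != 0 ? configured_part_size : DEFAULT_PART_SIZE;
    if (content_length > part_size * S3_MAX_NUM_PARTS) {
        part_size = (content_length + S3_MAX_NUM_PARTS - 1) / S3_MAX_NUM_PARTS; /* round up */
    }
    return part_size;
}

int main(void) {
    /* 8 MiB parts suffice up to ~80 GB; a 200 GiB upload needs ~20.5 MiB parts. */
    printf("%" PRIu64 "\n", effective_part_size(80ULL * 1000 * 1000 * 1000, DEFAULT_PART_SIZE));
    printf("%" PRIu64 "\n", effective_part_size(200ULL * 1024 * 1024 * 1024, DEFAULT_PART_SIZE));
    return 0;
}
```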
The buffer pooling takes advantage of those allocation patterns and works as
follows. The memory is split into primary and secondary areas. The secondary area is
used for requests with buffer sizes bigger than a predefined value (currently 4 times
the part size); allocations from it go directly to the allocator and are effectively
the old way of doing things.
The primary memory area is split into blocks of a fixed size (16 times the part size,
or 16 times 8 MiB if no part size is configured). Blocks are allocated on demand. Each
block is logically subdivided into part-sized chunks. The pool allocates and releases
in chunk sizes only, and supports acquiring several chunks (up to 4) at once.
Blocks are kept around while there are ongoing requests and are released
asynchronously when there is low pressure on memory.
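The sizing arithmetic implied above, as a small sketch (the constants mirror the
description here and are illustrative; the real values live in the pool
implementation):

```c
#include <stddef.h>
#include <stdio.h>

#define CHUNKS_PER_BLOCK 16u      /* a primary block holds 16 part-sized chunks */
#define MAX_CHUNKS_PER_ACQUIRE 4u /* anything larger goes to the secondary area */

/* Size of one primary block for a given chunk (part) size. */
static size_t block_size(size_t chunk_size) {
    return chunk_size * CHUNKS_PER_BLOCK;
}

/* Chunks a buffer of this size needs from primary, or 0 if it is large
 * enough to bypass primary and go straight to the allocator (secondary). */
static size_t chunks_needed(size_t buffer_size, size_t chunk_size) {
    size_t chunks = (buffer_size + chunk_size - 1) / chunk_size; /* round up */
    return chunks <= MAX_CHUNKS_PER_ACQUIRE ? chunks : 0;
}

int main(void) {
    size_t chunk = 8u * 1024 * 1024; /* default 8 MiB part size */
    printf("block size: %zu MiB\n", block_size(chunk) / (1024 * 1024));                    /* 128 */
    printf("chunks for a 20 MiB buffer: %zu\n", chunks_needed(20u * 1024 * 1024, chunk));  /* 3 */
    printf("40 MiB bypasses primary: %d\n", chunks_needed(40u * 1024 * 1024, chunk) == 0); /* 1 */
    return 0;
}
```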
### Scheduling
Running out of memory is a terminal condition within CRT, and in general it is not
practical to try to set an overall memory limit on all allocations, since it
dramatically increases the complexity of the code that deals with cases where
only part of the memory for a task was allocated.

Comparatively, the majority of memory usage within the S3 client comes from buffers
allocated for put/get parts. So to control memory usage, the client concentrates on
controlling the number of buffers allocated. Effectively, this boils down to a
back-pressure mechanism that limits the number of parts scheduled as memory gets
closer to the limit. Memory used for other resources (e.g. HTTP connection data and
various supporting structures) is not actively controlled; instead, some memory is
carved out of the overall limit to account for it.
Overall, scheduling does best-effort memory limiting. At scheduling time, the client
reserves memory using the buffer pool's ticketing mechanism. A buffer is acquired from
the pool with the ticket as close to its point of use as possible (this approach peaks
at lower memory usage than preallocating all memory upfront, because buffers cannot
always be used right away; for example, reading from a file fills buffers slower than
they are sent, leading to a decent amount of buffer reuse). The reservation mechanism
is approximate and in some cases can lead to actual memory usage being higher once
tickets are redeemed; the client holds back some memory to mitigate such overflows.
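This reserve/acquire/release flow maps onto the buffer pool API declared in the header
below. A minimal sketch of that flow (the include paths, the default-allocator call,
and the single-part driver are assumptions; in the real client this is driven by the
request scheduler):

```c
#include <aws/common/allocator.h>
#include <aws/common/byte_buf.h>
#include <aws/s3/s3_buffer_pool.h> /* assumed header path */

/* Sketch of handling a single part with the pool. */
static int process_one_part(struct aws_s3_buffer_pool *pool, size_t part_size) {
    /* Reserve first: this only marks memory against the limit, nothing is allocated yet. */
    struct aws_s3_buffer_pool_ticket *ticket = aws_s3_buffer_pool_reserve(pool, part_size);
    if (ticket == NULL) {
        /* Limit hit; a reservation hold is now on the pool. Back off and retry
         * after calling aws_s3_buffer_pool_remove_reservation_hold(). */
        return -1;
    }

    /* Exchange the ticket for an actual buffer as close to its use as possible. */
    struct aws_byte_buf buf = aws_s3_buffer_pool_acquire_buffer(pool, ticket);

    /* ... fill buf from the source and hand it to the networking layer ... */
    (void)buf;

    /* Releasing the ticket invalidates the buffer and returns its chunks to the pool. */
    aws_s3_buffer_pool_release_ticket(pool, ticket);
    return 0;
}

int main(void) {
    struct aws_allocator *allocator = aws_default_allocator();

    /* 8 MiB chunks and a 2 GiB overall limit (illustrative numbers). */
    struct aws_s3_buffer_pool *pool =
        aws_s3_buffer_pool_new(allocator, 8 * 1024 * 1024, (size_t)2 * 1024 * 1024 * 1024);

    process_one_part(pool, 8 * 1024 * 1024);

    aws_s3_buffer_pool_destroy(pool);
    return 0;
}
```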
@@ -0,0 +1,133 @@
#ifndef AWS_S3_BUFFER_ALLOCATOR_H
#define AWS_S3_BUFFER_ALLOCATOR_H

/**
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 * SPDX-License-Identifier: Apache-2.0.
 */

#include <aws/s3/s3.h>

/*
 * S3 buffer pool.
 * Buffer pool used for pooling part sized buffers for Put/Get operations.
 * Provides additional functionality for limiting overall memory used.
 * High-level buffer pool usage flow:
 * - Create buffer pool with overall memory limit and common buffer size, aka chunk
 *   size (typically part size configured on client).
 * - For each request:
 * -- Call reserve to acquire a ticket for future buffer acquisition. This marks
 *    memory as reserved, but does not allocate it. If the reserve call hits the
 *    memory limit, it fails and a reservation hold is put on the whole buffer
 *    pool (aws_s3_buffer_pool_remove_reservation_hold can be used to remove the
 *    reservation hold).
 * -- Once a request needs memory, it can exchange the ticket for a buffer using
 *    aws_s3_buffer_pool_acquire_buffer. This operation never fails, even if it
 *    ends up going over the memory limit.
 * -- Buffer lifetime is tied to the ticket, so once the request is done with the
 *    buffer, the ticket is released and the buffer returns to the pool.
 */

AWS_EXTERN_C_BEGIN

struct aws_s3_buffer_pool;
struct aws_s3_buffer_pool_ticket;

struct aws_s3_buffer_pool_usage_stats {
    /* Effective max memory limit. Memory limit value provided during construction minus
     * memory reserved for overhead of the pool. */
    size_t mem_limit;

    /* How much memory is used in primary storage. Includes memory used by blocks
     * that are waiting on all allocations to release before being put back in circulation. */
    size_t primary_used;
    /* Overall memory allocated for blocks. */
    size_t primary_allocated;
    /* Reserved memory. Does not account for how that memory will map into
     * blocks and in practice can be lower than used memory. */
    size_t primary_reserved;
    /* Number of blocks allocated in primary. */
    size_t primary_num_blocks;

    /* Secondary memory used. Accurate, maps directly to base allocator. */
    size_t secondary_used;
    /* Secondary memory reserved. Accurate, maps directly to base allocator. */
    size_t secondary_reserved;
};

/*
 * Create new buffer pool.
 * chunk_size - specifies the size of memory that will most commonly be acquired
 * from the pool (typically part size).
 * mem_limit - limit on how much memory the buffer pool can use. Once the limit is hit,
 * buffers can no longer be reserved (a reservation hold is placed on the pool).
 * Returns buffer pool pointer on success and NULL on failure.
 */
AWS_S3_API struct aws_s3_buffer_pool *aws_s3_buffer_pool_new(
    struct aws_allocator *allocator,
    size_t chunk_size,
    size_t mem_limit);

/*
 * Destroys buffer pool.
 * Does nothing if buffer_pool is NULL.
 */
AWS_S3_API void aws_s3_buffer_pool_destroy(struct aws_s3_buffer_pool *buffer_pool);

/*
 * Reserves memory from the pool for later use.
 * Best effort and can potentially reserve memory slightly over the limit.
 * Reservation takes some memory out of the available pool, but does not
 * allocate it right away.
 * On success a ticket is returned.
 * On failure NULL is returned, an error is raised and a reservation hold is placed
 * on the pool. Any further reservations while the hold is active will fail.
 * Remove the reservation hold to unblock reservations.
 */
AWS_S3_API struct aws_s3_buffer_pool_ticket *aws_s3_buffer_pool_reserve(
    struct aws_s3_buffer_pool *buffer_pool,
    size_t size);

/*
 * Whether pool has a reservation hold.
 */
AWS_S3_API bool aws_s3_buffer_pool_has_reservation_hold(struct aws_s3_buffer_pool *buffer_pool);

/*
 * Remove reservation hold on pool.
 */
AWS_S3_API void aws_s3_buffer_pool_remove_reservation_hold(struct aws_s3_buffer_pool *buffer_pool);

/*
 * Trades in the ticket for a buffer.
 * Cannot fail and can over-allocate above the memory limit if the reservation was not accurate.
 * Using the same ticket twice will return the same buffer.
 * Buffer is only valid until the ticket is released.
 */
AWS_S3_API struct aws_byte_buf aws_s3_buffer_pool_acquire_buffer(
    struct aws_s3_buffer_pool *buffer_pool,
    struct aws_s3_buffer_pool_ticket *ticket);

/*
 * Releases the ticket.
 * Any buffers associated with the ticket are invalidated.
 */
AWS_S3_API void aws_s3_buffer_pool_release_ticket(
    struct aws_s3_buffer_pool *buffer_pool,
    struct aws_s3_buffer_pool_ticket *ticket);

/*
 * Get pool memory usage stats.
 */
AWS_S3_API struct aws_s3_buffer_pool_usage_stats aws_s3_buffer_pool_get_usage(struct aws_s3_buffer_pool *buffer_pool);

/*
 * Trims all unused memory from the pool.
 * Warning: fairly slow operation, do not use in the critical path.
 * TODO: partial trimming? e.g. only trim down to 50% of max?
 */
AWS_S3_API void aws_s3_buffer_pool_trim(struct aws_s3_buffer_pool *buffer_pool);

AWS_EXTERN_C_END

#endif /* AWS_S3_BUFFER_ALLOCATOR_H */