feat: idsse-912: add optimization args to aws_cp()
#75
Linear Issue
IDSSE-912
Changes
Add concurrency and chunk_size args to aws_cp to change how s5cmd downloads files (hopefully reducing the time to download a large file)
Explanation
These function args let us control the number of parallel threads and the size of the chunks that s5cmd uses to download a single GRIB file from AWS. The defaults (what DAS has been using up until now) are 5 threads and 50 MB chunks, but now DAS can tweak these controls to find out what runs fastest in our environment.
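As a rough illustration of how the two args could map onto the s5cmd command line, here is a minimal sketch. It is not the code in this PR: the helper name build_s5cmd_cp_command, the bucket, and the file paths are hypothetical. It assumes the args are forwarded to the --concurrency and --part-size flags of s5cmd cp, and that leaving an arg unset keeps s5cmd's own default.

```python
from typing import List, Optional


def build_s5cmd_cp_command(
    src: str,
    dest: str,
    concurrency: Optional[int] = None,
    chunk_size: Optional[int] = None,
) -> List[str]:
    """Build the argument list for an `s5cmd cp` call (hypothetical helper).

    concurrency: number of parts transferred in parallel; None keeps the s5cmd default.
    chunk_size: size of each part in MB; None keeps the s5cmd default.
    """
    for name, value in (("concurrency", concurrency), ("chunk_size", chunk_size)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")

    command = ["s5cmd", "cp"]
    if concurrency is not None:
        command += ["--concurrency", str(concurrency)]
    if chunk_size is not None:
        # Assumption: chunk_size corresponds to s5cmd's --part-size flag
        command += ["--part-size", str(chunk_size)]
    command += [src, dest]
    return command


# Example with made-up paths: 10 parallel parts of 25 MB each
print(build_s5cmd_cp_command(
    "s3://example-bucket/example.grib2",
    "/tmp/example.grib2",
    concurrency=10,
    chunk_size=25,
))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with S3 keys.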
Unfortunately, s5cmd does not support downloading partial files from AWS S3 today. It's open source, so I'm hoping to contribute that feature to the s5cmd project.