Use streaming compression for files. #47
base: main
Conversation
Rather than reading large files into memory and processing them multiple times (once for hashing and once for compression), use streaming compression for files so that only the compressed output needs to be fully in memory. This has the side effect that the object ID is now generated by hashing the compressed text (which is better, since there's less to hash after compression). This means that the function image needs to be regenerated to match. This updates nelhage#45.
Seems like there's a test failure -- can you take a look? This is also somehow causing OOMs for my LLVM build (again using remote preprocessing); I haven't figured out why yet.
Left a few line-by-line comments. Still investigating the memory issue.
}

if _, err := io.Copy(encoder, obj); err != nil {
	encoder.Close()
Use defer encoder.Close() above to ensure this is closed on every exit path.
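For concreteness, a minimal sketch of that shape, assuming the encoder is github.com/klauspost/compress/zstd (the function name and surrounding structure are illustrative, not the PR's actual code):

import (
	"bytes"
	"io"

	"github.com/klauspost/compress/zstd"
)

// compressStream copies obj through a zstd encoder so that only the
// compressed output is held fully in memory.
func compressStream(obj io.Reader) ([]byte, error) {
	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf)
	if err != nil {
		return nil, err
	}
	// The deferred Close guarantees the encoder is released on every
	// exit path, including the io.Copy error return below.
	defer enc.Close()

	if _, err := io.Copy(enc, obj); err != nil {
		return nil, err
	}
	// Close explicitly on the success path and check the error: the
	// final zstd frame is only complete once Close succeeds. The
	// deferred Close afterwards is harmless (its error is discarded).
	if err := enc.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}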
@@ -30,7 +31,9 @@ type GetRequest struct {
 var ErrNotExists = errors.New("Requested object does not exist")

 type Store interface {
-	Store(ctx context.Context, obj []byte) (string, error)
+	StoreBytes(ctx context.Context, obj []byte) (string, error)
By default I would only have Store, and would make a StoreBytes helper somewhere that wraps the bytes in a bytes.Buffer. If an implementation wants to do something more efficient when it knows it has a byte buffer, it can do a type assertion to *bytes.Buffer to check for that.
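A sketch of that shape (the interface and helper follow the comment; the package layout is illustrative):

package store

import (
	"bytes"
	"context"
	"io"
)

// Store takes a stream, so implementations never need the whole
// object in memory at once.
type Store interface {
	Store(ctx context.Context, obj io.Reader) (string, error)
}

// StoreBytes wraps in-memory data in a bytes.Buffer so callers with a
// byte slice can use any Store implementation.
func StoreBytes(ctx context.Context, s Store, obj []byte) (string, error) {
	return s.Store(ctx, bytes.NewBuffer(obj))
}

An implementation with a faster path for in-memory data can then recover the buffer inside its Store method with a type assertion, e.g. buf, ok := obj.(*bytes.Buffer).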
if file.Local.Path != "" {
	pfile, err := files.ReadFile(ctx, store, file.Local.Path)
	switch err {
	case nil:
Just use an if test for a nilness check.
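I.e., something like (only the shape is meant; the body is a placeholder):

pfile, err := files.ReadFile(ctx, store, file.Local.Path)
if err == nil {
	// former `case nil:` body
}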
Yeah, I broke some internal contract ... will take a look next weekend :)
That's weird! For me with (partial) local preprocessing this makes memory usage nice and stable.
Hm, I think I understand the OOM now. For remote preprocessing, we see the same header files many, many, many times; with this change, we compress each time before we hash and look at the upload cache, which means we actually generate many times more garbage in that case than we did previously, and I think we end up in a similar situation where the GC fails to keep up. I'm also not sure if zstd is deterministic – is there a risk that we end up uploading multiple versions of the same file if they get compressed differently?
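One way out of that failure mode would be to keep hashing the uncompressed bytes and consult the upload cache before compressing at all, so repeatedly-seen headers never touch the encoder; that would also sidestep the determinism question, since the object ID would no longer depend on the compressor's output. A sketch, where uploadCache and compressAndUpload are hypothetical names, not llama's actual API:

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
)

// maybeUpload hashes the raw bytes first and compresses only on a
// cache miss. uploadCache and compressAndUpload are hypothetical.
func maybeUpload(ctx context.Context, data []byte) (string, error) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	if uploadCache.Contains(id) {
		return id, nil // seen before: no compression, no extra garbage
	}
	if err := compressAndUpload(ctx, id, data); err != nil {
		return "", err
	}
	uploadCache.Add(id)
	return id, nil
}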
Looking a bit more, it looks like zstd has a massive per-encoder memory footprint -- at least a few MiB. For remote preprocessing, early builds upload hundreds of header files in one go, which results in us trying to create a new encoder for each one concurrently. It might make sense to rate-limit compression to one job per core or something anyway, which would help with that…
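A counting-semaphore sketch of that rate limit (illustrative; compressStream stands for whatever streaming-compression helper is in use):

import (
	"io"
	"runtime"
)

// Cap concurrent compressions at one per core, so at most NumCPU
// encoders' worth of zstd state is live at any moment.
var compressSlots = make(chan struct{}, runtime.NumCPU())

func compressLimited(obj io.Reader) ([]byte, error) {
	compressSlots <- struct{}{}        // acquire a slot; blocks when all are busy
	defer func() { <-compressSlots }() // release on every exit path
	return compressStream(obj)
}

If the encoder is klauspost/compress's zstd, options like zstd.WithEncoderConcurrency(1) and a smaller zstd.WithWindowSize may also shrink each encoder's footprint.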
Maybe using a sync.Pool would help with that, but based on your comments about the remote compile use case, I probably want to revisit this PR. It's starting to feel like it causes more problems than it solves :) I'm wondering whether just using mmap (for larger files) might be a better way to deal with the memory costs of reading the inputs.
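For reference, pooling the encoders might look like this (assuming github.com/klauspost/compress/zstd, whose Encoder supports Reset for reuse; a sketch, not a tested fix):

import (
	"io"
	"sync"

	"github.com/klauspost/compress/zstd"
)

var encoderPool = sync.Pool{
	New: func() any {
		// NewWriter only fails on invalid options; none are passed here.
		enc, _ := zstd.NewWriter(nil)
		return enc
	},
}

func compressPooled(dst io.Writer, src io.Reader) error {
	enc := encoderPool.Get().(*zstd.Encoder)
	enc.Reset(dst)             // point the pooled encoder at dst
	defer encoderPool.Put(enc) // return it for reuse on every path

	if _, err := io.Copy(enc, src); err != nil {
		enc.Close()
		return err
	}
	// Close finishes the zstd frame; Reset makes the encoder usable again.
	return enc.Close()
}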
@nelhage Are you able to share your setup for compiling LLVM using llama? I have a few ideas for perf improvements that I'd like to try out, and it would be good to have a baseline similar to what you currently see.
Sure! The blog post should have most of it (https://blog.nelhage.com/post/building-llvm-in-90s/); I'm building on an AMD Ryzen 9 3900X (12-core / 24-thread), on Sonic fiber internet (but over wifi on my desktop). The client is Ubuntu 20.04 Focal. What else would be helpful for you?