flush() after writing to gzip_file #2753
To better support streaming responses whose chunks are smaller than the file buffer size, we flush after writing. Without the explicit flush, the writes are buffered and the subsequent reads see an empty `self.gzip_buffer` until the file flushes automatically, either because (1) the 32 KiB write buffer fills or (2) the file is closed once the streaming response is complete. Without flushing, the GZipMiddleware doesn't work as expected for streaming responses, and especially not for Server-Sent Events, which are expected to be delivered to clients immediately. The code as written appears intended to send each chunk immediately rather than buffer it, since it calls `await self.send(message)` right after the write, but in practice that `message` often has an empty body.
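The buffering described above can be reproduced outside the middleware. The following is a minimal sketch, not the middleware's actual code: `drain` and `compress_stream` are hypothetical helpers that write each chunk to a `gzip.GzipFile` backed by an `io.BytesIO`, then read back whatever bytes are available, with and without an explicit `flush()`.

```python
import gzip
import io


def drain(buffer):
    """Return the bytes written to the buffer so far, then reset it."""
    data = buffer.getvalue()
    buffer.seek(0)
    buffer.truncate()
    return data


def compress_stream(chunks, flush_each_write):
    """Compress chunks one at a time and collect the bytes available
    after each write, roughly as a streaming middleware would."""
    buffer = io.BytesIO()
    gzip_file = gzip.GzipFile(mode="wb", fileobj=buffer)
    bodies = []
    for chunk in chunks:
        gzip_file.write(chunk)
        if flush_each_write:
            # Push pending compressed data through to the buffer.
            gzip_file.flush()
        bodies.append(drain(buffer))
    # Closing writes the remaining data and the gzip trailer.
    gzip_file.close()
    bodies.append(drain(buffer))
    return bodies


chunks = [b"data: one\n\n", b"data: two\n\n", b"data: three\n\n"]

buffered = compress_stream(chunks, flush_each_write=False)
flushed = compress_stream(chunks, flush_each_write=True)

# Without flush, the first body holds only the gzip header and the
# following bodies are empty until the file is closed.
print([len(body) for body in buffered])
# With flush, every write produces a non-empty body.
print([len(body) for body in flushed])
```

In both cases the concatenated output decompresses to the same payload; the difference is only in when the bytes become available to send.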
Can we add a test to prove your point?
I'll work on that today. I have a small repro case that I'll share, but I first need to make it run as a test.
I've added a test, but it's a bit complicated. Without the flush, the full contents of the response are still correct, so to show that they are received incrementally rather than all at once, I use a wrapping middleware to assert that GZipMiddleware isn't sending empty message bodies, which is what it does without the flush.
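The wrapping-middleware idea can be sketched in plain ASGI terms. This is an illustration under assumptions, not the test added in this PR: `RecordBodies` and `streaming_app` are hypothetical names, and `streaming_app` stands in for an app wrapped by GZipMiddleware. The recorder sits outside the wrapped app so that it sees exactly what would reach the server.

```python
import asyncio


class RecordBodies:
    """Hypothetical ASGI wrapper that records each response body it forwards."""

    def __init__(self, app):
        self.app = app
        self.bodies = []

    async def __call__(self, scope, receive, send):
        async def recording_send(message):
            if message["type"] == "http.response.body":
                self.bodies.append(message.get("body", b""))
            await send(message)

        await self.app(scope, receive, recording_send)


async def streaming_app(scope, receive, send):
    """Stand-in for the wrapped app: streams three small chunks."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for chunk in (b"a", b"b", b"c"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def run():
    recorder = RecordBodies(streaming_app)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # A real server would write the message to the socket.

    await recorder({"type": "http"}, receive, send)
    return recorder.bodies


bodies = asyncio.run(run())
# Every streamed chunk should arrive with a non-empty body;
# only the final terminating message is allowed to be empty.
assert all(body for body in bodies[:-1])
```

A test along these lines would replace `streaming_app` with the real middleware stack and keep the same assertion on the recorded bodies.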
@Kludex could you take another look or recommend a good reviewer? Thanks!
I'm honestly not sure if this is related, but I have a FastAPI project and I'm getting some GZIP exceptions after upgrading to Python 3.13.
I have the following GZIP wrapper:
Not related.
Any concerns here, or how can we best move this forward?
The best way to move forward would be to present the problem first, with an MRE, and references to other issues where other people had the same problem. I think the current behavior is intentional, so I need to get more references around before reviewing this.
If the behavior is intentional, we really need to update the documentation here:
to indicate that this will cause streaming responses to be buffered 32KiB at a time; this was certainly a surprising result for us, and one that caused our users to report that our app appeared broken as realtime status updates stopped working.
Summary
In order to better support streaming responses where the chunks are smaller than the file buffer size, we flush after writing.
Without the explicit flush, the writes are buffered and the subsequent reads see an empty `self.gzip_buffer` until the file automatically flushes due to either (1) the 32KiB write buffer [1] fills or (2) the file is closed because the streaming response is complete.
Without flushing, the GZipMiddleware doesn't work as expected for streaming responses, especially not for Server-Sent Events which are expected to be delivered immediately to clients. The code as written appears to intend to flush immediately rather than buffering, as it does immediately call `await self.send(message)`, but in practice that `message` is often empty.
Footnotes
[1] https://github.com/python/cpython/blob/main/Lib/gzip.py#L26