Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Avoid CancelledError from being propagated to lifespan's receive() #2367

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bellini666
Copy link

@bellini666 bellini666 commented Jun 13, 2024

Summary

Currently LifeSpanOn.main() is excepting BaseException, which also catches CancelledError.

CancelledError is used by asyncio to signal that a task has been cancelled and is not an error condition.

This PR changes the behavior to catch CancelledError separately and log it as a "Lifespan task cancelled" instead of letting the error propagate.

This is a traceback that I have seen some times on sentry for a project that I work on.

Screenshot 2024-06-13 at 19 08 19

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@bellini666 bellini666 force-pushed the fix_cancelled_error branch 2 times, most recently from 31f889b to 76c6285 Compare June 13, 2024 22:19
Currently `LifeSpanOn.main()` is excepting `BaseException`, which
also catches `CancelledError`.

`CancelledError` is used by `asyncio` to signal that a task has been
cancelled and is not an error condition.

This commit changes the behavior to catch `CancelledError` separately
and log it as a "Lifespan  task cancelled" instead of letting the error
propagate.
@Kludex
Copy link
Sponsor Member

Kludex commented Jun 14, 2024

What problem are you trying to solve? Can you give me a real example?

# Lifespan task was cancelled, likely due to the server shutting down.
# This is not an error that should be handled by BaseException above,
# so we just log it and exit.
self.logger.info("Lifespan task cancelled.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancelledError should generally be re-raised, in order to propagate cancellation up the call chain or to the to-be-cancelled task.

Suggested change
self.logger.info("Lifespan task cancelled.")
self.logger.info("Lifespan task cancelled.")
raise

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Initially I went with the reraise, but the issue persisted, so I naively assumed it was already the top of the chain and that not propagating it would be fine.

I think the main reason is because the issue is not actually here but on starllete (I commented about in #2367 (comment))

@bellini666
Copy link
Author

Hey @Kludex,

So, it is not actually a real issue that would cause the application to misbehave or anything like that, but just a noisy error that will appear on your log and sentry (which is my case) but it is actually ok.

It is very rare (occurred 3 times in the past 30 days for me). But looking at the traceback (https://parade-ai.sentry.io/share/issue/4797523b6aa9458c927599ef0abf35a5/) and the logging below, it happened at the same second that uvicorn started, so my theory is that something canceled it during its startup process in a moment that the code was not expecting to be canceled.

Screenshot 2024-06-14 at 11 08 12

While replying here I tried to dig a bit deeper into the traceback and just noticed something that I completely missed yesterday:

Screenshot 2024-06-14 at 11 30 44

starlette is actually the one that is doing something "wrong" in here. For any BaseException it will send either a "lifespan.shutdown.failed" or a "lifespan.startup.failed", which is usually what we want, but for CancelledError that is not the case, it is just that the loop is asking the task to stop and uses that exception as a way to allow the code to except it and do some cleanup work.

Maybe the proper fix would be doing something like this on starlette?

        except asyncio.CancelledError:
            # CancelledError means this task was asked to terminate and it is not actually
            # an error. We want it to behave just like the `else` below, but reraise
            # it to make sure it gets propagated up to the call chain
            await send({"type": "lifespan.shutdown.complete"})
            raise
        except BaseException:
            exc_text = traceback.format_exc()
            if started:
                await send({"type": "lifespan.shutdown.failed", "message": exc_text})
            else:
                await send({"type": "lifespan.startup.failed", "message": exc_text})
            raise
        else:
            await send({"type": "lifespan.shutdown.complete"})

Not totally sure about the await send({"type": "lifespan.shutdown.complete"}) (makes sense to me, but you know more), but not handling it as an error seems to me to be correct (but again, you know more =P)

btw. Nice to have meet you at pycon italia! :)

@bellini666
Copy link
Author

@Kludex let me know if you agree with my conclusion in my previous comment, in which case I'm going to close this and propose a PR at starlette instead :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants