Fix bug where channel could outlive event loop #536

graebm · 2022-12-19T23:23:34Z

Issue:
Crashes in aws-crt-python if a WebSocket outlived an EventLoopGroup.

Investigation:
aws_channel is refcounted. It uses an aws_event_loop, but event loops do not currently support refcounting.

The aws_event_loop_group is refcounted, but aws_channel only currently knows about an individual loop.

So it's possible for the refcount on aws_event_loop_group to reach zero, destroying all of its loops, before the channel's refcount reaches zero. When the channel then tries to schedule a task on its loop, everything explodes because the loop has already been cleaned up.

Description of changes:
Add a new function, aws_event_loop_group_acquire_hold_on_group(struct aws_event_loop *), so that the channel can hold a reference on the group even though it only has access to the individual loop.

I considered adding a proper refcount to the individual aws_event_loop, but there's a good bit of code that assumes the lifetime of the loops is tied to the lifetime of the group, so I didn't want to break that assumption.
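To illustrate the ownership model described above, here is a minimal, single-threaded sketch. It is not the aws-c-io code: every `toy_*` type and function is hypothetical and invented for this example, and a real implementation would need atomic reference counts. The sketch assumes each loop carries a back-pointer to its owning group, which is what lets a channel that only sees the loop take a hold on the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_group;

/* A loop has no refcount of its own; its lifetime is tied to its group. */
struct toy_loop {
    struct toy_group *owner; /* hypothetical back-pointer from loop to group */
    int tasks_scheduled;
};

struct toy_group {
    size_t ref_count;
    struct toy_loop loop; /* one loop only, for brevity */
    bool *destroyed;      /* set to true when the group (and its loop) is freed */
};

struct toy_channel {
    size_t ref_count;
    struct toy_loop *loop; /* the channel only ever sees the individual loop */
};

struct toy_group *toy_group_new(bool *destroyed) {
    struct toy_group *group = calloc(1, sizeof(*group));
    if (group == NULL) {
        return NULL;
    }
    group->ref_count = 1;
    group->loop.owner = group;
    group->destroyed = destroyed;
    return group;
}

void toy_group_release(struct toy_group *group) {
    assert(group->ref_count > 0);
    if (--group->ref_count == 0) {
        *group->destroyed = true; /* the loop dies together with the group */
        free(group);
    }
}

/* Take a hold on the group when all the caller has is one of its loops. */
void toy_group_acquire_from_loop(struct toy_loop *loop) {
    loop->owner->ref_count++;
}

void toy_group_release_from_loop(struct toy_loop *loop) {
    toy_group_release(loop->owner);
}

struct toy_channel *toy_channel_new(struct toy_loop *loop) {
    struct toy_channel *channel = calloc(1, sizeof(*channel));
    if (channel == NULL) {
        return NULL;
    }
    channel->ref_count = 1;
    channel->loop = loop;
    toy_group_acquire_from_loop(loop); /* keeps the loop alive for the channel */
    return channel;
}

void toy_channel_release(struct toy_channel *channel) {
    assert(channel->ref_count > 0);
    if (--channel->ref_count == 0) {
        channel->loop->tasks_scheduled++; /* stand-in for scheduling a final task */
        toy_group_release_from_loop(channel->loop); /* drop the hold last */
        free(channel);
    }
}

/* Returns true if the group survives until the channel is gone, then dies. */
bool toy_demo_channel_outlives_group_handle(void) {
    bool group_destroyed = false;
    struct toy_group *group = toy_group_new(&group_destroyed);
    if (group == NULL) {
        return false;
    }
    struct toy_channel *channel = toy_channel_new(&group->loop);
    if (channel == NULL) {
        toy_group_release(group);
        return false;
    }
    toy_group_release(group); /* the user drops their group reference first */
    bool alive_while_channel_exists = !group_destroyed;
    toy_channel_release(channel); /* safe: the loop still exists at this point */
    return alive_while_channel_exists && group_destroyed;
}
```

In this sketch the channel's final use of the loop happens before it drops its hold on the group, so the loop cannot be cleaned up underneath it. Without the acquire in `toy_channel_new`, the earlier `toy_group_release` would free the group and the later loop access would be a use-after-free.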

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-commenter · 2022-12-20T00:10:05Z

Codecov Report

Base: 79.05% // Head: 79.10% // Increases project coverage by +0.05% 🎉

Coverage data is based on head (f26f314) compared to base (6f8b922).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #536      +/-   ##
==========================================
+ Coverage   79.05%   79.10%   +0.05%     
==========================================
  Files          25       25              
  Lines        5528     5537       +9     
==========================================
+ Hits         4370     4380      +10     
+ Misses       1158     1157       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| source/channel.c | `89.61% <100.00%> (+0.23%)` | ⬆️ |
| source/event_loop.c | `91.04% <100.00%> (+0.32%)` | ⬆️ |
| source/linux/epoll_event_loop.c | `86.88% <100.00%> (+0.05%)` | ⬆️ |


graebm added 4 commits December 16, 2022 21:05

Fix bug with increment read window

617f49a

add regression test to prove the event-loop can die before the channel

82ff4f5

channel keeps the event loop's group alive

0d736c9

i do hereby solemly swear that this does not affect the proofs

f26f314

sbSteveK approved these changes Dec 20, 2022


waahm7 approved these changes Dec 20, 2022


TingDaoK approved these changes Dec 20, 2022
