Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agent message delivery system can deadlock #36

Open
ettersi opened this issue Mar 23, 2023 · 0 comments
Open

Agent message delivery system can deadlock #36

ettersi opened this issue Mar 23, 2023 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@ettersi
Copy link
Collaborator

ettersi commented Mar 23, 2023

We currently have this pattern in three different places in container.jl (you can find them by searching for _dont_listen):

function foo(a::Agent, filter, priority)
  ch = Channel(1)
  _listen(a, ch, filter, priority)
  ...
  take!(ch) # One or more, but we only `take!` here
  ...
  _dont_listen(a, ch)
  close(ch)
end

This pattern can deadlock and should therefore be avoided.

The problem is that _dont_listen() acquires the _processmsg lock before deregistering the channels, but _listener_notify() and _msgloop() put!() into these channels while holding the _processmsg lock.

Fjage.jl/src/container.jl

Lines 1196 to 1205 in dc74a5e

function _dont_listen(a::Agent, ch::Channel)
lock(a._processmsg) do
for (n, (filt, ch1, p)) enumerate(a._listeners)
if ch === ch1
deleteat!(a._listeners, n)
return
end
end
end
end

Fjage.jl/src/container.jl

Lines 1207 to 1213 in dc74a5e

function _listener_notify(a::Agent)
lock(a._processmsg) do
for (filt, ch, p) a._listeners
put!(ch, nothing)
end
end
end

Fjage.jl/src/container.jl

Lines 1236 to 1258 in dc74a5e

function _msgloop(a::Agent)
@debug "Start $(a) message loop"
lock(a._processmsg) do
while wait(a._processmsg)
try
@debug "Deliver messages in $(a) [qlen=$(length(a._msgqueue))]"
filter!(a._msgqueue) do msg
for (filt, ch, p) a._listeners
if _matches(filt, msg)
isready(ch) && return true
put!(ch, msg)
return false
end
end
true
end
catch ex
@warn "Message delivery failed: $(ex)"
end
end
end
@debug "Stop $(a) message loop"
end

The deadlock occurs if

  • there is currently a message waiting in the channel, and
  • the put!() happens after the last take!() and before the _dont_listen().

A possible solution could be to move the close(ch) before the _dont_listen(a, ch) and make sure that the put!() handle the case of closed channels correctly, but as usual, this should be carefully analysed and discussed first.

@mchitre mchitre added the bug Something isn't working label Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants