Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlocks in Channel.call(...) #253

Open
jxsl13 opened this issue Mar 12, 2024 · 4 comments
Open

deadlocks in Channel.call(...) #253

jxsl13 opened this issue Mar 12, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@jxsl13
Copy link

jxsl13 commented Mar 12, 2024

Describe the bug

Hi,
I'm (still) developing a wrapper for this library.
I'm trying to properly implement flow control and context handling and trying to test as much as possible, simulating connection loss and more.

For one of my tests I have a rabbitmq which is out of memory upon startup which triggers the connection blocking state.

This state seems to trigger some weird deadlocks or something along the lines in this library making this select statement block "forever":

amqp091-go/channel.go

Lines 181 to 205 in a2fcd5b

select {
case e, ok := <-ch.errors:
if ok {
return e
}
return ErrClosed
case msg := <-ch.rpc:
if msg != nil {
for _, try := range res {
if reflect.TypeOf(msg) == reflect.TypeOf(try) {
// *res = *msg
vres := reflect.ValueOf(try).Elem()
vmsg := reflect.ValueOf(msg).Elem()
vres.Set(vmsg)
return nil
}
}
return ErrCommandInvalid
}
// RPC channel has been closed without an error, likely due to a hard
// error on the Connection. This indicates we have already been
// shutdown and if were waiting, will have returned from the errors chan.
return ErrClosed
}

I have seen Channel.Close() and Channel.UnbindQueue(...) block "forever".
The blocking of Channel.UnbindQueue(...) is reproduced in the test below.

Might be related to #225 (it might be possible to reproduce "turn off the internet" with the tool that I use for my tests that's called toxiproxy)

Reproduction steps

Here is a test that reproduces the problem:

level=info, msg=creating connection,
level=info, msg=registering flow control notification channel,
level=info, msg=creating channel,
level=info, msg=registering error notification channel,
level=info, msg=registering confirms notification channel,
level=info, msg=registering flow control notification channel,
level=info, msg=registering returned message notification channel,
level=info, msg=declaring exchange,
level=info, msg=declaring queue,
level=info, msg=binding queue,
level=info, msg=publishing message,
level=info, msg=unbinding queue,  (blocks here forever)

Expected behavior

QueueBind worked, so I guess QueueUnbind should also work.
I think this behavior can be triggered for nearly every method of Channel.

Additional context

Should not be relevant but could:
darwin/arm64
macOS 14.3.1

@jxsl13 jxsl13 added the bug Something isn't working label Mar 12, 2024
@lukebakken lukebakken self-assigned this Mar 12, 2024
@lukebakken
Copy link
Contributor

Thanks for the report and the steps to reproduce this issue. I can reproduce it. As you noted, it requires a blocked RabbitMQ to reproduce.

@jxsl13 jxsl13 changed the title random deadlocks in Channel.call(...) deadlocks in Channel.call(...) Mar 12, 2024
@amotzte
Copy link

amotzte commented Apr 2, 2024

I'm pretty sure I got the something similaron QueueBind. Calling to QueueBind with noWait=false and getting stuck forever. Would it make sense to add some timeout for this operation ?

@sedhossein
Copy link

sedhossein commented Jul 16, 2024

Lack of timeout feature felt in all pkg methods. I saw almost all of the XxxWithContext methods ignoring context! As mentioned initialization steps: func (ch *Channel) call(...). I think it's better to remove all XxxWithContext methods when you want to ignore all contexts. Or it's better to have a minimal version to support context timeout, in the future, we will improve them to support canceling the process. Currently, I face a lot of deadlocks in code. Is it acceptable for first version?

func (ch *Channel) PublishWithContext(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing) error {
	err := make(chan error)
	go func() {
		err <- ch.Publish(exchange, key, mandatory, immediate, msg)
	}()
	select {
	case <-ctx.Done():
		return fmt.Errorf("context cancelled")
	case e := <-err:
		return e
	}
}

@jxsl13
Copy link
Author

jxsl13 commented Jul 17, 2024

Implementing correct context cancelation is not trivial.
Also no, it is no acceptable to introduce more problems than there currently are for a library that is used in production environments.

Also your example is not correct, because you will (potentially) be leaking goroutines upon context cancelation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants