RPC connectivity stops for good in high traffic #9594
Comments
To be clear, the issue isn't performance degradation due to heavy logging; it is that the server stops accepting connections after this:
Note that if you trace around, the EBADF might come from another function; set_option is just the most likely to get whacked.
In that case, I left it running for about 10 minutes. But I don't have any
in my logs. How long does it usually take for the exception to show up?
In three attempts: about 5 seconds, 5 seconds, and maybe 20 seconds, counted after waiting for the servers to be running. This is on master at a1dc85c.
Running: ./tests/functional_tests/functional_tests_rpc.py /usr/bin/python tests/functional_tests/ build/Linux/master/release/ daemon_info
I am hitting the infinite while loop correctly, but I haven't been able to reproduce the exception. I will update you if anything comes up, and I will try it on a bare-metal machine too. I am doing it on a VM right now.
I'm running on an old Fedora VM. I'll try setting up a more recent one later; it might be a dep issue if you can't get it to happen.
I tried on a vm:
Also happens pretty much instantly on Fedora 41, GCC 14.2.1.
Also Debian 12, GCC 12.2.0. All of this is running in Qubes OS, so there might be something weird to do with Xen I guess, though it does seem a bit unlikely.
I've been debugging it on and off in Townforge for quite a long time, as I thought it was specific to my changes, but I can actually get it to happen in Monero reliably. Townforge has quite heavy TF-specific functional tests, which trigger it reliably, and I got Monero to trigger it reliably by simply calling an RPC over and over, with this patch:
Note that setting the log level to 3 is needed here. Running with log level 1 will not trigger it. In Townforge, log level 1 is fine, and log level 2 will trigger it fairly quickly. Monero with log level 3 will trigger it pretty much at once.
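For anyone trying to reproduce this from outside the test suite, here is a rough Python sketch of the "call an RPC over and over" idea. It is not the patch referred to above. The URL, the choice of `get_info`, and the helper names `call_get_info` and `hammer` are assumptions for illustration; adjust the port to whatever your daemon's RPC listens on, and set the daemon's log level to 3 separately.

```python
import json
import urllib.request


def call_get_info(url="http://127.0.0.1:18081/json_rpc", timeout=5.0):
    """One JSON-RPC get_info call. URL and port are assumptions; adjust to your setup."""
    payload = json.dumps({"jsonrpc": "2.0", "id": "0", "method": "get_info"}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def hammer(call, max_calls=100000):
    """Invoke call() repeatedly until it raises an OSError.

    Returns (successful_calls, exception), or (max_calls, None) if nothing failed.
    URLError, socket timeouts and connection errors are all OSError subclasses.
    """
    for n in range(max_calls):
        try:
            call()
        except OSError as exc:
            return n, exc
    return max_calls, None


# Usage against a running daemon (not executed here):
#   ok, exc = hammer(call_get_info)
#   print("calls before failure:", ok, "error:", exc)
```

If the server has stopped accepting connections, `hammer` should return with a timeout or connection error instead of running through all `max_calls`.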
Once triggered, it never recovers. I tried adding recovery code in Townforge, to no avail (that may be because the underlying issue is not what I vaguely expect it to be).
The symptom is an exception in handle_accept, where a syscall returns EBADF. The socket is valid at the start of the function, and becomes invalid somewhere along the execution of handle_accept. AFAICT this is not a case of the connection being destroyed by another thread, but I'd be happy to be shown to be wrong there since it's the obvious inference.
I've spent days on this over the months; I hope someone with more networking chops can have a try at it.
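To make the symptom concrete: on POSIX systems, EBADF from a socket-option call means the file descriptor is no longer open at the time of the syscall. The Python sketch below only illustrates that mechanism by closing the descriptor behind the socket object's back. It does not reproduce the bug and says nothing about what actually invalidates the socket in handle_accept; the function name is made up for the example.

```python
import errno
import os
import socket


def setsockopt_after_fd_closed():
    """Return the errno raised by setsockopt once the underlying fd is closed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The socket is valid here, so this call succeeds.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Simulate "something" closing the descriptor while the object is still in use.
    os.close(s.fileno())
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    except OSError as exc:
        return exc.errno
    finally:
        # Forget the stale fd so the socket object does not try to close it again.
        s.detach()
    return None


if __name__ == "__main__":
    code = setsockopt_after_fd_closed()
    print(code == errno.EBADF, os.strerror(code))
```

The same pattern (valid at entry, EBADF a few calls later) is what a close or descriptor reuse elsewhere in the process would look like from inside the function.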
Note that there have been reports of RPC connectivity going down over the years; that's probably the same thing.