[Performance] Reduce Context Switching #2065
base: master
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.
This looks promising.
It does. I was hoping to find more time to read up on the topic in detail and to put together some basic tests to validate the change.
Normal behaviour for multiple processes using EPOLL to listen on a single FD is for every process to be woken on every IO event. This can cause a thundering herd effect, increasing context switches and CPU usage. With EPOLLEXCLUSIVE only a single UDP worker is woken to handle an IO request, greatly reducing context switching and contention, especially as the number of processes grows. One potential downside of EPOLLEXCLUSIVE is that EPOLL may coalesce multiple events on a file descriptor into a single wakeup. This can increase latency if only a single process is woken to handle potentially multiple SIP messages. To balance latency against a reduced thundering herd, this patch causes the first worker for a socket to not use EPOLLEXCLUSIVE and thus ALWAYS be woken for events. If present, at least one other worker using EPOLLEXCLUSIVE will also be woken.
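For readers unfamiliar with the flag, the registration scheme described in this commit message could look roughly like the sketch below. This is not the code from the patch: `worker_event_mask`, `register_shared_socket`, and `worker_idx` are hypothetical names, and the sketch assumes each worker owns its own epoll instance and that worker 0 is the "first worker" for the socket. EPOLLEXCLUSIVE needs Linux 4.5 or later and is only accepted with EPOLL_CTL_ADD.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical helper: event mask a worker uses for the shared socket.
 * Worker 0 stays non-exclusive so it is always woken; every other worker
 * asks for exclusive wakeups when the flag is available at build time. */
static uint32_t worker_event_mask(int worker_idx)
{
    uint32_t events = EPOLLIN;
#ifdef EPOLLEXCLUSIVE
    if (worker_idx > 0)
        events |= EPOLLEXCLUSIVE;
#endif
    return events;
}

/* Hypothetical helper: add sock_fd to this worker's own epoll instance.
 * Returns 0 on success, -1 on error (errno set by epoll_ctl). */
static int register_shared_socket(int epfd, int sock_fd, int worker_idx)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.events = worker_event_mask(worker_idx);
    ev.data.fd = sock_fd;
    /* EPOLLEXCLUSIVE is rejected by EPOLL_CTL_MOD, so the mask must be
     * chosen once, at ADD time. */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ev);
}
```

Each worker would call `register_shared_socket()` once after fork, passing its own index. On kernels or headers without EPOLLEXCLUSIVE the sketch falls back to plain EPOLLIN for all workers.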
…n everycall. With the use of EPOLLEXCLUSIVE we want to ensure that a single process is not woken to handle multiple FDs at once. This allows EPOLL to better distribute wake-ups to processes that are actually ready to run. Without this, a single process may be woken to handle IO for multiple FDs while other processes are idle and waiting for work.
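One way to keep a process from picking up several FDs in one wakeup is to ask `epoll_wait()` for at most one event per call. The sketch below assumes that approach; it is an illustration rather than the patch's code, and `wait_one_fd` and its return convention are hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: wait for at most ONE ready fd on epfd.
 * Returns the ready fd, -1 on timeout, or -2 on error. */
static int wait_one_fd(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    do {
        /* maxevents = 1: any other ready fds stay queued in the kernel,
         * where another waiting worker can be woken for them. */
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n == 1)
        return ev.data.fd;
    return (n == 0) ? -1 : -2;
}
```

The trade-off is one `epoll_wait()` call per ready FD instead of one call for a batch.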
A single IO wake-up can correspond to multiple actual IO events or pending reads. Currently, after handling a single event we go back to waiting on the FD, where we are immediately woken again because of the IO that is already waiting. This increases context switches and can increase latency. By handling all the IO possible on every wakeup before waiting again we can reduce both of these.
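The "handle all the IO possible on every wakeup" idea can be sketched as a read loop that only stops when the socket reports it would block. This is a minimal illustration for a datagram socket, not the patch itself; `drain_socket`, `datagram_cb`, and `count_bytes` are hypothetical names, and MSG_DONTWAIT is used so the sketch does not depend on the socket already being non-blocking.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-datagram handler type. */
typedef void (*datagram_cb)(const char *buf, size_t len, void *arg);

/* Example handler: adds each datagram's length to a size_t counter. */
static void count_bytes(const char *buf, size_t len, void *arg)
{
    (void)buf;
    *(size_t *)arg += len;
}

/* Read every datagram currently queued on fd before returning.
 * Returns the number of datagrams handled, or -1 on a real error. */
static int drain_socket(int fd, datagram_cb cb, void *arg)
{
    char buf[65536];
    int handled = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n >= 0) {
            /* n == 0 is a valid (empty) datagram on a datagram socket. */
            cb(buf, (size_t)n, arg);
            handled++;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return handled;   /* queue is empty, safe to wait again */
        return -1;
    }
}
```

A worker would call `drain_socket()` once per wakeup and only then return to `epoll_wait()`, so queued datagrams do not each cost a separate wakeup.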
e4d33e7 to d9e3fbe
These are changes that I am forward porting from our internal 2.4 branch that greatly reduce context switching under Linux. The patch uses EPOLLEXCLUSIVE when available and changes how events are handled before a process goes back to waiting.
Changes are mostly around UDP, with some changes around internal pipes (IPC, timer), as they get the largest benefit. TCP is left untouched. Some of these changes could also be applied to the TCP listening socket, but I don't know how much benefit that would bring (we only run UDP).
Not fully tested on master beyond confirming that it compiles and that OpenSIPS seems to start fine.
Under similar loads these changes on 2.4 brought context switching (as reported by vmstat) down from around 45k/sec to 25k/sec. CPU usage as reported by our hypervisor dropped by 5-10% under light load and by nearly 20% under heavier loads.
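For anyone wanting to reproduce the context-switch measurement without vmstat: on Linux the system-wide counter is the `ctxt` line in /proc/stat, and sampling it twice gives a per-second rate comparable to vmstat's `cs` column. The sketch below is only an illustration; `parse_ctxt` and `read_proc_ctxt` are hypothetical helper names.

```c
#include <stdio.h>

/* Find the "ctxt N" line in a /proc/stat-formatted stream.
 * Returns 0 and stores N in *out on success, -1 if no such line exists. */
static int parse_ctxt(FILE *f, unsigned long long *out)
{
    char line[256];

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "ctxt %llu", out) == 1)
            return 0;
    }
    return -1;
}

/* Read the cumulative context-switch count since boot from /proc/stat. */
static int read_proc_ctxt(unsigned long long *out)
{
    FILE *f = fopen("/proc/stat", "r");
    int rc;

    if (f == NULL)
        return -1;
    rc = parse_ctxt(f, out);
    fclose(f);
    return rc;
}
```

Calling `read_proc_ctxt()` before and after a fixed interval and dividing the difference by the interval length gives switches per second for the whole machine, not just for OpenSIPS.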
It takes some care to avoid the pitfalls of using EPOLLEXCLUSIVE, but we have found that it works well in production.
Mostly just wanted to get this out so that some of the ideas might be incorporated before the next LTS.
Let me know what you think.