
[Performance] Reduce Context Switching #2065

Open
wants to merge 3 commits into master
Conversation

rrb3942 (Contributor) commented Apr 3, 2020

These are changes that I am forward-porting from our internal 2.4 branch that greatly reduce context switching under Linux. They use EPOLLEXCLUSIVE when available and make some changes to how events are handled before a process goes back to waiting.

Changes are mostly around UDP, with some changes around internal pipes (IPC, timer), as those get the largest benefit. TCP is left untouched. Some of these changes could also be applied to the TCP listening socket, but I don't know how much benefit that would bring (we only run UDP).

Not fully tested under master, beyond confirming that it compiles and OpenSIPS starts fine.

Under similar loads, these changes on 2.4 brought context switching (as reported by vmstat) down from around 45k/sec to 25k/sec. CPU usage as reported by our hypervisor dropped by 5-10% under light load and by nearly 20% under heavier loads.

It takes some care to avoid the pitfalls of using EPOLLEXCLUSIVE, and we have found that it works well in production.
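
To illustrate the approach, here is a minimal sketch (not the actual patch) of adding a UDP socket with EPOLLEXCLUSIVE, falling back cleanly on kernels older than 4.5, where epoll_ctl() rejects the flag with EINVAL:

```c
#include <sys/epoll.h>

/* Sketch only: register a UDP socket with EPOLLEXCLUSIVE when the
 * kernel supports it (Linux 4.5+), otherwise fall back to a plain
 * level-triggered add. */
static int epoll_add_udp(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

#ifdef EPOLLEXCLUSIVE
    ev.events |= EPOLLEXCLUSIVE;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return 0;
    ev.events &= ~EPOLLEXCLUSIVE;   /* older kernel: retry without the flag */
#endif
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```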

Mostly just wanted to get this out so that some of the ideas might be incorporated before the next LTS.

Let me know what you think.

@bogdan-iancu bogdan-iancu self-assigned this Apr 15, 2020
stale bot commented Apr 30, 2020

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@stale stale bot added the stale label Apr 30, 2020
stale bot commented May 30, 2020

Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.

@stale stale bot closed this May 30, 2020
@liviuchircu liviuchircu reopened this May 30, 2020
@stale stale bot removed the stale label May 30, 2020
stale bot commented Jun 14, 2020

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@stale stale bot added the stale label Jun 14, 2020
@stale stale bot removed the stale label Jun 15, 2020
wdoekes (Contributor) commented Apr 22, 2021

This looks promising.

bogdan-iancu (Member) commented
It does. I was looking to get more time to read up on the topic in detail and to put together some basic testing to validate the change.

Normal behaviour for multiple processes using epoll to listen on a single FD is for every
process to be woken on every IO event. This can cause a thundering-herd effect, increasing context
switches and CPU usage.

With EPOLLEXCLUSIVE, only a single UDP worker is woken to handle an IO request, greatly
reducing context switching and contention, especially as the number of processes grows.

One potential downside to using EPOLLEXCLUSIVE is that epoll may coalesce multiple events on
a file descriptor into a single wakeup. This has the potential to increase latency if only a single
process is woken to handle potentially multiple SIP messages. To help balance latency against the
reduced thundering, this patch causes the first worker for a socket to not use EPOLLEXCLUSIVE and
thus ALWAYS get woken for events. If present, at least one other worker using EPOLLEXCLUSIVE will
also be woken on every call.
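
As a rough sketch of that wake-up policy (register_udp_fd and worker_rank are illustrative names, not the patch's actual ones):

```c
#include <sys/epoll.h>

/* Sketch: worker 0 keeps the default wake-all registration so it is
 * always notified; every other worker opts into EPOLLEXCLUSIVE.
 * worker_rank is a hypothetical per-process index. */
static int register_udp_fd(int epfd, int udp_fd, int worker_rank)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = udp_fd };

#ifdef EPOLLEXCLUSIVE
    if (worker_rank != 0)
        ev.events |= EPOLLEXCLUSIVE;
#endif
    return epoll_ctl(epfd, EPOLL_CTL_ADD, udp_fd, &ev);
}
```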

With the use of EPOLLEXCLUSIVE we want to ensure that a single process is not being
woken to handle multiple FDs at once. This allows epoll to better distribute wake-ups to
processes that are actually ready to run. Without this, a single process may be woken to handle
IO for multiple FDs while other processes sit idle, available for work.
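
One simple way to express this, assuming the worker loop is structured roughly like the sketch below, is to cap epoll_wait() at one event per call:

```c
#include <sys/epoll.h>

void handle_io(int fd);   /* placeholder for the real event handler */

/* Sketch: request at most one ready FD per wakeup so the kernel can
 * hand other ready FDs to other waiting workers, instead of piling
 * several FDs onto whichever process wakes first. */
static void worker_wait_one(int epfd)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1 /* maxevents */, -1);

    if (n == 1)
        handle_io(ev.data.fd);
}
```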
A single IO wake-up can correspond to multiple pending IO events.
Currently, after handling a single event we go back to waiting on the FD, where we are
immediately woken again because of the IO that is already pending.

This increases context switches and can increase latency.

By handling all the IO possible on every wakeup before waiting again, we can reduce both.
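
On a non-blocking socket this amounts to reading until EAGAIN before returning to the wait loop. A sketch, with process_sip_msg() standing in for the real message handling:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

void process_sip_msg(const char *buf, ssize_t len);   /* placeholder */

/* Sketch: drain everything already queued on a non-blocking UDP
 * socket so one wakeup services every pending datagram before the
 * process goes back to epoll_wait(). */
static void drain_udp(int fd)
{
    char buf[65535];
    ssize_t n;

    while ((n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL)) >= 0)
        process_sip_msg(buf, n);

    /* n < 0 here: EAGAIN/EWOULDBLOCK means the queue is drained;
     * anything else is a real error for the caller to handle. */
}
```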