Semaphore Lock Scaling
SysV/Posix Semaphore Lock Scaling Issue
Date: 10/16/2015
High system time and poor performance for applications which employ semaphores, such as Oracle, on scale-up hardware.
On a Superdome-X system running RHEL 6.7 with Oracle 12c, extremely high system time was observed. A Linux KI Toolset data collection was taken and the kparse report flagged high CPU utilization:
1.1 Global CPU Usage Counters
nCPU sys% user% idle%
80 89.60% 8.82% 1.59%
Warning: CPU Bottleneck (Idle < 10%)
The kiprof (profile) report showed that the semtimedop() and semctl() system calls accounted for the majority of the system CPU consumption:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Kernel Functions executed during profile
Count Pct State Function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
99387 65.74% SYS sys_semtimedop
36853 24.38% SYS sys_semctl
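For background, below is a minimal user-space sketch of the single-semaphore semtimedop() call pattern that dominated this profile. It is an illustration of the syscall only, not Oracle's actual code; the set size, permissions, and timeout values are assumptions.

#define _GNU_SOURCE          /* for the semtimedop() declaration in glibc */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

int main(void)
{
    /* Create a private set containing a single semaphore (assumed mode 0600). */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) { perror("semget"); return 1; }

    /* On Linux the caller must define union semun itself. */
    union semun { int val; } arg = { .val = 1 };
    if (semctl(semid, 0, SETVAL, arg) < 0) { perror("semctl"); return 1; }

    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };

    /* One operation on one semaphore: the case that newer kernels can
       satisfy with a per-semaphore lock rather than the whole-set lock. */
    if (semtimedop(semid, &op, 1, &timeout) < 0)
        perror("semtimedop");

    semctl(semid, 0, IPC_RMID);
    return 0;
}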
Examination of the hardclock records showed three main code locations:
$ grep " hardclock state=SYS sys_sem" ki.MMDD_HHMM|awk '{print $7}'|sort|uniq -c|sort -rn
65522 sys_semtimedop+0x3c1
34897 sys_semctl+0x137
33852 sys_semtimedop+0x615
A review of the kernel debug information via gdb shows that these locations correspond to the inlined sem_lock():
$ cat uname-a.MMDD_HHMM
Linux tux 2.6.32-573.el6.x86_64 #1 SMP Wed Jul 1 18:23:37 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
$ gdb /usr/lib/debug/lib/modules/2.6.32-573.el6.x86_64/vmlinux
(gdb) list *(sys_semtimedop+0x3c1)
0xffffffff81221fa1 is in sys_semtimedop (ipc/sem.c:1668).
1663 error = security_sem_semop(sma, sops, nsops, alter);
1664 if (error)
1665 goto out_rcu_wakeup;
1666
1667 error = -EIDRM;
1668 locknum = sem_lock(sma, sops, nsops);
1669 if (sma->sem_perm.deleted)
1670 goto out_unlock_free;
1671 /*
1672 * semid identifiers are not unique - find_alloc_undo may have
0218 /*
0219 * If the request contains only one semaphore operation, and there are
0220 * no complex transactions pending, lock only the semaphore involved.
0221 * Otherwise, lock the entire semaphore array, since we either have
0222 * multiple semaphores in our own semops, or we need to look at
0223 * semaphores from other pending complex operations.
0224 */
0225 static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
0226 int nsops)
0227 {
sem_lock() either obtains the spinlock protecting the entire semaphore set or the spinlock protecting the individual semaphore involved in the operation.
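As a rough model of that decision, the logic can be pictured as in the user-space sketch below. This uses pthread spinlocks and simplified structure fields as stand-ins for the kernel's; it is not the kernel source, only an illustration of the fast path versus the slow path.

#include <pthread.h>
#include <stdio.h>

#define NSEMS 4096                  /* mirrors SEMMSL = 4096 from the sysctl below */

struct sem { pthread_spinlock_t lock; };

struct sem_array {
    pthread_spinlock_t global_lock; /* protects the entire semaphore set */
    int complex_count;              /* complex (multi-semaphore) ops pending */
    struct sem sems[NSEMS];
};

struct sembuf_model { unsigned short sem_num; short sem_op; short sem_flg; };

/* Returns the index of the per-semaphore lock taken, or -1 if the
   whole-array lock had to be taken instead. */
static int sem_lock_model(struct sem_array *sma,
                          const struct sembuf_model *sops, int nsops)
{
    if (nsops == 1 && sma->complex_count == 0) {
        /* Fast path: only the one semaphore involved is locked, so
           operations on different semaphores can proceed in parallel. */
        pthread_spin_lock(&sma->sems[sops->sem_num].lock);
        return sops->sem_num;
    }
    /* Slow path: every caller serializes on a single spinlock,
       which is where the contention in this case study arose. */
    pthread_spin_lock(&sma->global_lock);
    return -1;
}

int main(void)
{
    static struct sem_array sma;
    pthread_spin_init(&sma.global_lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < NSEMS; i++)
        pthread_spin_init(&sma.sems[i].lock, PTHREAD_PROCESS_PRIVATE);

    struct sembuf_model op = { .sem_num = 42, .sem_op = -1 };
    printf("locked: %d\n", sem_lock_model(&sma, &op, 1)); /* prints 42 */
    return 0;
}

With SEMMSL = 4096, thousands of semaphores can land in one set, so any condition that forces the slow path funnels all of those operations through one spinlock.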
An examination of the semaphore configuration showed large values for SEMMSL (the maximum number of semaphores per semaphore set) and SEMOPM (the maximum number of operations per semop() system call):
$ grep sem sysctl-a.MMDD_HHMM
kernel.sem = 4096 512000 1600 2048
Reducing SEMMSL to 250 resulted in the required semaphores being spread across a greater number of semaphore sets, which reduced the lock contention and resolved the high system CPU utilization. Note that the Oracle 12c documentation recommends the following:
kernel.sem = 250 32000 100 128
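To verify the limits actually in effect, the same four values reported by kernel.sem can also be read programmatically. Below is a minimal sketch using semctl(IPC_INFO); the union semun definition is required of the caller on Linux.

#define _GNU_SOURCE            /* for IPC_INFO and struct seminfo in glibc */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    struct seminfo si;
    union semun { int val; struct semid_ds *buf;
                  unsigned short *array; struct seminfo *__buf; } arg;
    arg.__buf = &si;

    /* IPC_INFO fills in the system-wide semaphore limits. */
    if (semctl(0, 0, IPC_INFO, arg) < 0) { perror("semctl"); return 1; }

    printf("SEMMSL=%d SEMMNS=%d SEMOPM=%d SEMMNI=%d\n",
           si.semmsl, si.semmns, si.semopm, si.semmni);
    return 0;
}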
Please note that kernels prior to RHEL 6.6 (2.6.32-504) and SLES 12 are missing the following critical locking change, which can make the semaphore set spinlock contention even worse:
BZ#880024
Previously, the locking of a semtimedop semaphore operation was not fine enough with remote non-uniform memory architecture (NUMA) node accesses. As a consequence, spinlock contention occurred, which caused delays in the semop() system call and high load on the server when running numerous parallel processes accessing the same semaphore. This update improves scalability and performance of workloads with a lot of semaphore operations, especially on larger NUMA systems. This improvement has been achieved by turning the global lock for each semaphore array into a per-semaphore lock for many semaphore operations, which allows multiple simultaneous semop() operations. As a result, performance degradation no longer occurs.