Jeremy Fitzhardinge
2008-Jul-04 22:17 UTC
[Xen-devel] Some initial measurements comparing spinlock algorithms
I did some kernbench tests with various spinlock algorithms. I tried the default ticket locks, the old lock-byte spinlock, and a Xen-specific spin-then-block lock algorithm.

The test VM is a 4 vcpu guest with 1GB of memory, running on a 2 cpu host. The idea is to provoke self-stealing due to over-committed CPUs, exacerbating any bad preemption behaviours the various lock algorithms may have.

The kernel is my current pvops development tree, so 2.6.26-rc8+patches, running 32-bit.

I ran "kernbench -M", which avoids the "make -j" saturation test.

The first test was with ticket locks:

Fri Jul 4 13:25:54 BST 2008
2.6.26-rc8-tip - ticket locks
Average Half load -j 3 Run (std deviation):
Elapsed Time 503.002 (19.3737)
User Time 563.494 (0.562699)
System Time 146.404 (6.94372)
Percent CPU 141 (4.63681)
Context Switches 54069.4 (458.201)
Sleeps 49098.4 (367.281)

(Aborted optimal run after many hours. EIP sampled to __ticket_spin_lock+16)

The first half-load test finished in a reasonable time period, but the "optimal load" (make -j16) test never terminated. After around 6 hours of running, it didn't get past the first pass of 5. Sampling eip showed it was always in __ticket_spin_lock on all processors.

This is a pretty dramatic confirmation of Thomas's results.
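
For readers unfamiliar with why ticket locks might interact badly with vcpu preemption, here is a minimal generic sketch of a ticket lock using C11 atomics. It is not the kernel's __ticket_spin_lock (which packs both counters into a single word); the names ticket_take, ticket_ready, ticket_acquire, ticket_release and ticket_demo are hypothetical. The point it tries to illustrate is the strict FIFO hand-off: once a ticket is issued, nobody with a later ticket can take the lock until that ticket's holder has run, so if that holder's vcpu is descheduled, everyone behind it spins even though the lock is free. This is one plausible reading of the hang above, not a diagnosis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

/* Join the queue: atomically take the next ticket number. */
static unsigned ticket_take(struct ticket_lock *l)
{
    return atomic_fetch_add(&l->next, 1);
}

/* True only when it is exactly this ticket's turn. */
static bool ticket_ready(struct ticket_lock *l, unsigned t)
{
    return atomic_load(&l->owner) == t;
}

static void ticket_acquire(struct ticket_lock *l)
{
    unsigned t = ticket_take(l);
    while (!ticket_ready(l, t))
        ;   /* spin: there is no way to jump the queue */
}

/* Hand the lock to the next ticket in line, whoever holds it. */
static void ticket_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}

/* Single-threaded walk-through of the hand-off order. */
static bool ticket_demo(void)
{
    struct ticket_lock l;
    atomic_init(&l.next, 0);
    atomic_init(&l.owner, 0);

    /* Uncontended case: acquire returns immediately. */
    ticket_acquire(&l);
    ticket_release(&l);

    /* Three contenders queue up in order a, b, c. */
    unsigned a = ticket_take(&l);
    unsigned b = ticket_take(&l);
    unsigned c = ticket_take(&l);
    bool ok = ticket_ready(&l, a) && !ticket_ready(&l, b) && !ticket_ready(&l, c);

    /* a releases. The lock is now free, but only b may take it.
     * If b's vcpu is preempted at this point, c keeps spinning. */
    ticket_release(&l);
    ok = ok && ticket_ready(&l, b) && !ticket_ready(&l, c);

    /* Only after b has run and released does c get its turn. */
    ticket_release(&l);
    ok = ok && ticket_ready(&l, c);
    return ok;
}
```

Calling ticket_demo() from a main() walks through the queue without any threads; a lock-byte spinlock has no such ordering, so any running vcpu can grab a free lock.
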
The second test was with lock-byte spinlocks:

2.6.26-rc8-tip - bytelocks
Average Half load -j 3 Run (std deviation):
Elapsed Time 410.686 (2.49314)
User Time 564.596 (0.710408)
System Time 130.2 (0.519856)
Percent CPU 168.6 (1.34164)
Context Switches 53195.8 (599.579)
Sleeps 49026 (568.152)

Average Optimal load -j 16 Run (std deviation):
Elapsed Time 326.226 (0.158367)
User Time 552.268 (13.0477)
System Time 117.686 (13.2014)
Percent CPU 182.9 (15.103)
Context Switches 68198.8 (15849.9)
Sleeps 51708.1 (2857.7)

vcpu use:
fedora9-x86_32 246 0 1 -b- 2050.1 any cpu
fedora9-x86_32 246 1 0 -b- 2044.4 any cpu
fedora9-x86_32 246 2 1 -b- 2032.3 any cpu
fedora9-x86_32 246 3 0 -b- 2024.1 any cpu

This shows that the old spinlock behaviour has better performance. For one, the test completed properly under load. The half-load test shows about the same amount of user time, but less system time and better cpu utilisation.

"xm vcpu-list" shows about 2020-2050 seconds of cpu use per vcpu.

And with the xen-pv locks:

Fri Jul 4 18:37:36 BST 2008
2.6.26-rc8-tip - xenpv locks
Average Half load -j 3 Run (std deviation):
Elapsed Time 338.98 (0.932121)
User Time 567.326 (0.416569)
System Time 132.802 (1.56383)
Percent CPU 206 (0)
Context Switches 50225 (499.58)
Sleeps 48687.6 (542.278)

Average Optimal load -j 16 Run (std deviation):
Elapsed Time 323.176 (0.251555)
User Time 555.099 (12.898)
System Time 117.882 (15.7619)
Percent CPU 202.7 (3.49761)
Context Switches 67133.4 (17837.1)
Sleeps 51669.8 (3210.78)

fedora9-x86_32 4 0 1 -b- 1857.3 any cpu
fedora9-x86_32 4 1 1 -b- 1821.7 any cpu
fedora9-x86_32 4 2 0 -b- 1821.0 any cpu
fedora9-x86_32 4 3 0 r-- 1787.3 any cpu

The pv locks show a marked improvement again: the cpu utilisation is up to the ideal 200%, and elapsed time is lower (at least for the half load). System time and user time are about the same or slightly worse.

But the most significant result is the overall reduced CPU usage shown by xm vcpu-list (roughly 1790-1860 seconds per vcpu, against 2020-2050 with bytelocks).
This shows that even if the guest performance is more or less unchanged, the pv locks improve overall system scaling.

The pv-spinlock algorithm sets up an event channel for each vcpu. After spinning for 2^10 iterations, a waiter falls into a poll hypercall waiting for an event. When the lock holder releases the lock, it checks to see if anyone is waiting and kicks them with an IPI event to unblock them.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
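
The spin-then-block steps described above (spin for 2^10 iterations, block waiting for an event, get kicked on release) can be sketched in user space as follows. This is only an analogy under stated assumptions, not the Xen implementation: a pthread condition variable stands in for the per-vcpu event channel, pthread_cond_wait for the poll hypercall, and pthread_cond_signal for the IPI kick. All names (spb_lock, spb_unlock, SPIN_LIMIT, run_demo and so on) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_LIMIT (1u << 10)   /* 2^10 spins before blocking, per the text */

struct spb_lock {
    atomic_bool locked;     /* lock byte: true while held */
    atomic_int waiters;     /* threads that gave up spinning */
    pthread_mutex_t mtx;    /* protects the condvar (assumed stand-in) */
    pthread_cond_t kick;    /* plays the role of the event channel */
};

static void spb_init(struct spb_lock *l)
{
    atomic_init(&l->locked, false);
    atomic_init(&l->waiters, 0);
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->kick, NULL);
}

static bool spb_trylock(struct spb_lock *l)
{
    return !atomic_exchange(&l->locked, true);
}

static void spb_lock(struct spb_lock *l)
{
    /* Fast path: bounded spin on the lock byte. */
    for (unsigned i = 0; i < SPIN_LIMIT; i++)
        if (spb_trylock(l))
            return;

    /* Slow path: register as a waiter, then sleep until kicked. */
    pthread_mutex_lock(&l->mtx);
    atomic_fetch_add(&l->waiters, 1);
    while (!spb_trylock(l))
        pthread_cond_wait(&l->kick, &l->mtx);
    atomic_fetch_sub(&l->waiters, 1);
    pthread_mutex_unlock(&l->mtx);
}

static void spb_unlock(struct spb_lock *l)
{
    atomic_store(&l->locked, false);
    if (atomic_load(&l->waiters) > 0) {     /* is anyone blocked? */
        pthread_mutex_lock(&l->mtx);
        pthread_cond_signal(&l->kick);      /* the "kick" */
        pthread_mutex_unlock(&l->mtx);
    }
}

/* Demo: several threads bump a shared counter under the lock. */
enum { NTHREADS = 4, NITERS = 10000 };
static struct spb_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        spb_lock(&demo_lock);
        demo_counter++;
        spb_unlock(&demo_lock);
    }
    return NULL;
}

static long run_demo(void)
{
    pthread_t t[NTHREADS];
    demo_counter = 0;
    spb_init(&demo_lock);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling run_demo() from a main() (built with pthread support) should return NTHREADS * NITERS if the lock provides mutual exclusion. The design point carried over from the description is that a waiter stops burning cpu once its spin budget is exhausted, which is consistent with the lower vcpu times reported above.
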