At the most recent Xen Summit, Thomas Friebel presented a paper ("Preventing Guests from Spinning Around", http://xen.org/files/xensummitboston08/LHP.pdf) investigating the interactions between spinlocks and virtual machines. Specifically, he looked at what happens when a lock-holding VCPU gets involuntarily preempted.

The obvious first-order effect is that while the VCPU is not running, the effective critical-region time goes from microseconds to milliseconds, until it gets scheduled again. This increases the chance that there will be contention, and that the contending VCPU will waste time spinning. This is a measurable effect, but not terribly serious: since Linux tends to hold locks for very short periods of time, the likelihood of being preempted while holding a lock is low.

The real eye-opener is the secondary effects specific to ticket locks. Ticket locks suffer the same problem as all spinlocks, but when the lock holder releases the lock, the real fun begins. By design, ticket locks are strictly fair, imposing a FIFO order on lock holders. The micro-architectural effect of this is that the lock cache line will bounce around between the contending CPUs until it finds the next in line, who then takes the lock and carries on.

When running in a virtual machine, a similar effect happens at the VCPU level. If not all the contending VCPUs are currently running on real CPUs, the VCPU scheduler will run some random subset of them. If it isn't a given VCPU's turn to take the lock, it will spin, burning a VCPU timeslice. Eventually the next-in-line will get scheduled, take the lock, and release it, and the remaining contending VCPUs will repeat the process until the next in line is scheduled.
This means that the effective contention time of the lock is not merely the time it takes the original lock-holder to take and release the lock - including any preemption it may suffer - but also the spin-scheduling storm that follows, as the right VCPU is scheduled to next take the lock. This can happen even if the original contention was not a result of preemption, but just normal spinlock-level contention.

One of the results Thomas presents is a kernbench run which normally takes less than a minute going for 45 minutes, with 99+% of the time spent in ticket-lock contention. I've reproduced similar results.

This series has:
- a paravirt_ops spinlock interface, which defaults to the standard ticket-lock algorithm,
- a second spinlock implementation based on the pre-ticket-lock "lock-byte" algorithm,
- and a Xen-specific spinlock algorithm which voluntarily preempts a VCPU if it spins for too long. [FOR REFERENCE ONLY: will not apply to a current git tree.]

When running on native hardware, the overhead of enabling CONFIG_PARAVIRT is an extra direct call/return on the lock/unlock paths; the paravirt-ops patching machinery eliminates any indirect calls. With a small amount of restructuring, this overhead could be eliminated entirely (by making spin_lock()/unlock() inline functions containing calls to __raw_spin_lock/unlock).

My experiments show that using a Xen-specific lock helps guest performance a bit (a reduction in elapsed and system time in a kernbench run), but most significantly, it reduces overall physical CPU consumption by 10%, and so increases overall system scalability.

	J
--
Jeremy Fitzhardinge
2008-Jul-07 19:07 UTC
[PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations
An embedded and charset-unspecified text was scrubbed...
Name: paravirt-spinlocks.patch
Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20080707/7d54191c/attachment.txt
Jeremy Fitzhardinge
2008-Jul-07 19:07 UTC
[PATCH RFC 2/4] paravirt: introduce a "lock-byte" spinlock implementation
An embedded and charset-unspecified text was scrubbed...
Name: paravirt-byte-spinlock.patch
Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20080707/a915a035/attachment.txt
Jeremy Fitzhardinge
2008-Jul-07 19:07 UTC
[PATCH RFC 3/4] xen: use lock-byte spinlock implementation
An embedded and charset-unspecified text was scrubbed...
Name: xen-use-byte-locks.patch
Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20080707/c9f209f0/attachment.txt
Jeremy Fitzhardinge
2008-Jul-07 19:07 UTC
[PATCH RFC 4/4] xen: implement Xen-specific spinlocks
An embedded and charset-unspecified text was scrubbed...
Name: xen-pv-spinlocks.patch
Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20080707/596b2a31/attachment.txt
On Tuesday 08 July 2008 05:07:49 Jeremy Fitzhardinge wrote:
> At the most recent Xen Summit, Thomas Friebel presented a paper
> ("Preventing Guests from Spinning Around",
> http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
> interactions between spinlocks and virtual machines. Specifically, he
> looked at what happens when a lock-holding VCPU gets involuntarily
> preempted.

I find it interesting that gang scheduling the guest was not suggested as an obvious solution.

Anyway, concept looks fine; lguest's solution is more elegant of course :) A little disappointing that you can't patch your version inline.

Cheers,
Rusty.
* Jeremy Fitzhardinge <jeremy at goop.org> wrote:
> My experiments show that using a Xen-specific lock helps guest
> performance a bit (reduction in elapsed and system time in a kernbench
> run), but most significantly, reduces overall physical CPU consumption
> by 10%, and so increases overall system scalability.

that's rather impressive and looks nice, considering the fairly low impact.

as there were no fundamental objections in this thread i've created a tip/x86/paravirt-spinlocks topic branch for these patches and started testing them. i based the topic branch on tip/xen-64bit, so you should be able to get the latest code by doing:

  git-merge tip/x86/paravirt-spinlocks

on tip/master.

	Ingo
Ingo Molnar
2008-Jul-09 12:39 UTC
[patch] x86: paravirt spinlocks, modular build fix (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks)
another small fixlet.

	Ingo

----------------->
commit 0ea0f032767f41fb0a9586ab6acb5c1408ef16c5
Author: Ingo Molnar <mingo at elte.hu>
Date:   Wed Jul 9 14:39:15 2008 +0200

    x86: paravirt spinlocks, modular build fix

    fix:

      MODPOST 408 modules
    ERROR: "pv_lock_ops" [net/dccp/dccp.ko] undefined!
    ERROR: "pv_lock_ops" [fs/jbd2/jbd2.ko] undefined!
    ERROR: "pv_lock_ops" [drivers/media/common/saa7146_vv.ko] undefined!

    Signed-off-by: Ingo Molnar <mingo at elte.hu>

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 6aa8aed..3edfd7a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -472,6 +472,7 @@ struct pv_lock_ops pv_lock_ops = {
 	.spin_unlock = __ticket_spin_unlock,
 #endif
 };
+EXPORT_SYMBOL_GPL(pv_lock_ops);

 EXPORT_SYMBOL_GPL(pv_time_ops);
 EXPORT_SYMBOL    (pv_cpu_ops);
the patches caused a boot hang with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad

i have bisected it down to:

  commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
  Author: Jeremy Fitzhardinge <jeremy at goop.org>
  Date:   Mon Jul 7 12:07:53 2008 -0700

      xen: implement Xen-specific spinlocks

i.e. applying that patch alone causes the hang. The hang happens in the ftrace self-test:

  initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
  calling  init_sched_switch_trace+0x0/0x4c
  Testing tracer sched_switch: PASSED
  initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
  calling  init_function_trace+0x0/0x12
  Testing tracer ftrace:
  [hard hang]

it should have continued like this:

  Testing tracer ftrace: PASSED
  initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
  calling  init_irqsoff_tracer+0x0/0x14
  Testing tracer irqsoff: PASSED
  initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
  calling  init_mmio_trace+0x0/0x12
  initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs

note: ftrace is a user of raw spinlocks. Another speciality of the config is that it has paravirt enabled (obviously), and that it also has lock debugging enabled.

	Ingo
Ingo Molnar
2008-Jul-09 13:49 UTC
[patch] x86, paravirt-spinlocks: fix boot hang (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks)
* Ingo Molnar <mingo at elte.hu> wrote:
> the patches caused a boot hang with this config:
>
>   http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad

ok, fixed it via the patch below.

	Ingo

-------------->
commit 6412367fe22dc54cc727e7bec5bd65c36c1a0754
Author: Ingo Molnar <mingo at elte.hu>
Date:   Wed Jul 9 15:42:09 2008 +0200

    x86, paravirt-spinlocks: fix boot hang

    the paravirt-spinlock patches caused a boot hang with this config:

      http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad

    i have bisected it down to:

    | commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
    | Author: Jeremy Fitzhardinge <jeremy at goop.org>
    | Date:   Mon Jul 7 12:07:53 2008 -0700
    |
    |     xen: implement Xen-specific spinlocks

    i.e. applying that patch alone causes the hang. The hang happens in
    the ftrace self-test:

      initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
      calling  init_sched_switch_trace+0x0/0x4c
      Testing tracer sched_switch: PASSED
      initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
      calling  init_function_trace+0x0/0x12
      Testing tracer ftrace:
      [hard hang]

    it should have continued like this:

      Testing tracer ftrace: PASSED
      initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
      calling  init_irqsoff_tracer+0x0/0x14
      Testing tracer irqsoff: PASSED
      initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
      calling  init_mmio_trace+0x0/0x12
      initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs

    the problem is that such lowlevel primitives as spinlocks should
    never be built with -pg (which ftrace does). Marking paravirt.o as
    non-pg and marking all spinlock ops as always-inline solve the hang.
    Signed-off-by: Ingo Molnar <mingo at elte.hu>

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2dd1ccd..faba669 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,10 +7,11 @@ extra-y := head_$(BITS).o head$(BITS).o head.o init_task.o vmlinu
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)

 ifdef CONFIG_FTRACE
-# Do not profile debug utilities
+# Do not profile debug and lowlevel utilities
 CFLAGS_REMOVE_tsc_64.o = -pg
 CFLAGS_REMOVE_tsc_32.o = -pg
 CFLAGS_REMOVE_rtc.o = -pg
+CFLAGS_REMOVE_paravirt.o = -pg
 endif

 #
Ingo Molnar wrote:
> * Ingo Molnar <mingo at elte.hu> wrote:
>> the patches caused a boot hang with this config:
>>
>> http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad
>
> ok, fixed it via the patch below.
>
> [full patch quoted above trimmed]
>
> the problem is that such lowlevel primitives as spinlocks should never
> be built with -pg (which ftrace does). Marking paravirt.o as non-pg and
> marking all spinlock ops as always-inline solve the hang.

Thanks Ingo, that would have taken me a while to work out. Presumably that means that xen/smp.o should also be built without -pg.
In fact, should I move the spinlock code into its own file, so that ftrace can cover as much as it can?

	J