On 06/07/2016 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> change from v1:
>>	a simpler definition of default vcpu_is_preempted
>>	skip machine type check on ppc, and add config. remove dedicated macro.
>>	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>	add more comments
>>	thanks to Boqun and Peter for their suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some
>> spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused RCU stalls before we applied this
>> patch set.
>
> Paolo, could you help out with an (x86) KVM interface for this?

If it's just for spin loops, you can check if the version field in the
steal time structure has changed.

Paolo

> Waiman, could you see if you can utilize this to get rid of the
> SPIN_THRESHOLD in qspinlock_paravirt?
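For context, the steal-time record Paolo refers to above is the per-vCPU
structure the guest registers through MSR_KVM_STEAL_TIME; the host rewrites it
around scheduling events and bumps `version` seqcount-style (odd while an
update is in progress, even once complete). The layout below is from
arch/x86/include/uapi/asm/kvm_para.h at the time; the helper underneath is only
a rough sketch of the suggestion, not an existing interface, and
steal_time_of() is a hypothetical accessor for the target CPU's shared
steal-time area.

/* From arch/x86/include/uapi/asm/kvm_para.h (circa v4.7), shown for context. */
struct kvm_steal_time {
	__u64 steal;		/* ns the vCPU spent scheduled out */
	__u32 version;		/* bumped by the host around each update */
	__u32 flags;
	__u32 pad[12];
};

/*
 * Sketch of "check if the version field has changed": snapshot the version
 * before spinning and bail out of the spin loop once it differs.
 * steal_time_of(cpu) is a hypothetical per-CPU accessor.
 */
static inline bool kvm_vcpu_yielded_since(int cpu, u32 snapshot)
{
	/* Mask bit 0 so an in-progress (odd) update does not confuse the check. */
	return (READ_ONCE(steal_time_of(cpu)->version) & ~1u) != (snapshot & ~1u);
}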
On Wed, Jul 06, 2016 at 12:44:58PM +0200, Paolo Bonzini wrote:
>> Paolo, could you help out with an (x86) KVM interface for this?
>
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.

That would require remembering the old value, no? That would work with a
previous interface proposal, see:

  http://lkml.kernel.org/r/1466937715-6683-2-git-send-email-xinhui.pan at linux.vnet.ibm.com

the vcpu_get_yield_count() thing would match that I think.

However the current proposal:

  http://lkml.kernel.org/r/1467124991-13164-2-git-send-email-xinhui.pan at linux.vnet.ibm.com

dropped that in favour of only vcpu_is_preempted(), which requires being able
to tell if a (remote) vcpu is currently running or not, which iirc, isn't
possible with the steal time sequence count.
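To illustrate the difference Peter is drawing between the two proposals, here
is roughly how each interface shape would be used from an owner-spin loop in
the style of mutex_spin_on_owner(). This is illustration only, not code from
either patch: vcpu_get_yield_count() and vcpu_is_preempted() are the interfaces
from the two proposals linked above, and the surrounding loop is a simplified
stand-in.

/* (a) Yield-count style (earlier proposal): the caller must remember a snapshot. */
static bool spin_on_owner_yield_count(struct mutex *lock, struct task_struct *owner)
{
	int owner_cpu = task_cpu(owner);
	unsigned int yield = vcpu_get_yield_count(owner_cpu);

	while (READ_ONCE(lock->owner) == owner) {
		if (need_resched() || vcpu_get_yield_count(owner_cpu) != yield)
			return false;	/* owner's vCPU was scheduled out at least once */
		cpu_relax();
	}
	return true;
}

/* (b) vcpu_is_preempted() (current proposal): a stateless "is it running?" query. */
static bool spin_on_owner_preempted(struct mutex *lock, struct task_struct *owner)
{
	int owner_cpu = task_cpu(owner);

	while (READ_ONCE(lock->owner) == owner) {
		if (need_resched() || vcpu_is_preempted(owner_cpu))
			return false;	/* owner's vCPU is not running right now */
		cpu_relax();
	}
	return true;
}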
2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini at redhat.com>:
>
> On 06/07/2016 08:52, Peter Zijlstra wrote:
>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>> [...]
>>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some
>>> spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>> These spin_on_owner variants also caused RCU stalls before we applied this
>>> patch set.
>>
>> Paolo, could you help out with an (x86) KVM interface for this?
>
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.

Steal time is not updated until just before the next vmentry, except for a
wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is preempted
right now, can it?

Regards,
Wanpeng Li
On 06/07/2016 14:08, Wanpeng Li wrote:
> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini at redhat.com>:
>> On 06/07/2016 08:52, Peter Zijlstra wrote:
>>> [...]
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> If it's just for spin loops, you can check if the version field in the
>> steal time structure has changed.
>
> Steal time is not updated until just before the next vmentry, except for a
> wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is preempted
> right now, can it?

Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the VCPU has
been scheduled out since the last time the guest reset the bit. The guest can
use an xchg to test-and-clear it.

The bit can be accessed at any time, independent of the version field.

Paolo
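A minimal sketch of how a guest might consume the layout Paolo proposes above.
The flag name and the helper's name are made up for illustration and are not
part of any existing ABI; steal_time_of() is again a hypothetical per-CPU
accessor for the shared steal-time area. Note that this answers "has the vCPU
been scheduled out since the guest last cleared the field" rather than "is it
running right now".

/* Hypothetical name for the proposed flags bit 0; not an existing ABI. */
#define KVM_STEAL_FLAG_PREEMPT_VALID	(1u << 0)	/* pad[0] is a "VCPU preempted" field */

static bool kvm_vcpu_preempted_since_last_check(int cpu)
{
	struct kvm_steal_time *st = steal_time_of(cpu);	/* hypothetical accessor */

	if (!(READ_ONCE(st->flags) & KVM_STEAL_FLAG_PREEMPT_VALID))
		return false;	/* host does not expose the field */

	/*
	 * Test-and-clear with xchg, as suggested: pad[0] == 1 means the vCPU
	 * has been scheduled out since the guest last cleared it.
	 */
	return xchg(&st->pad[0], 0) == 1;
}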