Konrad Rzeszutek Wilk
2013-Sep-07 13:46 UTC
[PATCH 1/5] xen/spinlock: Fix locking path engaging too soon under PVHVM.
The xen_lock_spinning has a check for the kicker interrupts and if it is not initialised it will spin normally (not enter the slowpath). But for PVHVM case we would initialise the kicker interrupt before the CPU came online. This meant that if the booting CPU used a spinlock and went in the slowpath - it would enter the slowpath and block forever. The forever part b/c during bootup the interrupts are disabled - so the CPU would never get an IPI kick and would stay stuck in the slowpath logic forever. Why would the booting CPU never get an IPI kick? B/c in both PV and PVHVM we consult the cpu_online_mask to determine whether the IPI should go to its CPU destination. Since the booting CPU has not yet finished and set that flag, it meant that if any spinlocks were taken before the booting CPU had gotten to: set_cpu_online(smp_processor_id(), true); it (booting CPU) we would never get the unkicker IPI (from xen_unlock_kick) and block forever. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/enlighten.c | 1 - arch/x86/xen/smp.c | 9 +++++++++ 2 files changed, 9 insertions(+), 1 deletions(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 193097e..fbc002c 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1689,7 +1689,6 @@ static int xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action, case CPU_UP_PREPARE: xen_vcpu_setup(cpu); if (xen_have_vector_callback) { - xen_init_lock_cpu(cpu); if (xen_feature(XENFEAT_hvm_safe_pvclock)) xen_setup_timer(cpu); } diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index 597655b..4db779d 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -703,6 +703,15 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle) WARN_ON(rc); if (!rc) rc = native_cpu_up(cpu, tidle); + + /* + * We must initialize the slowpath CPU kicker _after_ the native + * path has executed. If we initialized it before none of the + * unlocker IPI kicks would reach the booting CPU as the booting + * CPU had not set itself ''online'' in cpu_online_mask. That mask + * is checked when IPIs are sent (on HVM at least). + */ + xen_init_lock_cpu(cpu); return rc; } -- 1.7.7.6
David Vrabel
2013-Sep-09 10:31 UTC
Re: [PATCH 1/5] xen/spinlock: Fix locking path engaging too soon under PVHVM.
On 07/09/13 14:46, Konrad Rzeszutek Wilk wrote:> The xen_lock_spinning has a check for the kicker interrupts > and if it is not initialised it will spin normally (not enter > the slowpath). > > But for PVHVM case we would initialise the kicker interrupt > before the CPU came online. This meant that if the booting > CPU used a spinlock and went in the slowpath - it would > enter the slowpath and block forever. The forever part b/cb/c? Ewww. Proper English please.> during bootup the interrupts are disabled - so the CPU would > never get an IPI kick and would stay stuck in the slowpath > logic forever.This description isn''t right -- VCPUs blocked in SCHEDOP_poll can be unblocked on the event they''re waiting for even if local irq delivery is disabled.> Why would the booting CPU never get an IPI kick? B/c in both > PV and PVHVM we consult the cpu_online_mask to determine whether > the IPI should go to its CPU destination. Since the booting > CPU has not yet finished and set that flag, it meant that > if any spinlocks were taken before the booting CPU had gotten to:I can''t find where the online mask is being checked in xen_send_IPI_one(). Is this really the reason why it didn''t work? This fix looks fine but both the description and the comment need to be fixed/clarified. David
Konrad Rzeszutek Wilk
2013-Sep-09 13:06 UTC
Re: [PATCH 1/5] xen/spinlock: Fix locking path engaging too soon under PVHVM.
On Mon, Sep 09, 2013 at 11:31:23AM +0100, David Vrabel wrote:> On 07/09/13 14:46, Konrad Rzeszutek Wilk wrote: > > The xen_lock_spinning has a check for the kicker interrupts > > and if it is not initialised it will spin normally (not enter > > the slowpath). > > > > But for PVHVM case we would initialise the kicker interrupt > > before the CPU came online. This meant that if the booting > > CPU used a spinlock and went in the slowpath - it would > > enter the slowpath and block forever. The forever part b/c > > b/c? Ewww. Proper English please. > > > during bootup the interrupts are disabled - so the CPU would > > never get an IPI kick and would stay stuck in the slowpath > > logic forever. > > This description isn''t right -- VCPUs blocked in SCHEDOP_poll can be > unblocked on the event they''re waiting for even if local irq delivery is > disabled. > > > Why would the booting CPU never get an IPI kick? B/c in both > > PV and PVHVM we consult the cpu_online_mask to determine whether > > the IPI should go to its CPU destination. Since the booting > > CPU has not yet finished and set that flag, it meant that > > if any spinlocks were taken before the booting CPU had gotten to: > > I can''t find where the online mask is being checked in > xen_send_IPI_one(). Is this really the reason why it didn''t work?More details in fc78d343fa74514f6fd117b5ef4cd27e4ac30236 Author: Chuck Anderson <chuck.anderson@oracle.com> Date: Tue Aug 6 15:12:19 2013 -0700 xen/smp: initialize IPI vectors before marking CPU online I will add that part in.> > This fix looks fine but both the description and the comment need to be > fixed/clarified.U r Right!> > David
Konrad Rzeszutek Wilk
2013-Sep-09 15:11 UTC
[PATCH 1/5] xen/spinlock: Fix locking path engaging too soon under PVHVM.
The xen_lock_spinning has a check for the kicker interrupts and if it is not initialized it will spin normally (not enter the slowpath). But for PVHVM case we would initialize the kicker interrupt before the CPU came online. This meant that if the booting CPU used a spinlock and went in the slowpath - it would enter the slowpath and block forever. The forever part because during bootup: the spinlock would be taken _before_ the CPU sets itself to be online (more on this further), and we enter to poll on the event channel forever. The bootup CPU (see commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 "xen/smp: initialize IPI vectors before marking CPU online" for details) and the CPU that started the bootup consult the cpu_online_mask to determine whether the booting CPU should get an IPI. The booting CPU has to set itself in this mask via: set_cpu_online(smp_processor_id(), true); However, if the spinlock is taken before this (and it is) and it polls on an event channel - it will never be woken up as the kernel will never send an IPI to an offline CPU. Note that the PVHVM logic in sending IPIs is using the HVM path which has numerous checks using the cpu_online_mask and cpu_active_mask. See above mention git commit for details. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> --- arch/x86/xen/enlighten.c | 1 - arch/x86/xen/smp.c | 9 +++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 2fc216d..fa6ade7 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1692,7 +1692,6 @@ static int xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action, case CPU_UP_PREPARE: xen_vcpu_setup(cpu); if (xen_have_vector_callback) { - xen_init_lock_cpu(cpu); if (xen_feature(XENFEAT_hvm_safe_pvclock)) xen_setup_timer(cpu); } diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index 9235842..c21b825 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -709,6 +709,15 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle) WARN_ON(rc); if (!rc) rc = native_cpu_up(cpu, tidle); + + /* + * We must initialize the slowpath CPU kicker _after_ the native + * path has executed. If we initialized it before none of the + * unlocker IPI kicks would reach the booting CPU as the booting + * CPU had not set itself ''online'' in cpu_online_mask. That mask + * is checked when IPIs are sent (on HVM at least). + */ + xen_init_lock_cpu(cpu); return rc; } -- 1.8.3.1