Waiman Long
2019-Mar-25  15:57 UTC
[PATCH] x86/paravirt: Guard against invalid cpu # in pv_vcpu_is_preempted()
It was found that passing an invalid cpu number to pv_vcpu_is_preempted()
might panic the kernel in a VM guest. For example,
[    2.531077] Oops: 0000 [#1] SMP PTI
  :
[    2.532545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[    2.533321] RIP: 0010:__raw_callee_save___kvm_vcpu_is_preempted+0x0/0x20
To guard against this kind of kernel panic, a check is added to
pv_vcpu_is_preempted() to make sure that no invalid cpu number will
be used.
Signed-off-by: Waiman Long <longman at redhat.com>
---
 arch/x86/include/asm/paravirt.h | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..4cfb465dcde4 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -671,6 +671,12 @@ static __always_inline void pv_kick(int cpu)
 
 static __always_inline bool pv_vcpu_is_preempted(long cpu)
 {
+	/*
+	 * Guard against invalid cpu number or the kernel might panic.
+	 */
+	if (WARN_ON_ONCE((unsigned long)cpu >= nr_cpu_ids))
+		return false;
+
 	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
 }
 
-- 
2.18.1
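
The range check in the patch above relies on the cast to unsigned long: a negative cpu value wraps to a very large number, so one comparison rejects both negative and too-large ids. A minimal userspace sketch of that idea follows. All sketch_* names, SKETCH_NR_CPU_IDS, and the flag table are hypothetical stand-ins (for nr_cpu_ids and for the per-CPU state the real backend reads); the WARN_ON_ONCE() reporting is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids. */
#define SKETCH_NR_CPU_IDS 4UL

/* Hypothetical per-CPU "preempted" flags; the real backend reads per-CPU data. */
static const bool sketch_preempted[SKETCH_NR_CPU_IDS] = {
	false, true, false, true
};

/* Unguarded backend: indexing with an invalid cpu is undefined behavior here. */
static bool sketch_backend_is_preempted(long cpu)
{
	return sketch_preempted[cpu];
}

/*
 * Guarded wrapper. Casting to unsigned long turns a negative cpu into a
 * huge value, so the single comparison covers cpu < 0 and cpu >= limit.
 */
static bool sketch_vcpu_is_preempted(long cpu)
{
	if ((unsigned long)cpu >= SKETCH_NR_CPU_IDS)
		return false;

	return sketch_backend_is_preempted(cpu);
}

/* Usage example: valid ids reach the backend, invalid ids return false. */
static void sketch_self_check(void)
{
	assert(sketch_vcpu_is_preempted(1));
	assert(!sketch_vcpu_is_preempted(0));
	assert(!sketch_vcpu_is_preempted(-1));
	assert(!sketch_vcpu_is_preempted(4));
}
```

Returning false for an invalid id mirrors the patch: the caller is told the vCPU is not preempted instead of dereferencing out-of-range per-CPU data.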
Juergen Gross
2019-Mar-25  16:40 UTC
[PATCH] x86/paravirt: Guard against invalid cpu # in pv_vcpu_is_preempted()
On 25/03/2019 16:57, Waiman Long wrote:
> It was found that passing an invalid cpu number to pv_vcpu_is_preempted()
> might panic the kernel in a VM guest. For example,
>
> [    2.531077] Oops: 0000 [#1] SMP PTI
>   :
> [    2.532545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [    2.533321] RIP: 0010:__raw_callee_save___kvm_vcpu_is_preempted+0x0/0x20
>
> To guard against this kind of kernel panic, a check is added to
> pv_vcpu_is_preempted() to make sure that no invalid cpu number will
> be used.
>
> Signed-off-by: Waiman Long <longman at redhat.com>
> ---
>  arch/x86/include/asm/paravirt.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index c25c38a05c1c..4cfb465dcde4 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -671,6 +671,12 @@ static __always_inline void pv_kick(int cpu)
>
>  static __always_inline bool pv_vcpu_is_preempted(long cpu)
>  {
> +	/*
> +	 * Guard against invalid cpu number or the kernel might panic.
> +	 */
> +	if (WARN_ON_ONCE((unsigned long)cpu >= nr_cpu_ids))
> +		return false;
> +
>  	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
>  }

Can this really happen without being a programming error?

Basically you'd need to guard all percpu area accesses to foreign cpus
this way. Why is this one special?

Juergen
Waiman Long
2019-Mar-25  18:03 UTC
[PATCH] x86/paravirt: Guard against invalid cpu # in pv_vcpu_is_preempted()
On 03/25/2019 12:40 PM, Juergen Gross wrote:
> On 25/03/2019 16:57, Waiman Long wrote:
>> It was found that passing an invalid cpu number to pv_vcpu_is_preempted()
>> might panic the kernel in a VM guest. For example,
>>
>> [    2.531077] Oops: 0000 [#1] SMP PTI
>>   :
>> [    2.532545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [    2.533321] RIP: 0010:__raw_callee_save___kvm_vcpu_is_preempted+0x0/0x20
>>
>> To guard against this kind of kernel panic, a check is added to
>> pv_vcpu_is_preempted() to make sure that no invalid cpu number will
>> be used.
>>
>> Signed-off-by: Waiman Long <longman at redhat.com>
>> ---
>>  arch/x86/include/asm/paravirt.h | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>> index c25c38a05c1c..4cfb465dcde4 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -671,6 +671,12 @@ static __always_inline void pv_kick(int cpu)
>>
>>  static __always_inline bool pv_vcpu_is_preempted(long cpu)
>>  {
>> +	/*
>> +	 * Guard against invalid cpu number or the kernel might panic.
>> +	 */
>> +	if (WARN_ON_ONCE((unsigned long)cpu >= nr_cpu_ids))
>> +		return false;
>> +
>>  	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
>>  }
> Can this really happen without being a programming error?

This shouldn't happen without a programming error, I think. In my case,
it was caused by a race condition leading to a use-after-free of the cpu
number. However, my point is that an error like that shouldn't cause the
kernel to panic.

> Basically you'd need to guard all percpu area accesses to foreign cpus
> this way. Why is this one special?

It depends. If an out-of-bounds access can only happen with an obvious
programming error, I don't think we need to guard against it. In this
case, I am not totally sure whether the race condition that I found can
happen with the existing code or not. To be prudent, I decided to send
this patch out.

The race condition that I am looking at is as follows:

  CPU 0                         CPU 1
  -----                         -----
up_write:
  owner = NULL;
  <release-barrier>
  count = 0;
<rcu-free task structure>
                              rwsem_can_spin_on_owner:
                                rcu_read_lock();
                                read owner;
                                  :
                                vcpu_is_preempted(owner->cpu);
                                  :
                                rcu_read_unlock()

When I tried to merge the owner into the count (clear the owner after
the barrier), I could reproduce the crash 100% of the time when booting
up the kernel in a VM guest. However, I am not sure whether the
configuration above is actually safe or the crash is just very hard to
reproduce.

Alternatively, I can also do the cpu check before calling
vcpu_is_preempted().

Cheers,
Longman
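
The alternative mentioned above, checking the cpu number in the caller before calling vcpu_is_preempted(), could look roughly like the userspace sketch below. Everything here is an assumption made for illustration: struct alt_task, ALT_NR_CPUS, the flag table, and the alt_* helpers are hypothetical and are not the rwsem code. The owner's cpu field is read once into a local so that the validated value is the same one passed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU-id limit, standing in for nr_cpu_ids. */
#define ALT_NR_CPUS 4UL

/* Hypothetical task structure; only the cpu field matters for this sketch. */
struct alt_task {
	long cpu;
};

/* Hypothetical per-CPU "preempted" flags read by the unguarded helper. */
static const bool alt_preempted[ALT_NR_CPUS] = { false, true, false, false };

/* Unguarded helper: the caller must pass a valid cpu. */
static bool alt_vcpu_is_preempted(long cpu)
{
	return alt_preempted[cpu];
}

/*
 * Caller-side check: snapshot owner->cpu once, validate the snapshot,
 * and only then call the unguarded helper with that same value.
 */
static bool alt_owner_vcpu_preempted(const struct alt_task *owner)
{
	long cpu;

	if (owner == NULL)
		return false;

	cpu = owner->cpu;
	if ((unsigned long)cpu >= ALT_NR_CPUS)
		return false;

	return alt_vcpu_is_preempted(cpu);
}

/* Usage example with one valid and one stale-looking cpu value. */
static void alt_self_check(void)
{
	struct alt_task valid = { .cpu = 1 };
	struct alt_task stale = { .cpu = -1 };

	assert(alt_owner_vcpu_preempted(&valid));
	assert(!alt_owner_vcpu_preempted(&stale));
	assert(!alt_owner_vcpu_preempted(NULL));
}
```

A check like this only limits the damage from a stale cpu value; it does not make reading a freed task structure safe, so the use-after-free itself would still need to be fixed.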