Xen/MCE: vMCE injection

In our testing of MCE handling in a Win8 guest, we found a bug: no matter
which SRAO/SRAR error Xen injects into the Win8 guest, the guest always
reboots.

The root cause is that the current Xen vMCE logic injects vMCE# only into
vcpu0, which is not correct for Intel MCE (on Intel architecture, the
hardware delivers MCE# to all CPUs).

This patch fixes the vMCE injection bug by injecting vMCE# into all vcpus.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r 133664c6bfb4 xen/arch/x86/cpu/mcheck/vmce.c
--- a/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 22:39:11 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 23:46:38 2012 +0800
@@ -340,48 +340,27 @@
 
 int inject_vmce(struct domain *d)
 {
-    int cpu = smp_processor_id();
+    struct vcpu *v;
 
-    /* PV guest and HVM guest have different vMCE# injection methods. */
-    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
+    /* inject vMCE to all vcpus */
+    for_each_vcpu(d, v)
     {
-        if ( d->is_hvm )
+        if ( !test_and_set_bool(v->mce_pending) &&
+            ((d->is_hvm) ||
+            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
         {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
-                       d->domain_id);
-            vcpu_kick(d->vcpu[0]);
+            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
+                       d->domain_id, v->vcpu_id);
+            vcpu_kick(v);
         }
         else
         {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
-                       d->domain_id);
-            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
-            {
-                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
-                             d->vcpu[0]->cpu_affinity);
-                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
-                           cpu, d->vcpu[0]->processor);
-                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
-                vcpu_kick(d->vcpu[0]);
-            }
-            else
-            {
-                mce_printk(MCE_VERBOSE,
-                           "MCE: Kill PV guest with No MCE handler\n");
-                domain_crash(d);
-            }
+            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
+                       d->domain_id, v->vcpu_id);
+            return -1;
         }
     }
-    else
-    {
-        /* new vMCE comes while first one has not been injected yet,
-         * in this case, inject fail. [We can't lose this vMCE for
-         * the mce node's consistency].
-         */
-        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
-                   " to this DOM%d!\n", d->domain_id);
-        return -1;
-    }
+
     return 0;
 }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On 09/19/12 10:03, Liu, Jinsong wrote:
> Xen/MCE: vMCE injection
>
> In our testing of MCE handling in a Win8 guest, we found a bug: no matter
> which SRAO/SRAR error Xen injects into the Win8 guest, the guest always
> reboots.
>
> The root cause is that the current Xen vMCE logic injects vMCE# only into
> vcpu0, which is not correct for Intel MCE (on Intel architecture, the
> hardware delivers MCE# to all CPUs).
>
> This patch fixes the vMCE injection bug by injecting vMCE# into all vcpus.

This breaks the AMD way. The AMD way is to inject it only into vcpu0.
I suggest adding a flag argument to inject_vmce() that says whether
to inject into all vcpus or just vcpu0.
Then set/clear that flag on the caller side depending on whether you
run on Intel or AMD.

Christoph

>
> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
>
> diff -r 133664c6bfb4 xen/arch/x86/cpu/mcheck/vmce.c
> --- a/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 22:39:11 2012 +0800
> +++ b/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 23:46:38 2012 +0800
> @@ -340,48 +340,27 @@
>
>  int inject_vmce(struct domain *d)
>  {
> -    int cpu = smp_processor_id();
> +    struct vcpu *v;
>
> -    /* PV guest and HVM guest have different vMCE# injection methods. */
> -    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
> +    /* inject vMCE to all vcpus */
> +    for_each_vcpu(d, v)
>      {
> -        if ( d->is_hvm )
> +        if ( !test_and_set_bool(v->mce_pending) &&
> +            ((d->is_hvm) ||
> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
>          {
> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
> -                       d->domain_id);
> -            vcpu_kick(d->vcpu[0]);
> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
> +                       d->domain_id, v->vcpu_id);
> +            vcpu_kick(v);
>          }
>          else
>          {
> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
> -                       d->domain_id);
> -            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
> -            {
> -                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
> -                             d->vcpu[0]->cpu_affinity);
> -                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
> -                           cpu, d->vcpu[0]->processor);
> -                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
> -                vcpu_kick(d->vcpu[0]);
> -            }
> -            else
> -            {
> -                mce_printk(MCE_VERBOSE,
> -                           "MCE: Kill PV guest with No MCE handler\n");
> -                domain_crash(d);
> -            }
> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
> +                       d->domain_id, v->vcpu_id);
> +            return -1;
>          }
>      }
> -    else
> -    {
> -        /* new vMCE comes while first one has not been injected yet,
> -         * in this case, inject fail. [We can't lose this vMCE for
> -         * the mce node's consistency].
> -         */
> -        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
> -                   " to this DOM%d!\n", d->domain_id);
> -        return -1;
> -    }
> +
>      return 0;
>  }
>

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
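The suggestion in the reply above (a flag chosen by the caller according to the CPU vendor) can be illustrated with a small standalone sketch. This is not code from the thread or from Xen: `toy_vendor`, `toy_inject_to_all()` and `toy_mark_vmce_targets()` are hypothetical names, and the array of `bool` only stands in for the per-vcpu `mce_pending` flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor tag; real Xen code identifies the vendor differently. */
enum toy_vendor { TOY_VENDOR_INTEL, TOY_VENDOR_AMD };

/*
 * Caller-side policy as described in the thread: Intel delivers MCE# to
 * all CPUs, so the virtual MCE is broadcast; AMD targets vcpu0 only.
 */
bool toy_inject_to_all(enum toy_vendor vendor)
{
    return vendor == TOY_VENDOR_INTEL;
}

/*
 * Mark the vcpus that would receive the vMCE. pending[] models the
 * per-vcpu mce_pending flags. Returns the number of vcpus targeted.
 */
size_t toy_mark_vmce_targets(bool *pending, size_t nr_vcpus, bool all_vcpus)
{
    assert(pending != NULL && nr_vcpus > 0);

    size_t targets = all_vcpus ? nr_vcpus : 1;

    for (size_t i = 0; i < targets; i++)
        pending[i] = true;

    return targets;
}
```

A caller would compute the flag once, for example `toy_mark_vmce_targets(pending, nr_vcpus, toy_inject_to_all(vendor))`, so the injection routine itself stays vendor-neutral.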
Christoph Egger wrote:
> On 09/19/12 10:03, Liu, Jinsong wrote:
>
>> Xen/MCE: vMCE injection
>>
>> In our testing of MCE handling in a Win8 guest, we found a bug: no matter
>> which SRAO/SRAR error Xen injects into the Win8 guest, the guest always
>> reboots.
>>
>> The root cause is that the current Xen vMCE logic injects vMCE# only into
>> vcpu0, which is not correct for Intel MCE (on Intel architecture, the
>> hardware delivers MCE# to all CPUs).
>>
>> This patch fixes the vMCE injection bug by injecting vMCE# into all vcpus.
>
> This breaks the AMD way. The AMD way is to inject it only into vcpu0.
> I suggest adding a flag argument to inject_vmce() that says whether
> to inject into all vcpus or just vcpu0.
> Then set/clear that flag on the caller side depending on whether you
> run on Intel or AMD.
>
> Christoph
>

No, it doesn't break AMD, since it is only called by intel_memerr_dhandler().

Thanks,
Jinsong

>>
>> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
>>
>> diff -r 133664c6bfb4 xen/arch/x86/cpu/mcheck/vmce.c
>> --- a/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 22:39:11 2012 +0800
>> +++ b/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 23:46:38 2012 +0800
>> @@ -340,48 +340,27 @@
>>
>>  int inject_vmce(struct domain *d)
>>  {
>> -    int cpu = smp_processor_id();
>> +    struct vcpu *v;
>>
>> -    /* PV guest and HVM guest have different vMCE# injection methods. */
>> -    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
>> +    /* inject vMCE to all vcpus */
>> +    for_each_vcpu(d, v)
>>      {
>> -        if ( d->is_hvm )
>> +        if ( !test_and_set_bool(v->mce_pending) &&
>> +            ((d->is_hvm) ||
>> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
>>          {
>> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
>> -                       d->domain_id);
>> -            vcpu_kick(d->vcpu[0]);
>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            vcpu_kick(v);
>>          }
>>          else
>>          {
>> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
>> -                       d->domain_id);
>> -            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
>> -            {
>> -                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
>> -                             d->vcpu[0]->cpu_affinity);
>> -                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
>> -                           cpu, d->vcpu[0]->processor);
>> -                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
>> -                vcpu_kick(d->vcpu[0]);
>> -            }
>> -            else
>> -            {
>> -                mce_printk(MCE_VERBOSE,
>> -                           "MCE: Kill PV guest with No MCE handler\n");
>> -                domain_crash(d);
>> -            }
>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            return -1;
>>          }
>>      }
>> -    else
>> -    {
>> -        /* new vMCE comes while first one has not been injected yet,
>> -         * in this case, inject fail. [We can't lose this vMCE for
>> -         * the mce node's consistency].
>> -         */
>> -        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
>> -                   " to this DOM%d!\n", d->domain_id);
>> -        return -1;
>> -    }
>> +
>>      return 0;
>>  }
On 09/20/12 21:15, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 09/19/12 10:03, Liu, Jinsong wrote:
>>
>>> Xen/MCE: vMCE injection
>>>
>>> In our testing of MCE handling in a Win8 guest, we found a bug: no matter
>>> which SRAO/SRAR error Xen injects into the Win8 guest, the guest always
>>> reboots.
>>>
>>> The root cause is that the current Xen vMCE logic injects vMCE# only into
>>> vcpu0, which is not correct for Intel MCE (on Intel architecture, the
>>> hardware delivers MCE# to all CPUs).
>>>
>>> This patch fixes the vMCE injection bug by injecting vMCE# into all vcpus.
>>
>> This breaks the AMD way. The AMD way is to inject it only into vcpu0.
>> I suggest adding a flag argument to inject_vmce() that says whether
>> to inject into all vcpus or just vcpu0.
>> Then set/clear that flag on the caller side depending on whether you
>> run on Intel or AMD.
>>
>> Christoph
>>
>
> No, it doesn't break AMD, since it is only called by intel_memerr_dhandler().

But it will with the MCE patches I still have in my queue.

Christoph

> Thanks,
> Jinsong
>
>>>
>>> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
>>>
>>> diff -r 133664c6bfb4 xen/arch/x86/cpu/mcheck/vmce.c
>>> --- a/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 22:39:11 2012 +0800
>>> +++ b/xen/arch/x86/cpu/mcheck/vmce.c Tue Sep 18 23:46:38 2012 +0800
>>> @@ -340,48 +340,27 @@
>>>
>>>  int inject_vmce(struct domain *d)
>>>  {
>>> -    int cpu = smp_processor_id();
>>> +    struct vcpu *v;
>>>
>>> -    /* PV guest and HVM guest have different vMCE# injection methods. */
>>> -    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
>>> +    /* inject vMCE to all vcpus */
>>> +    for_each_vcpu(d, v)
>>>      {
>>> -        if ( d->is_hvm )
>>> +        if ( !test_and_set_bool(v->mce_pending) &&
>>> +            ((d->is_hvm) ||
>>> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
>>>          {
>>> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
>>> -                       d->domain_id);
>>> -            vcpu_kick(d->vcpu[0]);
>>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>>> +                       d->domain_id, v->vcpu_id);
>>> +            vcpu_kick(v);
>>>          }
>>>          else
>>>          {
>>> -            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
>>> -                       d->domain_id);
>>> -            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
>>> -            {
>>> -                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
>>> -                             d->vcpu[0]->cpu_affinity);
>>> -                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
>>> -                           cpu, d->vcpu[0]->processor);
>>> -                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
>>> -                vcpu_kick(d->vcpu[0]);
>>> -            }
>>> -            else
>>> -            {
>>> -                mce_printk(MCE_VERBOSE,
>>> -                           "MCE: Kill PV guest with No MCE handler\n");
>>> -                domain_crash(d);
>>> -            }
>>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>>> +                       d->domain_id, v->vcpu_id);
>>> +            return -1;
>>>          }
>>>      }
>>> -    else
>>> -    {
>>> -        /* new vMCE comes while first one has not been injected yet,
>>> -         * in this case, inject fail. [We can't lose this vMCE for
>>> -         * the mce node's consistency].
>>> -         */
>>> -        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
>>> -                   " to this DOM%d!\n", d->domain_id);
>>> -        return -1;
>>> -    }
>>> +
>>>      return 0;
>>>  }
>
Xen/MCE: vMCE injection

For Intel MCE, broadcast vMCE to all vcpus;
for AMD MCE, inject vMCE to only one vcpu, say, vcpu0.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Suggested-by: Christoph Egger <Christoph.Egger@amd.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>

diff -r 570d98e2f1cf xen/arch/x86/cpu/mcheck/mce.h
--- a/xen/arch/x86/cpu/mcheck/mce.h Wed Sep 19 23:22:57 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/mce.h Wed Sep 26 18:59:03 2012 +0800
@@ -168,7 +168,7 @@
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
     uint64_t gstatus);
-int inject_vmce(struct domain *d);
+int inject_vmce(struct domain *d, int vcpuid);
 
 static inline int mce_vendor_bank_msr(const struct vcpu *v, uint32_t msr)
 {
diff -r 570d98e2f1cf xen/arch/x86/cpu/mcheck/mce_intel.c
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c Wed Sep 19 23:22:57 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c Wed Sep 26 18:59:03 2012 +0800
@@ -359,7 +359,7 @@
                 }
 
                 /* We will inject vMCE to DOMU*/
-                if ( inject_vmce(d) < 0 )
+                if ( inject_vmce(d, -1) < 0 )
                 {
                     mce_printk(MCE_QUIET, "inject vMCE to DOM%d"
                       " failed\n", d->domain_id);
diff -r 570d98e2f1cf xen/arch/x86/cpu/mcheck/vmce.c
--- a/xen/arch/x86/cpu/mcheck/vmce.c Wed Sep 19 23:22:57 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/vmce.c Wed Sep 26 18:59:03 2012 +0800
@@ -338,51 +338,44 @@
 HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt,
                           vmce_load_vcpu_ctxt, 1, HVMSR_PER_VCPU);
 
-int inject_vmce(struct domain *d)
+/*
+ * for Intel MCE, broadcast vMCE to all vcpus
+ * for AMD MCE, only inject vMCE to 1 vcpu, say, vcpu0
+ * @ d, domain to which would inject vmce
+ * @ vcpuid,
+ *   < 0, broadcast vMCE to all vcpus
+ *   >= 0, vcpu who would be injected vMCE
+ * return 0 for success injection, -1 for fail injection
+ */
+int inject_vmce(struct domain *d, int vcpuid)
 {
-    int cpu = smp_processor_id();
+    struct vcpu *v;
+    int ret = -1;
 
-    /* PV guest and HVM guest have different vMCE# injection methods. */
-    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
+    for_each_vcpu(d, v)
     {
-        if ( d->is_hvm )
+        if ( (vcpuid < 0) || (vcpuid == v->vcpu_id) )
         {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
-                       d->domain_id);
-            vcpu_kick(d->vcpu[0]);
-        }
-        else
-        {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
-                       d->domain_id);
-            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
+            if ( !test_and_set_bool(v->mce_pending) &&
+                ((d->is_hvm) ||
+                guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
             {
-                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
-                             d->vcpu[0]->cpu_affinity);
-                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
-                           cpu, d->vcpu[0]->processor);
-                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
-                vcpu_kick(d->vcpu[0]);
+                mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
+                           d->domain_id, v->vcpu_id);
+                vcpu_kick(v);
+                ret = 0;
             }
             else
             {
-                mce_printk(MCE_VERBOSE,
-                           "MCE: Kill PV guest with No MCE handler\n");
-                domain_crash(d);
+                mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
+                           d->domain_id, v->vcpu_id);
+                ret = -1;
+                break;
             }
         }
     }
-    else
-    {
-        /* new vMCE comes while first one has not been injected yet,
-         * in this case, inject fail. [We can't lose this vMCE for
-         * the mce node's consistency].
-         */
-        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
-                   " to this DOM%d!\n", d->domain_id);
-        return -1;
-    }
-    return 0;
+
+    return ret;
 }
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
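The `vcpuid` convention in the patch above (negative means broadcast, non-negative selects one vcpu) and its return value can be tried outside Xen with a toy model. The sketch below only mirrors the control flow of the patched `inject_vmce()`; the `toy_*` types and functions are hypothetical stand-ins, `toy_test_and_set()` is a non-atomic imitation of `test_and_set_bool()`, and `has_mce_callback` replaces the `guest_has_trap_callback()` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VCPUS 8

/* Hypothetical stand-ins for Xen's struct vcpu and struct domain. */
struct toy_vcpu {
    int vcpu_id;
    bool mce_pending;
    bool kicked;            /* models the effect of vcpu_kick() */
};

struct toy_domain {
    bool is_hvm;
    bool has_mce_callback;  /* models guest_has_trap_callback() for PV */
    int nr_vcpus;
    struct toy_vcpu vcpu[TOY_MAX_VCPUS];
};

/* Non-atomic model of test_and_set_bool(): set the flag, return the old value. */
bool toy_test_and_set(bool *flag)
{
    bool old = *flag;

    *flag = true;
    return old;
}

struct toy_domain toy_domain_init(int nr_vcpus, bool is_hvm, bool has_cb)
{
    struct toy_domain d = { .is_hvm = is_hvm, .has_mce_callback = has_cb,
                            .nr_vcpus = nr_vcpus };

    assert(nr_vcpus > 0 && nr_vcpus <= TOY_MAX_VCPUS);
    for (int i = 0; i < nr_vcpus; i++)
        d.vcpu[i].vcpu_id = i;
    return d;
}

/*
 * vcpuid < 0: broadcast to every vcpu; vcpuid >= 0: only that vcpu.
 * Returns 0 if injection succeeded, -1 on failure (including the case
 * where no vcpu matches vcpuid, because ret starts at -1).
 */
int toy_inject_vmce(struct toy_domain *d, int vcpuid)
{
    int ret = -1;

    assert(d != NULL);
    for (int i = 0; i < d->nr_vcpus; i++) {
        struct toy_vcpu *v = &d->vcpu[i];

        if (vcpuid >= 0 && vcpuid != v->vcpu_id)
            continue;

        if (!toy_test_and_set(&v->mce_pending) &&
            (d->is_hvm || d->has_mce_callback)) {
            v->kicked = true;
            ret = 0;
        } else {
            /* Already pending, or a PV guest without an MCE handler. */
            ret = -1;
            break;
        }
    }
    return ret;
}
```

In this model an Intel-style caller would pass `-1` and an AMD-style caller would pass `0`, which matches the `inject_vmce(d, -1)` call the patch adds in mce_intel.c.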
>>> On 26.09.12 at 05:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Xen/MCE: vMCE injection
>
> for Intel MCE, broadcast vMCE to all vcpus;
> for AMD MCE, only inject vMCE to 1 vcpu, say, vcpu0

Please double check what got committed.

Jan