Xen/MCA: bugfix for mca bank clear mcabank_clear() is common code for real h/w mca and s/w simulated mca. Under s/w simulated case, getting status via mca_rdmsr may trigger #GP if MCx_ADDR/MISC not supported by real h/w. This patch fix the bug. It always invalidates intpose for s/w simulated mca case, and do check real h/w status ADDRV/MISCV to avoid #GP when clear MCx_ADDR/MISC for h/w mca case. Reported-by: Ren Yongjie <yongjie.ren@intel.com> Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 08:05:15 2013 +0800 @@ -138,17 +138,28 @@ xfree(banks); } +/* + * Common code for real h/w mca and s/w simulated mca. + * Always invalidate intpose for s/w simulated mca case, and do check + * real h/w status ADDRV/MISCV to avoid #GP when clear MCx_ADDR/MISC + * for h/w mca case. + */ static void mcabank_clear(int banknum) { - uint64_t status; + uint64_t hw_status; - status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); + /* For s/w simulated mca case */ + intpose_inval(smp_processor_id(), MSR_IA32_MCx_ADDR(banknum)); + intpose_inval(smp_processor_id(), MSR_IA32_MCx_MISC(banknum)); - if (status & MCi_STATUS_ADDRV) - mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); - if (status & MCi_STATUS_MISCV) - mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); + /* For real h/w mca case */ + rdmsrl(MSR_IA32_MCx_STATUS(banknum), hw_status); + if (hw_status & MCi_STATUS_ADDRV) + wrmsrl(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); + if (hw_status & MCi_STATUS_MISCV) + wrmsrl(MSR_IA32_MCx_MISC(banknum), 0x0ULL); + /* For both cases */ mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 27.02.13 at 17:44, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > Xen/MCA: bugfix for mca bank clear > > mcabank_clear() is common code for real h/w mca and s/w simulated mca. > Under s/w simulated case, getting status via mca_rdmsr may trigger #GP > if MCx_ADDR/MISC not supported by real h/w. > > This patch fix the bug. It always invalidates intpose for s/w simulated > mca case, and do check real h/w status ADDRV/MISCV to avoid #GP when > clear MCx_ADDR/MISC for h/w mca case. > > Reported-by: Ren Yongjie <yongjie.ren@intel.com> > Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> > > diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c > --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 > +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 08:05:15 2013 +0800 > @@ -138,17 +138,28 @@ > xfree(banks); > } > > +/* > + * Common code for real h/w mca and s/w simulated mca. > + * Always invalidate intpose for s/w simulated mca case, and do check > + * real h/w status ADDRV/MISCV to avoid #GP when clear MCx_ADDR/MISC > + * for h/w mca case. > + */ > static void mcabank_clear(int banknum) > { > - uint64_t status; > + uint64_t hw_status; > > - status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); > + /* For s/w simulated mca case */ > + intpose_inval(smp_processor_id(), MSR_IA32_MCx_ADDR(banknum)); > + intpose_inval(smp_processor_id(), MSR_IA32_MCx_MISC(banknum));And no invalidation of MSR_IA32_MCx_STATUS(banknum)?> > - if (status & MCi_STATUS_ADDRV) > - mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); > - if (status & MCi_STATUS_MISCV) > - mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); > + /* For real h/w mca case */ > + rdmsrl(MSR_IA32_MCx_STATUS(banknum), hw_status); > + if (hw_status & MCi_STATUS_ADDRV) > + wrmsrl(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); > + if (hw_status & MCi_STATUS_MISCV) > + wrmsrl(MSR_IA32_MCx_MISC(banknum), 0x0ULL); > > + /* For both cases */ > mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL);Oh, that happens as a side effect of this one. Not really obvious, so likely worth a comment.> } >But anyway, I don''t think that''s quite right either: For one, I''m not sure the ordering of the MSR writes can be freely exchanged (originally you cleared STATUS first, while now it''s done last). And then I don''t see why you would do physical MSR accesses in the injection case at all. It should really be mca_wrmsr() to take care of this, it just needs to have a way to know it got called in the context of an injection. One fundamental question here is why the invalidation of the interposed data is being done before the MSR write rather than after. One thing we surely don''t care much about in the injection logic is interference with a real #MC. Jan
Jan Beulich wrote:>>>> On 27.02.13 at 17:44, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> Xen/MCA: bugfix for mca bank clear >> >> mcabank_clear() is common code for real h/w mca and s/w simulated >> mca. Under s/w simulated case, getting status via mca_rdmsr may >> trigger #GP if MCx_ADDR/MISC not supported by real h/w. >> >> This patch fix the bug. It always invalidates intpose for s/w >> simulated mca case, and do check real h/w status ADDRV/MISCV to >> avoid #GP when clear MCx_ADDR/MISC for h/w mca case. >> >> Reported-by: Ren Yongjie <yongjie.ren@intel.com> >> Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> >> >> diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c >> --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 >> +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 08:05:15 2013 +0800 >> @@ -138,17 +138,28 @@ xfree(banks); >> } >> >> +/* >> + * Common code for real h/w mca and s/w simulated mca. >> + * Always invalidate intpose for s/w simulated mca case, and do >> check + * real h/w status ADDRV/MISCV to avoid #GP when clear >> MCx_ADDR/MISC + * for h/w mca case. + */ >> static void mcabank_clear(int banknum) >> { >> - uint64_t status; >> + uint64_t hw_status; >> >> - status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); >> + /* For s/w simulated mca case */ >> + intpose_inval(smp_processor_id(), MSR_IA32_MCx_ADDR(banknum)); >> + intpose_inval(smp_processor_id(), MSR_IA32_MCx_MISC(banknum)); > > And no invalidation of MSR_IA32_MCx_STATUS(banknum)? > >> >> - if (status & MCi_STATUS_ADDRV) >> - mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); >> - if (status & MCi_STATUS_MISCV) >> - mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); >> + /* For real h/w mca case */ >> + rdmsrl(MSR_IA32_MCx_STATUS(banknum), hw_status); >> + if (hw_status & MCi_STATUS_ADDRV) >> + wrmsrl(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); >> + if (hw_status & MCi_STATUS_MISCV) >> + wrmsrl(MSR_IA32_MCx_MISC(banknum), 0x0ULL); >> >> + /* For both cases */ >> mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); > > Oh, that happens as a side effect of this one. Not really obvious, > so likely worth a comment.OK.> >> } >> > > But anyway, I don''t think that''s quite right either: For one, I''m not > sure the ordering of the MSR writes can be freely exchanged > (originally you cleared STATUS first, while now it''s done last).OK, adjust sequence strictly according to SDM.> > And then I don''t see why you would do physical MSR accesses in > the injection case at all. It should really be mca_wrmsr() to take > care of this, it just needs to have a way to know it got called in > the context of an injection. One fundamental question here is > why the invalidation of the interposed data is being done > before the MSR write rather than after. One thing we surely > don''t care much about in the injection logic is interference with > a real #MC. > > JanA way (say, a global bool flag) could be used to indicate s/w injection. Our concern here is, if s/w inject occur while real h/w mce, it may be race condition. But if you think it''s OK to tolerate it, it''s OK for me. Thanks, Jinsong
>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: > And then I don''t see why you would do physical MSR accesses in > the injection case at all. It should really be mca_wrmsr() to take > care of this, it just needs to have a way to know it got called in > the context of an injection. One fundamental question here is > why the invalidation of the interposed data is being done > before the MSR write rather than after. One thing we surely > don''t care much about in the injection logic is interference with > a real #MC.Like this (RFC only, untested so far), based on having gone through all call sites of mca_wrmsr(): --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu printk("intpose_add: interpose array full - request dropped\n"); } -void intpose_inval(unsigned int cpu_nr, uint64_t msr) +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) { - struct intpose_ent *ent; + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { - ent->cpu_nr = -1; - } + if ( !ent ) + return 0; + + ent->cpu_nr = -1; + return 1; } #define IS_MCA_BANKREG(r) \ --- a/xen/arch/x86/cpu/mcheck/mce.h +++ b/xen/arch/x86/cpu/mcheck/mce.h @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce /* Read an MSR, checking for an interposed value first */ extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, uint64_t *); -extern void intpose_inval(unsigned int, uint64_t); +extern bool_t intpose_inval(unsigned int, uint64_t); static inline uint64_t mca_rdmsr(unsigned int msr) { @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne /* Write an MSR, invalidating any interposed value */ #define mca_wrmsr(msr, val) do { \ - intpose_inval(smp_processor_id(), msr); \ - wrmsrl(msr, val); \ -} while (0) + if ( !intpose_inval(smp_processor_id(), msr) ) \ + wrmsrl(msr, val); \ +} while ( 0 ) /* Utility function to "logout" all architectural MCA telemetry from the MCA
Jan Beulich wrote:>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >> And then I don''t see why you would do physical MSR accesses in >> the injection case at all. It should really be mca_wrmsr() to take >> care of this, it just needs to have a way to know it got called in >> the context of an injection. One fundamental question here is >> why the invalidation of the interposed data is being done >> before the MSR write rather than after. One thing we surely >> don''t care much about in the injection logic is interference with >> a real #MC. > > Like this (RFC only, untested so far), based on having gone through > all call sites of mca_wrmsr(): > > --- a/xen/arch/x86/cpu/mcheck/mce.c > +++ b/xen/arch/x86/cpu/mcheck/mce.c > @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu > printk("intpose_add: interpose array full - request dropped\n"); > } > > -void intpose_inval(unsigned int cpu_nr, uint64_t msr) > +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) > { > - struct intpose_ent *ent; > + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); > > - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { > - ent->cpu_nr = -1; > - } > + if ( !ent ) > + return 0; > + > + ent->cpu_nr = -1; > + return 1; > } > > #define IS_MCA_BANKREG(r) \ > --- a/xen/arch/x86/cpu/mcheck/mce.h > +++ b/xen/arch/x86/cpu/mcheck/mce.h > @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce > /* Read an MSR, checking for an interposed value first */ > extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, > uint64_t *); > -extern void intpose_inval(unsigned int, uint64_t); > +extern bool_t intpose_inval(unsigned int, uint64_t); > > static inline uint64_t mca_rdmsr(unsigned int msr) > { > @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne > > /* Write an MSR, invalidating any interposed value */ > #define mca_wrmsr(msr, val) do { \ > - intpose_inval(smp_processor_id(), msr); \ > - wrmsrl(msr, val); \ > -} while (0) > + if ( !intpose_inval(smp_processor_id(), msr) ) \ > + wrmsrl(msr, val); \ > +} while ( 0 ) > > > /* Utility function to "logout" all architectural MCA telemetry from > the MCANo, it doesn''t work, considering mce broadcast to all cpus, while s/w only simulate 1 cpu. Thanks, Jinsong
>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > Jan Beulich wrote: >>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >>> And then I don''t see why you would do physical MSR accesses in >>> the injection case at all. It should really be mca_wrmsr() to take >>> care of this, it just needs to have a way to know it got called in >>> the context of an injection. One fundamental question here is >>> why the invalidation of the interposed data is being done >>> before the MSR write rather than after. One thing we surely >>> don''t care much about in the injection logic is interference with >>> a real #MC. >> >> Like this (RFC only, untested so far), based on having gone through >> all call sites of mca_wrmsr(): >> >> --- a/xen/arch/x86/cpu/mcheck/mce.c >> +++ b/xen/arch/x86/cpu/mcheck/mce.c >> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu >> printk("intpose_add: interpose array full - request dropped\n"); >> } >> >> -void intpose_inval(unsigned int cpu_nr, uint64_t msr) >> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) >> { >> - struct intpose_ent *ent; >> + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); >> >> - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { >> - ent->cpu_nr = -1; >> - } >> + if ( !ent ) >> + return 0; >> + >> + ent->cpu_nr = -1; >> + return 1; >> } >> >> #define IS_MCA_BANKREG(r) \ >> --- a/xen/arch/x86/cpu/mcheck/mce.h >> +++ b/xen/arch/x86/cpu/mcheck/mce.h >> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce >> /* Read an MSR, checking for an interposed value first */ >> extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, >> uint64_t *); >> -extern void intpose_inval(unsigned int, uint64_t); >> +extern bool_t intpose_inval(unsigned int, uint64_t); >> >> static inline uint64_t mca_rdmsr(unsigned int msr) >> { >> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne >> >> /* Write an MSR, invalidating any interposed value */ >> #define mca_wrmsr(msr, val) do { \ >> - intpose_inval(smp_processor_id(), msr); \ >> - wrmsrl(msr, val); \ >> -} while (0) >> + if ( !intpose_inval(smp_processor_id(), msr) ) \ >> + wrmsrl(msr, val); \ >> +} while ( 0 ) >> >> >> /* Utility function to "logout" all architectural MCA telemetry from >> the MCA > > No, it doesn''t work, considering mce broadcast to all cpus, while s/w only > simulate 1 cpu.I don''t see how that case would be handled any better with the invalidation happening before the MSR write (as is the case now). Please explain, Jan
Jan Beulich wrote:>>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> Jan Beulich wrote: >>>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >>>> And then I don''t see why you would do physical MSR accesses in >>>> the injection case at all. It should really be mca_wrmsr() to take >>>> care of this, it just needs to have a way to know it got called in >>>> the context of an injection. One fundamental question here is >>>> why the invalidation of the interposed data is being done >>>> before the MSR write rather than after. One thing we surely >>>> don''t care much about in the injection logic is interference with >>>> a real #MC. >>> >>> Like this (RFC only, untested so far), based on having gone through >>> all call sites of mca_wrmsr(): >>> >>> --- a/xen/arch/x86/cpu/mcheck/mce.c >>> +++ b/xen/arch/x86/cpu/mcheck/mce.c >>> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu >>> printk("intpose_add: interpose array full - request >>> dropped\n"); } >>> >>> -void intpose_inval(unsigned int cpu_nr, uint64_t msr) >>> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) { >>> - struct intpose_ent *ent; >>> + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); >>> >>> - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { >>> - ent->cpu_nr = -1; >>> - } >>> + if ( !ent ) >>> + return 0; >>> + >>> + ent->cpu_nr = -1; >>> + return 1; >>> } >>> >>> #define IS_MCA_BANKREG(r) \ >>> --- a/xen/arch/x86/cpu/mcheck/mce.h >>> +++ b/xen/arch/x86/cpu/mcheck/mce.h >>> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce >>> /* Read an MSR, checking for an interposed value first */ >>> extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, >>> uint64_t *); -extern void intpose_inval(unsigned int, uint64_t); >>> +extern bool_t intpose_inval(unsigned int, uint64_t); >>> >>> static inline uint64_t mca_rdmsr(unsigned int msr) { >>> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne >>> >>> /* Write an MSR, invalidating any interposed value */ >>> #define mca_wrmsr(msr, val) do { \ >>> - intpose_inval(smp_processor_id(), msr); \ >>> - wrmsrl(msr, val); \ >>> -} while (0) >>> + if ( !intpose_inval(smp_processor_id(), msr) ) \ + >>> wrmsrl(msr, val); \ +} while ( 0 ) >>> >>> >>> /* Utility function to "logout" all architectural MCA telemetry >>> from the MCA >> >> No, it doesn''t work, considering mce broadcast to all cpus, while >> s/w only simulate 1 cpu. > > I don''t see how that case would be handled any better with the > invalidation happening before the MSR write (as is the case now). > > Please explain, JanSorry for late reply. What I meant is, mca_wrmsr updated in this way is not quite clean, since per syntax it still has the risk ''sometimes access simulated value, sometimes access physical bank''. But of course your patch works for current xen-mceinj tools, so if you think it''s OK we will have a test at the specific machine and ACK later. Thanks, Jinsong
>>> On 11.03.13 at 11:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > Jan Beulich wrote: >>>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >>> Jan Beulich wrote: >>>>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >>>>> And then I don''t see why you would do physical MSR accesses in >>>>> the injection case at all. It should really be mca_wrmsr() to take >>>>> care of this, it just needs to have a way to know it got called in >>>>> the context of an injection. One fundamental question here is >>>>> why the invalidation of the interposed data is being done >>>>> before the MSR write rather than after. One thing we surely >>>>> don''t care much about in the injection logic is interference with >>>>> a real #MC. >>>> >>>> Like this (RFC only, untested so far), based on having gone through >>>> all call sites of mca_wrmsr(): >>>> >>>> --- a/xen/arch/x86/cpu/mcheck/mce.c >>>> +++ b/xen/arch/x86/cpu/mcheck/mce.c >>>> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu >>>> printk("intpose_add: interpose array full - request >>>> dropped\n"); } >>>> >>>> -void intpose_inval(unsigned int cpu_nr, uint64_t msr) >>>> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) { >>>> - struct intpose_ent *ent; >>>> + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); >>>> >>>> - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { >>>> - ent->cpu_nr = -1; >>>> - } >>>> + if ( !ent ) >>>> + return 0; >>>> + >>>> + ent->cpu_nr = -1; >>>> + return 1; >>>> } >>>> >>>> #define IS_MCA_BANKREG(r) \ >>>> --- a/xen/arch/x86/cpu/mcheck/mce.h >>>> +++ b/xen/arch/x86/cpu/mcheck/mce.h >>>> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce >>>> /* Read an MSR, checking for an interposed value first */ >>>> extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, >>>> uint64_t *); -extern void intpose_inval(unsigned int, uint64_t); >>>> +extern bool_t intpose_inval(unsigned int, uint64_t); >>>> >>>> static inline uint64_t mca_rdmsr(unsigned int msr) { >>>> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne >>>> >>>> /* Write an MSR, invalidating any interposed value */ >>>> #define mca_wrmsr(msr, val) do { \ >>>> - intpose_inval(smp_processor_id(), msr); \ >>>> - wrmsrl(msr, val); \ >>>> -} while (0) >>>> + if ( !intpose_inval(smp_processor_id(), msr) ) \ + >>>> wrmsrl(msr, val); \ +} while ( 0 ) >>>> >>>> >>>> /* Utility function to "logout" all architectural MCA telemetry >>>> from the MCA >>> >>> No, it doesn''t work, considering mce broadcast to all cpus, while >>> s/w only simulate 1 cpu. >> >> I don''t see how that case would be handled any better with the >> invalidation happening before the MSR write (as is the case now). >> >> Please explain, Jan > > Sorry for late reply. > > What I meant is, mca_wrmsr updated in this way is not quite clean, since per > syntax it still has the risk ''sometimes access simulated value, sometimes > access physical bank''. > > But of course your patch works for current xen-mceinj tools, so if you think > it''s OK we will have a test at the specific machine and ACK later.As said before, I don''t think we need to consider interference between injection and real events (or if we do, then I think there are other aspects that would need fixing too). Hence, as long as what I proposed works for the injection case, that ought to be better than what we have currently. Perhaps explicitly filtering out the not supported case of broadcast being requested with the rejection (iirc there''s a way to request this) would need to be added to the change suggested earlier. Jan
Jan Beulich wrote:>>>> On 11.03.13 at 11:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> Jan Beulich wrote: >>>>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@intel.com> >>>>>> wrote: >>>> Jan Beulich wrote: >>>>>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >>>>>> And then I don''t see why you would do physical MSR accesses in >>>>>> the injection case at all. It should really be mca_wrmsr() to >>>>>> take care of this, it just needs to have a way to know it got >>>>>> called in the context of an injection. One fundamental question >>>>>> here is >>>>>> why the invalidation of the interposed data is being done >>>>>> before the MSR write rather than after. One thing we surely >>>>>> don''t care much about in the injection logic is interference with >>>>>> a real #MC. >>>>> >>>>> Like this (RFC only, untested so far), based on having gone >>>>> through all call sites of mca_wrmsr(): >>>>> >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.c >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.c >>>>> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu >>>>> printk("intpose_add: interpose array full - request >>>>> dropped\n"); } >>>>> >>>>> -void intpose_inval(unsigned int cpu_nr, uint64_t msr) >>>>> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) { - >>>>> struct intpose_ent *ent; + struct intpose_ent *ent >>>>> intpose_lookup(cpu_nr, msr, NULL); >>>>> >>>>> - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { >>>>> - ent->cpu_nr = -1; >>>>> - } >>>>> + if ( !ent ) >>>>> + return 0; >>>>> + >>>>> + ent->cpu_nr = -1; >>>>> + return 1; >>>>> } >>>>> >>>>> #define IS_MCA_BANKREG(r) \ >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.h >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.h >>>>> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce >>>>> /* Read an MSR, checking for an interposed value first */ >>>>> extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, >>>>> uint64_t *); -extern void intpose_inval(unsigned int, uint64_t); >>>>> +extern bool_t intpose_inval(unsigned int, uint64_t); >>>>> >>>>> static inline uint64_t mca_rdmsr(unsigned int msr) { >>>>> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne >>>>> >>>>> /* Write an MSR, invalidating any interposed value */ >>>>> #define mca_wrmsr(msr, val) do { \ >>>>> - intpose_inval(smp_processor_id(), msr); \ >>>>> - wrmsrl(msr, val); \ >>>>> -} while (0) >>>>> + if ( !intpose_inval(smp_processor_id(), msr) ) \ + >>>>> wrmsrl(msr, val); \ +} while ( 0 ) >>>>> >>>>> >>>>> /* Utility function to "logout" all architectural MCA telemetry >>>>> from the MCA >>>> >>>> No, it doesn''t work, considering mce broadcast to all cpus, while >>>> s/w only simulate 1 cpu. >>> >>> I don''t see how that case would be handled any better with the >>> invalidation happening before the MSR write (as is the case now). >>> >>> Please explain, Jan >> >> Sorry for late reply. >> >> What I meant is, mca_wrmsr updated in this way is not quite clean, >> since per syntax it still has the risk ''sometimes access simulated >> value, sometimes access physical bank''. >> >> But of course your patch works for current xen-mceinj tools, so if >> you think it''s OK we will have a test at the specific machine and >> ACK later. > > As said before, I don''t think we need to consider interference > between injection and real events (or if we do, then I think there > are other aspects that would need fixing too). > > Hence, as long as what I proposed works for the injection case, > that ought to be better than what we have currently. Perhaps > explicitly filtering out the not supported case of broadcast being > requested with the rejection (iirc there''s a way to request this) > would need to be added to the change suggested earlier. > > JanRongjie, Would you please test Jan''s patch at *the* WSM-EX machine? It should be fine but test is necessary and better. Thanks, Jinsong
> -----Original Message----- > From: Liu, Jinsong > Sent: Tuesday, March 12, 2013 10:07 AM > To: Jan Beulich; Ren, Yongjie > Cc: xen-devel > Subject: RE: [Xen-devel] [PATCH 1/2] Xen/MCA: bugfix for mca bank clear > > Jan Beulich wrote: > >>>> On 11.03.13 at 11:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > >> Jan Beulich wrote: > >>>>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@intel.com> > >>>>>> wrote: > >>>> Jan Beulich wrote: > >>>>>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> > wrote: > >>>>>> And then I don''t see why you would do physical MSR accesses in > >>>>>> the injection case at all. It should really be mca_wrmsr() to > >>>>>> take care of this, it just needs to have a way to know it got > >>>>>> called in the context of an injection. One fundamental question > >>>>>> here is > >>>>>> why the invalidation of the interposed data is being done > >>>>>> before the MSR write rather than after. One thing we surely > >>>>>> don''t care much about in the injection logic is interference with > >>>>>> a real #MC. > >>>>> > >>>>> Like this (RFC only, untested so far), based on having gone > >>>>> through all call sites of mca_wrmsr(): > >>>>> > >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.c > >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.c > >>>>> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int > cpu > >>>>> printk("intpose_add: interpose array full - request > >>>>> dropped\n"); } > >>>>> > >>>>> -void intpose_inval(unsigned int cpu_nr, uint64_t msr) > >>>>> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) { - > >>>>> struct intpose_ent *ent; + struct intpose_ent *ent > >>>>> intpose_lookup(cpu_nr, msr, NULL); > >>>>> > >>>>> - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { > >>>>> - ent->cpu_nr = -1; > >>>>> - } > >>>>> + if ( !ent ) > >>>>> + return 0; > >>>>> + > >>>>> + ent->cpu_nr = -1; > >>>>> + return 1; > >>>>> } > >>>>> > >>>>> #define IS_MCA_BANKREG(r) \ > >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.h > >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.h > >>>>> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce > >>>>> /* Read an MSR, checking for an interposed value first */ > >>>>> extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, > >>>>> uint64_t *); -extern void intpose_inval(unsigned int, uint64_t); > >>>>> +extern bool_t intpose_inval(unsigned int, uint64_t); > >>>>> > >>>>> static inline uint64_t mca_rdmsr(unsigned int msr) { > >>>>> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne > >>>>> > >>>>> /* Write an MSR, invalidating any interposed value */ > >>>>> #define mca_wrmsr(msr, val) do { \ > >>>>> - intpose_inval(smp_processor_id(), msr); \ > >>>>> - wrmsrl(msr, val); \ > >>>>> -} while (0) > >>>>> + if ( !intpose_inval(smp_processor_id(), msr) ) \ + > >>>>> wrmsrl(msr, val); \ +} while ( 0 ) > >>>>> > >>>>> > >>>>> /* Utility function to "logout" all architectural MCA telemetry > >>>>> from the MCA > >>>> > >>>> No, it doesn''t work, considering mce broadcast to all cpus, while > >>>> s/w only simulate 1 cpu. > >>> > >>> I don''t see how that case would be handled any better with the > >>> invalidation happening before the MSR write (as is the case now). > >>> > >>> Please explain, Jan > >> > >> Sorry for late reply. > >> > >> What I meant is, mca_wrmsr updated in this way is not quite clean, > >> since per syntax it still has the risk ''sometimes access simulated > >> value, sometimes access physical bank''. > >> > >> But of course your patch works for current xen-mceinj tools, so if > >> you think it''s OK we will have a test at the specific machine and > >> ACK later. > > > > As said before, I don''t think we need to consider interference > > between injection and real events (or if we do, then I think there > > are other aspects that would need fixing too). > > > > Hence, as long as what I proposed works for the injection case, > > that ought to be better than what we have currently. Perhaps > > explicitly filtering out the not supported case of broadcast being > > requested with the rejection (iirc there''s a way to request this) > > would need to be added to the change suggested earlier. > > > > Jan > > Rongjie, > > Would you please test Jan''s patch at *the* WSM-EX machine? It should be > fine but test is necessary and better. >Jan, Your patch can work fine during my testing. Tested-by: Yongjie Ren <yongjie.ren@intel.com>
>>> On 12.03.13 at 09:05, "Ren, Yongjie" <yongjie.ren@intel.com> wrote: >> From: Liu, Jinsong >> Would you please test Jan''s patch at *the* WSM-EX machine? It should be >> fine but test is necessary and better. > > Your patch can work fine during my testing. > Tested-by: Yongjie Ren <yongjie.ren@intel.com>Jinsong, so I think your concerns regarding broadcast not working were in fact misguided then? With the tool interposing values on the target CPU only, MSR reads and writes should end up to be consistent on every individual CPU (either always reading real MSRs and clearing them based on those real values, or always reading interposed values, and skipping the clearing write upon detection of reads having returned fake values). Hence even if the tool interposed values on multiple CPUs, things ought to work fine as long as those values are consistent. If you agree, do you mind putting your acked-by under the patch? Jan
Jan Beulich wrote:>>>> On 12.03.13 at 09:05, "Ren, Yongjie" <yongjie.ren@intel.com> wrote: >>> From: Liu, Jinsong >>> Would you please test Jan''s patch at *the* WSM-EX machine? It >>> should be fine but test is necessary and better. >> >> Your patch can work fine during my testing. >> Tested-by: Yongjie Ren <yongjie.ren@intel.com> > > Jinsong, > > so I think your concerns regarding broadcast not working were in > fact misguided then? With the tool interposing values on the > target CPU only, MSR reads and writes should end up to be > consistent on every individual CPU (either always reading real > MSRs and clearing them based on those real values, or always > reading interposed values, and skipping the clearing write upon > detection of reads having returned fake values).Yes :)> > Hence even if the tool interposed values on multiple CPUs, things > ought to work fine as long as those values are consistent. > > If you agree, do you mind putting your acked-by under the patch? > > JanSure, Acked-by: Liu Jinsong <jinsong.liu@intel.com>
Jan Beulich wrote:>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@suse.com> wrote: >> And then I don''t see why you would do physical MSR accesses in >> the injection case at all. It should really be mca_wrmsr() to take >> care of this, it just needs to have a way to know it got called in >> the context of an injection. One fundamental question here is >> why the invalidation of the interposed data is being done >> before the MSR write rather than after. One thing we surely >> don''t care much about in the injection logic is interference with >> a real #MC. > > Like this (RFC only, untested so far), based on having gone through > all call sites of mca_wrmsr(): > > --- a/xen/arch/x86/cpu/mcheck/mce.c > +++ b/xen/arch/x86/cpu/mcheck/mce.c > @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int cpu > printk("intpose_add: interpose array full - request dropped\n"); > } > > -void intpose_inval(unsigned int cpu_nr, uint64_t msr) > +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr) > { > - struct intpose_ent *ent; > + struct intpose_ent *ent = intpose_lookup(cpu_nr, msr, NULL); > > - if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) { > - ent->cpu_nr = -1; > - } > + if ( !ent ) > + return 0; > + > + ent->cpu_nr = -1; > + return 1; > } > > #define IS_MCA_BANKREG(r) \ > --- a/xen/arch/x86/cpu/mcheck/mce.h > +++ b/xen/arch/x86/cpu/mcheck/mce.h > @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce > /* Read an MSR, checking for an interposed value first */ > extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t, > uint64_t *); > -extern void intpose_inval(unsigned int, uint64_t); > +extern bool_t intpose_inval(unsigned int, uint64_t); > > static inline uint64_t mca_rdmsr(unsigned int msr) > { > @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne > > /* Write an MSR, invalidating any interposed value */ > #define mca_wrmsr(msr, val) do { \ > - intpose_inval(smp_processor_id(), msr); \ > - wrmsrl(msr, val); \ > -} while (0) > + if ( !intpose_inval(smp_processor_id(), msr) ) \ > + wrmsrl(msr, val); \ > +} while ( 0 ) > > > /* Utility function to "logout" all architectural MCA telemetry from > the MCATested-by: Ren Yongjie <yongjie.ren@intel.com> Acked-by: Liu Jinsong <jinsong.liu@intel.com> Thanks, Jinsong