This work around an issue when test via xen-mceinj tools. when inject simulated error via xen-mceinj tools, status ADDRV/MISCV bits are simulated hence there is potential risk of #GP if h/w not really support MCi_ADDR/MISC. We temporarily work around by not clean them until we have clean solution. Reported-by: Ren Yongjie <yongjie.ren@intel.com> Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 00:34:22 2013 +0800 @@ -144,10 +144,19 @@ status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); +/* + * TODO: when inject simulated error via xen-mceinj tools, + * status ADDRV/MISCV bits are simulated hence there is + * potential risk of #GP if h/w not really support MCi_ADDR/MISC. + * We temporary work around by not clean them until we have + * clean solution. + */ +#if 0 if (status & MCi_STATUS_ADDRV) mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); if (status & MCi_STATUS_MISCV) mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); +#endif mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 27.02.13 at 10:24, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > This work around an issue when test via xen-mceinj tools. > > when inject simulated error via xen-mceinj tools, > status ADDRV/MISCV bits are simulated hence there is > potential risk of #GP if h/w not really support MCi_ADDR/MISC. > We temporarily work around by not clean them until we have > clean solution.Excuse me, but - no. Changing the behavior for real MCE-s (which you added) for the benefit of fixing injection is a no-go IMO. Or are you telling us that after all that earlier change of yours is not really necessary (in which case we could as well revert it). Jan> Reported-by: Ren Yongjie <yongjie.ren@intel.com> > Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> > > diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c > --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 > +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 00:34:22 2013 +0800 > @@ -144,10 +144,19 @@ > > status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); > > +/* > + * TODO: when inject simulated error via xen-mceinj tools, > + * status ADDRV/MISCV bits are simulated hence there is > + * potential risk of #GP if h/w not really support MCi_ADDR/MISC. > + * We temporary work around by not clean them until we have > + * clean solution. > + */ > +#if 0 > if (status & MCi_STATUS_ADDRV) > mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); > if (status & MCi_STATUS_MISCV) > mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); > +#endif > > mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); > }
Jan Beulich wrote:>>>> On 27.02.13 at 10:24, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> This work around an issue when test via xen-mceinj tools. >> >> when inject simulated error via xen-mceinj tools, >> status ADDRV/MISCV bits are simulated hence there is >> potential risk of #GP if h/w not really support MCi_ADDR/MISC. >> We temporarily work around by not clean them until we have >> clean solution. > > Excuse me, but - no. Changing the behavior for real MCE-s (which > you added) for the benefit of fixing injection is a no-go IMO. Or > are you telling us that after all that earlier change of yours is not > really necessary (in which case we could as well revert it). > > Jan >The reason of the former patch to clear MCi_ADDR/MISC is that it''s recommended by Intel SDM: LOG MCA REGISTER: SAVE IA32_MCi_STATUS; If MISCV in IA32_MCi_STATUS THEN SAVE IA32_MCi_MISC; FI; IF ADDRV in IA32_MCi_STATUS THEN SAVE IA32_MCi_ADDR; FI; IF CLEAR_MC_BANK = TRUE THEN SET all 0 to IA32_MCi_STATUS; If MISCV in IA32_MCi_STATUS THEN SET all 0 to IA32_MCi_MISC; FI; IF ADDRV in IA32_MCi_STATUS THEN SET all 0 to IA32_MCi_ADDR; FI; For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real error occur (which indicated by MCi_STATUS), so only clear MCi_STATUS at mce handler is an acceptable work around -- after all, to read MCi_ADDR/MISC is pointless if MCi_STATUS is 0. Thanks, Jinsong>> Reported-by: Ren Yongjie <yongjie.ren@intel.com> >> Singed-off-by: Liu Jinsong <jinsong.liu@intel.com> >> >> diff -r e84a79d11d7a xen/arch/x86/cpu/mcheck/mce.c >> --- a/xen/arch/x86/cpu/mcheck/mce.c Thu Nov 01 01:41:03 2012 +0800 >> +++ b/xen/arch/x86/cpu/mcheck/mce.c Thu Feb 28 00:34:22 2013 +0800 >> @@ -144,10 +144,19 @@ >> >> status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); >> >> +/* >> + * TODO: when inject simulated error via xen-mceinj tools, >> + * status ADDRV/MISCV bits are simulated hence there is >> + * potential risk of #GP if h/w not really support MCi_ADDR/MISC. >> + * We temporary work around by not clean them until we have + * >> clean solution. + */ >> +#if 0 >> if (status & MCi_STATUS_ADDRV) >> mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); >> if (status & MCi_STATUS_MISCV) >> mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); +#endif >> >> mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); >> }
>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > Jan Beulich wrote: >>>>> On 27.02.13 at 10:24, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >>> This work around an issue when test via xen-mceinj tools. >>> >>> when inject simulated error via xen-mceinj tools, >>> status ADDRV/MISCV bits are simulated hence there is >>> potential risk of #GP if h/w not really support MCi_ADDR/MISC. >>> We temporarily work around by not clean them until we have >>> clean solution. >> >> Excuse me, but - no. Changing the behavior for real MCE-s (which >> you added) for the benefit of fixing injection is a no-go IMO. Or >> are you telling us that after all that earlier change of yours is not >> really necessary (in which case we could as well revert it). >> >> Jan >> > > The reason of the former patch to clear MCi_ADDR/MISC is that it''s > recommended by Intel SDM: > LOG MCA REGISTER: > SAVE IA32_MCi_STATUS; > If MISCV in IA32_MCi_STATUS > THEN > SAVE IA32_MCi_MISC; > FI; > IF ADDRV in IA32_MCi_STATUS > THEN > SAVE IA32_MCi_ADDR; > FI; > IF CLEAR_MC_BANK = TRUE > THEN > SET all 0 to IA32_MCi_STATUS; > If MISCV in IA32_MCi_STATUS > THEN > SET all 0 to IA32_MCi_MISC; > FI; > IF ADDRV in IA32_MCi_STATUS > THEN > SET all 0 to IA32_MCi_ADDR; > FI; > > For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real error > occur (which indicated by MCi_STATUS), so only clear MCi_STATUS at mce > handler is an acceptable work around -- after all, to read MCi_ADDR/MISC is > pointless if MCi_STATUS is 0.So then what - revert your original patch (and ignore the SDM)? I''m not in favor of this... Jan
Jan Beulich wrote:>>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> Jan Beulich wrote: >>>>>> On 27.02.13 at 10:24, "Liu, Jinsong" <jinsong.liu@intel.com> >>>>>> wrote: >>>> This work around an issue when test via xen-mceinj tools. >>>> >>>> when inject simulated error via xen-mceinj tools, >>>> status ADDRV/MISCV bits are simulated hence there is >>>> potential risk of #GP if h/w not really support MCi_ADDR/MISC. >>>> We temporarily work around by not clean them until we have >>>> clean solution. >>> >>> Excuse me, but - no. Changing the behavior for real MCE-s (which >>> you added) for the benefit of fixing injection is a no-go IMO. Or >>> are you telling us that after all that earlier change of yours is >>> not really necessary (in which case we could as well revert it). >>> >>> Jan >>> >> >> The reason of the former patch to clear MCi_ADDR/MISC is that it''s >> recommended by Intel SDM: LOG MCA REGISTER: >> SAVE IA32_MCi_STATUS; >> If MISCV in IA32_MCi_STATUS >> THEN >> SAVE IA32_MCi_MISC; >> FI; >> IF ADDRV in IA32_MCi_STATUS >> THEN >> SAVE IA32_MCi_ADDR; >> FI; >> IF CLEAR_MC_BANK = TRUE >> THEN >> SET all 0 to IA32_MCi_STATUS; >> If MISCV in IA32_MCi_STATUS >> THEN >> SET all 0 to IA32_MCi_MISC; >> FI; >> IF ADDRV in IA32_MCi_STATUS >> THEN >> SET all 0 to IA32_MCi_ADDR; >> FI; >> >> For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real >> error occur (which indicated by MCi_STATUS), so only clear >> MCi_STATUS at mce handler is an acceptable work around -- after all, >> to read MCi_ADDR/MISC is pointless if MCi_STATUS is 0. > > So then what - revert your original patch (and ignore the SDM)? > I''m not in favor of this... > > JanNot revert entire 23327, but only use this patch to revert MCi_ADDR/MISC clear. I also agree it''s not good, but currently seems we don''t have a simple and clean way to fix it, except we spend much time to to update xen-mceinj *tools* -- even so it''s low-priority? Thanks, Jinsong
>>> On 27.02.13 at 12:08, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: > Jan Beulich wrote: >>>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >>> The reason of the former patch to clear MCi_ADDR/MISC is that it''s >>> recommended by Intel SDM: LOG MCA REGISTER: >>> SAVE IA32_MCi_STATUS; >>> If MISCV in IA32_MCi_STATUS >>> THEN >>> SAVE IA32_MCi_MISC; >>> FI; >>> IF ADDRV in IA32_MCi_STATUS >>> THEN >>> SAVE IA32_MCi_ADDR; >>> FI; >>> IF CLEAR_MC_BANK = TRUE >>> THEN >>> SET all 0 to IA32_MCi_STATUS; >>> If MISCV in IA32_MCi_STATUS >>> THEN >>> SET all 0 to IA32_MCi_MISC; >>> FI; >>> IF ADDRV in IA32_MCi_STATUS >>> THEN >>> SET all 0 to IA32_MCi_ADDR; >>> FI; >>> >>> For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real >>> error occur (which indicated by MCi_STATUS), so only clear >>> MCi_STATUS at mce handler is an acceptable work around -- after all, >>> to read MCi_ADDR/MISC is pointless if MCi_STATUS is 0. >> >> So then what - revert your original patch (and ignore the SDM)? >> I''m not in favor of this... > > Not revert entire 23327, but only use this patch to revert MCi_ADDR/MISC > clear. > > I also agree it''s not good, but currently seems we don''t have a simple and > clean way to fix it, except we spend much time to to update xen-mceinj *tools* > -- even so it''s low-priority?No, fixing the tool seems unnecessary for this problem, all we need is a way to avoid the problematic MSR writes when finishing an injected MCE. That''s fully contained to the hypervisor. Jan
Jan Beulich wrote:>>>> On 27.02.13 at 12:08, "Liu, Jinsong" <jinsong.liu@intel.com> wrote: >> Jan Beulich wrote: >>>>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@intel.com> >>>>>> wrote: >>>> The reason of the former patch to clear MCi_ADDR/MISC is that it''s >>>> recommended by Intel SDM: LOG MCA REGISTER: >>>> SAVE IA32_MCi_STATUS; >>>> If MISCV in IA32_MCi_STATUS >>>> THEN >>>> SAVE IA32_MCi_MISC; >>>> FI; >>>> IF ADDRV in IA32_MCi_STATUS >>>> THEN >>>> SAVE IA32_MCi_ADDR; >>>> FI; >>>> IF CLEAR_MC_BANK = TRUE >>>> THEN >>>> SET all 0 to IA32_MCi_STATUS; >>>> If MISCV in IA32_MCi_STATUS >>>> THEN >>>> SET all 0 to IA32_MCi_MISC; >>>> FI; >>>> IF ADDRV in IA32_MCi_STATUS >>>> THEN >>>> SET all 0 to IA32_MCi_ADDR; >>>> FI; >>>> >>>> For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real >>>> error occur (which indicated by MCi_STATUS), so only clear >>>> MCi_STATUS at mce handler is an acceptable work around -- after >>>> all, to read MCi_ADDR/MISC is pointless if MCi_STATUS is 0. >>> >>> So then what - revert your original patch (and ignore the SDM)? >>> I''m not in favor of this... >> >> Not revert entire 23327, but only use this patch to revert >> MCi_ADDR/MISC clear. >> >> I also agree it''s not good, but currently seems we don''t have a >> simple and clean way to fix it, except we spend much time to to >> update xen-mceinj *tools* -- even so it''s low-priority? > > No, fixing the tool seems unnecessary for this problem, all we > need is a way to avoid the problematic MSR writes when finishing > an injected MCE. That''s fully contained to the hypervisor. > > JanThe problem comes from xen-mceinj tools simulate *some* banks for *some* cpus (intpose_arr array). Tools sometimes access simulated value, sometimes access real hardware --> that''s problematic syntax and what really need fix. Thanks, Jinsong
Liu, Jinsong wrote:> Jan Beulich wrote: >>>>> On 27.02.13 at 12:08, "Liu, Jinsong" <jinsong.liu@intel.com> >>>>> wrote: >>> Jan Beulich wrote: >>>>>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@intel.com> >>>>>>> wrote: >>>>> The reason of the former patch to clear MCi_ADDR/MISC is that it''s >>>>> recommended by Intel SDM: LOG MCA REGISTER: >>>>> SAVE IA32_MCi_STATUS; >>>>> If MISCV in IA32_MCi_STATUS >>>>> THEN >>>>> SAVE IA32_MCi_MISC; >>>>> FI; >>>>> IF ADDRV in IA32_MCi_STATUS >>>>> THEN >>>>> SAVE IA32_MCi_ADDR; >>>>> FI; >>>>> IF CLEAR_MC_BANK = TRUE >>>>> THEN >>>>> SET all 0 to IA32_MCi_STATUS; >>>>> If MISCV in IA32_MCi_STATUS >>>>> THEN >>>>> SET all 0 to IA32_MCi_MISC; >>>>> FI; >>>>> IF ADDRV in IA32_MCi_STATUS >>>>> THEN >>>>> SET all 0 to IA32_MCi_ADDR; >>>>> FI; >>>>> >>>>> For Xen mce, it''s meaningful to read MCi_ADDR/MISC only when real >>>>> error occur (which indicated by MCi_STATUS), so only clear >>>>> MCi_STATUS at mce handler is an acceptable work around -- after >>>>> all, to read MCi_ADDR/MISC is pointless if MCi_STATUS is 0. >>>> >>>> So then what - revert your original patch (and ignore the SDM)? >>>> I''m not in favor of this... >>> >>> Not revert entire 23327, but only use this patch to revert >>> MCi_ADDR/MISC clear. >>> >>> I also agree it''s not good, but currently seems we don''t have a >>> simple and clean way to fix it, except we spend much time to to >>> update xen-mceinj *tools* -- even so it''s low-priority? >> >> No, fixing the tool seems unnecessary for this problem, all we >> need is a way to avoid the problematic MSR writes when finishing >> an injected MCE. That''s fully contained to the hypervisor. >> >> Jan > > The problem comes from xen-mceinj tools simulate *some* banks for > *some* cpus (intpose_arr array). Tools sometimes access simulated > value, sometimes access real hardware --> that''s problematic syntax > and what really need fix. > > Thanks, > JinsongOK, update bugfix patch, better than drop clear MCi_ADDR/MISC in this patch, will send out later. Thanks, Jinsong