Jiang, Yunhong
2010-Apr-19 08:58 UTC
[Xen-devel] [RFC] [PATCH 0/2] Some clean-up to MCA handling
These two patches includes some clean-up/changes to MCA handling. The mce_extended.patch change the method to get the extended MCA information. This change is somwhat straightforward and is easy to understand. But please notice it changes some interface in include/public/arch-x86/xen-mca.h. The mce_ops.patch add mca_ops to MCE/polling/CMCI handler, to combine the handling process and then wrap mce(intel and AMD)/polling/CMCI with it. I''m sorry that it also include some Intel specific changes as listed in intem ''g'' in corresponding patch description. I tried to maintain it as different patch but later found it bring too much extra effort, so I qfold them together. Christoph/Frank, can you please have a look on the mce_ops method, especially the working flow of mcheck_mca_handler and the mca_ops interface definition? This patchset depends on the two patches I sent out yesterday, but those patches are very straightforward and thus I didn''t emphasize as RFC. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2010-Apr-19 10:06 UTC
[Xen-devel] Re: [RFC] [PATCH 0/2] Some clean-up to MCA handling
On Monday 19 April 2010 10:58:44 Jiang, Yunhong wrote:> These two patches includes some clean-up/changes to MCA handling. > > The mce_extended.patch change the method to get the extended MCA > information. This change is somwhat straightforward and is easy to > understand. But please notice it changes some interface in > include/public/arch-x86/xen-mca.h.This breaks backward-compatibility. You can''t change the API/ABI just for the sake of "easier" internal handling. Please explain: - what exactly is wrong with the interface as it is in upstream ? - what *requries* an API/ABI change ? Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Apr-19 15:32 UTC
[Xen-devel] RE: [RFC] [PATCH 0/2] Some clean-up to MCA handling
>-----Original Message----- >From: Christoph Egger [mailto:Christoph.Egger@amd.com] >Sent: Monday, April 19, 2010 6:07 PM >To: Jiang, Yunhong >Cc: Frank.Vanderlinden@Sun.COM; Keir Fraser; xen-devel@lists.xensource.com >Subject: Re: [RFC] [PATCH 0/2] Some clean-up to MCA handling > >On Monday 19 April 2010 10:58:44 Jiang, Yunhong wrote: >> These two patches includes some clean-up/changes to MCA handling. >> >> The mce_extended.patch change the method to get the extended MCA >> information. This change is somwhat straightforward and is easy to >> understand. But please notice it changes some interface in >> include/public/arch-x86/xen-mca.h. > >This breaks backward-compatibility. You can''t change the API/ABI just for the >sake of "easier" internal handling. > >Please explain: >- what exactly is wrong with the interface as it is in upstream ? >- what *requries* an API/ABI change ?It is not "easier" internalhandling. In fact, it makes no difference to internal handling at all. The reason is: 1) In amd_f10.c, it will only occupy 4 mc_msr, while in Intel platform, it may occupy 32 mc_msr, that is sure to cost extra space. The mc_info buffer is quite limited and can''t be expanded in run time, so reduce the size is quite important. 2) sizeof(void *) is different in 64 hypervisor and32 bit dom0. I''m not sure if it is tested in compatibility mode, which might be confused. In fact, since we have mc_msrs included in mcinfo_extended already, the caller can get the size of the buffer quite easy. Of course, if you *really* don''t care the waste of size in AMD platform, it''s ok for me. After all, in intel platform, either there is no extended information, or it will occupy all of them, so it really does not matter to me. But the (void*) issue should be resolved, I suspect. How about your option to the other patch? Thanks --jyh> >Christoph > > >-- >---to satisfy European Law for business letters: >Advanced Micro Devices GmbH >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2010-Apr-19 15:52 UTC
[Xen-devel] Re: [RFC] [PATCH 0/2] Some clean-up to MCA handling
On Monday 19 April 2010 17:32:17 Jiang, Yunhong wrote:> >-----Original Message----- > >From: Christoph Egger [mailto:Christoph.Egger@amd.com] > >Sent: Monday, April 19, 2010 6:07 PM > >To: Jiang, Yunhong > >Cc: Frank.Vanderlinden@Sun.COM; Keir Fraser; xen-devel@lists.xensource.com > >Subject: Re: [RFC] [PATCH 0/2] Some clean-up to MCA handling > > > >On Monday 19 April 2010 10:58:44 Jiang, Yunhong wrote: > >> These two patches includes some clean-up/changes to MCA handling. > >> > >> The mce_extended.patch change the method to get the extended MCA > >> information. This change is somwhat straightforward and is easy to > >> understand. But please notice it changes some interface in > >> include/public/arch-x86/xen-mca.h. > > > >This breaks backward-compatibility. You can''t change the API/ABI just for > > the sake of "easier" internal handling. > > > >Please explain: > >- what exactly is wrong with the interface as it is in upstream ? > >- what *requries* an API/ABI change ? > > It is not "easier" internalhandling. In fact, it makes no difference to > internal handling at all. The reason is: 1) In amd_f10.c, it will only > occupy 4 mc_msr,well, 4 in the generic handler plus 3 MSRs via mcinfo_extended which have been introduced in family10h.> while in Intel platform, it may occupy 32 mc_msr, that is > sure to cost extra space. The mc_info buffer is quite limited and can''t be > expanded in run time, so reduce the size is quite important. > 2) sizeof(void *) is different in 64 hypervisor and 32 bit dom0. I''m not > sure if it is tested in compatibility mode, which might be confused. > > In fact, since we have mc_msrs included in mcinfo_extended already, the > caller can get the size of the buffer quite easy. > > Of course, if you *really* don''t care the waste of size in AMD platform, > it''s ok for me. After all, in intel platform, either there is no extended > information, or it will occupy all of them, so it really does not matter to > me. But the (void*) issue should be resolved, I suspect.Is it possible to change the internal infrastructure to deal with multiple mc_info''s ? The user (Dom0) will keep to see just one because the switch happens underneath. What does not fit into one mc_info will be put into the next. In xen you will need some operations: lowlevel: allocate, free, switch, read, write highlevel: get and put The mce code itself should just use the highlevel operations and also just see one mc_info. The highlevel operations see as many mc_info as needed and use the lowlevel operations which work on mc_info directly. Does that make sense to you ?> How about your option to the other patch?Still need to have a look at it.> Thanks > --jyh > > >Christoph > > > > > >-- > >---to satisfy European Law for business letters: > >Advanced Micro Devices GmbH > >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni > >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > >Registergericht Muenchen, HRB Nr. 43632-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Apr-20 08:06 UTC
[Xen-devel] RE: [RFC] [PATCH 0/2] Some clean-up to MCA handling
>> It is not "easier" internalhandling. In fact, it makes no difference to >> internal handling at all. The reason is: 1) In amd_f10.c, it will only >> occupy 4 mc_msr, > >well, 4 in the generic handler plus 3 MSRs via mcinfo_extended which have >been introduced in family10h.So it means 16 -3 extra mc_msrs be wasted at least, for each mcinfo_extended :-)> >> while in Intel platform, it may occupy 32 mc_msr, that is >> sure to cost extra space. The mc_info buffer is quite limited and can''t be >> expanded in run time, so reduce the size is quite important. >> 2) sizeof(void *) is different in 64 hypervisor and 32 bit dom0. I''m not >> sure if it is tested in compatibility mode, which might be confused. >> >> In fact, since we have mc_msrs included in mcinfo_extended already, the >> caller can get the size of the buffer quite easy. >> >> Of course, if you *really* don''t care the waste of size in AMD platform, >> it''s ok for me. After all, in intel platform, either there is no extended >> information, or it will occupy all of them, so it really does not matter to >> me. But the (void*) issue should be resolved, I suspect. > >Is it possible to change the internal infrastructure to deal with multiple >mc_info''s ? The user (Dom0) will keep to see just one because the switch >happens underneath.I think the size limitation happens on two side. Firstly, there is only 10 mc_info reserved in urgent queue, and 20 in non-urgent queue, and that queue size is not adjusted dynamical; Secondly, each mc_info contains 768 uint64_t.> >What does not fit into one mc_info will be put into the next. > >In xen you will need some operations: >lowlevel: allocate, free, switch, read, write >highlevel: get and put > >The mce code itself should just use the highlevel operations and also just >see one mc_info. The highlevel operations see as many mc_info as needed and >use the lowlevel operations which work on mc_info directly. > >Does that make sense to you ?I suspect this will cause much more complex. Also will this cause trouble to ABI also, since the mc_info is defined in public? As I have no data for MCE/CMCI trigger model in run time, maybe we can postpone this changes, unless some one raise this issue? Attached is the new patch, which does not change the interface anymore. --jyh> >> How about your option to the other patch? > >Still need to have a look at it. > >> Thanks >> --jyh >> >> >Christoph >> > >> > >> >-- >> >---to satisfy European Law for business letters: >> >Advanced Micro Devices GmbH >> >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >> >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >> >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >> >Registergericht Muenchen, HRB Nr. 43632 > > > >-- >---to satisfy European Law for business letters: >Advanced Micro Devices GmbH >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2010-Apr-20 08:56 UTC
[Xen-devel] Re: [RFC] [PATCH 0/2] Some clean-up to MCA handling
On Tuesday 20 April 2010 10:06:12 Jiang, Yunhong wrote:> >> It is not "easier" internalhandling. In fact, it makes no difference to > >> internal handling at all. The reason is: 1) In amd_f10.c, it will only > >> occupy 4 mc_msr, > > > >well, 4 in the generic handler plus 3 MSRs via mcinfo_extended which have > >been introduced in family10h. > > So it means 16 -3 extra mc_msrs be wasted at least, for each > mcinfo_extended :-)I consider this as ''reserved space for the future'' :-)> > >> while in Intel platform, it may occupy 32 mc_msr, that is > >> sure to cost extra space. The mc_info buffer is quite limited and can''t > >> be expanded in run time, so reduce the size is quite important. > >> 2) sizeof(void *) is different in 64 hypervisor and 32 bit dom0. I''m not > >> sure if it is tested in compatibility mode, which might be confused. > >> > >> In fact, since we have mc_msrs included in mcinfo_extended already, the > >> caller can get the size of the buffer quite easy. > >> > >> Of course, if you *really* don''t care the waste of size in AMD platform, > >> it''s ok for me. After all, in intel platform, either there is no > >> extended information, or it will occupy all of them, so it really does > >> not matter to me. But the (void*) issue should be resolved, I suspect. > > > >Is it possible to change the internal infrastructure to deal with multiple > >mc_info''s ? The user (Dom0) will keep to see just one because the switch > >happens underneath. > > I think the size limitation happens on two side. Firstly, there is only 10 > mc_info reserved in urgent queue, and 20 in non-urgent queue, and that > queue size is not adjusted dynamical; Secondly, each mc_info contains 768 > uint64_t.The API allows the user (Dom0) to fetch a mc_info again and again. So what didn''t fit into the first can be placed into the next. The reason to have it static is that the MCE implementation in both xen and in Dom0 can be lockless (xmalloc/xfree use locks).> > >What does not fit into one mc_info will be put into the next. > > > >In xen you will need some operations: > >lowlevel: allocate, free, switch, read, write > >highlevel: get and put > > > >The mce code itself should just use the highlevel operations and also just > >see one mc_info. The highlevel operations see as many mc_info as needed > > and use the lowlevel operations which work on mc_info directly. > > > >Does that make sense to you ? > > I suspect this will cause much more complex. Also will this cause trouble > to ABI also, since the mc_info is defined in public?Yes, that is to make mc_info look dynamic from the internal side.> As I have no data for MCE/CMCI trigger model in run time, maybe we can > postpone this changes, unless some one raise this issue?You can ask back within your company to get a known broken but reasonable usable hardware (i.e. CPU L2 cache produces certain errors or DIMM plugged in slot 3 produces certain errors). Then you know what error types you can expect on such a machine and how to deal with.> Attached is the new patch, which does not change the interface anymore.This patch is good.> > --jyh > > >> How about your option to the other patch? > > > >Still need to have a look at it. > > > >> Thanks > >> --jyh > >> > >> >Christoph > >> > > >> > > >> >-- > >> >---to satisfy European Law for business letters: > >> >Advanced Micro Devices GmbH > >> >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > >> >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni > >> >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > >> >Registergericht Muenchen, HRB Nr. 43632 > > > >-- > >---to satisfy European Law for business letters: > >Advanced Micro Devices GmbH > >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni > >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > >Registergericht Muenchen, HRB Nr. 43632-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Apr-20 09:21 UTC
[Xen-devel] RE: [RFC] [PATCH 0/2] Some clean-up to MCA handling
>> As I have no data for MCE/CMCI trigger model in run time, maybe we can >> postpone this changes, unless some one raise this issue? > >You can ask back within your company to get a known broken but reasonable >usable hardware (i.e. CPU L2 cache produces certain errors or DIMM plugged >in slot 3 produces certain errors). >Then you know what error types you can expect on such a machine and how >to deal with.we can trigger #MCE in our platform, and we can find some DIMM with one bit broken if needed, but the run-time static data is something different IMO.> >> Attached is the new patch, which does not change the interface anymore. > >This patch is good.Thanks for review and comments. I will re-submit it after comments on the other patch. BTW, do you know if Frank still working on MCA? --jyh> >> >> --jyh >> >> >> How about your option to the other patch? >> > >> >Still need to have a look at it. >> > >> >> Thanks >> >> --jyh >> >> >> >> >Christoph >> >> > >> >> > >> >> >-- >> >> >---to satisfy European Law for business letters: >> >> >Advanced Micro Devices GmbH >> >> >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >> >> >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >> >> >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >> >> >Registergericht Muenchen, HRB Nr. 43632 >> > >> >-- >> >---to satisfy European Law for business letters: >> >Advanced Micro Devices GmbH >> >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >> >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >> >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >> >Registergericht Muenchen, HRB Nr. 43632 > > > >-- >---to satisfy European Law for business letters: >Advanced Micro Devices GmbH >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel