SUZUKI Kazuhiro
2008-Dec-15 06:33 UTC
[Xen-devel] [PATCH 0/2] MCA support with page offlining
Hi all,

I had posted about MCA support for Intel64 before; that version only had a function to log the MCA error data received from the hypervisor:

http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html

I attach patches that support not only error logging but also a page-offlining function. The page on which an MCA occurs is offlined and never reused. A new flag 'PGC_reserved' was added to the page's count_info to mark the impacted page.

I know that it would be better to implement page offlining as a general-purpose facility, but in this first step I implemented it specifically for MCA.

I also implemented the MCA handler of Dom0, which supports shutting down the remote domain where an MCA occurred. If the MCA occurred on a DomU, Dom0 notifies that DomU; if the notification fails, Dom0 calls the SCHEDOP_remote_shutdown hypercall.

[1/2] xen part: mca-support-with-page-offlining-xen.patch
[2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>

Thanks,
KAZ
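[Editorial note] The notify-then-shutdown fallback described above can be sketched on the Dom0 (Linux) side roughly as follows. SCHEDOP_remote_shutdown and struct sched_remote_shutdown are the standard interface from xen/interface/sched.h; notify_domu_mca() and the surrounding function are hypothetical stand-ins for what the posted patch actually adds, and header paths vary by kernel tree.

/* Minimal sketch of the Dom0 fallback path, not the patch itself. */
#include <xen/interface/sched.h>   /* struct sched_remote_shutdown, SCHEDOP_remote_shutdown */
#include <asm/hypervisor.h>        /* HYPERVISOR_sched_op() wrapper */

static void mca_forward_to_guest(domid_t domid)
{
    if (notify_domu_mca(domid) != 0) {          /* hypothetical: deliver the event to the DomU */
        struct sched_remote_shutdown r = {
            .domain_id = domid,
            .reason    = SHUTDOWN_crash,        /* bring the affected guest down */
        };
        HYPERVISOR_sched_op(SCHEDOP_remote_shutdown, &r);
    }
}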
Jiang, Yunhong
2008-Dec-16 09:30 UTC
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
SUZUKI Kazuhiro <kaz@jp.fujitsu.com> wrote:
> I had posted about MCA support for Intel64 before; that version only had a
> function to log the MCA error data received from the hypervisor:
>
> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
>
> I attach patches that support not only error logging but also a
> page-offlining function. The page on which an MCA occurs is offlined and
> never reused. A new flag 'PGC_reserved' was added to the page's count_info
> to mark the impacted page.
>
> I know that it would be better to implement page offlining as a
> general-purpose facility, but in this first step I implemented it
> specifically for MCA.

Maybe MCA page offlining is a bit different from the normal page-offline requirement, so taking it as a first step may be a good choice :)

As for your current page_offlining, I'm not sure why a PGC_reserved page should not be freed. Also, with the following code, won't heap(node, zone, j) become unallocatable altogether? Maybe we can create a special list to hold all those pages and remove them from the heap list?

+        if ( !list_empty(&heap(node, zone, j)) ) {
+            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
+            if (!(pg->count_info & PGC_reserved))
+                goto found;
+            else
+                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
+                       pg, page_to_maddr(pg));

> I also implemented the MCA handler of Dom0, which supports shutting down
> the remote domain where an MCA occurred. If the MCA occurred on a DomU,
> Dom0 notifies that DomU; if the notification fails, Dom0 calls the
> SCHEDOP_remote_shutdown hypercall.
>
> [1/2] xen part: mca-support-with-page-offlining-xen.patch

We are not sure we really need to pass all #MC information to dom0 first and have dom0 notify the domU. Xen knows about everything, so it may have enough knowledge to decide whether or not to inject a virtual #MC into the guest. Of course, this does not impact your patch.

> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch

As for how to inject a virtual #MC into a guest (including dom0), I think we need to consider the following points:

a) Benefit from reusing the guest's #MC handler. #MC handlers are well known to be difficult to test, and the native guest handler may have been tested more widely. #MC handlers also improve over time, and reusing the guest's MCA handler shares those improvements with us.
b) Maintaining a PV handler for different OS versions may not be easy, especially as hardware improves and kernels gain better support for error handling and containment.
c) The #MC handler may need model-specific information to decide on an action, while a guest (other than dom0) has virtualized CPUID information.
d) The guest's MCA handler may require physical information about the #MC, like the number of the physical CPU on which it happened.
e) For an HVM domain, a PV handler will be difficult (considering Windows guests).

And we have several options for delivering a virtual #MC to a guest:

Option 1 is what is currently implemented. A PV #MC handler is implemented in the guest. This PV handler gets MCA information from the Xen hypervisor through a hypercall, including the MCA MSR values plus some additional information, like which physical CPU the MCA happened on. Option 1 helps with issue d), but we need to maintain a PV handler and cannot benefit from the native handler. It also does not resolve issue c) very well.

Option 2: Xen provides MCA MSR virtualization so that the guest's native #MC handler can run without changes. It benefits from the guest's #MC handler, but it will be difficult to get model-specific information, and there is no physical information.

Option 3 uses a PV #MC handler for the guest as in option 1, but the interface between Xen and the guest is abstract events, like "offline the offending page" or "terminate the current execution context". This should be straightforward for Linux, but may be difficult for Windows and other OSes.

Currently we are considering option 2, providing MCA MSR virtualization to the guest; dom0 can also benefit from such support (if the guest has a different CPUID than native, we will either keep the guest running or kill it based on the error code). Of course, the current mechanism of passing MCA information from Xen to dom0 will still be useful, but it will be used for logging purposes or for correctable errors.

What do you think about this?

Thanks
Yunhong Jiang
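[Editorial note] Yunhong's "special list" suggestion, pulling the broken page out of heap(node, zone, j) instead of skipping the whole free list at allocation time, might look roughly like the sketch below. page_offlined_list and take_page_offline() are assumed names, not code from either patch, and a complete version would also have to update avail[] and split the buddy chunk.

/* Rough sketch of the "special list" idea; names are illustrative only. */
#include <xen/list.h>
#include <xen/mm.h>

static LIST_HEAD(page_offlined_list);        /* broken pages, never returned to the heap */

static void take_page_offline(struct page_info *pg)
{
    ASSERT(pg->count_info & PGC_reserved);   /* flag introduced by the patch */
    list_del(&pg->list);                     /* unlink from heap(node, zone, j) */
    list_add_tail(&pg->list, &page_offlined_list);
}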
Wang, Shane
2008-Dec-16 11:10 UTC
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
For page offlining, it may be better to put the page into a list rather than an array:

+    pg->count_info |= PGC_reserved;
+    page_offlining[num_page_offlining++] = pg;

I guess the code targets offlining domain pages only, right? What about free pages and xen pages? If so, there is no need for the check below when allocating free pages, since the offlined pages will never be freed back into heap()()(). If not, the following may have a bug:

+        if ( !list_empty(&heap(node, zone, j)) ) {
+            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
+            if (!(pg->count_info & PGC_reserved))
+                goto found;
+            else
+                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
+                       pg, page_to_maddr(pg));
+        }

If a free page (other than pg) within pg and pg+(1U<<j) is offlined, the range pg~pg+(1U<<j) still risks being allocated together with that page.

Shane
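[Editorial note] The range check Shane describes can be made explicit with a small helper along these lines. This is only a sketch reusing the patch's PGC_reserved flag; a real fix would also need to split the chunk so that the clean pages remain allocatable.

/* Sketch: a free chunk of order j spans 1U << j page_info entries, so every
 * page in it must be checked, not just the list head. */
static int chunk_is_usable(const struct page_info *pg, unsigned int j)
{
    unsigned long i;

    for ( i = 0; i < (1UL << j); i++ )
        if ( pg[i].count_info & PGC_reserved )  /* an offlined page hides inside */
            return 0;
    return 1;
}

The allocator could then skip only chunks for which this returns 0 and keep scanning the free list, which also touches on Shane's later question about trying the next list entry instead of giving up.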
Jiang, Yunhong
2008-Dec-17 03:24 UTC
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Some clarification, Suzuki: I just want some discussion of how to enhance the MCA support; I am not against the xen part of the patch, which does not change much of the current logic. (The kernel part is new and may depend on the discussion.) I hope for feedback from you and the community.

Thanks
Yunhong Jiang
This PV handler gets MCA information > from Xen HV through hypercall, including MCA MSR value, also > some additional information, like which physical CPU the MCA > happened. Option 1 will help us on issue d), but we need main > a PV handler, and can''t get benifit from native handler. Also > it does not resolve issue c) quite well. > > option 2, Xen will provide MCA MSR virtualization so that > guest''s native #MC handler can run without changes. It can > benifit from guest #MC handler, but it will be difficult to > get model specific information, and has no physical information. > > Option 3 uses a PV #MC handler for guest as option 1, but > interface between Xen/guest is abstract event, like offline > offending page, terminate current execution context etc. This > should be straight forward for Linux, but may be difficult to > Windows and other OS. > > Currently we are considering option 2 to provide MCA MSR > virtualization to guest, and dom0 can also benifit from such > support (if guest has different CPUID as native, we will > either keep guest running, or kill guest based on error code). > Of course, current mechanism of passing MCA information from > xen to dom0 will still be useful, but that will be used for > logging purpose or for Correcatable Error. How do you think about this? > > Thanks > Yunhong Jiang > >> Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com> >> >> Thanks, >> KAZ >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
SUZUKI Kazuhiro
2008-Dec-19 06:41 UTC
Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Hi Yunhong and Shane,

Thank you for your comments and reviews.

> For page offlining, it may be better to put the page into a list rather than an array.

I modified the code to register offlined pages in a list instead of an array.

> I guess the code targets offlining domain pages only, right? What about free pages and xen pages?

Yes, I assumed that page offlining was needed only for domain pages. If xen pages are impacted, then it is enough to crash the hypervisor in the current implementation.

I attach an updated patch for the xen part, which also includes some bug fixes.

Thanks,
KAZ

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
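[Editorial note] The policy Suzuki describes, offline the page when it belongs to a domain but crash the hypervisor when a Xen page is hit, amounts to a small dispatch like the sketch below. offline_domain_page() is a hypothetical stand-in for the patch's registration code, and the ownership predicates (is_xen_heap_page(), page_get_owner()) may differ slightly between Xen versions.

/* Sketch only: decide the fate of the page hit by the #MC. */
void mce_handle_impacted_page(struct page_info *pg)
{
    if ( is_xen_heap_page(pg) || page_get_owner(pg) == NULL )
        /* A Xen (or unowned) page is hit: no containment is attempted. */
        panic("MCE: impacted page %lx belongs to Xen\n", page_to_maddr(pg));
    else
        offline_domain_page(pg);   /* hypothetical: mark PGC_reserved and register it */
}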
Jiang, Yunhong
2008-Dec-19 10:48 UTC
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
SUZUKI Kazuhiro <kaz@jp.fujitsu.com> wrote:
> I modified the code to register offlined pages in a list instead of an array.
>
> Yes, I assumed that page offlining was needed only for domain pages. If xen
> pages are impacted, then it is enough to crash the hypervisor in the
> current implementation.

We have an internal patch for a similar purpose, covering page offlining caused by #MC as well as other reasons. We can base our work on your patch if needed.

But I'm a bit confused by some changes in this patch. In your previous version, if a page is marked as PGC_reserved, it will not be allocated anymore. However, in this version, when a page is marked as PGC_reserved, it is just not freed; does that mean the page will not be removed from its current list? That seems a bit hackish to me.

What I think page offlining should do is:
a) Mark the page as PGC_reserved (or another name like PGC_broken).
b) If it is free, go to step c; otherwise, wait until it is freed by its owner and then go to step c.
c) Remove the page from the buddy system, move it to a special, separate list (i.e. not in heap[][][] anymore), and return the other pages to the buddy allocator.

One open point is in step b: if the page is owned by a guest, we could replace it with a new page through the p2m table and would not need to wait until it is freed. We do not do that currently because it is a bit complex to achieve.

What is your idea on this?

Thanks
Yunhong Jiang
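[Editorial note] Steps (b) and (c) above can be pictured as a small diversion in the free path. PGC_broken, page_offlined_list and the function name are assumed names for illustration; the real Xen free path is free_heap_pages(), which also coalesces buddies and updates avail[].

/* Sketch: when the owner finally frees a broken page, park it on a separate
 * list instead of returning it to heap(node, zone, j). */
static LIST_HEAD(page_offlined_list);

static void offline_aware_free(struct page_info *pg)
{
    if ( unlikely(pg->count_info & PGC_broken) )
    {
        list_add_tail(&pg->list, &page_offlined_list);  /* step c: never reused */
        return;
    }
    /* ... normal path: coalesce with buddies and return to the heap ... */
}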
SUZUKI Kazuhiro
2008-Dec-22 00:42 UTC
Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Hi,

> But I'm a bit confused by some changes in this patch.

Sorry, I attached a wrong patch; this is the correct one.

Thanks,
KAZ

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Wang, Shane
2008-Dec-22 01:52 UTC
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Hi KAZ,

For your latest patch, I am thinking about the following code:

+    /* check whether a page number have been already registered or not */
+    list_for_each_entry(page, &page_offlining, list) {
+        if (page == pg)
+            goto out;
+    }

An x86_page_offlining_t e is put into the list, so shouldn't this be "if (e->page == pg)"?

+    /* check whether already having attribute 'reserved' */
+    if (pg->count_info & PGC_reserved) {
+        printk(XENLOG_DEBUG "Page offlining: ( %lx ) failure.\n",
+               maddr);
+        return 1;
+    }
+
+    /* add attribute 'reserved' and register the page */
+    get_page(pg, d);
+    pg->count_info |= PGC_reserved;
+
+    e = xmalloc(x86_page_offlining_t);
+    BUG_ON(e == NULL);
+    e->page = pg;
+    list_add(&e->list, &page_offlining);

Since the page struct is itself a list entry (it contains "struct list_head list"), you could take the page off the domain page list and put it onto your page_offlining list directly. Of course the domain may still be using it; you can mark it with your PGC_reserved flag first and then move it in the free function. I think this is what Jiang Yunhong suggested :) What do you think of those suggestions? PS: in that case, alloc_bitmap[]/heap[][][]/avail[] need to be considered.

For the code piece

+        if ( !list_empty(&heap(node, zone, j)) ) {
+            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
+            if (!(pg->count_info & PGC_reserved))
+                goto found;
+            else
+                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
+                       pg, page_to_maddr(pg));
+

this mechanism does not seem efficient enough: only the first page is marked PGC_reserved, yet the rest of the pages on heap(node, zone, j) can no longer be used or allocated even though they are not offlined, and it does not solve the potential bug I mentioned. Moreover, the original code (without your patch) just looks for a chunk of sufficient order and takes the first one it finds, which is enough for its purposes. So I am wondering why, when the first list entry cannot meet the requirement, you do not enumerate heap(node, zone, j) to try the second entry and so on, instead of directly reporting "not to be allocated"?

Thanks.
Shane
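[Editorial note] Putting Shane's two points together, comparing against the embedded e->page pointer and avoiding BUG_ON() on an allocation failure, a corrected registration routine might look like the following sketch. x86_page_offlining_t and page_offlining come from the patch under review; the function name and error convention are assumptions.

/* Sketch of the corrected registration path (illustrative, not the patch). */
static int register_offlined_page(struct page_info *pg)
{
    x86_page_offlining_t *e;

    /* The list holds x86_page_offlining_t entries, so compare e->page. */
    list_for_each_entry ( e, &page_offlining, list )
        if ( e->page == pg )
            return 0;                    /* already registered */

    e = xmalloc(x86_page_offlining_t);
    if ( e == NULL )
        return -ENOMEM;                  /* softer than BUG_ON() on allocation failure */

    e->page = pg;
    list_add(&e->list, &page_offlining);
    pg->count_info |= PGC_reserved;      /* mark the page as not to be reused */
    return 0;
}

Shane's alternative, reusing the page_info's own list field instead of allocating a separate tracking structure, would avoid the xmalloc() entirely.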
SUZUKI Kazuhiro
2008-Dec-25 06:06 UTC
Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Hi Wang,

Thank you very much for your comments. I will post an updated patch later.

Thanks,
KAZ
> -----Original Message-----
> From: SUZUKI Kazuhiro [mailto:kaz@jp.fujitsu.com]
> Sent: 22 December 2008 8:42
> To: Jiang, Yunhong
> Cc: Wang, Shane; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
>
> Hi,
>
> > But I'm a bit confused by some changes in this patch.
>
> Sorry, I attached a wrong patch; this is the correct one.
>
> Thanks,
> KAZ
>
> Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
>
>
> From: "Jiang, Yunhong" <yunhong.jiang@intel.com>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Fri, 19 Dec 2008 18:48:11 +0800
>
> > SUZUKI Kazuhiro <mailto:kaz@jp.fujitsu.com> wrote:
> > > Hi Yunhong and Shane,
> > >
> > > Thank you for your comments and reviews.
> > >
> > >> For page offlining, it may be better to put the page onto a list rather
> > >> than into an array.
> > >
> > > I modified the patch to register offlined pages on a list instead of in an array.
> > >
> > >> I guess the code targets offlining domain pages only,
> > >> right? How about free pages and xen pages?
> > >> If so, there is no need for the following check when allocating
> > >> free pages, since the offlined pages will never be freed into heap()()().
> > >> If not, the following may have a bug.
> > >
> > > Yes, I assumed that page offlining was needed only for domain pages. If xen
> > > pages are impacted, then it is enough to crash the hypervisor in the current
> > > implementation.
> >
> > We have an internal patch for a similar purpose, covering page offlining caused
> > by #MC or by other reasons. We can base our work on your patch if needed.
> >
> > But I'm a bit confused by some changes in this patch.
> > In your previous version, if a page was marked as PGC_reserved, it would not be
> > allocated anymore. However, in this version, when a page is marked as
> > PGC_reserved, it is just not freed. Does that mean the page will not be removed
> > from its current list? That seems a bit hacky to me.
> >
> > What I think we should do for page offlining is:
> > a) Mark the page as PGC_reserved (or another name like PGC_broken).
> > b) If it is free, go to step c; otherwise, wait until it is freed by the owner
> >    and then go to step c.
> > c) Remove the page from the buddy system, move it to a special, separate list
> >    (i.e. not in the heap[][][] anymore), and return the other pages to the
> >    buddy allocator.
> >
> > One open point is in step b: if the page is owned by a guest, we could replace
> > it with a new page through the p2m table and would not need to wait until it is
> > freed. We don't do that currently because it is a bit complex to achieve.
> >
> > What is your opinion on this?
> >
> > Thanks
> > Yunhong Jiang
> >
> >
> > > I attach an updated patch for the xen part, which also includes some bug fixes.
> > >
> > > Thanks,
> > > KAZ
> > >
> > > Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
> > >
> > >
> > > From: "Wang, Shane" <shane.wang@intel.com>
> > > Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> > > Date: Tue, 16 Dec 2008 19:10:00 +0800
> > >
> > >> For page offlining, it may be better to put the page onto a list rather
> > >> than into an array.
> > >>
> > >> +    pg->count_info |= PGC_reserved;
> > >> +    page_offlining[num_page_offlining++] = pg;
> > >>
> > >> I guess the code targets offlining domain pages only,
> > >> right? How about free pages and xen pages?
> > >> If so, there is no need for the following check when allocating
> > >> free pages, since the offlined pages will never be freed into heap()()().
> > >> If not, the following may have a bug.
> > >>
> > >> +        if ( !list_empty(&heap(node, zone, j)) ) {
> > >> +            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
> > >> +            if (!(pg->count_info & PGC_reserved))
> > >> +                goto found;
> > >> +            else
> > >> +                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> > >> +                       pg, page_to_maddr(pg));
> > >> +        }
> > >>
> > >> If one free page (not pg) within pg and pg+(1U<<j) is
> > >> offlined, the range pg~pg+(1U<<j) has the risk of being allocated
> > >> along with that page.
> > >>
> > >> Shane
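For comparison, a self-contained sketch of the allocation-side check discussed above: walk the whole order-j free list instead of giving up at the first entry, and reject any chunk that contains a reserved page anywhere inside its 2^j range (the risk Shane points out). The chunk and list types here are simplified stand-ins, not the real heap(node, zone, j) structures.

#include <stddef.h>

#define PGC_reserved 0x1UL

struct page {
    unsigned long count_info;
};

/* A free chunk of 2^order contiguous pages, linked on a per-order free list. */
struct chunk {
    struct page *first;                 /* first page of the chunk */
    struct chunk *next;
};

/* Return 1 if any page inside the 2^order range is marked reserved. */
static int chunk_has_reserved_page(const struct chunk *c, unsigned int order)
{
    for (unsigned long i = 0; i < (1UL << order); i++)
        if (c->first[i].count_info & PGC_reserved)
            return 1;
    return 0;
}

/* Walk the whole free list for this order and return the first chunk that
 * contains no reserved page, instead of failing as soon as the first entry
 * is unusable. */
struct chunk *alloc_chunk(struct chunk *free_list, unsigned int order)
{
    for (struct chunk *c = free_list; c != NULL; c = c->next)
        if (!chunk_has_reserved_page(c, order))
            return c;
    return NULL;                        /* no usable chunk of this order */
}

If broken pages are instead removed from the heap lists entirely, as suggested elsewhere in this thread, this per-allocation scan becomes unnecessary.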
> > >>
> > >> Jiang, Yunhong wrote:
> > >>> [...]
> > >>>
> > >>> Option 3 uses a PV #MC handler for the guest as in option 1, but the
> > >>> interface between Xen and the guest is an abstract event, such as
> > >>> "offline the offending page" or "terminate the current execution
> > >>> context". This should be straightforward for Linux, but may be
> > >>> difficult for Windows and other OSes.
> > >>>
> > >>> Currently we are considering option 2, providing MCA MSR
> > >>> virtualization to the guest; dom0 can also benefit from such support
> > >>> (if the guest has a different CPUID from the native one, we will
> > >>> either keep the guest running or kill the guest based on the error
> > >>> code). Of course, the current mechanism of passing MCA information
> > >>> from Xen to dom0 will still be useful, but it will be used for
> > >>> logging purposes or for Correctable Errors. What do you think about
> > >>> this?
> > >>>
> > >>> Thanks
> > >>> Yunhong Jiang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
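As a rough illustration of what option 2 (MCA MSR virtualization) amounts to, the following self-contained sketch answers a guest RDMSR of the architectural MCA MSRs from per-vCPU virtual state instead of the physical registers. Only the MSR numbers are architectural; the structure and function names are hypothetical and not the interface proposed in this thread.

#include <stdint.h>

/* Architectural MCA MSR numbers (Intel SDM). */
#define MSR_IA32_MCG_CAP     0x179
#define MSR_IA32_MCG_STATUS  0x17a
#define MSR_IA32_MC0_CTL     0x400   /* bank i: 0x400 + 4*i + {CTL,STATUS,ADDR,MISC} */

#define VMCE_NR_BANKS 2              /* hypothetical: banks exposed to the guest */

/* Hypothetical per-vCPU virtual MCA state, filled in by the hypervisor
 * when it decides to inject a virtual #MC. */
struct vmce_bank { uint64_t ctl, status, addr, misc; };
struct vmce_vcpu {
    uint64_t mcg_cap, mcg_status;
    struct vmce_bank bank[VMCE_NR_BANKS];
};

/* Return 1 and fill *val if the MSR is a virtualized MCA MSR,
 * 0 to let the normal RDMSR path handle it. */
int vmce_rdmsr(const struct vmce_vcpu *v, uint32_t msr, uint64_t *val)
{
    if (msr == MSR_IA32_MCG_CAP)    { *val = v->mcg_cap;    return 1; }
    if (msr == MSR_IA32_MCG_STATUS) { *val = v->mcg_status; return 1; }

    if (msr >= MSR_IA32_MC0_CTL && msr < MSR_IA32_MC0_CTL + 4 * VMCE_NR_BANKS) {
        const struct vmce_bank *b = &v->bank[(msr - MSR_IA32_MC0_CTL) / 4];
        switch ((msr - MSR_IA32_MC0_CTL) % 4) {
        case 0: *val = b->ctl;    break;
        case 1: *val = b->status; break;
        case 2: *val = b->addr;   break;
        case 3: *val = b->misc;   break;
        }
        return 1;
    }
    return 0;
}

With this kind of interception, the guest's unmodified native #MC handler reads what the hypervisor chooses to expose, which is the main attraction of option 2 discussed above.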