Jiang, Yunhong
2009-Feb-09 08:54 UTC
[Xen-devel] [RFC][PATCH] Basic support for page offline
Hi, Tim, this patchset try to support page offline request. I want to get some initial feedback before more testing. Page offline can be used by multiple usage model, belows are some examples: a) If too many correctable error happen to one page, management tools may try to offline the page to avoid more server error in future; b) When page is ECC error and can''t be recoverd by hardware, Xen''s MCA handler may try to offline the page, so that it will not be accessed anymore. c) Offline some DIMM for power management etc (Of course, this is far more than simple page offline) The basic idea to offline a page is: 1) If a page is free, it will be removed from page allocator 2) If page is in use, the owner will be checked 2.1) if it is owned by xen/dom0, the offline will be failed 2.2) If it is owned by a PV guest with no device assigned, user space tools will try to replace the page with new one. 2.3) It it is owned by a HVM guest with no device assigned, user space tools will try to live migration it. 2.4) If it is owned by a guest with device assigned, user space tools can do live migration if needed. This patchset includes support for type 2.1/2.2. page_offfline_xen.patch gives basic support. The new hypercall (XEN_SYSCTL_page_offline) will mark a page offlining if the page is in-use, otherwise, it will remove the page from the page allocator. It also changes the free_heap_pages(), so that if a page_offlining page is freed, that page will be marked as page_offlined and will not be allocated anymore. One tricky thing is, the offlined page may not be buddy-aligned (i.e., it may be in the middle of a 2^order pages), so that we have to re-arrange the buddy system (i.e. &heap[][][]) carefully. page_offline_xen_memory.patch add support to PV guest, a new hypercall (XENMEM_page_offline) try to replace the old page with the new one. This will happen only when the guest has been suspeneded, to avoid complex page sharing situation. I''m still checking if more situation need be considered, like LDT pages and CR3 pages, so any suggestion is really great help. page_offline_tools.patch is an example user space tools based on libxc/xc_domain_save.c, it will try to firstly mark a page offline, and checking the result. If a page is owned by a PV guest, it will try to replace the pages. I did some basic testing, tried free pages and PV guest pages and is ok. Of course, I need more test on it. And more robust error handling is needed. Any suggestion is welcome. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Feb-10 09:15 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
At 03:54 -0500 on 09 Feb (1234151686), Jiang, Yunhong wrote:> Hi, Tim, this patchset try to support page offline request. I want to get some initial feedback before more testing.I haven''t had a chance to read the patches in detail yet, but my initial impression is that: - The general approach so far seems good (I suspect that your 2.3 stage below could also be done like 2.2 without a full live migration but since that''s not implemented yet that''s fine). - It seems like a lot of code for what it does. On the Xen side that''s just a general impression since I''m not familiar with the bits of the heap allocators that you''re changing. In libxc you seem to have duplicated parts of the save/restore code -- better to make those routines externally visible to the rest of libxc and call them from your new function. - Like all systems code everywhere, it needs more comments. :) You''ve introduced some generic-sounding functions (adjust_pte &c) without describing what they do. I''ll have more detailed comments later in the week, I hope. Cheers, Tim.> Page offline can be used by multiple usage model, belows are some examples: > a) If too many correctable error happen to one page, management tools may try to offline the page to avoid more server error in future; > b) When page is ECC error and can''t be recoverd by hardware, Xen''s MCA handler may try to offline the page, so that it will not be accessed anymore. > c) Offline some DIMM for power management etc (Of course, this is far more than simple page offline) > > The basic idea to offline a page is: > 1) If a page is free, it will be removed from page allocator > 2) If page is in use, the owner will be checked > 2.1) if it is owned by xen/dom0, the offline will be failed > 2.2) If it is owned by a PV guest with no device assigned, user space tools will try to replace the page with new one. > 2.3) It it is owned by a HVM guest with no device assigned, user space tools will try to live migration it. > 2.4) If it is owned by a guest with device assigned, user space tools can do live migration if needed. > > This patchset includes support for type 2.1/2.2. > > page_offfline_xen.patch gives basic support. The new hypercall (XEN_SYSCTL_page_offline) will mark a page offlining if the page is in-use, otherwise, it will remove the page from the page allocator. It also changes the free_heap_pages(), so that if a page_offlining page is freed, that page will be marked as page_offlined and will not be allocated anymore. One tricky thing is, the offlined page may not be buddy-aligned (i.e., it may be in the middle of a 2^order pages), so that we have to re-arrange the buddy system (i.e. &heap[][][]) carefully. > > page_offline_xen_memory.patch add support to PV guest, a new hypercall (XENMEM_page_offline) try to replace the old page with the new one. This will happen only when the guest has been suspeneded, to avoid complex page sharing situation. I''m still checking if more situation need be considered, like LDT pages and CR3 pages, so any suggestion is really great help. > > page_offline_tools.patch is an example user space tools based on libxc/xc_domain_save.c, it will try to firstly mark a page offline, and checking the result. If a page is owned by a PV guest, it will try to replace the pages. > > I did some basic testing, tried free pages and PV guest pages and is ok. Of course, I need more test on it. And more robust error handling is needed. > > Any suggestion is welcome. > > Thanks > Yunhong Jiang-- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-10 09:29 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> At 03:54 -0500 on 09 Feb (1234151686), Jiang, Yunhong wrote: >> Hi, Tim, this patchset try to support page offline request. > I want to get some initial feedback before more testing. > > I haven''t had a chance to read the patches in detail yet, but > my initial > impression is that:Thanks for your comments very much!> > - The general approach so far seems good (I suspect that your > 2.3 stage > below could also be done like 2.2 without a full live migration but > since that''s not implemented yet that''s fine).Yes, for HVM guest, if it has stub domain support, we can do it through two step: a) first do the update for the stub domain as 2.2 b) update the shadow/hap page table for the domain itself. But as stub domain support is not official enabled, so we will wait till stub domain is by default enabled.> - It seems like a lot of code for what it does. On the Xen > side that''s > just a general impression since I''m not familiar with the > bits of theFor Xen part, the reason is because the currently heap allocator is based on buddy system, while the offlined page may be in the middle of the buddy, so we have to split the original buddy into several smaller one. And do you mean I need turn to Keir for the heap allocator changes?> heap allocators that you''re changing. In libxc you seem to have > duplicated parts of the save/restore code -- better to make those > routines externally visible to the rest of libxc and call them from > your new function. - Like all systems code everywhere, it needs more > comments. :) You''ve introduced some generic-sounding functions > (adjust_pte &c) without describing what they do.Yes, I will add it in next round. -- Yunhong> > I''ll have more detailed comments later in the week, I hope. > > Cheers, > > Tim. > >> Page offline can be used by multiple usage model, belows are some examples: >> a) If too many correctable error happen to one page, > management tools may try to offline the page to avoid more > server error in future; >> b) When page is ECC error and can''t be recoverd by hardware, > Xen''s MCA handler may try to offline the page, so that it will > not be accessed anymore. >> c) Offline some DIMM for power management etc (Of course, > this is far more than simple page offline) >> >> The basic idea to offline a page is: >> 1) If a page is free, it will be removed from page allocator >> 2) If page is in use, the owner will be checked >> 2.1) if it is owned by xen/dom0, the offline will be failed >> 2.2) If it is owned by a PV guest with no device assigned, > user space tools will try to replace the page with new one. >> 2.3) It it is owned by a HVM guest with no device > assigned, user space tools will try to live migration it. >> 2.4) If it is owned by a guest with device assigned, user > space tools can do live migration if needed. >> >> This patchset includes support for type 2.1/2.2. >> >> page_offfline_xen.patch gives basic support. The new > hypercall (XEN_SYSCTL_page_offline) will mark a page offlining > if the page is in-use, otherwise, it will remove the page from > the page allocator. It also changes the free_heap_pages(), so > that if a page_offlining page is freed, that page will be > marked as page_offlined and will not be allocated anymore. One > tricky thing is, the offlined page may not be buddy-aligned > (i.e., it may be in the middle of a 2^order pages), so that we > have to re-arrange the buddy system (i.e. &heap[][][]) carefully. >> >> page_offline_xen_memory.patch add support to PV guest, a new > hypercall (XENMEM_page_offline) try to replace the old page > with the new one. This will happen only when the guest has > been suspeneded, to avoid complex page sharing situation. I''m > still checking if more situation need be considered, like LDT > pages and CR3 pages, so any suggestion is really great help. >> >> page_offline_tools.patch is an example user space tools > based on libxc/xc_domain_save.c, it will try to firstly mark a > page offline, and checking the result. If a page is owned by a > PV guest, it will try to replace the pages. >> >> I did some basic testing, tried free pages and PV guest > pages and is ok. Of course, I need more test on it. And more > robust error handling is needed. >> >> Any suggestion is welcome. >> >> Thanks >> Yunhong Jiang > > > > > > -- > Tim Deegan <Tim.Deegan@citrix.com> > Principal Software Engineer, Citrix Systems (R&D) Ltd. > [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Feb-10 09:42 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
Hi, At 04:29 -0500 on 10 Feb (1234240157), Jiang, Yunhong wrote:> For Xen part, the reason is because the currently heap allocator is > based on buddy system, while the offlined page may be in the middle of > the buddy, so we have to split the original buddy into several smaller > one.Right, I see. It still seems like a lot of code but as I said I haven''t worked through the detail.> And do you mean I need turn to Keir for the heap allocator > changes?Well, Keir will have to ack/nack the patches eventually. :) But I''m happy to have ago at them first. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Feb-10 10:29 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
On 10/02/2009 09:29, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> - It seems like a lot of code for what it does. On the Xen >> side that''s >> just a general impression since I''m not familiar with the >> bits of the > > For Xen part, the reason is because the currently heap allocator is based on > buddy system, while the offlined page may be in the middle of the buddy, so we > have to split the original buddy into several smaller one. > And do you mean I need turn to Keir for the heap allocator changes?I will look at them, yes. It''s probably more my area than Tim''s. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-10 21:09 UTC
Re: [Xen-devel] [RFC][PATCH] Basic support for page offline
Would it be possible to add page status query to the hypercalls? E.g. return the status and owner for just one page, without trying to on/offline it? The same goes for the CPU hotplug code, but that''s another issue. The Solaris FMA code likes to have the option of querying the status, so that''s why we''d like to have such an option. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-11 00:16 UTC
RE: [Xen-devel] [RFC][PATCH] Basic support for page offline
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Would it be possible to add page status query to the hypercalls? E.g. > return the status and owner for just one page, without trying to on/offlineFrank, can you please elabrate what''s the status mean? What''s the usage by Solaris FMA code? Do you mean how the page is used (i.e. information in the page_info->count_info)? As for the ownership of the page, I''m not sure if we can do that without trying online/offline it, since the page may be assigned to another guest after the query if it is not marked offline-pending (mark offline pending make sure it will not be used anymore if freed). -- Yunhong Jiang> it? > > The same goes for the CPU hotplug code, but that''s another issue. > > The Solaris FMA code likes to have the option of querying the status, so > that''s why we''d like to have such an option. > > - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-11 00:39 UTC
Re: [Xen-devel] [RFC][PATCH] Basic support for page offline
Jiang, Yunhong wrote:> > Frank, can you please elabrate what''s the status mean? What''s the usage by Solaris FMA code? Do you mean how the page is used (i.e. information in the page_info->count_info)? > > As for the ownership of the page, I''m not sure if we can do that without trying online/offline it, since the page may be assigned to another guest after the query if it is not marked offline-pending (mark offline pending make sure it will not be used anymore if freed).The usage is pretty simple: there is an interface to: * retire a page * unretire a page * check on the status The status doesn''t have to have much info. The Solaris interface returns one of: * retired * not retired * pending * invalid page I understand that the owner may be harder to return, and it isn''t really important, so it doesn''t matter if that information is not available from the hypercall. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-11 01:08 UTC
RE: [Xen-devel] [RFC][PATCH] Basic support for page offline
So you mean just the online/offline status? That make sense. I will do that. Thanks Yunhong Jiang xen-devel-bounces@lists.xensource.com <> wrote:> Jiang, Yunhong wrote: >> >> Frank, can you please elabrate what''s the status mean? > What''s the usage by Solaris FMA code? Do you mean how the page > is used (i.e. information in the page_info->count_info)? >> >> As for the ownership of the page, I''m not sure if we can do > that without trying online/offline it, since the page may be > assigned to another guest after the query if it is not marked > offline-pending (mark offline pending make sure it will not be > used anymore if freed). > > The usage is pretty simple: there is an interface to: > > * retire a page > * unretire a page > * check on the status > > The status doesn''t have to have much info. The Solaris > interface returns > one of: > > * retired > * not retired > * pending > * invalid page > > I understand that the owner may be harder to return, and it > isn''t really > important, so it doesn''t matter if that information is not available from > the hypercall. > > - Frank > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Van Der Linden
2009-Feb-11 04:08 UTC
Re: [Xen-devel] [RFC][PATCH] Basic support for page offline
Jiang, Yunhong wrote:> So you mean just the online/offline status? That make sense. I will do that. > >Cool, thanks! - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Feb-13 17:03 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
Hi, So a few more comments on the detail of those patches. I had imagined that you would suspend the domain, then update the p2m and pagetables in the guest memory from the _tools_. That would involve less code (possibly none) in Xen, and is how I''d prefer it. But your current approach probably catches more of the corner cases (grant tables &c) than the tools could, so that''s OK. update_pgtable_entry() needs a more descriptive name! It updates potentially very many pagetable entries, and in a particular way. Also it probably ought to be static. The reference counting in update_pgtable_entry() is confusing -- it should probably always do reference counting for both the old and new entries; that seems more robust than only doing the decrements there and manually setting count_info and type_info on the new page in replace_page. In replace_page(), your error paths are confused: the ENOMEM error case drops a ref that wasn''t taken and if get_page() fails you don''t free the allocated page. Both of those functions need comments describing what they do and what their arguments are. memory_page_offline(): again, check your error and exit paths; I''m pretty sure you leak references to the domain. Why does this take a domain, by the way? can''t it just take a range of MFNs and figure out the owning domain for each one as it goes? Also, isn''t the returned nr_offlined value always one less than was requested? You write back the _index_ of the highest-numbered frame that you _attempted_ to offline, which is a pretty confusing number. Other than that, the xen mm patch just needs a good scattering of comments. The tools patch is enormous, and seems to copy big chunks of xc_domain_save into a new file. And since Xen is now doing the hard work of pagetable manipulation, I don''t think you even need to suspend the guest -- just pausing it should be enough and is much easier. If you do need to use the suspend/resume code in later stages of development, please don''t copy it out; just make a libxc function that calls the existing functions appropriately. I''ll leave page_offline_xen.patch to Keir since he''s said he''ll do it, but 700 new lines of code seems like quite a lot -- surely some subsets of he existing buddy splitting and merging code could be split out and reused? Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Feb-13 17:36 UTC
Re: [Xen-devel] Re: [RFC][PATCH] Basic support for page offline
On 13/02/2009 17:03, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:> I''ll leave page_offline_xen.patch to Keir since he''s said he''ll do it, > but 700 new lines of code seems like quite a lot -- surely some subsets > of he existing buddy splitting and merging code could be split out and > reused?It could indeed surely be half the size. It''s split for some reason into a chain of about five functions, each of which does a little bit of the work. Just merge them all together. And unless you have a good reason for currently expecting to offline large ranges of pages, and can measure a substantial performance difference, I would actually just offline one page a at a time -- implement that in a function and call it repeatedly from a for loop. It will be far less complex and for bad-page offlining should perform just fine. Also some comments about what the difference is between offlining, offlined, and broken would be nice. The change in free_heap_pages() to preserve PGC_offlining|PGC_broken struck me as particularly worrying -- no sane C programmer should write it like that, which just makes me more worried about the verbosity of the rest. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-15 09:39 UTC
RE: [Xen-devel] Re: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 13/02/2009 17:03, "Tim Deegan" <Tim.Deegan@citrix.com> wrote: > >> I'll leave page_offline_xen.patch to Keir since he's said he'll do it, >> but 700 new lines of code seems like quite a lot -- surely some subsets >> of he existing buddy splitting and merging code could be split out and >> reused? > > It could indeed surely be half the size. It's split for some > reason into a > chain of about five functions, each of which does a little bit > of the work. > Just merge them all together. And unless you have a good reason for > currently expecting to offline large ranges of pages, and can measure a > substantial performance difference, I would actually just > offline one page a > at a time -- implement that in a function and call it > repeatedly from a for > loop. It will be far less complex and for bad-page offlining > should perform > just fine.The reasonto offline large ranges of pages is to offline a DIMM, but yes that may not important currently, also that is not performance critical. We only support one page each time, that is sure to be much simpler.> > Also some comments about what the difference is between > offlining, offlined, > and broken would be nice. The change in free_heap_pages() to preserve > PGC_offlining|PGC_broken struck me as particularly worrying -- > no sane C > programmer should write it like that, which just makes me more > worried about > the verbosity of the rest.Sure, will add comments for how the page status change. Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-15 09:48 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> Hi, > > So a few more comments on the detail of those patches. > > I had imagined that you would suspend the domain, then update the p2m > and pagetables in the guest memory from the _tools_. That > would involve > less code (possibly none) in Xen, and is how I''d prefer it. But your > current approach probably catches more of the corner cases (grant tables > &c) than the tools could, so that''s OK. > > update_pgtable_entry() needs a more descriptive name! It updates > potentially very many pagetable entries, and in a particular way. > Also it probably ought to be static.Tim, thanks for your feedback very much. Yes, the update_pgtable_entry() will update potentially very much page table entries, I''m not sure that''s the right method to achieve it.> > The reference counting in update_pgtable_entry() is confusing -- it > should probably always do reference counting for both the old and new > entries; that seems more robust than only doing the decrements > there and > manually setting count_info and type_info on the new page in replace_page.Sure, I will do like this.> > In replace_page(), your error paths are confused: the ENOMEM error case > drops a ref that wasn''t taken and if get_page() fails you > don''t free the > allocated page. > > Both of those functions need comments describing what they do and what > their arguments are. > > memory_page_offline(): again, check your error and exit paths; I''m > pretty sure you leak references to the domain. Why does this take a > domain, by the way? can''t it just take a range of MFNs and figure out > the owning domain for each one as it goes? > > Also, isn''t the returned nr_offlined value always one less than was > requested? You write back the _index_ of the highest-numbered frame > that you _attempted_ to offline, which is a pretty confusing number. > > Other than that, the xen mm patch just needs a good scattering of comments.Sure, I will update it.> > The tools patch is enormous, and seems to copy big chunks of > xc_domain_save into a new file. And since Xen is now doing the hard > work of pagetable manipulation, I don''t think you even need to suspend > the guest -- just pausing it should be enough and is much easier.But I''m not sure if we can update the P2M table from Xen side, that''s the reason I did the it in the user space. -- Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Feb-16 14:31 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
At 04:48 -0500 on 15 Feb (1234673293), Jiang, Yunhong wrote:> > The reference counting in update_pgtable_entry() is confusing -- it > > should probably always do reference counting for both the old and new > > entries; that seems more robust than only doing the decrements > > there and > > manually setting count_info and type_info on the new page in replace_page. > > Sure, I will do like this.In fact, it should use the existing PTE-updating code -- I suspect that, for example, your code won''t work at all on a guest that has shadow pagetables enabled.> > The tools patch is enormous, and seems to copy big chunks of > > xc_domain_save into a new file. And since Xen is now doing the hard > > work of pagetable manipulation, I don''t think you even need to suspend > > the guest -- just pausing it should be enough and is much easier. > > But I''m not sure if we can update the P2M table from Xen side, that''s > the reason I did the it in the user space.In that case, why don''t you update the pagetables from the tools as well? That way you''d avoid walking the guest pagetables in Xen. You could make all the PTE changes, try to free the page, and if it still doesn''t work (because there''s some other refcount held), put things back the way they were. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-16 15:25 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
>-----Original Message----- >From: Tim Deegan [mailto:Tim.Deegan@citrix.com] >Sent: 2009年2月16日 22:31 >To: Jiang, Yunhong >Cc: xen-devel@lists.xensource.com >Subject: Re: [RFC][PATCH] Basic support for page offline > >At 04:48 -0500 on 15 Feb (1234673293), Jiang, Yunhong wrote: >> > The reference counting in update_pgtable_entry() is confusing -- it >> > should probably always do reference counting for both the >old and new >> > entries; that seems more robust than only doing the decrements >> > there and >> > manually setting count_info and type_info on the new page >in replace_page. >> >> Sure, I will do like this. > >In fact, it should use the existing PTE-updating code -- I >suspect that, >for example, your code won't work at all on a guest that has shadow >pagetables enabled.Yes, we need work differently depends on guest's paging mode. I forgot PV guest will use shadow mode for log dirty. Just as you said, doing this in user space tools will be much simpler, I will consider more on that option.> >> > The tools patch is enormous, and seems to copy big chunks of >> > xc_domain_save into a new file. And since Xen is now >doing the hard >> > work of pagetable manipulation, I don't think you even >need to suspend >> > the guest -- just pausing it should be enough and is much easier. >> >> But I'm not sure if we can update the P2M table from Xen side, that's >> the reason I did the it in the user space. > >In that case, why don't you update the pagetables from the tools as >well? That way you'd avoid walking the guest pagetables in Xen. You >could make all the PTE changes, try to free the page, and if it still >doesn't work (because there's some other refcount held), put >things back >the way they were.> >Tim. > >-- >Tim Deegan <Tim.Deegan@citrix.com> >Principal Software Engineer, Citrix Systems (R&D) Ltd. >[Company #02300071, SL9 0DZ, UK.] >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-18 14:51 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
>-----Original Message----- >From: Tim Deegan [mailto:Tim.Deegan@citrix.com] >Sent: 2009年2月16日 22:31 >To: Jiang, Yunhong >Cc: xen-devel@lists.xensource.com >Subject: Re: [RFC][PATCH] Basic support for page offline > >At 04:48 -0500 on 15 Feb (1234673293), Jiang, Yunhong wrote: >> > The reference counting in update_pgtable_entry() is confusing -- it >> > should probably always do reference counting for both the >old and new >> > entries; that seems more robust than only doing the decrements >> > there and >> > manually setting count_info and type_info on the new page >in replace_page. >> >> Sure, I will do like this. > >In fact, it should use the existing PTE-updating code -- I >suspect that, >for example, your code won't work at all on a guest that has shadow >pagetables enabled.Can you please share me which existing PET-updating code? I browsed the code and didn't find approprate function, especially considering we need update all level page tables.> >> > The tools patch is enormous, and seems to copy big chunks of >> > xc_domain_save into a new file. And since Xen is now >doing the hard >> > work of pagetable manipulation, I don't think you even >need to suspend >> > the guest -- just pausing it should be enough and is much easier. >> >> But I'm not sure if we can update the P2M table from Xen side, that's >> the reason I did the it in the user space. > >In that case, why don't you update the pagetables from the tools as >well? That way you'd avoid walking the guest pagetables in Xen. You >could make all the PTE changes, try to free the page, and if it still >doesn't work (because there's some other refcount held), put >things back >the way they were.Just as you stated before, there may have some corner case need considered, grant table etc (some is missed in my previous patch). For example, if guest has been granted, but the remote domain has not map it (i.e. it is in grant_table->shared, but not been mapped still), there should have no reference added, but if we don't hold the grant table lock, then the backend may mapped a wrong mfn. Such situation is difficult to be solved in tools. BTW, I suspect I may missed more reference count, for example, the page may be pinned etc, I will consider that in my next patch. Also, as to your suggestion in previous mail of "since Xen is now doing the hard work of pagetable manipulation, I don't think you even need to suspend the guest -- just pausing it should be enough and is much easier", I suppose suspend guest will make thing much simpler. For example, we need consider the preempted hypercall that handle page table page (for example, the page table may be partially validated still when we do the offline). Thanks Yunhong Jiang> >Tim. > >-- >Tim Deegan <Tim.Deegan@citrix.com> >Principal Software Engineer, Citrix Systems (R&D) Ltd. >[Company #02300071, SL9 0DZ, UK.] >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Feb-18 15:20 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
Hi, At 14:51 +0000 on 18 Feb (1234968681), Jiang, Yunhong wrote:> Can you please share me which existing PET-updating code? I browsed > the code and didn''t find approprate function, especially considering > we need update all level page tables.xen/arch/x86/mm.c: mod_l1_entry() mod_l2_entry(), etc.> Just as you stated before, there may have some corner case need > considered, grant table etc (some is missed in my previous patch). For > example, if guest has been granted, but the remote domain has not map > it (i.e. it is in grant_table->shared, but not been mapped still), > there should have no reference added, but if we don''t hold the grant > table lock, then the backend may mapped a wrong mfn. Such situation is > difficult to be solved in tools. BTW, I suspect I may missed more > reference count, for example, the page may be pinned etc, I will > consider that in my next patch.It should be possible to have the tools do all the PTE manipulations with MMU update hypercalls (I think -- Keir may correct me here). Then the final hypercall to surrender the page will fail if the grant tables are wrong; if it does, put the PTEs back and fall back to a live migration. Isn''t that what your in-xen code does anyway?> Also, as to your suggestion in previous mail of "since Xen is now > doing the hard work of pagetable manipulation, I don''t think you even > need to suspend the guest -- just pausing it should be enough and is > much easier"As you already pointed out, I was wrong about that -- you need to suspend the guest to do the p2m manipulations. Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-19 08:44 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> Hi, > > At 14:51 +0000 on 18 Feb (1234968681), Jiang, Yunhong wrote: >> Can you please share me which existing PET-updating code? I browsed >> the code and didn''t find approprate function, especially considering >> we need update all level page tables. > > xen/arch/x86/mm.c: mod_l1_entry() mod_l2_entry(), etc.Yes, I considered this also, I give it up because these function will do more than simply swap the entry, for example, it may try to re-validate the page table again. But yes, I should use them because they are common function and performance is not important for us.> It should be possible to have the tools do all the PTE manipulations > with MMU update hypercalls (I think -- Keir may correct me here). Then > the final hypercall to surrender the page will fail if the grant tables > are wrong; if it does, put the PTEs back and fall back to a live > migration. Isn''t that what your in-xen code does anyway?Yes, although we may need a very big multicall to achieve it. I didn''t see any issue with this method and will work this way. BTW, I think what we are doing is in fact something like memory_exchange(), although memory_exchange() requires the source memory is not referenced anymore. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-19 14:37 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
>> It should be possible to have the tools do all the PTE manipulations >> with MMU update hypercalls (I think -- Keir may correct me here). Then >> the final hypercall to surrender the page will fail if the grant tables >> are wrong; if it does, put the PTEs back and fall back to a live >> migration. Isn''t that what your in-xen code does anyway?Tim, after checking the code more carefully, seems currently the MMU update hypercalls (including mod_lx_entry ) assume it is for current domain, while in our usage model, it will update MMU for other domain, so I will try to do following changes: 1) change mod_lx_entry() to get a domain parameter 2) Add a new hypercall (or a new command to do_mmu_update ) to update the MMU for other domain. I''m not sure if there are other usage model for such requirement, and if such changes acceptable? Any feedback is welcome. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Mar-02 11:56 UTC
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline
At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote:> > >> It should be possible to have the tools do all the PTE manipulations > >> with MMU update hypercalls (I think -- Keir may correct me here). Then > >> the final hypercall to surrender the page will fail if the grant tables > >> are wrong; if it does, put the PTEs back and fall back to a live > >> migration. Isn''t that what your in-xen code does anyway? > > Tim, after checking the code more carefully, seems currently the MMU update hypercalls (including mod_lx_entry ) assume it is for current domain, while in our usage model, it will update MMU for other domain, so I will try to do following changes: 1) change mod_lx_entry() to get a domain parameter 2) Add a new hypercall (or a new command to do_mmu_update ) to update the MMU for other domain. I''m not sure if there are other usage model for such requirement, and if such changes acceptable? Any feedback is welcome. >Sorry for the delay -- I was travelling around the summit and this got lost. Yes, I think this is an OK approach. I doubt there will be many other users of such a hypercall since most OSes will get upset by their PTEs changing under their feet, but I prefer it to the current patch. Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-04 08:23 UTC
[Xen-devel] RE: [RFC][PATCH] Basic support for page offline
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote: >> >>>> It should be possible to have the tools do all the PTE manipulations >>>> with MMU update hypercalls (I think -- Keir may correct me here). Then >>>> the final hypercall to surrender the page will fail if the grant tables >>>> are wrong; if it does, put the PTEs back and fall back to a live >>>> migration. Isn''t that what your in-xen code does anyway? >> >> Tim, after checking the code more carefully, seems currently > the MMU update hypercalls (including mod_lx_entry ) assume it > is for current domain, while in our usage model, it will > update MMU for other domain, so I will try to do following > changes: 1) change mod_lx_entry() to get a domain parameter 2) > Add a new hypercall (or a new command to do_mmu_update ) to > update the MMU for other domain. I''m not sure if there are > other usage model for such requirement, and if such changes > acceptable? Any feedback is welcome. >> > > Sorry for the delay -- I was travelling around the summit and this got > lost. Yes, I think this is an OK approach. I doubt there will be many > other users of such a hypercall since most OSes will get upset by their > PTEs changing under their feet, but I prefer it to the current patch.Sure, I will do this way. Thanks Yunhong Jiang> > Cheers, > > Tim. > > -- > Tim Deegan <Tim.Deegan@citrix.com> > Principal Software Engineer, Citrix Systems (R&D) Ltd. > [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-18 10:24 UTC
[Xen-devel] [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Tim, this is the implementation as discussed. The swap_page.patch is for HV side change, basically it call the mod_lx_entry to update the table. The free_page.patch is the function from user space tools to offlien a page. Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote: >> At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote: >>> >>>>> It should be possible to have the tools do all the PTE manipulations >>>>> with MMU update hypercalls (I think -- Keir may correct me here). Then >>>>> the final hypercall to surrender the page will fail if the grant tables >>>>> are wrong; if it does, put the PTEs back and fall back to a live >>>>> migration. Isn''t that what your in-xen code does anyway? >>> >>> Tim, after checking the code more carefully, seems currently >> the MMU update hypercalls (including mod_lx_entry ) assume it >> is for current domain, while in our usage model, it will >> update MMU for other domain, so I will try to do following >> changes: 1) change mod_lx_entry() to get a domain parameter 2) >> Add a new hypercall (or a new command to do_mmu_update ) to >> update the MMU for other domain. I''m not sure if there are >> other usage model for such requirement, and if such changes >> acceptable? Any feedback is welcome. >>> >> >> Sorry for the delay -- I was travelling around the summit and this got >> lost. Yes, I think this is an OK approach. I doubt there will be many >> other users of such a hypercall since most OSes will get upset by their >> PTEs changing under their feet, but I prefer it to the current patch. > > Sure, I will do this way. > > Thanks > Yunhong Jiang > >> >> Cheers, >> >> Tim. >> >> -- >> Tim Deegan <Tim.Deegan@citrix.com> >> Principal Software Engineer, Citrix Systems (R&D) Ltd. >> [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-18 10:32 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
BTW, this patch depends on several patches I sent out earlier, i.e the change to suspend event channel and the changes to mod_lx_entry, although those patches don''t rely on this one. Keir, as the page offline support is in HV already, so can this target for 3.4 if it pass the review? Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Tim, this is the implementation as discussed. > The swap_page.patch is for HV side change, basically it call > the mod_lx_entry to update the table. > The free_page.patch is the function from user space tools to offlien a page. > > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: >> Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote: >>> At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote: >>>> >>>>>> It should be possible to have the tools do all the PTE manipulations >>>>>> with MMU update hypercalls (I think -- Keir may correct me here). Then >>>>>> the final hypercall to surrender the page will fail if the grant tables >>>>>> are wrong; if it does, put the PTEs back and fall back to a live >>>>>> migration. Isn''t that what your in-xen code does anyway? >>>> >>>> Tim, after checking the code more carefully, seems currently >>> the MMU update hypercalls (including mod_lx_entry ) assume it >>> is for current domain, while in our usage model, it will >>> update MMU for other domain, so I will try to do following >>> changes: 1) change mod_lx_entry() to get a domain parameter 2) >>> Add a new hypercall (or a new command to do_mmu_update ) to >>> update the MMU for other domain. I''m not sure if there are >>> other usage model for such requirement, and if such changes >>> acceptable? Any feedback is welcome. >>>> >>> >>> Sorry for the delay -- I was travelling around the summit and this got >>> lost. Yes, I think this is an OK approach. I doubt there will be many >>> other users of such a hypercall since most OSes will get upset by their >>> PTEs changing under their feet, but I prefer it to the current patch. >> >> Sure, I will do this way. >> >> Thanks >> Yunhong Jiang >> >>> >>> Cheers, >>> >>> Tim. >>> >>> -- >>> Tim Deegan <Tim.Deegan@citrix.com> >>> Principal Software Engineer, Citrix Systems (R&D) Ltd. >>> [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-18 10:42 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 18/03/2009 10:32, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> BTW, this patch depends on several patches I sent out earlier, i.e the change > to suspend event channel and the changes to mod_lx_entry, although those > patches don''t rely on this one. > > Keir, as the page offline support is in HV already, so can this target for 3.4 > if it pass the review?Yes, I''d be happy to check this in this week. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Mar-18 17:34 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Hi, At 10:24 +0000 on 18 Mar (1237371884), Jiang, Yunhong wrote:> Tim, this is the implementation as discussed. > The swap_page.patch is for HV side change, basically it call the mod_lx_entry to update the table.That seems good. One or two nits with that patch: - You''re passing a physical address (of the PTE to update) in an MFN field. That''s not going to be big enough on all platforms. Also it''s pretty confusing. - The "swap" operation takes the physical address of a PTE and a new MFN. Why not a new PTE? And if it''s going to be called "swap", why not take the old PTE too and do an atomic update? (or just call it something else and only take the new PTE?) - Why the checks for the validity of the old MFN before the change? What are you guarding against? And please document the hypercall, specially since its side-effects could be quite surprising.> The free_page.patch is the function from user space tools to offlien a page.This is much better than the previous version, thanks. Cheers, Tim.> > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: > > Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote: > >> At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote: > >>> > >>>>> It should be possible to have the tools do all the PTE manipulations > >>>>> with MMU update hypercalls (I think -- Keir may correct me here). Then > >>>>> the final hypercall to surrender the page will fail if the grant tables > >>>>> are wrong; if it does, put the PTEs back and fall back to a live > >>>>> migration. Isn''t that what your in-xen code does anyway? > >>> > >>> Tim, after checking the code more carefully, seems currently > >> the MMU update hypercalls (including mod_lx_entry ) assume it > >> is for current domain, while in our usage model, it will > >> update MMU for other domain, so I will try to do following > >> changes: 1) change mod_lx_entry() to get a domain parameter 2) > >> Add a new hypercall (or a new command to do_mmu_update ) to > >> update the MMU for other domain. I''m not sure if there are > >> other usage model for such requirement, and if such changes > >> acceptable? Any feedback is welcome. > >>> > >> > >> Sorry for the delay -- I was travelling around the summit and this got > >> lost. Yes, I think this is an OK approach. I doubt there will be many > >> other users of such a hypercall since most OSes will get upset by their > >> PTEs changing under their feet, but I prefer it to the current patch. > > > > Sure, I will do this way. > > > > Thanks > > Yunhong Jiang > > > >> > >> Cheers, > >> > >> Tim. > >> > >> -- > >> Tim Deegan <Tim.Deegan@citrix.com> > >> Principal Software Engineer, Citrix Systems (R&D) Ltd. > >> [Company #02300071, SL9 0DZ, UK.]-- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 05:12 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Tim, thanks for review very much. Attached is the new version. See below for comments. Thanks Yunhong Jiang Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> Hi, > > At 10:24 +0000 on 18 Mar (1237371884), Jiang, Yunhong wrote: >> Tim, this is the implementation as discussed. >> The swap_page.patch is for HV side change, basically it call > the mod_lx_entry to update the table. > > That seems good. One or two nits with that patch: > > - You''re passing a physical address (of the PTE to update) in an MFN > field. That''s not going to be big enough on all platforms. Also it''s > pretty confusing.Yes, fixed and now named pte_addr as a uint64.> > - The "swap" operation takes the physical address of a PTE and a new > MFN. Why not a new PTE? And if it''s going to be called "swap", why > not take the old PTE too and do an atomic update? (or just call it > something else and only take the new PTE?)Yes, I change the name to update_pte, and pass only the new PTE.> > - Why the checks for the validity of the old MFN before the change? > What are you guarding against?Yes, seems it is meaningless.> > And please document the hypercall, specially since its side-effects could > be quite surprising. > >> The free_page.patch is the function from user space tools to offlien a >> page. > > This is much better than the previous version, thanks.I missed one thing in previous patch, i.e. the changes to xc_core_arch_map_p2m(). Originally I change that function to map the p2m table as rw (it is forgoted in previous mail). Now I add a new function xc_core_arch_map_p2m_writable() so that not break the original API. But I''m a bit confused why the xc_domain_save.c will not use this function to map p2m table also? Instead, I noticed a lot of duplicate on these two files, I can send out a clean patch in future if it is ok. Thanks -- Yunhong Jiang> > Cheers, > > Tim. > >> >> Thanks >> Yunhong Jiang >> >> Jiang, Yunhong <> wrote: >>> Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote: >>>> At 14:37 +0000 on 19 Feb (1235054276), Jiang, Yunhong wrote: >>>>> >>>>>>> It should be possible to have the tools do all the PTE manipulations >>>>>>> with MMU update hypercalls (I think -- Keir may correct me here). Then >>>>>>> the final hypercall to surrender the page will fail if the grant >>>>>>> tables are wrong; if it does, put the PTEs back and fall back to a >>>>>>> live migration. Isn''t that what your in-xen code does anyway? >>>>> >>>>> Tim, after checking the code more carefully, seems currently >>>> the MMU update hypercalls (including mod_lx_entry ) assume it >>>> is for current domain, while in our usage model, it will >>>> update MMU for other domain, so I will try to do following >>>> changes: 1) change mod_lx_entry() to get a domain parameter 2) >>>> Add a new hypercall (or a new command to do_mmu_update ) to >>>> update the MMU for other domain. I''m not sure if there are >>>> other usage model for such requirement, and if such changes >>>> acceptable? Any feedback is welcome. >>>>> >>>> >>>> Sorry for the delay -- I was travelling around the summit and this got >>>> lost. Yes, I think this is an OK approach. I doubt there will be many >>>> other users of such a hypercall since most OSes will get upset by their >>>> PTEs changing under their feet, but I prefer it to the current patch. >>> >>> Sure, I will do this way. >>> >>> Thanks >>> Yunhong Jiang >>> >>>> >>>> Cheers, >>>> >>>> Tim. >>>> >>>> -- >>>> Tim Deegan <Tim.Deegan@citrix.com> >>>> Principal Software Engineer, Citrix Systems (R&D) Ltd. >>>> [Company #02300071, SL9 0DZ, UK.] > > > > -- > Tim Deegan <Tim.Deegan@citrix.com> > Principal Software Engineer, Citrix Systems (R&D) Ltd. > [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-Mar-19 09:32 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Hi, At 05:12 +0000 on 19 Mar (1237439530), Jiang, Yunhong wrote:> > - You''re passing a physical address (of the PTE to update) in an MFN > > field. That''s not going to be big enough on all platforms. Also it''s > > pretty confusing. > > Yes, fixed and now named pte_addr as a uint64.You made it an unsigned long, which is still smaller than a paddr_t on PAE builds. And you can''t just make it 64 bits in that union without breaking the ABI; you''ll need to add a new interface somewhere. Maybe Keir can suggest a better place.> I missed one thing in previous patch, i.e. the changes to > xc_core_arch_map_p2m(). Originally I change that function to map the > p2m table as rw (it is forgoted in previous mail). Now I add a new > function xc_core_arch_map_p2m_writable() so that not break the > original API.OK. Are there any callers of the xc_core_arch_map_p2m() that would care if it gave a writable mapping?> But I''m a bit confused why the xc_domain_save.c will not use this > function to map p2m table also? Instead, I noticed a lot of duplicate > on these two files, I can send out a clean patch in future if it is > ok.I think that was just carelessness at the time the xc_core stuff went in (and possibly also distaste at the rather scruffy state of the xc_domain_save version). They should probably be unified at some point if anyone has the energy. :) Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-19 09:45 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 19/03/2009 09:32, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:>>> - You''re passing a physical address (of the PTE to update) in an MFN >>> field. That''s not going to be big enough on all platforms. Also it''s >>> pretty confusing. >> >> Yes, fixed and now named pte_addr as a uint64. > > You made it an unsigned long, which is still smaller than a paddr_t on > PAE builds. And you can''t just make it 64 bits in that union without > breaking the ABI; you''ll need to add a new interface somewhere. Maybe > Keir can suggest a better place.Firstly, the comment added to the header file is pretty rubbish. The description fits existing update methods such as MMU_NORMAL_PT_UPDATE, so based on that comment I could quite reasonably reject your patch on grounds that it is redundant. Secondly, the patch name says swap_page and a printk the patch adds refers to ''swap page''. What''s being swapped? That''s not the name of the operation, nor is swapping referred to in the description comment I mention above. Thirdly, perhaps this makes more sense as a MMU_* op hanging off mmu_update()? That call takes pairs of u64 values, which could give you the space you require. Then you can add a nice comment explaining how your new command differs from MMU_NORMAL_PT_UPDATE. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 09:48 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:> Hi, > > At 05:12 +0000 on 19 Mar (1237439530), Jiang, Yunhong wrote: >> I missed one thing in previous patch, i.e. the changes to >> xc_core_arch_map_p2m(). Originally I change that function to map the >> p2m table as rw (it is forgoted in previous mail). Now I add a new >> function xc_core_arch_map_p2m_writable() so that not break the >> original API. > > OK. Are there any callers of the xc_core_arch_map_p2m() that > would care > if it gave a writable mapping?It is exported by libxc, so we can''t make assumption here.> >> But I''m a bit confused why the xc_domain_save.c will not use this >> function to map p2m table also? Instead, I noticed a lot of duplicate >> on these two files, I can send out a clean patch in future if it is >> ok. > > I think that was just carelessness at the time the xc_core > stuff went in > (and possibly also distaste at the rather scruffy state of the > xc_domain_save version). They should probably be unified at some point if > anyone has the energy. :) > > Cheers, > > Tim. > > -- > Tim Deegan <Tim.Deegan@citrix.com> > Principal Software Engineer, Citrix Systems (R&D) Ltd. > [Company #02300071, SL9 0DZ, UK.]_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 09:57 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 19/03/2009 09:32, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote: > >>>> - You''re passing a physical address (of the PTE to update) in an MFN >>>> field. That''s not going to be big enough on all platforms. Also >>>> it''s pretty confusing. >>> >>> Yes, fixed and now named pte_addr as a uint64. >> >> You made it an unsigned long, which is still smaller than a paddr_t on >> PAE builds. And you can''t just make it 64 bits in that union without >> breaking the ABI; you''ll need to add a new interface somewhere. Maybe >> Keir can suggest a better place. > > Firstly, the comment added to the header file is pretty rubbish. The > description fits existing update methods such as > MMU_NORMAL_PT_UPDATE, so > based on that comment I could quite reasonably reject your > patch on grounds > that it is redundant.I will update it. In fact, I didn''t find a proper name for it. Maybe something like MMU_FOREIGN_PT_UPDATE? But it may still be confused as update pt to point memory belongs to other domain.> > Secondly, the patch name says swap_page and a printk the patch > adds refers > to ''swap page''. What''s being swapped? That''s not the name of > the operation, > nor is swapping referred to in the description comment I mention above. > > Thirdly, perhaps this makes more sense as a MMU_* op hanging off > mmu_update()? That call takes pairs of u64 values, which could > give you the > space you require. Then you can add a nice comment explaining > how your new > command differs from MMU_NORMAL_PT_UPDATE.I turn to the mmu_ext_op because it has only 1 entry left. So do you mean it is ok to be there? - yhj> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-19 10:13 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 19/03/2009 09:57, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Thirdly, perhaps this makes more sense as a MMU_* op hanging off >> mmu_update()? That call takes pairs of u64 values, which could >> give you the >> space you require. Then you can add a nice comment explaining >> how your new >> command differs from MMU_NORMAL_PT_UPDATE. > > I turn to the mmu_ext_op because it has only 1 entry left. So do you mean it > is ok to be there?Yes, I think it makes most sense there. It''s close in behaviour to MMU_NORMAL_PT_UPDATE, except for the ''foreigness''. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 13:01 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 19/03/2009 09:57, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Thirdly, perhaps this makes more sense as a MMU_* op hanging off >>> mmu_update()? That call takes pairs of u64 values, which could give you >>> the space you require. Then you can add a nice comment explaining how >>> your new command differs from MMU_NORMAL_PT_UPDATE. >> >> I turn to the mmu_ext_op because it has only 1 entry left. So do you mean >> it is ok to be there? > > Yes, I think it makes most sense there. It''s close in behaviour to > MMU_NORMAL_PT_UPDATE, except for the ''foreigness''. > > -- KeirThis is updated version. Instead of add a new mmu_*op, I try to change the MMU_NORMAL_PT_UPDATE to support update foreign page table. There will be some restirction for it, including the pagetable and the new mfn pointed should be owned by same domain, and the domain should be suspeneded already. The update_foregin_pt.patch is for this. I''m not sure if this method is feasible. The exchange_page.patch is to replace a mfn with a new one. Although there is already a memory_exchange hypercall, but that hypercall will not accept new page, add that support will break current ABI. Also it does not support foreign domain. This function add those support The other two have not much changes. The only change is the user space tools need two hypercall, one is to update the page table, while the second is to exchange the pages. Please have a look on it. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-19 13:22 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 19/03/2009 13:01, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> Instead of add a new mmu_*op, I try to change the MMU_NORMAL_PT_UPDATE to > support update foreign page table. There will be some restirction for it, > including the pagetable and the new mfn pointed should be owned by same > domain, and the domain should be suspeneded already. The > update_foregin_pt.patch is for this. I''m not sure if this method is feasible.I''ll need to look in more detail, but yes it does appear this approach can work. This is probably the easiest way to go in that case.> The exchange_page.patch is to replace a mfn with a new one. Although there is > already a memory_exchange hypercall, but that hypercall will not accept new > page, add that support will break current ABI. Also it does not support > foreign domain. This function add those supportWell, why is the existing hypercall not okay by itself? Why can you not take an arbitrary new mfn and you feel you have to specify it from the tools instead? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 14:26 UTC
[Xen-devel] RE: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 19/03/2009 13:01, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >> Instead of add a new mmu_*op, I try to change the MMU_NORMAL_PT_UPDATE to >> support update foreign page table. There will be some restirction for it, >> including the pagetable and the new mfn pointed should be owned by same >> domain, and the domain should be suspeneded already. The >> update_foregin_pt.patch is for this. I''m not sure if this method is >> feasible. > > I''ll need to look in more detail, but yes it does appear this > approach can > work. This is probably the easiest way to go in that case. > >> The exchange_page.patch is to replace a mfn with a new one. Although there >> is already a memory_exchange hypercall, but that hypercall will not accept >> new page, add that support will break current ABI. Also it does not support >> foreign domain. This function add those support > > Well, why is the existing hypercall not okay by itself? Why > can you not take > an arbitrary new mfn and you feel you have to specify it from the tools > instead?It is ok for us to use an arbitrary new mfn, and then do the update_entry. But what happen if this process failed and we want to turn back to the old page? We still need this mechanism at that situation. Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-19 14:36 UTC
[Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 19/03/2009 14:26, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Well, why is the existing hypercall not okay by itself? Why >> can you not take >> an arbitrary new mfn and you feel you have to specify it from the tools >> instead? > > It is ok for us to use an arbitrary new mfn, and then do the update_entry. But > what happen if this process failed and we want to turn back to the old page? > We still need this mechanism at that situation.If what failed? The update_entry? How could that happen? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 14:42 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
xen-devel-bounces@lists.xensource.com <> wrote:> On 19/03/2009 14:26, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Well, why is the existing hypercall not okay by itself? Why can you not >>> take an arbitrary new mfn and you feel you have to specify it from the >>> tools instead? >> >> It is ok for us to use an arbitrary new mfn, and then do the update_entry. >> But what happen if this process failed and we want to turn back to the old >> page? We still need this mechanism at that situation. > > If what failed? The update_entry? How could that happen?Per discussion before, when the page is granted to other domain, then after we update all entry, there will still have reference to left. -- yhj> > -- Keir > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-19 14:48 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Or when page is mapped by other domain. xen-devel-bounces@lists.xensource.com <> wrote:> xen-devel-bounces@lists.xensource.com <> wrote: >> On 19/03/2009 14:26, "Jiang, Yunhong" > <yunhong.jiang@intel.com> wrote: >> >>>> Well, why is the existing hypercall not okay by itself? Why can you not >>>> take an arbitrary new mfn and you feel you have to specify it from the >>>> tools instead? >>> >>> It is ok for us to use an arbitrary new mfn, and then do the update_entry. >>> But what happen if this process failed and we want to turn back to the old >>> page? We still need this mechanism at that situation. >> >> If what failed? The update_entry? How could that happen? > > Per discussion before, when the page is granted to other > domain, then after we update all entry, there will still have reference to > left. > > -- yhj > >> >> -- Keir >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-19 16:45 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 19/03/2009 14:42, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>>> It is ok for us to use an arbitrary new mfn, and then do the update_entry. >>> But what happen if this process failed and we want to turn back to the old >>> page? We still need this mechanism at that situation. >> >> If what failed? The update_entry? How could that happen? > > Per discussion before, when the page is granted to other domain, then after we > update all entry, there will still have reference to left.Hmmm I don''t really understand. K. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 02:52 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
xen-devel-bounces@lists.xensource.com <> wrote:> On 19/03/2009 14:42, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>>> It is ok for us to use an arbitrary new mfn, and then do the >>>> update_entry. But what happen if this process failed and we want to turn >>>> back to the old page? We still need this mechanism at that situation. >>> >>> If what failed? The update_entry? How could that happen? >> >> Per discussion before, when the page is granted to other domain, then >> after we update all entry, there will still have reference to left. > > Hmmm I don''t really understand.The basic idea to offline a page is: 1) mark a page offline pending 2) If the page is owned by a HVM domain, user have to live migrate it 3) If the page is owned by a PV domain, we will try to exchange the offline pending page to a new one and free the old page. (This is target of this series patches). The method to exchange the offline pending page for PV domain is: 1) Suspend the guest. 2) Allocate a new page for the guest 3) Get a copy for the content 4) User space tools will scan all page table page to see if any reference to the offending page, if yes, then it will hypercall to Xen to replace the entry to point to the new one. (Through the mmu_*ops) 5) After update all page tables, user space tools will try to exchange the old page with the new page. If the new mfn has no reference anymore (i.e. count_info & count_mask = 1), the exchange will update the m2p and return success, otherwise it will return fail. (the page may be referenced by other domain, like grant table or foreign mapped). 6) If step 5 is success, user space tools will update the content of the new page and the p2m table, else it will try to undo step 4 to revert page table changes. 7) Resume the guest. This requires we need to allocate the new page before the exchange call and we have to pass both old_mfn and new_mfn in step 5 to exchange the memory. However, current hypercall will always allocate a new page to replace the old one. Currently I try to add a new hypercall for this purpose. Maybe we can enhance the current XENMEM_exchange to accept a mem_flags, when that flag is set, the exch.out.extent_start will be the new_mfn instead of the gpfn, and the gpfn will be always same as corresponding gpfn in the exch.in.ext_start. But I do think this is a bit tricky, it change the meaning of exch.out.extent_start and how the gpn is pass down. Thanks Yunhong Jiang> > K.> > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:05 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 02:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> This requires we need to allocate the new page before the exchange call and we > have to pass both old_mfn and new_mfn in step 5 to exchange the memory. > However, current hypercall will always allocate a new page to replace the old > one. > > Currently I try to add a new hypercall for this purpose. > > Maybe we can enhance the current XENMEM_exchange to accept a mem_flags, when > that flag is set, the exch.out.extent_start will be the new_mfn instead of the > gpfn, and the gpfn will be always same as corresponding gpfn in the > exch.in.ext_start. But I do think this is a bit tricky, it change the meaning > of exch.out.extent_start and how the gpn is pass down.Thanks for the description. I guess I will wait for your next patch, which should I think at least separate the foreign pagetable update hypercall from this weird swap hypercall. Then I can comment on the new swap hypercall perhaps getting less confused by it. I certainly don''t promise to get this in for 3.4 at this point however. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 09:16 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 02:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >> This requires we need to allocate the new page before the exchange call >> and we have to pass both old_mfn and new_mfn in step 5 to exchange the >> memory. However, current hypercall will always allocate a new page to >> replace the old one. >> >> Currently I try to add a new hypercall for this purpose. >> >> Maybe we can enhance the current XENMEM_exchange to accept a mem_flags, >> when that flag is set, the exch.out.extent_start will be the new_mfn >> instead of the gpfn, and the gpfn will be always same as corresponding >> gpfn in the exch.in.ext_start. But I do think this is a bit tricky, it >> change the meaning of exch.out.extent_start and how the gpn is pass down. > > Thanks for the description. I guess I will wait for your next > patch, which > should I think at least separate the foreign pagetable update hypercall fromI think I have split that patch last night. There is no foreign page table hypercall anymore, instead, I just tried to enhance the current mmu_op. And this new "weird" swap is now named mmu_ext_exchange_page.(Is it really so weird to just pass the new mfn down?)> this weird swap hypercall. Then I can comment on the new swap hypercall > perhaps getting less confused by it. I certainly don''t promise > to get this > in for 3.4 at this point however.That''s ok and should depends on our discussion. Just stated before, hope check in IF it pass the review. Now it is on-going still :)> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:28 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Thanks for the description. I guess I will wait for your next >> patch, which >> should I think at least separate the foreign pagetable update hypercall from > > I think I have split that patch last night. There is no foreign page table > hypercall anymore, instead, I just tried to enhance the current mmu_op. And > this new "weird" swap is now named mmu_ext_exchange_page.(Is it really so > weird to just pass the new mfn down?)Ah yes, I found the email now. Well I''m still confused as to why it is needed. It seems to me you could scan for all PTEs mapping old_pfn, stash them in a list and temporarily make them not-present, and take a copy of old_pfn. Then do a normal XENMEM_exchange: on failure revert all PTEs, on success switch over all PTEs and copy old_pfn data into new_pfn. Well it is more hypercalls (two updates per PTE) I suppose, but I doubt this matters unless you''re offlining a lot of pages, and we don''t support offlining memory banks really at the moment. Also some of this will potentially batch up into multicalls or MMUOP_ lists nicely anyway. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-Mar-20 09:37 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
>>> "Jiang, Yunhong" <yunhong.jiang@intel.com> 20.03.09 03:52 >>> >The method to exchange the offline pending page for PV domain is: >1) Suspend the guest. >2) Allocate a new page for the guest >3) Get a copy for the content >4) User space tools will scan all page table page to see if any reference to the offending page, if yes, then it will hypercall to Xen >to replace the entry to point to the new one. (Through the mmu_*ops) >5) After update all page tables, user space tools will try to exchange the old page with the new page. If the new mfn has no >reference anymore (i.e. count_info & count_mask = 1), the exchange will update the m2p and return success, otherwise it will >return fail. (the page may be referenced by other domain, like grant table or foreign mapped).Hmm, if you consider the possibility of this case, then you should also consider the possibility of a page still being accessible by another domain at the point where you copy its content, but no longer in use when you do the exchange (which means that the content may have changed between the two points in time).>6) If step 5 is success, user space tools will update the content of the new page and the p2m table, else it will try to undo step 4 >to revert page table changes. >7) Resume the guest.Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 09:41 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Jan Beulich <mailto:jbeulich@novell.com> wrote:>>>> "Jiang, Yunhong" <yunhong.jiang@intel.com> 20.03.09 03:52 >>> >> The method to exchange the offline pending page for PV domain is: 1) >> Suspend the guest. 2) Allocate a new page for the guest >> 3) Get a copy for the content >> 4) User space tools will scan all page table page to see if > any reference to the offending page, if yes, then it will > hypercall to Xen >> to replace the entry to point to the new one. (Through the mmu_*ops) >> 5) After update all page tables, user space tools will try to > exchange the old page with the new page. If the new mfn has no >> reference anymore (i.e. count_info & count_mask = 1), the > exchange will update the m2p and return success, otherwise it will >> return fail. (the page may be referenced by other domain, > like grant table or foreign mapped). > > Hmm, if you consider the possibility of this case, then you > should also consider the possibility of a page still being > accessible by another domain at the point where you copy its > content, but no longer in use when you do the exchange (which > means that the content may have changed between the two points > in time).Aha, yes, thanks for pointing this. I considerd this but apparently missed this race condition. When the page is freed, we can''t map the page from the user space anymore, so we have to do it in the exchange hypercall to gurantee the atomic. Keir, I checked the XENMEM_exchange before, and it didn''t do the copy, are there any reason for that? Or if we can add the copy to it? Thanks Yunhong Jiang> >> 6) If step 5 is success, user space tools will update the > content of the new page and the p2m table, else it will try to > undo step 4 >> to revert page table changes. >> 7) Resume the guest. > > Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:42 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:37, "Jan Beulich" <jbeulich@novell.com> wrote:>> 5) After update all page tables, user space tools will try to exchange the >> old page with the new page. If the new mfn has no >> reference anymore (i.e. count_info & count_mask = 1), the exchange will >> update the m2p and return success, otherwise it will >> return fail. (the page may be referenced by other domain, like grant table or >> foreign mapped). > > Hmm, if you consider the possibility of this case, then you should also > consider the possibility of a page still being accessible by another domain at > the point where you copy its content, but no longer in use when you do the > exchange (which means that the content may have changed between the two points > in time).Since the guest is suspended, would that matter? Any I/Os that were in flight on suspend will get resubmitted on resume. So the potential races might all be benign. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-Mar-20 09:42 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools-- Was RE: [RFC][PATCH] Basic support for page offline
>>> Keir Fraser <keir.fraser@eu.citrix.com> 20.03.09 10:28 >>> >On 20/03/2009 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Thanks for the description. I guess I will wait for your next >>> patch, which >>> should I think at least separate the foreign pagetable update hypercall from >> >> I think I have split that patch last night. There is no foreign page table >> hypercall anymore, instead, I just tried to enhance the current mmu_op. And >> this new "weird" swap is now named mmu_ext_exchange_page.(Is it really so >> weird to just pass the new mfn down?) > >Ah yes, I found the email now. Well I''m still confused as to why it is >needed. It seems to me you could scan for all PTEs mapping old_pfn, stash >them in a list and temporarily make them not-present, and take a copy of >old_pfn. Then do a normal XENMEM_exchange: on failure revert all PTEs, on >success switch over all PTEs and copy old_pfn data into new_pfn. Well it is >more hypercalls (two updates per PTE) I suppose, but I doubt this matters >unless you''re offlining a lot of pages, and we don''t support offlining >memory banks really at the moment. Also some of this will potentially batch >up into multicalls or MMUOP_ lists nicely anyway.A normal XENMEM_exchange wouldn''t work here, would it? The old page must have no other references for it to succeed, and will go back to the allocator right afterwards - it''s contents won''t be recoverable. This would probably require a new flag to XENMEM_exchange (which I agree would be much simpler than adding a full-blown new [sub-]hypercall). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 09:44 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Thanks for the description. I guess I will wait for your next patch, which >>> should I think at least separate the foreign pagetable update hypercall >>> from >> >> I think I have split that patch last night. There is no foreign page table >> hypercall anymore, instead, I just tried to enhance the current mmu_op. And >> this new "weird" swap is now named mmu_ext_exchange_page.(Is it really so >> weird to just pass the new mfn down?) > > Ah yes, I found the email now. Well I''m still confused as to why it is > needed. It seems to me you could scan for all PTEs mapping > old_pfn, stash > them in a list and temporarily make them not-present, and take > a copy of > old_pfn. Then do a normal XENMEM_exchange: on failure revert > all PTEs, on > success switch over all PTEs and copy old_pfn data into > new_pfn. Well it is > more hypercalls (two updates per PTE) I suppose, but I doubt > this matters > unless you''re offlining a lot of pages, and we don''t support offlining > memory banks really at the moment. Also some of this will > potentially batch > up into multicalls or MMUOP_ lists nicely anyway.Yes, seems this method works well. The only change to Xen is to enhance current mmu_xxx_pt_update to support foreign page table. I will update the patch accordingly to see if it can pass the review :-) Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:48 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools-- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:42, "Jan Beulich" <jbeulich@novell.com> wrote:>> Ah yes, I found the email now. Well I''m still confused as to why it is >> needed. It seems to me you could scan for all PTEs mapping old_pfn, stash >> them in a list and temporarily make them not-present, and take a copy of >> old_pfn. Then do a normal XENMEM_exchange: on failure revert all PTEs, on >> success switch over all PTEs and copy old_pfn data into new_pfn. Well it is >> more hypercalls (two updates per PTE) I suppose, but I doubt this matters >> unless you''re offlining a lot of pages, and we don''t support offlining >> memory banks really at the moment. Also some of this will potentially batch >> up into multicalls or MMUOP_ lists nicely anyway. > > A normal XENMEM_exchange wouldn''t work here, would it? The old page > must have no other references for it to succeed, and will go back to the > allocator right afterwards - it''s contents won''t be recoverable. This would > probably require a new flag to XENMEM_exchange (which I agree would be > much simpler than adding a full-blown new [sub-]hypercall).If the XENMEM_exchange succeeds then you don''t need the old_pfn any more. You only need recover if the excahnge fails, and in that case the page isn''t released by XENMEM_exchange. This is my understanding at least. :-) I''m trying to work out whether I''m wrong or whether the extra Xen bits are not needed. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 09:52 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 09:37, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> 5) After update all page tables, user space tools will try to exchange the >>> old page with the new page. If the new mfn has no >>> reference anymore (i.e. count_info & count_mask = 1), the exchange will >>> update the m2p and return success, otherwise it will >>> return fail. (the page may be referenced by other domain, like grant >>> table or foreign mapped). >> >> Hmm, if you consider the possibility of this case, then you should also >> consider the possibility of a page still being accessible by another >> domain at the point where you copy its content, but no longer in use when >> you do the exchange (which means that the content may have changed between >> the two points in time). > > Since the guest is suspended, would that matter? Any I/Os that were in > flight on suspend will get resubmitted on resume. So the > potential races > might all be benign.I think it may not be resubmited. Backend may have no idea at all if the guest is resume quickly, I remember. Or maybe I miss understand the interface between backend/frontend.. Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:52 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:44, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> Yes, seems this method works well. The only change to Xen is to enhance > current mmu_xxx_pt_update to support foreign page table. > > I will update the patch accordingly to see if it can pass the review :-)Yay! Good news. The only question, then, is whether we think the races against backends are benign or not. Since the guest is suspended (in the ready-for-save-restore meaning of suspended, right?) I think you get away with it. If actually there are dangerous races we may need Xen support for this operation yet. But if the guest is really suspended this is no more dangerous than the stop-and-copy phase of a domain save or live migration, I would say. And that obviously works! -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 09:58 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Since the guest is suspended, would that matter? Any I/Os that were in >> flight on suspend will get resubmitted on resume. So the >> potential races >> might all be benign. > > I think it may not be resubmited. Backend may have no idea at all if the guest > is resume quickly, I remember. Or maybe I miss understand the interface > between backend/frontend..Ah, you do a suspend-cancel/fast-resume? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 09:59 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 09:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Since the guest is suspended, would that matter? Any I/Os that were in >>> flight on suspend will get resubmitted on resume. So the potential races >>> might all be benign. >> >> I think it may not be resubmited. Backend may have no idea at all if the >> guest is resume quickly, I remember. Or maybe I miss understand the >> interface between backend/frontend.. > > Ah, you do a suspend-cancel/fast-resume?Yes, that''s suggested by Tim and I think that''s meet our purpose quite well. Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 10:03 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 09:59, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>>>> Since the guest is suspended, would that matter? Any I/Os that were in >>>> flight on suspend will get resubmitted on resume. So the potential races >>>> might all be benign. >>> >>> I think it may not be resubmited. Backend may have no idea at all if the >>> guest is resume quickly, I remember. Or maybe I miss understand the >>> interface between backend/frontend.. >> >> Ah, you do a suspend-cancel/fast-resume? > > Yes, that''s suggested by Tim and I think that''s meet our purpose quite well.Okay, then I suggest you extend XENMEM_exchange so that in.mem_flags can tell that hypercall to copy data from old to new pages. XENMEMF_copy_data? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 10:05 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Yes, I''m working on that in fact. -- yhj Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 09:59, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>>>> Since the guest is suspended, would that matter? Any I/Os that were in >>>>> flight on suspend will get resubmitted on resume. So the potential races >>>>> might all be benign. >>>> >>>> I think it may not be resubmited. Backend may have no idea at all if the >>>> guest is resume quickly, I remember. Or maybe I miss understand the >>>> interface between backend/frontend.. >>> >>> Ah, you do a suspend-cancel/fast-resume? >> >> Yes, that''s suggested by Tim and I think that''s meet our purpose quite >> well. > > Okay, then I suggest you extend XENMEM_exchange so that > in.mem_flags can > tell that hypercall to copy data from old to new pages. XENMEMF_copy_data? > > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 10:07 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 10:03, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>>> Ah, you do a suspend-cancel/fast-resume? >> >> Yes, that''s suggested by Tim and I think that''s meet our purpose quite well. > > Okay, then I suggest you extend XENMEM_exchange so that in.mem_flags can > tell that hypercall to copy data from old to new pages. XENMEMF_copy_data?Even this may not work. Old grants will reference the old page. Subsequent attempts by a backend to map the grant will fail. And the resulting failed I/Os will probably make the frontend driver throw a fit. Getting this working with suspend-cancel seems pretty tricky. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 10:13 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 10:03, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote: > >>>> Ah, you do a suspend-cancel/fast-resume? >>> >>> Yes, that''s suggested by Tim and I think that''s meet our purpose quite >>> well. >> >> Okay, then I suggest you extend XENMEM_exchange so that in.mem_flags can >> tell that hypercall to copy data from old to new pages. XENMEMF_copy_data? > > Even this may not work. Old grants will reference the old > page. Subsequent > attempts by a backend to map the grant will fail. And the > resulting failed > I/Os will probably make the frontend driver throw a fit. Getting this > working with suspend-cancel seems pretty tricky.If there is grant map for it, I think we will fail since the reference is not 1 when XENMEM_exchange. Or do you mean there is a reference in grant table but is not mapped still? Will the refrence count be added when a page is granted? I''m not quite sure about this, but I think that will be same to original XENMEM_exchange. If yes, we may have to update the grant information? The normal save-restore method may cause service broken, I think that''s the reason of choosing the suspend-cacel method. Thanks Yunhong Jiang> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 10:19 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 10:07, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>>> Yes, that''s suggested by Tim and I think that''s meet our purpose quite well. >> >> Okay, then I suggest you extend XENMEM_exchange so that in.mem_flags can >> tell that hypercall to copy data from old to new pages. XENMEMF_copy_data? > > Even this may not work. Old grants will reference the old page. Subsequent > attempts by a backend to map the grant will fail. And the resulting failed > I/Os will probably make the frontend driver throw a fit. Getting this > working with suspend-cancel seems pretty tricky.In fact do you have *any* handling for grants which have not yet been used by the backend? It seems that such grants are doomed to fail since you probably don''t rewrite them? Would it be a good idea to map the guest''s grant pages and scan for your old_pfn? Then fail the exchange if you see a match? That avoids this whole issue and then also you don''t need to worry about racing backends and you can keep the copy operation where it is, in the tools, I think. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 10:21 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 10:13, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Even this may not work. Old grants will reference the old >> page. Subsequent >> attempts by a backend to map the grant will fail. And the >> resulting failed >> I/Os will probably make the frontend driver throw a fit. Getting this >> working with suspend-cancel seems pretty tricky. > > If there is grant map for it, I think we will fail since the reference is not > 1 when XENMEM_exchange. > > Or do you mean there is a reference in grant table but is not mapped still? > Will the refrence count be added when a page is granted? I''m not quite sure > about this, but I think that will be same to original XENMEM_exchange. If yes, > we may have to update the grant information?Reference counts are not adjusted until a grant is mapped. So I''m talking about grants which have been written by the domU, passed to the backend driver, but not yet used. Those will be screwed. Anyway, see my email just now, which suggests you map the guest''s grant table into your toolstack code, and you can go check for existing grants yourself. As I say there -- if you do that I don''t think you need worry about atomicity of the copy. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-20 10:36 UTC
RE: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:> On 20/03/2009 10:13, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> Even this may not work. Old grants will reference the old page. Subsequent >>> attempts by a backend to map the grant will fail. And the resulting failed >>> I/Os will probably make the frontend driver throw a fit. Getting this >>> working with suspend-cancel seems pretty tricky. >> >> If there is grant map for it, I think we will fail since the reference is >> not 1 when XENMEM_exchange. >> >> Or do you mean there is a reference in grant table but is not mapped still? >> Will the refrence count be added when a page is granted? I''m not quite sure >> about this, but I think that will be same to original XENMEM_exchange. If >> yes, we may have to update the grant information? > > Reference counts are not adjusted until a grant is mapped. So > I''m talking > about grants which have been written by the domU, passed to the backend > driver, but not yet used. Those will be screwed.Got it.> > Anyway, see my email just now, which suggests you map the guest''s grant > table into your toolstack code, and you can go check for > existing grants > yourself. As I say there -- if you do that I don''t think you need worry > about atomicity of the copy.Are there any method for user space to get the grant table information?> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 10:40 UTC
Re: [Xen-devel] Re: [PATCH] Support swap a page from user spacetools -- Was RE: [RFC][PATCH] Basic support for page offline
On 20/03/2009 10:36, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> Anyway, see my email just now, which suggests you map the guest''s grant >> table into your toolstack code, and you can go check for >> existing grants >> yourself. As I say there -- if you do that I don''t think you need worry >> about atomicity of the copy. > > Are there any method for user space to get the grant table information?GNTTABOP_query_size to discover size of the grant table. GNTTABOP_setup_table to get the frame list (pass in nr_frames from the query_size operation). Map the frames and away you go. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel