Jiang, Yunhong
2008-Dec-02 03:24 UTC
[Xen-devel] [Question] How to support page offline in Xen environment
Hi all,

Page offlining can serve many purposes: memory offline, memory power management, proactive action when multiple correctable errors (CE) occur on one page, and so on. In a virtualization environment without guest offline support, we think offlining a page usually means replacing the old page with a new one, transparently to the guest.

We are currently trying to add page offline support to Xen. We'd like to share our idea with the mailing list before we begin implementing it, and we hope to get feedback from the community.

Our idea:

Page offline will be done in two steps:
  1) A page is marked offline-pending (if the page is already free, it is marked offline directly).
  2) When that page is freed, it is marked offline automatically and will not be allocated any more.

Note that we will not support all types of pages, because of the different page usage models. Basically, Xen heap pages and pages owned by dom0 cannot be offlined. Also, it is complex to offline a page that is used by a guest with a device assigned.

We are considering using the live migration mechanism to achieve the two steps. First, the user space tools mark the pages offline_pending through a hypercall; this hypercall also returns the owners of the pages. Second, if all pages can be offlined, the user space tools live migrate the domains owning those pages. Third, the user space tools check that all pages have been offlined.

The following hypercall will be added:

  int xen_page_offline_pending(int start_pfn, int end_pfn, void *result, void *owners)

  IN: start_pfn/end_pfn:
      The range of pages to be offlined.
  OUT: result:
      A buffer containing the status of each page. It can be:
        offlined: the page is already offline (e.g. the page was already free when the hypercall happened)
        offline_pending: the page will be offlined when it is freed
        offline_fail: the page can't be offlined, perhaps because it is used by Xen or dom0. Note that if any page is marked offline_fail, this hypercall will not change any page's status (i.e. no page will be marked offline_pending or offlined), to keep the operation atomic.
        other status: other statuses to be defined in future.
  OUT: owners:
      A buffer containing the domains that own the pages. For security reasons, it will not state which domain owns which page.

Note that the live migration mechanism has some issues:
  a) The domain ID changes after live migration.
  b) Live migration fails for a domain with a device assigned, so the user space tools have to hot-remove the device or fail the page offline request.

Some other options:

Of course, there are other mechanisms to achieve this:
  1) Handle the page offline request inside Xen. It is simple to offline a page of an HVM domain (without any device assigned) by using the p2m table and re-creating the shadow/EPT table from scratch. But it is not so easy for a PV domain (maybe we can switch the domain to shadow mode during this procedure) or for a domain with a device assigned. Also, it is complex to support the following page types:
     a) pages shared between multiple domains, e.g. granted to dom0;
     b) pages used for domain control, e.g. the page used for an HVM domain's vlapic.

Any feedback is welcome.

Thanks
Yunhong Jiang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
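The two-step marking and the all-or-nothing rule in the proposal above can be sketched roughly as follows. This is only an illustration on a toy page array, not Xen code: `struct page`, `enum page_use`, `page_offline_pending()` and `page_freed()` are hypothetical names, the range is assumed to be half-open, and what the result buffer holds on failure is an assumption, since the proposal does not specify it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page usage classes; not Xen's real frame-table state. */
enum page_use { PAGE_FREE, PAGE_GUEST, PAGE_XENHEAP, PAGE_DOM0 };

/* Result codes named after the statuses listed in the proposal. */
enum offline_status { OFFLINED, OFFLINE_PENDING, OFFLINE_FAIL };

struct page {
    enum page_use use;
    int offline_pending;  /* page must go offline as soon as it is freed */
    int offlined;         /* page has been taken out of the allocator */
};

/* Classify one page without modifying it. */
static enum offline_status classify(const struct page *pg)
{
    if (pg->use == PAGE_XENHEAP || pg->use == PAGE_DOM0)
        return OFFLINE_FAIL;
    if (pg->use == PAGE_FREE)
        return OFFLINED;
    return OFFLINE_PENDING;
}

/*
 * Mark pages in [start, end) for offlining.
 * Returns 0 if every page was marked, -1 if any page cannot be offlined.
 * On failure no page state is changed; result[] then only reports what
 * each page would have become (an assumption of this sketch).
 */
int page_offline_pending(struct page *pages, size_t start, size_t end,
                         enum offline_status *result)
{
    int failed = 0;

    assert(start <= end);

    /* Pass 1: classify only, so a failure leaves all pages untouched. */
    for (size_t i = start; i < end; i++) {
        result[i - start] = classify(&pages[i]);
        if (result[i - start] == OFFLINE_FAIL)
            failed = 1;
    }
    if (failed)
        return -1;

    /* Pass 2: commit the new state. */
    for (size_t i = start; i < end; i++) {
        if (result[i - start] == OFFLINED)
            pages[i].offlined = 1;
        else
            pages[i].offline_pending = 1;
    }
    return 0;
}

/* Second step: a pending page goes offline when it is freed. */
void page_freed(struct page *pg)
{
    pg->use = PAGE_FREE;
    if (pg->offline_pending) {
        pg->offline_pending = 0;
        pg->offlined = 1;
    }
}
```

Doing the classification in a separate first pass is one simple way to get the "change nothing if anything fails" behaviour without having to roll back partial updates.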
Tim Deegan
2008-Dec-03 10:09 UTC
Re: [Xen-devel] [Question] How to support page offline in Xen environment
Hi,

At 11:24 +0800 on 02 Dec (1228217091), Jiang, Yunhong wrote:
> We are considering utilize the live migration mechanism to
> achieve the two steps. The user space tools will firstly mark page
> offline_pending through hypercall, this hypercall will also return the
> owners of the pages. secondly, if all pages can be offlined, user
> space tools should live migrate the domains owning those
> pages.

What do you mean by "live migrate" here? Presumably you can do something a lot more lightweight, just pausing the guest for a checkpoint, changing one or two p2m entries, and letting it resume.

> Following hypercall will be added:
> int xen_page_offline_pending(int start_pfn, int end_pfn, void *result, void *owners)
> IN: start_pfn/end_pfn:
>     the range of pages to be offlined.
> OUT: result:
>     A buffer contain the page status for each page, it can be:
>     offlined: the page is offlined already (e.g. the page is already freed when the hypercall happen)
>     offline_pending: the page will be offline when freed
>     offline_fail: The page can't be offline, may because it is used by xen/dom0. Notice is, if any page is marked offline_fail, this hypercall will not change any page's status (i.e. no page will be marked offline_pending or offlined) to make sure atomic operation.
>     other status: Other status to be defined in future.

Should this buffer be IN/OUT? The caller has to allocate it anyway and it would give a more general interface than the start/end arguments.

> OUT: owners:
>     A buffer contains the domains owning of the pages. Because of security consideration, it will not state which domain owning which page.

Why not? Presumably the caller needs to have privilege over all those domains anyway in order to mark their frames pending-offline.

> Need notice is, issue exists for the live migrate mechanism:
> a) the domain ID will be changed after live migrate
> b) live migration will fail for a domain with device assigned, so user space tools have to hot remove the device, or fail the page offline requirement

Both of those issues go away if you don't use a full migrate.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
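One possible reading of the IN/OUT buffer suggestion above is a request array in which the caller fills in the frames of interest and the callee fills in the per-frame status and owner. The sketch below is an assumption about what such an interface might look like, not an actual Xen hypercall: `struct offline_req`, `fill_offline_reqs()`, `OWNER_NONE` and the toy `owner_of` table are all hypothetical, and the function only classifies frames without changing any state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: the frame is free and has no owning domain. */
#define OWNER_NONE UINT16_MAX

enum offline_status { OFFLINED, OFFLINE_PENDING, OFFLINE_FAIL };

/* One request slot: the caller fills mfn, the callee fills the rest. */
struct offline_req {
    uint64_t mfn;                /* IN:  machine frame to offline */
    enum offline_status status;  /* OUT: outcome for this frame */
    uint16_t owner;              /* OUT: owning domain ID, or OWNER_NONE */
};

/*
 * Toy stand-in for a frame-table lookup: owner_of[mfn] is the owning
 * domain ID, with 0 meaning dom0. Frames need not be contiguous, which
 * is what makes this more general than a start/end range.
 */
void fill_offline_reqs(const uint16_t *owner_of, size_t nr_frames,
                       struct offline_req *reqs, size_t nr_reqs)
{
    for (size_t i = 0; i < nr_reqs; i++) {
        assert(reqs[i].mfn < nr_frames);

        uint16_t owner = owner_of[reqs[i].mfn];
        reqs[i].owner = owner;

        if (owner == OWNER_NONE)
            reqs[i].status = OFFLINED;         /* already free */
        else if (owner == 0)
            reqs[i].status = OFFLINE_FAIL;     /* dom0 pages are excluded */
        else
            reqs[i].status = OFFLINE_PENDING;  /* offline once freed */
    }
}
```

Returning the owner per frame follows the point made in the reply that a caller privileged enough to mark the frames can presumably also be told who owns them.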
Jiang, Yunhong
2008-Dec-03 10:30 UTC
RE: [Xen-devel] [Question] How to support page offline in Xen environment
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:
> Hi,
>
> At 11:24 +0800 on 02 Dec (1228217091), Jiang, Yunhong wrote:
>> We are considering utilize the live migration mechanism to
>> achieve the two steps. The user space tools will firstly mark page
>> offline_pending through hypercall, this hypercall will also return the
>> owners of the pages. secondly, if all pages can be offlined, user
>> space tools should live migrate the domains owning those
>> pages.
>
> What do you mean by "live migrate" here? Presumably you can do
> something a lot more lightweight, just pausing the guest for a
> checkpoint, changing one or two p2m entries, and letting it resume.

Thanks very much for your reply. Yes, we did consider this before (it is in fact the "other option" in my mail), and it is ok for an HVM domain without any foreign mapping. However, consider the following situations:
a) A page is foreign mapped by another guest (e.g. dom0); changing p2m entries is not enough.
b) A page is assigned to a domain with a device assigned; we can't simply change the p2m entry because a DMA operation may be ongoing. (This in fact can't be resolved cleanly through live migration either, although the tools do a hot remove in advance.)
c) If a page is used as a shadow page table or as the virtual local APIC's page, currently we can't simply exchange these pages.
d) For a PV guest, can this be done without co-operation from the guest? (Or do we need to change the paging mode to shadow before page offline?)

Of course, we can simplify the request, for example by not supporting pages in items a), b) and c), and that would be ok.
That's the reason we hope to get suggestions on the next step.

>> Following hypercall will be added:
>> int xen_page_offline_pending(int start_pfn, int end_pfn, void *result, void *owners)
>> IN: start_pfn/end_pfn:
>>     the range of pages to be offlined.
>> OUT: result:
>>     A buffer contain the page status for each page, it can be:
>>     offlined: the page is offlined already (e.g. the page is already freed when the hypercall happen)
>>     offline_pending: the page will be offline when freed
>>     offline_fail: The page can't be offline, may because it is used by xen/dom0. Notice is, if any page is marked offline_fail, this hypercall will not change any page's status (i.e. no page will be marked offline_pending or offlined) to make sure atomic operation.
>>     other status: Other status to be defined in future.
>
> Should this buffer be IN/OUT? The caller has to allocate it anyway and
> it would give a more general interface than the start/end arguments.
>
>> OUT: owners:
>>     A buffer contains the domains owning of the pages. Because of security consideration, it will not state which domain owning which page.
>
> Why not? Presumably the caller needs to have privilege over all those
> domains anyway in order to mark their frames pending-offline.

Hmm, that depends on how we define the privilege, and yes, it is true for dom0 now.

>> Need notice is, issue exists for the live migrate mechanism:
>> a) the domain ID will be changed after live migrate
>> b) live migration will fail for a domain with device assigned, so user space tools have to hot remove the device, or fail the page offline requirement
>
> Both of those issues go away if you don't use a full migrate.
Tim Deegan
2008-Dec-03 10:45 UTC
Re: [Xen-devel] [Question] How to support page offline in Xen environment
At 18:30 +0800 on 03 Dec (1228329001), Jiang, Yunhong wrote:
> a) A page is foreign mapped by another guest (e.g. dom0), change p2m entries is not enough.

True.

> b) A page is assigned to a domain with device assigned, we can't simply change the p2m entry because of DMA operation may on-going. (this in fact can't resolve cleanly through live migration, although the tools do hot remove in advance).

That seems to be orthogonal to the question of how the page is got rid of; you can do a hot remove and hot add whether you do a full live-migrate or not.

> c) If a page is used like shadow page table or, virtual local apic's page, currently we can't simply exchange these pages.

True, but live-migration doesn't help that because right now given an MFN that's in use as a shadow page or any HVM state page you can't easily find out which domain is responsible for it.

Also, remember that full live-migration needs enough spare RAM to hold an extra copy of the guest, so it couldn't work on guests larger than half the physical RAM, for example.

> d) For PV guest, can this be done without co-operation from guest?

Yes it can. As long as you don't use the "fast path" resume to restart the guest, it will re-read its p2m just like it would after a full save/restore.

> Of course, we can simplify the request, for example, no support for page in item a), b) and c) and that will be ok.
> That's the reason we hope to get suggestion on next step.

I think it depends on how important it is to be able to offline frames quickly and transparently. If that's not important, then just save the owning domain to disk, offline the frames, and restore it. If it is important, I'd be inclined to do something very lightweight based on small parts of the save/restore code (which will be much faster than live migration), and keep save-to-disk as a backstop for the edge cases.

Tim.
Jiang, Yunhong
2008-Dec-04 02:02 UTC
RE: [Xen-devel] [Question] How to support page offline in Xen environment
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:
> At 18:30 +0800 on 03 Dec (1228329001), Jiang, Yunhong wrote:
>> a) A page is foreign mapped by another guest (e.g. dom0),
>> change p2m entries is not enough.
>
> True.
>
>> b) A page is assigned to a domain with device assigned, we
>> can't simply change the p2m entry because of DMA operation may
>> on-going. (this in fact can't resolve cleanly through live
>> migration, although the tools do hot remove in advance).
>
> That seems to be orthogonal to the question of how the page is got rid
> of; you can do a hot remove and hot add whether you do a full live-migrate
> or not.
>
>> c) If a page is used like shadow page table or, virtual
>> local apic's page, currently we can't simply exchange these pages.
>
> True, but live-migration doesn't help that because right now given an
> MFN that's in use as a shadow page or any HVM state page you can't
> easily find out which domain is responsible for it.

Ahh, yes, I didn't realize this. So do you think it is ok to add such information?

> Also, remember that full live-migration needs enough spare RAM to hold
> an extra copy of the guest, so it couldn't work on guests larger than
> half the physical RAM, for example.

Yes, that is one argument we had thought of before.

>> d) For PV guest, can this be done without co-operation from guest?
>
> Yes it can. As long as you don't use the "fast path" resume to restart
> the guest, it will re-read its p2m just like it would after a full
> save/restore.

What do you mean by the "fast path"?

>> Of course, we can simplify the request, for example, no
>> support for page in item a), b) and c) and that will be ok.
>> That's the reason we hope to get suggestion on next step.
>
> I think it depends on how important it is to be able to offline frames
> quickly and transparently. If that's not important, then just save the
> owning domain to disk, offline the frames, and restore it. If it is
> important, I'd be inclined to do something very lightweight based on
> small parts of the save/restore code (which will be much faster than
> live migration), and keep save-to-disk as a backstop for the
> edge cases.

Do you mean that the "something very lightweight based on small parts of the save/restore code" is done in the management tools, not in the HV? Am I right?

Thanks
Yunhong Jiang
Tim Deegan
2008-Dec-04 13:09 UTC
Re: [Xen-devel] [Question] How to support page offline in Xen environment
Hi,

At 10:02 +0800 on 04 Dec (1228384976), Jiang, Yunhong wrote:
>> True, but live-migration doesn't help that because right now given an
>> MFN that's in use as a shadow page or any HVM state page you can't
>> easily find out which domain is responsible for it.
>
> Ahh, yes, didn't realize this. So do you think it is ok to add such
> information?

Maybe for shadow pagetables something could be done, though I'm not sure there's room in the frametable. I think the other frames are sufficiently few that if you're already excluding xenheap and dom0 they won't make much difference.

>> Also, remember that full live-migration needs enough spare RAM to hold
>> an extra copy of the guest, so it couldn't work on guests larger than
>> half the physical RAM, for example.
>
> Yes, that is one argument we thought that before.
>
>>> d) For PV guest, can this be done without co-operation from guest?
>>
>> Yes it can. As long as you don't use the "fast path" resume to restart
>> the guest, it will re-read its p2m just like it would after a full
>> save/restore.
>
> What do you mean of the "fast path"?

Have a look at the code for xc_domain_resume in libxc/xc_resume.c. The slow-and-safe version makes the domain state look like it would after a save/restore, so that older kernels can be resumed after they've paused and had their state saved. The fast version just changes the return code that the guest will see from its shutdown hypercall.

My suggestion is that you cause the guest to stop like it would for a save to disk, shuffle its p2m around, and call the slow-path resume function so that it will pick up the new p2m properly.

> Do you mean the "something very lightweight based on small parts of
> the save/restore code" is done in management tools, not in HV, am I
> right?

Yes. For the common case this lets you get what you want without the hypervisor being involved.

Cheers,

Tim.
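The "stop the guest, shuffle its p2m, resume on the slow path" suggestion above can be illustrated with a toy model. The sketch below does not call libxc or Xen; `struct guest`, `swap_backing_frame()` and the `ram` array are hypothetical stand-ins, and the suspend and slow-path resume steps (which the message attributes to xc_domain_resume in libxc/xc_resume.c) are reduced to a single `suspended` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy guest: none of this is libxc or Xen code. */
struct guest {
    int suspended;    /* 1 while the guest is stopped for the swap */
    uint64_t *p2m;    /* p2m[pfn] = machine frame backing that guest frame */
    size_t nr_pfns;
};

/*
 * Replace old_mfn with new_mfn for a suspended guest: copy the page
 * contents, then repoint every p2m entry that referenced the old frame.
 * Returns the number of p2m entries updated, or -1 if the guest is
 * still running (the swap is only safe while it is stopped).
 */
int swap_backing_frame(struct guest *g, uint8_t (*ram)[TOY_PAGE_SIZE],
                       uint64_t old_mfn, uint64_t new_mfn)
{
    int updated = 0;

    assert(old_mfn != new_mfn);

    if (!g->suspended)
        return -1;

    /* Copy the data first so the new frame is valid before it is mapped. */
    memcpy(ram[new_mfn], ram[old_mfn], TOY_PAGE_SIZE);

    for (size_t pfn = 0; pfn < g->nr_pfns; pfn++) {
        if (g->p2m[pfn] == old_mfn) {
            g->p2m[pfn] = new_mfn;
            updated++;
        }
    }
    return updated;
}
```

In the real flow the guest only sees the new mapping because the slow-path resume makes it re-read its p2m; the toy model has no equivalent of that step, and it ignores the foreign-mapping and DMA cases raised earlier in the thread.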
Jiang, Yunhong
2008-Dec-04 14:44 UTC
RE: [Xen-devel] [Question] How to support page offline in Xen environment
Tim Deegan <mailto:Tim.Deegan@citrix.com> wrote:
> Hi,
>
> At 10:02 +0800 on 04 Dec (1228384976), Jiang, Yunhong wrote:
>>> True, but live-migration doesn't help that because right now given an
>>> MFN that's in use as a shadow page or any HVM state page you can't
>>> easily find out which domain is responsible for it.
>>
>> Ahh, yes, didn't realize this. So do you think it is ok to add such
>> information?
>
> Maybe for shadow pagetables something could be done, though I'm not sure
> there's room in the frametable. I think the other frames are
> sufficiently few that if you're already excluding xenheap and dom0 they
> won't make much difference.

Yes, at least in the first stage we will not do that. After all, we just need to cover the majority of the frames.
We may need 2 extra bits in the frametable to mark the page offline status; not sure if that's ok.

>>> Also, remember that full live-migration needs enough spare RAM to hold
>>> an extra copy of the guest, so it couldn't work on guests larger than
>>> half the physical RAM, for example.
>>
>> Yes, that is one argument we thought that before.
>>
>>>> d) For PV guest, can this be done without co-operation from guest?
>>>
>>> Yes it can. As long as you don't use the "fast path" resume to restart
>>> the guest, it will re-read its p2m just like it would after a full
>>> save/restore.
>>
>> What do you mean of the "fast path"?
>
> Have a look at the code for xc_domain_resume in libxc/xc_resume.c. The
> slow-and-safe version makes the domain state look like it would after a
> save/restore, so that older kernels can be resumed after they've paused
> and had their state saved. The fast version just changes the return
> code that the guest will see from its shutdown hypercall.
>
> My suggestion is that you cause the guest to stop like it would for a
> save to disk, shuffle its p2m around, and call the slow-path resume
> function so that it will pick up the new p2m properly.

I will dig into the code. Thanks for your suggestion.

>> Do you mean the "something very lightweight based on small parts of
>> the save/restore code" is done in management tools, not in HV, am I
>> right?
>
> Yes. For the common case this lets you get what you want without the
> hypervisor being involved.

So maybe the policy can be: if the HV can offline a page easily (e.g. a page owned by an HVM domain without a device assigned, or a free page), the HV will do that; otherwise, we will leave it to the user space tools.

We will first implement the code on the HV side.
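The two ideas in the reply above, a 2-bit offline state per frame and a split of work between the hypervisor and the tools, might look roughly like this. It is a sketch under stated assumptions: the bit layout, the `PGO_*` names, `enum owner_kind` and `choose_handler()` are all hypothetical and are not taken from the Xen frametable, and refusing dom0 and Xen heap pages follows the restriction stated in the first message of the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit offline state held in a per-frame flags word. */
#define PGO_MASK     3u
#define PGO_ONLINE   0u
#define PGO_PENDING  1u
#define PGO_OFFLINED 2u

/* Store a new offline state without disturbing the other flag bits. */
static inline uint32_t set_offline_state(uint32_t flags, uint32_t state)
{
    assert((state & ~PGO_MASK) == 0);
    return (flags & ~PGO_MASK) | state;
}

/* Extract the offline state from a flags word. */
static inline uint32_t get_offline_state(uint32_t flags)
{
    return flags & PGO_MASK;
}

/* Hypothetical classification of who currently owns a frame. */
enum owner_kind { OWNER_FREE, OWNER_HVM, OWNER_PV, OWNER_DOM0, OWNER_XENHEAP };

/* Who is expected to carry out the offline request. */
enum handler { HANDLE_IN_HV, HANDLE_IN_TOOLS, HANDLE_REFUSE };

/*
 * Policy from the message: the hypervisor handles the easy cases (free
 * pages, HVM guests with no assigned device); everything else that can
 * be offlined at all is left to the user space tools.
 */
enum handler choose_handler(enum owner_kind kind, int has_assigned_device)
{
    switch (kind) {
    case OWNER_FREE:
        return HANDLE_IN_HV;
    case OWNER_HVM:
        return has_assigned_device ? HANDLE_IN_TOOLS : HANDLE_IN_HV;
    case OWNER_PV:
        return HANDLE_IN_TOOLS;
    default:
        return HANDLE_REFUSE;  /* dom0 and Xen heap pages */
    }
}
```

Two bits are enough for the three states used here and leave one encoding spare for the "other status" the proposal reserves for the future.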
Jiang, Yunhong
2008-Dec-11 15:21 UTC
RE: [Xen-devel] [Question] How to support page offline in Xen environment
Tim, do you think this code path will keep the guest's service running (that's one reason we considered the LM method before)? From the code, it seems the guest will be down for a very short time, but I'm not quite sure. I'm still reading the code.

Thanks
Yunhong Jiang

xen-devel-bounces@lists.xensource.com <> wrote:
>> Have a look at the code for xc_domain_resume in libxc/xc_resume.c. The
>> slow-and-safe version makes the domain state look like it would after a
>> save/restore, so that older kernels can be resumed after they've paused
>> and had their state saved. The fast version just changes the return
>> code that the guest will see from its shutdown hypercall.
>>
>> My suggestion is that you cause the guest to stop like it would for a
>> save to disk, shuffle its p2m around, and call the slow-path resume
>> function so that it will pick up the new p2m properly.
>
> I will dig-into the code Thanks for your suggestion.
Tim Deegan
2008-Dec-11 15:48 UTC
Re: [Xen-devel] [Question] How to support page offline in Xen environment
At 23:21 +0800 on 11 Dec (1229037698), Jiang, Yunhong wrote:
> Tim, do you think this code path will keep guest's service continue
> (that's one reason we considered LM method before). Seems from the
> code, the guest will be down for very short time, but I'm not quite
> sure. I'm still trying to reading the code.

Yes; the resume code was written as part of a project that does continuous live snapshotting of a running guest to provide fault-tolerance. The downtime for offlining pages this way should be even less than for live migration.

Cheers,

Tim.