Zhai, Edwin
2009-Aug-19 07:08 UTC
[Xen-devel] revisit the super page support in HVM restore
Keir,

We previously discussed superpage support in HVM restore, and the conclusion was: "If the pseudo-phys page is not yet populated in the target domain, AND it is the first page of a 2MB extent, AND no other pages in that extent are yet populated, AND the next pages in the save-image stream populate that extent in order, THEN allocate a superpage. If the next 511 pages (to complete the 2MB extent) are split across a batch boundary, we have to optimistically allocate a superpage in the first batch, and then break it into 4K pages in the second batch."

I once had a patch for this (it has been sleeping on my machine for a long time), but the logic is somewhat complicated. We need a "required_pfn" to hold the next pfn expected in the pfn list transferred from the source machine (set to invalid when not tracking a 2M page). The pseudo-code is as follows:

    for ( i = 0; i < nr_mfns; i++ )
    {
        if ( pfn_list[i] is START of a 2M page )
        {
            /* case 1: this pfn starts a new 2M page */
            populate the previously collected pfn buffer;
            /* start tracking this 2M page */
            required_pfn = pfn_list[i] + 1;
            start collecting a new pfn buffer;
        }
        else if ( pfn_list[i] == required_pfn )
        {
            /* case 2: this pfn comes in order inside the 2M page */
            required_pfn++;
            add this pfn to the collected pfn buffer;
        }
        else if ( required_pfn is VALID )
        {
            /*
             * case 3: this pfn comes out of order inside the 2M page
             * (not start && not required && in tracking)
             */
            populate the previously collected pfn buffer;
            /* start a new run of 4K pages */
            required_pfn = INVALID;
            start collecting a new pfn buffer;
        }
        else
        {
            /*
             * case 4: a series of 4K pages
             * (not start && not required && not in tracking)
             */
            add this pfn to the collected pfn buffer;
        }
    }

This is not the end of the story. For the populate action in cases 1 and 3, we need to tell whether the collected buffer forms a superpage or not. We also need to know whether a page has already been populated, and if so, whether as a normal page or as part of a superpage. Furthermore, in cases 1 and 3 we need to decide whether to break a previously allocated 2M page, so we need to set some flags and keep some bookkeeping when allocating a 2M page. There are other actions and considerations as well.

I have spent some time on this patch, but it still has some minor bugs :( Do you have any idea for optimizing this logic? We have two concerns with this method:

1. The code is complicated and bug-prone.
2. The target machine ends up with at most as many 2M pages as the source machine, even if it has more large contiguous memory available.

So how about this new method:

* Do not track each pfn inside a 2M page; instead, try our best to allocate a 2M page whenever the 2M page covering a pfn is not yet allocated.
* There may be holes inside newly allocated 2M pages that are not synced in this batch, but we don't care and assume these missing pfns will come later.

This new method is simple, as the superpage support for PV guests is already there.

Thanks for any comments,
Edwin
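For concreteness, below is a minimal standalone C sketch of the state machine above, assuming a batch is just an array of pfns. populate_buffer() and the sample batch in main() are hypothetical stand-ins, not the real xc_domain_restore.c helpers, and the batch-boundary optimism (allocate a superpage in the first batch, break it up in the second) is deliberately omitted to keep the sketch short:

    /*
     * Minimal standalone sketch of the 2M-tracking state machine above.
     * populate_buffer() is a hypothetical stand-in for the real
     * populate-physmap call; here it only prints what would happen.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define SUPERPAGE_PFN_SHIFT  9                        /* 512 x 4K = 2M */
    #define SUPERPAGE_NR_PFNS    (1UL << SUPERPAGE_PFN_SHIFT)
    #define INVALID_PFN          (~0UL)

    /* Populate 'count' pfns starting at buf[0], either as one 2M superpage
     * or as individual 4K pages. */
    static void populate_buffer(const unsigned long *buf, int count, int superpage)
    {
        if ( count == 0 )
            return;
        printf("populate %d pfn(s) from 0x%lx as %s\n",
               count, buf[0], superpage ? "one 2M superpage" : "4K pages");
    }

    /* Flush the collected run; only a full in-order run of 512 pfns
     * becomes a superpage. */
    static void flush(const unsigned long *buf, int *count, unsigned long required_pfn)
    {
        populate_buffer(buf, *count,
                        required_pfn != INVALID_PFN &&
                        *count == (int)SUPERPAGE_NR_PFNS);
        *count = 0;
    }

    static void process_batch(const unsigned long *pfn_list, int nr_mfns)
    {
        unsigned long required_pfn = INVALID_PFN;  /* next pfn while tracking a 2M page */
        unsigned long *buf = malloc(nr_mfns * sizeof(*buf));
        int count = 0, i;

        if ( buf == NULL )
            return;

        for ( i = 0; i < nr_mfns; i++ )
        {
            unsigned long pfn = pfn_list[i];

            if ( (pfn & (SUPERPAGE_NR_PFNS - 1)) == 0 )
            {
                /* case 1: start of a 2M extent -- flush, then track it */
                flush(buf, &count, required_pfn);
                required_pfn = pfn + 1;
            }
            else if ( pfn == required_pfn )
            {
                /* case 2: in-order pfn inside the tracked 2M extent */
                required_pfn++;
            }
            else if ( required_pfn != INVALID_PFN )
            {
                /* case 3: out of order while tracking -- give up on this extent */
                flush(buf, &count, required_pfn);
                required_pfn = INVALID_PFN;
            }
            /* case 4: plain run of 4K pages -- just keep collecting */

            buf[count++] = pfn;
        }

        flush(buf, &count, required_pfn);          /* tail of the batch */
        free(buf);
    }

    int main(void)
    {
        unsigned long pfns[SUPERPAGE_NR_PFNS + 3];
        unsigned long p;
        int n = 0;

        for ( p = 0; p < SUPERPAGE_NR_PFNS; p++ )  /* one complete in-order extent */
            pfns[n++] = p;
        pfns[n++] = 2 * SUPERPAGE_NR_PFNS;         /* sparse run: no superpage here */
        pfns[n++] = 2 * SUPERPAGE_NR_PFNS + 5;
        pfns[n++] = 2 * SUPERPAGE_NR_PFNS + 9;

        process_batch(pfns, n);
        return 0;
    }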
Keir Fraser
2009-Aug-19 07:29 UTC
[Xen-devel] Re: revisit the super page support in HVM restore
On 19/08/2009 08:08, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

> So how about this new method:
> * Do not track each pfn inside a 2M page; instead, try our best to allocate
>   a 2M page whenever the 2M page covering a pfn is not yet allocated.
> * There may be holes inside newly allocated 2M pages that are not synced in
>   this batch, but we don't care and assume these missing pfns will come later.
>
> This new method is simple, as the superpage support for PV guests is
> already there.

You will fail to restore a guest which has ballooned down its memory, as there will be 4K holes in its memory map. You will allocate 2MB superpages despite these holes, which do not get fixed up until the end of the restore process, and will run out of memory in the host or hit the guest's maxmem limit.

-- Keir
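To put rough numbers on the ballooning concern (all figures below are made up for illustration): suppose a guest's memory map spans 2GB of pfn space but, after ballooning, only half of each 2MB extent is still populated, so the guest really needs about 1GB. The "always allocate a 2MB superpage for any touched extent" policy would commit the full 2GB before any hole is fixed up:

    /* Illustration only: made-up numbers for a ballooned-down guest. */
    #include <stdio.h>

    #define PAGE_SIZE         4096UL
    #define SUPERPAGE_PAGES   512UL                   /* 4K pages per 2MB extent */
    #define MAP_SPAN_BYTES    (2UL << 30)             /* pfn space spans 2GB */
    #define NR_EXTENTS        (MAP_SPAN_BYTES / (SUPERPAGE_PAGES * PAGE_SIZE))

    int main(void)
    {
        /* After ballooning, assume only half of each extent is populated. */
        unsigned long populated = NR_EXTENTS * (SUPERPAGE_PAGES / 2) * PAGE_SIZE;
        /* Naive policy: one full 2MB superpage per touched extent. */
        unsigned long committed = NR_EXTENTS * SUPERPAGE_PAGES * PAGE_SIZE;

        printf("guest actually needs : %4lu MB\n", populated >> 20);
        printf("naive policy commits : %4lu MB\n", committed >> 20);
        printf("excess held until end of restore: %lu MB\n",
               (committed - populated) >> 20);
        return 0;
    }

If the host only has about 1GB free, or the domain's maxmem is set near its ballooned-down size, the restore fails long before the unwanted pages could be given back.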
Zhai, Edwin
2009-Aug-19 07:55 UTC
[Xen-devel] Re: revisit the super page support in HVM restore
Keir Fraser wrote:
> On 19/08/2009 08:08, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>> So how about this new method:
>> * Do not track each pfn inside a 2M page; instead, try our best to allocate
>>   a 2M page whenever the 2M page covering a pfn is not yet allocated.
>> * There may be holes inside newly allocated 2M pages that are not synced in
>>   this batch, but we don't care and assume these missing pfns will come later.
>>
>> This new method is simple, as the superpage support for PV guests is
>> already there.
>
> You will fail to restore a guest which has ballooned down its memory, as
> there will be 4K holes in its memory map.

I see. But the current PV guest code has the same issue: if superpages are enabled for a PV guest, allocate_mfn in xc_domain_restore.c tries to allocate a 2M page for each pfn regardless of the holes. To my understanding this is an even more serious issue for PV guests, as they use the balloon driver more frequently.

If we have to use the original algorithm, that brings us back to my complicated code -- do you have any suggestion for simplifying the logic?

Thanks,

> You will allocate 2MB superpages despite these holes, which do not get
> fixed up until the end of the restore process, and will run out of memory
> in the host or hit the guest's maxmem limit.
>
> -- Keir

--
best rgds,
edwin
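For reference, the allocation pattern under discussion looks roughly like the sketch below; populate_extent() is a hypothetical stand-in for the populate-physmap hypercall wrapper in libxc, assumed to return 0 on success. It tries one 2MB extent first and falls back to 4K pages only when the host has no contiguous chunk free, so the fallback says nothing about guest-side holes, which is exactly the ballooning problem Keir raises:

    /*
     * Sketch of the "try a superpage, fall back to 4K" pattern.
     * populate_extent() is a hypothetical stand-in for the real
     * populate-physmap call; this is not the actual allocate_mfn code.
     */
    #define SUPERPAGE_PFN_SHIFT  9
    #define SUPERPAGE_NR_PFNS    (1UL << SUPERPAGE_PFN_SHIFT)

    /* Hypothetical: allocate 2^order contiguous pages backing guest pfn 'base'. */
    extern int populate_extent(unsigned int domid, unsigned long base,
                               unsigned int order);

    static int populate_2m_or_fall_back(unsigned int domid, unsigned long base_pfn)
    {
        unsigned long i;

        /* First try one 2MB extent for the whole aligned range. */
        if ( populate_extent(domid, base_pfn, SUPERPAGE_PFN_SHIFT) == 0 )
            return 0;

        /* Host has no contiguous 2MB chunk free: fall back to 512 4K pages.
         * Note this does nothing about guest-side holes -- every pfn in the
         * extent still gets populated. */
        for ( i = 0; i < SUPERPAGE_NR_PFNS; i++ )
            if ( populate_extent(domid, base_pfn + i, 0) )
                return -1;

        return 0;
    }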
Keir Fraser
2009-Aug-19 09:04 UTC
[Xen-devel] Re: revisit the super page support in HVM restore
On 19/08/2009 08:55, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

>> You will fail to restore a guest which has ballooned down its memory, as
>> there will be 4K holes in its memory map.
>
> I see. But the current PV guest code has the same issue: if superpages are
> enabled for a PV guest, allocate_mfn in xc_domain_restore.c tries to
> allocate a 2M page for each pfn regardless of the holes. To my understanding
> this is an even more serious issue for PV guests, as they use the balloon
> driver more frequently.

I don't think this has been addressed yet for PV guests. But then again, no one much is using the PV superpage support, whereas this HVM superpage logic will always be on. So it needs to work reliably!

> If we have to use the original algorithm, that brings us back to my
> complicated code -- do you have any suggestion for simplifying the logic?

I wasn't clear where your pseudocode fits into xc_domain_restore. My view is that we would probably put the logic inside allocate_physmem(), or near the call to allocate_physmem(). The added logic would look for the start of a superpage, then look for a straight run of pages to the end of the superpage (or until we hit the end of the batch, which would need special treatment).

As for your other points:

* "Need to tell whether it's a superpage or not" -- superpages in the guest physmap are only an optimisation. We can introduce them where possible, regardless of which regions were or weren't superpage-backed in the original source domain.

* "Need to know whether the page has already been populated, and if so as a normal page or a superpage" -- the p2m[] array tells us what is already populated. And we do not need to care, after the allocation has happened, whether it was a superpage or not: a superpage simply fills 512 entries in the p2m[]. Our try-to-allocate-superpage logic will simply bail if it detects that any entry in the p2m[] range of interest is already populated.

Basically all we need is a "good-enough" heuristic for allocating superpages, as they are an optimisation only. If measurement tells us our heuristic is failing too often, then we can get more sophisticated/complicated.

-- Keir
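As a concrete reading of that heuristic, here is a hedged sketch, not the real allocate_physmem(): p2m[], INVALID_P2M_ENTRY and alloc_superpage() are assumed names. It scans the batch for a 2MB-aligned pfn, requires a straight in-order run of 512 pfns that fits entirely within the batch and whose p2m[] entries are all unpopulated, and only then allocates the superpage; everything else is left to ordinary 4K allocation.

    /*
     * Hedged sketch of the heuristic described above. p2m[] maps guest pfn
     * to mfn with INVALID_P2M_ENTRY marking unpopulated pfns; alloc_superpage()
     * stands in for the actual populate-physmap call with a 2MB extent order.
     */
    #define SUPERPAGE_PFN_SHIFT  9
    #define SUPERPAGE_NR_PFNS    (1UL << SUPERPAGE_PFN_SHIFT)
    #define INVALID_P2M_ENTRY    (~0UL)

    extern unsigned long *p2m;                          /* assumed restore-side p2m table */
    extern int alloc_superpage(unsigned long base_pfn); /* hypothetical, 0 on success */

    /*
     * Try to satisfy batch[i..] with one 2MB superpage. Returns the number of
     * batch entries consumed: SUPERPAGE_NR_PFNS on success, 0 if we bail and
     * let the caller fall back to ordinary 4K allocation.
     */
    static unsigned long try_superpage(const unsigned long *batch,
                                       unsigned long i, unsigned long batch_size)
    {
        unsigned long base_pfn = batch[i], j;

        /* Must start on a 2MB boundary, and the whole run must fit in this
         * batch; runs split across a batch boundary would need the special
         * treatment mentioned above and are simply skipped here. */
        if ( (base_pfn & (SUPERPAGE_NR_PFNS - 1)) != 0 ||
             i + SUPERPAGE_NR_PFNS > batch_size )
            return 0;

        for ( j = 0; j < SUPERPAGE_NR_PFNS; j++ )
        {
            /* Bail unless the run is a straight, in-order sequence of pfns... */
            if ( batch[i + j] != base_pfn + j )
                return 0;
            /* ...and nothing in the extent is already populated. */
            if ( p2m[base_pfn + j] != INVALID_P2M_ENTRY )
                return 0;
        }

        if ( alloc_superpage(base_pfn) != 0 )
            return 0;           /* no contiguous 2MB chunk available in the host */

        /* The real code would now record the 512 mfns returned by the
         * hypercall in p2m[base_pfn .. base_pfn + 511]. */
        return SUPERPAGE_NR_PFNS;
    }

A caller in the batch-processing loop would advance by the return value when it is non-zero, and otherwise populate batch[i] as a single 4K page and advance by one; that keeps the superpage path a pure optimisation on top of the existing 4K path.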