George Dunlap
2008-Dec-23 12:55 UTC
[Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
This set of patches introduces a set of mechanisms and interfaces to implement populate-on-demand memory. The purpose of populate-on-demand memory is to allow non-paravirtualized guests (such as Windows or Linux HVM) to boot in a ballooned state.

BACKGROUND

When a non-PV domain boots, it typically reads the e820 map to determine how much memory it has, and then assumes that much memory thereafter. Memory requirements can be reduced using a balloon driver, but the amount of memory cannot be increased past this initial value. Currently, this means that a non-PV domain must be booted with the maximum amount of memory you want that VM ever to be able to use.

Populate-on-demand allows us to "boot ballooned", in the following manner:
* Mark the entire range of memory (memory_static_max aka maxmem) with a new p2m type, populate_on_demand, reporting memory_static_max in the e820 map. No memory is allocated at this stage.
* Allocate the "memory_dynamic_max" (aka "target") amount of memory for a "PoD cache". This memory is kept on a separate list in the domain struct.
* Boot the guest.
* Populate the p2m table on demand as it's accessed, with pages from the PoD cache.
* When the balloon driver loads, it inflates the balloon size to (maxmem - target), giving the memory back to Xen. When this is accomplished, the "populate-on-demand" portion of boot is effectively finished.

One complication is that many operating systems have start-of-day page scrubbers, which touch all of memory to zero it. This scrubber may run before the balloon driver can return memory to Xen. These zeroed pages, however, don't contain any information; we can safely replace them with PoD entries again. So when we run out of PoD cache, we do an "emergency sweep" to look for zero pages we can reclaim for the populate-on-demand cache. When we find a page range which is entirely zero, we mark the gfn range PoD again, and put the memory back into the PoD cache.

NB that this code is designed to work only in conjunction with a balloon driver. If the balloon driver is not loaded, eventually all pages will be dirtied (non-zero), the emergency sweep will fail, and there will be no memory to back outstanding PoD pages. When this happens, the domain will crash.

The code works for both shadow mode and HAP mode; it has been tested with NPT/RVI and shadow, but not yet with EPT. It also attempts to avoid splintering superpages, to allow HAP to function more effectively.

To use:
* Ensure that you have a functioning balloon driver in the guest (e.g., xen_balloon.ko for Linux HVM guests).
* Set maxmem/memory_static_max to one value, and memory/memory_dynamic_max to another when creating the domain; e.g.:
  # xm create debian-hvm maxmem=512 memory=256
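To make the fault path above concrete, here is a rough sketch of the demand-populate step. This is an illustration only: the helper names (pod_cache_get(), pod_emergency_sweep()) are hypothetical simplifications, not the functions in the patches.

    /* Illustrative sketch only -- hypothetical helper names, not the
     * actual patch code.  Back a faulting PoD gfn with a page from the
     * PoD cache, falling back to a sweep for zeroed guest pages. */
    static int pod_demand_populate(struct domain *d, unsigned long gfn)
    {
        struct page_info *pg = pod_cache_get(d);
        void *va;

        if ( pg == NULL )
        {
            /* Cache empty: reclaim guest pages that are entirely zero,
             * remarking their gfns as PoD, then retry. */
            pod_emergency_sweep(d);
            pg = pod_cache_get(d);
            if ( pg == NULL )
                return -ENOMEM;   /* no balloon driver?  domain crashes */
        }

        /* The guest expects never-touched memory to read as zero. */
        va = map_domain_page(page_to_mfn(pg));
        clear_page(va);
        unmap_domain_page(va);

        /* Replace the PoD entry with a normal RAM mapping. */
        set_p2m_entry(d, gfn, page_to_mfn(pg), p2m_ram_rw);
        return 0;
    }

In the actual series the populate step happens inside the p2m code whenever a PoD entry is hit by a gfn_to_mfn() lookup that is not of the "query" type (patches 01-03 below).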
The patches are as follows:

01 - Add a p2m_query_type to the core gfn_to_mfn*() functions.

02 - Change some gfn_to_mfn() calls to gfn_to_mfn_query(), which will not populate PoD entries. Specifically, since gfn_to_mfn() may grab the p2m lock, it must not be called while the shadow lock is held.

03 - Populate-on-demand core. Introduce the new p2m type, PoD cache structures, and core functionality. Add PoD checking to audit_p2m(). Add PoD information to the 'q' debug key.

04 - Implement p2m_decrease_reservation. As the balloon driver returns gfns to Xen, this handles PoD entries properly; it also "steals" memory being returned for the PoD cache instead of freeing it, if necessary.

05 - Emergency sweep: implement an emergency sweep for zero memory when the cache is low. If it finds pages (or page ranges) that are entirely zero, it will replace the entry with a PoD entry again, reclaiming the memory for the PoD cache.

06 - Deal with splintering both PoD pages (to back singleton PoD entries) and PoD ranges.

07 - Xen interface for populate-on-demand functionality: a PoD flag for populate_physmap, and {get,set}_pod_target for interacting with the PoD cache. set_pod_target() should be called for any domain that may have PoD entries. It will increase the size of the cache if necessary, but will never decrease the size of the cache. (That will be done as the balloon driver balloons down.)

08 - libxc interface. Add new libxc functions:
+ xc_hvm_build_target_mem(), which accepts memsize and target. If these are equal, PoD functionality is not invoked. Otherwise, memsize is marked PoD, and target MiB is allocated to the PoD cache.
+ xc_[sg]et_pod_target(): get / set the PoD target. set_pod_target() should be called whenever you change the guest memory target on a domain which may have outstanding PoD entries. This may increase the size of the PoD cache up to the number of outstanding PoD entries, but will not reduce the size of the cache. (The cache may be reduced as the balloon driver returns gfn space to Xen.)

09 - xend integration.
+ Always call xc_hvm_build_target_mem() with memsize=maxmem and target=memory. If these are the same, the internal function will not use PoD.
+ Call xc_set_target_mem() whenever a domain's target is changed. Also call balloon.free(), causing dom0 to balloon itself down if there's not enough memory otherwise.

Things still to do:
* When reduce_reservation() is called with a superpage, keep the superpage intact.
* Create a hypercall continuation for set_pod_target.
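For toolstack writers, the intended calling pattern for the new libxc interface (patches 08-09) is roughly as follows. The prototypes shown here are assumptions for illustration, as are the units of the set_pod_target argument; check the patches for the real signatures.

    /* Sketch of the intended toolstack usage -- prototypes and units
     * are assumed for illustration, not copied from the patches. */
    #include <xenctrl.h>

    int build_with_pod(int xc_handle, uint32_t domid, const char *image)
    {
        /* memsize (maxmem) is marked populate-on-demand; target MiB is
         * allocated up front into the PoD cache.  If memsize == target,
         * PoD is not used at all. */
        int rc = xc_hvm_build_target_mem(xc_handle, domid,
                                         512 /* memsize, MiB */,
                                         256 /* target, MiB */,
                                         image);
        if ( rc )
            return rc;

        /* Whenever the domain's memory target changes later, keep the
         * PoD target in sync.  This may grow the cache (up to the
         * number of outstanding PoD entries) but never shrinks it; the
         * cache shrinks only as the balloon driver returns gfns. */
        return xc_set_pod_target(xc_handle, domid,
                                 256UL << 8 /* 256MiB in 4k pages; units assumed */);
    }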
Dan Magenheimer
2008-Dec-23 19:06 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
Very nice!

One thing that might be worth adding to the requirements list or README is that this approach (or any which depends on ballooning) will now almost certainly require any participating HVM domain to have an adequately-sized, properly-configured swap disk. Ballooning is insufficiently responsive to grow memory fast enough to handle the rapidly growing memory needs of an active domain. The consequence for a domain with no swap disk is application failures; the consequence even if a swap disk IS configured is temporarily very poor performance. I'm working on fixing that (at least on PV domains). Watch this list after the new year.

So this won't work for any domain that does start-of-day scrubbing with a non-zero value? I suppose that's OK.

Happy holidays to all!

Dan
Tian, Kevin
2008-Dec-24 01:46 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: George Dunlap
>Sent: Tuesday, December 23, 2008 8:55 PM
>
> BACKGROUND
>
> When a non-PV domain boots, it typically reads the e820 map to
> determine how much memory it has, and then assumes that much memory
> thereafter. Memory requirements can be reduced using a balloon
> driver, but the amount of memory cannot be increased past this
> initial value.

Isn't this also true for PV guests? Unless the guest supports memory hot-add, the balloon driver can never increase memory past the initial maximum. But your patch is nice, since more VMs can be created at boot time without the hard limitation quoted below.

> Currently, this means that a non-PV domain must be booted with the
> maximum amount of memory you want that VM ever to be able to use.
>
> Populate-on-demand allows us to "boot ballooned", in the following
> manner:
> * Mark the entire range of memory (memory_static_max aka maxmem) with
> a new p2m type, populate_on_demand, reporting memory_static_max in
> the e820 map. No memory is allocated at this stage.
> * Allocate the "memory_dynamic_max" (aka "target") amount of memory
> for a "PoD cache". This memory is kept on a separate list in the
> domain struct.
> * Boot the guest.
> * Populate the p2m table on demand as it's accessed, with pages from
> the PoD cache.
> * When the balloon driver loads, it inflates the balloon size to
> (maxmem - target), giving the memory back to Xen. When this is
> accomplished, the "populate-on-demand" portion of boot is effectively
> finished.

Another tricky point could be with VT-d. If one guest page is used as a DMA target before the balloon driver is installed, and there is no early access on that page (like the start-of-day scrubber), then the PoD action will not be triggered... Not sure of the possibility of such a condition, but you may need to give it some thought or add some guard for that. Em... after more thinking, actually PoD pages may be alive even after the balloon driver is installed. I guess before coming up with a solution you may add a check on whether the target domain has a passthrough device, to decide on-the-fly whether this feature is on. PoD is anyhow a bit different from the balloon driver, since the latter claims ownership of ballooned pages, which then will not be used as DMA targets within the guest.

> NB that this code is designed to work only in conjunction with a
> balloon driver. If the balloon driver is not loaded, eventually all
> pages will be dirtied (non-zero), the emergency sweep will fail, and
> there will be no memory to back outstanding PoD pages. When this
> happens, the domain will crash.

In that case, is it better to increase the PoD target to the configured max mem? It looks uncomfortable to crash a domain just because some optimization doesn't apply. :-)

Last, do you have any performance data on how this patch may impact the boot process, or even some workload after login?

Thanks,
Kevin
George Dunlap
2008-Dec-24 13:55 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
On Tue, Dec 23, 2008 at 7:06 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> Very nice!

Thanks!

> One thing that might be worth adding to the requirements list or
> README is that this approach (or any which depends on ballooning)
> will now almost certainly require any participating HVM domain
> to have an adequately-sized, properly-configured swap disk.
> Ballooning is insufficiently responsive to grow memory fast
> enough to handle the rapidly growing memory needs of an active
> domain. The consequence for a domain with no swap disk is
> application failures; the consequence even if a swap disk IS
> configured is temporarily very poor performance.

I don't think this is particular to the PoD patches, or even ballooning per se. A swap disk would be required any time you boot with a small amount of memory, whether it could be increased or not.

But you're right, in that this differs from a typical operating system's "demand-paging" mechanism, where the goal is to give a process only the memory it actually needs, so you can use it for other processes. You're still allocating a fixed amount of memory to a guest at start-up. The un-populated memory is not available for use by other VMs, and allocating more memory is a (relatively) slow process. I guess a brief note pointing out the difference between "populate on demand" and "allocate on demand" would be useful.

> So this won't work for any domain that does start-of-day
> scrubbing with a non-zero value? I suppose that's OK.

Not if the scrubber might win the race against the balloon driver. :-) If this really becomes an issue, it should be straightforward to add functionality to handle it. It just requires having a simple way of specifying what "scrubbed" pages look like, an extra p2m type for "PoD scrubbed" (rather than PoD zero, the default), and a way to change from scrubbed <-> zero.

Did you have a particular system in mind?

 -George
Dan Magenheimer
2008-Dec-24 14:32 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
> > The consequence for a domain with no swap disk is application
> > failures; the consequence even if a swap disk IS configured is
> > temporarily very poor performance.
>
> I don't think this is particular to the PoD patches, or even
> ballooning per se. A swap disk would be required any time you boot
> with a small amount of memory, whether it could be increased or not.
>
> But you're right, in that this differs from a typical operating
> system's "demand-paging" mechanism, where the goal is to give a
> process only the memory it actually needs, so you can use it for other
> processes. You're still allocating a fixed amount of memory to a
> guest at start-up. The un-populated memory is not available for use by
> other VMs, and allocating more memory is a (relatively) slow process.
> I guess a brief note pointing out the difference between "populate on
> demand" and "allocate on demand" would be useful.

Yes, it's just that with your fix, Windows VM users are much more likely to use memory overcommit and will need to be "trained" to always configure a swap disk to ensure bad things don't happen. And this swap disk had better be on a network-based medium or live migration won't work.

> > So this won't work for any domain that does start-of-day
> > scrubbing with a non-zero value? I suppose that's OK.
>
> Not if the scrubber might win the race against the balloon driver. :-)
> If this really becomes an issue, it should be straightforward to add
> functionality to handle it. It just requires having a simple way of
> specifying what "scrubbed" pages look like, an extra p2m type for "PoD
> scrubbed" (rather than PoD zero, the default), and a way to change
> from scrubbed <-> zero.
>
> Did you have a particular system in mind?

No, I had just given some limited thought to this problem previously, had considered the idea of sharing a zero page for the Windows start-of-day scrubbing problem, but didn't know if the scrubbing always used only zeroes. If it does, great! I was worried that something like a secure version of Windows might use some other random bit pattern, but I'll bet Windows elsewhere assumes that all pages start as zero-filled and is thus dependent on start-of-day ZERO scrubbing, so I'll bet your approach will always work.
George Dunlap
2008-Dec-24 14:42 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
On Wed, Dec 24, 2008 at 1:46 AM, Tian, Kevin <kevin.tian@intel.com> wrote:
>> * When the balloon driver loads, it inflates the balloon size to
>> (maxmem - target), giving the memory back to Xen. When this is
>> accomplished, the "populate-on-demand" portion of boot is effectively
>> finished.
>
> Another tricky point could be with VT-d. If one guest page is used as
> a DMA target before the balloon driver is installed, and there is no
> early access on that page (like the start-of-day scrubber), then the
> PoD action will not be triggered... Not sure of the possibility of
> such a condition, but you may need to give it some thought or add
> some guard for that. Em... after more thinking, actually PoD pages
> may be alive even after the balloon driver is installed. I guess
> before coming up with a solution you may add a check on whether the
> target domain has a passthrough device, to decide on-the-fly whether
> this feature is on.

Hmm, I haven't looked at VT-d integration; it at least requires some examination. How are gfns translated to mfns for the VT-d hardware? Does it use the hardware EPT tables? Is the transaction re-startable if we get an EPT fault and then fix the EPT table?

Any time gfn_to_mfn() is called, unless it's specifically called with the "query" type, the gfn is populated. That's why qemu, the domain builder, &c work currently without any modifications. But if VT-d uses the EPT tables to translate requests for a guest in hardware, and the device requests can't be easily re-started after an EPT fault, then this won't work.

A second issue is with the emergency sweep: if a page which happens to be zero ends up being the target of a DMA, we may get:
* Device request to write to gfn X, which translates to mfn Y.
* Demand-fault on gfn Z, with no pages in the cache.
* Emergency sweep scans through gfn space, finds that mfn Y is empty. It replaces gfn X with a PoD entry, and puts mfn Y behind gfn Z.
* The request finishes. Either the request then fails (because the EPT translation for gfn X is not valid anymore), or it silently succeeds in writing to mfn Y, which is now behind gfn Z instead of gfn X.

If we can't tell that there's an outstanding I/O on the page, then we can't do an emergency sweep. If we have some way of knowing that there's *some* outstanding I/O to *some* page, we could pause the guest until the I/O completes, then do the sweep.

At any rate, until we have that worked out, we should probably add some "seatbelt" code to make sure that people don't use PoD for a VT-d enabled domain. I know absolutely nothing about the VT-d code; could you either write a patch to do this check, or give me an idea of the simplest thing to check?
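To illustrate the kind of check the sweep would need, here is a rough sketch; page_is_all_zero() is a hypothetical helper, and the reference-count test uses the existing count_info/PGC_count_mask fields. This is an illustration of the idea, not code from the patches:

    /* Sketch only: decide whether a candidate page may be reclaimed
     * into the PoD cache.  page_is_all_zero() is hypothetical. */
    static int pod_can_reclaim(struct domain *d, mfn_t mfn)
    {
        struct page_info *pg = mfn_to_page(mfn);

        /* A plain RAM page backing a gfn should hold exactly one
         * reference (the domain's); anything more may mean a foreign
         * mapping or outstanding I/O. */
        if ( page_get_owner(pg) != d ||
             (pg->count_info & PGC_count_mask) > 1 )
            return 0;

        /* Only pages with no information content are reclaimable. */
        return page_is_all_zero(mfn);
    }

Note that a check like this is not by itself sufficient for the VT-d case: a DMA in flight through the VT-d page tables need not hold a reference that shows up in count_info, which is exactly why the sweep is unsafe for a passthrough domain.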
>> NB that this code is designed to work only in conjunction with a
>> balloon driver. If the balloon driver is not loaded, eventually all
>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>> there will be no memory to back outstanding PoD pages. When this
>> happens, the domain will crash.
>
> In that case, is it better to increase the PoD target to the
> configured max mem? It looks uncomfortable to crash a domain just
> because some optimization doesn't apply. :-)

If this happened, it wouldn't be because an optimization didn't apply, but because we purposely tried to use a feature for which a key component failed or wasn't properly in place. If we set up a domain with VT-d access on a box with no VT-d hardware, it would fail as well -- just during boot, not 5 minutes after it. :-)

We could try to allocate a new page at that point; but it's likely that the allocation will fail unless there happens to be memory lying around somewhere, not used by dom0 or any other domain. And if that were the case, why not just start it with that much memory to begin with?

The only way to make this more robust would be to pause the domain, send a message back to xend, have it try to balloon down domain 0 (or possibly other domains), increase the PoD cache size, and then unpause the domain again. This is not only a lot of work, but many of the failure modes would be really hard to handle; e.g., if qemu makes a hypercall that ends up doing a gfn_to_mfn() translation which fails, we would need to make that whole operation re-startable. I did look at this, but it's a ton of work, and a lot of code changes (including interface changes between Xen and dom0 components), for a situation which really should never happen in a properly configured system. There's no reason that, with a balloon driver which loads during boot and a properly configured target (i.e., not unreasonably small), the driver shouldn't be able to quickly reach its target.

> Last, do you have any performance data on how this patch may impact
> the boot process, or even some workload after login?

I do not have any solid numbers. Perceptually, I haven't noticed anything too slow. I'll do some simple benchmarks.

 -George
George Dunlap
2008-Dec-24 15:13 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
On Wed, Dec 24, 2008 at 2:32 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> Yes, it's just that with your fix, Windows VM users are much more
> likely to use memory overcommit and will need to be "trained" to
> always configure a swap disk to ensure bad things don't happen.
> And this swap disk had better be on a network-based medium or
> live migration won't work.

You mean they may be much more likely to under-provision memory to their VMs, booting with (say) 64M on the assumption that they can balloon it up to 512M if they want to? That seems rather unlikely to me... if they're not likely to start a Windows VM with 64M normally, why would they be more likely to start with 64M now? I'd've thought it would be likely to go the other way: if they normally boot a guest with 256M, they can now start with maxmem=1G and memory=256M, and balloon it up if they want.

> No, I had just given some limited thought to this problem previously,
> had considered the idea of sharing a zero page for the Windows
> start-of-day scrubbing problem, but didn't know if the scrubbing
> always used only zeroes. If it does, great! I was worried that
> something like a secure version of Windows might use some other random
> bit pattern, but I'll bet Windows elsewhere assumes that all pages
> start as zero-filled and is thus dependent on start-of-day ZERO
> scrubbing, so I'll bet your approach will always work.

AIUI, Windows has two "free page" lists: zeroed, and dirty. The scrubber moves pages from the dirty list to the zero list. Most of the page allocation interfaces promise zeroed pages, as would mapping "anonymous" process memory (not sure of the Windows term for that). So the most useful state for an un-allocated page to be in is zero, because there's a high probability that it will have to be zeroed before it's used anyway.

At any rate, we can cross that bridge if we ever come to it. :-)

 -George
Dan Magenheimer
2008-Dec-24 15:35 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
> We could try to allocate a new page at that point; but it's likely
> that the allocation will fail unless there happens to be memory lying
> around somewhere, not used by dom0 or any other domain. And if that
> were the case, why not just start it with that much memory to begin
> with?

Actually, if dom0_mem is used, rather than the default of letting domain0 absorb all free memory and dole it out as needed to launching VMs, there will almost always be some memory lying around.

And in the not-too-distant future, when live migration is more widely used, there had better be memory lying around or migration won't work.

As for "why not just start it with that much memory to begin with?"... because in most environments VMs are sized once (e.g. 512MB) and almost never changed... because sysadmins rarely want to be bothered with constantly fine-tuning just to use an extra spare few MB of memory.

That's why your patch is so important!

Dan
George Dunlap
2008-Dec-24 15:46 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
On Wed, Dec 24, 2008 at 3:35 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> We could try to allocate a new page at that point; but it's likely
>> that the allocation will fail unless there happens to be memory lying
>> around somewhere, not used by dom0 or any other domain. And if that
>> were the case, why not just start it with that much memory to begin
>> with?
>
> Actually, if dom0_mem is used, rather than the default of letting
> domain0 absorb all free memory and dole it out as needed to launching
> VMs, there will almost always be some memory lying around.

At any rate, I suppose it might not be a bad idea to *try* to allocate more memory in an emergency. I'll add that to the list of improvements.

 -George
Dan Magenheimer
2008-Dec-24 15:54 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
> On Wed, Dec 24, 2008 at 2:32 PM, Dan Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> > Yes, it's just that with your fix, Windows VM users are much more
> > likely to use memory overcommit and will need to be "trained" to
> > always configure a swap disk to ensure bad things don't happen.
> > And this swap disk had better be on a network-based medium or
> > live migration won't work.
>
> You mean they may be much more likely to under-provision memory to
> their VMs, booting with (say) 64M on the assumption that they can
> balloon it up to 512M if they want to? That seems rather unlikely to
> me... if they're not likely to start a Windows VM with 64M normally,
> why would they be more likely to start with 64M now? I'd've thought
> it would be likely to go the other way: if they normally boot a guest
> with 256M, they can now start with maxmem=1G and memory=256M, and
> balloon it up if they want.

What I mean is that now that they CAN start with memory=256M and maxmem=1G, it is now much more likely that ballooning and memory overcommit will be used, possibly hidden by vendors' tools. Once ballooning is used at all, memory can not only go above the starting memory= threshold but can also go below it. Thus, your patch will make it more likely that "memory pressure" will be dynamically applied to Windows VMs, which means swapping is more likely to occur, which means there had better be a properly-sized swap disk.

For example, on a 2GB system, a reasonable configuration might be:
Windows VM1: memory=256M maxmem=1GB
Windows VM2: memory=256M maxmem=1GB
Windows VM3: memory=256M maxmem=1GB
Windows VM4: memory=256M maxmem=1GB
(dom0_mem=256M, Xen+heap=256M for the sake of argument)

Assume that VM1 and VM2 are heavily loaded and VM3 and VM4 are idle (or nearly so). So VM1 and VM2 are ballooned up towards 1G by taking memory away from VM3 and VM4. Say VM3 and VM4 are ballooned down to about 128M each. Now VM3 and VM4 suddenly get loaded and need more memory. But VM1 and VM2 are hesitant to surrender memory because it is fully utilized. SOME VM is going to have to start swapping!

So, I'm just saying that your patch makes this kind of scenario more likely, so listing the need for a swap disk in your README would be a good idea.
Tian, Kevin
2008-Dec-25 02:36 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: George Dunlap
>Sent: Wednesday, December 24, 2008 10:43 PM
>
>> Another tricky point could be with VT-d. If one guest page is used as
>> a DMA target before the balloon driver is installed, and there is no
>> early access on that page (like the start-of-day scrubber), then the
>> PoD action will not be triggered... Not sure of the possibility of
>> such a condition, but you may need to give it some thought or add
>> some guard for that. Em... after more thinking, actually PoD pages
>> may be alive even after the balloon driver is installed. I guess
>> before coming up with a solution you may add a check on whether the
>> target domain has a passthrough device, to decide on-the-fly whether
>> this feature is on.
>
> Hmm, I haven't looked at VT-d integration; it at least requires some
> examination. How are gfns translated to mfns for the VT-d hardware?
> Does it use the hardware EPT tables? Is the transaction re-startable
> if we get an EPT fault and then fix the EPT table?

There's a VT-d page table walked by the VT-d engine, which is similar in content to EPT. When a device DMA request is intercepted by the VT-d engine, the VT-d page table corresponding to that device is walked for a valid mapping. Unlike EPT, which is restartable, a VT-d page fault is just for logging purposes, since the PCI bus doesn't support I/O restart yet (although the PCI-SIG is looking at this possibility). That is to say, if we can't find a chance to trigger a CPU page fault before a PoD page is used as a DMA target, one of the two features should be disabled if both are configured.

> A second issue is with the emergency sweep: if a page which happens to
> be zero ends up being the target of a DMA, we may get:
> * Device request to write to gfn X, which translates to mfn Y.
> * Demand-fault on gfn Z, with no pages in the cache.
> * Emergency sweep scans through gfn space, finds that mfn Y is empty.
> It replaces gfn X with a PoD entry, and puts mfn Y behind gfn Z.
> * The request finishes. Either the request then fails (because the EPT
> translation for gfn X is not valid anymore), or it silently succeeds
> in writing to mfn Y, which is now behind gfn Z instead of gfn X.

Yes, this is also an issue. The request will fail, since the DMA address written to the device is a gfn, while the X->Y mapping has been cut off by the sweep.

> If we can't tell that there's an outstanding I/O on the page, then we
> can't do an emergency sweep. If we have some way of knowing that
> there's *some* outstanding I/O to *some* page, we could pause the
> guest until the I/O completes, then do the sweep.

One possibility is to have a PV DMA engine or a virtual VT-d engine within the guest, but that's another story.

> At any rate, until we have that worked out, we should probably add
> some "seatbelt" code to make sure that people don't use PoD for a VT-d
> enabled domain. I know absolutely nothing about the VT-d code; could
> you either write a patch to do this check, or give me an idea of the
> simplest thing to check?

Weidong works on VT-d and could give comments on the exact point to check.

>>> NB that this code is designed to work only in conjunction with a
>>> balloon driver. If the balloon driver is not loaded, eventually all
>>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>>> there will be no memory to back outstanding PoD pages. When this
>>> happens, the domain will crash.
>>
>> In that case, is it better to increase the PoD target to the
>> configured max mem? It looks uncomfortable to crash a domain just
>> because some optimization doesn't apply. :-)
>
> If this happened, it wouldn't be because an optimization didn't apply,
> but because we purposely tried to use a feature for which a key
> component failed or wasn't properly in place. If we set up a domain
> with VT-d access on a box with no VT-d hardware, it would fail as well
> -- just during boot, not 5 minutes after it. :-)

It's a different story with regard to VT-d, since as you said domain creation will fail due to lack of VT-d support, and the user can be aware of what's happening immediately and then make appropriate changes to the configuration file. Nothing is impacted. However, in the PoD case, failure of the emergency sweep may happen 5 minutes after booting, or even longer if the guest doesn't use too much memory, and then... crash. This is a bad user experience, and in particular some unsynced data could be lost.

Anyway, PoD looks like a nice-to-have feature, just like superpages. In both cases, as long as there's a fallback available, we'd better fall back instead of crashing: for example, as long as free domheap pages are sufficient, use 4k pages when a superpage allocation fails, and expand PoD to max mem for a domain which doesn't successfully install a balloon driver. In an environment with such over-commitment support, not all VMs are expected to participate in that party. :-)

A side question is how an emergency sweep failure could be detected and reported to the user...

> We could try to allocate a new page at that point; but it's likely
> that the allocation will fail unless there happens to be memory lying
> around somewhere, not used by dom0 or any other domain. And if that
> were the case, why not just start it with that much memory to begin
> with?

This is a case where the user's willingness to use PoD doesn't mean it will always succeed. You wouldn't expect the user to disable PoD and use that much memory only after several rounds of crash experience.

> The only way to make this more robust would be to pause the domain,
> send a message back to xend, have it try to balloon down domain 0 (or
> possibly other domains), increase the PoD cache size, and then unpause
> the domain again. This is not only a lot of work, but many of the
> failure modes would be really hard to handle; e.g., if qemu makes a
> hypercall that ends up doing a gfn_to_mfn() translation which fails,
> we would need to make that whole operation re-startable. I did look
> at this, but it's a ton of work, and a lot of code changes (including
> interface changes between Xen and dom0 components), for a situation
> which really should never happen in a properly configured system.
> There's no reason that, with a balloon driver which loads during boot
> and a properly configured target (i.e., not unreasonably small), the
> driver shouldn't be able to quickly reach its target.

So I think a simple fallback to expand PoD to maxmem automatically can avoid such complexity.

> Last, do you have any performance data on how this patch may impact
> the boot process, or even some workload after login?
>
> I do not have any solid numbers. Perceptually, I haven't noticed
> anything too slow. I'll do some simple benchmarks.

Thanks for your good work.

Kevin
Tian, Kevin
2008-Dec-25 02:47 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Wednesday, December 24, 2008 11:35 PM
>
>> We could try to allocate a new page at that point; but it's likely
>> that the allocation will fail unless there happens to be memory lying
>> around somewhere, not used by dom0 or any other domain. And if that
>> were the case, why not just start it with that much memory to begin
>> with?
>
> Actually, if dom0_mem is used, rather than the default of letting
> domain0 absorb all free memory and dole it out as needed to launching
> VMs, there will almost always be some memory lying around.

I recall some previous discussion about having an explicit dom0_mem setting instead of blindly giving all memory to dom0. What is others' preference on this option? At least one other benefit of limiting dom0_mem size, IIRC, is NUMA-node-aware memory allocation. Currently Xen can allocate memory by taking the node factor into consideration, but once all memory is allocated to dom0 at the start, it'd be much more complex, since the balloon driver is not node-aware and thus can't selectively give back pages from dom0, which then nullifies Xen's node-aware allocator.

> And in the not-too-distant future, when live migration is
> more widely used, there had better be memory lying around
> or migration won't work.

Live migration seems orthogonal here, since a new domain is created and the condition doesn't change as long as dom0 has enough memory to balloon back. :-)

> As for "why not just start it with that much memory to begin
> with?"... because in most environments VMs are sized once
> (e.g. 512MB) and almost never changed... because sysadmins
> rarely want to be bothered with constantly fine-tuning
> just to use an extra spare few MB of memory.
>
> That's why your patch is so important!

Agree.

Thanks,
Kevin
Han, Weidong
2008-Dec-25 05:43 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
Tian, Kevin wrote:
>> From: George Dunlap
>> Sent: Wednesday, December 24, 2008 10:43 PM
>>
>> At any rate, until we have that worked out, we should probably add
>> some "seatbelt" code to make sure that people don't use PoD for a
>> VT-d enabled domain. I know absolutely nothing about the VT-d code;
>> could you either write a patch to do this check, or give me an idea
>> of the simplest thing to check?
>
> Weidong works on VT-d and could give comments on the exact point
> to check.

You can simply check "iommu_enabled" to know whether an IOMMU (VT-d or the AMD IOMMU) is in use or not.

Regards,
Weidong
Tian, Kevin
2008-Dec-25 11:45 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: Han, Weidong
>Sent: Thursday, December 25, 2008 1:43 PM
>
>>> At any rate, until we have that worked out, we should probably add
>>> some "seatbelt" code to make sure that people don't use PoD for a
>>> VT-d enabled domain. I know absolutely nothing about the VT-d code;
>>> could you either write a patch to do this check, or give me an idea
>>> of the simplest thing to check?
>>
>> Weidong works on VT-d and could give comments on the exact point
>> to check.
>
> You can simply check "iommu_enabled" to know whether an IOMMU
> (VT-d or the AMD IOMMU) is in use or not.

Weidong, does iommu_enabled indicate IOMMU h/w availability? Then you'll have this nice feature disabled on most new platforms shipped with an IOMMU. :-) A domain-based check is required here, i.e. PoD is only applicable when the target domain has no passthrough device.

Thanks,
Kevin
Han, Weidong
2008-Dec-26 00:42 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
Tian, Kevin wrote:
>> From: Han, Weidong
>> Sent: Thursday, December 25, 2008 1:43 PM
>>
>> You can simply check "iommu_enabled" to know whether an IOMMU
>> (VT-d or the AMD IOMMU) is in use or not.
>
> Weidong, does iommu_enabled indicate IOMMU h/w availability?
> Then you'll have this nice feature disabled on most new platforms
> shipped with an IOMMU. :-) A domain-based check is required here,
> i.e. PoD is only applicable when the target domain has no
> passthrough device.

iommu_enabled will be set when IOMMU h/w is available and the user sets "iommu=1" in grub to use it. Because device hotplug with VT-d is already supported, I think a domain-based check is not enough; it's better to disable PoD when iommu_enabled is set.

Regards,
Weidong
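For concreteness, the guard being discussed could be as small as the following sketch. Where exactly the check belongs (domain build vs. the PoD-enabling path) and the error code are assumptions; iommu_enabled is the existing flag referred to above:

    /* Sketch: refuse populate-on-demand while an IOMMU is in use.
     * Placement and error code are assumptions for illustration. */
    if ( iommu_enabled )
    {
        gdprintk(XENLOG_WARNING,
                 "populate-on-demand is unsupported with IOMMU/VT-d enabled\n");
        return -EOPNOTSUPP;
    }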
Tim Deegan
2008-Dec-30 09:26 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
At 15:46 +0000 on 24 Dec (1230133560), George Dunlap wrote:
> On Wed, Dec 24, 2008 at 3:35 PM, Dan Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> >> We could try to allocate a new page at that point; but it's likely
> >> that the allocation will fail unless there happens to be memory
> >> lying around somewhere, not used by dom0 or any other domain. And
> >> if that were the case, why not just start it with that much memory
> >> to begin with?
> >
> > Actually, if dom0_mem is used, rather than the default of letting
> > domain0 absorb all free memory and dole it out as needed to launching
> > VMs, there will almost always be some memory lying around.
>
> At any rate, I suppose it might not be a bad idea to *try* to allocate
> more memory in an emergency. I'll add that to the list of
> improvements.

Please don't do this. It's not OK for a domain to start using more memory without the say-so of the tool stack. Since this emergency condition means something has gone wrong (the balloon driver failed to start), you're probably just postponing the inevitable, and in the meantime you might cause problems for domains that *aren't* misbehaving.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Tian, Kevin
2008-Dec-31 01:40 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
>Sent: Tuesday, December 30, 2008 5:27 PM
>
> At 15:46 +0000 on 24 Dec (1230133560), George Dunlap wrote:
>> At any rate, I suppose it might not be a bad idea to *try* to allocate
>> more memory in an emergency. I'll add that to the list of
>> improvements.
>
> Please don't do this. It's not OK for a domain to start using more
> memory without the say-so of the tool stack. Since this emergency
> condition means something has gone wrong (the balloon driver failed to
> start), you're probably just postponing the inevitable, and in the
> meantime you might cause problems for domains that *aren't*
> misbehaving.

Then a user-controlled option would fit here, indicating whether a given domain is important; emergency expansion could then be allowed for such a domain if a mandatory kill is not acceptable.

Thanks,
Kevin
Tim Deegan
2009-Jan-02 10:03 UTC
Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
Hi,

At 09:40 +0800 on 31 Dec (1230716432), Tian, Kevin wrote:
> >From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
> >At 15:46 +0000 on 24 Dec (1230133560), George Dunlap wrote:
> >> At any rate, I suppose it might not be a bad idea to *try* to
> >> allocate more memory in an emergency. I'll add that to the list of
> >> improvements.
> >
> >Please don't do this. It's not OK for a domain to start using more
> >memory without the say-so of the tool stack. Since this emergency
> >condition means something has gone wrong (the balloon driver failed
> >to start), you're probably just postponing the inevitable, and in the
> >meantime you might cause problems for domains that *aren't*
> >misbehaving.
>
> Then a user-controlled option would fit here, indicating whether a
> given domain is important; emergency expansion could then be allowed
> for such a domain if a mandatory kill is not acceptable.

What if you're booting two important domains, one of which misbehaves and uses extra memory, causing the second boot to fail? They were both important, and you've just chosen the buggy one. :)

Anyway, the only way to guarantee that a domain will boot even if it fails to launch its balloon driver is to make sure there is enough memory around for it to populate its entire p2m -- in which case you might as well just allocate it all that memory in the first place and avoid the extra risk of a bug in the PoD code nobbling this important domain.

The marginal benefit of allowing it to break the rules in the case where things go "slightly wrong" (i.e. it overruns its allocation but somehow recovers before using all available memory) seems so small to me that it's not even worth the extra lines of code in Xen and xend. Especially since probably either nobody would turn it on, or everyone would turn it on for every domain.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Tian, Kevin
2009-Jan-05 06:08 UTC
RE: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
>Sent: Friday, January 02, 2009 6:04 PM
>
> What if you're booting two important domains, one of which misbehaves
> and uses extra memory, causing the second boot to fail? They were both
> important, and you've just chosen the buggy one. :)
>
> Anyway, the only way to guarantee that a domain will boot even if it
> fails to launch its balloon driver is to make sure there is enough
> memory around for it to populate its entire p2m -- in which case you
> might as well just allocate it all that memory in the first place and
> avoid the extra risk of a bug in the PoD code nobbling this important
> domain.
>
> The marginal benefit of allowing it to break the rules in the case
> where things go "slightly wrong" (i.e. it overruns its allocation but
> somehow recovers before using all available memory) seems so small to
> me that it's not even worth the extra lines of code in Xen and xend.
> Especially since probably either nobody would turn it on, or everyone
> would turn it on for every domain.

OK, a sound argument.

Thanks,
Kevin