These changes make xenpaging work for me.

The first one is a fix for a bug I introduced shortly before leaving for
vacation. Currently the guest will hang because the ring fills up when
the balloon driver removes pages.
The second one avoids a crash during BIOS startup.
The third one is another version of the machine_to_phys_mapping[] handling.
The fourth is only required when the balloon driver is used in the guest.
The fifth implements a config option to allow an automated start of
xenpaging for a given guest.
And the last one lists some action items for xenpaging.

As of now, xenpaging will either crash the hypervisor or the guest due
to page-out of low memory. Perhaps the building of xenpaging could be
disabled in the Makefile?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 1/6] xenpaging: correct dropping pages to avoid full ring buffer
xenpaging uses the mem_event ring buffer, which expects request/response
pairs to make progress. The previous patch, which tried to establish a
one-way communication from Xen to xenpaging, stalled the guest once the
buffer was filled up with requests. A simple fix is to take the slow
path and let p2m_mem_paging_resume() consume the response from
xenpaging. This makes room for yet another request/response pair and
avoids hanging guests.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 tools/xenpaging/xenpaging.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- xen-unstable.hg-4.1.22764.orig/tools/xenpaging/xenpaging.c
+++ xen-unstable.hg-4.1.22764/tools/xenpaging/xenpaging.c
@@ -653,19 +653,19 @@ int main(int argc, char *argv[])
                     ERROR("Error populating page");
                     goto out;
                 }
+            }

-                /* Prepare the response */
-                rsp.gfn = req.gfn;
-                rsp.p2mt = req.p2mt;
-                rsp.vcpu_id = req.vcpu_id;
-                rsp.flags = req.flags;
+            /* Prepare the response */
+            rsp.gfn = req.gfn;
+            rsp.p2mt = req.p2mt;
+            rsp.vcpu_id = req.vcpu_id;
+            rsp.flags = req.flags;

-                rc = xenpaging_resume_page(paging, &rsp, 1);
-                if ( rc != 0 )
-                {
-                    ERROR("Error resuming page");
-                    goto out;
-                }
+            rc = xenpaging_resume_page(paging, &rsp, 1);
+            if ( rc != 0 )
+            {
+                ERROR("Error resuming page");
+                goto out;
             }

             /* Evict a new page to replace the one we just paged in */
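The fix restores the invariant that every request taken off the mem_event ring gets a matching response, since only consumed responses free ring slots. A toy Python model (illustrative names, not the real Xen ring layout) shows why the one-way scheme stalled the guest once the ring filled:

```python
from collections import deque

RING_SIZE = 4  # illustrative; the real ring holds a fixed number of slots

class MemEventRing:
    """Toy model of a fixed-size request/response ring (not the real layout)."""
    def __init__(self, size=RING_SIZE):
        self.size = size
        self.requests = deque()

    def put_request(self, req):
        if len(self.requests) == self.size:
            return False  # ring full: the vcpu posting the request stalls
        self.requests.append(req)
        return True

    def consume_and_respond(self, handle):
        # Pairing the request with a response is what frees the slot again.
        req = self.requests.popleft()
        handle(req)
        return {'gfn': req['gfn'], 'flags': req['flags']}  # mirrors rsp fields

ring = MemEventRing()

# One-way scheme: requests are never consumed, so the 5th request stalls.
stalled = [ring.put_request({'gfn': g, 'flags': 0}) for g in range(5)]
assert stalled == [True, True, True, True, False]

# Request/response pairing: consuming a response makes room for the next one.
ring.consume_and_respond(lambda req: None)
assert ring.put_request({'gfn': 99, 'flags': 0}) is True
```

This is only a sketch of the back-pressure argument; the patch itself achieves the pairing by always preparing `rsp` and calling xenpaging_resume_page() for every request.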
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 2/6] xenpaging: prevent page-out of first 16MB
This is more a workaround than a bugfix: don't page out the first 16MB
of guest memory. When the BIOS does its initialization and xenpaging
removes pages, crashes occur because that early code has no support for
paged-out memory. A more complete change to prevent the early crashes is
to use the newly added wait_queue feature in the gfn_to_mfn() variants.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 tools/xenpaging/policy_default.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- xen-unstable.hg-4.1.22746.orig/tools/xenpaging/policy_default.c
+++ xen-unstable.hg-4.1.22746/tools/xenpaging/policy_default.c
@@ -73,8 +73,9 @@ int policy_init(xenpaging_t *paging)
     for ( i = 0; i < mru_size; i++ )
         mru[i] = INVALID_MFN;

-    /* Don't page out page 0 */
-    set_bit(0, bitmap);
+    /* Don't page out first 16MB */
+    for ( i = 0; i < ((16*1024*1024)/4096); i++ )
+        set_bit(i, bitmap);

 out:
     return rc;
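With 4 KiB pages, the widened loop marks gfns 0 through 4095 as unpageable. A quick sketch (plain Python standing in for the C bitmap helpers) of what the new policy_init() protects:

```python
PAGE_SIZE = 4096                    # 4 KiB pages
RESERVED_BYTES = 16 * 1024 * 1024   # first 16 MiB kept in memory

# One bit per gfn; set_bit(i, bitmap) in the patch becomes a set() here.
unpageable = set(range(RESERVED_BYTES // PAGE_SIZE))

assert len(unpageable) == 4096                        # 16 MiB / 4 KiB
assert 0 in unpageable                                # old behaviour kept
assert 4095 in unpageable and 4096 not in unpageable  # window boundary
```

So the cost of the workaround is that at most 4096 candidate pages are removed from the paging policy, in exchange for not paging under the BIOS during startup.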
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 3/6] xenpaging: update machine_to_phys_mapping[] during page deallocation
The machine_to_phys_mapping[] array needs updating during page
deallocation. If that page is allocated again, a call to
get_gpfn_from_mfn() will still return an old gfn from another guest.
This will cause trouble because this gfn number has no, or a different,
meaning in the context of the current guest.

This happens when the entire guest ram is paged-out before
xen_vga_populate_vram() runs. Then XENMEM_populate_physmap is called
with gfn 0xff000. A new page is allocated with alloc_domheap_pages.
This new page does not have a gfn yet. However, in
guest_physmap_add_entry() the passed mfn still maps to an old gfn
(perhaps from another, old guest). This old gfn is in paged-out state in
this guest's context and has no mfn anymore. As a result, the ASSERT()
triggers because p2m_is_ram() is true for the p2m_ram_paging* types.

If the machine_to_phys_mapping[] array is updated properly, both loops
in guest_physmap_add_entry() turn into no-ops for the new page and the
mfn/gfn mapping will be done at the end of the function.

If XENMEM_add_to_physmap is used with XENMAPSPACE_gmfn,
get_gpfn_from_mfn() will return an apparently valid gfn. As a result,
guest_physmap_remove_page() is called. The ASSERT in p2m_remove_page
triggers because the passed mfn does not match the old mfn for the
passed gfn.
Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 xen/common/page_alloc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

--- xen-unstable.hg-4.1.22764.orig/xen/common/page_alloc.c
+++ xen-unstable.hg-4.1.22764/xen/common/page_alloc.c
@@ -1200,6 +1200,7 @@ void free_domheap_pages(struct page_info
 {
     int i, drop_dom_ref;
     struct domain *d = page_get_owner(pg);
+    unsigned long mfn;

     ASSERT(!in_irq());

@@ -1257,6 +1258,14 @@ void free_domheap_pages(struct page_info
         drop_dom_ref = 0;
     }

+    /* this page is not a gfn anymore */
+    mfn = page_to_mfn(pg);
+    for ( i = 0; i < (1 << order); i++ )
+    {
+        page_set_owner(&pg[i], NULL);
+        set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+    }
+
     if ( drop_dom_ref )
         put_domain(d);
 }
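The effect of the hunk can be modelled with a dictionary standing in for machine_to_phys_mapping[] (names mirror the patch, everything else is illustrative): once a freed mfn is stamped INVALID_M2P_ENTRY, a later get_gpfn_from_mfn() can no longer hand back a stale gfn from the previous owner:

```python
INVALID_M2P_ENTRY = ~0  # sentinel meaning "this mfn has no gfn"

m2p = {}  # mfn -> gfn, toy stand-in for machine_to_phys_mapping[]

def set_gpfn_from_mfn(mfn, gfn):
    m2p[mfn] = gfn

def get_gpfn_from_mfn(mfn):
    return m2p.get(mfn, INVALID_M2P_ENTRY)

def free_domheap_pages(mfn, order):
    # The patch: every page in the 2^order block loses its stale gfn.
    for i in range(1 << order):
        set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY)

# An old guest had mfns 100..103 mapped to gfns 0xff000..0xff003.
for i in range(4):
    set_gpfn_from_mfn(100 + i, 0xff000 + i)

free_domheap_pages(100, 2)  # free an order-2 block: 4 pages

# Without the invalidation the stale 0xff000 would survive and trip the
# ASSERTs described above; with it, the new owner sees "no gfn yet".
assert all(get_gpfn_from_mfn(100 + i) == INVALID_M2P_ENTRY
           for i in range(4))
```

With the m2p entries invalid, both loops in guest_physmap_add_entry() become no-ops for the freshly allocated page, exactly as the commit message argues.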
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 4/6] xenpaging: handle HVMCOPY_gfn_paged_out in copy_from/to_user
copy_from_user_hvm can fail when __hvm_copy returns
HVMCOPY_gfn_paged_out for a referenced gfn, for example during a guest
pagetable walk. This has to be handled in some way. For the time being,
return -EAGAIN for the most common case (the xen_balloon driver crashing
in the guest) until the recently added waitqueues are used.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 xen/arch/x86/hvm/hvm.c |  4 ++++
 xen/common/memory.c    | 39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 5 deletions(-)

--- xen-unstable.hg-4.1.22764.orig/xen/arch/x86/hvm/hvm.c
+++ xen-unstable.hg-4.1.22764/xen/arch/x86/hvm/hvm.c
@@ -2163,6 +2163,8 @@ unsigned long copy_to_user_hvm(void *to,

     rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
                                         len, 0);
+    if ( unlikely(rc == HVMCOPY_gfn_paged_out) )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }

@@ -2180,6 +2182,8 @@ unsigned long copy_from_user_hvm(void *t
 #endif

     rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
+    if ( unlikely(rc == HVMCOPY_gfn_paged_out) )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }

--- xen-unstable.hg-4.1.22764.orig/xen/common/memory.c
+++ xen-unstable.hg-4.1.22764/xen/common/memory.c
@@ -48,6 +48,7 @@ static void increase_reservation(struct
 {
     struct page_info *page;
     unsigned long i;
+    unsigned long ctg_ret;
     xen_pfn_t mfn;
     struct domain *d = a->domain;

@@ -81,8 +82,13 @@ static void increase_reservation(struct
         if ( !guest_handle_is_null(a->extent_list) )
         {
             mfn = page_to_mfn(page);
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }

@@ -94,6 +100,7 @@ static void populate_physmap(struct memo
 {
     struct page_info *page;
     unsigned long i, j;
+    unsigned long cftg_ret;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain;

@@ -112,8 +119,13 @@ static void populate_physmap(struct memo
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gpfn, a->extent_list, i, 1)) )
+        cftg_ret = __copy_from_guest_offset(&gpfn, a->extent_list, i, 1);
+        if ( unlikely(cftg_ret) )
+        {
+            if ( (long)cftg_ret == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( a->memflags & MEMF_populate_on_demand )
         {
@@ -143,8 +155,13 @@ static void populate_physmap(struct memo
                 set_gpfn_from_mfn(mfn + j, gpfn + j);

             /* Inform the domain of the new page's machine address. */
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            cftg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(cftg_ret) )
+            {
+                if ( (long)cftg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }
 }
@@ -213,6 +230,7 @@ int guest_remove_page(struct domain *d,
 static void decrease_reservation(struct memop_args *a)
 {
     unsigned long i, j;
+    unsigned long cfg_ret;
     xen_pfn_t gmfn;

     if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
@@ -227,8 +245,13 @@ static void decrease_reservation(struct
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
+        cfg_ret = __copy_from_guest_offset(&gmfn, a->extent_list, i, 1);
+        if ( unlikely(cfg_ret) )
+        {
+            if ( (long)cfg_ret == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( tb_init_done )
         {
@@ -509,6 +532,7 @@ long do_memory_op(unsigned long cmd, XEN
     int rc, op;
     unsigned int address_bits;
     unsigned long start_extent;
+    unsigned long cfg_ret;
     struct xen_memory_reservation reservation;
     struct memop_args args;
    domid_t domid;

@@ -522,8 +546,13 @@ long do_memory_op(unsigned long cmd, XEN
     case XENMEM_populate_physmap:
         start_extent = cmd >> MEMOP_EXTENT_SHIFT;

-        if ( copy_from_guest(&reservation, arg, 1) )
+        cfg_ret = copy_from_guest(&reservation, arg, 1);
+        if ( unlikely(cfg_ret) )
+        {
+            if ( (long)cfg_ret == -EAGAIN )
+                return hypercall_create_continuation(
+                    __HYPERVISOR_memory_op, "lh", cmd, arg);
             return start_extent;
+        }

         /* Is size too large for us to encode a continuation? */
         if ( reservation.nr_extents > (ULONG_MAX >> MEMOP_EXTENT_SHIFT) )
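The control flow the patch adds is: a copy helper reports "gfn paged out" as -EAGAIN, the memory op marks itself preempted instead of failing outright, and do_memory_op() reissues the hypercall as a continuation so the guest retries once the page is back. Sketched in Python (the error code and the preempted flag mirror the patch; everything else is illustrative):

```python
EAGAIN = 11  # errno value; the patched copy helpers return -EAGAIN

def copy_from_guest(paged_out):
    # Stand-in for copy_from_user_hvm(): 0 on success, -EAGAIN if the
    # target gfn is currently paged out (HVMCOPY_gfn_paged_out).
    return -EAGAIN if paged_out else 0

def memory_op(gfns_paged_out):
    """Returns ('done', n) or ('continuation', n), like the preempted path."""
    preempted = False
    done = 0
    for paged_out in gfns_paged_out:
        rc = copy_from_guest(paged_out)
        if rc:
            if rc == -EAGAIN:
                preempted = True  # a->preempted = 1 in the patch
            break
        done += 1
    # do_memory_op(): a preempted op re-enters via a hypercall
    # continuation instead of returning an error to the guest.
    return ('continuation' if preempted else 'done', done)

assert memory_op([False, False]) == ('done', 2)
assert memory_op([False, True, False]) == ('continuation', 1)
```

The sketch also shows why this is interim: every paged-out gfn costs a full hypercall restart, which the waitqueue work mentioned above is meant to replace.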
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 5/6] xenpaging: start xenpaging via config option
Start xenpaging via config option.

TODO: add config option for different pagefile directory
TODO: add libxl support
TODO: parse config values like 42K, 42M, 42G, 42%

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
v3: move debug for stopping xenpaging to destroyXenPaging
v2: unlink logfile instead of truncating it.
    allows hardlinking for further inspection

 tools/examples/xmexample.hvm            |  3 +
 tools/python/README.XendConfig          |  1
 tools/python/README.sxpcfg              |  1
 tools/python/xen/xend/XendConfig.py     |  3 +
 tools/python/xen/xend/XendDomainInfo.py |  5 +
 tools/python/xen/xend/image.py          | 87 ++++++++++++++++++++++++++++++++
 tools/python/xen/xm/create.py           |  5 +
 tools/python/xen/xm/xenapi_create.py    |  1
 8 files changed, 106 insertions(+)

--- xen-unstable.hg-4.1.22746.orig/tools/examples/xmexample.hvm
+++ xen-unstable.hg-4.1.22746/tools/examples/xmexample.hvm
@@ -127,6 +127,9 @@ disk = [ 'file:/var/images/min-el3-i386.
 # Device Model to be used
 device_model = 'qemu-dm'

+# xenpaging, number of pages, or -1 for entire guest memory range
+xenpaging = 42
+
 #-----------------------------------------------------------------------------
 # boot on floppy (a), hard disk (c), Network (n) or CD-ROM (d)
 # default: hard disk, cd-rom, floppy
--- xen-unstable.hg-4.1.22746.orig/tools/python/README.XendConfig
+++ xen-unstable.hg-4.1.22746/tools/python/README.XendConfig
@@ -120,6 +120,7 @@ otherConfig
     image.vncdisplay
     image.vncunused
     image.hvm.device_model
+    image.hvm.xenpaging
     image.hvm.display
     image.hvm.xauthority
     image.hvm.vncconsole
--- xen-unstable.hg-4.1.22746.orig/tools/python/README.sxpcfg
+++ xen-unstable.hg-4.1.22746/tools/python/README.sxpcfg
@@ -51,6 +51,7 @@ image
   - vncunused
 (HVM)
   - device_model
+  - xenpaging
   - display
   - xauthority
   - vncconsole
--- xen-unstable.hg-4.1.22746.orig/tools/python/xen/xend/XendConfig.py
+++ xen-unstable.hg-4.1.22746/tools/python/xen/xend/XendConfig.py
@@ -147,6 +147,7 @@ XENAPI_PLATFORM_CFG_TYPES = {
     'apic': int,
     'boot': str,
     'device_model': str,
+    'xenpaging': int,
     'loader': str,
     'display' : str,
     'fda': str,
@@ -512,6 +513,8 @@ class XendConfig(dict):
             self['platform']['nomigrate'] = 0

         if self.is_hvm():
+            if 'xenpaging' not in self['platform']:
+                self['platform']['xenpaging'] = None
             if 'timer_mode' not in self['platform']:
                 self['platform']['timer_mode'] = 1
             if 'viridian' not in self['platform']:
--- xen-unstable.hg-4.1.22746.orig/tools/python/xen/xend/XendDomainInfo.py
+++ xen-unstable.hg-4.1.22746/tools/python/xen/xend/XendDomainInfo.py
@@ -2390,6 +2390,7 @@ class XendDomainInfo:

         if self.image:
             self.image.createDeviceModel()
+            self.image.createXenPaging()

             #if have pass-through devs, need the virtual pci slots info from qemu
             self.pci_device_configure_boot()
@@ -2402,6 +2403,10 @@ class XendDomainInfo:
                 self.image.destroyDeviceModel()
             except Exception, e:
                 log.exception("Device model destroy failed %s" % str(e))
+            try:
+                self.image.destroyXenPaging()
+            except Exception, e:
+                log.exception("stopping xenpaging failed %s" % str(e))
         else:
             log.debug("No device model")
--- xen-unstable.hg-4.1.22746.orig/tools/python/xen/xend/image.py
+++ xen-unstable.hg-4.1.22746/tools/python/xen/xend/image.py
@@ -122,12 +122,16 @@ class ImageHandler:
             self.vm.permissionsVm("image/cmdline", { 'dom': self.vm.getDomid(), 'read': True } )

         self.device_model = vmConfig['platform'].get('device_model')
+        self.xenpaging = vmConfig['platform'].get('xenpaging')
+        if self.xenpaging == 0:
+            self.xenpaging = None

         self.display = vmConfig['platform'].get('display')
         self.xauthority = vmConfig['platform'].get('xauthority')
         self.vncconsole = int(vmConfig['platform'].get('vncconsole', 0))
         self.dmargs = self.parseDeviceModelArgs(vmConfig)
         self.pid = None
+        self.xenpaging_pid = None

         rtc_timeoffset = int(vmConfig['platform'].get('rtc_timeoffset', 0))
         if int(vmConfig['platform'].get('localtime', 0)):
             if time.localtime(time.time())[8]:
@@ -392,6 +396,89 @@ class ImageHandler:
             sentinel_fifos_inuse[sentinel_path_fifo] = 1
         self.sentinel_path_fifo = sentinel_path_fifo

+    def createXenPaging(self):
+        if self.xenpaging is None:
+            return
+        if self.xenpaging_pid:
+            return
+        xenpaging_bin = auxbin.pathTo("xenpaging")
+        args = [xenpaging_bin]
+        args = args + ([ "%d" % self.vm.getDomid()])
+        args = args + ([ "%s" % self.xenpaging])
+        env = dict(os.environ)
+        self.xenpaging_logfile = "/var/log/xen/xenpaging-%s.log" % str(self.vm.info['name_label'])
+        logfile_mode = os.O_WRONLY|os.O_CREAT|os.O_APPEND|os.O_TRUNC
+        null = os.open("/dev/null", os.O_RDONLY)
+        try:
+            os.unlink(self.xenpaging_logfile)
+        except:
+            pass
+        logfd = os.open(self.xenpaging_logfile, logfile_mode, 0644)
+        sys.stderr.flush()
+        contract = osdep.prefork("%s:%d" % (self.vm.getName(), self.vm.getDomid()))
+        xenpaging_pid = os.fork()
+        if xenpaging_pid == 0: #child
+            try:
+                xenpaging_dir = "/var/lib/xen/xenpaging"
+                osdep.postfork(contract)
+                os.dup2(null, 0)
+                os.dup2(logfd, 1)
+                os.dup2(logfd, 2)
+                try:
+                    os.chdir(xenpaging_dir)
+                except:
+                    log.warn("chdir %s failed" % xenpaging_dir)
+                try:
+                    log.info("starting %s" % args)
+                    os.execve(xenpaging_bin, args, env)
+                except Exception, e:
+                    print >>sys.stderr, (
+                        'failed to execute xenpaging: %s: %s' %
+                        (xenpaging_bin, utils.exception_string(e)))
+                    os._exit(126)
+            except Exception, e:
+                log.warn("starting xenpaging in %s failed" % xenpaging_dir)
+                os._exit(127)
+        else:
+            osdep.postfork(contract, abandon=True)
+            self.xenpaging_pid = xenpaging_pid
+            os.close(null)
+            os.close(logfd)
+
+    def destroyXenPaging(self):
+        if self.xenpaging is None:
+            return
+        log.debug("stopping xenpaging")
+        if self.xenpaging_pid:
+            try:
+                os.kill(self.xenpaging_pid, signal.SIGHUP)
+            except OSError, exn:
+                log.exception(exn)
+            for i in xrange(100):
+                try:
+                    (p, rv) = os.waitpid(self.xenpaging_pid, os.WNOHANG)
+                    if p == self.xenpaging_pid:
+                        break
+                except OSError:
+                    # This is expected if Xend has been restarted within
+                    # the life of this domain.  In this case, we can kill
+                    # the process, but we can't wait for it because it's
+                    # not our child. We continue this loop, and after it is
+                    # terminated make really sure the process is going away
+                    # (SIGKILL).
+                    pass
+                time.sleep(0.1)
+            else:
+                log.warning("xenpaging %d took more than 10s "
+                            "to terminate: sending SIGKILL" % self.xenpaging_pid)
+                try:
+                    os.kill(self.xenpaging_pid, signal.SIGKILL)
+                    os.waitpid(self.xenpaging_pid, 0)
+                except OSError:
+                    # This happens if the process doesn't exist.
+                    pass
+        self.xenpaging_pid = None
+
     def createDeviceModel(self, restore = False):
         if self.device_model is None:
             return
--- xen-unstable.hg-4.1.22746.orig/tools/python/xen/xm/create.py
+++ xen-unstable.hg-4.1.22746/tools/python/xen/xm/create.py
@@ -491,6 +491,10 @@ gopts.var('nfs_root', val="PATH",
           fn=set_value, default=None,
           use="Set the path of the root NFS directory.")

+gopts.var('xenpaging', val='NUM',
+          fn=set_int, default=None,
+          use="Number of pages to swap.")
+
 gopts.var('device_model', val='FILE',
           fn=set_value, default=None,
           use="Path to device model program.")
@@ -1076,6 +1080,7 @@ def configure_hvm(config_image, vals):
     args = [ 'acpi', 'apic',
              'boot',
              'cpuid', 'cpuid_check',
+             'xenpaging',
              'device_model', 'display',
              'fda', 'fdb',
              'gfx_passthru', 'guest_os_type',
--- xen-unstable.hg-4.1.22746.orig/tools/python/xen/xm/xenapi_create.py
+++ xen-unstable.hg-4.1.22746/tools/python/xen/xm/xenapi_create.py
@@ -1085,6 +1085,7 @@ class sxp2xml:
             'acpi',
             'apic',
             'boot',
+            'xenpaging',
             'device_model',
             'loader',
             'fda',
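destroyXenPaging() follows the common "signal, poll with WNOHANG, escalate to SIGKILL" shutdown pattern. A trimmed-down, runnable version of that loop (Python 3, using SIGTERM on a dummy child rather than SIGHUP on a real xenpaging process; timeout values are illustrative):

```python
import os
import signal
import time

def stop_process(pid, timeout=10.0, interval=0.1):
    """Ask pid to exit, poll non-blockingly, SIGKILL it as a last resort."""
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        return  # already gone
    for _ in range(int(timeout / interval)):
        try:
            p, _status = os.waitpid(pid, os.WNOHANG)
            if p == pid:
                return  # child exited and was reaped
        except OSError:
            # Not our child (e.g. the daemon was restarted): we can kill
            # it, but we cannot wait for it -- keep polling, then SIGKILL.
            pass
        time.sleep(interval)
    try:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    except OSError:
        pass  # process disappeared on its own

# Demonstration: fork a child that would sleep for a minute, then stop it.
child = os.fork()
if child == 0:
    time.sleep(60)  # dummy stand-in for the paging daemon
    os._exit(0)

stop_process(child)
```

The `for`/`else` construct in the original patch expresses the same escalation: the `else` branch only runs when the 100-iteration poll loop never hit `break`.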
Olaf Hering
2011-Jan-16 16:32 UTC
[Xen-devel] [PATCH 6/6] xenpaging: document outstanding features
Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 docs/misc/xenpaging.txt | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

--- xen-unstable.hg-4.1.22764.orig/docs/misc/xenpaging.txt
+++ xen-unstable.hg-4.1.22764/docs/misc/xenpaging.txt
@@ -31,7 +31,7 @@ crash because the paged-out memory is no

 After a reboot of a guest, its guest_id changes, the current xenpaging
 binary has no target anymore. To automate restarting of xenpaging after
-guest reboot, specify the number if pages in the guest configuration
+guest reboot, specify the number of pages in the guest configuration
 file /etc/xen/vm/<guest_name>:

 xenpaging=32768
@@ -41,8 +41,26 @@ changes.

 Todo:
+- implement xl support
 - implement stopping of xenpaging
-- implement/test live migration
+- implement live migration
+- implement config option for XENPAGING_DEBUG and XENPAGING_POLICY_MRU_SIZE
+- implement config option for xenpaging_dir
+- implement better starting gfn in xenpaging policy
+  an initial gfn number in the middle of the gfn range may avoid page-ins
+  during BIOS startup
+- fix machine_to_phys_mapping[] array handling during page deallocation
+  the gfn of a released page must be maintained properly in the array
+  http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00824.html
+- fix HVMCOPY_gfn_paged_out handling
+  some callers of __hvm_copy() do not handle HVMCOPY_gfn_paged_out, such
+  as hypercalls and the MMIO emulation
+  the recently added waitqueue feature in Xen 4.1 should be used
+- remove all retry code from gfn_to_mfn() calls
+  use the waitqueue feature to hide page-in from the caller and cover
+  all cases where a retry is currently missing
+- do not bounce p2mt to xenpaging
+  p2m_mem_paging_populate/p2m_mem_paging_resume don't make use of p2mt

 # vim: tw=72
Keir Fraser
2011-Jan-16 16:50 UTC
Re: [Xen-devel] [PATCH 0/6] xenpaging changes for xen-4.1
On 16/01/2011 16:32, "Olaf Hering" <olaf@aepfle.de> wrote:

> These changes make xenpaging work for me.
> The first one is a fix for a bug I introduced shortly before leaving for
> vacation. Currently the guest will hang because the ring fills up when the
> balloon driver removes pages.
> The second one avoids a crash during BIOS startup.
> The third one is another version of the machine_to_phys_mapping[] handling.
> The fourth is only required when the balloon driver is used in the guest.
> The fifth implements a config option to allow an automated start of
> xenpaging for a given guest.
> And the last one lists some action items for xenpaging.
>
> As of now, xenpaging will either crash the hypervisor or the guest due
> to page-out of low memory. Perhaps the building of xenpaging could be
> disabled in the Makefile?

The hard truth is that this misses 4.1.0. We will check this stuff in when
4.2 development opens and then, when we have a fully working set of patches,
we can consider backport to 4.1.x. Frankly I'm a little doubtful even of
that, as various subsystems' interaction with the waitqueue feature needs a
bunch more bugfixing, let alone the xenpaging patches that sit above it.
That is, the required full set of patches to get xenpaging, plus all
existing features, working correctly is likely to be quite large indeed.

 -- Keir

> Olaf