Olaf Hering
2012-Feb-26 16:15 UTC
[PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
# HG changeset patch
# User Olaf Hering <olaf@aepfle.de>
# Date 1330272880 -3600
# Node ID e17afed13a35989bdf422b75d44843bfd2fda0f4
# Parent 3ace34b89ebe3bdd6afb0bca85b0c1270f99c1bb
[RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL

Is the approach taken in this patch ok?

With the debug output in my patch I get output like pasted below. It means
that xenpaging set qemu_mapcache_invalidate very often before a VMCALL
occurred. What triggers a VMCALL anyway? It looks like it's not something
one can depend on. From xenpaging's PoV it's not required that the command
reaches qemu right away, but it should be processed "soon". Could
send_invalidate_req() be called from another exit reason (or from another
place perhaps)?

Olaf

....
(XEN) irq.c:350: Dom1 callback via changed to PCI INTx Dev 0x03 IntA
(XEN) hvm_do_hypercall: 0 1 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 0 1 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 3049 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 4562 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 6 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 6 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 14 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 598 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 1 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 3 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 161 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 1 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 4860 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 1 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 799 0 0
(XEN) io.c:155:d1 send_invalidate_req
(XEN) hvm_do_hypercall: 5 0 0
(XEN) io.c:155:d1 send_invalidate_req
....

qemu will just keep mapping pages and not release them, which causes
problems for the memory pager (since the page is mapped, it won't get
paged out). When the pager has trouble finding a page to page out, it asks
qemu to flush its buffer, which releases all the page mappings. This makes
it possible to find pages to swap out again.

This change removes the code which writes "flush-cache" to xenstore,
because the corresponding change for qemu-xen-traditional was never
applied and it will not work with qemu-xen-upstream either. Instead of
sending commands via xenstore, use the existing IOREQ_TYPE_INVALIDATE
command for HVM guests. A simple memory_op just sets the
qemu_mapcache_invalidate flag, which in turn causes the hypervisor to send
the actual IOREQ_TYPE_INVALIDATE during the next VMEXIT_VMCALL.
Signed-off-by: Olaf Hering <olaf@aepfle.de>

diff -r 3ace34b89ebe -r e17afed13a35 tools/libxc/xc_mem_paging.c
--- a/tools/libxc/xc_mem_paging.c
+++ b/tools/libxc/xc_mem_paging.c
@@ -93,6 +93,14 @@ int xc_mem_paging_load(xc_interface *xch
     return rc;
 }
 
+int xc_mem_paging_flush(xc_interface *xch, domid_t domain_id)
+{
+    return xc_mem_event_memop(xch, domain_id,
+                              XENMEM_paging_op_ioemu_flush,
+                              XENMEM_paging_op,
+                              0, NULL);
+}
+
 
 /*
  * Local variables:
diff -r 3ace34b89ebe -r e17afed13a35 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1902,6 +1902,7 @@ int xc_mem_paging_evict(xc_interface *xc
 int xc_mem_paging_prep(xc_interface *xch, domid_t domain_id, unsigned long gfn);
 int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
                        unsigned long gfn, void *buffer);
+int xc_mem_paging_flush(xc_interface *xch, domid_t domain_id);
 
 int xc_mem_access_enable(xc_interface *xch, domid_t domain_id,
                          void *shared_page, void *ring_page);
diff -r 3ace34b89ebe -r e17afed13a35 tools/xenpaging/xenpaging.c
--- a/tools/xenpaging/xenpaging.c
+++ b/tools/xenpaging/xenpaging.c
@@ -62,15 +62,11 @@ static void close_handler(int sig)
     unlink_pagefile();
 }
 
-static void xenpaging_mem_paging_flush_ioemu_cache(struct xenpaging *paging)
+static int xenpaging_mem_paging_flush_ioemu_cache(struct xenpaging *paging)
 {
-    struct xs_handle *xsh = paging->xs_handle;
+    xc_interface *xch = paging->xc_handle;
     domid_t domain_id = paging->mem_event.domain_id;
-    char path[80];
-
-    sprintf(path, "/local/domain/0/device-model/%u/command", domain_id);
-
-    xs_write(xsh, XBT_NULL, path, "flush-cache", strlen("flush-cache"));
+    return xc_mem_paging_flush(xch, domain_id);
 }
 
 static int xenpaging_wait_for_event_or_timeout(struct xenpaging *paging)
@@ -768,7 +764,12 @@ static int evict_victim(struct xenpaging
         if ( num_paged_out != paging->num_paged_out )
         {
             DPRINTF("Flushing qemu cache\n");
-            xenpaging_mem_paging_flush_ioemu_cache(paging);
+            ret = xenpaging_mem_paging_flush_ioemu_cache(paging);
+            if ( ret < 0 )
+            {
+                PERROR("Flushing qemu cache\n");
+                goto out;
+            }
             num_paged_out = paging->num_paged_out;
         }
         ret = ENOSPC;
diff -r 3ace34b89ebe -r e17afed13a35 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2997,6 +2997,7 @@ static long hvm_memory_op(int cmd, XEN_G
         return -ENOSYS;
     case XENMEM_decrease_reservation:
         rc = do_memory_op(cmd, arg);
+        current->domain->arch.hvm_domain.inv_hvm++;
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
         return rc;
     }
@@ -3088,6 +3089,7 @@ static long hvm_memory_op_compat32(int c
         return -ENOSYS;
     case XENMEM_decrease_reservation:
         rc = compat_memory_op(cmd, arg);
+        current->domain->arch.hvm_domain.inv_hvm32++;
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
         return rc;
     }
@@ -3246,7 +3248,17 @@ int hvm_do_hypercall(struct cpu_user_reg
     if ( unlikely(curr->domain->arch.hvm_domain.qemu_mapcache_invalidate) &&
          test_and_clear_bool(curr->domain->arch.hvm_domain.
                              qemu_mapcache_invalidate) )
+    {
+        printk("%s: %u %u %u\n",
+               __func__,
+               curr->domain->arch.hvm_domain.inv_paging,
+               curr->domain->arch.hvm_domain.inv_hvm,
+               curr->domain->arch.hvm_domain.inv_hvm32);
+        curr->domain->arch.hvm_domain.inv_paging =
+        curr->domain->arch.hvm_domain.inv_hvm =
+        curr->domain->arch.hvm_domain.inv_hvm32 = 0;
         return HVM_HCALL_invalidate;
+    }
 
     return HVM_HCALL_completed;
 }
diff -r 3ace34b89ebe -r e17afed13a35 xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -152,6 +152,7 @@ void send_invalidate_req(void)
     struct vcpu *v = current;
     ioreq_t *p = get_ioreq(v);
 
+    gdprintk(XENLOG_DEBUG,"%s\n", __func__);
    if ( p->state != STATE_IOREQ_NONE )
     {
         gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
diff -r 3ace34b89ebe -r e17afed13a35 xen/arch/x86/mm/mem_paging.c
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -53,6 +53,15 @@ int mem_paging_memop(struct domain *d, x
     }
     break;
 
+    case XENMEM_paging_op_ioemu_flush:
+    {
+        /* Flush qemu cache on next VMEXIT_VMCALL */
+        d->arch.hvm_domain.inv_paging++;
+        d->arch.hvm_domain.qemu_mapcache_invalidate = 1;
+        return 0;
+    }
+    break;
+
     default:
         return -ENOSYS;
         break;
diff -r 3ace34b89ebe -r e17afed13a35 xen/include/asm-x86/hvm/domain.h
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -95,6 +95,9 @@ struct hvm_domain {
     bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
+    unsigned int           inv_paging;
+    unsigned int           inv_hvm;
+    unsigned int           inv_hvm32;
 
     union {
         struct vmx_domain vmx;
diff -r 3ace34b89ebe -r e17afed13a35 xen/include/public/memory.h
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -322,6 +322,7 @@ typedef struct xen_pod_target xen_pod_ta
 #define XENMEM_paging_op_nominate           0
 #define XENMEM_paging_op_evict              1
 #define XENMEM_paging_op_prep               2
+#define XENMEM_paging_op_ioemu_flush        3
 
 #define XENMEM_access_op                    21
 #define XENMEM_access_op_resume             0
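For context, a condensed sketch of the device-model side of this protocol:
when Xen turns the pending flag into an IOREQ_TYPE_INVALIDATE request, the
qemu ioreq dispatch drops its entire map cache, which is what releases the
guest page mappings the pager is waiting on. This is paraphrased for
illustration, not a verbatim excerpt; xen_invalidate_map_cache() is the
upstream qemu-xen mapcache entry point (qemu-xen-traditional uses
qemu_invalidate_map_cache()), and only the invalidate case is shown.

    /* Simplified ioreq dispatch, invalidate case only (illustrative). */
    static void handle_ioreq_sketch(ioreq_t *req)
    {
        switch ( req->type )
        {
        case IOREQ_TYPE_INVALIDATE:
            /* Unmap every cached guest frame so paged-out gfns can be
             * evicted; qemu re-maps frames lazily on the next access. */
            xen_invalidate_map_cache();
            break;
        /* ... PIO/MMIO/copy cases elided ... */
        default:
            break;
        }
    }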
Tim Deegan
2012-Mar-08 14:29 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
At 17:15 +0100 on 26 Feb (1330276512), Olaf Hering wrote:
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1330272880 -3600
> # Node ID e17afed13a35989bdf422b75d44843bfd2fda0f4
> # Parent 3ace34b89ebe3bdd6afb0bca85b0c1270f99c1bb
> [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
>
> Is the approach taken in this patch ok?

Can you explain a bit more about what you're trying to do?  It looks like
you're replacing a xenstore-based flush operation with one that forces Xen
to send an invalidation after the next guest hypercall completes.  What's
the advantage of doing it that way?  As Andres points out, if the guest
isn't making hypercalls that means you'll never get any flush requests.
And even if not, it seems like an abuse of the system.

Tim
Olaf Hering
2012-Mar-08 14:42 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
On Thu, Mar 08, Tim Deegan wrote:
> At 17:15 +0100 on 26 Feb (1330276512), Olaf Hering wrote:
> > # HG changeset patch
> > # User Olaf Hering <olaf@aepfle.de>
> > # Date 1330272880 -3600
> > # Node ID e17afed13a35989bdf422b75d44843bfd2fda0f4
> > # Parent 3ace34b89ebe3bdd6afb0bca85b0c1270f99c1bb
> > [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
> >
> > Is the approach taken in this patch ok?
>
> Can you explain a bit more about what you're trying to do?

It looked easy to "inject" a request to send a flush to ioemu in this way.
But it turned out that this approach will not work.

So both old and new qemu need a change to accept the flush command in some
way. For old qemu the xenstore approach looks best; for the new qemu I need
to dig into QMP.

Olaf
Tim Deegan
2012-Mar-08 14:51 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
At 15:42 +0100 on 08 Mar (1331221336), Olaf Hering wrote:
> On Thu, Mar 08, Tim Deegan wrote:
>
> > At 17:15 +0100 on 26 Feb (1330276512), Olaf Hering wrote:
> > > # HG changeset patch
> > > # User Olaf Hering <olaf@aepfle.de>
> > > # Date 1330272880 -3600
> > > # Node ID e17afed13a35989bdf422b75d44843bfd2fda0f4
> > > # Parent 3ace34b89ebe3bdd6afb0bca85b0c1270f99c1bb
> > > [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
> > >
> > > Is the approach taken in this patch ok?
> >
> > Can you explain a bit more about what you're trying to do?
>
> It looked easy to "inject" a request to send a flush to ioemu in this
> way. But it turned out that this approach will not work.

I think it could be made to work, but it will need more changes than this.
Since the current mechanism is really only intended for guest hypercalls,
it would need to be updated so that the flag is checked in more places
(say, on every VMENTER, or maybe somewhere a little less hot like on
interrupt injection).

How synchronous do you need the flush to be, btw?  If you need to be sure
it's completed then you probably can't go through the hypervisor.

> So both old and new qemu need a change to accept the flush command in
> some way. For old qemu the xenstore approach looks best; for the new
> qemu I need to dig into QMP.

Righto.  Can you please make sure that whatever you do for new qemu
doesn't require xenpaging and qemu to be in the same VM?

Cheers,

Tim.
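A rough sketch of what Tim suggests, for illustration only: the same
test-and-clear that currently lives in hvm_do_hypercall() could be run from
a hook on the HVM resume path, so a pending flush is delivered even when the
guest makes no hypercalls. The helper name and call site below are
hypothetical, and the ioreq-slot and ordering questions raised in the thread
still apply.

    /* Hypothetical helper, e.g. called from the per-vcpu resume path
     * before re-entering the guest.  Mirrors the existing check in
     * hvm_do_hypercall(); illustrative only. */
    static void hvm_maybe_send_mapcache_invalidate(struct vcpu *v)
    {
        struct hvm_domain *hd = &v->domain->arch.hvm_domain;

        if ( unlikely(hd->qemu_mapcache_invalidate) &&
             test_and_clear_bool(hd->qemu_mapcache_invalidate) )
            send_invalidate_req();  /* queue IOREQ_TYPE_INVALIDATE to qemu */
    }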
Olaf Hering
2012-Mar-08 15:04 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
On Thu, Mar 08, Tim Deegan wrote:
> How synchronous do you need the flush to be, btw?  If you need to be
> sure it's completed then you probably can't go through the hypervisor.

This event is async, just a one-shot signal to kindly ask ioemu to flush
its buffers.

If you see a way to inject this sort of request from dom0, that would be
great.

Olaf
Andres Lagar-Cavilla
2012-Mar-08 15:08 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
> Date: Thu, 8 Mar 2012 14:51:20 +0000
> From: Tim Deegan <tim@xen.org>
> To: Olaf Hering <olaf@aepfle.de>
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH] [RFC]: xenpaging: add command to flush
> 	qemu buffer cache via VMCALL
> Message-ID: <20120308145120.GL64337@ocelot.phlegethon.org>
> Content-Type: text/plain; charset=iso-8859-1
>
> At 15:42 +0100 on 08 Mar (1331221336), Olaf Hering wrote:
>> On Thu, Mar 08, Tim Deegan wrote:
>>
>> > At 17:15 +0100 on 26 Feb (1330276512), Olaf Hering wrote:
>> > > # HG changeset patch
>> > > # User Olaf Hering <olaf@aepfle.de>
>> > > # Date 1330272880 -3600
>> > > # Node ID e17afed13a35989bdf422b75d44843bfd2fda0f4
>> > > # Parent 3ace34b89ebe3bdd6afb0bca85b0c1270f99c1bb
>> > > [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
>> > >
>> > > Is the approach taken in this patch ok?
>> >
>> > Can you explain a bit more about what you're trying to do?
>>
>> It looked easy to "inject" a request to send a flush to ioemu in this
>> way. But it turned out that this approach will not work.
>
> I think it could be made to work, but it will need more changes than
> this.  Since the current mechanism is really only intended for guest
> hypercalls it would need to be updated so that the flag is checked in
> more places (say, on every VMENTER, or maybe somewhere a little less hot
> like on interrupt injection).
>
> How synchronous do you need the flush to be, btw?  If you need to be
> sure it's completed then you probably can't go through the hypervisor.
>
>> So both old and new qemu need a change to accept the flush command in
>> some way. For old qemu the xenstore approach looks best, for the new
>> qemu I need to dig into QMP.
>
> Righto.  Can you please make sure that whatever you do for new qemu
> doesn't require xenpaging and qemu to be in the same VM?

Stub domains: great fun with QMP! Guaranteed or your money back! (insert
smiley here)

This is a good reason to get inter VM TCP/IP within the same host to
finally work, without requiring naming other than a domain id. If feasible
at all.

Andres

> Cheers,
>
> Tim.
Tim Deegan
2012-Mar-08 15:17 UTC
Re: [PATCH] [RFC]: xenpaging: add command to flush qemu buffer cache via VMCALL
At 07:08 -0800 on 08 Mar (1331190509), Andres Lagar-Cavilla wrote:
> > Righto.  Can you please make sure that whatever you do for new qemu
> > doesn't require xenpaging and qemu to be in the same VM?
>
> Stub domains: great fun with QMP! Guaranteed or your money back! (insert
> smiley here)

Cc'ing Stefano, who presumably knows about such things.  Stefano, what is
the Correct Way to communicate with a new-style qemu in another domain?
QMP over Xenstore or over libvchan?

> This is a good reason to get inter VM TCP/IP within the same host to
> finally work, without requiring naming other than a domain id. If feasible
> at all.

Doesn't libvchan solve this problem?  (I presume you want a socket-style
interface, rather than actual TCP/IP.)

Tim.
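For reference, a minimal sketch of what a libvchan-based control channel
between the pager and a device model running in another domain could look
like, pager (client) side only. It assumes the in-tree libxenvchan API of
the time; the xenstore node path and the "flush-cache" message format are
invented for illustration and would need to be agreed with the qemu side.

    /* Pager-side sketch: send a one-shot flush request over libvchan.
     * Node path and message format are illustrative only. */
    #include <string.h>
    #include <libxenvchan.h>

    static int send_flush_request(int dm_domid)
    {
        struct libxenvchan *ctrl;
        char msg[] = "flush-cache";
        int rc = -1;

        /* Connect to a server the device model set up under this node. */
        ctrl = libxenvchan_client_init(NULL, dm_domid, "data/xenpaging/flush");
        if ( !ctrl )
            return -1;

        if ( libxenvchan_write(ctrl, msg, sizeof(msg)) == sizeof(msg) )
            rc = 0;

        libxenvchan_close(ctrl);
        return rc;
    }

Being a plain byte stream with xenstore-based setup, this keeps the pager
and qemu in separate domains, which addresses Tim's stub-domain concern
without needing inter-VM TCP/IP.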