Wangzhenguo
2012-Jul-02 06:40 UTC
The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
Hi, everybody I meet a trouble. in dom0, I call the xc_domain_set_pod_target hypercall in one thread, and meanwhile fork a process in another thread, it will return EFAULT by the function of copy_to_user failed in hypervisor. I see that when forking a process, the page will become COW, copy_to_user will cause a wirte protection page fault and return EFAULT. Is There any ideas for it? I write a simple codes to reproduce it: typedef unsigned long xen_pfn_t; typedef unsigned long uint64_t; typedef unsigned short domid_t; #include <unistd.h> #include <sys/ioctl.h> #include <stdlib.h> #include <assert.h> #include <string.h> #include <sys/stat.h> #include <fcntl.h> #include <stdio.h> #include <sys/types.h> #include <xen/sys/privcmd.h> #include <errno.h> #define BUFFER_SIZE 4096 struct xen_pod_target { /* IN */ uint64_t target_pages; /* OUT */ uint64_t tot_pages; uint64_t pod_cache_pages; uint64_t pod_entries; /* IN */ domid_t domid; }; #define PRIVCMD_INTF "/proc/xen/privcmd" #define XENMEM_get_pod_target 17 #define __HYPERVISOR_memory_op 12 int main (int argc, char *argv[]) { int err; domid_t domid; struct xen_pod_target *pod = NULL; int fd; privcmd_hypercall_t hypercall; fd = open (PRIVCMD_INTF, O_RDWR); assert (0 < fd); pod = (struct xen_pod_target *) memalign(4096, BUFFER_SIZE); memset (pod, 0, BUFFER_SIZE); pod->domid = 0; printf("pid = %d, pod=0x%p\n",getpid(), pod); hypercall.op = __HYPERVISOR_memory_op; hypercall.arg[0] = (unsigned long)XENMEM_get_pod_target; hypercall.arg[1] = (unsigned long)pod; if (0 == fork()) { exit(0); } err = ioctl (fd, IOCTL_PRIVCMD_HYPERCALL, &hypercall); if (0 > err) { printf("Failed get_pod_target dom0, errno = %d\n", errno); } return 0; }
Ian Campbell
2012-Jul-02 09:38 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Mon, 2012-07-02 at 07:40 +0100, Wangzhenguo wrote:> Hi, everybody > I meet a trouble. in dom0, I call the xc_domain_set_pod_target hypercall in one thread, and meanwhile fork a process in another thread, it will return EFAULT by the function of copy_to_user failed in hypervisor. I see that when forking a process, the page will become COW, copy_to_user will cause a wirte protection page fault and return EFAULT. > Is There any ideas for it?You must lock down any memory to be used as a hypercall argument. libxc provides the xc_hypercall_buffer interfaces specifically for this reason. Also in general you should be using libxc instead of open coding your own hypercalls. In this case xc_domain_get_pod_target() is the function to use. Ian.
Wangzhenguo
2012-Jul-03 12:08 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> On Mon, 2012-07-02 at 07:40 +0100, Wangzhenguo wrote: > > Hi, everybody > > I meet a trouble. in dom0, I call the xc_domain_set_pod_target > hypercall in one thread, and meanwhile fork a process in another thread, > it will return EFAULT by the function of copy_to_user failed in > hypervisor. I see that when forking a process, the page will become COW, > copy_to_user will cause a wirte protection page fault and return EFAULT. > > Is There any ideas for it? > > You must lock down any memory to be used as a hypercall argument. > > libxc provides the xc_hypercall_buffer interfaces specifically for this > reason. > > Also in general you should be using libxc instead of open coding your > own hypercalls. In this case xc_domain_get_pod_target() is the function > to use.Thanks for your reply. Yes, I use the libxl coding in my program(The simple code is just for emulating the COW page). The following is the context that causes the EFAULT error. There are many threads in my program, one thread(thread1) calls the libxl_set_memory_target to set pod, another thread(thread2) calls fork at between the t2 and t3 time. After fork, all pages are COW in the program. And I think any get operation hypercalls will fail when another thread calls fork at between the t2 and t3 time. time thread1 thread2 | | t0:call libxl_set_memory_target(libxl) | | | t1: xc_domain_set_pod_target(libxc) | | | t2: do_xen_hypercall(privcmd) | | fork t3: __HYPERVISOR_memory_op | | | t4: return EFAULT |> > Ian. >
Ian Campbell
2012-Jul-24 13:06 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Tue, 2012-07-03 at 13:08 +0100, Wangzhenguo wrote:> > On Mon, 2012-07-02 at 07:40 +0100, Wangzhenguo wrote: > > > Hi, everybody > > > I meet a trouble. in dom0, I call the xc_domain_set_pod_target > > hypercall in one thread, and meanwhile fork a process in another thread, > > it will return EFAULT by the function of copy_to_user failed in > > hypervisor. I see that when forking a process, the page will become COW, > > copy_to_user will cause a wirte protection page fault and return EFAULT. > > > Is There any ideas for it? > > > > You must lock down any memory to be used as a hypercall argument. > > > > libxc provides the xc_hypercall_buffer interfaces specifically for this > > reason. > > > > Also in general you should be using libxc instead of open coding your > > own hypercalls. In this case xc_domain_get_pod_target() is the function > > to use. > > Thanks for your reply. Yes, I use the libxl coding in my program(The > simple code is just for emulating the COW page). The following is the > context that causes the EFAULT error. There are many threads in my > program, one thread(thread1) calls the libxl_set_memory_target to set > pod, another thread(thread2) calls fork at between the t2 and t3 time. > After fork, all pages are COW in the program. And I think any get > operation hypercalls will fail when another thread calls fork at > between the t2 and t3 time. > > time thread1 thread2 > | | > t0:call libxl_set_memory_target(libxl) | > | | > t1: xc_domain_set_pod_target(libxc) | > | | > t2: do_xen_hypercall(privcmd) | > | fork > t3: __HYPERVISOR_memory_op | > | | > t4: return EFAULT > | > >Hrm. yes. This basically harks back to the use of mlock as a surrogate for the requirement to properly lock down memory for use as a hypercall argument. This is flawed in ways other than this (i.e. NUMA memory migration would also break it). The correct answer here is a special device (e.g. /dev/xen/hypercall?) which can be mmap by libxc to provide memory specifically for this purspoe which is fully locked down (e.g. VM_DONTCOPY and whatever else is required). The libxc "osdep" interface (see xenctrlosdep.h) provides a framework for doing this, it just doesn''t actually implement the use of the special driver yet. Is that something you might be interested in coding up? Ian.
Wangzhenguo
2012-Jul-25 11:48 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Tuesday, July 24, 2012 9:06 PM > > Hrm. yes. This basically harks back to the use of mlock as a surrogate > for the requirement to properly lock down memory for use as a hypercall > argument. This is flawed in ways other than this (i.e. NUMA memory > migration would also break it). > > The correct answer here is a special device (e.g. /dev/xen/hypercall?) > which can be mmap by libxc to provide memory specifically for this > purspoe which is fully locked down (e.g. VM_DONTCOPY and whatever else > is required).Thanks for your reply. I see the madvise syscall can make the vma to be VM_DONTCOPY by deliverying MADV_DONTFORK advice. I''ll fixup and test it.> > The libxc "osdep" interface (see xenctrlosdep.h) provides a framework > for doing this, it just doesn''t actually implement the use of the > special driver yet. > > Is that something you might be interested in coding up? > > Ian.
Ian Campbell
2012-Jul-25 13:12 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Wed, 2012-07-25 at 12:48 +0100, Wangzhenguo wrote:> > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > Sent: Tuesday, July 24, 2012 9:06 PM > > > > Hrm. yes. This basically harks back to the use of mlock as a surrogate > > for the requirement to properly lock down memory for use as a hypercall > > argument. This is flawed in ways other than this (i.e. NUMA memory > > migration would also break it). > > > > The correct answer here is a special device (e.g. /dev/xen/hypercall?) > > which can be mmap by libxc to provide memory specifically for this > > purspoe which is fully locked down (e.g. VM_DONTCOPY and whatever else > > is required). > > Thanks for your reply. I see the madvise syscall can make the vma to > be VM_DONTCOPY by deliverying MADV_DONTFORK advice.I didn''t know about this MADV option, it sounds like the right idea to me.> I''ll fixup and test it.> > > > The libxc "osdep" interface (see xenctrlosdep.h) provides a framework > > for doing this, it just doesn''t actually implement the use of the > > special driver yet. > > > > Is that something you might be interested in coding up? > > > > Ian. >
Ian Campbell
2012-Aug-03 10:10 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Wed, 2012-07-25 at 14:12 +0100, Ian Campbell wrote:> On Wed, 2012-07-25 at 12:48 +0100, Wangzhenguo wrote: > > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > > Sent: Tuesday, July 24, 2012 9:06 PM > > > > > > Hrm. yes. This basically harks back to the use of mlock as a surrogate > > > for the requirement to properly lock down memory for use as a hypercall > > > argument. This is flawed in ways other than this (i.e. NUMA memory > > > migration would also break it). > > > > > > The correct answer here is a special device (e.g. /dev/xen/hypercall?) > > > which can be mmap by libxc to provide memory specifically for this > > > purspoe which is fully locked down (e.g. VM_DONTCOPY and whatever else > > > is required). > > > > Thanks for your reply. I see the madvise syscall can make the vma to > > be VM_DONTCOPY by deliverying MADV_DONTFORK advice. > > I didn''t know about this MADV option, it sounds like the right idea to > me. > > > I''ll fixup and test it.BTW, I think this would be a good fix to have for 4.2.0 if you are able to produce a patch.> > > > > > > The libxc "osdep" interface (see xenctrlosdep.h) provides a framework > > > for doing this, it just doesn''t actually implement the use of the > > > special driver yet. > > > > > > Is that something you might be interested in coding up? > > > > > > Ian. > > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Wangzhenguo
2012-Aug-06 13:01 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Friday, August 03, 2012 6:10 PM > > BTW, I think this would be a good fix to have for 4.2.0 if you are able > to produce a patch. >Hi, Ian There is a patch that changes: 1. use madvise(MADV_DONTFORK) decleare that don''t copy the vma when fork after allocating pages, and usr madvise(MADV_DOFORK) restore the flags of vma before freeing the pages. 2. use mmap/nunmap to alloc/free memory instead of malloc/free for passing through libc. The free interface in libc may not really free memory, just returns the control to libc. If the memeory set not copy when call fork(), after forking: a, In child process, you call free() to the memory, then malloc(), the libc maybe return the same memory, if you access the memeory, and it causes segment fault. b, If you not call the free() in child process, it maybe leak the memory which manages the malloc''s memory in libc. mmap/munmap don''t those problems. 3. In the same thread, do not call fork() syscall between xc__hypercall_buffer_alloc_pages and xc__hypercall_buffer_free_pages,otherwise it will cause segment fault when access the hypercall buffer in child process. In normal context, we call alloc hypercall buffer, then call hypercall, and free the hypercall buffer (or free to the cache). No one call fork() between alloc and free hypercall buffers, so, I don''t think it''s a problem. We test the patch and it''s OK on multi-threads and multi-processes context. Thanks Ian and xiaowei for giving good ideas. diff -r 3d17148e465c tools/libxc/xc_hcall_buf.c --- a/tools/libxc/xc_hcall_buf.c Thu Aug 02 11:49:37 2012 +0200 +++ b/tools/libxc/xc_hcall_buf.c Mon Aug 06 19:45:00 2012 +0800 @@ -19,6 +19,7 @@ #include <stdlib.h> #include <string.h> #include <pthread.h> +#include <sys/mman.h> #include "xc_private.h" #include "xg_private.h" @@ -135,6 +136,9 @@ b->hbuf = p; + /*Do not copy the VMA to child process when call fork(), avoid the page being COW when hyper calling*/ + madvise(p, nr_pages * PAGE_SIZE, MADV_DONTFORK); + memset(p, 0, nr_pages * PAGE_SIZE); return b->hbuf; @@ -145,6 +149,8 @@ if ( b->hbuf == NULL ) return; + /*Recover the VMA flags, allow copy the VMA when call fork()*/ + madvise(b->hbuf, nr_pages * PAGE_SIZE, MADV_DOFORK) ; if ( !hypercall_buffer_cache_free(xch, b->hbuf, nr_pages) ) xch->ops->u.privcmd.free_hypercall_buffer(xch, xch->ops_handle, b->hbuf, nr_pages); } diff -r 3d17148e465c tools/libxc/xc_linux_osdep.c --- a/tools/libxc/xc_linux_osdep.c Thu Aug 02 11:49:37 2012 +0200 +++ b/tools/libxc/xc_linux_osdep.c Mon Aug 06 19:45:00 2012 +0800 @@ -93,22 +93,14 @@ size_t size = npages * XC_PAGE_SIZE; void *p; - p = xc_memalign(xch, XC_PAGE_SIZE, size); - if (!p) - return NULL; + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); - if ( mlock(p, size) < 0 ) - { - free(p); - return NULL; - } return p; } static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) { - munlock(ptr, npages * XC_PAGE_SIZE); - free(ptr); + munmap(ptr, npages * XC_PAGE_SIZE); } static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall)
Ian Campbell
2012-Aug-06 13:28 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Mon, 2012-08-06 at 14:01 +0100, Wangzhenguo wrote:> > -----Original Message----- > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > Sent: Friday, August 03, 2012 6:10 PM > > > > BTW, I think this would be a good fix to have for 4.2.0 if you are able > > to produce a patch. > > > Hi, Ian > There is a patch that changes: > 1. use madvise(MADV_DONTFORK) decleare that don''t copy the vma when fork > after allocating pages, and usr madvise(MADV_DOFORK) restore the flags > of vma before freeing the pages. > 2. use mmap/nunmap to alloc/free memory instead of malloc/free for passing > through libc. > The free interface in libc may not really free memory, just returns the > control to libc. If the memeory set not copy when call fork(), after forking: > a, In child process, you call free() to the memory, then malloc(), > the libc maybe return the same memory, if you access the memeory, > and it causes segment fault. > b, If you not call the free() in child process, it maybe leak the memory > which manages the malloc''s memory in libc. > mmap/munmap don''t those problems. > 3. In the same thread, do not call fork() syscall between xc__hypercall_buffer_alloc_pages > and xc__hypercall_buffer_free_pages,otherwise it will cause segment fault > when access the hypercall buffer in child process. > In normal context, we call alloc hypercall buffer, then call hypercall, > and free the hypercall buffer (or free to the cache). No one call fork() > between alloc and free hypercall buffers, so, I don''t think it''s a problemAnother thread in the process might fork though, wasn''t that the main observation you made when you first posted? I think perhaps you mean it is forbidden to fork and then access a hypercall buffer allocated before the fork, which sounds ok, since no thread which allocates a hypercall buffer should fork with it still allocated.> . > > We test the patch and it''s OK on multi-threads and multi-processes context. > > Thanks Ian and xiaowei for giving good ideas.Thanks, this will need a Signed-off-by and a commit message as described in: http://wiki.xen.org/wiki/Submitting_Xen_Patches> diff -r 3d17148e465c tools/libxc/xc_hcall_buf.c > --- a/tools/libxc/xc_hcall_buf.c Thu Aug 02 11:49:37 2012 +0200 > +++ b/tools/libxc/xc_hcall_buf.c Mon Aug 06 19:45:00 2012 +0800 > @@ -19,6 +19,7 @@Please can you add this to your ~/.hgrc: [diff] showfunc = True That will make "hg diff" and similar functions show the name of the changed function here which is very useful for reviewers.> @@ -135,6 +136,9 @@ > > b->hbuf = p; > > + /*Do not copy the VMA to child process when call fork(), avoid the page being COW when hyper calling*/ > + madvise(p, nr_pages * PAGE_SIZE, MADV_DONTFORK);madvise(2) tells me that MADV_{DO,DONT}FORK are Linux specific, so I think this belongs in the Linux specific alloc_hypercall_buffer hook.> memset(p, 0, nr_pages * PAGE_SIZE); > > return b->hbuf; > @@ -145,6 +149,8 @@ > if ( b->hbuf == NULL ) > return; > > + /*Recover the VMA flags, allow copy the VMA when call fork()*/ > + madvise(b->hbuf, nr_pages * PAGE_SIZE, MADV_DOFORK) ;Likewise I think this belongs in the free_hypercall_buffer hook.> if ( !hypercall_buffer_cache_free(xch, b->hbuf, nr_pages) ) > xch->ops->u.privcmd.free_hypercall_buffer(xch, xch->ops_handle, b->hbuf, nr_pages); > } > diff -r 3d17148e465c tools/libxc/xc_linux_osdep.c > --- a/tools/libxc/xc_linux_osdep.c Thu Aug 02 11:49:37 2012 +0200 > +++ b/tools/libxc/xc_linux_osdep.c Mon Aug 06 19:45:00 2012 +0800 > @@ -93,22 +93,14 @@ > size_t size = npages * XC_PAGE_SIZE; > void *p; > > - p = xc_memalign(xch, XC_PAGE_SIZE, size); > - if (!p) > - return NULL; > + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0);I suppose this must necessarily return a page aligned result?> > - if ( mlock(p, size) < 0 ) > - { > - free(p); > - return NULL; > - } > return p; > } > > static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) > { > - munlock(ptr, npages * XC_PAGE_SIZE); > - free(ptr); > + munmap(ptr, npages * XC_PAGE_SIZE); > } > > static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall)
Wangzhenguo
2012-Aug-07 09:53 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
Hi, Ian Thanks your reply. I added two hooks in xc_osdep_ops.u.pricmd for the different OSes after allocating and before freeing the hypercall buffer. One hook makes the hypercall buffer not to COW after being allocated in Linux, and to restore it to normal before being freed.> Another thread in the process might fork though, wasn''t that the main > observation you made when you first posted? > > I think perhaps you mean it is forbidden to fork and then access a > hypercall buffer allocated before the fork, which sounds ok, since no > thread which allocates a hypercall buffer should fork with it still > allocated.Yes.> > That will make "hg diff" and similar functions show the name of the > changed function here which is very useful for reviewers.OK.> the page being COW when hyper calling*/ > > + madvise(p, nr_pages * PAGE_SIZE, MADV_DONTFORK); > > madvise(2) tells me that MADV_{DO,DONT}FORK are Linux specific, so I > think this belongs in the Linux specific alloc_hypercall_buffer hook.I don''t think so. We only need madvise(MADV_DONTFORK) before hypercall, and madvise(MADV_DOFORK) after hypercall. The pages in the hypercall buffer need not be protected. So two extra hooks are added in xc_osdep_ops.u.pricmd. The linux version is implemented.> > + p = mmap(NULL, size, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); > > I suppose this must necessarily return a page aligned result?The address returned by mmap is already page aligned in the linux OS. the patch is below: # HG changeset patch # Parent bd244b9bc84bc74b6c6182c34369a31d1c5c869c libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. In multi-threads and multi-processes environment, e.g. the process has two threads, thread A may call hypercall, thread B may call fork() to create child process. After forking, all pages of the process including hypercall buffers are cow. The hypervisor calls copy_to_user in hypercall in thread A context, will cause a write protection and return EFAULT error. Fix: 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer not to be copied to child process after fork. 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by using MADV_DOFORK of madvise syscall. 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> diff -r bd244b9bc84b tools/libxc/xc_hcall_buf.c --- a/tools/libxc/xc_hcall_buf.c Tue Aug 07 16:44:20 2012 +0800 +++ b/tools/libxc/xc_hcall_buf.c Tue Aug 07 16:46:56 2012 +0800 @@ -135,6 +135,9 @@ void *xc__hypercall_buffer_alloc_pages(x b->hbuf = p; + if (xch->ops->u.privcmd.prepare_hypercall_buffer) + xch->ops->u.privcmd.prepare_hypercall_buffer(xch, xch->ops_handle, p, nr_pages); + memset(p, 0, nr_pages * PAGE_SIZE); return b->hbuf; @@ -145,6 +148,9 @@ void xc__hypercall_buffer_free_pages(xc_ if ( b->hbuf == NULL ) return; + if (xch->ops->u.privcmd.unprepare_hypercall_buffer) + xch->ops->u.privcmd.unprepare_hypercall_buffer(xch, xch->ops_handle, b->hbuf, nr_pages); + if ( !hypercall_buffer_cache_free(xch, b->hbuf, nr_pages) ) xch->ops->u.privcmd.free_hypercall_buffer(xch, xch->ops_handle, b->hbuf, nr_pages); } diff -r bd244b9bc84b tools/libxc/xc_linux_osdep.c --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 16:44:20 2012 +0800 +++ b/tools/libxc/xc_linux_osdep.c Tue Aug 07 16:46:56 2012 +0800 @@ -93,22 +93,26 @@ static void *linux_privcmd_alloc_hyperca size_t size = npages * XC_PAGE_SIZE; void *p; - p = xc_memalign(xch, XC_PAGE_SIZE, size); - if (!p) - return NULL; - - if ( mlock(p, size) < 0 ) - { - free(p); - return NULL; - } + /* Address returned by mmap is page aligned. */ + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); return p; } static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) { - munlock(ptr, npages * XC_PAGE_SIZE); - free(ptr); + munmap(ptr, npages * XC_PAGE_SIZE); +} + +static void linux_privcmd_prepare_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) +{ + /* Do not copy the VMA to child process on fork. Avoid the page being COW on hypercall. */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); +} + +static void linux_privcmd_unprepare_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) +{ + /* Recover the VMA flags. */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); } static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall) @@ -420,6 +424,8 @@ static struct xc_osdep_ops linux_privcmd .u.privcmd = { .alloc_hypercall_buffer = &linux_privcmd_alloc_hypercall_buffer, .free_hypercall_buffer = &linux_privcmd_free_hypercall_buffer, + .prepare_hypercall_buffer = &linux_privcmd_prepare_hypercall_buffer, + .unprepare_hypercall_buffer = &linux_privcmd_unprepare_hypercall_buffer, .hypercall = &linux_privcmd_hypercall, diff -r bd244b9bc84b tools/libxc/xenctrlosdep.h --- a/tools/libxc/xenctrlosdep.h Tue Aug 07 16:44:20 2012 +0800 +++ b/tools/libxc/xenctrlosdep.h Tue Aug 07 16:46:56 2012 +0800 @@ -78,6 +78,9 @@ struct xc_osdep_ops void *(*alloc_hypercall_buffer)(xc_interface *xch, xc_osdep_handle h, int npages); void (*free_hypercall_buffer)(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages); + void (*prepare_hypercall_buffer)(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages); + void (*unprepare_hypercall_buffer)(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages); + int (*hypercall)(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall); void *(*map_foreign_batch)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, int prot,
Ian Campbell
2012-Aug-07 10:07 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Tue, 2012-08-07 at 10:53 +0100, Wangzhenguo wrote:> > the page being COW when hyper calling*/ > > > + madvise(p, nr_pages * PAGE_SIZE, MADV_DONTFORK); > > > > madvise(2) tells me that MADV_{DO,DONT}FORK are Linux specific, so I > > think this belongs in the Linux specific alloc_hypercall_buffer hook. > > I don''t think so. We only need madvise(MADV_DONTFORK) before hypercall, > and madvise(MADV_DOFORK) after hypercall. The pages in the hypercall buffer > need not be protected.The entire point of the hypercall buffer is that it needs to be safe for use as a hypercall argument, therefore it does need to be protected.> So two extra hooks are added in xc_osdep_ops.u.pricmd.I don''t understand why these new hooks are needed, you call the first immediately after (near enough) alloc_hypercall_buffer and the second immediately before free_hypercall_buffer. The semantics of both those existing calls are already that they must provide and release memory suitable for use as a hypercall argument, so I don''t think having a separate prepare call which takes their result and does "really make this memory suitable" makes sense. Ian.
Wangzhenguo
2012-Aug-07 11:05 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Tuesday, August 07, 2012 6:07 PM> The entire point of the hypercall buffer is that it needs to be safe > for use as a hypercall argument, therefore it does need to be protected.We won''t protect pages in hypercall buffer cache. Hypercall buffer cache belongs to xc_interface, parent process calls xc_interface_open to get xc_interface handler, after fork, child process inherits the handler, and also inherits hypercall buffer cache which belongs to it. It will cause segment fault to access pages in the hypercall buffer cache by being delivered to hypercall, because they are not invalidate in the child process. So, We need restore the status of pages before putting they to hypercall buffer cache, and prevent pages from COW after being allocated in hyprecall buffer cache.> > So two extra hooks are added in xc_osdep_ops.u.pricmd. > > I don''t understand why these new hooks are needed, you call the first > immediately after (near enough) alloc_hypercall_buffer and the second > immediately before free_hypercall_buffer. The semantics of both those > existing calls are already that they must provide and release memory > suitable for use as a hypercall argument, so I don''t think having a > separate prepare call which takes their result and does "really make > this memory suitable" makes sense. > > Ian.
Ian Campbell
2012-Aug-07 12:42 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Tue, 2012-08-07 at 12:05 +0100, Wangzhenguo wrote:> > -----Original Message----- > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > Sent: Tuesday, August 07, 2012 6:07 PM > > > The entire point of the hypercall buffer is that it needs to be safe > > for use as a hypercall argument, therefore it does need to be protected. > > We won''t protect pages in hypercall buffer cache. Hypercall buffer > cache belongs to xc_interface, parent process calls xc_interface_open > to get xc_interface handler, after fork, child process inherits the > handler, and also inherits hypercall buffer cache which belongs to it. > It will cause segment fault to access pages in the hypercall buffer > cache by being delivered to hypercall, because they are not > invalidate in the child process. So, We need restore the status of > pages before putting they to hypercall buffer cache, and prevent pages > from COW after being allocated in hyprecall buffer cache.I don''t think we should expect it to be valid to keep an xc interface handle open after a fork. The child should open a fresh handle if it wants to keep interacting with xc. Ian.
Wangzhenguo
2012-Aug-08 03:44 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > I don''t think we should expect it to be valid to keep an xc interface > handle open after a fork. The child should open a fresh handle if it > wants to keep interacting with xc.OK, I see xs, libxl and python open a fresh handle to interact with xc in child process. I modified the patch as follow: # HG changeset patch # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. In multi-threads and multi-processes environment, e.g. the process has two threads, thread A may call hypercall, thread B may call fork() to create child process. After forking, all pages of the process including hypercall buffers are cow. The hypervisor calls copy_to_user in hypercall in thread A context, will cause a write protection and return EFAULT error. Fix: 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer not to be copied to child process after fork. 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by using MADV_DOFORK of madvise syscall. 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. Note: Chlid process do not use xc interface handle which is opend by parent process, it should open a fresh handle if it wants to keep interacting with xc. Otherwise, it may cause segment fault to access hypercall buffer cache in the handle. Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> diff -r a5dfd924fcdb tools/libxc/xc_linux_osdep.c --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 13:52:10 2012 +0800 +++ b/tools/libxc/xc_linux_osdep.c Wed Aug 08 11:33:38 2012 +0800 @@ -93,22 +93,20 @@ static void *linux_privcmd_alloc_hyperca size_t size = npages * XC_PAGE_SIZE; void *p; - p = xc_memalign(xch, XC_PAGE_SIZE, size); - if (!p) - return NULL; + /* Address returned by mmap is page aligned. */ + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); - if ( mlock(p, size) < 0 ) - { - free(p); - return NULL; - } + /* Do not copy the VMA to child process on fork. Avoid the page being COW on hypercall. */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); return p; } static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) { - munlock(ptr, npages * XC_PAGE_SIZE); - free(ptr); + /* Recover the VMA flags. Maybe it''s not necessary */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); + + munmap(ptr, npages * XC_PAGE_SIZE); } static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall)
Ian Campbell
2012-Aug-08 09:00 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
Adding Ian J for his opinion. I think this is a candidate for 4.2.0, although I would like to get rc2 under our belts first. On Wed, 2012-08-08 at 04:44 +0100, Wangzhenguo wrote:> > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > > > I don''t think we should expect it to be valid to keep an xc interface > > handle open after a fork. The child should open a fresh handle if it > > wants to keep interacting with xc. > OK, I see xs, libxl and python open a fresh handle to interact with xc in child process. > I modified the patch as follow: > > # HG changeset patch > # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 > libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. > > In multi-threads and multi-processes environment, e.g. the process has two threads, thread A > may call hypercall, thread B may call fork() to create child process. After forking, all pages > of the process including hypercall buffers are cow. The hypervisor calls copy_to_user in hypercall > in thread A context, will cause a write protection and return EFAULT error. > > Fix: > 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer > not to be copied to child process after fork. > 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by > using MADV_DOFORK of madvise syscall. > 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. > > Note: > Chlid process do not use xc interface handle which is opend by parent process, it should open > a fresh handle if it wants to keep interacting with xc. Otherwise, it may cause segment fault > to access hypercall buffer cache in the handle.It would be good to write this in a comment next to each of the xc_{interface,evtchn,gnttab,gntshr}_open() prototypes in the header (assuming it applies to all of them, since they all make hypercalls I expect it does and in any case it is easy to relax this restriction in the future if not) Otherwise: Acked-by: Ian Campbell <ian.campbell@citrix.com>> > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> > Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> > > diff -r a5dfd924fcdb tools/libxc/xc_linux_osdep.c > --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 13:52:10 2012 +0800 > +++ b/tools/libxc/xc_linux_osdep.c Wed Aug 08 11:33:38 2012 +0800 > @@ -93,22 +93,20 @@ static void *linux_privcmd_alloc_hyperca > size_t size = npages * XC_PAGE_SIZE; > void *p; > > - p = xc_memalign(xch, XC_PAGE_SIZE, size); > - if (!p) > - return NULL; > + /* Address returned by mmap is page aligned. */ > + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); > > - if ( mlock(p, size) < 0 ) > - { > - free(p); > - return NULL; > - } > + /* Do not copy the VMA to child process on fork. Avoid the page being COW on hypercall. */ > + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); > return p; > } > > static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) > { > - munlock(ptr, npages * XC_PAGE_SIZE); > - free(ptr); > + /* Recover the VMA flags. Maybe it''s not necessary */ > + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); > + > + munmap(ptr, npages * XC_PAGE_SIZE); > } > > static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall) >
Wangzhenguo
2012-Aug-08 11:36 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > It would be good to write this in a comment next to each of the > xc_{interface,evtchn,gnttab,gntshr}_open() prototypes in the header > (assuming it applies to all of them, since they all make hypercalls I > expect it does and in any case it is easy to relax this restriction in > the future if not) > > Otherwise: > Acked-by: Ian Campbell <ian.campbell@citrix.com>I added a comment and your acked-by in patch The patch is # HG changeset patch # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. In multi-threads and multi-processes environment, e.g. the process has two threads, thread A may call hypercall, thread B may call fork() to create child process. After forking, all pages of the process including hypercall buffers are cow. The hypervisor calls copy_to_user in hypercall in thread A context, will cause a write protection and return EFAULT error. Fix: 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer not to be copied to child process after fork. 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by using MADV_DOFORK of madvise syscall. 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. Note: Chlid process do not use xc interface handle which is opend by parent process, it should open a fresh handle if it wants to keep interacting with xc. Otherwise, it may cause segment fault to access hypercall buffer cache in the handle. Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> diff -r a5dfd924fcdb tools/libxc/xc_linux_osdep.c --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 13:52:10 2012 +0800 +++ b/tools/libxc/xc_linux_osdep.c Wed Aug 08 19:31:53 2012 +0800 @@ -93,22 +93,20 @@ static void *linux_privcmd_alloc_hyperca size_t size = npages * XC_PAGE_SIZE; void *p; - p = xc_memalign(xch, XC_PAGE_SIZE, size); - if (!p) - return NULL; + /* Address returned by mmap is page aligned. */ + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); - if ( mlock(p, size) < 0 ) - { - free(p); - return NULL; - } + /* Do not copy the VMA to child process on fork. Avoid the page being COW on hypercall. */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); return p; } static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) { - munlock(ptr, npages * XC_PAGE_SIZE); - free(ptr); + /* Recover the VMA flags. Maybe it''s not necessary */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); + + munmap(ptr, npages * XC_PAGE_SIZE); } static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall) diff -r a5dfd924fcdb tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h Tue Aug 07 13:52:10 2012 +0800 +++ b/tools/libxc/xenctrl.h Wed Aug 08 19:31:53 2012 +0800 @@ -131,8 +131,11 @@ typedef enum xc_error_code xc_error_code /** * This function opens a handle to the hypervisor interface. This function can - * be called multiple times within a single process. Multiple processes can - * have an open hypervisor interface at the same time. + * be called multiple times within a single process. Multiple processes can not + * have an open hypervisor interface at the same time. Chlid process do not + * use xc interface handle which is opend by parent process, it should open + * a fresh handle if it wants to keep interacting with xc. Otherwise, it may + * cause segment fault to access hypercall buffer cache in the handle. * * Each call to this function should have a corresponding call to * xc_interface_close(). @@ -908,6 +911,11 @@ int xc_evtchn_status(xc_interface *xch, * Return a handle to the event channel driver, or -1 on failure, in which case * errno will be set appropriately. * + * Note: + * Chlid process do not use xc evtchn handle which is opend by parent process, + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, + * it may cause segment fault to access hypercall buffer cache in the handle. + * * Before Xen pre-4.1 this function would sometimes report errors with perror. */ xc_evtchn *xc_evtchn_open(xentoollog_logger *logger, @@ -1339,9 +1347,12 @@ int xc_domain_subscribe_for_suspend( /* * These functions sometimes log messages as above, but not always. - */ - -/* + * + * Note: + * Chlid process do not use xc gnttab handle which is opend by parent process, + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, + * it may cause segment fault to access hypercall buffer cache in the handle. + * * Return an fd onto the grant table driver. Logs errors. */ xc_gnttab *xc_gnttab_open(xentoollog_logger *logger, @@ -1458,6 +1469,12 @@ grant_entry_v2_t *xc_gnttab_map_table_v2 /* * Return an fd onto the grant sharing driver. Logs errors. + * + * Note: + * Chlid process do not use xc gntshr handle which is opend by parent process, + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, + * it may cause segment fault to access hypercall buffer cache in the handle. + * */ xc_gntshr *xc_gntshr_open(xentoollog_logger *logger, unsigned open_flags);
Ian Campbell
2012-Aug-08 12:06 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Wed, 2012-08-08 at 12:36 +0100, Wangzhenguo wrote:> > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > > > It would be good to write this in a comment next to each of the > > xc_{interface,evtchn,gnttab,gntshr}_open() prototypes in the header > > (assuming it applies to all of them, since they all make hypercalls I > > expect it does and in any case it is easy to relax this restriction in > > the future if not) > > > > Otherwise: > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > I added a comment and your acked-by in patch > > The patch is > > # HG changeset patch > # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 > libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. > > In multi-threads and multi-processes environment, e.g. the process has two threads, thread A > may call hypercall, thread B may call fork() to create child process. After forking, all pages > of the process including hypercall buffers are cow. The hypervisor calls copy_to_user in hypercall > in thread A context, will cause a write protection and return EFAULT error. > > Fix: > 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer > not to be copied to child process after fork. > 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by > using MADV_DOFORK of madvise syscall. > 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. > > Note: > Chlid process do not use xc interface handle which is opend by parent process, it should open"Child" and "opened" (you made these typos pretty consistently throughout ;-)) Please can you try and keep the commit message to 75-80 characters wide.> a fresh handle if it wants to keep interacting with xc. Otherwise, it may cause segment fault > to access hypercall buffer cache in the handle. > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> > Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > diff -r a5dfd924fcdb tools/libxc/xc_linux_osdep.c > --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 13:52:10 2012 +0800 > +++ b/tools/libxc/xc_linux_osdep.c Wed Aug 08 19:31:53 2012 +0800 > @@ -93,22 +93,20 @@ static void *linux_privcmd_alloc_hyperca > size_t size = npages * XC_PAGE_SIZE; > void *p; > > - p = xc_memalign(xch, XC_PAGE_SIZE, size); > - if (!p) > - return NULL; > + /* Address returned by mmap is page aligned. */ > + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); > > - if ( mlock(p, size) < 0 ) > - { > - free(p); > - return NULL; > - } > + /* Do not copy the VMA to child process on fork. Avoid the page being COW on hypercall. */ > + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); > return p; > } > > static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) > { > - munlock(ptr, npages * XC_PAGE_SIZE); > - free(ptr); > + /* Recover the VMA flags. Maybe it''s not necessary */ > + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); > + > + munmap(ptr, npages * XC_PAGE_SIZE); > } > > static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall) > diff -r a5dfd924fcdb tools/libxc/xenctrl.h > --- a/tools/libxc/xenctrl.h Tue Aug 07 13:52:10 2012 +0800 > +++ b/tools/libxc/xenctrl.h Wed Aug 08 19:31:53 2012 +0800 > @@ -131,8 +131,11 @@ typedef enum xc_error_code xc_error_code > > /** > * This function opens a handle to the hypervisor interface. This function can > - * be called multiple times within a single process. Multiple processes can > - * have an open hypervisor interface at the same time. > + * be called multiple times within a single process. Multiple processes can not > + * have an open hypervisor interface at the same time. Chlid process do not > + * use xc interface handle which is opend by parent process, it should openThis would be more naturally written as "Child processes must not use..."> + * a fresh handle if it wants to keep interacting with xc. Otherwise, it may > + * cause segment fault to access hypercall buffer cache in the handle. > * > * Each call to this function should have a corresponding call to > * xc_interface_close(). > @@ -908,6 +911,11 @@ int xc_evtchn_status(xc_interface *xch, > * Return a handle to the event channel driver, or -1 on failure, in which case > * errno will be set appropriately. > * > + * Note: > + * Chlid process do not use xc evtchn handle which is opend by parent process, > + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, > + * it may cause segment fault to access hypercall buffer cache in the handle. > + * > * Before Xen pre-4.1 this function would sometimes report errors with perror. > */ > xc_evtchn *xc_evtchn_open(xentoollog_logger *logger, > @@ -1339,9 +1347,12 @@ int xc_domain_subscribe_for_suspend( > > /* > * These functions sometimes log messages as above, but not always. > - */ > - > -/* > + * > + * Note: > + * Chlid process do not use xc gnttab handle which is opend by parent process, > + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, > + * it may cause segment fault to access hypercall buffer cache in the handle. > + * > * Return an fd onto the grant table driver. Logs errors. > */ > xc_gnttab *xc_gnttab_open(xentoollog_logger *logger, > @@ -1458,6 +1469,12 @@ grant_entry_v2_t *xc_gnttab_map_table_v2 > > /* > * Return an fd onto the grant sharing driver. Logs errors. > + * > + * Note: > + * Chlid process do not use xc gntshr handle which is opend by parent process, > + * it should open a fresh handle if it wants to keep interacting with xc. Otherwise, > + * it may cause segment fault to access hypercall buffer cache in the handle. > + * > */ > xc_gntshr *xc_gntshr_open(xentoollog_logger *logger, > unsigned open_flags);
Wangzhenguo
2012-Aug-09 06:56 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Wednesday, August 08, 2012 8:06 PM > To: Wangzhenguo > Cc: Yangxiaowei; Yechuan; Hongtao; xen-devel@lists.xen.org; Ian Jackson > Subject: Re: [Xen-devel] The hypercall will fail and return EFAULT when > the page becomes COW by forking process in linux > > > Note: > > Chlid process do not use xc interface handle which is opend by parent > process, it should open > > "Child" and "opened" (you made these typos pretty consistently > throughout ;-)) > > Please can you try and keep the commit message to 75-80 characters wide.Hi, Ian, thanks your reply. There is a new patch as follow # HG changeset patch # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the hypercall buffer becoming COW on hypercall. In multi-threads and multi-processes environment, e.g. the process has two threads, thread A may call hypercall, thread B may call fork() to create child process. After forking, all pages of the process including hypercall buffers are cow. It will cause a write protection and return EFAULT error if hypervisor calls copy_to_user in hypercall in thread A context, Fix: 1. Before hypercall: use MADV_DONTFORK of madvise syscall to make the hypercall buffer not to be copied to child process after fork. 2. After hypercall: undo the effect of MADV_DONTFORK for the hypercall buffer by using MADV_DOFORK of madvise syscall. 3. Use mmap/nunmap for memory alloc/free instead of malloc/free to bypass libc. Note: Child processes must not use the opened xc_{interface,evtchn,gnttab,gntshr} handle that inherits from parents. They should reopen the handle if they want to interact with xc. Otherwise, it may cause segment fault to access hypercall buffer caches of the handle. Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> diff -r a5dfd924fcdb tools/libxc/xc_linux_osdep.c --- a/tools/libxc/xc_linux_osdep.c Tue Aug 07 13:52:10 2012 +0800 +++ b/tools/libxc/xc_linux_osdep.c Thu Aug 09 14:42:29 2012 +0800 @@ -93,22 +93,21 @@ static void *linux_privcmd_alloc_hyperca size_t size = npages * XC_PAGE_SIZE; void *p; - p = xc_memalign(xch, XC_PAGE_SIZE, size); - if (!p) - return NULL; + /* Address returned by mmap is page aligned. */ + p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, -1, 0); - if ( mlock(p, size) < 0 ) - { - free(p); - return NULL; - } + /* Do not copy the VMA to child process on fork. Avoid the page being COW + on hypercall. */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DONTFORK); return p; } static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, xc_osdep_handle h, void *ptr, int npages) { - munlock(ptr, npages * XC_PAGE_SIZE); - free(ptr); + /* Recover the VMA flags. Maybe it''s not necessary */ + madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK); + + munmap(ptr, npages * XC_PAGE_SIZE); } static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, privcmd_hypercall_t *hypercall) diff -r a5dfd924fcdb tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h Tue Aug 07 13:52:10 2012 +0800 +++ b/tools/libxc/xenctrl.h Thu Aug 09 14:42:29 2012 +0800 @@ -134,6 +134,12 @@ typedef enum xc_error_code xc_error_code * be called multiple times within a single process. Multiple processes can * have an open hypervisor interface at the same time. * + * Note: + * Child processes must not use the opened xc interface handle that inherits + * from parents. They should reopen the handle if they want to interact with + * xc. Otherwise, it may cause segment fault to access hypercall buffer caches + * of the handle. + * * Each call to this function should have a corresponding call to * xc_interface_close(). * @@ -908,6 +914,12 @@ int xc_evtchn_status(xc_interface *xch, * Return a handle to the event channel driver, or -1 on failure, in which case * errno will be set appropriately. * + * Note: + * Child processes must not use the opened xc evtchn handle that inherits from + * parents. They should reopen the handle if they want to interact with xc. + * Otherwise, it may cause segment fault to access hypercall buffer caches of + * the handle. + * * Before Xen pre-4.1 this function would sometimes report errors with perror. */ xc_evtchn *xc_evtchn_open(xentoollog_logger *logger, @@ -1339,9 +1351,13 @@ int xc_domain_subscribe_for_suspend( /* * These functions sometimes log messages as above, but not always. - */ - -/* + * + * Note: + * Child processes must not use the opened xc gnttab handle that inherits from + * parents. They should reopen the handle if they want to interact with xc. + * Otherwise, it may cause segment fault to access hypercall buffer caches of + * the handle. + * * Return an fd onto the grant table driver. Logs errors. */ xc_gnttab *xc_gnttab_open(xentoollog_logger *logger, @@ -1458,6 +1474,13 @@ grant_entry_v2_t *xc_gnttab_map_table_v2 /* * Return an fd onto the grant sharing driver. Logs errors. + * + * Note: + * Child processes must not use the opened xc gntshr handle that inherits from + * parents. They should reopen the handle if they want to interact with xc. + * Otherwise, it may cause segment fault to access hypercall buffer caches of + * the handle. + * */ xc_gntshr *xc_gntshr_open(xentoollog_logger *logger, unsigned open_flags);
Ian Campbell
2012-Aug-17 13:47 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
On Thu, 2012-08-09 at 07:56 +0100, Wangzhenguo wrote:> # HG changeset patch > # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 > libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to avoid the > hypercall buffer becoming COW on hypercall.When I came to commit this I got: xc_linux_osdep.c: In function ‘linux_privcmd_alloc_hypercall_buffer’: xc_linux_osdep.c:101: error: ‘ptr’ undeclared (first use in this function) xc_linux_osdep.c:101: error: (Each undeclared identifier is reported only once xc_linux_osdep.c:101: error: for each function it appears in.) I've fixed this up for you this time but please be more careful in future and compile/test your patches. I also tweaked the wording of your comments a little bit, I hope that's ok. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Wangzhenguo
2012-Aug-20 01:30 UTC
Re: The hypercall will fail and return EFAULT when the page becomes COW by forking process in linux
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Friday, August 17, 2012 9:48 PM > To: Wangzhenguo > Cc: Yangxiaowei; Yechuan; Hongtao; xen-devel@lists.xen.org; Ian Jackson > Subject: Re: [Xen-devel] The hypercall will fail and return EFAULT when > the page becomes COW by forking process in linux > > On Thu, 2012-08-09 at 07:56 +0100, Wangzhenguo wrote: > > > # HG changeset patch > > # Parent a5dfd924fcdb173a154dad9f37073c1de1302065 > > libxc: Add VM_DONTCOPY flag of the VMA of the hypercall buffer, to > avoid the > > hypercall buffer becoming COW on hypercall. > > When I came to commit this I got: > xc_linux_osdep.c: In function ‘linux_privcmd_alloc_hypercall_buffer’: > xc_linux_osdep.c:101: error: ‘ptr’ undeclared (first use in this > function) > xc_linux_osdep.c:101: error: (Each undeclared identifier is reported > only once > xc_linux_osdep.c:101: error: for each function it appears in.) > > I've fixed this up for you this time but please be more careful in > future and compile/test your patches.Thanks Ian. I'm sorry, I promise to compile/test my patches.> > I also tweaked the wording of your comments a little bit, I hope that's > ok. > > Ian._______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel