Chunyan Liu
2013-Sep-03 07:01 UTC
question about SIGSEGV in datacopier_readable in libxl_aoutil.c
Hi, List, I''m trying to add migration APIs to libvirt libxl driver. In testing HVM migration, on source side, when executing libxl_domain_suspend, often meet SIGSEGV in libxl_aoutil.c: datacopier_readable, the malloc() function place: if (!buf || buf->used >= sizeof(buf->buf)) { buf = malloc(sizeof(*buf)); I doubt the heap is corrupted someway but couldn''t confirm the root cause. And I tried valgrind to find some clue, following is the info right before the SIGSEGV. #valgrind --leak-check=full /usr/sbin/libvirtd -l -d [snip] ==7510== Syscall param read(buf) points to unaddressable byte(s) ==7510== at 0x8ECC76D: ??? (syscall-template.S:82) ==7510== by 0x14AB3070: datacopier_readable (unistd.h:45) ==7510== by 0x14AB833C: afterpoll_internal (libxl_event.c:995) ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440) ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685) ==7510== by 0x14A9ABF7: libxl_domain_suspend (libxl.c:785) ==7510== by 0x148404B3: libxlDomainMigratePerform3 (libxl_driver.c:5100) ==7510== by 0x5390CFA: virDomainMigratePerform3 (libvirt.c:7050) ==7510== by 0x12C262: remoteDispatchDomainMigratePerform3Helper (remote.c:3507) ==7510== by 0x53EACBE: virNetServerProgramDispatch (virnetserverprogram.c:435) ==7510== by 0x53EBCBD: virNetServerProcessMsg (virnetserver.c:165) ==7510== by 0x53EC912: virNetServerHandleJob (virnetserver.c:186) ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc''d ==7510== at 0x4C26FFB: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83) ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92) ==7510== by 0x14AA6CDC: libxl__domain_save_device_model (libxl_dom.c:1447) ==7510== by 0x14AA8097: libxl__xc_domain_save_done (libxl_dom.c:1382) ==7510== by 0x14AB4268: helper_done (libxl_save_callout.c:332) ==7510== by 0x14AB4CA2: helper_exited (libxl_save_callout.c:317) ==7510== by 0x14ABB274: childproc_reaped (libxl_fork.c:264) ==7510== by 0x14ABB97A: libxl__fork_selfpipe_woken (libxl_fork.c:300) ==7510== by 0x14AB83A0: afterpoll_internal (libxl_event.c:1008) ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440) ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685) ==7510=--7510-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --7510-- si_code=80; Faulting address: 0x0; sp: 0x406ad5da0 I couldn''t find a clear problem in the code, but after trying to change the code a little, it turned to be working. Following is the change. --- a/tools/libxl/libxl_aoutils.c +++ b/tools/libxl/libxl_aoutils.c @@ -89,7 +89,8 @@ void libxl__datacopier_prefixdata(libxl_ assert(len < dc->maxsz - dc->used); - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); +// buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); + buf = libxl__zalloc(NOGC, sizeof(libxl__datacopier_buf)); buf->used = len; memcpy(buf->buf, data, len); @@ -141,10 +142,11 @@ static void datacopier_readable(libxl__e libxl__datacopier_buf *buf LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs); if (!buf || buf->used >= sizeof(buf->buf)) { - buf = malloc(sizeof(*buf)); - if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf)); - buf->used = 0; - LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry); + libxl__datacopier_buf *newbuf malloc(sizeof(libxl__datacopier_buf)); + if (!newbuf) libxl__alloc_failed(CTX, __func__, 1, sizeof(libxl__datacopier_buf)); + newbuf->used = 0; + LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, newbuf, entry); + buf = newbuf; } int r = read(ev->fd, buf->buf + buf->used, Could anybody familiar with this part of code take a look at it? Thanks, Chunyan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Sep-03 07:56 UTC
Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c
On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote:> Hi, List,Adding Ian J, who knows this bit of the code. Which version of Xen (and therefore libxl) are you using?> I''m trying to add migration APIs to libvirt libxl driver. In testing > HVM migration, on source side, when executing libxl_domain_suspend, > often meet SIGSEGV in libxl_aoutil.c: datacopier_readable, the > malloc() function place: > if (!buf || buf->used >= sizeof(buf->buf)) { > buf = malloc(sizeof(*buf)); > > I doubt the heap is corrupted someway but couldn''t confirm the root > cause. And I tried valgrind to find some clue, following is the info > right before the SIGSEGV. > #valgrind --leak-check=full /usr/sbin/libvirtd -l -dDid you use a Xen aware version of valgrind? http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/> [snip] > > ==7510== Syscall param read(buf) points to unaddressable byte(s) > ==7510== at 0x8ECC76D: ??? (syscall-template.S:82) > ==7510== by 0x14AB3070: datacopier_readable (unistd.h:45) > ==7510== by 0x14AB833C: afterpoll_internal (libxl_event.c:995) > ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440) > ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685) > ==7510== by 0x14A9ABF7: libxl_domain_suspend (libxl.c:785) > ==7510== by 0x148404B3: libxlDomainMigratePerform3 (libxl_driver.c:5100) > ==7510== by 0x5390CFA: virDomainMigratePerform3 (libvirt.c:7050) > ==7510== by 0x12C262: remoteDispatchDomainMigratePerform3Helper (remote.c:3507) > ==7510== by 0x53EACBE: virNetServerProgramDispatch (virnetserverprogram.c:435) > ==7510== by 0x53EBCBD: virNetServerProcessMsg (virnetserver.c:165) > ==7510== by 0x53EC912: virNetServerHandleJob (virnetserver.c:186) > ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc''d > ==7510== at 0x4C26FFB: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83) > ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92) > ==7510== by 0x14AA6CDC: libxl__domain_save_device_model (libxl_dom.c:1447) > ==7510== by 0x14AA8097: libxl__xc_domain_save_done (libxl_dom.c:1382) > ==7510== by 0x14AB4268: helper_done (libxl_save_callout.c:332) > ==7510== by 0x14AB4CA2: helper_exited (libxl_save_callout.c:317) > ==7510== by 0x14ABB274: childproc_reaped (libxl_fork.c:264) > ==7510== by 0x14ABB97A: libxl__fork_selfpipe_woken (libxl_fork.c:300) > ==7510== by 0x14AB83A0: afterpoll_internal (libxl_event.c:1008) > ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440) > ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685) > ==7510=> --7510-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting > --7510-- si_code=80; Faulting address: 0x0; sp: 0x406ad5da0 > > I couldn''t find a clear problem in the code, but after trying to change the code a little, it turned to be working. > Following is the change. > > --- a/tools/libxl/libxl_aoutils.c > +++ b/tools/libxl/libxl_aoutils.c > @@ -89,7 +89,8 @@ void libxl__datacopier_prefixdata(libxl_ > > assert(len < dc->maxsz - dc->used); > > - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); > +// buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); > + buf = libxl__zalloc(NOGC, sizeof(libxl__datacopier_buf)); > buf->used = len; > memcpy(buf->buf, data, len); > > @@ -141,10 +142,11 @@ static void datacopier_readable(libxl__e > libxl__datacopier_buf *buf > LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs); > if (!buf || buf->used >= sizeof(buf->buf)) { > - buf = malloc(sizeof(*buf)); > - if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf)); > - buf->used = 0; > - LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry); > + libxl__datacopier_buf *newbuf = malloc(sizeof(libxl__datacopier_buf)); > + if (!newbuf) libxl__alloc_failed(CTX, __func__, 1, sizeof(libxl__datacopier_buf)); > + newbuf->used = 0; > + LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, newbuf, entry); > + buf = newbuf; > } > int r = read(ev->fd, > buf->buf + buf->used, > > > Could anybody familiar with this part of code take a look at it? > > > Thanks, > Chunyan > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Andrew Cooper
2013-Sep-03 09:23 UTC
Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c
On 03/09/13 08:56, Ian Campbell wrote:> On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote: >> Hi, List, > Adding Ian J, who knows this bit of the code. > > Which version of Xen (and therefore libxl) are you using? > >> I''m trying to add migration APIs to libvirt libxl driver. In testing >> HVM migration, on source side, when executing libxl_domain_suspend, >> often meet SIGSEGV in libxl_aoutil.c: datacopier_readable, the >> malloc() function place: >> if (!buf || buf->used >= sizeof(buf->buf)) { >> buf = malloc(sizeof(*buf)); >> >> I doubt the heap is corrupted someway but couldn''t confirm the root >> cause. And I tried valgrind to find some clue, following is the info >> right before the SIGSEGV. >> #valgrind --leak-check=full /usr/sbin/libvirtd -l -d > Did you use a Xen aware version of valgrind? > http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/In which case I really should post my valgrind patches for full migration support. I will try to get around to posting the series today. ~Andrew
Ian Jackson
2013-Sep-03 14:31 UTC
Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages]
Ian Campbell writes ("Re: [Xen-devel] question about SIGSEGV in datacopier_readable in libxl_aoutil.c"):> On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote: > > if (!buf || buf->used >= sizeof(buf->buf)) { > > buf = malloc(sizeof(*buf));...> > ==7510== Syscall param read(buf) points to unaddressable byte(s)...> > ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc''d > > ==7510== at 0x4C26FFB: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > > ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83) > > ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92)I think this is my fault. Please try this patch. Thanks, Ian. commit 25cd65c97b733d5892b62c3ffae0887f426398ec Author: Ian Jackson <ian.jackson@eu.citrix.com> Date: Tue Sep 3 13:41:46 2013 +0100 libxl: Do not generate short block in libxl__datacopier_prefixdata libxl__datacopier_prefixdata would prepend a deliberately short block (not just a half-full one, but one with a short buffer) to the dc->bufs queue. However, this is wrong because datacopier_readable will find it and try to continue to fill it up. Instead, allocate a full-sized buffer. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c index 983a60a..b4eb6e5 100644 --- a/tools/libxl/libxl_aoutils.c +++ b/tools/libxl/libxl_aoutils.c @@ -89,7 +89,7 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc, assert(len < dc->maxsz - dc->used); - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); + buf = libxl__zalloc(NOGC, sizeof(*buf)); buf->used = len; memcpy(buf->buf, data, len);
Chunyan Liu
2013-Sep-04 08:15 UTC
Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages]
2013/9/3 Ian Jackson <Ian.Jackson@eu.citrix.com>> Ian Campbell writes ("Re: [Xen-devel] question about SIGSEGV in > datacopier_readable in libxl_aoutil.c"): > > On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote: > > > if (!buf || buf->used >= sizeof(buf->buf)) { > > > buf = malloc(sizeof(*buf)); > ... > > > ==7510== Syscall param read(buf) points to unaddressable byte(s) > ... > > > ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 > alloc''d > > > ==7510== at 0x4C26FFB: calloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > > > ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83) > > > ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata > (libxl_aoutils.c:92) > > I think this is my fault. Please try this patch. > > Thanks, > Ian. > > commit 25cd65c97b733d5892b62c3ffae0887f426398ec > Author: Ian Jackson <ian.jackson@eu.citrix.com> > Date: Tue Sep 3 13:41:46 2013 +0100 > > libxl: Do not generate short block in libxl__datacopier_prefixdata > > libxl__datacopier_prefixdata would prepend a deliberately short block > (not just a half-full one, but one with a short buffer) to the > dc->bufs queue. However, this is wrong because datacopier_readable > will find it and try to continue to fill it up. > > Instead, allocate a full-sized buffer. > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > > diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c > index 983a60a..b4eb6e5 100644 > --- a/tools/libxl/libxl_aoutils.c > +++ b/tools/libxl/libxl_aoutils.c > @@ -89,7 +89,7 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, > libxl__datacopier_state *dc, > > assert(len < dc->maxsz - dc->used); > > - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len); > + buf = libxl__zalloc(NOGC, sizeof(*buf)); > buf->used = len; > memcpy(buf->buf, data, len); > >Tried the patch. It worked. Thanks.> _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel