马磊
2012-Jul-31 06:55 UTC
Why does `xl restore` run OK from the shell but hit a segmentation fault under gdb?
Hi all,

Below is the output from running `xl` under gdb:

    Program received signal SIGSEGV, Segmentation fault.
    0x000000000040a8a3 in create_domain (dom_info=0x7fffffffe4f0) at xl_cmdimpl.c:1432
    (gdb) bt
    #0  0x000000000040a8a3 in create_domain (dom_info=0x7fffffffe4f0) at xl_cmdimpl.c:1432
    #1  0x000000000040f938 in main_restore (argc=2, argv=0x7fffffffe6b8) at xl_cmdimpl.c:2932
    #2  0x000000000040508b in main (argc=2, argv=0x7fffffffe6b8) at xl.c:141
    (gdb) show args
    Argument list to give program being debugged when it is started is "-v restore ./xp101.save".

Yet from the shell, `xl -v restore ./xp101.save` runs successfully.
Has anyone else run into the same weird problem?

Thanks in advance.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
马磊
2012-Jul-31 08:16 UTC
Re: Why does `xl restore` run OK from the shell but hit a segmentation fault under gdb?
On Tue, Jul 31, 2012 at 2:55 PM, 马磊 <aware.why@gmail.com> wrote:
> Actually, in the shell I could run `xl -v restore ./xp101.save`
> successfully.

I need to dig into xl with gdb. Could some kind soul help me get past this problem?
Ian Campbell
2012-Jul-31 08:41 UTC
Re: Why does `xl restore` run OK from the shell but hit a segmentation fault under gdb?
On Tue, 2012-07-31 at 09:16 +0100, 马磊 wrote:
> I need to dig into xl by gdb, could anyone kind-hearted help me get
> through this problem...

You need to be more patient -- you left around 80 minutes between your original mail and this ping. Remember that not everyone is awake and online at the same times as you are, nor can they be expected to drop everything to reply to your mail.

I'm afraid that "gdb --args xl -v restore SAVED" works fine for me.

I presume that running under gdb perturbs timings or memory layout etc. to expose a previously latent bug.

gdb has given you the line numbers of where the segmentation fault has occurred -- why don't you investigate further?

Ian.
Ian Campbell
2012-Jul-31 09:12 UTC
Re: Why does `xl restore` run OK from the shell but hit a segmentation fault under gdb?
Please don't drop the list. I've added it back.

On Tue, 2012-07-31 at 09:58 +0100, 马磊 wrote:
> On Tue, Jul 31, 2012 at 4:41 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > gdb has given you the line numbers of where the segmentation fault has
> > occurred -- why don't you investigate further?
>
> I investigated and inspected the local variables. In theory line 1432
> is unlikely to cause a SIGSEGV, because `xmalloc` works and
> config_data, optdata_here and config_len were all correct.
> But after the xmalloc call, the application received SIGSEGV even though
> config_data held a valid value, (void *) 0x622e10.
>
> 1426     if (OPTDATA_LEFT) {
> 1427         fprintf(stderr, " Savefile contains xl domain config\n");
> 1428         WITH_OPTDATA(4, {
> 1429             memcpy(u32buf.b, optdata_here, 4);
> 1430             config_len = u32buf.u32;
> 1431         });
> 1432         WITH_OPTDATA(config_len, {
> 1433             config_data = xmalloc(config_len);
> 1434             memcpy(config_data, optdata_here, config_len);

Is "optdata_here" valid too?

Have you checked the values of all the local variables with "print <var>"?

You can use "disas" to print the assembly code and find the exact instruction which caused the fault. That might give a clue as to which variable was invalid. You might also need to add some prints of variables etc. to check that they are ok.

Which version of Xen is this happening with? These line numbers do not correspond at all to the head version (I have this stuff at line 169x) and I don't see anything recently which added 200+ lines to xl_cmdimpl.c.

> 1435         });
> 1436     }
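For readers following along: the quoted fragment reads a 4-byte length and then copies that many bytes of domain config. The definition of `WITH_OPTDATA` is not shown in this thread, so the following is only a minimal C sketch of that kind of length-prefixed read, assuming the macro's job is a bounds check before each copy. The names `opt_cursor`, `opt_take` and `read_config_blob` are hypothetical and are not xl's; the `fprintf` before the copy is the sort of extra print suggested in the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cursor over the optional-data area of a save file.
 * Illustrative only; these are not the names or types xl uses. */
struct opt_cursor {
    const uint8_t *data;  /* start of the optional data */
    size_t len;           /* total bytes available */
    size_t done;          /* bytes consumed so far (always <= len) */
};

/* Copy `amt` bytes from the cursor into `dst`, refusing to read past the end.
 * Returns 0 on success, -1 if fewer than `amt` bytes remain. */
static int opt_take(struct opt_cursor *c, void *dst, size_t amt)
{
    assert(c != NULL && dst != NULL);
    if (c->len - c->done < amt) {
        fprintf(stderr, "optional data truncated: want %zu, have %zu\n",
                amt, c->len - c->done);
        return -1;
    }
    memcpy(dst, c->data + c->done, amt);
    c->done += amt;
    return 0;
}

/* Read a 4-byte host-endian length followed by that many config bytes.
 * On success returns a malloc'd buffer (caller frees) and sets *out_len;
 * returns NULL on truncation or allocation failure. */
static uint8_t *read_config_blob(struct opt_cursor *c, uint32_t *out_len)
{
    uint32_t config_len;
    uint8_t *config_data;

    if (opt_take(c, &config_len, sizeof(config_len)) != 0)
        return NULL;

    /* Print every value the copy depends on before doing the copy. */
    fprintf(stderr, "config_len=%u done=%zu len=%zu\n",
            (unsigned)config_len, c->done, c->len);

    /* Allocate at least one byte so a zero-length config still yields
     * a non-NULL buffer. */
    config_data = malloc(config_len ? config_len : 1);
    if (config_data == NULL)
        return NULL;

    if (opt_take(c, config_data, config_len) != 0) {
        free(config_data);
        return NULL;
    }
    *out_len = config_len;
    return config_data;
}
```

With this shape, a length field larger than the remaining data makes `read_config_blob` return NULL instead of copying past the end of the buffer, and the printed values can be compared between a run from the shell and a run under gdb.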
马磊
2012-Jul-31 09:40 UTC
Re: Why does `xl restore` run OK from the shell but hit a segmentation fault under gdb?
On Tue, Jul 31, 2012 at 5:12 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> Which version of Xen is this happening with? These line numbers do not
> correspond at all to the head version (I have this stuff at line 169x)
> and I don't see anything recently which added 200+ lines to xl_cmdimpl.c

It's Xen 4.1.2 (hypervisor and tools), the official source distribution from http://xen.org/products/xen_source.html.

This is the state at the time of the fault (the `>` marks the current execution line):

xl_cmdimpl.c (source followed by disassembly window)

      241      rc = 0;
      242      close(fd_lock);
      243      fd_lock = -1;
      244
      245      return rc;
      246  }
      247
      248  static void *xmalloc(size_t sz) {
      249      void *r;
      250      r = malloc(sz);
      251      if (!r) { fprintf(stderr,"xl: Unable to malloc %lu bytes.\n",
      252                        (unsigned long)sz); exit(-ERROR_FAIL); }
      253      return r;
    > 254  }
      255
      256  static void *xrealloc(void *ptr, size_t sz) {

      0x405619 <xmalloc+52>  mov    0x8(%rsp),%rdx
      0x40561e <xmalloc+57>  mov    %rcx,%rsi
      0x405621 <xmalloc+60>  mov    %rax,%rdi
      0x405624 <xmalloc+63>  mov    $0x0,%eax
      0x405629 <xmalloc+68>  callq  0x404a00 <fprintf@plt>
      0x40562e <xmalloc+73>  mov    $0x3,%edi
      0x405633 <xmalloc+78>  callq  0x4040a0 <exit@plt>
      0x405638 <xmalloc+83>  mov    0x18(%rsp),%rax
    > 0x40563d <xmalloc+88>  add    $0x28,%rsp
      0x405641 <xmalloc+92>  retq

After that I entered `s` at the (gdb) prompt and got "Program received signal SIGSEGV, Segmentation fault."
The variable sz was 916. What happened? I have no idea, because I don't understand the disassembly well...