Mike C.
2013-Nov-04 22:13 UTC
Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 31.10.13 04:34, Miguel Clara wrote:
> I was trying to get a core-dump for a domU with xl and got this error:
>
> # xl dump-core 20 test.core
> Memory fault
>
> GDB shows this:
>
> a# gdb xl xl.core
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64--netbsd".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/xl...done.
> [New process 1]
> Core was generated by `xl'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> dump_rtn=0x7f7ff700632c<local_file_dump>)
> at xc_core.c:860
> 860 xc_core.c: No such file or directory.
> in xc_core.c
>
>
> (gdb) backtrace
> #0 0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> dump_rtn=0x7f7ff700632c<local_file_dump>)
> at xc_core.c:860
> #1 0x00007f7ff7007fda in xc_domain_dumpcore (xch=0x7f7ff7b0d800,
> domid=20, corename=0x7f7ffffffe78 "test.core") at xc_core.c:983
> #2 0x00007f7ff74117b3 in libxl_domain_core_dump (ctx=0x7f7ff7b03200,
> domid=20, filename=0x7f7ffffffe78 "test.core", ao_how=<optimized out>)
> at libxl.c:808
> #3 0x000000000040f748 in core_dump_domain (filename=0x7f7ffffffe78
> "test.core", domain_spec=<optimized out>) at xl_cmdimpl.c:3301
> #4 main_dump_core (argc=<optimized out>, argv=0x7f7fffffdca0) at
> xl_cmdimpl.c:3642
> #5 0x0000000000407055 in main (argc=3, argv=0x7f7fffffdca0) at xl.c:267

I think xen-devel is the right list for this.
It's ok to cross-post to keep NetBSD people involved for answering NetBSD specific questions from the Xen/Citrix people that would not be answered otherwise.

Christoph
Ian Campbell
2013-Nov-07 10:29 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
> On 31.10.13 04:34, Miguel Clara wrote:
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> > at xc_core.c:860

We need to know your version of Xen (ideally the changeset id) to make sense of these line numbers. Line 860 of this file doesn't look plausible for unstable or 4.3.0. Could be 4.2 I guess?
Miguel C.
2013-Nov-07 21:04 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
Yes, it's 4.2 from pkgsrc.

How can I get the changeset id?

Ian Campbell <Ian.Campbell@citrix.com> wrote:
> We need to know your version of Xen (ideally the changeset id) to make
> sense of these line numbers. Line 860 of this file doesn't look
> plausible for unstable or 4.3.0. Could be 4.2 I guess?
Ian Campbell
2013-Nov-08 10:29 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
> yes its 4.2 from pkgsrc.

Thanks, that might be enough.

> how can i get the changeset id?

That'd be one for the port-xen folks I think. It might be printed in the xen dmesg, but that depends on how it was built and 4.2 may be too old to have such functionality.

> >> > #0 0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> >> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> >> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> >> > at xc_core.c:860

In 4.2.0 this corresponds to
    memcpy(dump_mem, vaddr, PAGE_SIZE);
which is a plausible source of a segfault.

xc_core.c has only changed in immaterial ways (although ways which caused all the line numbers to shift) since 4.2.0 AFAICT, so it is likely that this bug is still present.

Can you tell via gdb what the faulting address was and whether it corresponds to dump_mem or vaddr? gdb's "info locals" might give you at least some of that. Also you can use disas to identify the precise instruction at 0x00007f7ff7007b45, which will show you the registers which might lead you to the faulting address.

vaddr is certainly not NULL, it's checked right before. It could be non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD, but that is surely used elsewhere? I suppose it might have mapped an MFN which was either invalid or became invalid (but your bug is deterministic, right?). IIRC NetBSD's privcmd foreign mappings are populated lazily and not immediately like on Linux? If that were the case (and I'm only vaguely aware of how NetBSD operates) then it would be plausible that xc_map_foreign_range would succeed but that a subsequent attempt to access the region would fault?

dump_mem isn't NULL, it's a pointer into the dump_mem_start array which has a check for failure when it is allocated. Since dump_mem is just normal process memory and vaddr is a magic foreign mapping I'd be inclined to suspect vaddr was not right in some way...

Does "xl -vvv dump-core" give any useful additional logging?

Unfortunately I don't think anyone has done valgrind support for debugging processes which use Xen hypercalls for *BSD (if you were very keen you could probably follow what was done for Linux http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/ and wire up the BSD privcmd ioctl to the generic Xen hypercall code I added).

I fear this bug is going to take someone on the ground with a NetBSD system and the ability to dive into BSD kernel internals to get to the bottom of...

Ian.
John Nemeth
2013-Nov-08 17:20 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On Nov 8, 10:29am, Ian Campbell wrote:
} On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
} > yes its 4.2 from pkgsrc.
}
} Thanks, that might be enough.

More specifically, it's 4.2.3.

} > how can i get the changeset id?
}
} that'd be one for the port-xen folks I think. It might be printed in the
} xen dmesg, but that depends on how it was built and 4.2 may be too old
} to have such functionality.

xl dmesg says:

(XEN) Latest ChangeSet: unavailable

The package was built using this tarball:

http://bits.xensource.com/oss-xen/release/4.2.3/xen-4.2.3.tar.gz

And, just for reference, this is the info we have on the tarball:

SHA1 (xen-4.2.3.tar.gz) = 7c72e1aa870cc938afdc50bd9f2d879118aa8b99
RMD160 (xen-4.2.3.tar.gz) = da0fbb7bbc0796bd83c223f7d21015ce0d4c8553
Size (xen-4.2.3.tar.gz) = 15613235 bytes
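For anyone wanting to confirm they are reading the same source as the package, the tarball can be checked against the SHA1 and size quoted above. The following is a minimal sketch; the helper names and the local file path are assumptions for illustration, not part of pkgsrc.

```python
import hashlib

# Values quoted in the message above for xen-4.2.3.tar.gz.
EXPECTED_SHA1 = "7c72e1aa870cc938afdc50bd9f2d879118aa8b99"
EXPECTED_SIZE = 15613235


def tarball_matches(data: bytes, sha1_hex: str, size: int) -> bool:
    """Return True if the bytes have the expected length and SHA1 digest."""
    return len(data) == size and hashlib.sha1(data).hexdigest() == sha1_hex


def check_file(path: str = "xen-4.2.3.tar.gz") -> bool:
    # The path is an assumption: wherever the tarball was downloaded to.
    with open(path, "rb") as f:
        return tarball_matches(f.read(), EXPECTED_SHA1, EXPECTED_SIZE)
```

Calling check_file() on a downloaded copy returns True only when both the size and the digest agree with the values listed above.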
Ian Campbell
2013-Nov-12 09:35 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On Fri, 2013-11-08 at 09:20 -0800, John Nemeth wrote:
> On Nov 8, 10:29am, Ian Campbell wrote:
> } On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
> } > yes its 4.2 from pkgsrc.
> }
> } Thanks, that might be enough.
>
> More specifically, it's 4.2.3.

Thanks. This seems to confirm that it is the memcpy I pointed to below.

I'm afraid that any further progress here is going to require input from you on the other questions I asked, and perhaps from someone who understands how the NetBSD kernel (in particular the privcmd driver) operates.

Ian.

> } In 4.2.0 this corresponds to
> } memcpy(dump_mem, vaddr, PAGE_SIZE);
> } which is a plausible source of a segfault.
Roger Pau Monné
2013-Nov-12 09:48 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 08/11/13 11:29, Ian Campbell wrote:
> vaddr is certainly not NULL, it's checked right before. It could be
> non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
> but that is surely used elsewhere? I suppose it might have mapped an MFN
> which was either invalid (or became invalid, but your bug is
> deterministic, right?. IIRC NetBSD's privcmd foreign mappings are
> populated lazily and not immediately like on Linux? If that were the
> case (and I'm only vaguely aware of how NetBSD operates) then it would
> be plausible that xc_map_foreign_range would succeed but that a
> subsequent attempt to access the region would fault?

Yes, NetBSD privcmd maps the region lazily (it does the actual mapping in the page fault handler for that region). I have not tested it, but could you give the following patch a try:

http://mail-index.netbsd.org/port-xen/2012/06/27/msg007464.html

It's quite old, but I expect there haven't been many changes in NetBSD privcmd recently.

Roger.
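The lazy behaviour described in this message can be illustrated with a toy model. This is only a sketch of the idea (the mapping call succeeds, the failure shows up on first access); the class and exception names are invented for illustration and nothing here is NetBSD or libxc code.

```python
class PageFault(Exception):
    """Stands in for the SIGSEGV the process would receive."""


class LazyForeignMapping:
    """Toy model of a mapping that is only populated on first access."""

    def __init__(self, frames, mappable):
        # frames: frame numbers requested; mappable: the frames that can
        # really be mapped. Nothing is validated at construction time.
        self.frames = list(frames)
        self._mappable = set(mappable)
        self._populated = set()

    def read(self, index):
        frame = self.frames[index]
        if frame not in self._populated:
            # First touch: this is where a lazy implementation does the work.
            if frame not in self._mappable:
                raise PageFault(f"frame {frame:#x} could not be mapped")
            self._populated.add(frame)
        return f"contents of frame {frame:#x}"


# Creating the mapping "succeeds" even though frame 0x11 is not mappable.
mapping = LazyForeignMapping(frames=[0x10, 0x11], mappable={0x10})
print(mapping.read(0))
try:
    mapping.read(1)  # the failure only appears here, at access time
except PageFault as exc:
    print("fault:", exc)
```

In this model a NULL check on the returned mapping would pass, which matches the observation that vaddr is non-NULL and the crash happens later, in the memcpy.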
Ian Campbell
2013-Nov-12 10:00 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On Tue, 2013-11-12 at 10:48 +0100, Roger Pau Monné wrote:
> Yes, NetBSD privcmd maps the region lazily (it does the actual map on
> the page fault handler for that region).

Thanks for the confirmation. Would it be expected that a message would be logged to dom0's dmesg if something went wrong here?
Roger Pau Monné
2013-Nov-12 10:09 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 12/11/13 11:00, Ian Campbell wrote:
> On Tue, 2013-11-12 at 10:48 +0100, Roger Pau Monné wrote:
>> Yes, NetBSD privcmd maps the region lazily (it does the actual map on
>> the page fault handler for that region).
>
> Thanks for the confirmation. Would it be expected that a message would
> be logged to dom0's dmesg if something went wrong here?

From a quick look at the current NetBSD privcmd code, I'm not sure a message is printed in all error cases, so it's possible that it just fails silently. You might get some messages from the hypervisor if compiled with debug=y, but I have not tried it.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Miguel C.
2013-Nov-13 12:36 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
I have the xenkernel debug version, but in this case you mean the tools, right?

I will recompile xentools with debug support later today or tomorrow and give some more feedback.

Thanks for following up on this so far.

"Roger Pau Monné" <roger.pau@citrix.com> wrote:
> By doing a quick look at current NetBSD privcmd code I'm not sure a
> message is printed on all error cases, so it's possible that it just
> fails silently. You might get some messages from the hypervisor if
> compiled with debug=y, but I have not tried it.
Roger Pau Monné
2013-Nov-13 12:39 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 13/11/13 13:36, Miguel C. wrote:
> I have the xenkernel debug version but in this case you mean the tool right?
>
> I recompile xentools again with debug support later today or tomorrow and give some more feedback.

I mean that you need to compile the hypervisor with debug=y. Have you tried applying the patch from the link I posted to the NetBSD source and rebuilding the kernel?
Miguel C.
2013-Nov-13 17:59 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
Not yet, and it seems I won't have time today. I will try that tomorrow.

Thanks.

"Roger Pau Monné" <roger.pau@citrix.com> wrote:
> I mean that you need to compile the hypervisor with debug=y. Have you
> tried to apply the patch on the link that I've posted to NetBSD source
> and rebuild the kernel?
James Harper
2013-Nov-13 21:31 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
> > More specifically, it's 4.2.3.
>
> Thanks. This seems to confirm that it is the memcpy I pointed to below.
>
> I'm afraid that any further progress here is going to require input from
> you on the other questions I asked, and perhaps from someone who
> understands how the NetBSD kernel (in particular the privcmd driver)
> operates.

FWIW, the resulting core file appears to be the right size, and has the ELF header etc, but is missing the section strings.

James
Mike C.
2013-Dec-03 18:14 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 11/13/13 12:39, Roger Pau Monné wrote:
> I mean that you need to compile the hypervisor with debug=y. Have you
> tried to apply the patch on the link that I've posted to NetBSD source
> and rebuild the kernel?

Hi,

I've rebuilt with the patch + debug=y and tried the xl core dump again; I still get the same issue!

It really seems to fail close to the end (at least judging from the size of the files).

GDB seems to show similar output, not sure if the debug option should give more info?!

(gdb) run
Starting program: /usr/sbin/xl -vf dump-core w2k12 core.dump

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1]
0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback (xch=0x7f7ff7b0d800,
domid=32, args=0x7f7fffffdae0, dump_rtn=0x7f7ff700632c <local_file_dump>)
at xc_core.c:860
860 xc_core.c: No such file or directory.
in xc_core.c
(gdb) backtrace
#0 0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback (xch=0x7f7ff7b0d800,
domid=32, args=0x7f7fffffdae0, dump_rtn=0x7f7ff700632c <local_file_dump>)
at xc_core.c:860
#1 0x00007f7ff7007fda in xc_domain_dumpcore (xch=0x7f7ff7b0d800, domid=32,
corename=0x7f7ffffffe91 "core.dump") at xc_core.c:983
#2 0x00007f7ff74117b3 in libxl_domain_core_dump (ctx=0x7f7ff7b03200, domid=32,
filename=0x7f7ffffffe91 "core.dump", ao_how=<optimized out>) at libxl.c:808
#3 0x000000000040f748 in core_dump_domain (filename=0x7f7ffffffe91 "core.dump",
domain_spec=<optimized out>) at xl_cmdimpl.c:3301
#4 main_dump_core (argc=<optimized out>, argv=0x7f7fffffdca8) at xl_cmdimpl.c:3642
#5 0x0000000000407055 in main (argc=3, argv=0x7f7fffffdca8) at xl.c:267
(gdb)
James Harper
2013-Dec-10 08:21 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
I've been working with Mike on this today. After he re-applied the patch (something must have gone wrong initially), an ioctl error is repeated constantly instead of SIGSEGV:

xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal error

I dumped out some of the variables though, and:

nr_memory_map = 1
pfn_start = 0, pfn_end = 1048575

This equates to 4GB of pfns to be dumped on a VM with mem/maxmem = 256MB... is there code that skips empty pages? If not, that seems to be the explanation for the errors.

James
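The arithmetic behind the "4GB versus 256MB" observation can be checked with a few lines. This sketch assumes 4 KiB pages and treats pfn_end as inclusive; both are assumptions, since the message gives only the raw values.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages


def pfn_range_bytes(pfn_start: int, pfn_end: int) -> int:
    """Bytes covered by a PFN range, treating pfn_end as inclusive."""
    return (pfn_end - pfn_start + 1) * PAGE_SIZE


def pages_for_mib(mib: int) -> int:
    """Number of pages needed to back the given amount of memory."""
    return mib * 1024 * 1024 // PAGE_SIZE


span_gib = pfn_range_bytes(0, 1048575) // 2**30
guest_pages = pages_for_mib(256)
print(f"PFN range spans {span_gib} GiB; a 256 MiB guest needs {guest_pages} pages")
```

Under these assumptions the range covers 1048576 PFNs while the guest can populate at most 65536 of them, so most map attempts across the range would be expected to fail.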
James Harper
2013-Dec-10 09:27 UTC
RE: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
> I dumped out some of the variables though, and:
>
> nr_memory_map = 1
> pfn_start = 0, pfn_end = 1048575
>
> this equates to 4GB of pfn's to be dumped on a vm with mem/maxmem =
> 256MB... is there code that skips empty pages? If not, that seems to be the
> explanation for the errors.

A bit more info, after adding a few more debugging printfs and removing the perror in xc_map_foreign_range:

nr_pages = 64472
nr_memory_map = 1
map_idx = 0
pfn_start = 0, pfn_end = 1048575
xc: info: j (63456) != nr_pages (64472)

The resulting dump file is readable by my xen->windows dump converter, and the Windows debugger doesn't complain about the resulting Windows dump file, so it seems to be working okay.

James
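To put the "j != nr_pages" message in perspective, the gap between the two counters can be converted to bytes. This is a sketch only: it assumes 4 KiB pages and assumes that j counts the pages actually written while nr_pages is the count the tool expected.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

nr_pages = 64472  # expected page count, from the debug output above
j = 63456         # pages reported as processed, from the same output

missing_pages = nr_pages - j
missing_bytes = missing_pages * PAGE_SIZE
expected_bytes = nr_pages * PAGE_SIZE

print(f"{missing_pages} pages ({missing_bytes} bytes) short of "
      f"{expected_bytes} bytes expected")
```

The shortfall is a small fraction of the expected total, which fits the report that the dump is still usable by the converter.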
Andrew Cooper
2013-Dec-10 10:41 UTC
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
On 10/12/13 08:21, James Harper wrote:
> I've been working with Mike on this today. After he re-applied the patch (something must have gone wrong initially), an ioctl error is repeated constantly instead of SIGSEGV:
>
> xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal error
>
> I dumped out some of the variables though, and:
>
> nr_memory_map = 1
> pfn_start = 0, pfn_end = 1048575
>
> this equates to 4GB of pfns to be dumped on a vm with mem/maxmem = 256MB... is there code that skips empty pages? If not, that seems to be the explanation for the errors.
>
> James

xc_map_foreign_range is completely broken as far as errors go.

The privcmd driver ends up doing:

if ( HYPERVISOR_mmu_update(foo,bar) < 0 )
    return -EFAULT;

Your best bet here is intercepting this and finding the real error.

privcmd (and evtchn and gnttab) devices are generally broken as far as errors go, because it is impossible to distinguish between a kernel error and a Xen error.

In someone's copious free time (possibly mine, if I ever get any), a brand new set of ioctls on each of the Xen devices would not go amiss.

~Andrew
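To make the point about lost error information concrete, here is a small user-space sketch. It is not the privcmd driver: `hypercall_fn`, `map_collapsing`, `map_propagating` and the two stand-in functions are hypothetical names, and the "hypercall" is just a function returning 0 or a negative errno value.

```c
#include <errno.h>

/* Hypothetical stand-in for a hypercall: returns 0 or a negative errno. */
typedef int (*hypercall_fn)(void);

/* The pattern described above: every failure is reported as -EFAULT,
 * so the caller cannot tell what actually went wrong. */
static inline int map_collapsing(hypercall_fn call)
{
    if (call() < 0)
        return -EFAULT;
    return 0;
}

/* One possible alternative: pass the original negative code through. */
static inline int map_propagating(hypercall_fn call)
{
    int rc = call();
    return rc < 0 ? rc : 0;
}

/* Illustration-only stand-ins for a failing and a succeeding call. */
static inline int fails_with_einval(void) { return -EINVAL; }
static inline int succeeds(void) { return 0; }
```

Passing the code through would at least show which value came back, but it would not by itself address the other concern raised here, namely that a kernel error and a Xen error cannot be told apart.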
James Harper
2013-Dec-10 10:46 UTC
Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
>
> On 10/12/13 08:21, James Harper wrote:
> > I've been working with Mike on this today. After he re-applied the patch
> > (something must have gone wrong initially), an ioctl error is repeated
> > constantly instead of SIGSEGV:
> >
> > xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal
> > error
> >
> > I dumped out some of the variables though, and:
> >
> > nr_memory_map = 1
> > pfn_start = 0, pfn_end = 1048575
> >
> > this equates to 4GB of pfns to be dumped on a vm with mem/maxmem =
> > 256MB... is there code that skips empty pages? If not, that seems to be the
> > explanation for the errors.
> >
> > James
>
> xc_map_foreign_range is completely broken as far as errors go.
>
> The privcmd driver ends up doing:
>
> if ( HYPERVISOR_mmu_update(foo,bar) < 0 )
>     return -EFAULT;
>
> Your best bet here is intercepting this and finding the real error.
>
> privcmd (and evtchn and gnttab) devices are generally broken as far as
> errors go, because it is impossible to distinguish between a kernel
> error and a Xen error.
>
>
> In someone's copious free time (possibly mine, if I ever get any), a brand
> new set of ioctls on each of the Xen devices would not go amiss.
>

I think that the core dump code just iterates over the whole memory range and skips anything that xc_map_foreign_range returns an error on. After applying the patch for the problem that caused the resulting vaddr to SIGSEGV, the only remaining problem was that it logged an error when trying to map a page. Removing that perror appears to be sufficient for now, although maybe it should only do that on certain errors...

James
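The iterate-and-skip behaviour guessed at above could look roughly like the sketch below. This is not the code in xc_core.c: the callback types, `dump_range`, and the demo helpers are all hypothetical, and a failed mapping is simply modelled as a NULL return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapper: returns a pointer to the page, or NULL on failure. */
typedef const void *(*map_page_fn)(uint64_t pfn, void *opaque);
/* Hypothetical writer: returns 0 on success, non-zero on failure. */
typedef int (*write_page_fn)(const void *page, void *opaque);

/* Walk the inclusive range [pfn_start, pfn_end], quietly skipping pfns
 * that cannot be mapped. Returns the number of pages written, or -1 if
 * a write fails. Assumes pfn_end < UINT64_MAX so the loop terminates. */
static inline int64_t dump_range(uint64_t pfn_start, uint64_t pfn_end,
                                 map_page_fn map, write_page_fn write_page,
                                 void *opaque)
{
    int64_t dumped = 0;
    uint64_t pfn;

    for (pfn = pfn_start; pfn <= pfn_end; pfn++) {
        const void *page = map(pfn, opaque);
        if (page == NULL)
            continue;               /* hole in the range: skip, no log */
        if (write_page(page, opaque) != 0)
            return -1;              /* a failed write is a real error */
        dumped++;
    }
    return dumped;
}

/* Illustration-only helpers: only even pfns are "populated". */
static const char demo_page[1] = {0};

static inline const void *demo_map_even(uint64_t pfn, void *opaque)
{
    (void)opaque;
    return (pfn % 2 == 0) ? demo_page : NULL;
}

static inline int demo_write_ok(const void *page, void *opaque)
{
    (void)page;
    (void)opaque;
    return 0;
}
```

A caller could then compare the returned count with the expected page count, in the same spirit as the "j != nr_pages" message seen earlier in the thread, and decide whether a shortfall deserves a warning.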