Zoltan HERPAI
2008-Aug-03 19:20 UTC
[Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper error
Hi,

I've run into a strange message when I tried to test a Xen 3.2.1 dom0 with the 2.6.18.8-xen kernel (cloned from the mercurial repository, as the README said). The error is as follows:

[ 1.623431] BUG: warning at /usr/src/xen-3.2.1/linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()
[ 1.623529]
[ 1.623529] Call Trace:
[ 1.623665] [<ffffffff8036f84a>] pci_bus_probe_wrapper+0x10b/0x114
[ 1.623742] [<ffffffff80322857>] pci_match_device+0x13/0xac
[ 1.623818] [<ffffffff8035dd90>] driver_probe_device+0x52/0xa2
[ 1.623893] [<ffffffff8035de45>] __driver_attach+0x0/0x9b
[ 1.623968] [<ffffffff8035de95>] __driver_attach+0x50/0x9b
[ 1.624043] [<ffffffff8035de45>] __driver_attach+0x0/0x9b
[ 1.628899] [<ffffffff8035d811>] bus_for_each_dev+0x43/0x6e
[ 1.628976] [<ffffffff8035d45f>] bus_add_driver+0x73/0x122
[ 1.629052] [<ffffffff80322af6>] __pci_register_driver+0x5a/0x80
[ 1.629128] [<ffffffff805a3b6a>] ide_scan_pcibus+0x8b/0x9e
[ 1.629203] [<ffffffff805a3a74>] ide_init+0x58/0x6b
[ 1.629277] [<ffffffff80266e02>] init+0x150/0x34c
[ 1.629352] [<ffffffff8025fcc8>] child_rip+0xa/0x12
[ 1.629427] [<ffffffff8032c499>] acpi_ds_init_one_object+0x0/0x80
[ 1.629504] [<ffffffff80266cb2>] init+0x0/0x34c
[ 1.629578] [<ffffffff8025fcbe>] child_rip+0x0/0x12
[ 1.629651]

The error appears many times in the boot log, with the same offsets. I thought this had something to do with the PCI backend, so I tried enabling and disabling the configuration options related to it, but the warning stayed.

I'm running Ubuntu 8.04.1 on an Asus M2N-E mainboard, latest BIOS, 64-bit userland. I can send the kernel configuration if someone is interested.

Google returns no results on this, so I'm asking whether anyone has seen it yet, and whether it could affect my HVM and paravirt domUs if I don't use PCI passthrough or virtual PCI on them.

Regards,
Zoltan HERPAI

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
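Since the warning is reported to repeat many times with the same offsets, it can help to count how many distinct warning sites a boot log actually contains. The following is only a minimal sketch: it assumes the log has been saved to a text file and that each entry has the "BUG: warning at <file>:<line>/<function>()" shape shown above. The function name count_warnings is made up for illustration.

```python
import re
from collections import Counter

# Matches entries shaped like the one quoted in this thread:
#   BUG: warning at /path/to/pci.c:28/pci_bus_probe_wrapper()
WARNING_RE = re.compile(
    r"BUG: warning at (?P<file>\S+):(?P<line>\d+)/(?P<func>\w+)\(\)"
)


def count_warnings(log_text):
    """Count 'BUG: warning at' entries, keyed by (file, line, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        key = (match.group("file"), int(match.group("line")), match.group("func"))
        counts[key] += 1
    return counts
```

A single key with a high count would suggest one warning site firing once per probed device or driver, rather than several unrelated problems.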
Scott Garron
2008-Aug-04 17:48 UTC
Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper error
Zoltan HERPAI wrote:
> I've run into a strange message when I tried to test Xen 3.2.1 dom0
> with 2.6.18.8-xen kernel
> [ ... snip ... ]
> [ 1.623431] BUG: warning at
> ...linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()
> [ ... snip ... ]
> I'm running Ubuntu 8.04.1 on an Asus M2N-E mainboard, latest BIOS,
> 64-bit userland

I've also wrestled with this issue for some 36 hours or so. I'm running Debian testing (lenny/sid) on a Supermicro X7DBE+ motherboard (Intel 5000P chipset). It currently has a single CPU, a quad-core Xeon E5345 (2.33GHz), and 4GB RAM.

The 64-bit userland consists of gcc-4.3.1-2_amd64 (x86_64-linux-gnu target, posix thread model) and libc6-2.7-10_amd64.

In my case, the machine gets partway through the init process, and while starting a few of the more involved network services, such as bind9 or apache2, the kernel panics and the machine halts (crash).

While attempting to figure out why it was doing that, I tried reverting to the previous version that I had been running. Just running ./install.sh from dist in that tree was enough to get the machine to boot with a Xen-enabled kernel, but because I had done an aptitude dist-upgrade, none of the Xen utilities were working (xend start, xm list, etc). I cloned the older build tree and recompiled it against the latest versions of the python and libc dev libraries. That yielded a result similar to the Xen 3.2.1 compile: during boot, the kernel would complain about the PCI probe, and then in the middle of the init process it would crash.

The only way I got the machine back into working order was to install the version of the kernel (2.6.18-xen) and Xen (3.0, changeset 15521) that I had compiled with an earlier gcc and libraries (back in July 2007), and manually cherry-pick the install from the dist/install/usr/lib64/python/xen directory of the freshly compiled copy of that same build tree. It's running again, but my net result was just a dist-upgrade. I'm not running a newer kernel or Xen, which is what I had set out to do in the first place.

Anyway, the point I'm trying to make is this: because a fresh compile of my old build tree, a build tree that previously worked, yields the same crash, the problem seems to be somehow related to the version of gcc or of the development libraries that I used to compile it.

The two "Oops"es I get are:

BUG: warning at /usr/src/linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()

Call Trace:
 [<ffffffff803529a1>] pci_bus_probe_wrapper+0x10b/0x116
 [<ffffffff802f9485>] pci_match_device+0x13/0xb9
 [<ffffffff80349a11>] driver_probe_device+0x52/0xa4
 [<ffffffff80349ad0>] __driver_attach+0x6d/0xa7
 [<ffffffff80349a63>] __driver_attach+0x0/0xa7
 [<ffffffff8034941e>] bus_for_each_dev+0x43/0x77
 [<ffffffff80348ecd>] bus_add_driver+0x73/0x123
 [<ffffffff802f972b>] __pci_register_driver+0x4e/0x6f
 [<ffffffff80512e4e>] ide_scan_pcibus+0x8b/0x9e
 [<ffffffff8051266c>] ide_init+0x58/0x75
 [<ffffffff8020715f>] init+0x138/0x3c9
 [<ffffffff8020b080>] child_rip+0xa/0x12
 [<ffffffff803128ca>] acpi_ds_init_one_object+0x0/0x82
 [<ffffffff80207027>] init+0x0/0x3c9
 [<ffffffff8020b076>] child_rip+0x0/0x12

--- and:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
PGD 19a2d067 PUD 19a2e067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: video button ac battery ppp_deflate zlib_deflate
 bsd_comp ppp_async crc_ccitt ppp_generic slhc ipt_REDIRECT xt_tcpudp
 xt_multiport iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter
 ip_tables x_tables ipv6 reiserfs nls_iso8859_1 nls_cp437 vfat fat
 serio_raw i2c_i801 intel_rng pcspkr i2c_core tsdev ext3 jbd dm_mirror
 dm_snapshot dm_mod sd_mod usb_storage sg sr_mod cdrom usbhid 3w_9xxx
 3c59x e1000 mii floppy ehci_hcd ata_piix libata scsi_mod uhci_hcd
 usbcore thermal processor fan
Pid: 2964, comm: named Not tainted 2.6.18.8-xen #1
RIP: e030:[<ffffffff88214114>] [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
RSP: e02b:ffff880019a85e38 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000008000
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 0000000000008000
RBP: 000000000000001c R08: 000000000000ee48 R09: 000000000000807f
R10: 0000000000000008 R11: 0000000000000246 R12: ffff88001b71c3c0
R13: ffff880019a85ec8 R14: 000000000000001c R15: 0000000000000000
FS: 00002b17d2a5f6e0(0063) GS:ffffffff804d9000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process named (pid: 2964, threadinfo ffff880019a84000, task ffff88001f4c1100)
Stack: 0000000000000000 000000000000001c ffff88001b71c3c0 ffffffff88201a64
 0000000000000004 ffffffff80397979 ffff88001b71c3c0 ffff880019a85ed0
 0000000000000000 ffff88001b71c698 0000000019a85f54 ffff880019341400
Call Trace:
 [<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
 [<ffffffff80397979>] sock_getsockopt+0x2d8/0x2fa
 [<ffffffff8039554b>] sys_bind+0x76/0xa6
 [<ffffffff88211256>] :ipv6:ipv6_setsockopt+0x3a/0x84
 [<ffffffff80394ad7>] sys_setsockopt+0xa5/0xb7
 [<ffffffff8020a644>] system_call+0x68/0x6d
 [<ffffffff8020a5dc>] system_call+0x0/0x6d

Code: 48 8b 12 0f 18 0a ff c0 3d fe 7f 00 00 7e f1 48 ff c7 44 39
RIP [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
 RSP <ffff880019a85e38>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

--
Scott Garron
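One way to test the toolchain hypothesis is to confirm which gcc actually built the kernel that is running. The sketch below assumes the kernel banner in /proc/version has the "(gcc version X.Y.Z ...)" form used by 2.6-era kernels; newer kernels format the banner differently, in which case the function simply returns None. Both function names are hypothetical.

```python
import re

# Assumed banner shape (2.6-era kernels):
#   Linux version <release> (<user>@<host>) (gcc version <x.y.z> ...) #<n> ...
GCC_RE = re.compile(r"\(gcc version (\d+(?:\.\d+)+)")


def gcc_version_from_banner(banner):
    """Return the gcc version embedded in a kernel banner, or None if absent."""
    match = GCC_RE.search(banner)
    return match.group(1) if match else None


def running_kernel_gcc(path="/proc/version"):
    """Read the banner of the running kernel and extract its gcc version."""
    with open(path) as handle:
        return gcc_version_from_banner(handle.read())
```

Comparing this value between the working and the crashing dom0 kernel would show whether the two really were built with different compilers.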
Zoltan HERPAI
2008-Aug-05 19:38 UTC
Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper error
Scott Garron wrote:
> I've also wrestled with this issue for some 36 hours or so. I'm
> running Debian testing (lenny/sid) on a Supermicro X7DBE+ motherboard
> (Intel 5000P chipset).
> [...]
> In my case, the machine gets partway through the init process,
> and while starting a few of the more involved network services, such
> as bind9 or apache2, the kernel panics and the machine halts (crash).
> [...]
> Anyway, the point I'm trying to make is that because a fresh
> compile of my old build tree, a build tree that previously worked,
> yields the same crash result, it seems to be somehow related to the
> version of gcc or development libraries with which I used to compile it.
> [...]
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Thanks for the detailed info. So it seems we've run into a reproducible bug, even if I'm luckier in that my dom0 at least works: I was able to get guests running, both paravirt and HVM, and stress-tested them a bit; they ran fine.

During your session, did you try different BIOS versions, or did you see this on another similar box, if you have one?

What could be the solution if I want to stay with 3.2.1? Moving forward to 3.2.2 doesn't seem to be a likely option.

Regards,
Zoltan HERPAI
Markus Hochholdinger
2008-Aug-15 20:00 UTC
Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper error
Hi,

On Monday, 4 August 2008 19:48, Scott Garron wrote:
> Zoltan HERPAI wrote:
[..]
> Anyway, the point I'm trying to make is that because a fresh
> compile of my old build tree, a build tree that previously worked,
> yields the same crash result, it seems to be somehow related to the
> version of gcc or development libraries with which I used to compile it.

I have the same problem. Compiling the 2.6.18.8 dom0 kernel with gcc and libs from lenny produces a malfunctioning dom0 kernel which reboots my machine. I set up an etch chroot environment for building the dom0 kernel, and the kernel built there works as expected. I assume some faulty optimisation in gcc!?

To be clear, I tested it twice on the same machine. The first time I was testing a lot of things with lenny. The second time I did a "clean room" installation and documented each step. Each time I followed exactly the same steps to compile dom0, once in lenny and once in (chrooted) etch (on the same hardware, on the same kernel).

I have two Intel Xeon Quad Core E5430 @ 2.66GHz in this machine. If it helps, I could also capture some kernel messages from the crashing kernel.

--
greetings

eMHa
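The chroot workaround described above can be sketched as a pair of commands: bootstrap an etch userland, then run the build inside it. This is a rough illustration under several assumptions, not the steps actually used in this thread: the chroot directory, the mirror URL and the build command are all placeholders, the source tree is assumed to be already available inside the chroot (copied or bind-mounted), and the compiler and build dependencies still have to be installed there. The helper only assembles the command lines and does not execute anything.

```python
import shlex


def etch_build_commands(chroot_dir, mirror, build_cmd):
    """Return argv lists that bootstrap an etch chroot and run a build in it.

    All three arguments are caller-supplied placeholders; nothing is run here.
    """
    return [
        # Create a minimal Debian etch userland in chroot_dir.
        ["debootstrap", "--arch", "amd64", "etch", chroot_dir, mirror],
        # Run the build with the older toolchain from inside the chroot.
        ["chroot", chroot_dir, "/bin/sh", "-c", build_cmd],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    commands = etch_build_commands(
        "/srv/etch-build",
        "http://archive.debian.org/debian",
        "cd /usr/src/xen-3.2.1 && make dist",
    )
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands first, rather than running them, makes it easy to review and adjust paths before touching the system as root.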