Luis Vinay
2006-Dec-07 15:04 UTC
[Xen-devel] domU's crashing Dom0 (Xen + iSCSI = timebomb)
I'm experimenting with Xen + iSCSI, and I have found that under heavy stress domUs can crash the entire system. I've reproduced this many, many times.

My system is like this:

Software:
- iSCSI Enterprise Target v0.4.13
- RedHat AS4 update 4 64bit + Xen 3.0.3-0 Kernel 2.6.16.29 + Open iSCSI v2.0.730 (Initiator)
- Bonnie++ v1.03a

VM:
- Debian 3.1r3 + Open iSCSI v2.0.730 (Initiator)
- RedHat AS4 update 4 + Open iSCSI v2.0.730 (Initiator)

Tests:
Four instances of bonnie++ with root uid on the filesystem to be stressed

hda (OS and swap) + hdb (stressed ext3 filesystem) over iscsi, both debian and RH
Result: crash
hda (OS and swap) local + hdb (stressed ext3 filesystem) over iscsi, both debian and RH
Result: crash

hda (OS and swap) + hdb (stressed ext3-writeback mode filesystem) over iscsi, both debian and RH
Result: crash
hda (OS and swap) local + hdb (stressed ext3-writeback mode filesystem) over iscsi, both debian and RH
Result: crash

hda (OS and swap) + hdb (stressed ext2 filesystem) over iscsi, both debian and RH
Result: crash
hda (OS and swap) local + hdb (stressed ext2 filesystem) over iscsi, both debian and RH
Result: crash

hda (OS and swap) + hdb (stressed xfs filesystem) over iscsi, both debian and RH
Result: crash
hda (OS and swap) local + hdb (stressed xfs filesystem) over iscsi, both debian and RH
Result: crash

Also tested:

Xen 3.0.2-2 Dom0 kernel 2.6.16-xen0 (stressed ext2 filesystem) over iscsi
Result: ~60hs of testing with no problems (then stopped the tests)

kernel 2.6.16.29 (stressed ext2 filesystem) over iscsi
Result: ~24.30hs of testing with no problems (then stopped the tests)

Xen 3.0.2-2 Dom0 kernel 2.6.16.29-xen0 (stressed ext2 filesystem) over iscsi
Result: 15min and crashed

I managed to capture the error:

Unable to handle kernel NULL pointer dereference at 00000000000000e8 RIP:
<ffffffff88009a1e>{:bnx2:bnx2_poll+231}
PGD 1f4d7067 PUD 1f613067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge
8021q netloop ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi joydev tsdev
binfmt_misc dm_mirror dm_mod usb_storage video thermal processor usbmouse
usbhid usbkbd fan container button battery ac uhci_hcd ehci_hcd usbcore
hw_random e1000 bnx2 piix ide_generic
Pid: 0, comm: swapper Not tainted 2.6.16.29-xen0 #3
RIP: e030:[<ffffffff88009a1e>] <ffffffff88009a1e>{:bnx2:bnx2_poll+231}
RSP: e02b:ffffffff80503de8 EFLAGS: 00010286
RAX: 000000000000c9f8 RBX: ffff880017778e30 RCX: ffff880014eee000
RDX: 0000000000000001 RSI: 000000000000c9f7 RDI: 00000000000000e3
RBP: 0000000000000000 R08: 0000000100215d2c R09: 000000000000002c
R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000c9e3
R13: 0000000100215d29 R14: ffff88001e15ed00 R15: 0000000000000000
FS: 00002b02608bf360(0000) GS:ffffffff804b3000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff804ca000, task ffffffff80428bc0)
Stack: 0000000000000001 0000000000000001 0000000000000bf0 ffff88001877abf0
00000000000000d0 ffffffff80111c93 0000000000000bf0 ffffffff8800d0c3
ffff880000000002 ffffffff00000000
Call Trace: <IRQ> <ffffffff80111c93>{dma_map_page+43}
<ffffffff8800d0c3>{:bnx2:bnx2_start_xmit+801}
<ffffffff803548be>{net_rx_action+230}
<ffffffff801325d6>{__do_softirq+114}
<ffffffff8010bac6>{call_softirq+30}
<ffffffff8010d575>{do_softirq+71} <ffffffff8010d3ed>{do_IRQ+63}
<ffffffff802f6b82>{evtchn_do_upcall+192}
<ffffffff8010b5f6>{do_hypervisor_callback+30} <EOI>
<ffffffff801073aa>{hypercall_page+938}
<ffffffff801073aa>{hypercall_page+938}
<ffffffff8010f702>{safe_halt+132} <ffffffff80108d77>{xen_idle+106}
<ffffffff80108e36>{cpu_idle+171}
<ffffffff804cd77a>{start_kernel+488}
<ffffffff804cd223>{_sinittext+547}

Code: 48 8b 85 e8 00 00 00 66 83 78 06 00 74 25 0f b7 40 04 41 8d
RIP <ffffffff88009a1e>{:bnx2:bnx2_poll+231} RSP <ffffffff80503de8>
CR2: 00000000000000e8
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Luis Vinay
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
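As a reading aid for the oops above, here is a minimal Python sketch that cross-checks the reported fault address against the register dump. It assumes the "Code:" line begins at the faulting instruction (the usual layout for x86_64 oopses of this era, but an assumption here), and it decodes only the one instruction form that appears in this dump. The helper names (parse_registers, parse_code_bytes, decode_mov_load) are made up for illustration and are not part of any kernel tool.

```python
import re

# Excerpt copied from the oops above; only the lines this sketch needs.
OOPS = """\
RBP: 0000000000000000 R08: 0000000100215d2c R09: 000000000000002c
Code: 48 8b 85 e8 00 00 00 66 83 78 06 00 74 25 0f b7 40 04 41 8d
CR2: 00000000000000e8
"""

# x86-64 register numbering for the low eight registers (no REX.R/REX.B set).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def parse_registers(text):
    """Return {lowercase name: value} for each 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name.lower(): int(value, 16) for name, value in pairs}


def parse_code_bytes(text):
    """Return the bytes listed on the 'Code:' line."""
    line = re.search(r"^Code: (.+)$", text, re.M).group(1)
    return bytes(int(tok, 16) for tok in line.split())


def decode_mov_load(code):
    """Decode only 'REX.W 8B /r' with mod=10 (base + disp32, no SIB byte).

    Returns (destination register, base register, displacement).
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the [base + disp32] form is handled")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


regs = parse_registers(OOPS)
dest, base, disp = decode_mov_load(parse_code_bytes(OOPS))
fault = (regs[base] + disp) & 0xFFFFFFFFFFFFFFFF

print(f"mov {disp:#x}(%{base}),%{dest}")
print(f"computed address {fault:#x}, reported CR2 {regs['cr2']:#x}")
```

Under that assumption the first instruction is a load from RBP + 0xe8, and with RBP = 0 the computed address equals the reported CR2 of 0xe8. That is consistent with bnx2_poll reading a field at offset 0xe8 through a NULL pointer; which structure that pointer refers to cannot be determined from the dump alone.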
Ian Pratt
2006-Dec-07 23:07 UTC
RE: [Xen-devel] domU's crashing Dom0 (Xen + iSCSI = timebomb)
> I'm experimenting with Xen + iSCSI, and I have found that under heavy stress
> domUs can crash the entire system. I've reproduced this many, many times.

Are all the crashes the same? Have you got a collection of the oops messages?

> Xen 3.0.2-2 Dom0 kernel 2.6.16.29-xen0 (stressed ext2 filesystem) over
> iscsi
> Result: 15min and crashed

The fact that this crashed without any domUs or blkback/front suggests that this may be a native problem rather than something specific to Xen (though it could be an event channel interaction).

How are you stressing the filesystem? It might be worth finding out if the bug is more easily triggered by either read or write workloads.

It would also be very interesting to know whether it can be reproduced on 32-bit.

I'm not sure whether it's possible to dynamically turn off NAPI support with ethtool, but this would be interesting too.

Thanks,
Ian

> My system is like this:
>
> Software:
> - iSCSI Enterprise Target v0.4.13
> - RedHat AS4 update 4 64bit + Xen 3.0.3-0 Kernel 2.6.16.29 + Open iSCSI
> v2.0.730 (Initiator)
> - Bonnie++ v1.03a
>
> VM:
> - Debian 3.1r3 + Open iSCSI v2.0.730 (Initiator)
> - RedHat AS4 update 4 + Open iSCSI v2.0.730 (Initiator)
>
> Tests:
> Four instances of bonnie++ with root uid on the filesystem to be stressed
>
> hda (OS and swap) + hdb (stressed ext3 filesystem) over iscsi, both
> debian and RH
> Result: crash
> hda (OS and swap) local + hdb (stressed ext3 filesystem) over iscsi,
> both debian and RH
> Result: crash
>
> hda (OS and swap) + hdb (stressed ext3-writeback mode filesystem) over
> iscsi, both debian and RH
> Result: crash
> hda (OS and swap) local + hdb (stressed ext3-writeback mode filesystem)
> over iscsi, both debian and RH
> Result: crash
>
> hda (OS and swap) + hdb (stressed ext2 filesystem) over iscsi, both
> debian and RH
> Result: crash
> hda (OS and swap) local + hdb (stressed ext2 filesystem) over iscsi,
> both debian and RH
> Result: crash
>
> hda (OS and swap) + hdb (stressed xfs filesystem) over iscsi, both
> debian and RH
> Result: crash
> hda (OS and swap) local + hdb (stressed xfs filesystem) over iscsi,
> both debian and RH
> Result: crash
>
> Also tested:
>
> Xen 3.0.2-2 Dom0 kernel 2.6.16-xen0 (stressed ext2 filesystem) over iscsi
> Result: ~60hs of testing with no problems (then stopped the tests)
>
> kernel 2.6.16.29 (stressed ext2 filesystem) over iscsi
> Result: ~24.30hs of testing with no problems (then stopped the tests)
>
> Xen 3.0.2-2 Dom0 kernel 2.6.16.29-xen0 (stressed ext2 filesystem) over
> iscsi
> Result: 15min and crashed
>
> I managed to capture the error:
> Unable to handle kernel NULL pointer dereference at 00000000000000e8 RIP:
> <ffffffff88009a1e>{:bnx2:bnx2_poll+231}
> PGD 1f4d7067 PUD 1f613067 PMD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge
> 8021q netloop ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
> crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi joydev tsdev
> binfmt_misc dm_mirror dm_mod usb_storage video thermal processor usbmouse
> usbhid usbkbd fan container button battery ac uhci_hcd ehci_hcd usbcore
> hw_random e1000 bnx2 piix ide_generic
> Pid: 0, comm: swapper Not tainted 2.6.16.29-xen0 #3
> RIP: e030:[<ffffffff88009a1e>] <ffffffff88009a1e>{:bnx2:bnx2_poll+231}
> RSP: e02b:ffffffff80503de8 EFLAGS: 00010286
> RAX: 000000000000c9f8 RBX: ffff880017778e30 RCX: ffff880014eee000
> RDX: 0000000000000001 RSI: 000000000000c9f7 RDI: 00000000000000e3
> RBP: 0000000000000000 R08: 0000000100215d2c R09: 000000000000002c
> R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000c9e3
> R13: 0000000100215d29 R14: ffff88001e15ed00 R15: 0000000000000000
> FS: 00002b02608bf360(0000) GS:ffffffff804b3000(0000)
> knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000
> Process swapper (pid: 0, threadinfo ffffffff804ca000, task
> ffffffff80428bc0)
> Stack: 0000000000000001 0000000000000001 0000000000000bf0 ffff88001877abf0
> 00000000000000d0 ffffffff80111c93 0000000000000bf0 ffffffff8800d0c3
> ffff880000000002 ffffffff00000000
> Call Trace: <IRQ> <ffffffff80111c93>{dma_map_page+43}
> <ffffffff8800d0c3>{:bnx2:bnx2_start_xmit+801}
> <ffffffff803548be>{net_rx_action+230}
> <ffffffff801325d6>{__do_softirq+114}
> <ffffffff8010bac6>{call_softirq+30}
> <ffffffff8010d575>{do_softirq+71} <ffffffff8010d3ed>{do_IRQ+63}
> <ffffffff802f6b82>{evtchn_do_upcall+192}
> <ffffffff8010b5f6>{do_hypervisor_callback+30} <EOI>
> <ffffffff801073aa>{hypercall_page+938}
> <ffffffff801073aa>{hypercall_page+938}
> <ffffffff8010f702>{safe_halt+132} <ffffffff80108d77>{xen_idle+106}
> <ffffffff80108e36>{cpu_idle+171}
> <ffffffff804cd77a>{start_kernel+488}
> <ffffffff804cd223>{_sinittext+547}
>
> Code: 48 8b 85 e8 00 00 00 66 83 78 06 00 74 25 0f b7 40 04 41 8d
> RIP <ffffffff88009a1e>{:bnx2:bnx2_poll+231} RSP <ffffffff80503de8>
> CR2: 00000000000000e8
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>
> Luis Vinay
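One way to answer the question "are all the crashes the same?" is to reduce each captured oops to a short signature and count the distinct ones. The Python sketch below is only an illustration: the function name oops_signature is made up, the choice of signature (module, faulting symbol, offset, CR2) is an assumption about what counts as "the same crash", and the second sample is a hypothetical placeholder, not a crash anyone in this thread reported.

```python
import re
from collections import Counter

# First "{[:module:]symbol+offset}" that follows the word RIP.
RIP_RE = re.compile(r"RIP\b[^{]*\{(?::(\w+):)?(\w+)\+(\d+)\}")
CR2_RE = re.compile(r"CR2: ([0-9a-f]+)")


def oops_signature(text):
    """Return (module, symbol, offset, fault address), or None if not found."""
    rip = RIP_RE.search(text)
    cr2 = CR2_RE.search(text)
    if rip is None or cr2 is None:
        return None
    module, symbol, offset = rip.groups()
    # Symbols in the core kernel carry no ":module:" prefix.
    return (module or "kernel", symbol, int(offset), int(cr2.group(1), 16))


# Lines taken from the oops reported in this thread.
REPORTED = """\
Unable to handle kernel NULL pointer dereference at 00000000000000e8 RIP:
<ffffffff88009a1e>{:bnx2:bnx2_poll+231}
CR2: 00000000000000e8
"""

# HYPOTHETICAL sample, invented only to show how a different crash separates.
HYPOTHETICAL = """\
Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP:
<ffffffff88000000>{:example_mod:example_fn+12}
CR2: 0000000000000010
"""

captures = [REPORTED, REPORTED, HYPOTHETICAL]
counts = Counter(oops_signature(text) for text in captures)

for signature, n in counts.most_common():
    print(n, signature)
```

If every capture from the failing configurations collapses to the same bnx2_poll signature, that would point toward a single bug in the network receive path rather than several independent failures; differing signatures would suggest looking at something more general, such as memory corruption.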