Ulrich Hochholdinger
2010-Nov-15 17:52 UTC
[Xen-users] Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
Hi,

My dom0 crashes while doing I/O on the local harddrive.

* System is a "Dell PowerEdge R710" with a "Perc H200" controller ("mpt2sas") / 96GB RAM / 2x XEON X5650
* Harddrives are configured as RAID1.
* OS is Debian Squeeze with
* Xen version 4.0.1 (Debian 4.0.1-1) - amd64 (Xen option: dom0_mem=512M)
* Dom0 kernel (distribution kernel): 2.6.32-5-xen-686 (no special options)
* After doing some moderate I/O on the local RAID1 with "dd if=/dev/zero of=bigfile bs=1024 count=100000" the system crashes.
* Strange: if the RAID1 is degraded, the system doesn't crash when doing I/O over the complete harddrive.

Has someone an idea how to fix or work around this "bug"? In the meantime I tested different settings without any success:
- VT-d enabled / disabled (BIOS and iommu=1)
- dom0_mem=512M (my default) and different settings
- modified swiotlb (without any success)

The last lines the kernel reports:

[ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
[ 5822.499743] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 5822.499919] IP: [<e09a10a4>] _scsih_qcmd+0x412/0x4d0 [mpt2sas]
[ 5822.500024] *pdpt = 0000000001466007 *pde = 0000000000000000
[ 5822.500147] Oops: 0000 [#1] SMP
[ 5822.500269] last sysfs file: /sys/devices/virtual/block/md0/md/mismatch_cnt
[ 5822.500330] Modules linked in: netconsole configfs xen_evtchn xenfs fuse 8021q garp bridge stp reiserfs loop snd_pcm snd_timer ioatdma snd soundcore snd_page_alloc psmouse dca dcdbas serio_raw evdev processor button power_meter pcspkr joydev acpi_processor ext3 jbd mbcache dm_mod raid1 md_mod sg sr_mod sd_mod cdrom crc_t10dif usbhid hid usb_storage uhci_hcd mpt2sas ehci_hcd scsi_transport_sas usbcore nls_base scsi_mod bnx2 thermal thermal_sys [last unloaded: netconsole]
[ 5822.502221]
[ 5822.502272] Pid: 442, comm: md0_raid1 Not tainted (2.6.32-5-xen-686 #1) PowerEdge R710
[ 5822.502348] EIP: 0061:[<e09a10a4>] EFLAGS: 00010002 CPU: 1
[ 5822.502406] EIP is at _scsih_qcmd+0x412/0x4d0 [mpt2sas]
[ 5822.502462] EAX: dd9ba344 EBX: 00000009 ECX: e099b05d EDX: 14000000
[ 5822.502520] ESI: 00000000 EDI: dd145b30 EBP: 0000000f ESP: dd5efd64
[ 5822.502615] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 5822.502679] Process md0_raid1 (pid: 442, ti=dd5ee000 task=c1f4f2c0 task.ti=dd5ee000)
[ 5822.502754] Stack:
[ 5822.502804]  000000b6 dd9ba344 c1dde400 d5000000 94000000 fffffff1 bf145b00 00000000
[ 5822.503086] <0> dd105b00 14000000 dd145b00 c1dde000 dada6240 dd9ba000 dd9b0228 e096597b
[ 5822.503442] <0> dd0a0f90 c1dde000 de50f560 dd9ba000 e096a33c dd0a0f90 c1dde0b0 dada6240
[ 5822.503844] Call Trace:
[ 5822.503907]  [<e096597b>] ? scsi_dispatch_cmd+0x179/0x1e5 [scsi_mod]
[ 5822.503971]  [<e096a33c>] ? scsi_request_fn+0x343/0x47a [scsi_mod]
[ 5822.504032]  [<c1131da3>] ? __generic_unplug_device+0x23/0x25
[ 5822.504091]  [<c11323a4>] ? __make_request+0x364/0x3d9
[ 5822.505487]  [<c107655b>] ? rcu_process_callbacks+0x33/0x39
[ 5822.505546]  [<c103c4f6>] ? __do_softirq+0x128/0x151
[ 5822.505605]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.505663]  [<c1130f81>] ? generic_make_request+0x266/0x2b4
[ 5822.505723]  [<e08f3d12>] ? flush_pending_writes+0x58/0x74 [raid1]
[ 5822.505783]  [<e08f3df3>] ? raid1d+0x61/0xccc [raid1]
[ 5822.505842]  [<c1007c85>] ? __switch_to+0x124/0x141
[ 5822.505900]  [<c1032342>] ? finish_task_switch+0x3c/0x95
[ 5822.505958]  [<c128d196>] ? schedule+0x78f/0x7dc
[ 5822.506015]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.506074]  [<c10066d3>] ? xen_restore_fl_direct_end+0x0/0x1
[ 5822.506133]  [<c128e2f9>] ? _spin_unlock_irqrestore+0xd/0xf
[ 5822.506192]  [<c104241a>] ? try_to_del_timer_sync+0x4f/0x56
[ 5822.506251]  [<c104242b>] ? del_timer_sync+0xa/0x14
[ 5822.506308]  [<c128d512>] ? schedule_timeout+0x89/0xb0
[ 5822.506365]  [<c10424d3>] ? process_timeout+0x0/0x5
[ 5822.506424]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.506483]  [<c10066dc>] ? check_events+0x8/0xc
[ 5822.506542]  [<e0acd050>] ? md_thread+0xe1/0xf8 [md_mod]
[ 5822.506601]  [<c104b0ea>] ? autoremove_wake_function+0x0/0x2d
[ 5822.506661]  [<e0accf6f>] ? md_thread+0x0/0xf8 [md_mod]
[ 5822.506718]  [<c104aeb8>] ? kthread+0x61/0x66
[ 5822.506774]  [<c104ae57>] ? kthread+0x0/0x66
[ 5822.506830]  [<c1009a67>] ? kernel_thread_helper+0x7/0x10
[ 5822.506886] Code: 08 89 eb 8b 7c 24 28 eb 48 8b 7c 24 28 e9 a9 00 00 00 8b 44 24 04 83 fb 01 8b 88 14 02 00 00 75 06 8b 54 24 10 eb 04 8b 54 24 24 <0b> 56 08 89 f8 ff 76 10 4b ff 76 0c ff d1 58 89 f0 5a e8 7d 4c
[ 5822.509030] EIP: [<e09a10a4>] _scsih_qcmd+0x412/0x4d0 [mpt2sas] SS:ESP 0069:dd5efd64
[ 5822.509183] CR2: 0000000000000008
[ 5822.509238] ---[ end trace 3c25d9a65cc7a879 ]---

Regards,
Ulli
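"Modified swiotlb" above usually means enlarging the software I/O TLB, i.e. the bounce-buffer pool the mpt2sas driver runs out of in the log. A minimal sketch of doing that from the dom0 kernel command line on Debian Squeeze follows; the swiotlb= slab-count syntax is the mainline one and the size is illustrative, so whether this particular 2.6.32-5-xen kernel honours it identically is an assumption, not a confirmed fix.

  # Sketch only: grow the software I/O TLB for dom0.
  # In /etc/default/grub, add the parameter to the Linux (dom0) command line;
  # 131072 slabs of 2 KB each is roughly 256 MB instead of the ~64 MB default.
  GRUB_CMDLINE_LINUX="swiotlb=131072"
  # Hypervisor options such as dom0_mem=512M stay on the Xen line of the boot
  # entry (GRUB_CMDLINE_XEN where the grub scripts support it).

  # Regenerate the boot configuration and reboot:
  update-grub && reboot

  # After boot, check what was actually allocated:
  dmesg | grep -i "software IO TLB"

Note that with dom0_mem=512M the bounce-buffer pool competes with the rest of dom0 memory, so a larger swiotlb may also require giving dom0 a little more RAM.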
Bart Coninckx
2010-Nov-16 21:26 UTC
Re: [Xen-users] Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
On Monday 15 November 2010 18:52:05 Ulrich Hochholdinger wrote:
> Hi,
> My dom0 crashes while doing I/O on the local harddrive.
> [...]
> [ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
> [ 5822.499743] BUG: unable to handle kernel NULL pointer dereference at 00000008
> [...]
> Regards,
> Ulli

I have had similar things going on. The culprit was RAID1 on the H200. Performance is plain ugly with this card: about 20 MB/s write speed with bonnie++. In RAID0 you get a whopping 120 MB/s (go figure ...). Try the install on RAID0 to see if things get better. Taking one drive out also doubled the write speed. When you install the Dell OSA software, you can set the Disk Caching Policy to "On", which helps as well.

B.
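For comparison, a typical bonnie++ run behind numbers like the 20 MB/s and 120 MB/s above looks roughly like the sketch below; the mount point and dataset size are placeholders, not values taken from this thread.

  # Rough benchmark sketch (Debian dom0):
  apt-get install bonnie++
  bonnie++ -d /srv/benchmark -s 4096 -u root
  #   -d  directory on the array being measured
  #   -s  dataset size in MB (keep it well above dom0 RAM so page-cache
  #       hits do not inflate the write figure)
  #   -u  user to run as when invoked as root

With dom0_mem=512M, a 4 GB dataset is comfortably larger than dom0 memory, so the sequential write column should reflect the controller rather than the cache.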
Markus Schuster
2010-Nov-19 11:25 UTC
[Xen-users] Re: Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
Ulrich Hochholdinger wrote:
> My dom0 crashes while doing I/O on the local harddrive.
> [..]
> [ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
> [ 5822.499743] BUG: unable to handle kernel NULL pointer

Wow, I really hope that "Out of SW-IOMMU space" problem doesn't come back. We had problems back in late 2007 / early 2008; just search xen-users and xen-devel for "Out of SW-IOMMU space". At that time, the trick was a patch to the dom0 kernel.

Regards,
Markus