Ulrich Hochholdinger
2010-Nov-15 17:52 UTC
[Xen-users] Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
Hi,

My dom0 crashes while doing I/O on the local harddrive.

* System is a "Dell PowerEdge R710" with a "Perc H200" controller ("mpt2sas") / 96GB RAM / 2x XEON X5650
* Harddrives are configured as RAID1.
* OS is Debian Squeeze with
* Xen version 4.0.1 (Debian 4.0.1-1) - amd64 (Xen option: dom0_mem=512M)
* Dom0 kernel (distribution kernel): 2.6.32-5-xen-686 (no special options)
* After doing some moderate I/O on the local RAID1 with "dd if=/dev/zero of=bigfile bs=1024 count=100000" the system crashes.
* Strange: if the RAID1 is degraded, the system doesn't crash when doing I/O over the complete harddrive.

Has someone an idea how to fix or work around this "bug"? In the meantime I tested different settings without any success:
- VT-d enabled / disabled (BIOS and iommu=1)
- dom0_mem=512M (my default) and different settings
- modified swiotlb (without any success)

The last lines the kernel reports:

[ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
[ 5822.499743] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 5822.499919] IP: [<e09a10a4>] _scsih_qcmd+0x412/0x4d0 [mpt2sas]
[ 5822.500024] *pdpt = 0000000001466007 *pde = 0000000000000000
[ 5822.500147] Oops: 0000 [#1] SMP
[ 5822.500269] last sysfs file: /sys/devices/virtual/block/md0/md/mismatch_cnt
[ 5822.500330] Modules linked in: netconsole configfs xen_evtchn xenfs fuse 8021q garp bridge stp reiserfs loop snd_pcm snd_timer ioatdma snd soundcore snd_page_alloc psmouse dca dcdbas serio_raw evdev processor button power_meter pcspkr joydev acpi_processor ext3 jbd mbcache dm_mod raid1 md_mod sg sr_mod sd_mod cdrom crc_t10dif usbhid hid usb_storage uhci_hcd mpt2sas ehci_hcd scsi_transport_sas usbcore nls_base scsi_mod bnx2 thermal thermal_sys [last unloaded: netconsole]
[ 5822.502221]
[ 5822.502272] Pid: 442, comm: md0_raid1 Not tainted (2.6.32-5-xen-686 #1) PowerEdge R710
[ 5822.502348] EIP: 0061:[<e09a10a4>] EFLAGS: 00010002 CPU: 1
[ 5822.502406] EIP is at _scsih_qcmd+0x412/0x4d0 [mpt2sas]
[ 5822.502462] EAX: dd9ba344 EBX: 00000009 ECX: e099b05d EDX: 14000000
[ 5822.502520] ESI: 00000000 EDI: dd145b30 EBP: 0000000f ESP: dd5efd64
[ 5822.502615] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 5822.502679] Process md0_raid1 (pid: 442, ti=dd5ee000 task=c1f4f2c0 task.ti=dd5ee000)
[ 5822.502754] Stack:
[ 5822.502804]  000000b6 dd9ba344 c1dde400 d5000000 94000000 fffffff1 bf145b00 00000000
[ 5822.503086] <0> dd105b00 14000000 dd145b00 c1dde000 dada6240 dd9ba000 dd9b0228 e096597b
[ 5822.503442] <0> dd0a0f90 c1dde000 de50f560 dd9ba000 e096a33c dd0a0f90 c1dde0b0 dada6240
[ 5822.503844] Call Trace:
[ 5822.503907]  [<e096597b>] ? scsi_dispatch_cmd+0x179/0x1e5 [scsi_mod]
[ 5822.503971]  [<e096a33c>] ? scsi_request_fn+0x343/0x47a [scsi_mod]
[ 5822.504032]  [<c1131da3>] ? __generic_unplug_device+0x23/0x25
[ 5822.504091]  [<c11323a4>] ? __make_request+0x364/0x3d9
[ 5822.505487]  [<c107655b>] ? rcu_process_callbacks+0x33/0x39
[ 5822.505546]  [<c103c4f6>] ? __do_softirq+0x128/0x151
[ 5822.505605]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.505663]  [<c1130f81>] ? generic_make_request+0x266/0x2b4
[ 5822.505723]  [<e08f3d12>] ? flush_pending_writes+0x58/0x74 [raid1]
[ 5822.505783]  [<e08f3df3>] ? raid1d+0x61/0xccc [raid1]
[ 5822.505842]  [<c1007c85>] ? __switch_to+0x124/0x141
[ 5822.505900]  [<c1032342>] ? finish_task_switch+0x3c/0x95
[ 5822.505958]  [<c128d196>] ? schedule+0x78f/0x7dc
[ 5822.506015]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.506074]  [<c10066d3>] ? xen_restore_fl_direct_end+0x0/0x1
[ 5822.506133]  [<c128e2f9>] ? _spin_unlock_irqrestore+0xd/0xf
[ 5822.506192]  [<c104241a>] ? try_to_del_timer_sync+0x4f/0x56
[ 5822.506251]  [<c104242b>] ? del_timer_sync+0xa/0x14
[ 5822.506308]  [<c128d512>] ? schedule_timeout+0x89/0xb0
[ 5822.506365]  [<c10424d3>] ? process_timeout+0x0/0x5
[ 5822.506424]  [<c1005fb4>] ? xen_force_evtchn_callback+0xc/0x10
[ 5822.506483]  [<c10066dc>] ? check_events+0x8/0xc
[ 5822.506542]  [<e0acd050>] ? md_thread+0xe1/0xf8 [md_mod]
[ 5822.506601]  [<c104b0ea>] ? autoremove_wake_function+0x0/0x2d
[ 5822.506661]  [<e0accf6f>] ? md_thread+0x0/0xf8 [md_mod]
[ 5822.506718]  [<c104aeb8>] ? kthread+0x61/0x66
[ 5822.506774]  [<c104ae57>] ? kthread+0x0/0x66
[ 5822.506830]  [<c1009a67>] ? kernel_thread_helper+0x7/0x10
[ 5822.506886] Code: 08 89 eb 8b 7c 24 28 eb 48 8b 7c 24 28 e9 a9 00 00 00 8b 44 24 04 83 fb 01 8b 88 14 02 00 00 75 06 8b 54 24 10 eb 04 8b 54 24 24 <0b> 56 08 89 f8 ff 76 10 4b ff 76 0c ff d1 58 89 f0 5a e8 7d 4c
[ 5822.509030] EIP: [<e09a10a4>] _scsih_qcmd+0x412/0x4d0 [mpt2sas] SS:ESP 0069:dd5efd64
[ 5822.509183] CR2: 0000000000000008
[ 5822.509238] ---[ end trace 3c25d9a65cc7a879 ]---

Regards,
Ulli
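"Modified swiotlb" above usually means enlarging the software I/O TLB, i.e. the bounce-buffer pool the mpt2sas driver runs out of in the log. A minimal sketch of doing that from the dom0 kernel command line on Debian Squeeze follows; the swiotlb= slab-count syntax is the mainline one and the size is illustrative, so whether this particular 2.6.32-5-xen kernel honours it identically is an assumption, not a confirmed fix.

  # Sketch only: grow the software I/O TLB for dom0.
  # In /etc/default/grub, add the parameter to the Linux (dom0) command line;
  # 131072 slabs of 2 KB each is roughly 256 MB instead of the ~64 MB default.
  GRUB_CMDLINE_LINUX="swiotlb=131072"
  # Hypervisor options such as dom0_mem=512M stay on the Xen line of the boot
  # entry (GRUB_CMDLINE_XEN where the grub scripts support it).

  # Regenerate the boot configuration and reboot:
  update-grub && reboot

  # After boot, check what was actually allocated:
  dmesg | grep -i "software IO TLB"

Note that with dom0_mem=512M the bounce-buffer pool competes with the rest of dom0 memory, so a larger swiotlb may also require giving dom0 a little more RAM.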
Bart Coninckx
2010-Nov-16 21:26 UTC
Re: [Xen-users] Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
On Monday 15 November 2010 18:52:05 Ulrich Hochholdinger wrote:
> Hi,
> My dom0 crashes while doing I/O on the local harddrive.
> [...]
> [ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
> [ 5822.499743] BUG: unable to handle kernel NULL pointer dereference at 00000008
> [...]
> Regards,
> Ulli

I have had similar things going on. The culprit was RAID1 on the H200. Performance is plain ugly with this card: about 20 MB/s write speed with bonnie++. In RAID0 you get a whopping 120 MB/s (go figure ...). Try the install on RAID0 to see if things get better. Taking one drive out also doubled the write speed. When you install the Dell OSA software, you can set the Disk Caching Policy to "On", which helps as well.

B.
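For comparison, a typical bonnie++ run behind numbers like the 20 MB/s and 120 MB/s above looks roughly like the sketch below; the mount point and dataset size are placeholders, not values taken from this thread.

  # Rough benchmark sketch (Debian dom0):
  apt-get install bonnie++
  bonnie++ -d /srv/benchmark -s 4096 -u root
  #   -d  directory on the array being measured
  #   -s  dataset size in MB (keep it well above dom0 RAM so page-cache
  #       hits do not inflate the write figure)
  #   -u  user to run as when invoked as root

With dom0_mem=512M, a 4 GB dataset is comfortably larger than dom0 memory, so the sequential write column should reflect the controller rather than the cache.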
Markus Schuster
2010-Nov-19 11:25 UTC
[Xen-users] Re: Xen Dom0 crash doing some I/O with "Out of SW-IOMMU space"
Ulrich Hochholdinger wrote:
> My dom0 crashes while doing I/O on the local harddrive.
> [..]
> [ 5822.499666] mpt2sas 0000:03:00.0: DMA: Out of SW-IOMMU space for 65536 bytes.
> [ 5822.499743] BUG: unable to handle kernel NULL pointer

Wow, I really hope that "Out of SW-IOMMU space" problem doesn't come back. We had problems back in late 2007 / early 2008; just search xen-users and xen-devel for "Out of SW-IOMMU space". At that time, the trick was a patch to the dom0 kernel.

Regards,
Markus