Hans Rakers
2007-Dec-17 09:54 UTC
[Xen-users] DomU with PCI passthrough NIC crashes with fatal DMA error
Hi list,
One of my CentOS 5.1 DomUs regularly crashes under network activity.
It uses PCI passthrough for direct access to an Intel
EtherExpress 100 card (e100 driver).
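For context, a passthrough setup like this is normally declared in the DomU config file, which xm reads as Python. A minimal sketch follows; the domain name, memory size and PCI address are placeholders for illustration, not the real values from this machine.

```python
# Sketch of an xm DomU config with a PCI passthrough NIC.
# Every value here is a placeholder (assumption), not taken from the real system.
name = "svn"         # assumed domain name, borrowed from the DomU hostname
memory = 512         # MiB; assumed
vcpus = 1            # the DomU shows a single CPU in /proc/interrupts
pci = ["00:0b.0"]    # hypothetical bus:device.function of the e100 card

# Guest kernel options such as swiotlb=force are typically passed either
# through the guest's own grub.conf (when booting via pygrub) or through
# an 'extra = "..."' line in this file.
```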
I managed to grab a kernel crash backtrace through the xen console:
-----
Fatal DMA error! Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:365
invalid opcode: 0000 [1] SMP
last sysfs file: /block/dm-1/range
CPU 0
Modules linked in: autofs4 hidp l2cap bluetooth sunrpc ipv6 dm_multipath
parport_pc lp parport e100 mii pcspkr dm_snapshot dm_zero dm_mirror
dm_mod xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.1.4.el5xen #1
RIP: e030:[<ffffffff8026e9b3>] [<ffffffff8026e9b3>] dma_map_single+0x16b/0x180
RSP: e02b:ffffffff80616e00 EFLAGS: 00010282
RAX: 000000000000002f RBX: ffff880003c72012 RCX: ffffffff804c9328
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 000000011607e012 R08: ffffffff804c9328 R09: 0000000000001bed
R10: 0000000000004474 R11: ffff8800054ed940 R12: 00000000000005fe
R13: ffff88001ff16070 R14: ffff880006f97012 R15: ffff88001dbf1980
FS: 00002aaaaee6e4e0(0000) GS:ffffffff80599000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff805d8000, task ffffffff804c4a00)
Stack: ffff88001f53a000 ffff88001dbf1960 ffff88001f53a500 0000000000000000
 ffff88001f53a000 ffffffff880d82f6 ffff88001dbf1960 ffff88001f53a500
 0000000000000000 ffffffff880dac60
Call Trace:
<IRQ> [<ffffffff880d82f6>] :e100:e100_rx_alloc_skb+0x93/0x116
[<ffffffff880dac60>] :e100:e100_poll+0x244/0x33b
[<ffffffff80395fcf>] unmask_evtchn+0x2d/0xd7
[<ffffffff8020c603>] net_rx_action+0xa8/0x1b4
[<ffffffff80211f50>] __do_softirq+0x62/0xdd
[<ffffffff8025dd9c>] call_softirq+0x1c/0x280
[<ffffffff8026aa98>] do_softirq+0x31/0x98
[<ffffffff8026a913>] do_IRQ+0xec/0xf5
[<ffffffff803965ae>] evtchn_do_upcall+0x86/0xe0
[<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
<EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff80292086>] rcu_pending+0x26/0x50
[<ffffffff8026be87>] raw_safe_halt+0x84/0xa8
[<ffffffff80269453>] xen_idle+0x38/0x4a
[<ffffffff80247ba9>] cpu_idle+0x97/0xba
[<ffffffff805e2b10>] start_kernel+0x21f/0x224
[<ffffffff805e21ed>] _sinittext+0x1ed/0x1f3
Code: 0f 0b 68 73 f9 46 80 c2 6d 01 59 5b 48 89 e8 5d 41 5c 41 5d
RIP [<ffffffff8026e9b3>] dma_map_single+0x16b/0x180
RSP <ffffffff80616e00>
<0>Kernel panic - not syncing: Fatal exception
-----
I've tried booting the DomU with the 'swiotlb=force' kernel option, but
that crashes the DomU immediately at boot. I also checked for IRQ
conflicts, but as far as I can see there are none:
Dom0:
[root@xen ~]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  1:        309          0          0          0  Phys-irq     i8042
  4:     122094          0          0          0  Phys-irq     serial
  6:          5          0          0          0  Phys-irq     floppy
  8:          0          0          0          0  Phys-irq     rtc
  9:          0          0          0          0  Phys-irq     acpi
 14:    4462155          0       1890          0  Phys-irq     ide0
 16:    1075573          0          0         49  Phys-irq     uhci_hcd:usb4, peth0
 18:          0          0          0          0  Phys-irq     uhci_hcd:usb3
 19:          0          0          0          0  Phys-irq     uhci_hcd:usb1, ehci_hcd:usb5
 20:    2306521      24678        376          0  Phys-irq     uhci_hcd:usb2, ahci
256:   50650289          0          0          0  Dynamic-irq  timer0
257:    1256496          0          0          0  Dynamic-irq  resched0
258:         47          0          0          0  Dynamic-irq  callfunc0
259:          0     407003          0          0  Dynamic-irq  resched1
260:          0        115          0          0  Dynamic-irq  callfunc1
261:          0   10477080          0          0  Dynamic-irq  timer1
262:          0          0     241603          0  Dynamic-irq  resched2
263:          0          0        117          0  Dynamic-irq  callfunc2
264:          0          0    1051159          0  Dynamic-irq  timer2
265:          0          0          0      87279  Dynamic-irq  resched3
266:          0          0          0         90  Dynamic-irq  callfunc3
267:          0          0          0     785090  Dynamic-irq  timer3
268:      26156       1368          0          0  Dynamic-irq  xenbus
NMI:          0          0          0          0
LOC:          0          0          0          0
ERR:          0
MIS:          0
DomU:
[root@svn ~]# cat /proc/interrupts
           CPU0
 21:        719  Phys-irq     eth0
256:       4140  Dynamic-irq  timer0
257:          0  Dynamic-irq  resched0
258:          0  Dynamic-irq  callfunc0
259:        369  Dynamic-irq  xenbus
260:        309  Dynamic-irq  xencons
261:        188  Dynamic-irq  xenfb
262:          0  Dynamic-irq  xenkbd
263:       5810  Dynamic-irq  blkif
NMI:          0
LOC:          0
ERR:          0
MIS:          0
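The comparison of the two listings can also be scripted. Below is a rough sketch, assuming each /proc/interrupts output is available as text with one IRQ per line, and that IRQ numbers from 256 up are per-domain dynamic event channels (as they appear to be in the listings above). The function names and the shortened sample data are made up for illustration.

```python
import re

def parse_interrupts(text):
    """Map IRQ number -> list of device names from /proc/interrupts text."""
    irqs = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue  # header line or NMI/LOC/ERR/MIS counters
        irq, fields = int(m.group(1)), m.group(2).split()
        # Drop the per-CPU counters that follow the IRQ number.
        while fields and fields[0].isdigit():
            fields.pop(0)
        # fields[0] is now the IRQ type (e.g. "Phys-irq"); the rest are devices.
        devices = " ".join(fields[1:])
        irqs[irq] = [d.strip() for d in devices.split(",") if d.strip()]
    return irqs

def physical_overlap(dom0_text, domu_text, dynamic_base=256):
    """Return physical IRQ numbers that show up in both domains."""
    dom0 = parse_interrupts(dom0_text)
    domu = parse_interrupts(domu_text)
    return sorted(i for i in domu if i < dynamic_base and i in dom0)

# Shortened excerpts of the two listings.
DOM0 = """\
 16: 1075573 0 0 49 Phys-irq uhci_hcd:usb4, peth0
 20: 2306521 24678 376 0 Phys-irq uhci_hcd:usb2, ahci
256: 50650289 0 0 0 Dynamic-irq timer0
"""
DOMU = """\
 21: 719 Phys-irq eth0
256: 4140 Dynamic-irq timer0
"""

print(physical_overlap(DOM0, DOMU))  # prints [] : IRQ 21 is not used in Dom0
```

An empty result only says that no physical IRQ number appears in both domains; it does not rule out other causes of the DMA error.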
Dom0 also runs CentOS 5.1 btw.
Anyone have any ideas how to deal with this?
Thanks for your time.
With kind regards,
Hans Rakers
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Hans Rakers
2007-Dec-17 11:23 UTC
Re: [Xen-users] DomU with PCI passthrough NIC crashes with fatal DMA error
In addition, I just tried the CentOS Plus Xen kernel, which includes the
eepro100 driver. With eepro100 it craps out during boot while
configuring the interface. Seems there are some major DMA issues when
using PCI passthrough NICs :{
Backtrace using eepro100 follows:
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Fatal DMA error! Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:365
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/eth0/address
CPU 0
Modules linked in: ipv6 dm_multipath parport_pc lp parport eepro100 mii
pcspkr dm_snapshot dm_zero dm_mirror dm_mod xenblk ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
Pid: 1035, comm: arping Not tainted 2.6.18-53.1.4.el5.centos.plusxen #1
RIP: e030:[<ffffffff8026e9b3>] [<ffffffff8026e9b3>] dma_map_single+0x16b/0x180
RSP: e02b:ffff88001d55bb98 EFLAGS: 00010086
RAX: 000000000000002f RBX: ffff880000df7c02 RCX: ffff88001ff1f070
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 00000001141dec02 R08: ffff88001f1a79f8 R09: ffff88001d55bb08
R10: 0000000000001b3c R11: ffff88001f4c7078 R12: 000000000000002a
R13: ffff88001ff1f070 R14: ffff88001e47b4c0 R15: ffffc200001ea000
FS: 00002aaaab22baf0(0000) GS:ffffffff8059b000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process arping (pid: 1035, threadinfo ffff88001d55a000, task ffff88001f4e0040)
Stack: 00000000000004d0 ffff88001ed4e000 ffff88001f53d500 0000000000000100
 ffff88001f53d000 ffffffff880da78c 0000000000000000 ffff88001f53d000
 ffff88001e47b4c0 0000000000000000
Call Trace:
[<ffffffff880da78c>] :eepro100:speedo_start_xmit+0x136/0x274
[<ffffffff8040df2a>] __qdisc_run+0xf6/0x1bb
[<ffffffff8022fdf5>] dev_queue_xmit+0x1ee/0x313
[<ffffffff8044b5f1>] packet_sendmsg+0x216/0x26c
[<ffffffff802538c4>] sock_sendmsg+0xf3/0x110
[<ffffffff80294356>] autoremove_wake_function+0x0/0x2e
[<ffffffff8026190f>] _read_lock_irq+0x9/0x19
[<ffffffff802071cf>] find_get_page+0x44/0x4b
[<ffffffff8021330d>] filemap_nopage+0x188/0x322
[<ffffffff80208e30>] __handle_mm_fault+0x668/0xf4d
[<ffffffff803f76a5>] sys_sendto+0x11c/0x14f
[<ffffffff803f7894>] move_addr_to_user+0x5d/0x78
[<ffffffff8026187d>] _spin_lock_irq+0x9/0x14
[<ffffffff80228ace>] do_sigaction+0x189/0x19d
[<ffffffff802409af>] do_ioctl+0x21/0x6b
[<ffffffff8025d102>] system_call+0x86/0x8b
[<ffffffff8025d07c>] system_call+0x0/0x8b
Code: 0f 0b 68 38 1b 47 80 c2 6d 01 59 5b 48 89 e8 5d 41 5c 41 5d
RIP [<ffffffff8026e9b3>] dma_map_single+0x16b/0x180
RSP <ffff88001d55bb98>
<0>Kernel panic - not syncing: Fatal exception
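Both traces hit the BUG in dma_map_single, with the NIC driver as the topmost caller: e100_rx_alloc_skb on the receive path in the first trace, speedo_start_xmit on the transmit path in this one. Here is a small sketch for pulling the module frames out of an oops like these, assuming the `[<addr>] :module:function+0xoff/0xlen` frame format seen in the traces above. The helper name is hypothetical.

```python
import re

# One call-trace frame; the ":module:" prefix is only present for
# functions that live in a loadable module.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def module_frames(trace):
    """Return (module, function) pairs for frames inside loadable modules."""
    return [(m.group(1), m.group(2))
            for m in FRAME.finditer(trace)
            if m.group(1)]

# Excerpt from the eepro100 trace.
SAMPLE = """\
Call Trace:
 [<ffffffff880da78c>] :eepro100:speedo_start_xmit+0x136/0x274
 [<ffffffff8040df2a>] __qdisc_run+0xf6/0x1bb
 [<ffffffff8022fdf5>] dev_queue_xmit+0x1ee/0x313
"""

print(module_frames(SAMPLE))  # prints [('eepro100', 'speedo_start_xmit')]
```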
Hans Rakers wrote:
> Hi list,
>
> One of my CentOS 5.1 DomU's is regularly crashing on network activity.
> It has a pci passthrough for direct communication to a Intel
> Etherexpress 100 card (e100 driver).
>
> I managed to grab a kernel crash backtrace through the xen console: