Alexander Menk
2009-Jun-09 09:32 UTC
[Xen-users] Intel Quad NIC made visible in guest -> system crash
Hi!

We have two Intel quad-port NICs (82576, PCI ID 8086:10E8) and use the igb driver 1.3.19.3 on Debian 5.0.1.

I used the pciback.hide Xen kernel parameter and made one of the NIC's interfaces available in a DomU.

Now, when I start the VM, the system crashes (log attached).

I also tried booting with the irqpoll option; the interrupt-disabled message still appears when I try to start the VM.

Is that a serious driver problem or something that is easy to fix?

Best Regards,
Alexander

Jun 9 10:39:56 dom0 kernel: [ 2362.408134] pciback 0000:10:00.1: enabling device (0000 -> 0003)
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] ACPI: PCI Interrupt 0000:10:00.1[B] -> GSI 17 (level, low) -> IRQ 17
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] pciback 0000:10:00.1: Driver tried to write to a read-only configuration space field at offset 0xa8, size 2. This may be harmless, but if you have problems with your device:
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] 1) see permissive attribute in sysfs
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] PCI: Setting latency timer of device 0000:10:00.1 to 64
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] get owner for dev 1 get 7
Jun 9 10:39:56 dom0 kernel: [ 2362.408134] map irq failed
Jun 9 10:39:56 dom0 kernel: [ 2362.412135] get owner for dev 1 get 7
Jun 9 10:39:56 dom0 kernel: [ 2362.412135] map irq failed
Jun 9 10:39:56 dom0 kernel: [ 2362.412135] error enable msi for guest 7 status fffffff0
Jun 9 10:40:01 dom0 /USR/SBIN/CRON[8617]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 10:40:04 dom0 kernel: [ 2370.648947] vif7.0: no IPv6 routers present
Jun 9 10:41:42 dom0 kernel: [ 2468.316595] kjournald starting. Commit interval 5 seconds
Jun 9 10:41:42 dom0 kernel: [ 2468.316595] EXT3 FS on dm-14, internal journal
Jun 9 10:41:42 dom0 kernel: [ 2468.316595] EXT3-fs: recovery complete.
Jun 9 10:41:42 dom0 kernel: [ 2468.316595] EXT3-fs: mounted filesystem with ordered data mode.
Jun 9 10:45:01 dom0 /USR/SBIN/CRON[8639]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 10:50:01 dom0 /USR/SBIN/CRON[8645]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 10:55:01 dom0 /USR/SBIN/CRON[8651]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 11:00:01 dom0 /USR/SBIN/CRON[8657]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 11:05:01 dom0 /USR/SBIN/CRON[8663]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 9 11:07:24 dom0 kernel: [ 4010.214397] irq 17: nobody cared (try booting with the "irqpoll" option)
Jun 9 11:07:24 dom0 kernel: [ 4010.214438] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Jun 9 11:07:24 dom0 kernel: [ 4010.214469]
Jun 9 11:07:24 dom0 kernel: [ 4010.214470] Call Trace:
Jun 9 11:07:24 dom0 kernel: [ 4010.214515] <IRQ> [<ffffffff8037c718>] irq_ignore_unhandled+0x1c/0x32
Jun 9 11:07:24 dom0 kernel: [ 4010.214564] [<ffffffff8025f9f3>] __report_bad_irq+0x30/0x72
Jun 9 11:07:24 dom0 kernel: [ 4010.214595] [<ffffffff8025fcbc>] note_interrupt+0x287/0x2c7
Jun 9 11:07:24 dom0 kernel: [ 4010.214628] [<ffffffff802605a7>] handle_level_irq+0xc3/0x116
Jun 9 11:07:24 dom0 kernel: [ 4010.214661] [<ffffffff8020e13e>] do_IRQ+0x4e/0x9a
Jun 9 11:07:24 dom0 kernel: [ 4010.214691] [<ffffffff8037d42c>] evtchn_do_upcall+0x13c/0x1fc
Jun 9 11:07:24 dom0 kernel: [ 4010.214724] [<ffffffff8020bbde>] do_hypervisor_callback+0x1e/0x30
Jun 9 11:07:24 dom0 kernel: [ 4010.214755] <EOI> [<ffffffff8020e795>] xen_safe_halt+0x90/0xa6
Jun 9 11:07:24 dom0 kernel: [ 4010.214796] [<ffffffff8020a0c8>] xen_idle+0x2e/0x66
Jun 9 11:07:24 dom0 kernel: [ 4010.214825] [<ffffffff80209cd6>] cpu_idle+0x97/0xb9
Jun 9 11:07:24 dom0 kernel: [ 4010.214858]
Jun 9 11:07:24 dom0 kernel: [ 4010.214880] handlers:
Jun 9 11:07:24 dom0 kernel: [ 4010.214903] [<ffffffff8039c058>] (usb_hcd_irq+0x0/0xab)
Jun 9 11:07:24 dom0 kernel: [ 4010.214940] [<ffffffff8039c058>] (usb_hcd_irq+0x0/0xab)
Jun 9 11:07:24 dom0 kernel: [ 4010.214977] [<ffffffffa0183eb0>] (igb_intr+0x0/0x100 [igb])
Jun 9 11:07:24 dom0 kernel: [ 4010.215021] Disabling IRQ #17

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
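Editorial note: the key facts in the log above are the IRQ number in the "nobody cared" line and the list of handlers registered on that line (here both the USB host controller and igb share IRQ 17). A minimal sketch of pulling those two facts out of a dom0 syslog excerpt follows. The function name `unhandled_irq_report` and the regular expressions are illustrative assumptions based only on the message format shown in this post, not on any kernel documentation.

```python
import re


def unhandled_irq_report(log_text):
    """Return (irq, handler_names) for the first 'nobody cared' report, or None.

    Assumes the message layout seen in the log above:
      irq 17: nobody cared ...
      handlers:
      [<address>] (handler_name+0xOFF/0xLEN ...)
    """
    match = re.search(r"irq (\d+): nobody cared", log_text)
    if match is None:
        return None
    irq = int(match.group(1))

    tail = log_text[match.end():]
    start = tail.find("handlers:")
    if start == -1:
        return irq, []

    # Handler lines carry the symbol name in parentheses after the address.
    names = re.findall(r"\]\s+\((\w+)\+0x", tail[start:])
    # Drop duplicates while keeping first-seen order.
    return irq, list(dict.fromkeys(names))


sample = """\
[ 4010.214397] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 4010.214470] Call Trace:
[ 4010.214564] [<ffffffff8025f9f3>] __report_bad_irq+0x30/0x72
[ 4010.214880] handlers:
[ 4010.214903] [<ffffffff8039c058>] (usb_hcd_irq+0x0/0xab)
[ 4010.214940] [<ffffffff8039c058>] (usb_hcd_irq+0x0/0xab)
[ 4010.214977] [<ffffffffa0183eb0>] (igb_intr+0x0/0x100 [igb])
[ 4010.215021] Disabling IRQ #17
"""

print(unhandled_irq_report(sample))
```

A result that lists a dom0 driver next to the passed-through device's driver suggests the interrupt line is shared between dom0 and the device, which is what the rest of the thread goes on to examine.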
Joseph L. Casale
2009-Jun-09 11:38 UTC
RE: [Xen-users] Intel Quad NIC made visible in guest -> system crash
>we have two Intel Quad Nic 82576, PCI ID 8086:10E8 and use the igb
>driver 1.3.19.3 on Debian 5.0.1.
>
>I used the pciback.hide XEN kernel parameter and made one of the NIC's
>interfaces available in a DomU.
>
>Now, when I am starting the VM, the system crashes (log attached)

W/o doing any research myself, I vaguely remember someone here having similar results and suggesting that some NICs have a design such that some ports are tied together as a result of sharing components on the NIC itself. Basically, you may have a NIC that is really only two independent NICs, each with two ports, so you have to pass two in at once etc.

A quick search or test should validate this...

hth,
jlc
Alexander Menk
2009-Jun-09 12:27 UTC
RE: [Xen-users] Intel Quad NIC made visible in guest -> system crash
Hi!

Thanks for the reply.

On Tue, 2009-06-09 at 11:38 +0000, Joseph L. Casale wrote:
> W/o doing any research myself, I vaguely remember someone here having
> similar results and suggesting that some nics have a design such that
> some ports are tied together as a result of sharing components on the
> nic itself. Basically, you may have a nic that is really only two
> independent nics, each with two ports so you have to pass two in at
> once etc.
>
> A quick search or test should validate this...

I already blacklisted all 4 ports of the whole NIC. Next I blacklisted the igb module in dom0 as suggested in http://lists.xensource.com/archives/html/xen-users/2007-10/msg00598.html where Stephan Seitz recommends not using the module in dom0.

I also disabled MSI interrupts in the igb driver (make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install), as the igb readme says there might be some problems.

Now, when starting the domU, I no longer get the message that IRQ #17 was disabled, but I still get:

[ 623.361836] ACPI: PCI Interrupt 0000:10:00.1[B] -> GSI 17 (level, low) -> IRQ 17
[ 623.362307] pciback 0000:10:00.1: Driver tried to write to a read-only configuration space field at offset 0xa8, size 2. This may be harmless, but if you have problems with your device:
[ 623.362310] 1) see permissive attribute in sysfs
[ 623.362311] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
[ 623.362771] PCI: Setting latency timer of device 0000:10:00.1 to 64

When I run ifup eth0 inside the domU, I get the message that the cable is not connected.

The platform is amd64 with two quad-core Intel Xeon CPUs.

In many places I have read that I should use the boot option pciback.permissive; unfortunately my kernel does not support that setting. I would like to avoid recompiling the kernel, and I have read that pciback should work without the permissive flag as well.

Any ideas? Please ...

Best Regards,
Alexander
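Editorial note: the pciback message quoted above points at a "permissive attribute in sysfs", which is a runtime setting and separate from the pciback.permissive boot option. A minimal sketch of writing a device address to such an attribute is shown below. The path `/sys/bus/pci/drivers/pciback/permissive` is an assumption (it is not given anywhere in this thread and may be absent on a given kernel), and `mark_permissive` is a hypothetical helper name. The function only writes if the attribute file already exists.

```python
import re
from pathlib import Path

# ASSUMPTION: location of pciback's permissive list; verify on your own dom0.
PERMISSIVE_ATTR = Path("/sys/bus/pci/drivers/pciback/permissive")

# Full PCI address: domain:bus:slot.function, e.g. 0000:10:00.1
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def mark_permissive(bdf, attr=PERMISSIVE_ATTR):
    """Write a PCI address to the permissive attribute.

    Returns False (and writes nothing) if the attribute does not exist,
    True after a successful write. Raises ValueError for a malformed address.
    """
    if not BDF_RE.match(bdf):
        raise ValueError(f"not a full PCI address: {bdf!r}")
    attr = Path(attr)
    if not attr.exists():
        return False
    attr.write_text(bdf + "\n")
    return True


if __name__ == "__main__":
    # Address taken from the log in this thread; needs root on a real system.
    if not mark_permissive("0000:10:00.1"):
        print("permissive attribute not found; this kernel may not provide it")
```

Permissive mode relaxes pciback's filtering of configuration-space writes, so it is worth trying on a test guest first.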
Alexander Menk
2009-Jun-10 06:12 UTC
RE: [Xen-users] Intel Quad NIC made visible in guest -> system crash
On Tue, 2009-06-09 at 20:50 +0000, Fischer, Anna wrote:
> I am assuming that you are not using the SR-IOV capabilities of the device?

No, I don't. What is the current status of SR-IOV support in Xen?

> The 82576 is a multi-function device. If you do an lspci -t then you should see that all ports have the same bus/slot number and only differ in the last digit which is the function ID. I believe that with the current Xen PCI pass-through you have to co-assign all devices residing under the same PCI bridge to a single guest domain. So you cannot only assign a single port to a guest.
>
> You can also see under /proc/interrupts who is using IRQ 17 (that was disabled due to an interrupt clash). I guess that something in your Dom0 is also using it.

The USB devices seem to use this interrupt as well:

 16:       3796          0          0          0          0          0          0          0  Phys-irq-level     arcmsr
 17:          0          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb1, ehci_hcd:usb4
 18:        737          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb3, eth0
 19:       4468          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb2, peth1

I have now assigned all 4 ports of the device to a single guest, but the error persists. Now I get this error 4 times, once for each device, and IRQs 16, 17, 18 and 19 have problems.

Jun 10 09:03:23 dom0 kernel: [ 369.001440] pciback 0000:0f:00.1: enabling device (0000 -> 0003)
Jun 10 09:03:23 dom0 kernel: [ 369.001554] ACPI: PCI Interrupt 0000:0f:00.1[B] -> GSI 19 (level, low) -> IRQ 19
Jun 10 09:03:23 dom0 kernel: [ 369.002066] pciback 0000:0f:00.1: Driver tried to write to a read-only configuration space field at offset 0xa8, size 2. This may be harmless, but if you have problems with your device:
Jun 10 09:03:23 dom0 kernel: [ 369.002069] 1) see permissive attribute in sysfs
Jun 10 09:03:23 dom0 kernel: [ 369.002070] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.

I have already unloaded the driver for the card in dom0, but how can I stop dom0 from using these interrupts?

Alexander
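Editorial note: the point quoted above, that ports of a multi-function device share a bus/slot number and differ only in the function ID, can be checked by grouping PCI addresses. The sketch below does that for a list of addresses such as one copied from lspci output. Only `0000:10:00.1` and `0000:0f:00.1` appear in this thread; the two `.0` addresses in the sample are assumed siblings added for illustration, and `group_by_slot` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_slot(addresses):
    """Group full PCI addresses (domain:bus:slot.function) by domain:bus:slot.

    Returns a dict mapping each slot to the sorted list of function numbers
    found for it. Functions of one slot belong to the same physical device.
    """
    groups = defaultdict(list)
    for address in addresses:
        slot, _, function = address.rpartition(".")
        groups[slot].append(int(function, 16))
    return {slot: sorted(functions) for slot, functions in groups.items()}


# The .1 addresses come from the logs in this thread; the .0 entries are
# ASSUMED here to stand in for the other port of each controller.
nic_functions = [
    "0000:0f:00.0",
    "0000:0f:00.1",
    "0000:10:00.0",
    "0000:10:00.1",
]

for slot, functions in group_by_slot(nic_functions).items():
    print(slot, "functions:", functions)
```

Every function listed under one slot would have to go to the same guest if the co-assignment rule described in the quoted text applies to the setup.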
Fischer, Anna
2009-Jun-10 08:22 UTC
RE: [Xen-users] Intel Quad NIC made visible in guest -> system crash
> The usb devices seem to use this interrupt as well:
>
>  16:       3796          0          0          0          0          0          0          0  Phys-irq-level     arcmsr
>  17:          0          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb1, ehci_hcd:usb4
>  18:        737          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb3, eth0
>  19:       4468          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb2, peth1

This should not show your peth1 and eth0 devices if you have properly disabled those in Dom0.

Is this the output of the running Xen system when the guest is running too?
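Editorial note: the sharing visible in the quoted /proc/interrupts output can be checked mechanically. The sketch below lists every IRQ line that has more than one device attached. It assumes the layout shown in this thread (IRQ number, per-CPU counters, a single-token interrupt type such as Phys-irq-level, then a comma-separated device list); other kernels format the type column differently, and `shared_irqs` is a hypothetical helper name.

```python
def shared_irqs(proc_interrupts_text):
    """Return {irq: [devices]} for IRQ lines with more than one device."""
    shared = {}
    for line in proc_interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header line or a named interrupt such as NMI
        fields = rest.split()
        # Skip the per-CPU counters at the start of the line.
        index = 0
        while index < len(fields) and fields[index].isdigit():
            index += 1
        # fields[index] is assumed to be the interrupt type; the rest is devices.
        device_text = " ".join(fields[index + 1:])
        devices = [d.strip() for d in device_text.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


sample = """\
 16:       3796          0          0          0          0          0          0          0  Phys-irq-level     arcmsr
 17:          0          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb1, ehci_hcd:usb4
 18:        737          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb3, eth0
 19:       4468          0          0          0          0          0          0          0  Phys-irq-level     uhci_hcd:usb2, peth1
"""

for irq, devices in shared_irqs(sample).items():
    print(irq, devices)
```

On a live dom0 the same function could be fed the contents of /proc/interrupts read from disk instead of the sample string.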
Alexander Menk
2009-Jun-10 08:39 UTC
RE: [Xen-users] Intel Quad NIC made visible in guest -> system crash
On Wed, 2009-06-10 at 08:22 +0000, Fischer, Anna wrote:
> This should not show your peth1 and eth0 device if you have properly disabled those in Dom0.

Why? eth0 and eth1 are onboard interfaces; eth2-5 and eth6-9 are on the two Intel quad NICs. eth1 is using bridging.

> Is this the output of the running Xen system when the guest is running
> too?

Yes, the guest is already running. It's the output taken after the "see permissive attribute in sysfs" messages appeared in syslog.

I am wondering whether the solution is as simple as compiling a kernel that supports setting that permissive attribute. But somehow I don't feel comfortable with that, and maybe it would mess things up even more?

Regards,
Alexander