Hi Alex:

I was looking at an ia64 bug report and noticed that we don't actually free IRQs in the free_irq_vector hypercall. This would eventually lead to alloc_irq_vector failing. Unless I'm mistaken, something like calling pci_disable_device and pci_enable_device can lead to this situation.

So I'm wondering what the original problem was and how we could resolve it without leaking the IRQ. Any ideas?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
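The leak described above can be modelled in a few lines of C. This is a toy sketch, not Xen code: the `toy_*` names, the 8-entry pool and the starting vector 48 are all assumptions made up for illustration. It only shows why a free that releases nothing makes repeated enable/disable cycles run the allocator dry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vector range, chosen only for illustration. */
#define TOY_FIRST_VECTOR 48
#define TOY_NUM_VECTORS  8

static bool toy_vector_used[TOY_NUM_VECTORS];

/* Return the lowest unused vector, or -1 when the pool is exhausted. */
int toy_assign_irq_vector(void)
{
    for (int i = 0; i < TOY_NUM_VECTORS; i++) {
        if (!toy_vector_used[i]) {
            toy_vector_used[i] = true;
            return TOY_FIRST_VECTOR + i;
        }
    }
    return -1;
}

/* A "free" that validates its argument but releases nothing.
 * This models the leak; it is not the real hypercall. */
void toy_free_irq_vector_leaky(int vector)
{
    assert(vector >= TOY_FIRST_VECTOR &&
           vector < TOY_FIRST_VECTOR + TOY_NUM_VECTORS);
    /* Intentionally no toy_vector_used[...] = false here. */
}

/* Count how many enable/disable style cycles succeed before
 * allocation starts failing. */
int toy_cycles_until_exhaustion(void)
{
    int cycles = 0;

    for (;;) {
        int vector = toy_assign_irq_vector();   /* stands in for enable */
        if (vector < 0)
            return cycles;
        toy_free_irq_vector_leaky(vector);      /* stands in for disable */
        cycles++;
    }
}
```

With the 8-entry pool, toy_cycles_until_exhaustion() returns 8, and every later toy_assign_irq_vector() call returns -1, which is the failure mode in question scaled down.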
On Thu, 2007-08-30 at 18:48 +0800, Herbert Xu wrote:
> Hi Alex:
>
> I was looking at an ia64 bug report and noticed that we don't
> actually free IRQs in the free_irq_vector hypercall. This
> would eventually lead to alloc_irq_vector failing. Unless I'm
> mistaken, something like calling pci_disable_device and
> pci_enable_device can lead to this situation.
>
> So I'm wondering what the original problem was and how we could
> resolve it without leaking the IRQ. Any ideas?

Hi Herbert,

I don't think we ever investigated this any further, though there's obviously something wrong there. I believe you're referring to this cset:

http://xenbits.xensource.com/xen-unstable.hg?rev/968caf47b548

Unfortunately, the comment is still true. This is fairly simple to reproduce: hide a PCI device from dom0 using something like pciback.hide=(0000:01:02.1) on the dom0 append line (this is hiding function 1 of a 2 port/function e1000 card). Simply boot dom0, reboot, badness...

Unable to handle kernel paging request at virtual address 0000007366627375
reboot[2703]: Oops 8813272891392 [1]
Modules linked in:

Pid: 2703, CPU 1, comm: reboot
psr : 00001010085a6010 ifs : 800000000000038a ip : [<a0000001000acab0>] Not tainted
ip is at notifier_call_chain+0x30/0xc0
unat: 0000000000000000 pfs : 400000000000038a rsc : 0000000000000007
rnat: 0000000000000000 bsps: 0000000000000000 pr : 000000000055a959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000aed00 b6 : a000000100018610 b7 : a000000100018570
f6 : 000000000000000000000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1 : a0000001011225a0 r2 : 4000000000000792 r3 : a000000100f3e368
r8 : 0000007366627375 r9 : 60000fffff4bfc90 r10 : 0000000000000000
r11 : 0000000000000008 r12 : e0000001b7d57d30 r13 : e0000001b7d50000
r14 : e0000001be05a880 r15 : 0000000100000000 r16 : 0000000000000000
r17 : 0000000001234567 r18 : 0000000000200000 r19 : 0000000000000008
r20 : 2000000000244200 r21 : 0009804c8a70033f r22 : e0000001b7d50f70
r23 : 60000fff7fffc0c8 r24 : 0000000000000000 r25 : 0000000000000000
r26 : c00000000000010a r27 : 0000000000000000 r28 : fffffffffff00031
r29 : 00001213085a6010 r30 : 0000000000000000 r31 : a000000100d0b080

Call Trace:
 [<a00000010001d520>] show_stack+0x40/0xa0 sp=e0000001b7d578e0 bsp=e0000001b7d511a8
 [<a00000010001e180>] show_regs+0x840/0x880 sp=e0000001b7d57ab0 bsp=e0000001b7d51150
 [<a000000100042900>] die+0x1c0/0x380 sp=e0000001b7d57ab0 bsp=e0000001b7d51108
 [<a000000100066970>] ia64_do_page_fault+0x870/0x9a0 sp=e0000001b7d57ad0 bsp=e0000001b7d510b8
 [<a000000100069140>] xen_leave_kernel+0x0/0x3e0 sp=e0000001b7d57b60 bsp=e0000001b7d510b8
 [<a0000001000acab0>] notifier_call_chain+0x30/0xc0 sp=e0000001b7d57d30 bsp=e0000001b7d51068
 [<a0000001000aed00>] blocking_notifier_call_chain+0x40/0x80 sp=e0000001b7d57d30 bsp=e0000001b7d51030
 [<a0000001000afb10>] kernel_restart+0x30/0x120 sp=e0000001b7d57d30 bsp=e0000001b7d51010
 [<a0000001000afff0>] sys_reboot+0x3b0/0x480 sp=e0000001b7d57d30 bsp=e0000001b7d50f90
 [<a000000100014560>] ia64_ret_from_syscall+0x0/0x40 sp=e0000001b7d57e30 bsp=e0000001b7d50f90
 [<a0000000000108e0>] __kernel_syscall_via_break+0x0/0x20 sp=e0000001b7d58000 bsp=e0000001b7d50f90
/etc/rc6.d/S90reboot: line 17: 2703 Segmentation fault reboot -d -f -i

I'd guess this is some kind of double free that we need to track down. Thanks,

Alex

--
Alex Williamson HP Open Source & Linux Org.
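One detail of the Oops above can be checked mechanically: the faulting address 0000007366627375 (also the value in r8) is made up entirely of printable ASCII bytes. ia64 Linux runs little-endian, so reading the bytes least significant first gives the text as it would sit in memory. The helper below is a hypothetical illustration written for this note, not part of any kernel or Xen tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 64-bit value as little-endian text.
 * Bytes are taken least significant first, decoding stops at the
 * first zero byte, and non-printable bytes are shown as '.'.
 * `out` must have room for 8 characters plus the terminator.
 * Returns the number of characters written. */
size_t addr_to_ascii(uint64_t value, char out[9])
{
    size_t len = 0;

    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));

        if (byte == 0)
            break;
        out[len++] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[len] = '\0';
    return len;
}
```

Calling addr_to_ascii(0x0000007366627375ULL, buf) fills buf with "usbfs". That hints that notifier_call_chain followed string data rather than a valid pointer, though on its own it does not say how the string got there.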
Hi Alex:

I followed your steps to hide the e1000 from domain0, adding "pciback.hide=(0000:01:00.0)" to the append line, then rebooted. It seems to be all right. The NaT consumption fault may have been fixed by this patch:
http://xenbits.xensource.com/xen-unstable.hg?rev/7158623a1b3d

Could you please check?

BTW: It seems that we don't free the irq_handler of the e1000, so when rebooting, some warning messages may be printed on the serial port!

Thank you!

Best regards
Duan Ronghui

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Alex Williamson
Sent: Thursday, August 30, 2007 10:35 PM
To: Herbert Xu
Cc: Xen Development Mailing List; xen-ia64-devel
Subject: [Xen-devel] Re: free_irq_vector on ia64
On Mon, 2007-09-03 at 15:27 +0800, Duan, Ronghui wrote:
> Hi Alex:
>
> I followed your steps to hide the e1000 from domain0, adding
> "pciback.hide=(0000:01:00.0)" to the append line, then rebooted.
> It seems to be all right. The NaT consumption fault may have been
> fixed by this patch:
> http://xenbits.xensource.com/xen-unstable.hg?rev/7158623a1b3d
>
> Could you please check?

That could be why I don't get a NaT consumption like the comment indicated, but I still get the Oops. It doesn't seem like there's anything platform specific here, so I'm not sure why you don't see it. Note that I'm hiding function 1 of a two port, two function e1000, while you're hiding function 0. I don't know if that matters, but perhaps it's a variable to play with.

> BTW: It seems that we don't free the irq_handler of the e1000, so when
> rebooting, some warning messages may be printed on the serial
> port!
> Thank you!

Yup, that's a bug in our base 2.6.18 kernel which should go away next time we re-base. However, that's a separate issue. Thanks,

Alex

--
Alex Williamson HP Open Source & Linux Org.
Here's what I see on dom0 bootup if I revert xen-unstable.hg cset 12884 (plus a little debug code):

...
pciback 0000:01:02.1: seizing device [port 1 of e1000]
pciback 0000:02:05.0: seizing device [a tulip nic]
pciback 0000:16:05.0: seizing device [another tulip nic]
...
(XEN) assign_irq_vector(-1)...
(XEN) assign_irq_vector(-1) = 59
GSI 76 (level, low) -> CPU 2 (0x0200) vector 59
ACPI: PCI Interrupt 0000:16:05.0[A] -> GSI 76 (level, low) -> IRQ 59
ACPI: PCI interrupt for device 0000:16:05.0 disabled
GSI 76 (level, low) -> CPU 2 (0x0200) vector 59 unregistered
(XEN) free_irq_vector(59)
...
(XEN) assign_irq_vector(-1)...
(XEN) assign_irq_vector(-1) = 59
GSI 28 (level, low) -> CPU 3 (0x0300) vector 59
ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 28 (level, low) -> IRQ 59
ACPI: PCI interrupt for device 0000:02:05.0 disabled
GSI 28 (level, low) -> CPU 3 (0x0300) vector 59 unregistered
(XEN) free_irq_vector(59)
...
(XEN) assign_irq_vector(-1)...
(XEN) assign_irq_vector(-1) = 59
GSI 32 (level, low) -> CPU 0 (0x0000) vector 59
ACPI: PCI Interrupt 0000:01:02.1[B] -> GSI 32 (level, low) -> IRQ 59
ACPI: PCI interrupt for device 0000:01:02.1 disabled
GSI 32 (level, low) -> CPU 0 (0x0000) vector 59 unregistered
(XEN) free_irq_vector(59)
...
(XEN) assign_irq_vector(-1)...
(XEN) assign_irq_vector(-1) = 59
GSI 19 (level, low) -> CPU 1 (0x0100) vector 59
ACPI: PCI Interrupt 0000:00:02.2[C] -> GSI 19 (level, low) -> IRQ 59
ehci_hcd 0000:00:02.2: EHCI Host Controller
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:02.2: request interrupt 59 failed
ehci_hcd 0000:00:02.2: USB bus 1 deregistered
ACPI: PCI interrupt for device 0000:00:02.2 disabled
GSI 19 (level, low) -> CPU 1 (0x0100) vector 59 unregistered
(XEN) free_irq_vector(59)

So, I think each of the assigns of the hidden devices is from the pci_enable_device() call in pcistub_init_device(). This then immediately calls pciback_reset_device(), which frees the irq.

Note how vector 59 gets tossed around the hidden devices, then ends up being re-used for the USB device, where it doesn't work (at least request_irq() failed). The order of device startup might have more to do with the Oops on shutdown than anything else (maybe a bad error path in the usb shutdown notifier chain).

There's a slightly scary comment in pci_stub.c that we likely run afoul of if we reuse the interrupt vector:

/* HACK: Force device (& ACPI) to determine what IRQ it's on - we
 * must do this here because pcibios_enable_device may specify
 * the pci device's true irq (and possibly its other resources)
 * if they differ from what's in the configuration space.
 * This makes the assumption that the device's resources won't
 * change after this point (otherwise this code may break!)
 */

When I run lspci on these devices, they all show up on IRQ 59. So, not calling free_irq_vector() is a bad hack, but it makes sure the interrupt vector assigned to the hidden devices doesn't get recycled somewhere else. Perhaps we need to make sure pciback_reset_device() doesn't release the vector, but it's not obvious how to do that. Thanks,

Alex

--
Alex Williamson HP Open Source & Linux Org.
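As a way of picturing the trade-off above, here is a toy model in C in which a vector can be marked as one that a free request must not release. Everything in it is hypothetical: the `pin_*` names, the 4-entry table and the choice to start numbering at 59 (only to echo the log) are assumptions for illustration, and nothing here claims to be how Xen or pciback does or should track vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range; 59 is used only to echo the log above. */
#define PIN_FIRST_VECTOR 59
#define PIN_NUM_VECTORS  4

struct pin_vector_state {
    bool used;    /* currently handed out */
    bool pinned;  /* must survive a free request */
};

static struct pin_vector_state pin_table[PIN_NUM_VECTORS];

/* Hand out the lowest unused vector, or -1 if none is left. */
int pin_assign_vector(void)
{
    for (int i = 0; i < PIN_NUM_VECTORS; i++) {
        if (!pin_table[i].used) {
            pin_table[i].used = true;
            return PIN_FIRST_VECTOR + i;
        }
    }
    return -1;
}

/* Mark an allocated vector as belonging to a hidden device. */
void pin_vector(int vector)
{
    int i = vector - PIN_FIRST_VECTOR;

    assert(i >= 0 && i < PIN_NUM_VECTORS && pin_table[i].used);
    pin_table[i].pinned = true;
}

/* Release a vector unless it is pinned.
 * Returns true if the vector was actually released. */
bool pin_free_vector(int vector)
{
    int i = vector - PIN_FIRST_VECTOR;

    assert(i >= 0 && i < PIN_NUM_VECTORS);
    if (pin_table[i].pinned)
        return false;
    pin_table[i].used = false;
    return true;
}
```

In this model a hidden device that takes vector 59 and pins it keeps 59 across a free request, so the next device gets 60 instead of recycling 59, while an unpinned vector is released and handed out again as usual. The open question from the message above, namely where such a marker could be set and honoured in the real code, is not answered by the sketch.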