Marcial Rion
2010-Jan-23 16:28 UTC
[Xen-devel] No ACPI/networking with pv_ops Kernel 2.6.31.6 and Xen on PIIX4 motherboard
Hi,

First of all, I have to state that I am neither a kernel nor a Xen developer. Nevertheless, while trying to use kernel 2.6.31.6 from git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git as a Dom0 kernel, I discovered an issue, and after searching the Internet for a long time I have probably also found the cause. However, I won't be able to fix it myself :-), so I am sharing what I know with this list in the hope that the issue might get fixed sometime :-)...

I will try to give you all the information that seems relevant to me; if it turns out that I left out details about my system (configuration), log files or anything else, I will be glad to provide them. I would also be happy to help test potential patches if required.

I am posting to this list because that is what http://wiki.xensource.com/xenwiki/XenParavirtOps suggests (bottom of page). If this is the wrong place, please give me a short hint so I won't bother you any longer...

Now, let's get into it...

About my system:
I am running Gentoo (10.0, server profile) on an Asus P2B-D motherboard (PIIX4 chipset) with two PIII 500 MHz CPUs and 1 GB of RAM. The system also has three PCI network interfaces based on the Realtek RTL8139 chip (8139too kernel driver). The interface to be used is eth0 (I already checked whether using another interface as eth0 would change anything - without success :-( ).

The issue I have:
While the Xen pv_ops kernel 2.6.31.6 runs perfectly on bare metal, it fails to get network connectivity when run on top of Xen 3.4.1 (Gentoo default installation). The system seems to come up correctly at first sight and the network interface is available (I can ping it locally), but access to the network fails (I cannot ping other systems on the network, nor can they ping mine).

What I discovered so far:
Looking at the boot messages in "dmesg", I discovered that installing the ACPI SCI (System Control Interrupt) handler fails when running on top of Xen, while this error does not happen on bare metal.

With XEN:
*********
bio: create slab <bio-0> at 0
ACPI: SCI (IRQ20) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler 20090521 evevent-161
ACPI: Unable to start the ACPI Interpreter
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805ea0): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
Call Trace:
 [<c043a2db>] warn_slowpath_common+0x60/0x90
 [<c043a33f>] warn_slowpath_fmt+0x24/0x27
 [<c05588cb>] kobject_put+0x27/0x3c
 [<c049e502>] kmem_cache_destroy+0x105/0x11b
 [<c058adc8>] acpi_os_delete_cache+0x8/0xc
 [<c05a6fe6>] acpi_ut_delete_caches+0xd/0x6b
 [<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c05a8067>] acpi_terminate+0x8/0x14
 [<c09049cb>] acpi_init+0x194/0x263
 [<c05f0e66>] ? __class_create+0x44/0x5e
 [<c09021c5>] ? fbmem_init+0x0/0x78
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c0403051>] do_one_initcall+0x4c/0x13a
 [<c08e030d>] kernel_init+0x12c/0x17d
 [<c08e01e1>] ? kernel_init+0x0/0x17d
 [<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da23 ]---
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805f60): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
Call Trace:
 [<c043a2db>] warn_slowpath_common+0x60/0x90
 [<c043a33f>] warn_slowpath_fmt+0x24/0x27
 [<c05588cb>] kobject_put+0x27/0x3c
 [<c049e502>] kmem_cache_destroy+0x105/0x11b
 [<c058adc8>] acpi_os_delete_cache+0x8/0xc
 [<c05a700e>] acpi_ut_delete_caches+0x35/0x6b
 [<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c05a8067>] acpi_terminate+0x8/0x14
 [<c09049cb>] acpi_init+0x194/0x263
 [<c05f0e66>] ? __class_create+0x44/0x5e
 [<c09021c5>] ? fbmem_init+0x0/0x78
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c0403051>] do_one_initcall+0x4c/0x13a
 [<c08e030d>] kernel_init+0x12c/0x17d
 [<c08e01e1>] ? kernel_init+0x0/0x17d
 [<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da24 ]---
sync cpu 0 get result ffffffff max_id 0
Failed to sync pcpu 0
xenbus_probe_backend_init bus registered ok

Without Xen:
************
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: reg 10 32bit mmio: [0xf8000000-0xfbffffff]
pci 0000:00:04.1: reg 20 io port: [0xb800-0xb80f]
pci 0000:00:04.2: reg 20 io port: [0xb400-0xb41f]
* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources
pci 0000:00:04.3: quirk: region e400-e43f claimed by PIIX4 ACPI
pci 0000:00:04.3: quirk: region e800-e80f claimed by PIIX4 SMB
pci 0000:00:04.3: PIIX4 devres B PIO at 0290-0297
pci 0000:00:09.0: reg 10 io port: [0xb000-0xb0ff]
pci 0000:00:09.0: reg 14 32bit mmio: [0xde800000-0xde8000ff]
pci 0000:00:09.0: reg 30 32bit mmio: [0x000000-0x00ffff]
pci 0000:00:0a.0: reg 10 io port: [0xa800-0xa8ff]
pci 0000:00:0a.0: reg 14 32bit mmio: [0xde000000-0xde0000ff]
pci 0000:00:0a.0: supports D1 D2
pci 0000:00:0a.0: PME# supported from D1 D2 D3hot
pci 0000:00:0a.0: PME# disabled
pci 0000:00:0b.0: reg 10 io port: [0xa400-0xa4ff]
pci 0000:00:0b.0: reg 14 32bit mmio: [0xdd800000-0xdd8000ff]
pci 0000:00:0b.0: supports D1 D2
pci 0000:00:0b.0: PME# supported from D1 D2 D3hot
pci 0000:00:0b.0: PME# disabled
pci 0000:01:00.0: reg 10 32bit mmio: [0xe0000000-0xe3ffffff]
pci 0000:01:00.0: reg 14 32bit mmio: [0xdf800000-0xdf87ffff]
pci 0000:01:00.0: reg 18 io port: [0xd800-0xd8ff]
pci 0000:01:00.0: reg 30 32bit mmio: [0xdf7e0000-0xdf7fffff]
pci 0000:01:00.0: supports D1 D2
pci 0000:00:01.0: bridge io port: [0xd000-0xdfff]
pci 0000:00:01.0: bridge 32bit mmio: [0xf4000000-0xf40fffff]
pci 0000:00:01.0: bridge 32bit mmio pref: [0xdf700000-0xe3ffffff]
pci_bus 0000:00: on NUMA node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 *4 5 6 7 9 10 11 12 14 15)
xenbus_probe_backend_init bus registered ok

Corresponding to the error, the /proc/interrupts tables also differ:

With XEN:
*********
           CPU0       CPU1
  1:        426          0   xen-pirq-ioapic-edge  i8042
  3:          0          0   xen-pirq-ioapic-edge  uhci_hcd:usb1
  4:          2          0   xen-pirq-ioapic-edge  serial
  8:          2          0   xen-pirq-ioapic-edge  rtc0
 12:          0          0   xen-pirq-ioapic-edge  eth0
 14:       4319          0   xen-pirq-ioapic-edge  ide0
 15:         42          0   xen-pirq-ioapic-edge  ide1
411:          0          0   xen-dyn-event         xenbus
412:          0        703   xen-dyn-ipi           callfuncsingle1
413:          0          0   xen-dyn-virq          debug1
414:          0          0   xen-dyn-ipi           callfunc1
415:          0      45622   xen-dyn-ipi           resched1
416:          0        311   xen-dyn-ipi           spinlock1
417:          0     153289   xen-dyn-virq          timer1
418:        550          0   xen-dyn-ipi           callfuncsingle0
419:          0          0   xen-dyn-virq          debug0
420:          0          0   xen-dyn-ipi           callfunc0
421:      18071          0   xen-dyn-ipi           resched0
422:        661          0   xen-dyn-ipi           spinlock0
423:     277476          0   xen-dyn-virq          timer0
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:      18071      45622   Rescheduling interrupts
CAL:        550        703   Function call interrupts
TLB:          0          0   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        132        132   Machine check polls
ERR:          0
MIS:          0

Without XEN:
************
           CPU0       CPU1
  0:         46          0   IO-APIC-edge     timer
  1:       2567       4239   IO-APIC-edge     i8042
  6:          3          0   IO-APIC-edge     floppy
  8:          1          1   IO-APIC-edge     rtc0
 14:      28604      27089   IO-APIC-edge     ide0
 15:          0          0   IO-APIC-edge     ide1
 18:       1942       1978   IO-APIC-fasteoi  eth0
 20:          0          0   IO-APIC-fasteoi  acpi
NMI:          0          0   Non-maskable interrupts
LOC:    1097380    1052641   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:     105211     107135   Rescheduling interrupts
CAL:         16         20   Function call interrupts
TLB:       4542       4509   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        289        289   Machine check polls
ERR:          0
MIS:          0

Searching the Internet, I ran across several messages (e.g. http://www.mail-archive.com/kvm@vger.kernel.org/msg26601.html) mentioning that on motherboards with the PIIX4 chipset the SCI is hardwired to IRQ 9. However, on my system it is assigned IRQ 20 on bare metal, and requesting IRQ 20 fails on top of Xen (see the dmesg extract above: "ACPI: SCI (IRQ20) allocation failed").

As I started wondering whether it would work with IRQ 9, and having no knowledge of ACPI and interrupt handling in the kernel, I crudely patched <Kernel-DIR>/drivers/acpi/osl.c in the following manner:

osl.c:391
*********
 acpi_status
 acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,
                                   void *context)
 {
         unsigned int irq;

         acpi_irq_stats_init();

         /*
          * Ignore the GSI from the core, and use the value in our copy of the
          * FADT. It may not be the same if an interrupt source override exists
          * for the SCI.
          */
         gsi = acpi_gbl_FADT.sci_interrupt;
         if (acpi_gsi_to_irq(gsi, &irq) < 0) {
                 printk(KERN_ERR PREFIX "SCI (ACPI GSI %d) not registered\n",
                        gsi);
                 return AE_OK;
         }
+        irq = 9;

         acpi_irq_handler = handler;
         acpi_irq_context = context;
         if (request_irq(irq, acpi_irq, IRQF_SHARED, "acpi", acpi_irq)) {
                 printk(KERN_ERR PREFIX "SCI (IRQ%d) allocation failed\n", irq);
                 return AE_NOT_ACQUIRED;
         }
         acpi_irq_irq = irq;

         return AE_OK;
 }

As you can see, I just "overwrote" the IRQ number that the system had computed with IRQ 9, recompiled the kernel and discovered(!)
that networking was now working, even under Xen (btw: it was still working on bare metal).

Now, I don't know why it works with the SCI mapped to IRQ 20 on bare metal when the SCI is supposed to be hardwired to IRQ 9, but the fact that it works in both cases with IRQ 9 suggests to me that something is "wrong", or at least different, when the pv_ops kernel 2.6.31.6 runs on top of Xen. Perhaps someone could have a look at it sometime, because that's where my knowledge stops...

Thanks & regards,
Marcial

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
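
For anyone who wants to repeat the /proc/interrupts comparison from this report on their own machine, the following is a minimal user-space sketch, not kernel code and not part of the patch above. It assumes the usual layout of /proc/interrupts (a numeric IRQ label followed by a colon, with the handler names at the end of the line); the function name irq_for_device is hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the IRQ number that `device` is attached to in a /proc/interrupts
 * dump, or -1 if no numbered line names that device.
 * `table` is the whole dump as one NUL-terminated string, one entry per line.
 * Lines with non-numeric labels (NMI, LOC, ...) are skipped.
 */
int irq_for_device(const char *table, const char *device)
{
    assert(table != NULL && device != NULL);

    size_t dev_len = strlen(device);
    const char *line = table;

    if (dev_len == 0)
        return -1;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        if (end == NULL)
            end = line + strlen(line);

        /* Skip leading blanks, then expect "<digits>:" for a real IRQ line. */
        const char *p = line;
        while (p < end && (*p == ' ' || *p == '\t'))
            p++;
        const char *num = p;
        while (p < end && isdigit((unsigned char)*p))
            p++;

        if (p > num && p < end && *p == ':') {
            /* Search for the device name as a whole token after the colon. */
            const char *q = p + 1;
            while (q + dev_len <= end) {
                int starts_token = (q[-1] == ' ' || q[-1] == '\t' ||
                                    q[-1] == ',');
                int ends_token = (q + dev_len == end || q[dev_len] == ',' ||
                                  q[dev_len] == ' ' || q[dev_len] == '\t');
                if (starts_token && ends_token &&
                    strncmp(q, device, dev_len) == 0)
                    return atoi(num);
                q++;
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return -1;
}
```

Fed the two tables from this report, such a helper would report eth0 on IRQ 12 under Xen and on IRQ 18 on bare metal, and would find "acpi" only in the bare-metal table (IRQ 20), which is the difference described above.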