Jürgen Keil
2007-Nov-28 16:56 UTC
domU panic, after: 6611846 ... all dom0 interrupts are targeting CPU 0
Hmm, I just bfu'ed a Solaris PV domU to bits compiled from today's opensolaris sources (which have just been tagged with onnv_79); the update included the fix for

6611846 after boot, all dom0 interrupts are targeting CPU 0 in a MP system

The Solaris domU was running with parameter vcpus=2.

Problem: during the reboot after bfu, the Solaris domU panics like this:

# xm console solaris
module /platform/i86xpv/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb91ce53] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffb91ce60, 0xfffffffffbb4d977] data at 0xfffffffffbca2ba0
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffbb4d980, 0xfffffffffbbdcb1f] data at 0xfffffffffbd0afd0
module /kernel/misc/amd64/ctf: text at [0xfffffffffbbdcb20, 0xfffffffffbbe6a1f] data at 0xfffffffffbd262b8
v3.1.0 chgset 'unavailable'
SunOS Release 5.11 Version wos_b79 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
features: 31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
mem = 4194304K (0x100000000)
root nexus = i86xpv
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
xpvd0 at root
xdf@0, xdf1
xdf1 is /xpvd/xdf@0
/xpvd/xdf@0 (xdf1) online
xdf@0: 67108864 blocks
/cpus (cpunex0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
xencons@0, xencons0
xencons0 is /xpvd/xencons@0
cpu0: x86 (chipid 0x0 GenuineIntel 6FB family 6 model 15 step 11 clock 2388 MHz)
cpu0: Intel(r) Core(tm)2 Quad CPU Q6600 @ 2.40GHz

panic[cpu1]/thread=ffffff0007bf1c80: BAD TRAP: type=e (#pf Page fault) rp=ffffff0007bf1af0 addr=17 occurred in module "xpv_psm" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x17
pid=0, pc=0xfffffffff780d09e, sp=0xffffff0007bf1be0, eflags=0x10002
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2620<vmxe,xmme,fxsr,pae>
cr2: 17
rdi: 96 rsi: ffffff0007bf1c80 rdx: ffffff0007bf1ad4
rcx: fffffffffb83d609 r8: 0 r9: 0
rax: 0 rbx: 0 rbp: ffffff0007bf1c00
r10: fffffffffffffffd r11: 282 r12: 104
r13: 1 r14: 0 r15: 0
fsb: 0 gsb: ffffff01ce514000 ds: e02b
es: e02b fs: 0 gs: 0
trp: e err: 2 rip: fffffffff780d09e
cs: e030 rfl: 10002 rsp: ffffff0007bf1be0
ss: e02b

ffffff0007bf19d0 unix:die+c8 ()
ffffff0007bf1ae0 unix:trap+13bd ()
ffffff0007bf1af0 unix:cmntrap+12f ()
ffffff0007bf1c00 xpv_psm:xen_psm_rebind_irq+4e ()
ffffff0007bf1c30 xpv_psm:xen_psm_enable_intr+41 ()
ffffff0007bf1c40 xpv_psm:xen_psm_post_cpu_start+34 ()
ffffff0007bf1c70 unix:mp_startup+3d ()
>> warning! 8-byte aligned %fp = ffffff0007bf1c68
ffffff0007bf1c68 0 ()

panic: entering debugger (no dump device, continue to reboot)

Welcome to kmdb
Loaded modules: [ scsi_vhci neti xpv_psm ufs unix krtld uhci hook genunix ip usba specfs sctp arp ]
[1]> $c
kmdb_enter+0xb()
debug_enter+0x37(fffffffffbc0e370)
panicsys+0x3fd(fffffffffb917388, ffffff0007bf1918, fffffffffbc7a000, 1)
vpanic+0x15d()
panic+0x9c()
die+0xc8(e, ffffff0007bf1af0, 17, 1)
trap+0x13bd(ffffff0007bf1af0, 17, 1)
0xfffffffffb80020f()
xpv_psm`xen_psm_rebind_irq+0x4e(104)
xpv_psm`xen_psm_enable_intr+0x41(fffffffffbc147e0)
xpv_psm`xen_psm_post_cpu_start+0x34()
mp_startup+0x3d()
[1]> xen_psm_rebind_irq+0x4e/i
xpv_psm`xen_psm_rebind_irq+0x4e: movb %bl,0x17(%r8)
[1]> xen_psm_rebind_irq+0x4e::dis
xpv_psm`xen_psm_rebind_irq+0x21: movq $0x1,%rsi
xpv_psm`xen_psm_rebind_irq+0x28: movl $-0x81,%ecx <0xffffff7f>
xpv_psm`xen_psm_rebind_irq+0x2d: andl %ebx,%ecx
xpv_psm`xen_psm_rebind_irq+0x2f: shlq %cl,%rsi
xpv_psm`xen_psm_rebind_irq+0x32: jmp +0x7 <xpv_psm`xen_psm_rebind_irq+0x3b>
xpv_psm`xen_psm_rebind_irq+0x34: movq +0x456f34d(%rip),%rsi <xpv_psm`xen_psm_cpus_online>
xpv_psm`xen_psm_rebind_irq+0x3b: movl %r12d,%edi
xpv_psm`xen_psm_rebind_irq+0x3e: call +0x402e2bd <ec_set_irq_affinity>
xpv_psm`xen_psm_rebind_irq+0x43: movslq %r12d,%r8
xpv_psm`xen_psm_rebind_irq+0x46: movq 0xfffffffffbd7a200(,%r8,8),%r8 <xpv_psm`apic_irq_table>
xpv_psm`xen_psm_rebind_irq+0x4e: movb %bl,0x17(%r8)
xpv_psm`xen_psm_rebind_irq+0x52: popq %r12
xpv_psm`xen_psm_rebind_irq+0x54: popq %rbx
xpv_psm`xen_psm_rebind_irq+0x55: leave
xpv_psm`xen_psm_rebind_irq+0x56: ret
0xfffffffff780d0a7: nop
0xfffffffff780d0aa: nop
0xfffffffff780d0ad: nop
xpv_psm`xen_psm_disable_intr: pushq %rbp
xpv_psm`xen_psm_disable_intr+1: movq %rsp,%rbp
xpv_psm`xen_psm_disable_intr+4: subq $0x10,%rsp
[1]> apic_irq_table::print [0x104]
kmdb: index 104 is outside of array bounds [0 .. ff]
[0x104] = 0
[1]> apic_irq_table::print -t
apic_irq_t *[256] [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
[1]>

xen_psm_rebind_irq() was called with irq 0x104 (%r12 = 104). That is past the end of the 256-entry apic_irq_table, so the load at +0x46 returned 0 into %r8, and the store movb %bl,0x17(%r8) then faulted at address 0x17.

I think the new code in xen_psm_rebind_irq() must check whether the interrupt vector is a valid "PIRQ" vector before using it as an apic_irq_table index. Other parts of the code use something like

    if (irqno >= PIRQ_BASE && irqno < NR_PIRQS &&
        DOMAIN_IS_INITDOMAIN(xen_info)) {
            irqptr = apic_irq_table[irqno];
            ...

or

    if (irq <= APIC_MAX_VECTOR)
            irqptr = apic_irq_table[irq];
    else
            irqptr = NULL;

This message posted from opensolaris.org
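A minimal, self-contained sketch of the bounds check suggested above, modelled on the second snippet. It assumes APIC_MAX_VECTOR is 0xff, which matches the kmdb output (array bounds [0 .. ff], 256 entries). The apic_irq_t here is a stand-in, and its field bound_cpu and the helpers irq_lookup() and rebind_irq_sketch() are hypothetical. None of this is the real xpv_psm code. The sketch only shows that an irq such as 0x104 is rejected instead of being used as a table index.

```c
#include <stddef.h>

/* Assumption: 0xff, matching the [0 .. ff] bounds that kmdb reported. */
#define APIC_MAX_VECTOR 0xff

/* Hypothetical stand-in for the real apic_irq_t; bound_cpu plays the role
 * of the byte that xen_psm_rebind_irq+0x4e stores at offset 0x17. */
typedef struct apic_irq {
	unsigned char bound_cpu;
} apic_irq_t;

/* 256 slots, indices 0 .. APIC_MAX_VECTOR, like the table in the trace. */
static apic_irq_t *apic_irq_table[APIC_MAX_VECTOR + 1];

/* Hypothetical helper: return the entry for irq, or NULL if irq is not a
 * valid table index or the slot is empty. */
static apic_irq_t *
irq_lookup(int irq)
{
	if (irq < 0 || irq > APIC_MAX_VECTOR)
		return (NULL);
	return (apic_irq_table[irq]);
}

/* Hypothetical rebind step: update the entry only when the lookup succeeded.
 * Returns 0 on success, -1 if there is no table entry for irq. */
static int
rebind_irq_sketch(int irq, unsigned char cpu)
{
	apic_irq_t *irqptr = irq_lookup(irq);

	if (irqptr == NULL)
		return (-1);	/* e.g. irq 0x104 from the panic above */
	irqptr->bound_cpu = cpu;
	return (0);
}
```

With this guard, rebind_irq_sketch(0x104, 1) returns -1 without touching memory. A populated slot such as 0x10 is updated normally.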