Jürgen Keil
2007-Nov-28 16:56 UTC
domU panic, after: 6611846 ... all dom0 interrupts are targeting CPU 0
Hmm, I just bfu'ed a Solaris PV domU to bits compiled from today's
opensolaris sources (which have just been tagged with onnv_79);
the update included the fix for
6611846 after boot, all dom0 interrupts are targeting CPU 0 in a MP system
The Solaris domU was running with parameter vcpus=2.
Problem: during the reboot after bfu, the Solaris domU panics,
like this:
# xm console solaris
module /platform/i86xpv/kernel/amd64/unix: text at [0xfffffffffb800000,
0xfffffffffb91ce53] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffb91ce60, 0xfffffffffbb4d977]
data at 0xfffffffffbca2ba0
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffbb4d980,
0xfffffffffbbdcb1f] data at 0xfffffffffbd0afd0
module /kernel/misc/amd64/ctf: text at [0xfffffffffbbdcb20, 0xfffffffffbbe6a1f]
data at 0xfffffffffbd262b8
v3.1.0 chgset 'unavailable'
SunOS Release 5.11 Version wos_b79 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
features:
31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
mem = 4194304K (0x100000000)
root nexus = i86xpv
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
xpvd0 at root
xdf@0, xdf1
xdf1 is /xpvd/xdf@0
/xpvd/xdf@0 (xdf1) online
xdf@0: 67108864 blocks/cpus (cpunex0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
xencons@0, xencons0
xencons0 is /xpvd/xencons@0
cpu0: x86 (chipid 0x0 GenuineIntel 6FB family 6 model 15 step 11 clock 2388 MHz)
cpu0: Intel(r) Core(tm)2 Quad CPU Q6600 @ 2.40GHz
panic[cpu1]/thread=ffffff0007bf1c80: BAD TRAP: type=e (#pf Page fault)
rp=ffffff0007bf1af0 addr=17 occurred in module "xpv_psm" due to a NULL
pointer dereference
#pf Page fault
Bad kernel fault at addr=0x17
pid=0, pc=0xfffffffff780d09e, sp=0xffffff0007bf1be0, eflags=0x10002
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2620<vmxe,xmme,fxsr,pae>
cr2: 17
rdi: 96 rsi: ffffff0007bf1c80 rdx: ffffff0007bf1ad4
rcx: fffffffffb83d609 r8: 0 r9: 0
rax: 0 rbx: 0 rbp: ffffff0007bf1c00
r10: fffffffffffffffd r11: 282 r12: 104
r13: 1 r14: 0 r15: 0
fsb: 0 gsb: ffffff01ce514000 ds: e02b
es: e02b fs: 0 gs: 0
trp: e err: 2 rip: fffffffff780d09e
cs: e030 rfl: 10002 rsp: ffffff0007bf1be0
ss: e02b
ffffff0007bf19d0 unix:die+c8 ()
ffffff0007bf1ae0 unix:trap+13bd ()
ffffff0007bf1af0 unix:cmntrap+12f ()
ffffff0007bf1c00 xpv_psm:xen_psm_rebind_irq+4e ()
ffffff0007bf1c30 xpv_psm:xen_psm_enable_intr+41 ()
ffffff0007bf1c40 xpv_psm:xen_psm_post_cpu_start+34 ()
ffffff0007bf1c70 unix:mp_startup+3d ()
>> warning! 8-byte aligned %fp = ffffff0007bf1c68
ffffff0007bf1c68 0 ()
panic: entering debugger (no dump device, continue to reboot)
Welcome to kmdb
Loaded modules: [ scsi_vhci neti xpv_psm ufs unix krtld uhci hook genunix ip
usba specfs sctp arp ]
[1]> $c
kmdb_enter+0xb()
debug_enter+0x37(fffffffffbc0e370)
panicsys+0x3fd(fffffffffb917388, ffffff0007bf1918, fffffffffbc7a000, 1)
vpanic+0x15d()
panic+0x9c()
die+0xc8(e, ffffff0007bf1af0, 17, 1)
trap+0x13bd(ffffff0007bf1af0, 17, 1)
0xfffffffffb80020f()
xpv_psm`xen_psm_rebind_irq+0x4e(104)
xpv_psm`xen_psm_enable_intr+0x41(fffffffffbc147e0)
xpv_psm`xen_psm_post_cpu_start+0x34()
mp_startup+0x3d()
[1]> xen_psm_rebind_irq+0x4e/i
xpv_psm`xen_psm_rebind_irq+0x4e:movb %bl,0x17(%r8)
[1]> xen_psm_rebind_irq+0x4e::dis
xpv_psm`xen_psm_rebind_irq+0x21:movq $0x1,%rsi
xpv_psm`xen_psm_rebind_irq+0x28:movl $-0x81,%ecx <0xffffff7f>
xpv_psm`xen_psm_rebind_irq+0x2d:andl %ebx,%ecx
xpv_psm`xen_psm_rebind_irq+0x2f:shlq %cl,%rsi
xpv_psm`xen_psm_rebind_irq+0x32:jmp +0x7
<xpv_psm`xen_psm_rebind_irq+0x3b>
xpv_psm`xen_psm_rebind_irq+0x34:
movq +0x456f34d(%rip),%rsi <xpv_psm`xen_psm_cpus_online>
xpv_psm`xen_psm_rebind_irq+0x3b:movl %r12d,%edi
xpv_psm`xen_psm_rebind_irq+0x3e:call +0x402e2bd
<ec_set_irq_affinity>
xpv_psm`xen_psm_rebind_irq+0x43:movslq %r12d,%r8
xpv_psm`xen_psm_rebind_irq+0x46:
movq 0xfffffffffbd7a200(,%r8,8),%r8 <xpv_psm`apic_irq_table>
xpv_psm`xen_psm_rebind_irq+0x4e:movb %bl,0x17(%r8)
xpv_psm`xen_psm_rebind_irq+0x52:popq %r12
xpv_psm`xen_psm_rebind_irq+0x54:popq %rbx
xpv_psm`xen_psm_rebind_irq+0x55:leave
xpv_psm`xen_psm_rebind_irq+0x56:ret
0xfffffffff780d0a7: nop
0xfffffffff780d0aa: nop
0xfffffffff780d0ad: nop
xpv_psm`xen_psm_disable_intr: pushq %rbp
xpv_psm`xen_psm_disable_intr+1: movq %rsp,%rbp
xpv_psm`xen_psm_disable_intr+4: subq $0x10,%rsp
[1]> apic_irq_table::print [0x104]
kmdb: index 104 is outside of array bounds [0 .. ff]
[0x104] = 0
[1]> apic_irq_table::print -t
apic_irq_t *[256] [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
[1]>
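I think the new code in xen_psm_rebind_irq() must check
whether the interrupt vector is a valid "PIRQ" vector before using
it as an index into apic_irq_table. In this panic the irq argument
is 0x104 (%r12), which is past the end of the 256-entry table; the
out-of-bounds read returned 0, and the following byte store to
offset 0x17 from that NULL pointer caused the page fault at addr=0x17.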
Other parts of the code use something like

    if (irqno >= PIRQ_BASE && irqno < NR_PIRQS &&
        DOMAIN_IS_INITDOMAIN(xen_info)) {
            irqptr = apic_irq_table[irqno];
            ...

or

    if (irq <= APIC_MAX_VECTOR)
            irqptr = apic_irq_table[irq];
    else
            irqptr = NULL;
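
To make the suggested check concrete, here is a minimal standalone sketch of a guarded table lookup. It is not the actual xpv_psm code: every name prefixed with sketch_ or SKETCH_ is a hypothetical stand-in, the struct layout and the airq_cpu field are assumptions for illustration, and only the table size of 256 is taken from the "[0 .. ff]" bounds that kmdb printed.

```c
#include <stddef.h>	/* NULL */
#include <stdint.h>	/* uint8_t */

/*
 * Stand-in definitions, NOT the real xpv_psm ones. The table size of 256
 * matches the "[0 .. ff]" bounds reported by kmdb; the struct layout and
 * all sketch_ names are hypothetical.
 */
#define	SKETCH_MAX_VECTOR	0xff

typedef struct sketch_irq {
	uint8_t	airq_cpu;	/* CPU the interrupt is bound to (assumed field) */
} sketch_irq_t;

static sketch_irq_t *sketch_irq_table[SKETCH_MAX_VECTOR + 1];

/* Return the table entry for irq, or NULL if irq is not a valid index. */
sketch_irq_t *
sketch_irq_lookup(int irq)
{
	if (irq < 0 || irq > SKETCH_MAX_VECTOR)
		return (NULL);
	return (sketch_irq_table[irq]);
}

/*
 * Record the new CPU binding only when a table entry exists.
 * Returns 1 if the entry was updated, 0 if the update was skipped.
 */
int
sketch_rebind_irq(int irq, uint8_t cpu)
{
	sketch_irq_t *irqptr = sketch_irq_lookup(irq);

	if (irqptr == NULL)
		return (0);	/* out-of-range irq, or an empty slot */
	irqptr->airq_cpu = cpu;
	return (1);
}
```

With a guard of this kind, a call with irq 0x104 (as in the stack trace) would skip the table update rather than store through a NULL pointer. Whether the real fix should test against APIC_MAX_VECTOR, the PIRQ range, or DOMAIN_IS_INITDOMAIN() is a question for the xpv_psm maintainers.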
This message posted from opensolaris.org