Lu Baolu
2008-Oct-15 08:26 UTC
onnv_98 domain 0 panic on xvm built from latest xvm source code
Hi,

I am trying to build a Nevada domain 0 for an xvm hypervisor that I built from the latest source code. I followed the instructions in the thread below to build xvm:
http://mail.opensolaris.org/pipermail/xen-discuss/2008-May/003278.html

The Solaris domain 0 panicked during boot. The panic output is below.

panic[cpu0]/thread=fffffffffbc736e0:
BAD TRAP: type=e (#pf Page fault) rp=fffffffffbca6090 addr=d occurred in module "unix" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0xd
pid=0, pc=0xfffffffffb846d3b, sp=0xfffffffffbca6180, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2620<vmxe,xmme,fxsr,pae>
cr2: d
rdi: 286 rsi: 0 rdx: fffffffe
rcx: 1 r8: 0 r9: 40000
rax: d rbx: 0 rbp: fffffffffbca61c0
r10: fffffffffbc74ab0 r11: ffffff012fe59000 r12: 0
r13: fffffffffbcb6dc0 r14: 1 r15: ffffff0135bc9580
fsb: 200000000 gsb: fffffffffbc74ab0 ds: 0
es: 0 fs: 0 gs: 0
trp: e err: 2 rip: fffffffffb846d3b
cs: e030 rfl: 10246 rsp: fffffffffbca6180
ss: e02b

cpu address timestamp type vc handler pc
0 fffffffffbc1ffc8 dc8add987 trap e #pf ec_bind_virq_to_irq+ab
0 fffffffffbc1fe40 dc8ac5dfb intr 4 asyintr sti+86
0 fffffffffbc1fcb8 dc8ac53ff intr ff unknown fakesoftint+4a
0 fffffffffbc1fb30 dc8ac386f intr 104 cbe_fire restore_int_flag+fc
0 fffffffffbc1f9a8 dc84d0cb2 intr 104 cbe_fire restore_int_flag+fc
0 fffffffffbc1f820 dc6a1eceb intr 104 cbe_fire restore_int_flag+fc
0 fffffffffbc1f698 dc4c7bd21 intr 104 cbe_fire restore_int_flag+fc
0 fffffffffbc1f510 dc31c5d55 intr 104 cbe_fire restore_int_flag+fc
0 fffffffffbc1f388 dc20e75f1 intr 13 uhci_intr HYPERVISOR_sched_op+29
0 fffffffffbc1f200 dc1efa45a intr 13 uhci_intr HYPERVISOR_sched_op+29

fffffffffbca5f50 unix:die+d2 ()
fffffffffbca6080 unix:trap+162f ()
fffffffffbca6090 unix:cmntrap+24d ()
fffffffffbca61c0 unix:ec_bind_virq_to_irq+ab ()
fffffffffbca61f0 xpv_psm:xen_psm_cpu_start+4b ()
fffffffffbca6210 unix:mach_cpu_start+4a ()
fffffffffbca6270 unix:start_cpu+5e ()
fffffffffbca62b0 unix:start_other_cpus+db ()
fffffffffbca62f0 genunix:main+2bf ()
fffffffffbca6300 unix:_locore_start+80 ()

panic: entering debugger (no dump device, continue to reboot)

Loaded modules: [ scsi_vhci neti xpv_psm zfs uhci hook ip usba specfs sctp arp xpv_uppc ]
kmdb: target stopped at:
kmdb_enter+0xb: movq %rax,%rdi
[0]>

The panicking code is in i86xpv/os/evtchn.c:

707 int
708 ec_bind_virq_to_irq(int virq, int cpu)
709 {
710         int err;
711         int evtchn;
712         mec_info_t *virqp;
713
714         virqp = &virq_info[virq];
715         cmn_err(CE_CONT, "ec_bind_virq_to_irq: virq = %d\n", virq);
716         mutex_enter(&ec_lock);
717
718         err = xen_bind_virq(virq, cpu, &evtchn);
719         ASSERT(err == 0);
720
721         ASSERT(evtchn_to_irq[evtchn] == INVALID_IRQ);
722
723         if (virqp->mi_irq == INVALID_IRQ) {
724                 virqp->mi_irq = alloc_irq(IRQT_VIRQ, virq, evtchn, cpu);
725         } else {
726                 alloc_irq_evtchn(virqp->mi_irq, virq, evtchn, cpu);
727         }
728
729         mutex_exit(&ec_lock);
730         return (virqp->mi_irq);
731 }

The panic happened between lines 729 and 730. The disassembly of this code is:

[0]> ec_bind_virq_to_irq::dis
ec_bind_virq_to_irq+0x95: call -0x97a <alloc_irq>
ec_bind_virq_to_irq+0x9a: movw %ax,0xfffffffffbc46ac0(%r12) <virq_info+0x200>
ec_bind_virq_to_irq+0xa3: movq %r13,%rdi
ec_bind_virq_to_irq+0xa6: call +0x16d35 <mutex_exit>
ec_bind_virq_to_irq+0xab: addb %al,(%rax)
ec_bind_virq_to_irq+0xad: addb %al,(%rax)
ec_bind_virq_to_irq+0xaf: addb %al,(%rax)
ec_bind_virq_to_irq+0xb1: addb %al,(%rax)
ec_bind_virq_to_irq+0xb3: sti
ec_bind_virq_to_irq+0xb4: popq %r14
ec_bind_virq_to_irq+0xb6: popq %r13
ec_bind_virq_to_irq+0xb8: popq %r12
ec_bind_virq_to_irq+0xba: popq %rbx
ec_bind_virq_to_irq+0xbb: leave
ec_bind_virq_to_irq+0xbc: ret

%rax was 0xd when "addb %al,(%rax)" was executed, and that led to the panic. However, before mutex_exit() was called, %rax still contained a valid pointer. I have no idea why it was changed during mutex_exit().
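For readers less familiar with this code path, the intended behaviour of ec_bind_virq_to_irq() can be modeled in user space. This is a hypothetical sketch, not the real xvm code: a pthread mutex stands in for ec_lock, assert() stands in for ASSERT(), and ec_init(), xen_bind_virq(), alloc_irq() (which here drops the IRQT_VIRQ argument), alloc_irq_evtchn() and the table sizes are simplified, made-up stand-ins. The first bind of a VIRQ allocates a new IRQ; a later bind of the same VIRQ (for example on another CPU) routes a new event channel to that same IRQ; and, as in the original, mi_irq is read after the lock is released.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for the kernel definitions. */
#define INVALID_IRQ (-1)
#define NR_VIRQS    24   /* assumed table size */
#define NR_EVTCHNS  64   /* assumed table size */

typedef struct mec_info {
	int mi_irq;          /* IRQ bound to this VIRQ, or INVALID_IRQ */
} mec_info_t;

static pthread_mutex_t ec_lock = PTHREAD_MUTEX_INITIALIZER;
static mec_info_t virq_info[NR_VIRQS];
static int evtchn_to_irq[NR_EVTCHNS];
static int next_evtchn = 1;  /* hypothetical event-channel allocator */
static int next_irq = 16;    /* hypothetical IRQ allocator */

/* Hypothetical helper: mark every VIRQ and event channel as unbound. */
void ec_init(void)
{
	for (int i = 0; i < NR_VIRQS; i++)
		virq_info[i].mi_irq = INVALID_IRQ;
	for (int i = 0; i < NR_EVTCHNS; i++)
		evtchn_to_irq[i] = INVALID_IRQ;
}

/* Stand-in for the hypercall wrapper: hand out a fresh event channel. */
static int xen_bind_virq(int virq, int cpu, int *evtchn)
{
	(void)virq;
	(void)cpu;
	if (next_evtchn >= NR_EVTCHNS)
		return -1;
	*evtchn = next_evtchn++;
	return 0;
}

/* Stand-in: allocate a new IRQ and route the event channel to it. */
static int alloc_irq(int virq, int evtchn, int cpu)
{
	(void)virq;
	(void)cpu;
	evtchn_to_irq[evtchn] = next_irq;
	return next_irq++;
}

/* Stand-in: route an additional event channel to an existing IRQ. */
static void alloc_irq_evtchn(int irq, int virq, int evtchn, int cpu)
{
	(void)virq;
	(void)cpu;
	evtchn_to_irq[evtchn] = irq;
}

int ec_bind_virq_to_irq(int virq, int cpu)
{
	int err;
	int evtchn;
	mec_info_t *virqp = &virq_info[virq];

	pthread_mutex_lock(&ec_lock);

	err = xen_bind_virq(virq, cpu, &evtchn);
	assert(err == 0);
	assert(evtchn_to_irq[evtchn] == INVALID_IRQ);

	if (virqp->mi_irq == INVALID_IRQ)
		virqp->mi_irq = alloc_irq(virq, evtchn, cpu);        /* first bind */
	else
		alloc_irq_evtchn(virqp->mi_irq, virq, evtchn, cpu);  /* reuse IRQ */

	pthread_mutex_unlock(&ec_lock);
	/* As in the original, mi_irq is read after the lock is released. */
	return virqp->mi_irq;
}
```

After ec_init(), calling ec_bind_virq_to_irq(3, 0) and then ec_bind_virq_to_irq(3, 1) returns 16 both times in this model, because the second call takes the alloc_irq_evtchn() branch instead of allocating a new IRQ.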
Another strange thing is that when I add an ASSERT between mutex_exit() and the return statement, the panic disappears. That is:

728
729         mutex_exit(&ec_lock);
            ASSERT(virqp != NULL);
730         return (virqp->mi_irq);
731 }

I would appreciate any feedback.

Thanks