the hardware is Sun Fire X2200 M2, and it's diskless, PXE booted.

this seems to have started sometime before 8.2, and it
'sometimes happens':

FreeBSD 8.2-PRERELEASE #15 r4274: Wed Dec 22 09:11:27 IST 2010
    danny@rnd:/home/obj/rnd/r+d/stable/8/sys/HUJI amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2613.40-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x40f13  Family = f  Model = 41  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
...
SMP: AP CPU #3 Launched!
(cd0:ata0:0:0:0): SCSI status: Check Condition
cpu3 AP:
(cd0:ata0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
ID: 0x03000000 VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
(cd0: lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
ata0:0: timer: 0x000200ef therm: 0x00010000 err: 0x000000f00: pmc: 0x000104000):
Error 6, Unretryable error
SMP: AP CPU #2 Launched!
cd0 at ata0 bus 0 scbus0 target 0 lun 0
cpu2 AP:
cd0: ID: 0x02000000 VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
<TEAC DV-28E-N P.6A> Removable CD-ROM SCSI-0 device
lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
cd0: 33.300MB/s transfers timer: 0x000200ef therm: 0x00010000 err: 0x000000f0 ( pmc: 0x00010400UDMA2,
ATAPI 12bytes, ioapic0: routing intpin 3 (PIO 65534bytesISA IRQ 3)) to lapic 1 vector 48
f
loiwotaapbilce0 :c lreoaunteirn gs tianrttpeidn
4 (cd0: Attempt to query device size failed: NOT READY, Medium not present
ISA IRQ 4) to lapic 2 vector 48
ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 3 vector 48
ioapic0: routing intpin 15 (ISA IRQ 15) to lapic 1 vector 49
ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 2 vector 49
ioapic0: routing intpin 18 (PCI IRQ 18) to lapic 3 vector 49
ioapic0: routing intpin 22 (PCI IRQ 22) to lapic 1 vector 50
ioapic0: routing intpin 23 (PCI IRQ 23) to lapic 2 vector 50
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808b1581
stack pointer           = 0x28:0xffffffff80ef5b20
frame pointer           = 0x28:0xffffffff80ef5b50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
trap_fatal() at trap_fatal+0x290
trap_pfault() at trap_pfault+0x28f
trap() at trap+0x3df
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0xffffffff808b1581, rsp = 0xffffffff80ef5b20, rbp = 0xffffffff80ef5b50 ---
intr_execute_handlers() at intr_execute_handlers+0x21
lapic_handle_intr() at lapic_handle_intr+0x37
Xapic_isr1() at Xapic_isr1+0xa5
--- interrupt, rip = 0xffffffff808b6cf3, rsp = 0xffffffff80ef5c40, rbp = 0xffffffff80ef5c60 ---
spinlock_exit() at spinlock_exit+0x33
ioapic_assign_cpu() at ioapic_assign_cpu+0x123
intr_shuffle_irqs() at intr_shuffle_irqs+0x9d
mi_startup() at mi_startup+0x77
btext() at btext+0x2c
Uptime: 2s

On Wednesday, December 22, 2010 5:12:03 am Daniel Braniss wrote:
> [...]
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> trap_fatal() at trap_fatal+0x290
> trap_pfault() at trap_pfault+0x28f
> trap() at trap+0x3df
> calltrap() at calltrap+0x8
> --- trap 0xc, rip = 0xffffffff808b1581, rsp = 0xffffffff80ef5b20, rbp = 0xffffffff80ef5b50 ---
> intr_execute_handlers() at intr_execute_handlers+0x21
> lapic_handle_intr() at lapic_handle_intr+0x37
> Xapic_isr1() at Xapic_isr1+0xa5
> --- interrupt, rip = 0xffffffff808b6cf3, rsp = 0xffffffff80ef5c40, rbp = 0xffffffff80ef5c60 ---
> spinlock_exit() at spinlock_exit+0x33
> ioapic_assign_cpu() at ioapic_assign_cpu+0x123
> intr_shuffle_irqs() at intr_shuffle_irqs+0x9d
> mi_startup() at mi_startup+0x77
> btext() at btext+0x2c
> Uptime: 2s

Can you do 'l *intr_execute_handlers+0x21' and 'l *ioapic_assign_cpu+0x123'
in 'gdb kernel.debug' of your kernel?

--
John Baldwin

ok, it happened ...

Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

but
a- the 15 seconds never happen :-)
b- there is some magic to get into the debugger but can't find it.

danny

> On Wednesday, December 22, 2010 10:58:56 am Daniel Braniss wrote:
> > > On Wednesday, December 22, 2010 5:12:03 am Daniel Braniss wrote:
> > > > [...]
> > >
> > > Can you do 'l *intr_execute_handlers+0x21' and 'l *ioapic_assign_cpu+0x123'
> > > in 'gdb kernel.debug' of your kernel?
> >
> > sure, as soon as it happens, and it aint happening now :-(
> > but when it will happen, I think it won't let me into the debugger
> > - probably will have to recompile
>
> You don't need to trigger the panic, you can just run
> 'gdb /path/to/kernel.debug' (e.g.
> 'gdb /usr/obj/usr/src/sys/GENERIC/kernel.debug')

sorry, missed the gdb part.

gdb /d/7/boot/kernel/kernel
...
(gdb) l *intr_execute_handlers+0x21
0xffffffff808b1581 is in intr_execute_handlers (/r+d/stable/8/sys/amd64/amd64/intr_machdep.c:243).
238              * We count software interrupts when we process them. The
239              * code here follows previous practice, but there's an
240              * argument for counting hardware interrupts when they're
241              * processed too.
242              */
243             (*isrc->is_count)++;
244             PCPU_INC(cnt.v_intr);
245
246             ie = isrc->is_event;
247
(gdb) l *ioapic_assign_cpu+0x123
0xffffffff808b29c3 is in ioapic_assign_cpu (/r+d/stable/8/sys/amd64/amd64/io_apic.c:383).
378
379             /*
380              * Free the old vector after the new one is established. This is done
381              * to prevent races where we could miss an interrupt.
382              */
383             if (old_vector) {
384                     if (isrc->is_handlers > 0)
385                             apic_disable_vector(old_id, old_vector);
386                     apic_free_vector(old_id, old_vector, intpin->io_irq);
387             }

BTW, the config has
makeoptions DEBUG=-g
but I don't see any kernel.debug (searched the obj directory, and only found
old versions)

danny