Hi,

FreeBSD has been running rock solid on the older i386 HS20s, but the newer ones with an amd64 configuration keep panicking, and I can't quite figure out why. Any help tracking this issue down is greatly appreciated.

The panics happen at random, on average once every two days, though sometimes only 20 minutes apart. They always occur in the tcpserver process, which suggests that this is a network-related issue(?).

Another problem is that the system can't reboot by itself, because there is no keyboard controller, which leaves the filesystems dirty (there is a BROKEN_KEYBOARD_RESET flag on i386, but not on amd64). I have to reboot the machine via BladeCenter management to get it up again.

If there is anything I can do to provide more useful output, please let me know.

Trace:
------------------------------------------
mxtwo# kgdb kernel.debug /var/crash/vmcore.3

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff802cf867
stack pointer           = 0x10:0xffffffffb3ff38b0
frame pointer           = 0x10:0x4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1363 (tcpserver)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 6m22s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (154 pages) ...
ok
chunk 1: 2047MB (523966 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff802cf867
0xffffffff802cf867 is in _mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:544).
539                      * If the current owner of the lock is executing on another
540                      * CPU, spin instead of blocking.
541                      */
542                     owner = (struct thread *)(v & MTX_FLAGMASK);
543     #ifdef ADAPTIVE_GIANT
544                     if (TD_IS_RUNNING(owner)) {
545     #else
546                     if (m != &Giant && TD_IS_RUNNING(owner)) {
547     #endif
548                             turnstile_release(&m->mtx_object);
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0000000000000004 in ?? ()
#2  0xffffffff802d9bf7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff802da291 in panic (fmt=0xffffff005b3fa4c0 "@??[") at /usr/src/sys/kern/kern_shutdown.c:565
#4  0xffffffff80488bff in trap_fatal (frame=0xffffff005b3fa4c0, eva=18446742975736173376) at /usr/src/sys/amd64/amd64/trap.c:660
#5  0xffffffff80489126 in trap (frame={tf_rdi = 56, tf_rsi = -1097980730176, tf_rdx = 6, tf_rcx = 0, tf_r8 = 0, tf_r9 = 0,
    tf_rax = 1, tf_rbx = -1098015721464, tf_rbp = 4, tf_r10 = -2037788432, tf_r11 = -1097980730176, tf_r12 = -1097980730176,
    tf_r13 = -1097438414848, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2143116959, tf_err = 0,
    tf_rip = -2144536473, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1275119424, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:238
#6  0xffffffff8047449b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#7  0xffffffff802cf867 in _mtx_lock_sleep (m=0xffffff005929b808, tid=18446742975728821440, opts=6, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:542
#8  0xffffffff803826bd in ip_ctloutput (so=0x38, sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/ip_output.c:1193
#9  0xffffffff80393bd5 in tcp_ctloutput (so=0xffffff005a83b738, sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff80322068 in sosetopt (so=0xffffff005a83b738, sopt=0xffffffffb3ff3b30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80328536 in kern_setsockopt (td=0xffffff005b3fa4c0, s=1619162408, level=56, name=0, val=0x0,
    valseg=UIO_USERSPACE, valsize=2257178864) at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff803285ae in setsockopt (td=0x38, uap=0xffffff005b3fa4c0) at /usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff80489a51 in syscall (frame={tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488349992,
    tf_rax = 105, tf_rbx = 0, tf_rbp = 0, tf_r10 = 0, tf_r11 = 514, tf_r12 = 3, tf_r13 = 140737488350320, tf_r14 = 0,
    tf_r15 = 0, tf_trapno = 12, tf_addr = 5285992, tf_flags = 12, tf_err = 2, tf_rip = 34368089164, tf_cs = 43,
    tf_rflags = 582, tf_rsp = 140737488350040, tf_ss = 35})
    at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff80474638 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008007f6c4c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up 5
#5  0xffffffff80489126 in trap (frame={tf_rdi = 56, tf_rsi = -1097980730176, tf_rdx = 6, tf_rcx = 0, tf_r8 = 0, tf_r9 = 0,
    tf_rax = 1, tf_rbx = -1098015721464, tf_rbp = 4, tf_r10 = -2037788432, tf_r11 = -1097980730176, tf_r12 = -1097980730176,
    tf_r13 = -1097438414848, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2143116959, tf_err = 0,
    tf_rip = -2144536473, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1275119424, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:238
238                     trap_fatal(&frame, frame.tf_addr);
(kgdb) up
#6  0xffffffff8047449b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
168             call    trap
Current language:  auto; currently asm
(kgdb) up
#7  0xffffffff802cf867 in _mtx_lock_sleep (m=0xffffff005929b808, tid=18446742975728821440, opts=6, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:542
542                     owner = (struct thread *)(v & MTX_FLAGMASK);
Current language:  auto; currently c
(kgdb) list
537     #if defined(SMP) && !defined(NO_ADAPTIVE_MUTEXES)
538                     /*
539                      * If the current owner of the lock is executing on another
540                      * CPU, spin instead of blocking.
541                      */
542                     owner = (struct thread *)(v & MTX_FLAGMASK);
543     #ifdef ADAPTIVE_GIANT
544                     if (TD_IS_RUNNING(owner)) {
545     #else
546                     if (m != &Giant && TD_IS_RUNNING(owner)) {
(kgdb) up
#8  0xffffffff803826bd in ip_ctloutput (so=0x38, sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/ip_output.c:1193
1193                            INP_LOCK(inp);
(kgdb) list
1188                                                m->m_len);
1189                            if (error) {
1190                                    m_free(m);
1191                                    break;
1192                            }
1193                            INP_LOCK(inp);
1194                            error = ip_pcbopts(inp, sopt->sopt_name, m);
1195                            INP_UNLOCK(inp);
1196                            return (error);
1197                    }
(kgdb) print so
$1 = (struct socket *) 0x38
(kgdb) print sopt
$2 = (struct sockopt *) 0xffffffffb3ff3b30
(kgdb) up
#9  0xffffffff80393bd5 in tcp_ctloutput (so=0xffffff005a83b738, sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
1038                    error = ip_ctloutput(so, sopt);
(kgdb) list
1033    #ifdef INET6
1034                    if (INP_CHECK_SOCKAF(so, AF_INET6))
1035                            error = ip6_ctloutput(so, sopt);
1036                    else
1037    #endif /* INET6 */
1038                    error = ip_ctloutput(so, sopt);
1039                    return (error);
1040            }
1041            tp = intotcpcb(inp);
(kgdb) up
#10 0xffffffff80322068 in sosetopt (so=0xffffff005a83b738, sopt=0xffffffffb3ff3b30) at /usr/src/sys/kern/uipc_socket.c:1563
1563                            return ((*so->so_proto->pr_ctloutput)
(kgdb) print so->so_proto->pr_ctloutput
$3 = (pr_ctloutput_t *) 0xffffffff80393ae0 <tcp_ctloutput>
(kgdb) list *0xffffffff80393ae0
0xffffffff80393ae0 is in tcp_ctloutput (/usr/src/sys/netinet/tcp_usrreq.c:1016).
1011     */
1012    int
1013    tcp_ctloutput(so, sopt)
1014            struct socket *so;
1015            struct sockopt *sopt;
1016    {
1017            int     error, opt, optval;
1018            struct  inpcb *inp;
1019            struct  tcpcb *tp;
1020            struct  tcp_info ti;
(kgdb) up
#11 0xffffffff80328536 in kern_setsockopt (td=0xffffff005b3fa4c0, s=1619162408, level=56, name=0, val=0x0,
    valseg=UIO_USERSPACE, valsize=2257178864) at /usr/src/sys/kern/uipc_syscalls.c:1351
1351                    error = sosetopt(so, &sopt);
(kgdb) list
1346
1347            NET_LOCK_GIANT();
1348            error = getsock(td->td_proc->p_fd, s, &fp);
1349            if (error == 0) {
1350                    so = fp->f_data;
1351                    error = sosetopt(so, &sopt);
1352                    fdrop(fp, td);
1353            }
1354            NET_UNLOCK_GIANT();
1355            return(error);
(kgdb) up
#12 0xffffffff803285ae in setsockopt (td=0x38, uap=0xffffff005b3fa4c0) at /usr/src/sys/kern/uipc_syscalls.c:1307
1307            return (kern_setsockopt(td, uap->s, uap->level, uap->name,
(kgdb) up
#13 0xffffffff80489a51 in syscall (frame={tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488349992,
    tf_rax = 105, tf_rbx = 0, tf_rbp = 0, tf_r10 = 0, tf_r11 = 514, tf_r12 = 3, tf_r13 = 140737488350320, tf_r14 = 0,
    tf_r15 = 0, tf_trapno = 12, tf_addr = 5285992, tf_flags = 12, tf_err = 2, tf_rip = 34368089164, tf_cs = 43,
    tf_rflags = 582, tf_rsp = 140737488350040, tf_ss = 35})
    at /usr/src/sys/amd64/amd64/trap.c:792
792                             error = (*callp->sy_call)(td, argp);
(kgdb) list
787                     if ((callp->sy_narg & SYF_MPSAFE) == 0) {
788                             mtx_lock(&Giant);
789                             error = (*callp->sy_call)(td, argp);
790                             mtx_unlock(&Giant);
791                     } else
792                             error = (*callp->sy_call)(td, argp);
793             }
794
795             switch (error) {
796             case 0:
#14 0xffffffff80474638 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
270             call    syscall
Current language:  auto; currently asm
(kgdb) list
265             movq    %r12,TF_R12(%rsp)       /* C preserved */
266             movq    %r13,TF_R13(%rsp)       /* C preserved */
267             movq    %r14,TF_R14(%rsp)       /* C preserved */
268             movq    %r15,TF_R15(%rsp)       /* C preserved */
269             FAKE_MCOUNT(TF_RIP(%rsp))
270             call    syscall
271             movq    PCPU(CURPCB),%rax
272             testq   $PCB_FULLCTX,PCB_FLAGS(%rax)
273             jne     3f
274     1:      /* Check for and handle AST's on return to userland */
(kgdb) up
#15 0x00000008007f6c4c in ?? ()
(kgdb) up
Initial frame selected; you cannot go up.
(kgdb) list
275             cli
276             movq    PCPU(CURTHREAD),%rax
277             testl   $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
278             je      2f
279             sti
280             movq    %rsp, %rdi
281             call    ast
282             jmp     1b
283     2:      /* restore preserved registers */
284             MEXITCOUNT

DMESG:
------------------------------------------
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #0: Tue Oct 3 08:33:25 CEST 2006
    root@mxtwo.nsn.no:/usr/obj/usr/src/sys/BladeSMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2800.11-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20000800<SYSCALL,LM>
  Logical CPUs per core: 2
real memory  = 2147213312 (2047 MB)
avail memory = 2064486400 (1968 MB)
kbd0 at kbdmux0
cpu0 on motherboard
pcib0: <Host to PCI bridge> pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pcib1: <PCI-PCI bridge> at device 3.0 on pci0
pci4: <PCI bus> on pcib1
pcib2: <PCI-PCI bridge> at device 0.0 on pci4
pci6: <PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 0.2 on pci4
pci5: <PCI bus> on pcib3
bge0: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xdcff0000-0xdcffffff irq 7 at device 1.0 on pci5
bge0: Ethernet address: 00:14:5e:3c:94:b6
bge1: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xdcfe0000-0xdcfeffff irq 5 at device 1.1 on pci5
bge1: Ethernet address: 00:14:5e:3c:94:b7
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib4: <PCI-PCI bridge> at device 28.0 on pci0
pci2: <PCI bus> on pcib4
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x4000-0x40ff mem 0xdeff0000-0xdeffffff,0xdefe0000-0xdefeffff irq 10 at device 1.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.15.0
mpt0: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt0: 1 Active Volume (1 Max)
mpt0: 2 Hidden Drive Members (6 Max)
uhci0: <UHCI (generic) USB controller> port 0x2200-0x221f irq 10 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2600-0x261f irq 5 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: <base peripheral> at device 29.4 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 29.5 (no driver attached)
pcib5: <PCI-PCI bridge> at device 30.0 on pci0
pci1: <PCI bus> on pcib5
pci1: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
orm0: <ISA Option ROM> at iomem 0xc0000-0xc8fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
uhub2: Cypress Semiconductor 4 Port Hub, class 9/0, rev 1.10/0.01, addr 2
uhub2: 4 ports with 4 removable, bus powered
ukbd0: IBM PPC I/F, rev 1.10/0.01, addr 3, iclass 3/1
kbd1 at ukbd0
ums0: IBM PPC I/F, rev 1.10/0.01, addr 3, iclass 3/1
ums0: X report 0x0002 not supported
device_attach: ums0 attach returned 6
ukbd1: IBM HIDK/M, rev 1.10/0.01, addr 4, iclass 3/1
kbd2 at ukbd1
ums0: IBM HIDK/M, rev 1.10/0.01, addr 4, iclass 3/1
ums0: 3 buttons and Z dir.
Timecounter "TSC" frequency 2800109935 Hz quality 800
Timecounters tick every 1.000 msec
IP Filter: v4.1.8 initialized. Default = pass all, Logging = enabled
Waiting 5 seconds for SCSI devices to settle
mpt0:vol0(mpt0:0:0): Settings ( Hot-Plug-Spares )
mpt0:vol0(mpt0:0:0): Using Spare Pool: 0
mpt0:vol0(mpt0:0:0): 2 Members:
(mpt0:0:0): Primary
(mpt0:0:1): Secondary
mpt0:vol0(mpt0:0:0): RAID-1 - Optimal
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:0): Physical (mpt0:0:0), Pass-thru (mpt0:1:0)
(mpt0:vol0:0): Online
(mpt0:vol0:1): Physical (mpt0:0:1), Pass-thru (mpt0:1:1)
(mpt0:vol0:1): Online
pass1 at mpt0 bus 1 target 0 lun 0
pass1: <IBM-ESXS ST973401LC FN B41D> Fixed unknown SCSI-4 device
pass1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
pass2 at mpt0 bus 1 target 1 lun 0
pass2: <IBM-ESXS ST973401LC FN B41D> Fixed unknown SCSI-4 device
pass2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0 at mpt0 bus 0 target 0 lun 0
da0: <LSILOGIC 1030 IM IM 1000> Fixed Direct Access SCSI-2 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 69878MB (143110144 512 byte sectors: 255H 63S/T 8908C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /usr was not properly dismounted
bge1: link state changed to UP

--
Med vennlig hilsen / Best regards,
------------------------------------------
Daniel Bond PGP: C822C4BD
------------------------------------------