Vivek Khera
2007-Nov-13 11:29 UTC
reboot after panic: vm_page_unwire: invalid wire count: 0
I've got a Dell 1750 box that was rock-solid stable running 4.11 for a couple of years now operating a pretty busy website backend. A month or so ago we wiped it clean and repurposed it to run a different website running Drupal with a Varnish front-end cache using FreeBSD 6.2-RELEASE-p5. The system is i386 and has 1Gb of RAM. Uname output: FreeBSD mb.kcilink.com 6.2-RELEASE-p5 FreeBSD 6.2- RELEASE-p5 #0: Wed Jun 27 10:47:15 EDT 2007 vivek@dbarchive.m1e.net:/n/lorax1/usr6/obj.i386/n/lorax1/usr6/src/sys/ KCI32SMP i386 The last week or so, it has been crashing regularly. Sometimes twice per day, and sometimes it runs for two days without a problem. I finally managed to make it dump a crashlog and core, and discovered that the panic was: reboot after panic: vm_page_unwire: invalid wire count: 0 I google around and found one old PR #33637 which had a patch but that was for FreeBSD 4.5. I have also found two other mentions of this panic, one on the mailing lists with no responses, and another for a PR from 6.1-PRERELEASE, PR #94578, which has no comments on it. According to the http and varnish logs, we're not being particularly hit very hard when the panic happens, but I don't know if we lose some log data during the panic. I have the core and the kernel.debug. I'm not sure what info to extract from it beyond the backtrace. The watchdog timer fired and dropped me to DDB, so I just typed "watchdog" and "c" and let it finish dumping. Here's the backtrace, and "bt full" output. # kgdb kernel.debug /var/crash/vmcore.0 [GDB will not be able to debug user-mode threads: /usr/lib/ libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: panic: vm_page_unwire: invalid wire count: 0 cpuid = 1 KDB: stack backtrace: kdb_backtrace(100,c5a76000,c0e88ab0,0,d90d82c8,...) at kdb_backtrace +0x29 panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x114 vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68 vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2 getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360 ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at ffs_balloc_ufs2+0x1606 ffs_write(efe80bec) at ffs_write+0x2ec VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee dofilewrite(c5a76000,c,c59c8000,efe80cbc,ffffffff,...) at dofilewrite +0x77 kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b write(c5a76000,efe80d04) at write+0x45 syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp = 0xbfbfe9dc, ebp = 0xbfbfea08 --- Uptime: 1d20h51m58s Dumping 1023 MB (2 chunks) chunk 0: 1MB (159 pages) ... ok chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95interrupt total irq4: sio0 21758 irq15: ata1 1 irq16: bge0 4544565 irq17: bge1 17684238 irq18: amr0 588223 cpu0: timer 323148326 cpu2: timer 323148294 cpu1: timer 323148331 cpu3: timer 323148344 Total 1315432158 KDB: stack backtrace: kdb_backtrace(c069ec5d,4e67e6de,0,c06ea170,c06e9818,...) at kdb_backtrace+0x29 watchdog_fire(c07120e0,c8,efe80634,c065c821,efe8063c,...) at watchdog_fire+0x9d hardclock(efe8063c) at hardclock+0x115 lapic_handle_timer(0) at lapic_handle_timer+0x51 Xtimerint(c4fe6000,1,efe806a8,c066d57b,c4fe6000,...) at Xtimerint+0x30 getit(c4fe6000,c4fe6000,4,efe806c0,c0496f97,...) at getit+0x88 DELAY(1) at DELAY+0x3b amr_quartz_poll_command1(c4fe6000,c51fbff0,0,0,1000,...) at amr_quartz_poll_command1+0x1af amr_setup_polled_dmamap(c51fbff0,c4fef800,1,0) at amr_setup_polled_dmamap+0x94 bus_dmamap_load(c4ffe380,0,c0c22000,10000,c0496cd4,c51fbff0,1) at bus_dmamap_load+0x4b5 amr_quartz_poll_command(c51fbff0) at amr_quartz_poll_command+0x51 amr_dump_blocks(c4fe6000,0,4cb25e,c0c22000,80) at amr_dump_blocks+0x5f amrd_dump(c515b700,c0c22000,0,9964bc00,0,10000) at amrd_dump+0x7c cb_dumpdata(c0711a48,1,c06f44a0) at cb_dumpdata+0x100 foreach_chunk(c0655a78,c06f44a0,c06f44a0) at foreach_chunk+0x23 dumpsys(c06f44a0,efe808b0,c05319be,0,c515b88c,...) at dumpsys+0x1ec doadump(0,c515b88c,483d732f,c4f8998c,c5a76000,...) at doadump+0x42 boot(104,104,c5a76000,c0e88ab0,0,...) at boot+0x4ea panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x1b9 vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68 vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2 getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360 ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at ffs_balloc_ufs2+0x1606 ffs_write(efe80bec) at ffs_write+0x2ec VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee dofilewrite(c5a76000,c,c59c8000,efe80cbc,ffffffff,...) at dofilewrite +0x77 kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b write(c5a76000,efe80d04) at write+0x45 syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp = 0xbfbfe9dc, ebp = 0xbfbfea08 --- KDB: enter: watchdog timeout 79 63 47 31 15 #0 doadump () at pcpu.h:165 165 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) bt full #0 doadump () at pcpu.h:165 No locals. #1 0xc05319be in boot (howto=260) at /n/lorax1/usr6/src/sys/kern/kern_shutdown.c:409 first_buf_printf = 1 #2 0xc0531d15 in panic ( fmt=0xc06b011f "vm_page_unwire: invalid wire count: %d") at /n/lorax1/usr6/src/sys/kern/kern_shutdown.c:565 td = (struct thread *) 0xc5a76000 bootopt = 260 newpanic = 1 ap = 0xc5a76000 "0D???T;?" buf = "vm_page_unwire: invalid wire count: 0", '\0' <repeats 218 times> #3 0xc0633948 in vm_page_unwire (m=0xc0e88ab0, activate=0) at /n/lorax1/usr6/src/sys/vm/vm_page.c:1275 No locals. #4 0xc057b96a in vfs_vmio_release (bp=0xd90d82c8) at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:1470 i = 0 m = 0xc0e88ab0 #5 0xc057c108 in getnewbuf (slpflag=0, slptimeo=0, size=16384, maxsize=16384) at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:1779 qindex = 1 bp = (struct buf *) 0xd90d82c8 nbp = (struct buf *) 0xd907ae20 defrag = 0 nqindex = 1 flushingbufs = 0 #6 0xc057d6ac in getblk (vp=0xc6f81550, blkno=1269, size=16384, slpflag=0, ---Type <return> to continue, or q <return> to quit--- slptimeo=0, flags=0) at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:2497 bsize = 16384 maxsize = 0 vmio = 1 offset = Unhandled dwarf expression opcode 0x93 (kgdb)
On 11/13/07, Vivek Khera <vivek@khera.org> wrote:> I've got a Dell 1750 box that was rock-solid stable running 4.11 for a > couple of years now operating a pretty busy website backend. A month > or so ago we wiped it clean and repurposed it to run a different > website running Drupal with a Varnish front-end cache using FreeBSD > 6.2-RELEASE-p5. The system is i386 and has 1Gb of RAM. > > Uname output: FreeBSD mb.kcilink.com 6.2-RELEASE-p5 FreeBSD 6.2- > RELEASE-p5 #0: Wed Jun 27 10:47:15 EDT 2007 > vivek@dbarchive.m1e.net:/n/lorax1/usr6/obj.i386/n/lorax1/usr6/src/sys/ > KCI32SMP i386 > > > The last week or so, it has been crashing regularly. Sometimes twice > per day, and sometimes it runs for two days without a problem. I > finally managed to make it dump a crashlog and core, and discovered > that the panic was: > > reboot after panic: vm_page_unwire: invalid wire count: 0 > > I google around and found one old PR #33637 which had a patch but that > was for FreeBSD 4.5. I have also found two other mentions of this > panic, one on the mailing lists with no responses, and another for a > PR from 6.1-PRERELEASE, PR #94578, which has no comments on it. > > According to the http and varnish logs, we're not being particularly > hit very hard when the panic happens, but I don't know if we lose some > log data during the panic. > > I have the core and the kernel.debug. I'm not sure what info to > extract from it beyond the backtrace. The watchdog timer fired and > dropped me to DDB, so I just typed "watchdog" and "c" and let it > finish dumping. > > Here's the backtrace, and "bt full" output. > > > # kgdb kernel.debug /var/crash/vmcore.0 > [GDB will not be able to debug user-mode threads: /usr/lib/ > libthread_db.so: Undefined symbol "ps_pglobal_lookup"] > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and > you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "i386-marcel-freebsd". > > Unread portion of the kernel message buffer: > panic: vm_page_unwire: invalid wire count: 0 > cpuid = 1 > KDB: stack backtrace: > kdb_backtrace(100,c5a76000,c0e88ab0,0,d90d82c8,...) at kdb_backtrace > +0x29 > panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x114 > vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68 > vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2 > getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc > getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360 > ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at > ffs_balloc_ufs2+0x1606 > ffs_write(efe80bec) at ffs_write+0x2ec > VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce > vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee > dofilewrite(c5a76000,c,c59c8000,efe80cbc,ffffffff,...) at dofilewrite > +0x77 > kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b > write(c5a76000,efe80d04) at write+0x45 > syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp > 0xbfbfe9dc, ebp = 0xbfbfea08 --- > Uptime: 1d20h51m58s > Dumping 1023 MB (2 chunks) > chunk 0: 1MB (159 pages) ... ok > chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879 > 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 > 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 > 319 303 287 271 255 239 223 207 191 175 159 143 127 111 > 95interrupt total > irq4: sio0 21758 > irq15: ata1 1 > irq16: bge0 4544565 > irq17: bge1 17684238 > irq18: amr0 588223 > cpu0: timer 323148326 > cpu2: timer 323148294 > cpu1: timer 323148331 > cpu3: timer 323148344 > Total 1315432158 > KDB: stack backtrace: > kdb_backtrace(c069ec5d,4e67e6de,0,c06ea170,c06e9818,...) at > kdb_backtrace+0x29 > watchdog_fire(c07120e0,c8,efe80634,c065c821,efe8063c,...) at > watchdog_fire+0x9d > hardclock(efe8063c) at hardclock+0x115 > lapic_handle_timer(0) at lapic_handle_timer+0x51 > Xtimerint(c4fe6000,1,efe806a8,c066d57b,c4fe6000,...) at Xtimerint+0x30 > getit(c4fe6000,c4fe6000,4,efe806c0,c0496f97,...) at getit+0x88 > DELAY(1) at DELAY+0x3b > amr_quartz_poll_command1(c4fe6000,c51fbff0,0,0,1000,...) at > amr_quartz_poll_command1+0x1af > amr_setup_polled_dmamap(c51fbff0,c4fef800,1,0) at > amr_setup_polled_dmamap+0x94 > bus_dmamap_load(c4ffe380,0,c0c22000,10000,c0496cd4,c51fbff0,1) at > bus_dmamap_load+0x4b5 > amr_quartz_poll_command(c51fbff0) at amr_quartz_poll_command+0x51 > amr_dump_blocks(c4fe6000,0,4cb25e,c0c22000,80) at amr_dump_blocks+0x5f > amrd_dump(c515b700,c0c22000,0,9964bc00,0,10000) at amrd_dump+0x7c > cb_dumpdata(c0711a48,1,c06f44a0) at cb_dumpdata+0x100 > foreach_chunk(c0655a78,c06f44a0,c06f44a0) at foreach_chunk+0x23 > dumpsys(c06f44a0,efe808b0,c05319be,0,c515b88c,...) at dumpsys+0x1ec > doadump(0,c515b88c,483d732f,c4f8998c,c5a76000,...) at doadump+0x42 > boot(104,104,c5a76000,c0e88ab0,0,...) at boot+0x4ea > panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x1b9 > vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68 > vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2 > getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc > getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360 > ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at > ffs_balloc_ufs2+0x1606 > ffs_write(efe80bec) at ffs_write+0x2ec > VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce > vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee > dofilewrite(c5a76000,c,c59c8000,efe80cbc,ffffffff,...) at dofilewrite > +0x77 > kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b > write(c5a76000,efe80d04) at write+0x45 > syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp > 0xbfbfe9dc, ebp = 0xbfbfea08 --- > KDB: enter: watchdog timeout > 79 63 47 31 15 > > #0 doadump () at pcpu.h:165 > 165 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); > (kgdb) bt full > #0 doadump () at pcpu.h:165 > No locals. > #1 0xc05319be in boot (howto=260) > at /n/lorax1/usr6/src/sys/kern/kern_shutdown.c:409 > first_buf_printf = 1 > #2 0xc0531d15 in panic ( > fmt=0xc06b011f "vm_page_unwire: invalid wire count: %d") > at /n/lorax1/usr6/src/sys/kern/kern_shutdown.c:565 > td = (struct thread *) 0xc5a76000 > bootopt = 260 > newpanic = 1 > ap = 0xc5a76000 "0D???T;?" > buf = "vm_page_unwire: invalid wire count: 0", '\0' <repeats 218 times> > #3 0xc0633948 in vm_page_unwire (m=0xc0e88ab0, activate=0) > at /n/lorax1/usr6/src/sys/vm/vm_page.c:1275 > No locals. > #4 0xc057b96a in vfs_vmio_release (bp=0xd90d82c8) > at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:1470 > i = 0 > m = 0xc0e88ab0 > #5 0xc057c108 in getnewbuf (slpflag=0, slptimeo=0, size=16384, > maxsize=16384) > at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:1779 > qindex = 1 > bp = (struct buf *) 0xd90d82c8 > nbp = (struct buf *) 0xd907ae20 > defrag = 0 > nqindex = 1 > flushingbufs = 0 > #6 0xc057d6ac in getblk (vp=0xc6f81550, blkno=1269, size=16384, > slpflag=0, > ---Type <return> to continue, or q <return> to quit--- > slptimeo=0, flags=0) at /n/lorax1/usr6/src/sys/kern/vfs_bio.c:2497 > bsize = 16384 > maxsize = 0 > vmio = 1 > offset = Unhandled dwarf expression opcode 0x93 > (kgdb) >Do you happen to have ZERO_COPY_SOCKETS in your kernel config?> _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" >-- Mahnahmahnah!