Rink Springer
2006-Nov-16 08:24 UTC
panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?
Hi people,

At work, some storage is put on a SUN StorEdge 3510 FibreChannel array; the disks are divided into two volumes, which are accessed using isp(4) controllers. Two separate machines each mount a single volume (that is, box1 mounts the first volume (/dev/da1) and box2 mounts the second volume (/dev/da2)). Both volumes are formatted using UFS2.

Over the night, we reset the shelf in order to activate its new management IP address, causing the /dev/da[12] devices to be temporarily unavailable. This resulted in the following panic on the rather busy mailstorage server (the other server has minor load and was fine):

---
(da0:isp0:0:1:0): lost device
(da0:isp0:0:1:0): removing device entry
(da1:isp0:0:2:0): lost device
g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 78035 (lmtpd)
g_vfs_done():da1s1[WRITE(offset=240287318016, length=1638(da1:isp0:0:2:0): Invalidating pack
4)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
nic: vinvalbuf: dirty bufs
cpuid = 2
Uptime: 54d15h48m38s
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x56
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc0681303
stack pointer		= 0x28:0xe8d973f0
frame pointer		= 0x28:0xe8d973f8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= resume, IOPL = 0
current process		= 78066 (lmtpd)
trap number		= 12
---

When looking at the source code of vinvalbuf(), which calls bufobj_invalbuf(), it seems that this panic is raised when a bufobj still contains dirty data after waiting for its I/O to complete without error. The code can be found in /sys/kern/vfs_subr.c.

The sync routine called eventually translates to bufsync(), as in /sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It seems as if the return status of vfs_bio_awrite() in ffs_syncvnode() is not checked; all the other parts are checked. I believe this could provoke this panic.

As the machine is in production use, it was instantly rebooted by a colleague, and thus I have no vmcore, backtrace or anything. I therefore hope the information provided here is adequate.

Can someone with more FreeBSD-VFS knowledge please look at this?

Thanks!

-- 
Rink P.W. Springer - http://rink.nu
"It's you isn't it? THE BASTARD OPERATOR FROM HELL!"
"In the flesh, on the phone and in your account..."
    - BOFH #3
Jilles Tjoelker
2006-Nov-27 06:54 UTC
panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?
On Thu, Nov 16, 2006 at 09:24:07AM +0100, Rink Springer wrote:

> Over the night, we reset the shelf in order to activate its new
> management IP address, causing the /dev/da[12] devices to be temporarily
> unavailable. This resulted in the following panic on the rather busy
> mailstorage server (the other server has minor load and was fine):

> ---
> (da0:isp0:0:1:0): lost device
> (da0:isp0:0:1:0): removing device entry
> (da1:isp0:0:2:0): lost device
> g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
> g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 78035 (lmtpd)
> g_vfs_done():da1s1[WRITE(offset=240287318016,
> length=1638(da1:isp0:0:2:0): Invalidating pack
> 4)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
> nic: vinvalbuf: dirty bufs
> cpuid = 2
> Uptime: 54d15h48m38s

> When looking at the source code of vinvalbuf(), which calls
> bufobj_invalbuf(), it seems that this panic is raised after a bufobj
> still contains dirty data after waiting for it to complete without
> error. The code can be found at /sys/kern/vfs_subr.c

Note that this panic can only occur if vinvalbuf() is called with V_SAVE (save modified data first). The exact condition for the panic is better described as: a bufobj still contains dirty data, or still has output in progress, after a successful synchronous BO_SYNC operation.
bufobj_wwait() cannot return an error unless msleep() fails (e.g. an interruptible sleep was requested via slpflag and a signal occurred). If the I/O has failed, bufobj_wwait() will still return success.

> The sync routine called eventually translates to bufsync(), as in
> /sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It seems
> as if the return status of vfs_bio_awrite() in ffs_syncvnode() is not
> checked; all the other parts are checked. I believe this could provoke
> this panic.

There does not seem to be much point in checking an asynchronous write's result anyway, as the I/O has not completed yet.

I don't understand well what the code is doing with async writes. For all but the last pass (see below), it calls bawrite() on the buffer, which sets B_ASYNC and then calls bwrite(). For the last pass, it calls bwrite() directly (has something cleared B_ASYNC?) and returns an error if it fails. bwrite() itself is an inline function defined in /sys/sys/buf.h, which calls BO_WRITE after some KASSERTs.

> As the machine is in production use, it was instantly rebooted by a
> colleague and thus I have no vmcore, backtrace or anything. I therefore
> hope the information provided here is adequate.

> Can someone with more FreeBSD-VFS knowledge please look at this?

There is another possible problem, from this comment in /sys/ufs/ffs/ffs_vnops.c ffs_syncvnode():

	/*
	 * Block devices associated with filesystems may
	 * have new I/O requests posted for them even if
	 * the vnode is locked, so no amount of trying will
	 * get them clean. Thus we give block devices a
	 * good effort, then just give up. For all other file
	 * types, go around and try again until it is clean.
	 */

Actually it just does NIADDR + 1 (four) passes and then gives up. If DIAGNOSTIC is enabled, it will then print the affected vnode if it is not a disk. This failure is not reflected in ffs_syncvnode()'s return value, so if it occurs when ffs_syncvnode() is called from bufobj_invalbuf(), a panic will result.
Suppose ffs_syncvnode() were changed to return an error in this case. bufobj_invalbuf()/vinvalbuf() would handle a BO_SYNC/ffs_syncvnode() error by aborting with an error return. In most cases this would cause the operation invoking vinvalbuf() to fail. However, in at least one case (vm_object_terminate()), the error would be ignored; this may leave stale data around or dangling references.

-- 
Jilles Tjoelker