Hi all!

http://pastebin.com/niXrjF0D

Please refer to the full output from the crash above.

This morning our IMAP server decided to go belly up. I could not remote in, and the machine would not respond to any pings.

Checking the physical console, I had the following worrisome messages on screen:

  g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
  g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
  /mnt/USBBD: got error 16 while accessing filesystem
  panic: softdep_deallocate_dependencies: unrecovered I/O error
  cpuid = 5

/mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP data using rsync. Everything so far has worked without issue.

I also noticed a bunch of:

  fstat: can't read file 2 at 0x4000000001fffff
  fstat: can't read file 4 at 0x780000ffff
  fstat: can't read file 5 at 0x600000000
  fstat: can't read file 1 at 0x200007fffffffff
  fstat: can't read file 2 at 0x4000000001fffff
  fstat: can't read file 4 at 0x780000ffff
  fstat: can't read file 5 at 0x600000000

but I have no idea what these are from.

df -h output:

  /dev/da0p2    1.8T    226G    1.5T    13%    /
  devfs         1.0K    1.0K      0B   100%    /dev
  /dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD

da0p2 is a RAID level 5 on an HP Smart Array.

Here is the output of dmesg after reboot:
http://pastebin.com/rHVjgZ82

Obviously neither the RAID nor the USB drive walked away from the crash cleanly. Should I be running an fsck on both from single-user mode at this point to verify and clean up? My concern is the:

  WARNING: /: mount pending error: blocks 0 files 26

when mounting /dev/da0p2.

For some reason I was under the impression that fsck was run automatically on reboot.

Any help in this matter would be greatly appreciated. I'm a little concerned that a backup strategy that has worked for us for many MANY years would so easily throw the OS into a panic. If an I/O error occurred on the USB drive, I would frankly think it should just back out, without a panic. Or am I missing something?

Any recommendations / insights would be most welcome.

Shawn
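
For readers following along, the two g_vfs_done() lines in the report can be decoded with a few lines of Python. This is only a rough sketch: the names `CONSOLE`, `PATTERN`, and `decode` are made up for illustration, and the errno names are looked up on whatever host runs the script, on the assumption that its numbering matches the crashed machine's (on FreeBSD, 5 is EIO and 16 is EBUSY).

```python
import errno
import os
import re

# The two console lines from the report, copied verbatim.
CONSOLE = """\
g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
"""

# Hypothetical pattern written for this message shape only.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<err>\d+)"
)


def decode(text):
    """Return one dict per g_vfs_done line found in text."""
    rows = []
    for m in PATTERN.finditer(text):
        err = int(m["err"])
        rows.append({
            "device": m["dev"],
            "op": m["op"],
            # Byte offset into the partition, expressed in TiB.
            "offset_tib": int(m["offset"]) / 2**40,
            "length_kib": int(m["length"]) // 1024,
            "errno": err,
            # Symbolic name and message from the local errno table
            # (assumed to match the numbering on the crashed host).
            "name": errno.errorcode.get(err, "unknown"),
            "text": os.strerror(err),
        })
    return rows


if __name__ == "__main__":
    for r in decode(CONSOLE):
        print(
            f"{r['device']} {r['op']:5} {r['length_kib']} KiB "
            f"at {r['offset_tib']:.3f} TiB: "
            f"errno {r['errno']} = {r['name']} ({r['text']})"
        )
```

Running it prints the device, operation, transfer size, offset, and symbolic error for each line, which may make it easier to compare the failing offsets against the size of the partition.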
On 2/6/2017 15:01, Shawn Bakhtiar wrote:
> g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
> g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
> /mnt/USBBD: got error 16 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error
>
> My concern is the:
> WARNING: /: mount pending error: blocks 0 files 26
> when mounting /dev/da0p2
>
> For some reason I was under the impression that fsck was run automatically on reboot.
>
> If an I/O error occurred on the USB Drive I would frankly think it should just back out, without panic. Or am I missing something?

The "mount pending error" is normal on a disk that has softupdates turned on; fsck runs in the background after the boot, and this is "safe" because of how the metadata and data writes are ordered. In other words, the filesystem in this situation is missing uncommitted data, but the state of the system is consistent. As a result, the system can mount root read-write without having to fsck it first, and the background cleanup is safe from a disk consistency problem.

The panic itself appears to have resulted from an I/O error that caused an operation to fail. I was part of a thread on this in 2016, which you can find here:

https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html

The basic problem is that the softupdates code cannot deal with a hard I/O error on write, because it can no longer guarantee filesystem integrity if it continues. I argued in that thread that the superior solution would be to forcibly detach the volume, which would leave you with a "dirty" filesystem and a failed operation but not a panic. The file(s) involved in the write error might be lost, but the integrity of the filesystem is recoverable (as it is in the panic case) -- at least it is if the fsck doesn't require writing to a block that *also* errors out.

The decision in the code is to panic rather than detach the volume, however, so panic it is.

This one has bitten me with SD cards in small embedded-style machines (where turning off softupdates makes things VERY slow), and at some point I may look into developing a patch to forcibly detach the volume instead.
That obviously won't help you if the system volume is the one the error happens on (you would have just forcibly detached the root filesystem, which is going to get you an immediate panic anyway), but in the event of a data disk it would prevent the system from crashing.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
Well... it happened again today.

I found a few instances on the web of others reporting similar issues, and also ran across this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211037

This is eerily similar to what is happening to me, save that in my case it's happening with a USB drive. Should I attach my report there?

I've disabled the local backup to the USB drive (just doing the remote one to the share drive).

Any help would be greatly appreciated.

On Feb 6, 2017, at 1:01 PM, Shaheen Bakhtiar <shashaness at hotmail.com> wrote:
> g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
> g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
> /mnt/USBBD: got error 16 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error
> cpuid = 5
>
> /mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP data using rsync. Everything so far has worked without issue.