thr3ads.net - freebsd stable - FreeBSD 10.2-RELEASE #0 r286666: Panic and crash [Feb 2017]

Shawn Bakhtiar

2017-Feb-06 21:01 UTC

FreeBSD 10.2-RELEASE #0 r286666: Panic and crash

Hi all!

http://pastebin.com/niXrjF0D

Please refer to full output from crash above.

This morning our IMAP server decided to go belly up. I could not remote in, and
the machine would not respond to any pings.

Checking the physical console I had the following worrisome messages on screen:

g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
/mnt/USBBD: got error 16 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 5
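
For readers decoding the console lines above: the trailing numbers are errno values, and on FreeBSD 5 is EIO (input/output error) while 16 is EBUSY (device busy). Below is a minimal Python sketch that pulls the fields out of a g_vfs_done() line and converts the byte offset to TiB. The function name decode_g_vfs_done and the regular expression are illustrative assumptions, not part of any FreeBSD tool, and the symbolic names come from the errno table of whatever platform runs the script (5 and 16 happen to agree between FreeBSD and Linux).

```python
import errno
import re

# Matches console lines such as:
#   g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>[A-Z]+)\("
    r"offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]\s*error = (?P<err>\d+)"
)


def decode_g_vfs_done(line):
    """Return the fields of one g_vfs_done() line as a dict, or None if it does not match."""
    m = G_VFS_DONE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    offset = int(m.group("offset"))
    return {
        "device": m.group("dev"),
        "op": m.group("op"),
        "offset_bytes": offset,
        # Distance from the start of the partition, in binary terabytes.
        "offset_tib": offset / 2**40,
        "length": int(m.group("length")),
        "errno": err,
        # Looked up in the local platform's errno table (an assumption that
        # it matches the machine that produced the log).
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


if __name__ == "__main__":
    console = [
        "g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5",
        "g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16",
    ]
    for entry in console:
        print(decode_g_vfs_done(entry))
```

Both failing requests land roughly 6.6 TiB into da1p1, which the df output further down reports as 7.0T in size.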

/mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP
data using rsync. Everything so far has worked without issue.

I also noticed a bunch of:

fstat: can't read file 2 at 0x4000000001fffff
fstat: can't read file 4 at 0x780000ffff
fstat: can't read file 5 at 0x600000000
fstat: can't read file 1 at 0x200007fffffffff
fstat: can't read file 2 at 0x4000000001fffff
fstat: can't read file 4 at 0x780000ffff
fstat: can't read file 5 at 0x600000000


but I have no idea what these are from.

df -h output:
/dev/da0p2    1.8T    226G    1.5T    13%    /
devfs         1.0K    1.0K      0B   100%    /dev
/dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD


da0p2 is a RAID level 5 on an HP Smart Array.

Here is the output of dmesg after reboot:
http://pastebin.com/rHVjgZ82

Obviously neither the RAID nor the USB drive walked away from the crash cleanly.
Should I be running fsck on both from single-user mode at this point to verify
and clean up? My concern is the:
WARNING: /: mount pending error: blocks 0 files 26
when mounting /dev/da0p2

For some reason I was under the impression that fsck was run automatically on
reboot.

Any help in this matter would be greatly appreciated. I'm a little concerned
that a backup strategy that has worked for us for many MANY years would so
easily throw the OS into panic. If an I/O error occurred on the USB Drive I
would frankly think it should just back out, without panic. Or am I missing
something?

Any recommendations / insights would be most welcome.
Shawn

Karl Denninger

2017-Feb-06 21:23 UTC

On 2/6/2017 15:01, Shawn Bakhtiar wrote:
> My concern is the:
> WARNING: /: mount pending error: blocks 0 files 26
> when mounting /dev/da0p2
>
> For some reason I was under the impression that fsck was run automatically
> on reboot.

The "mount pending error" is normal on a disk that has softupdates
turned on; fsck runs in the background after the boot, and this is
"safe" because of how the metadata and data writes are ordered.  In
other words the filesystem in this situation is missing uncommitted
data, but the state of the system is consistent.  As a result the system
can mount root read-write without having to fsck it first and the
background cleanup is safe from a disk consistency problem.
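
To see which filesystems came up with this warning after a crash, the saved dmesg text can be scanned for it. The following is a minimal Python sketch; the helper name pending_mounts and the pattern are assumptions written against the single warning line quoted in this thread, and the two counters are returned exactly as printed, without interpretation.

```python
import re

# Pattern modelled on the one line quoted in the thread:
#   WARNING: /: mount pending error: blocks 0 files 26
PENDING = re.compile(
    r"WARNING: (?P<mount>\S+): mount pending error: "
    r"blocks (?P<blocks>\d+) files (?P<files>\d+)"
)


def pending_mounts(dmesg_text):
    """Return (mount point, blocks, files) for each pending-error warning found."""
    return [
        (m.group("mount"), int(m.group("blocks")), int(m.group("files")))
        for m in PENDING.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    sample = "WARNING: /: mount pending error: blocks 0 files 26\n"
    print(pending_mounts(sample))
```

An empty list means no such warning was present in the text that was scanned; it says nothing about filesystems that were not mounted at boot.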

The panic itself appears to have been triggered by an I/O error that
caused an operation to fail.

I was part of a thread on this in 2016, which you can find here:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html

The basic problem is that the softupdates code cannot deal with a hard
I/O error on write because it no longer can guarantee filesystem
integrity if it continues.  I argued in that thread that the superior
solution would be to forcibly detach the volume, which would leave you with
a "dirty" filesystem and a failed operation but not a panic.  The
file(s) involved in the write error might be lost, but the integrity of
the filesystem is recoverable (as it is in the panic case) -- at least
it is if the fsck doesn't require writing to a block that *also* errors out.

The decision in the code is to panic rather than detach the volume,
however, so panic it is.  This one has bitten me with SD cards in small
embedded-style machines (where turning off softupdates makes things VERY
slow) and at some point I may look into developing a patch to
forcibly-detach the volume instead.  That obviously won't help you if
the system volume is the one the error happens on (now you just forcibly
detached the root filesystem which is going to get you an immediate
panic anyway) but in the event of a data disk it would prevent the
system from crashing.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

Shawn Bakhtiar

2017-Feb-10 15:36 UTC

Well....

It happened again today.

I found a few instances on the web of others reporting similar issues, and also
ran across this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211037

This is eerily similar to what is happening to me, save that in my case
it's happening with a USB drive. Should I be attaching this there?

I've disabled the local backup to the USB (just doing the remote one to the
share drive).

Any help would be greatly appreciated.

