thr3ads.net - freebsd stable - Not-so stable if you take a CAM error.... [Jul 2016]

Karl Denninger

2016-Jul-11 17:30 UTC

Not-so stable if you take a CAM error....

On 7/11/2016 11:32, Ian Lepore wrote:
> On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
>> On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger <karl at denninger.net>
>> wrote:
>>
>>> Here's the backtrace ... sounds like expected behavior, which is
>>> not-so good all-in for a situation like this.  I guess the strategy is
>>> to turn off softupdates before attempting such an update so as not to
>>> crash the host machine if there's a problem with the card.
>>>
>> I would tend to assume that removable media should not have
>> softupdates enabled. Even with properly working media, it's practically
>> begging for corruption.
>>
> Writing to an sdcard without softupdates enabled will be an exercise in
> patience.  Like, come back next week and maybe it'll be done.
>
> The only thing that comes to mind with this is maybe some sort of mount
> flag to say you're willing to live with any amount of filesystem
> corruption in lieu of panicking.  I'm not sure how easy/practical that
> would be to implement, though.
>
> -- Ian

Why not force-detach the volume that takes the error instead of a panic()?

That would lead to a panic if the detached volume was the system volume
(obviously) but for a data volume it would simply result in it being
forcibly unmounted (and dirty, so if it's corrupt it will get caught
when reattached.)

It seems that the current paradigm of saying "screw you, panic the
machine" violates the principle of least astonishment and is overly
punitive vis-a-vis necessity.  Refusing further I/O because the volume
may now have a corrupt filesystem appears to be facially reasonable, but
that doesn't necessarily wind up being fatal to the system itself -- it is
if that's the system volume and is not covered by some sort of
redundancy, obviously, but it's not in all cases.

(Note that you can't just unmount the filesystem involved in the error;
it has to be the volume that gets forcibly detached and whatever flows
through from that you have to live with.  The reason is that on any sort
of solid-state media the OS has zero control over zoning and write
amplification means far more than the data you were actually modifying may
have been lost -- it's entirely possible that *several megabytes* of
data just got trashed by the write error, and it's even possible that
the block(s) involved cross a filesystem boundary!)
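
To make the boundary-crossing point concrete, here is a minimal sketch of the arithmetic. It is an illustration only: the 4 MiB erase-block size, the 512-byte sector size, the partition layout, and the function names are all assumptions, and it pretends erase blocks are aligned to LBA 0 with no remapping, which a real flash translation layer does not guarantee.

```python
def erase_block_range(lba, sector_size=512, erase_block_size=4 * 1024 * 1024):
    """Return (first_lba, last_lba) of the assumed erase block containing lba."""
    sectors_per_block = erase_block_size // sector_size
    first = (lba // sectors_per_block) * sectors_per_block
    return first, first + sectors_per_block - 1


def partitions_touched(lba, partitions, sector_size=512,
                       erase_block_size=4 * 1024 * 1024):
    """List the partitions overlapping the erase block that contains lba.

    partitions is a list of (name, start_lba, end_lba) tuples, end inclusive.
    """
    first, last = erase_block_range(lba, sector_size, erase_block_size)
    return [name for name, start, end in partitions
            if start <= last and end >= first]


# Hypothetical layout: two filesystems back to back on one card.
layout = [("p1", 2048, 100000), ("p2", 100001, 500000)]

print(partitions_touched(50000, layout))  # write well inside p1
print(partitions_touched(99000, layout))  # write near the end of p1
```

Under these assumptions a failed write near the end of the first partition sits in an erase block that also holds the start of the second, which is why detaching only the one filesystem would not be enough.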

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

Ian Lepore

2016-Jul-11 17:39 UTC

On Mon, 2016-07-11 at 12:30 -0500, Karl Denninger wrote:
> On 7/11/2016 11:32, Ian Lepore wrote:
> > On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
> > > On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger <karl at denninger.net>
> > > wrote:
> > > 
> > > > Here's the backtrace ... sounds like expected behavior, which is
> > > > not-so good all-in for a situation like this.  I guess the strategy
> > > > is to turn off softupdates before attempting such an update so as
> > > > not to crash the host machine if there's a problem with the card.
> > > > 
> > > I would tend to assume that removable media should not have
> > > softupdates enabled. Even with properly working media, it's
> > > practically begging for corruption.
> > > 
> > Writing to an sdcard without softupdates enabled will be an exercise
> > in patience.  Like, come back next week and maybe it'll be done.
> > 
> > The only thing that comes to mind with this is maybe some sort of
> > mount flag to say you're willing to live with any amount of filesystem
> > corruption in lieu of panicking.  I'm not sure how easy/practical
> > that would be to implement, though.
> > 
> > -- Ian
> Why not force-detach the volume that takes the error instead of a
> panic()?
> 
Patches welcome.

-- Ian
> That would lead to a panic if the detached volume was the system
> volume (obviously) but for a data volume it would simply result in it
> being forcibly unmounted (and dirty, so if it's corrupt it will get
> caught when reattached.)
> 
> It seems that the current paradigm of saying "screw you, panic the
> machine" violates the principle of least astonishment and is overly
> punitive vis-a-vis necessity.  Refusing further I/O because the
> volume may now have a corrupt filesystem appears to be facially
> reasonable, but that doesn't necessarily wind up being fatal to the
> system itself -- it is if that's the system volume and is not covered
> by some sort of redundancy, obviously, but it's not in all cases.
> 
> (Note that you can't just unmount the filesystem involved in the
> error; it has to be the volume that gets forcibly detached and
> whatever flows through from that you have to live with.  The reason is
> that on any sort of solid-state media the OS has zero control over
> zoning and write amplification means far more than the data you were
> actually modifying may have been lost -- it's entirely possible that
> *several megabytes* of data just got trashed by the write error, and
> it's even possible that the block(s) involved cross a filesystem
> boundary!)
>

Lowell Gilbert

2016-Jul-12 14:22 UTC

Karl Denninger <karl at denninger.net> writes:
> Why not force-detach the volume that takes the error instead of a panic()?
>
> That would lead to a panic if the detached volume was the system volume
> (obviously) but for a data volume it would simply result in it being
> forcibly unmounted (and dirty, so if it's corrupt it will get caught
> when reattached.)
>
> It seems that the current paradigm of saying "screw you, panic the
> machine" violates the principle of least astonishment and is overly
> punitive vis-a-vis necessity.  Refusing further I/O because the volume
> may now have a corrupt filesystem appears to be facially reasonable, but
> that doesn't necessarily wind up being fatal to the system itself -- it is
> if that's the system volume and is not covered by some sort of
> redundancy, obviously, but it's not in all cases.

How do you find the processes with pages mapped from the filesystem's
vnodes? UFS is *very* tightly tied to the VM system, and intentionally
so. Recall that "umount -f" isn't exactly safe...
