On Mon, 2016-07-11 at 06:30 -0500, Karl Denninger wrote:
> On 7/11/2016 02:57, Ronald Klop wrote:
> > On Mon, 11 Jul 2016 02:54:38 +0200, Karl Denninger
> > <karl at denninger.net> wrote:
> >
> > > Got a (nasty) surprise this afternoon on my sandbox machine.
> > >
> > > I was updating some Raspberry Pi2 machines which involved taking the
> > > SD card out, sticking it in an adapter and plugging it into the
> > > sandbox, then mounting the partition and using rsync.
> > >
> > > Unfortunately one of the cards was, unknown to me, bad and returned
> > > a write error during the update.
> > >
> > > The machine panic'd immediately after the CAM write error popped up.
> > >
> > > I was quite surprised by this, since (1) the SD card was (of course)
> > > mounted as a UFS filesystem; it shows up as a CAM device, (2) the
> > > machine itself is running off a ZFS root on a normal host-adapter
> > > and thus there is no commingling of the buffer cache and (3) there
> > > were no images being run from it (can't, wrong architecture!) nor
> > > any system I/O (e.g. pagefile) going to the SD card.
> > >
> > > I certainly understand that under some circumstances (maybe even
> > > most circumstances) taking a hard I/O error to a system device is
> > > going to hose you and a panic() is arguably "least astonishment"
> > > when the price of being wrong might be a corrupted system file or
> > > worse (e.g. corrupted paged-out RSS, etc.). But I didn't expect a
> > > panic out of a failed write to a device that is mounted and being
> > > used purely for data.
> > >
> > > I don't have a crash dump but can almost-certainly reproduce this if
> > > it's something that shouldn't happen and thus merits investigation.
> > >
> >
> > Hi,
> >
> > I understand you are surprised by this. I don't think it is the way
> > it should work.
> > Is there _any_ debugging information for people to use and try to
> > help you? Like which FreeBSD version are you running? Which FreeBSD
> > version was used to create the UFS fs? Does it use softupdates (SU)
> > or also journaling (SU+J)?
> > Maybe some output of dmesg? Or type of SD-card and reader. Other
> > people might have similar problems with similar hardware.
> >
> > Regards,
> > Ronald.
> >
> FreeBSD 11.0-BETA1 #0 r302489: Sat Jul 9 10:15:24 CDT 2016
> karl at NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
>
> and
>
> FreeBSD 11.0-BETA1 #0 r302526: Sun Jul 10 10:39:31 CDT 2016
> karl at NewFS.denninger.net:/pics/CrossBuild/obj/arm.armv6/pics/CrossBuild/src/sys/RPI2
>
> Both blew up in the same way when stimulated with the same I/O error.
>
> The filesystem in question does have softupdates enabled (the RPI
> images have it turned on by default) but no journaling. It's not
> card/reader dependent nor architecture dependent; when it occurred the
> first time I stuck the card and reader into one of my Pis and
> attempted to update it there (thinking that perhaps my sandbox
> machine's USB port was wonky) and it blew up the Pi2 in the exact same
> way.
>
> This isn't (obviously, given both Intel-style and ARM machines being
> involved) architecture dependent.
>
> It's been a good long while since I took an actual hard I/O error that
> was 'visible' at the OS level (I've had plenty of disks die on ZFS
> over the last few years but no "double failures" on a mirror or
> similar), and on my servers I haven't had a UFS-based system for a
> while. This definitely looks like some sort of regression in the code;
> I've run FreeBSD for a hell of a long time and have had plenty of
> instances where disks have failed without having the machine go out
> from under me.
>
Unfortunately, this is "just the way it works". A hard I/O error while
writing to a UFS filesystem with softupdates enabled will cause a
panic, because the softupdates code doesn't handle that sort of
failure, and the failure means that filesystem integrity is lost. The
code has no idea how important the data is to the functioning of the
system, so it has no basis on which to decide whether to panic or not.
-- Ian