On 7/11/2016 11:32, Ian Lepore wrote:
> On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
>> On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger <karl at denninger.net>
>> wrote:
>>
>>> Here's the backtrace ... sounds like expected behavior, which is
>>> not-so good all-in for a situation like this. I guess the strategy
>>> is to turn off softupdates before attempting such an update so as
>>> not to crash the host machine if there's a problem with the card.
>>>
>> I would tend to assume that removable media should not have
>> softupdates enabled. Even with properly working media, it's
>> practically begging for corruption.
>>
> Writing to an sdcard without softupdates enabled will be an exercise in
> patience. Like, come back next week and maybe it'll be done.
>
> The only thing that comes to mind with this is maybe some sort of mount
> flag to say you're willing to live with any amount of filesystem
> corruption in lieu of panicking. I'm not sure how easy/practical that
> would be to implement, though.
>
> -- Ian

Why not force-detach the volume that takes the error instead of a
panic()?

That would lead to a panic if the detached volume was the system volume
(obviously) but for a data volume it would simply result in it being
forcibly unmounted (and dirty, so if it's corrupt it will get caught
when reattached.)

It seems that the current paradigm of saying "screw you, panic the
machine" violates the principle of least astonishment and is overly
punitive vis-a-vis necessity. Refusing further I/O because the volume
may now have a corrupt filesystem appears to be facially reasonable, but
that doesn't necessarily wind up being fatal to the system itself -- it
is if that's the system volume and is not covered by some sort of
redundancy, obviously, but it's not in all cases.

(Note that you can't just unmount the filesystem involved in the error;
it has to be the volume that gets forcibly detached and whatever flows
through from that you have to live with. The reason is that on any sort
of solid-state media the OS has zero control over zoning, and write
amplification means far more than the data you were actually modifying
may have been lost -- it's entirely possible that *several megabytes* of
data just got trashed by the write error, and it's even possible that
the block(s) involved cross a filesystem boundary!)

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
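The policy proposed in the message above can be sketched as a toy model. This is not FreeBSD code: `Volume`, `SystemPanic`, and `handle_write_error` are hypothetical names made up for illustration, the sketch leaves out the redundancy case, and it says nothing about how a kernel would actually tear down the mounts. It only shows the decision being argued for: panic for the system volume, otherwise detach the whole volume and leave every filesystem on it marked dirty.

```python
from dataclasses import dataclass, field
from typing import List


class SystemPanic(Exception):
    """Stands in for the kernel's panic(); hypothetical, for illustration."""


@dataclass
class Volume:
    """Toy model of a storage volume; not a real kernel structure."""
    name: str
    is_system: bool                                  # holds the root filesystem
    filesystems: List[str] = field(default_factory=list)
    attached: bool = True
    dirty: List[str] = field(default_factory=list)   # need a check on reattach


def handle_write_error(vol: Volume) -> List[str]:
    """Apply the proposed policy to a volume that took a write error.

    Returns the filesystems that were forcibly unmounted and left dirty.
    """
    if vol.is_system:
        # Losing the system volume is fatal either way.
        raise SystemPanic(f"write error on system volume {vol.name}")
    # Detach the whole volume, not just the filesystem that saw the error:
    # the damage may extend past a filesystem boundary.
    vol.dirty = list(vol.filesystems)
    vol.attached = False
    return vol.dirty


if __name__ == "__main__":
    card = Volume("da0", is_system=False, filesystems=["/mnt/a", "/mnt/b"])
    print(handle_write_error(card))  # ['/mnt/a', '/mnt/b']
```

Note that both filesystems on the card end up dirty even if only one of them saw the failed write, which is the point of detaching at the volume level.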
On Mon, 2016-07-11 at 12:30 -0500, Karl Denninger wrote:
> On 7/11/2016 11:32, Ian Lepore wrote:
> > On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
> > > On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger
> > > <karl at denninger.net> wrote:
> > >
> > > > Here's the backtrace ... sounds like expected behavior, which
> > > > is not-so good all-in for a situation like this. I guess the
> > > > strategy is to turn off softupdates before attempting such an
> > > > update so as not to crash the host machine if there's a problem
> > > > with the card.
> > > >
> > > I would tend to assume that removable media should not have
> > > softupdates enabled. Even with properly working media, it's
> > > practically begging for corruption.
> > >
> > Writing to an sdcard without softupdates enabled will be an
> > exercise in patience. Like, come back next week and maybe it'll be
> > done.
> >
> > The only thing that comes to mind with this is maybe some sort of
> > mount flag to say you're willing to live with any amount of
> > filesystem corruption in lieu of panicking. I'm not sure how
> > easy/practical that would be to implement, though.
> >
> > -- Ian
> Why not force-detach the volume that takes the error instead of a
> panic()?
>

Patches welcome.

-- Ian

> That would lead to a panic if the detached volume was the system
> volume (obviously) but for a data volume it would simply result in it
> being forcibly unmounted (and dirty, so if it's corrupt it will get
> caught when reattached.)
>
> It seems that the current paradigm of saying "screw you, panic the
> machine" violates the principle of least astonishment and is overly
> punitive vis-a-vis necessity. Refusing further I/O because the volume
> may now have a corrupt filesystem appears to be facially reasonable,
> but that doesn't necessarily wind up being fatal to the system itself
> -- it is if that's the system volume and is not covered by some sort
> of redundancy, obviously, but it's not in all cases.
>
> (Note that you can't just unmount the filesystem involved in the
> error; it has to be the volume that gets forcibly detached and
> whatever flows through from that you have to live with. The reason is
> that on any sort of solid-state media the OS has zero control over
> zoning, and write amplification means far more than the data you were
> actually modifying may have been lost -- it's entirely possible that
> *several megabytes* of data just got trashed by the write error, and
> it's even possible that the block(s) involved cross a filesystem
> boundary!)
>
Karl Denninger <karl at denninger.net> writes:
> Why not force-detach the volume that takes the error instead of a
> panic()?
>
> That would lead to a panic if the detached volume was the system volume
> (obviously) but for a data volume it would simply result in it being
> forcibly unmounted (and dirty, so if it's corrupt it will get caught
> when reattached.)
>
> It seems that the current paradigm of saying "screw you, panic the
> machine" violates the principle of least astonishment and is overly
> punitive vis-a-vis necessity. Refusing further I/O because the volume
> may now have a corrupt filesystem appears to be facially reasonable, but
> that doesn't necessarily wind up being fatal to the system itself -- it
> is if that's the system volume and is not covered by some sort of
> redundancy, obviously, but it's not in all cases.

How do you find the processes with pages mapped from the filesystem's
vnodes?

UFS is *very* tightly tied to the VM system, and intentionally so.
Recall that "umount -f" isn't exactly safe...