Karl Denninger
2005-Mar-07 03:28 UTC
Caution - possible system instability on attempted fix for "WRITE_ERROR" problem (see enclosed)
Hi folks;

This may be the wrong place, given what I did, but I wanted to give a "heads up" here given the impending release of 5.4-RELEASE.

This refers to http://www.freebsd.org/cgi/query-pr.cgi?pr=77643

In an attempt to mitigate this, I applied the following commit, which I saw in the CVS logs:

  mdodd       2005-03-02 04:01:37 UTC

    FreeBSD src repository

    Modified files:
      sys/dev/ata          ata-queue.c
    Log:
    When resubmitting a timed out request, reset donecount.

    Submitted by:   Nate Lawson <nate AT root.org>

    Revision  Changes    Path
    1.42      +1 -0      src/sys/dev/ata/ata-queue.c

Is this change supposed to be "safe" against a 5.4-PRERELEASE kernel from today (CVSupped about 1700 CST)?

If it is supposed to be, it's NOT!

It DOES fix the failure to requeue timed out requests, but it also provokes radical destabilization of the interrupt system in the kernel (e.g. receive serial interrupts "disappear", etc.), leading eventually to a panic.

BTW, it <DOES> appear to fix the requeue problem with disks, and with this in, a disk that takes a timeout (but is actually working) does not disconnect from a GEOM mirror; the retried write succeeds.

However, for obvious reasons the kernel instability that results from the retried write is not acceptable :)

I don't know if this is germane to what is about to show up in 5.4-RELEASE, but if it is, this urgently needs to be looked at.

Needless to say, I've backed this one out.

I will also put this against the PR to dissuade others from trying the same thing...

--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net       My home on the net - links to everything I do!
http://scubaforum.org          Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net        SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com   Musings Of A Sentient Mind