Twice in the past week (when things have previously been fine for a year), a server has locked up spewing forth a continuous stream of ext3 write errors. This is to a bog-standard IDE disk, only thing on the controller, etc.

Nothing EVER hits the logs. Not a single error. Every process that accesses the disk seems to fail. It looks like ext3 is failing every I/O request.

If the machine is rebooted, it comes up completely clean.

I am prepared to believe I have a bad disk that occasionally pops up the odd IDE error, but given it takes pretty intensive activity (busy mail server) without failing, I don't believe it is riddled with errors.

I speculate that either
a) After one I/O error, ext3 or the IDE layer is returning I/O errors for every request (something is not resetting some error condition),
or
b) Something is hanging the IDE bus, which only a reboot cures.

Is (a) a known possibility? If not, is (b) possible?

I am upgrading to 2.4.26-rc2, to see if that fixes things. What further information should I look for (or post here) to help me (or you) debug this further?

Alex

On 3 Jul 2004, Alex Bligh wrote:
> Twice in the past week (when things have previously been fine for a
> year), a server has locked up spewing forth a continuous stream of
> ext3 write errors. This is to a bog-standard IDE disk, only thing
> on the controller, etc.

I have had this sort of behavior on a system for some time, which turned out in the end to be a faulty IDE cable.

> Nothing EVER hits the logs. Not a single error. Every process that
> accesses the disk seems to fail. It looks like ext3 is failing every
> I/O request.

When the system generated (sufficient) corruption, it would trigger the default action of remounting the filesystem read-only, which causes a good deal of follow-on failure; perhaps this is the case here?

Regards,
Daniel

--
Fetters of gold are still fetters, and the softest lining can never make them so easy as liberty.
-- Mary Astell, _An Essay in Defence of the Female Sex_, 1696
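
One way to test the read-only remount theory while the machine is still misbehaving is to look at the mount options the kernel reports in /proc/mounts. What follows is only a minimal sketch: the function name `read_only_mounts` is made up for illustration, and it assumes the usual /proc/mounts layout of device, mount point, filesystem type and comma-separated options.

```python
def read_only_mounts(mounts_text, fstypes=("ext3",)):
    """Return (device, mountpoint) pairs that are currently mounted read-only.

    mounts_text is the content of /proc/mounts (assumed layout:
    device mountpoint fstype options dump pass).
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Match the exact option "ro", not substrings such as "errors=remount-ro".
        if fstype in fstypes and "ro" in options.split(","):
            hits.append((device, mountpoint))
    return hits
```

On the affected server this could be fed the text read from /proc/mounts; a filesystem that was mounted read-write at boot but shows up here would be consistent with the remount theory, though it would not say what triggered it.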

On Sat, 2004-07-03 at 13:43, Alex Bligh wrote:
> Twice in the past week (when things have previously been fine for a
> year), a server has locked up spewing forth a continuous stream of
> ext3 write errors. This is to a bog-standard IDE disk, only thing
> on the controller, etc.
>
> Nothing EVER hits the logs. Not a single error. Every process that
> accesses the disk seems to fail. It looks like ext3 is failing every
> I/O request.
>
> If the machine is rebooted, it comes up completely clean.
>
> I am prepared to believe I have a bad disk that occasionally pops up
> the odd IDE error, but given it takes pretty intensive activity (busy
> mail server) without failing, I don't believe it is riddled with errors.
>
> I speculate that either
> a) After one I/O error, ext3 or the IDE layer is returning I/O errors
> for every request (something is not resetting some error condition),
> or
> b) Something is hanging the IDE bus, which only a reboot cures.
>
> Is (a) a known possibility? If not, is (b) possible?
>
> I am upgrading to 2.4.26-rc2, to see if that fixes things. What further
> information should I look for (or post here) to help me (or you) debug this
> further?
>
> Alex

dumpe2fs -h <working partition> could help ... maybe.
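
The superblock summary that dumpe2fs -h prints is long, and for this problem the interesting lines are probably the recorded filesystem state and the configured error behaviour. A rough sketch of pulling those out of captured output follows; the helper name `superblock_fields` and the default choice of fields are assumptions made for illustration, not anything suggested in the thread.

```python
def superblock_fields(header_text,
                      wanted=("Filesystem state", "Errors behavior", "Mount count")):
    """Extract selected 'Key: value' lines from dumpe2fs -h output."""
    fields = {}
    for line in header_text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (timestamps, for example) are kept whole.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            fields[key.strip()] = value.strip()
    return fields
```

A state other than "clean", or an error behaviour of "Remount read-only", would fit the read-only remount explanation offered earlier in the thread.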