Dmitry Morozovsky
2015-Apr-16 11:47 UTC
[GEOM] Disk IO error when resyncing gmirror -> massive hang in D state
Walter,

thanks for your suggestions. To answer quickly: I've already evacuated the data
to the new drive (see the last paragraph of my original message). Luckily, no
critical data were on the failed part of the disk, so rsync finished cleanly on
the very first pass.

The only question still open for me is why the kernel got stuck in GEOM instead
of returning read/write errors to the applications.

I'll try to set up a lab machine with this drive (which is still on my work
table) and reproduce the error.

On Wed, 15 Apr 2015, Walter Cramer wrote:

> Here are a few ideas I had, if more capable people have not already sent you
> better ones:
>
> Copy as much important data as possible from the Toshiba drive, since it could
> degrade further or die at any time.
>
> Check whether a 'dd' command can quickly reproduce the error, so you can try
> things faster.
>
> If the failing drive is not fairly cold, try chilling it with a strong fan.
>
> Briefly put the drive in another system, to see if using a different power
> supply, controller, data cable, etc. would help. Changing the orientation
> (direction of gravity on the drive) might also be good.
>
> If nothing else helped, a tiny C language program might use open(), read(),
> lseek(), write(), etc. to copy all readable sectors to your replacement disk
> (using zeros for the unreadable bad sectors).
>
> -Walter
>
>
> On Tue, 14 Apr 2015, Dmitry Morozovsky wrote:
>
> > Dear colleagues,
> >
> > unfortunately, the machine in question is in production, so I have no clear
> > reproduction case. I do have console logs, however.
> >
> > prerequisites:
> > - rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
> > - su+j ufs2 on top of gmirror of two SATA Toshiba drives
> > - one disk died some time ago, so gmirror works in degraded state
> >
> > trouble:
> > - inserted new drive, labelled, started gmirror resync
> > - apparently the remaining drive also has read issues:
> >
> > (ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
> > (ada0:ahcich1:0:0:0): CAM status: ATA Status Error
> > (ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> > (ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
> > (ada0:ahcich1:0:0:0): Error 5, Retries exhausted
> > GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
> > GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]
> >
> > at this point, all disk I/O requests are stalled: all cron jobs, syslogd,
> > dhcpd, etc.
> >
> > The situation reproduced itself at least twice; then, as an emergency
> > measure, the new drive was labelled independently and rsynced over.
> >
> > Any thoughts?
> >
> > Thanks in advance!
> >
> >
> > --
> > Sincerely,
> > D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
> > [ FreeBSD committer: marck at FreeBSD.org ]
> > ------------------------------------------------------------------------
> > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck at rinet.ru ***
> > ------------------------------------------------------------------------
> > _______________________________________________
> > freebsd-stable at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: marck at FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck at rinet.ru ***
------------------------------------------------------------------------
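
For illustration only: Walter's last suggestion in the quoted message (a tiny C program that copies every readable sector and writes zeros for the unreadable ones) might look roughly like the sketch below. It is not code from this thread. The function name rescue_copy and its parameters are hypothetical, and pread()/pwrite() stand in for the lseek()+read()/write() pairs Walter mentions. It assumes the caller already knows the source size in bytes, and that for a raw disk device the block size is a multiple of the sector size. It also cannot help if the kernel itself stalls on the failing read, which is the open question of this thread.

```c
#define _XOPEN_SOURCE 700   /* expose pread()/pwrite() under strict -std modes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `total` bytes from in_fd to out_fd in chunks of
 * `block` bytes. A chunk that cannot be read (error or unexpected EOF) is
 * written out as zeros instead of aborting the copy.
 * Returns the number of zero-filled chunks, or -1 on a fatal error
 * (bad arguments, out of memory, or a failed write).
 */
long rescue_copy(int in_fd, int out_fd, off_t total, size_t block)
{
    if (block == 0 || total < 0)
        return -1;

    char *buf = malloc(block);
    if (buf == NULL)
        return -1;

    long bad = 0;
    off_t off = 0;
    while (off < total) {
        /* Do not read past the stated end of the source. */
        size_t want = block;
        if ((off_t)want > total - off)
            want = (size_t)(total - off);

        ssize_t got = pread(in_fd, buf, want, off);
        if (got < 0 && errno == EINTR)
            continue;                      /* interrupted: retry same chunk */
        if (got <= 0) {
            /* Unreadable chunk: substitute zeros and remember it. */
            fprintf(stderr, "unreadable chunk at offset %lld, zero-filled\n",
                    (long long)off);
            memset(buf, 0, want);
            got = (ssize_t)want;
            bad++;
        }

        /* Write the chunk at the same offset, handling short writes. */
        size_t done = 0;
        while (done < (size_t)got) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)got - done,
                               off + (off_t)done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                free(buf);
                return -1;                 /* destination failure is fatal */
            }
            done += (size_t)w;
        }
        off += got;
    }

    free(buf);
    return bad;
}
```

A caller would open() the failing disk read-only and the replacement write-only, pass the source size and a block size such as 65536, and treat a positive return value as the count of chunks that had to be zero-filled. A smaller block size loses less data around each bad sector at the cost of a slower copy.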