Hi all,

a single disk in a ZFS mirror failed permanently, throwing errors like

kernel: (ada5:ata10:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )

and the like.

The pool itself kept working in a degraded state. smartctl showed a very high 199 UDMA_CRC_Error_Count value, which to my knowledge may indicate a broken cable. In this case a cable replacement did indeed solve the problem; the pool resilvered and all is fine.

Still, smartctl -a displays a value for 199 UDMA_CRC_Error_Count that I reckon is way too high ( > 3900 ). So does this value now include the errors from the previous broken cable?

In other words, when, if at all, is the cache that smartmontools reads from flushed, so that the values reflect the status after fixing a hardware problem without swapping the disk?

Can someone please share some insight?

Thanks
Hi,

On 2012.11.09 12:18, H. Ingow wrote:
> Hi all,
>
> one single disk in a zfs mirror failed permanently throwing errors like
> kernel: (ada5:ata10:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84
> (ICRC ABRT ) and alike.
>
> The pool itself continued working degraded, smartctl showed a very high
> 199 UDMA_CRC_Error_Count value, which to my knowledge may indicate a
> broken cable, in this case indeed a cable replacement solved the
> problem, the pool resilvered and all is fine.
>
> Still smartctl -a displays a value of 199 UDMA_CRC_Error_Count I reckon
> to be way too high, though ( > 3900 ) .
> So is this value now including errors from previous broken cable ?

I'm pretty sure it is. I don't think SMART attributes can vary in value both up and down; they seem to me like counters that can only be incremented.

> In other words, when, if at all, is the cache smartmontools read from
> flushed and values are to be taken as of the status after fixing a
> hardware problem but not swapping the disk ?

So, in my opinion, no: the count is not reset.
Hi!

> Still smartctl -a displays a value of 199 UDMA_CRC_Error_Count I reckon
> to be way too high, though ( > 3900 ) .
> So is this value now including errors from previous broken cable ?

> In other words, when, if at all, is the cache smartmontools read from
> flushed and values are to be taken as of the status after fixing a
> hardware problem but not swapping the disk ?

SMART values are stored in the drive, not in some cache on the system. The bad cable caused the drive to see errors. There is no way to reset the counters in the drive. So the error counter will stay at that value, but as long as it no longer increases, you're fine.

-- 
pi at opsec.eu +49 171 3101372
8 years to go !
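Since the counter cannot be reset, the practical check is to note its raw value once after the cable swap and compare later readings against it. Here is a minimal Python sketch of that comparison, assuming the usual ten-column attribute table that `smartctl -A` prints. The helper names `raw_value` and `crc_errors_since` are hypothetical, and the sample text is made up for illustration; it is not output from the disk in this thread.

```python
import re
from typing import Optional

# Illustrative sample only: rows laid out like the attribute table from
# `smartctl -A`; the numbers are invented, not taken from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4100
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3912
"""


def raw_value(smart_output: str, attr_id: int = 199) -> Optional[int]:
    """Return the raw value of one SMART attribute, or None if it is absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the first is the numeric ID.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def crc_errors_since(baseline: int, smart_output: str) -> Optional[int]:
    """Return how far the CRC error counter has moved past a saved baseline."""
    current = raw_value(smart_output)
    return None if current is None else current - baseline


if __name__ == "__main__":
    # 0 means no new CRC errors since the baseline was recorded.
    print(crc_errors_since(3912, SAMPLE))
```

To use it on a live system, one could pass in the text captured from `smartctl -A /dev/ada5` instead of `SAMPLE`. A result that stays at 0 over time suggests the new cable is fine; a growing number suggests the link is still producing CRC errors.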