Hi,

I just replaced a drive (c12t5d0 in the listing below). For the first 6 hours of the resilver I saw no issues. However, sometime during the last hour of the resilver, the new drive and two others in the same RAID-Z2 stripe threw a couple of checksum errors. Also, sometime in that last hour, two of the other drives in the stripe decided they needed to resilver small amounts of data (128K and 64K respectively). The OS is snv126.

My two questions are:

Should I be worried about these checksum errors?

What caused the small resilverings on c8t5d0 and c11t5d0, which were not replaced or otherwise touched?

Thank you in advance.

-J

  pool: zpool_db_css
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 7h0m with 0 errors on Thu Sep 30 04:59:49 2010
config:

        NAME          STATE     READ WRITE CKSUM
        zpool_db_css  ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c8t5d0    ONLINE       0     0     4  128K resilvered
            c10t5d0   ONLINE       0     0     0
            c11t5d0   ONLINE       0     0     2  64K resilvered
            c12t5d0   ONLINE       0     0     3  61.0G resilvered
            c13t5d0   ONLINE       0     0     0
          raidz2-1    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c8t6d0    ONLINE       0     0     0
            c10t6d0   ONLINE       0     0     0
            c11t6d0   ONLINE       0     0     0
            c12t6d0   ONLINE       0     0     0
            c13t6d0   ONLINE       0     0     0
          raidz2-2    ONLINE       0     0     0
            c7t7d0    ONLINE       0     0     0
            c8t7d0    ONLINE       0     0     0
            c10t7d0   ONLINE       0     0     0
            c11t7d0   ONLINE       0     0     0
            c12t7d0   ONLINE       0     0     0
            c13t7d0   ONLINE       0     0     0
        spares
          c13t4d0     AVAIL
          c12t4d0     AVAIL
On Thu, Sep 30, 2010 at 9:08 AM, Jason J. W. Williams <jasonjwwilliams at gmail.com> wrote:

> Should I be worried about these checksum errors?

Maybe. Your disks, cabling, or disk controller probably have some issue that caused them. Or maybe sunspots are to blame.

Run scrubs often and watch whether more errors appear and whether there is a pattern to them. Have backups. Maybe swap hardware components one at a time to see if that helps.

> What caused the small resilverings on c8t5d0 and c11t5d0 which were not
> replaced or otherwise touched?

It was the checksum errors. ZFS automatically reconstructed the good data from the redundancy on the other disks in the RAID-Z2 stripe and rewrote the broken blocks with correct data. If you run zpool clear and then zpool scrub, you will see that these checksum errors have vanished. If they were caused by botched writes, no new errors should appear. If they were botched reads, you may see some new ones appearing :(

So, not critical yet, but something to keep an eye on.

Tuomas
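To make the "monitor if there are more, and if there is a pattern" advice a bit more concrete, here is a rough Python sketch that picks out the devices with nonzero error counters from saved zpool status text. It is only an illustration: the function name devices_with_errors and the regular expression are made up for this example, the sample text is the raidz2-0 part of the listing from this thread, and it assumes the counters are printed as plain integers as they are in that listing.

```python
import re

# Sample input: the raidz2-0 rows from the status listing in this thread.
SAMPLE = """\
        NAME          STATE     READ WRITE CKSUM
        zpool_db_css  ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c8t5d0    ONLINE       0     0     4  128K resilvered
            c10t5d0   ONLINE       0     0     0
            c11t5d0   ONLINE       0     0     2  64K resilvered
            c12t5d0   ONLINE       0     0     3  61.0G resilvered
            c13t5d0   ONLINE       0     0     0
"""

# Hypothetical row pattern: name, state, three integer counters, optional note.
# Assumes plain integer counters, as in the listing above.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter."""
    found = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header row, blank lines, spares, and so on
        read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
        if read or write or cksum:
            found[match.group(1)] = (read, write, cksum)
    return found


if __name__ == "__main__":
    for dev, (r, w, c) in devices_with_errors(SAMPLE).items():
        print(f"{dev}: read={r} write={w} cksum={c}")
```

Saving the status output after each scrub and comparing the results over time would show whether the same disks, or disks on the same controller, keep turning up.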
Thanks Tuomas. I'll run the scrub. It's an aging X4500.

-J

On Thu, Sep 30, 2010 at 3:25 AM, Tuomas Leikola <tuomas.leikola at gmail.com> wrote:

> So, not critical yet but something to keep an eye on.