Hi,

My zpool is reporting unrecoverable errors in its metadata:

    pool: rpool2
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
          corruption. Applications may be affected.
  action: Restore the file in question if possible. Otherwise restore the
          entire pool from backup.
     see: http://www.sun.com/msg/ZFS-8000-8A

  (snip)

  errors: Permanent errors have been detected in the following files:

          <metadata>:<0x0>
          <metadata>:<0x1>

It initially reported a DEGRADED pool, but after a reboot the pool is now ONLINE, and a quick inspection indicates that my data is present and intact (though the errors stop the file systems in the pool from mounting at boot, so the machine drops into maintenance mode). My reading of http://www.sun.com/msg/ZFS-8000-8A indicates I should destroy the pool and start again, but http://www.crypticide.com/dropsafe/article/2162 gives me some small hope that this might be fixable...

The pool has been 'a little flaky' since I built it two months back. I've been getting small numbers of read and checksum errors on a few of the disks each day. Initially I replaced the disks, but they would always pass all testing, so lately I've just been clearing the errors each day and looking for another solution. I thought I had found it when I discovered WD had a firmware patch (http://www.3ware.com/kb/article.aspx?id=15592 , http://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html) that fixes bugs in the drive spin-up behaviour which have been causing problems for various hardware RAID controllers.

So, yesterday I shut down the machine, pulled 2 of the 4 troublesome disks, and applied the firmware upgrade (04.05G05). When I booted, the pool was degraded and showed metadata errors. After a shutdown and cold start, the pool was ONLINE but still had metadata errors (so somewhat inconsistent guidance from zpool status).

Can anyone explain what this 'metadata' is?

More details:

- This is a backup server so I can rebuild if necessary, but on principle I'd like to have a go at fixing it...
- The zpool has 96 x 2TB drives divided into RAIDZ2 sets of 8 (6+2).
- The drives are Western Digital RE-4s (WD2002FYPS).
- Running OpenSolaris build snv_111b.
- Drives are in two AIC JBODs connected via SAS.
- HBA is an LSI 3801E.
- Server is a 1RU SuperMicro Intel.

Any advice appreciated!

:-)
Paul Tetley
NearMap Pty Ltd

On Mar 11, 2010, at 11:28 PM, Paul Tetley wrote:
> [...]
> - HBA is an LSI 3801E

It has been suggested to check the firmware release for this controller. This CR has a workaround that might help, too:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com

Thanks Richard - I will look into the firmware versions...

On the corrupt metadata question, I ran a scrub, and the pool has come up clean again... So at least the pool is operational, and I am happy not to have to rebuild that data set.

I'm still a little surprised that zpool status would advise giving up on my zpool over what was apparently a transient error.

Regards,
Paul Tetley

On Fri, Mar 12, 2010 at 4:12 PM, Richard Elling <richard.elling at gmail.com> wrote:
> It has been suggested to check the firmware release for this controller.
> This CR has a workaround that might help, too.
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775
>  -- richard
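
For anyone wanting to script the before-and-after comparison described in this thread (error list present, then clean after a scrub), here is a minimal Python sketch. It assumes the error list looks like the excerpt quoted in the first message, with entries of the form <name>:<0x...>. The function name parse_permanent_errors and the SAMPLE text are illustrative only and are not part of any ZFS tool. The sketch does not run zpool itself; it only parses text you have already captured.

```python
import re

# Matches entries such as "<metadata>:<0x0>" in the error list.
# Assumption: entries use the angle-bracket form quoted in this thread;
# ordinary file paths in the list are deliberately not matched.
ENTRY = re.compile(r"<(?P<dataset>[^<>]+)>:<(?P<object>0x[0-9a-fA-F]+)>")


def parse_permanent_errors(status_text):
    """Return (name, object_id) pairs listed after the permanent-errors line."""
    marker = "Permanent errors have been detected"
    start = status_text.find(marker)
    if start == -1:
        return []  # no permanent-error section in this output
    tail = status_text[start:]
    return [(m.group("dataset"), int(m.group("object"), 16))
            for m in ENTRY.finditer(tail)]


# Hypothetical sample, modelled on the excerpt in the first message.
SAMPLE = """\
  pool: rpool2
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
"""

if __name__ == "__main__":
    for dataset, obj in parse_permanent_errors(SAMPLE):
        print(f"{dataset}: object {obj:#x}")
```

Saving the status output before and after the scrub and passing each copy through the function would show whether the list has actually emptied, rather than relying on the ONLINE/DEGRADED state alone.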