Kris Kasner
2010-Jul-12 23:10 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
Hi Folks..

I have a system that was inadvertently left unmirrored for root. We were able to add a mirror disk, resilver, and fix the corrupted files (nothing very interesting was corrupt, whew), but zpool status -v still shows errors..

Will this self correct when we replace the degraded disk and resilver? Or is there something else that I'm not finding that I need to do to clean up?

This is Solaris 10 u8, zpool v15

15:52:50 catalina(34)> sudo zpool status -v
  pool: zroot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 0h48m with 15 errors on Mon Jul 12 15:41:50 2010
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         DEGRADED    18     0     0
          mirror      DEGRADED    44     0    23
            c1t1d0s2  DEGRADED    74     0    23  too many errors
            c1t0d0s2  ONLINE       0     0    67  29.8G resilvered

errors: Permanent errors have been detected in the following files:

        zroot/packages:<0xad58>
        zroot/packages:<0x11477>
        zroot/packages:<0x2531d>
        <0x6e>:<0xc0f2>
        <0x6e>:<0xce68>
        <0x6e>:<0x28d9f>
        <0x6e>:<0x2b5c1>
        <0x76>:<0x17369>
        <0x86>:<0x11fda>
        <0x86>:<0x13253>
        <0x86>:<0x13346>
        <0x86>:<0x33ed3>
        <0x86>:<0x38fcd>
        <0x86>:<0x39007>
15:53:04 catalina(35)>

Thanks for any suggestions. The system is in another city, so I can't quickly test replacing the disk and see what happens..

Kris

--
Kris Kasner
Qualcomm Inc.
Garrett D'Amore
2010-Jul-12 23:15 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
Hey Kris (glad to see someone from my QCOM days!):

It should automatically clear itself when you replace the disk. Right now you're still degraded since you don't have full redundancy.

- Garrett
Ian Collins
2010-Jul-12 23:20 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
On 07/13/10 11:10 AM, Kris Kasner wrote:
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zroot         DEGRADED    18     0     0
>           mirror      DEGRADED    44     0    23
>             c1t1d0s2  DEGRADED    74     0    23  too many errors
>             c1t0d0s2  ONLINE       0     0    67  29.8G resilvered

What happens if you zpool detach the degraded drive?

--
Ian.
Kris Kasner
2010-Jul-12 23:25 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
Thanks for the reply..

I got derailed by a DBA while writing the email, I should have been more clear - I realize that the 'DEGRADED' states should resolve after I replace the disk, but what about the section that states:

    errors: Permanent errors have been detected in the following files:
    <list of files that are no longer with us..>

Will those resolve too? Or will it still think that there are corrupt files lying around? They all had valid paths at the start of the process; when I unlinked them and replaced them with good copies they changed to the

    zroot/packages:<0x2531d>
    <0x6e>:<0xc0f2>

format.

I'm mostly concerned because I want zpool status to show up clean and error free so our monitoring can catch it correctly.

Thanks again.

--Kris

--
Thomas Kris Kasner
Qualcomm Inc.
5775 Morehouse Drive
San Diego, CA 92121
(858)658-4932

"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." --Bruce Graham
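For the monitoring concern above, a minimal sketch of how a check might read the "Permanent errors" list out of saved zpool status -v text and separate real paths from entries that no longer resolve to a path. The function names and the abbreviated sample are hypothetical, the sample mixes entries from the listings in this thread, and the parsing assumes the output layout shown here; other ZFS releases may format it differently.

```python
import re

ERRORS_HEADER = "errors: Permanent errors have been detected in the following files:"

# Entries that zpool status could not map back to a path end in a hex
# object number, e.g. "<0x6e>:<0xc0f2>" or "zroot/packages:<0xad58>".
UNRESOLVED = re.compile(r"<0x[0-9a-fA-F]+>$")

# Abbreviated, hand-assembled sample (an assumption, not a full capture).
SAMPLE_STATUS = """\
  pool: zroot
 state: DEGRADED
errors: Permanent errors have been detected in the following files:

        zroot/packages:<0xad58>
        <0x6e>:<0xc0f2>
        //platform/sun4us/failsafe
"""


def permanent_errors(status_text):
    """Return the entries listed under the 'Permanent errors' header."""
    entries = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            in_list = False  # a new pool section begins
        elif stripped == ERRORS_HEADER:
            in_list = True
        elif in_list and stripped:
            entries.append(stripped)
    return entries


def split_entries(entries):
    """Split entries into (paths, unresolved object references)."""
    paths = [e for e in entries if not UNRESOLVED.search(e)]
    unresolved = [e for e in entries if UNRESOLVED.search(e)]
    return paths, unresolved


if __name__ == "__main__":
    paths, unresolved = split_entries(permanent_errors(SAMPLE_STATUS))
    print("files to restore:", paths)
    print("stale references:", unresolved)
```

A check built this way could alert on either list; whether stale references should page anyone is a local policy choice.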
Garrett D'Amore
2010-Jul-15 16:44 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
On Mon, 2010-07-12 at 16:25 -0700, Kris Kasner wrote:
> Will those resolve too? or will it still think that there are corrupt files
> lying around. They all had valid paths at the start of the process, when I
> unlinked them and replaced them with good copies they changed to the
>     zroot/packages:<0x2531d>
>     <0x6e>:<0xc0f2>
> format.
>
> I'm mostly concerned because I want zpool status to show up clean and error
> free so our monitoring can catch it correctly.

Those corrupt files are corrupt forever, until they are removed. I recommend doing a scrub. There are probably other experts here (Richard?) who can suggest a permanent fix.

- Garrett
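To tell whether a scrub or resilver has actually finished and how many errors it reported, the "scrub:" line of zpool status can be read mechanically. This is a rough sketch that assumes the exact wording seen in this thread ("... completed after 0h48m with 15 errors on ..."); the function name is hypothetical, and other states such as a scan in progress simply return None here.

```python
import re

# Matches the completed-scan wording shown in this thread's output.
SCRUB_DONE = re.compile(
    r"scrub:\s+(scrub|resilver) completed after (\d+)h(\d+)m "
    r"with (\d+) errors on (.+)"
)


def last_scan(status_text):
    """Return details of the last completed scrub/resilver, or None."""
    for line in status_text.splitlines():
        m = SCRUB_DONE.search(line)
        if m:
            kind, hours, minutes, errors, when = m.groups()
            return {
                "kind": kind,
                "minutes": int(hours) * 60 + int(minutes),
                "errors": int(errors),
                "finished": when.strip(),
            }
    return None


if __name__ == "__main__":
    line = " scrub: resilver completed after 0h48m with 15 errors on Mon Jul 12 15:41:50 2010"
    print(last_scan(line))
```

Comparing the error count before and after a scrub gives a quick way to see whether the scrub changed anything.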
Kris Kasner
2010-Jul-15 17:12 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
Today at 09:44, Garrett D'Amore <garrett at nexenta.com> wrote:
> Those corrupt files are corrupt forever. Until they are removed. I
> recommend doing a scrub. There are probably other experts here
> (Richard?) who can suggest a permanent fix.

Right, and we're OK with that.. We were lucky - all of the corrupt files are non-essential. When I remove the files and replace them, I get something that looks like a hex device:block number (e.g. <0x86>:<0x38fcd>).

The server is a v440, so I was able to have someone add some extra drives.
    zpool replace zroot c1t1d0s2 c1t2d0s2
failed to complete.. it left zpool status looking like this:

10:40:33 catalina(36)> sudo zpool status -v
Password:
  pool: zroot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 0h31m with 4 errors on Tue Jul 13 11:47:07 2010
config:

        NAME            STATE     READ WRITE CKSUM
        zroot           DEGRADED    28     0     0
          mirror        DEGRADED    60     0    27
            replacing   DEGRADED    31     0    53
              c1t1d0s2  DEGRADED   129     0    99  too many errors
              c1t2d0s2  ONLINE       0     0    84  24.5G resilvered
            c1t0d0s2    ONLINE       0     0    87  24.4G resilvered

errors: Permanent errors have been detected in the following files:

        //usr/dt/lib/sparcv9/libDtWidget.so.2
        //platform/sun4us/failsafe
        //opt/staroffice8/share/gallery/www-graf/bluleft.gif
        /var/tmp/patches/10_Recommended/125541-04/SUNWthunderbird/reloc/lib/thunderbird/components/librdf.so

If I delete one of these files, zpool status -v shows that device/block identifier I mentioned previously.. I've run a few scrubs, but they don't change anything.

The system appears stable right now, our internal customers have no idea anything is wrong (i.e., their apps are stable). We're planning on migrating them to a Niagara blade to return things to "known good".

I'm still curious to know if anyone knows a fix for this kind of issue, if there is one. I fully expect that if I was running UFS on one drive and it failed like this zfs drive failed, the system would have panicked. That's a big win. I would still like to get to the bottom of this issue. :-)

Thanks again for your replies.

--Kris

--
Thomas Kris Kasner
Qualcomm Inc.
5775 Morehouse Drive
San Diego, CA 92121
(858)658-4932

Outside of a dog, A book is man's best friend.
Inside of a dog... It's too dark to read! (unknown)
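On the <0x86>:<0x38fcd> style entries: as far as I understand the zpool status output, the two numbers are not a device and block. The first is the dataset (object set) ID and the second is the object number inside that dataset, printed in hex once the name can no longer be resolved. Below is a small sketch that decodes such an entry to decimal; decode_entry is a hypothetical helper. The decimal dataset ID can probably be matched against what zdb -d zroot lists, but that should be verified on Solaris 10 u8 before relying on it.

```python
import re

# "<0xDS>:<0xOBJ>" or "dataset/name:<0xOBJ>"
ENTRY = re.compile(
    r"^(?:<0x([0-9a-fA-F]+)>|([^<>:]+)):<0x([0-9a-fA-F]+)>$"
)


def decode_entry(entry):
    """Decode an unresolved error entry.

    Returns (dataset, object_number), where dataset is an int ID when it
    was printed as hex, or the dataset name when it was still resolvable.
    Returns None for ordinary file paths.
    """
    m = ENTRY.match(entry.strip())
    if m is None:
        return None
    ds_hex, ds_name, obj_hex = m.groups()
    dataset = int(ds_hex, 16) if ds_hex is not None else ds_name
    return dataset, int(obj_hex, 16)


if __name__ == "__main__":
    for e in ("<0x86>:<0x38fcd>", "zroot/packages:<0xad58>", "//platform/sun4us/failsafe"):
        print(e, "->", decode_entry(e))
```

Read this way, an entry whose dataset part has also turned into a bare number would suggest the dataset itself (for example a snapshot) is no longer resolvable by name, which fits the snapshot and clone question raised later in the thread.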
Garrett D'Amore
2010-Jul-15 17:17 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
There's probably a way to clean up those old entries, I'm just not sure what it is. Is the data shared with any snapshots or clones? I'd expect you have to remove all references to the blocks, not just the files but also in snapshots or cloned images.

- Garrett
Russell Hansen
2010-Jul-15 18:25 UTC
[zfs-discuss] How do I clean up corrupted files from zpool status -v?
On the note of snapshots, since it appears to be a root fs you might want/need to check alternate boot environments (lustatus, lu*, etc.) if you did some Live Upgrades. That would create snapshots/clones for you in the process.

-Russ

> There's probably a way to clean up those old entries, I'm just not sure
> what it is. Is the data shared with any snapshots or clones? I'd
> expect you have to remove all references to the blocks, not just the
> files but also in snapshots or cloned images.
>
> - Garrett

--
This message posted from opensolaris.org
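One way to look for clones that might still reference the damaged blocks is to check the origin property of every dataset in the pool, for example with zfs get -r -H -o name,value origin zroot, and keep the rows whose origin is not "-". The sketch below only parses saved output of that command; find_clones and the boot environment names in the sample are hypothetical, and the tab-separated layout is what -H is expected to produce.

```python
# Hypothetical sample of `zfs get -r -H -o name,value origin zroot` output.
# The dataset names are made up for illustration.
SAMPLE_ORIGINS = (
    "zroot\t-\n"
    "zroot/ROOT\t-\n"
    "zroot/ROOT/s10u8\t-\n"
    "zroot/ROOT/s10u8_patched\tzroot/ROOT/s10u8@s10u8_patched\n"
    "zroot/packages\t-\n"
)


def find_clones(get_origin_output):
    """Map each clone to the snapshot it was created from."""
    clones = {}
    for line in get_origin_output.splitlines():
        if not line.strip():
            continue
        name, _, origin = line.partition("\t")
        origin = origin.strip()
        if origin and origin != "-":
            clones[name.strip()] = origin
    return clones


if __name__ == "__main__":
    for clone, snap in find_clones(SAMPLE_ORIGINS).items():
        print(clone, "was cloned from", snap)
```

Any snapshot that shows up as an origin here, plus anything reported by zfs list -t snapshot, is a candidate for still holding the old corrupt blocks.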