Jonathan Loran
2009-Jun-01 21:28 UTC
[zfs-discuss] Does zpool clear delete corrupted files
Hi list,

First off:

  # cat /etc/release
  Solaris 10 6/06 s10x_u2wos_09a X86
  Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
  Use is subject to license terms.
  Assembled 09 June 2006

Here's an (almost) disaster scenario that came to life over the past week. We have a very large zpool containing over 30TB, composed (foolishly) of three concatenated iSCSI SAN devices. There's no redundancy in this pool at the ZFS level. We are actually in the process of migrating this to an x4540 + j4500 setup, but since the x4540 is part of the existing pool, we need to mirror it, then detach it so we can build out the replacement storage.

What happened was that some time after I had attached the mirror to the x4540, the scsi_vhci/network connection went south and the server panicked. In the 2.5 years this system has been up, that has never happened before. When we got the thing glued back together, it immediately started resilvering from the beginning and reported about 1.9 million data errors. The list from zpool status -v gave over 883k bad files. That is a small percentage of the total number of files in this volume, which is over 80 million (about 1%).

My question is this: when we clear the pool with zpool clear, what happens to all of the bad files? Are they deleted from the pool, or do the error counters just get reset, leaving the bad files intact? I'm going to perform a full backup of this guy (not so easy on my budget), and I would rather only get the good files.

Thanks,

Jon

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  jloran at ssl.berkeley.edu
AST:7731^29u18e3
"zpool clear" just clears the list of errors (and # of checksum errors) from its stats. It does not modify the filesystem in any manner. You run "zpool clear" to make the zpool forget that it ever had any issues. -Paul Jonathan Loran wrote:> > Hi list, > > First off: > > # cat /etc/release > Solaris 10 6/06 s10x_u2wos_09a X86 > Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. > Use is subject to license terms. > Assembled 09 June 2006 > > Here''s an (almost) disaster scenario that came to life over the past > week. We have a very large zpool containing over 30TB, composed > (foolishly) of three concatenated iSCSI SAN devices. There''s no > redundancy in this pool at the zfs level. We are actually in the > process of migrating this to a x4540 + j4500 setup, but since the > x4540 is part of the existing pool, we need to mirror it, > then detach it so we can build out the replacement storage. > > What happened was some time after I had attached the mirror to the > x4540, the scsi_vhci/network connection went south, and the > server panicked. Since this system has been up, over the past 2.5 > years, this has never happened before. When we got the thing glued > back together, it immediately started resilvering from the beginning, > and reported about 1.9 million data errors. The list from zpool > status -v gave over 883k bad files. This is a small percentage of the > total number of files in this volume: over 80 million (1%). > > My question is this: When we clear the pool with zpool clear, what > happens to all of the bad files? Are they deleted from the pool, or > do the error counters just get reset, leaving the bad files in tact? > I''m going to perform a full backup of this guy (not so easy on my > budget), and I would rather only get the good files. > > Thanks, > > Jon > > > - _____/ _____/ / - Jonathan Loran - - > - / / / IT Manager - > - _____ / _____ / / Space Sciences Laboratory, UC Berkeley > - / / / (510) 643-5146 > jloran at ssl.berkeley.edu <mailto:jloran at ssl.berkeley.edu> > - ______/ ______/ ______/ AST:7731^29u18e3 > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
Jonathan Loran
2009-Jun-01 22:19 UTC
[zfs-discuss] Does zpool clear delete corrupted files
Kinda scary then. Better make sure we delete all the bad files before I back it up.

What's odd is we've checked a few hundred files, and most of them don't seem to have any corruption. I'm thinking what's wrong is that the metadata for these files is corrupted somehow, yet we can read them just fine. I wish I could tell which ones are really bad, so we wouldn't have to recreate them unnecessarily. They are mirrored in various places, or can be recreated via reprocessing, but recreating/restoring that many files is no easy task.

Thanks,

Jon

On Jun 1, 2009, at 2:41 PM, Paul Choi wrote:

> "zpool clear" just clears the list of errors (and the checksum error
> counters) from the pool's stats. It does not modify the filesystem in
> any manner. You run "zpool clear" to make the zpool forget that it
> ever had any issues.
>
> -Paul
> [...]

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  jloran at ssl.berkeley.edu
AST:7731^29u18e3
A Darren Dunham
2009-Jun-01 23:19 UTC
[zfs-discuss] Does zpool clear delete corrupted files
On Mon, Jun 01, 2009 at 03:19:59PM -0700, Jonathan Loran wrote:
> Kinda scary then. Better make sure we delete all the bad files before
> I back it up.

That shouldn't be necessary. Clearing the error count doesn't disable checksums. Every read is going to verify the checksums on the file's data blocks. If ZFS can't find at least one copy with a valid checksum, you should just get an I/O error trying to read the file, not invalid data.

> What's odd is we've checked a few hundred files, and most of them
> don't seem to have any corruption. I'm thinking what's wrong is the
> metadata for these files is corrupted somehow, yet we can read them
> just fine.

Are you still getting errors?

--
Darren
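One way to put that to work, sketched under the assumption that the affected paths can be pulled out of the "zpool status -v" output into a flat list. The names tank, status.txt, badfiles.txt, and really-bad.txt are placeholders, and the exact status output format varies between releases, so eyeball the extracted list before trusting it:

  # Capture the pool status; the permanent-error section at the end lists
  # the affected paths:
  zpool status -v tank > status.txt

  # Keep only the indented lines that look like absolute paths:
  grep '^ */' status.txt | sed 's/^ *//' > badfiles.txt

  # Read each suspect file end to end. ZFS verifies block checksums on the
  # way, so files whose data is really damaged fail with an I/O error and
  # land in really-bad.txt; the rest read back clean.
  while read f; do
      cat "$f" > /dev/null 2>&1 || echo "$f"
  done < badfiles.txt > really-bad.txt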
If you run "zpool scrub" on the zpool, it''ll do its best to identify the file(s) or filesystems/snapshots that have issues. Since you''re on a single zpool, it won''t self-heal any checksum errors... It''ll take a long time, though, to scrub 30TB... -Paul Jonathan Loran wrote:> > Kinda scary then. Better make sure we delete all the bad files before > I back it up. > > What''s odd is we''ve checked a few hundred files, and most of them > don''t seem to have any corruption. I''m thinking what''s wrong is the > metadata for these files is corrupted somehow, yet we can read them > just fine. I wish I could tell which ones are really bad, so we > wouldn''t have to recreate them unnecessarily. They are mirrored in > various places, or can be recreated via reprocessing, but > recreating/restoring that many files is no easy task. > > Thanks, > > Jon > > On Jun 1, 2009, at 2:41 PM, Paul Choi wrote: > >> "zpool clear" just clears the list of errors (and # of checksum >> errors) from its stats. It does not modify the filesystem in any >> manner. You run "zpool clear" to make the zpool forget that it ever >> had any issues. >> >> -Paul >> >> Jonathan Loran wrote: >>> >>> Hi list, >>> >>> First off: >>> # cat /etc/release >>> Solaris 10 6/06 s10x_u2wos_09a X86 >>> Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. >>> Use is subject to license terms. >>> Assembled 09 June 2006 >>> >>> Here''s an (almost) disaster scenario that came to life over the past >>> week. We have a very large zpool containing over 30TB, composed >>> (foolishly) of three concatenated iSCSI SAN devices. There''s no >>> redundancy in this pool at the zfs level. We are actually in the >>> process of migrating this to a x4540 + j4500 setup, but since the >>> x4540 is part of the existing pool, we need to mirror it, then >>> detach it so we can build out the replacement storage. >>> What happened was some time after I had attached the mirror to the >>> x4540, the scsi_vhci/network connection went south, and the server >>> panicked. Since this system has been up, over the past 2.5 years, >>> this has never happened before. When we got the thing glued back >>> together, it immediately started resilvering from the beginning, and >>> reported about 1.9 million data errors. The list from zpool status >>> -v gave over 883k bad files. This is a small percentage of the >>> total number of files in this volume: over 80 million (1%). >>> My question is this: When we clear the pool with zpool clear, what >>> happens to all of the bad files? Are they deleted from the pool, or >>> do the error counters just get reset, leaving the bad files in >>> tact? I''m going to perform a full backup of this guy (not so easy >>> on my budget), and I would rather only get the good files. 
>>> >>> Thanks, >>> >>> Jon >>> >>> >>> - _____/ _____/ / - Jonathan Loran - - >>> - / / / IT Manager - >>> - _____ / _____ / / Space Sciences Laboratory, UC Berkeley >>> - / / / (510) 643-5146 >>> jloran at ssl.berkeley.edu <mailto:jloran at ssl.berkeley.edu> >>> - ______/ ______/ ______/ AST:7731^29u18e3 >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> _______________________________________________ >>> zfs-discuss mailing list >>> zfs-discuss at opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >>> >> > > > > - _____/ _____/ / - Jonathan Loran - - > - / / / IT Manager - > - _____ / _____ / / Space Sciences Laboratory, UC Berkeley > - / / / (510) 643-5146 jloran at ssl.berkeley.edu > - ______/ ______/ ______/ AST:7731^29u18e3 > > > > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
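For reference, kicking off and watching a scrub is just the following (the pool name is again a placeholder):

  # Start a scrub; it reads and verifies every allocated block in the pool:
  zpool scrub tank

  # Check progress and any newly recorded errors while it runs:
  zpool status -v tank

  # A scrub can be stopped if it gets in the way of production I/O:
  zpool scrub -s tank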
Marion Hakanson
2009-Jun-01 23:35 UTC
[zfs-discuss] Does zpool clear delete corrupted files
jloran at ssl.berkeley.edu said:
> What's odd is we've checked a few hundred files, and most of them don't
> seem to have any corruption. I'm thinking what's wrong is the metadata
> for these files is corrupted somehow, yet we can read them just fine.
> I wish I could tell which ones are really bad, so we wouldn't have to
> recreate them unnecessarily. They are mirrored in various places, or can
> be recreated via reprocessing, but recreating/restoring that many files
> is no easy task.

You know, this sounds similar to what happened to me once when I did a "zpool offline" of half of a mirror, changed a lot of stuff in the pool (like adding 20GB of data to an 80GB pool), then did a "zpool online", thinking ZFS might be smart enough to sync up the changes that had happened since the device went offline. Instead, a bunch of bad files were reported.

Since I knew nothing was wrong with the half of the mirror that had never been offlined, I just did a "zpool detach" of the formerly offlined drive, "zpool clear" to clear the error counts, "zpool scrub" to check for integrity, then "zpool attach" to cause the resilver to start from scratch.

If this describes your situation, I guess the tricky part for you is to now decide which half of your mirror is the good half. There's always "rsync -n -v -a -c ..." to compare against copies of files that happen to reside elsewhere. Slow but safe.

Regards,

Marion
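A sketch of that recovery sequence, with pool, device, and path names as placeholders (the real names would come out of "zpool status"):

  # Drop the formerly offlined, now-suspect half of the mirror:
  zpool detach tank c2t0d0

  # Forget the accumulated error counts, then verify the surviving half:
  zpool clear tank
  zpool scrub tank

  # Once the scrub comes back clean, re-attach the other device so it
  # resilvers from scratch against the known-good side:
  zpool attach tank c1t0d0 c2t0d0

  # And the dry-run checksum comparison against a copy held elsewhere
  # (-n = don't change anything, -c = compare file contents by checksum):
  rsync -n -v -a -c /tank/data/ /backup/data/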
Jonathan Loran
2009-Jun-02 06:18 UTC
[zfs-discuss] Does zpool clear delete corrupted files
Well, I tried to clear the errors, but zpool clear didn't clear them. I think the errors are in the metadata in such a way that they can't be cleared. I'm actually a bit scared to scrub it before I grab a backup, so I'm going to do that first. After the backup, I need to break the mirror to pull the x4540 out, and I just hope that can succeed. If not, we'll be losing some data between the time the backup is taken and the time I roll out the new storage.

Let this be a double warning to all you zfs-ers out there: make sure you have redundancy at the ZFS layer, and also do backups. Unfortunately for me, penny pinching has precluded both for us until now.

Jon

On Jun 1, 2009, at 4:19 PM, A Darren Dunham wrote:

> That shouldn't be necessary. Clearing the error count doesn't disable
> checksums. Every read is going to verify the checksums on the file's
> data blocks. If ZFS can't find at least one copy with a valid checksum,
> you should just get an I/O error trying to read the file, not invalid
> data.
>
> Are you still getting errors?
> [...]

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  jloran at ssl.berkeley.edu
AST:7731^29u18e3
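Breaking the mirror after the backup amounts to a single detach; a sketch with placeholder names (the real device would be identified from "zpool status"):

  # Find which device in the mirror is backed by the x4540:
  zpool status tank

  # Detach it; the pool keeps running, unmirrored, on the remaining device:
  zpool detach tank c3t0d0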
Hm. That's odd. "zpool clear" should've cleared the list of errors, unless you were accessing files at the same time, so that new checksum errors kept being reported on reads.

As for "zpool scrub", there's no benefit in your case: you're going to be reading from the zpool anyway, and checksums are verified as you read - and I assume you're going to read every single file there is. "zpool scrub" is good when you want to ensure the checksums are good for the whole zpool, including files you haven't read recently.

Well, good luck with your recovery efforts.

-Paul

Jonathan Loran wrote:
> Well, I tried to clear the errors, but zpool clear didn't clear them.
> I think the errors are in the metadata in such a way that they can't
> be cleared. I'm actually a bit scared to scrub it before I grab a
> backup, so I'm going to do that first.
> [...]