sbremal at hotmail.com
2011-Nov-03 12:35 UTC
[zfs-discuss] Remove corrupt files from snapshot
Hello,

I have got a bunch of corrupted files in various snapshots on my ZFS file
backing store. I was not able to recover them, so I decided to remove them
all; otherwise they continuously make trouble for my incremental backup
(rsync, diff etc. fail).

However, snapshots seem to be read-only:

# zpool status -v
  pool: backups
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backups     ONLINE       0     0    13
          md0       ONLINE       0     0    13

errors: Permanent errors have been detected in the following files:

        /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc
        ...

# rm /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc
rm: /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc: Read-only file system

Is there any way to force the file removal?

Cheers,
B.
Hi,

snapshots are read-only by design; you can clone them and manipulate the
clone, but the snapshot itself remains r/o.

HTH
Michael

On Thu, Nov 3, 2011 at 13:35, <sbremal at hotmail.com> wrote:
> I have got a bunch of corrupted files in various snapshots on my ZFS
> file backing store. [...]
>
> Is there any way to force the file removal?
>
> Cheers,
> B.

-- 
Michael Schuster
http://recursiveramblings.wordpress.com/
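A minimal sketch of the clone route, using the dataset and snapshot names
from the zpool output above (untested; the clone name is made up, and the
mountpoint assumes the pool is mounted at /backups):

# zfs clone backups/memory_card@20110218230726 backups/mc_tmp
# rm /backups/mc_tmp/Backup/Backup.arc
# zfs destroy backups/mc_tmp

The snapshot itself is untouched either way; the clone only gives a
writable view of its contents.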
On Thu, Nov 3, 2011 at 8:35 AM, <sbremal at hotmail.com> wrote:

> I have got a bunch of corrupted files in various snapshots on my ZFS
> file backing store. I was not able to recover them, so I decided to
> remove them all; otherwise they continuously make trouble for my
> incremental backup (rsync, diff etc. fail).

Why are you backing up the snapshots? Or perhaps a better question is
why are you backing them up more than once, as they can't change?

What are you trying to accomplish with the snapshots?

You can set the snapdir property on the dataset to hidden and it will
not show up in an ls, even an ls -a; you have to know that the ".zfs"
directory is there and cd into it blind. This will keep tools that walk
the directory tree from finding it.

> zfs get snapdir xxx
NAME  PROPERTY  VALUE   SOURCE
xxx   snapdir   hidden  default

You would use "zfs set snapdir=hidden <dataset>" to set the parameter.

-- 
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
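If the backup tool still descends into the snapshot directory for some
reason, excluding it on the rsync side also works; a sketch, with the
destination path made up:

# rsync -a --exclude='/.zfs' /backups/memory_card/ backuphost:/backups/memory_card/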
On 03 November, 2011 - Paul Kraus sent me these 1,3K bytes:

> You can set the snapdir property on the dataset to hidden and it will
> not show up in an ls, even an ls -a; you have to know that the ".zfs"
> directory is there and cd into it blind. This will keep tools that walk
> the directory tree from finding it.
>
> You would use "zfs set snapdir=hidden <dataset>" to set the parameter.

... which is default.

/Tomas
-- 
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of sbremal at hotmail.com
>
> However, snapshots seem to be read-only:
>
> Is there any way to force the file removal?

You need to destroy the snapshot completely - But if you want to
selectively delete from a snapshot, I think you can clone it, then
promote the clone, then destroy the snapshot, then rm something from the
clone and then snapshot the clone back to the original name, and then
destroy the clone.

Right?

BTW, since snapshots are listed in chronological order, it is distinctly
possible the above might cause unintended consequences for snapshot
scripts / autosnapshot / whatever.

Most people in your situation would simply destroy the snapshot and
never look back. That's the easy thing to do.
> -----Original Message-----
> From: Edward Ned Harvey
> Sent: 04/11/2011 21:23
>
> You need to destroy the snapshot completely - But if you want to
> selectively delete from a snapshot, I think you can clone it, then
> promote the clone, then destroy the snapshot, then rm something from
> the clone and then snapshot the clone back to the original name, and
> then destroy the clone.
>
> Right?

Not so fast! :-)

If you promote this new clone, the current state / branch of your
filesystem becomes a clone instead, dependent on the snapshot. Then if
you try to destroy the snapshot, you'll fail, because it has a dependent
clone (your current fs!!!). If you continue without realising the
implications, and so try the 'destroy' again with '-R', there goes the
neighbourhood!

I did this once, and was only saved by the fact that my cwd was in my
current filesystem, so it couldn't be unmounted, and therefore couldn't
be removed! Phew!! Nice to learn something and only get singed eyebrows,
instead of losing a leg!

hth Andy
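Before destroying anything in a clone/promote dance, it helps to check
what depends on what; a sketch, using the pool name from this thread:

# zfs list -t all -o name,origin -r backups

A plain "zfs destroy" (no -R/-r) refuses to remove a snapshot that has a
dependent clone, so leave the -R off until the listed origins are what
you expect.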
sbremal at hotmail.com
2011-Nov-14 09:25 UTC
[zfs-discuss] Remove corrupt files from snapshot
Back to this topic: since I cannot touch snapshots, I thought I could
simply remove the corrupt files after the last snapshot, so the next
incremental backup will notice the difference (i.e. no file) and
overwrite the corrupt-and-removed files with valid ones. This was the
plan.

However, while checking for corrupt files, "find" stops at some directory
with "fts_read: Not a directory":

find . -exec md5 {} \; > /home/xxx/md5_out 2> /home/xxx/md5_err &

tail /home/xxx/md5_err
...
md5: ./.zfs/snapshot/20100323081201/Bazsi/Projects/Java Test Client/java_test_client/lib/xxx/weblogic.jar: Input/output error
md5: ./.zfs/snapshot/20100323081201/@Cache (Bazsi)/BMWi SP/Publikationen/PDF-Broschüren/Nexxt.pdf: Input/output error
find: fts_read: Not a directory

What does this error mean? I cannot even "scan" the ZFS file system
anymore? Is there any "fsck" for ZFS?

Cheers,
B.

----------------------------------------
> From: zfsdiscuss at orgdotuk.org.uk
> To: zfs-discuss at opensolaris.org
> Date: Mon, 7 Nov 2011 21:49:56 +0000
> Subject: Re: [zfs-discuss] Remove corrupt files from snapshot
>
> If you promote this new clone, the current state / branch of your
> filesystem becomes a clone instead, dependent on the snapshot. Then if
> you try to destroy the snapshot, you'll fail, because it has a
> dependent clone (your current fs!!!). [...]
>
> hth Andy
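fts_read(3) is the tree-walking routine find is built on, so this is find
itself choking inside .zfs rather than md5. A sketch that prunes the
snapshot directory and only checksums the live files (untested):

# find . -name .zfs -prune -o -type f -exec md5 {} \; > /home/xxx/md5_out 2> /home/xxx/md5_err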
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of sbremal at hotmail.com
>
> What does this error mean? I cannot even "scan" the ZFS file system
> anymore? Is there any "fsck" for ZFS?

There is zpool scrub. It will check all the checksums previously
calculated, verifying the data that was actually written is the data ZFS
previously thought it wrote. If you have sufficient redundancy (mirror or
raid) it will self-correct any errors it finds.

Since you're experiencing corruption that doesn't go away, I'm supposing
you don't have redundancy, or else the corruption happened in something
higher up, such as a failing cpu or non-ecc ram, or a flaky disk
controller.

In any event, do you have any reason to believe you've eliminated the
cause of the corruption? The behavior you're experiencing is normal if
you have failing hardware.
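Concretely, with the pool name from earlier in the thread:

# zpool scrub backups
# zpool status -v backups

The second command shows scrub progress and, afterwards, any files the
scrub found damaged. With no redundancy visible to ZFS, though, a scrub
can only report corruption, not repair it.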
sbremal at hotmail.com
2011-Nov-14 16:39 UTC
[zfs-discuss] Remove corrupt files from snapshot
Actually a regular file (on a RAID1 setup with gmirror and 2 identical
disks) is used as backing store for ZFS. The hardware should be fine, as
nothing else seems to be corrupt. Wonder if a server reset could have
caused the issue? There are 2 things that surely do not work perfectly:

1. Startup:

[root at xxx /etc/rc.d]# cat /etc/rc.conf | grep mdconfig
mdconfig_md0="-f /usr/local/zfs/store"
[root at xxx /etc/rc.d]# /etc/rc.d/mdconfig start
Creating md0 device (-f).
mount: /dev/md0: unknown special file or file system

I have to run "zfs mount -a" twice before the folders show up.

2. Shutdown:

...
+Waiting (max 60 seconds) for system process `vnlru' to stop...done
+Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
+Waiting (max 60 seconds) for system process `syncer' to stop...
+Syncing disks, vnodes remaining... 5 4 1 3 3 2 2 0 0 0 done
+All buffers synced.

The computer does not reboot after this, just waits for ???. Manual
reset is needed.

Is ZFS not recommended with file backing store?

B.

----------------------------------------
> From: opensolarisisdeadlongliveopensolaris at nedharvey.com
> To: sbremal at hotmail.com; zfs-discuss at opensolaris.org
> Subject: RE: [zfs-discuss] Remove corrupt files from snapshot
> Date: Mon, 14 Nov 2011 09:36:58 -0500
>
> There is zpool scrub. It will check all the checksums previously
> calculated, verifying the data that was actually written is the data
> ZFS previously thought it wrote. If you have sufficient redundancy
> (mirror or raid) it will self-correct any errors it finds. [...]
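One hedged workaround for the startup-ordering problem, assuming the
stock rc scripts bring ZFS up before the md device exists: create the
device and import the pool yourself late in boot, e.g. from
/etc/rc.local (FreeBSD syntax; untested here):

mdconfig -a -t vnode -f /usr/local/zfs/store -u 0
zpool import backups

The import may need "zpool import -f" the first time if the pool was not
cleanly exported on shutdown.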
On Mon, 14 Nov 2011 16:39:25 +0000, <sbremal at hotmail.com> wrote:

> Is ZFS not recommended with file backing store?

From the zpool(1M) man page (SunOS 5.11, last change 24 Nov 2009):

  Virtual Devices (vdevs)
    A "virtual device" describes a single device or a collection of
    devices organized according to certain performance and fault
    characteristics. The following virtual devices are supported:

    disk    A block device, typically located under /dev/dsk. [...]

    file    A regular file. The use of files as a backing store is
            strongly discouraged. It is designed primarily for
            experimental purposes, as the fault tolerance of a file is
            only as good as the file system of which it is a part. A
            file must be specified by a full path.

    mirror  [....]

-- 
( Kees Nuyt )
c[_]
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of sbremal at hotmail.com
>
> Actually a regular file (on a RAID1 setup with gmirror and 2 identical
> disks) is used as backing store for ZFS. The hardware should be fine,
> as nothing else seems to be corrupt.

In a 10-second google, I see that gmirror is a FreeBSD raid tool, perhaps
similar in some ways to linux lvm. One similarity it has - it doesn't
count.

You should be using zpool mirroring. Then ZFS will be aware of the
redundant copy, and then ZFS has the potential to correct corruption it
finds. If you're doing the redundancy at a level below ZFS, then ZFS can
only see one device. It cannot perform as well this way, and it cannot
offer features such as redundant-copy error correction.
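For reference, a ZFS-native mirror of the two underlying disks would look
something like this (device names are made up, and the command destroys
whatever is on them):

# zpool create backups mirror ada0 ada1

There is no in-place conversion from the current file-on-gmirror layout;
it means backing the data up, destroying the old pool and the gmirror,
and restoring into the new pool.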
sbremal at hotmail.com
2011-Nov-15 16:07 UTC
[zfs-discuss] Remove corrupt files from snapshot
Thanks everyone for the help. Finally I removed the corrupt files from
the "current view" of the file system and left the snapshots as they
were. This way at least the incremental backup continues. (It is sad that
snapshots are so rigid that even corruption is permanent. What is more
interesting: if snapshots are read-only, how can they become corrupted?)

Would it make sense to run "zpool scrub" regularly and have a report
sent, i.e. once a day, so discrepancies would be noticed beforehand? Is
there anything readily available in the FreeBSD ZFS package for this?

B.

----------------------------------------
> From: opensolarisisdeadlongliveopensolaris at nedharvey.com
> To: sbremal at hotmail.com; zfs-discuss at opensolaris.org
> Subject: RE: [zfs-discuss] Remove corrupt files from snapshot
> Date: Mon, 14 Nov 2011 19:32:21 -0500
>
> You should be using zpool mirroring. Then ZFS will be aware of the
> redundant copy, and then ZFS has the potential to correct corruption
> it finds. If you're doing the redundancy at a level below ZFS, then
> ZFS can only see one device. It cannot perform as well this way, and
> it cannot offer features such as redundant-copy error correction.
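A plain cron sketch for that (pool name from this thread, recipient
assumed; FreeBSD /etc/crontab format):

# weekly scrub, early Sunday morning
0 3 * * 0  root  zpool scrub backups
# daily one-line health report
0 8 * * *  root  zpool status -x | mail -s "zpool status" root

FreeBSD's periodic(8) can also mail pool health with the daily report;
setting daily_status_zfs_enable="YES" in /etc/periodic.conf should do it
on reasonably recent releases (an assumption worth checking against your
version).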
On Tue, Nov 15, 2011 at 8:07 AM, <sbremal at hotmail.com> wrote:

> Thanks everyone for the help. Finally I removed the corrupt files from
> the "current view" of the file system and left the snapshots as they
> were. This way at least the incremental backup continues. (It is sad
> that snapshots are so rigid that even corruption is permanent. What is
> more interesting: if snapshots are read-only, how can they become
> corrupted?)

The snapshot is read-only, meaning users cannot modify the data in the
snapshots. However, there's nothing to prevent random bit flips in the
underlying storage. Maybe the physical hard drive has a bad block and
gmirror copied the bad data to both disks, which flipped a bit or two in
the file you are using to back the ZFS pool.

Since ZFS only sees a single device, it has no internal redundancy and
can't fix the corrupted bits, only report that it found a block where the
on-disk checksum doesn't match the computed checksum of the block. This
is why you need to let ZFS handle redundancy via mirror vdevs, raidz
vdevs, or (at the very least) the copies=2 property on the ZFS
filesystem. If there's redundancy in the pool, then ZFS can correct the
corruption.

> Would it make sense to run "zpool scrub" regularly and have a report
> sent, i.e. once a day, so discrepancies would be noticed beforehand?
> Is there anything readily available in the FreeBSD ZFS package for
> this?

Without any redundancy in the pool, all a scrub will do is let you know
there is corrupted data in the pool. It can't fix it. Neither can gmirror
below the pool fix it. All you can do is delete the corrupted file and
restore that file from backups.

You really should get rid of the gmirror setup, dedicate the entire disks
to ZFS, and create a pool using a mirror vdev. File-backed ZFS vdevs
really should only be used for testing purposes.

-- 
Freddie Cash
fjwcash at gmail.com
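The copies property Freddie mentions is a one-liner, but note it only
protects data written after it is set; existing blocks keep a single copy
(dataset name taken from earlier in the thread):

# zfs set copies=2 backups/memory_card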
Use zpool status -v to see if any errors come up. Then you can use zpool
scrub to remove at least some of them. I have had luck with this in the
past.

---Todd

On Nov 14, 2011, at 04:25 , <sbremal at hotmail.com> wrote:

> Back to this topic: since I cannot touch snapshots, I thought I could
> simply remove the corrupt files after the last snapshot, so the next
> incremental backup will notice the difference (i.e. no file) and
> overwrite the corrupt-and-removed files with valid ones. This was the
> plan.
>
> However, while checking for corrupt files, "find" stops at some
> directory with "fts_read: Not a directory":
>
> find . -exec md5 {} \; > /home/xxx/md5_out 2> /home/xxx/md5_err &
>
> tail /home/xxx/md5_err
> ...
> md5: ./.zfs/snapshot/20100323081201/Bazsi/Projects/Java Test Client/java_test_client/lib/xxx/weblogic.jar: Input/output error
> md5: ./.zfs/snapshot/20100323081201/@Cache (Bazsi)/BMWi SP/Publikationen/PDF-Broschüren/Nexxt.pdf: Input/output error
> find: fts_read: Not a directory
>
> What does this error mean? I cannot even "scan" the ZFS file system
> anymore? Is there any "fsck" for ZFS?
>
> Cheers,
> B.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Todd Urie
>
> Use zpool status -v to see if any errors come up. Then you can use
> zpool scrub to remove at least some of them. I have had luck with this
> in the past.

Disks are made of chemicals, which can degrade over time. If some part of
a disk starts to deteriorate, but you never attempt to read it, then
you'll never know it's going bad.

You should have redundancy, and scrub on a regular basis, much more
frequently than the occurrence of a disk going bad - maybe once a week or
once a month. If you can afford to scrub daily, that's great. Depending
on your system and your data, scrubs might take several hours, thus
making it impractical to scrub daily.
On Tue, November 15, 2011 10:07, sbremal at hotmail.com wrote:

> Would it make sense to run "zpool scrub" regularly and have a report
> sent, i.e. once a day, so discrepancies would be noticed beforehand?
> Is there anything readily available in the FreeBSD ZFS package for
> this?

If you're not scrubbing regularly, you're losing out on one of the key
benefits of ZFS.

In nearly all fileserver situations, a good amount of the content is
essentially archival, infrequently accessed but important now and then.
(In my case it's my collection of digital and digitized photos.) A weekly
scrub combined with a decent backup plan will detect bit-rot before the
backups with the correct data cycle into the trash (and, with redundant
storage like mirroring or RAID, the scrub will probably be able to fix
the error without resorting to restoring files from backup).

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info