Robert Watzlavick
2011-Oct-22 16:27 UTC
[zfs-discuss] File contents changed with no ZFS error
I've noticed something strange over the past few months with four files on my raidz. Here's the setup:

OpenSolaris snv_111b
ZFS Pool version 14
AMD-based server with ECC RAM.
5 ST3500630AS 500 GB SATA drives (4 active plus spare) in raidz1

The other day, I observed what appears to be undetected file corruption in 4 of the files on the raidz. I have two external USB hard drives that I use to back up the contents of the ZFS raidz on alternating months. The USB hard drives use EXT3 so they are connected to a Linux box which in turn connects to the raidz over NFS. Occasionally, I use the checksum option on rsync (rsync -ainc) to make sure everything on the USB hard drives matches before I perform the real rsync back from the raid to the USB disk and that's when I noticed the changes. In each file, there was a single byte changed. Running zpool status doesn't show any errors and running zpool scrub doesn't show any problems either.

One of the changed files was a .ppt file that I downloaded from the web over a year ago and the other 3 were Acronis incremental Backup files from my XP machine that get stored on the raidz. Since ZFS files aren't supposed to be corrupted without notification (right?), I initially assumed the problem was with the USB drive. For the 3 Acronis backup files, I had no way of knowing which version was the correct one because Acronis shows all of them to be valid. The .ppt file was not on the web anymore but with the help of the Wayback machine, I was able to re-download it and that's when I confirmed the "good" copy from the web matches the copy on my USB hard drive, not the copy on the raidz. I know I haven't modified the .ppt file because the date still matches the date I downloaded it, 2010-01-12.

What failure scenario could have caused this? The file was obviously initially good on the raidz because it got backed up to the USB drive and that matches the "good" version from the web.

Thanks in advance,
-Bob
Garrett D'Amore
2011-Oct-22 17:55 UTC
[zfs-discuss] File contents changed with no ZFS error
You're using an *old* version of both OpenSolaris and zpool. There have been a few corruption bugs fixed since then. I'd recommend updating.

- Garrett
Edward Ned Harvey
2011-Oct-22 18:14 UTC
[zfs-discuss] File contents changed with no ZFS error
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Watzlavick
>
> What failure scenario could have caused this? The file was obviously
> initially good on the raidz because it got backed up to the USB drive
> and that matches the "good" version from the web.

How can you rule out the possibility that "something changed the file"? Intentionally, not as a form of filesystem corruption.

If you have snapshots on your ZFS filesystem, you can use zhist (or whatever technique you want) to see in which snapshot(s) it changed, and find all the unique versions of it. 'Course that will only give you any valuable information if you have different versions of the file in different snapshots.
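A minimal sketch of the "find all the unique versions" idea, assuming the filesystem's snapshots are browsable under the hidden `.zfs/snapshot` directory at the dataset's mount point. The function names and the example paths are hypothetical, and this is not a description of how zhist works internally; it only groups the snapshot copies of one file by content hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def versions_across_snapshots(snapshot_root, relative_path):
    """Group snapshot names by the content hash of one file.

    snapshot_root is the directory holding one subdirectory per snapshot
    (assumed here to be <mountpoint>/.zfs/snapshot). relative_path is the
    file's path relative to the dataset mount point. Snapshots that do not
    contain the file are skipped.
    """
    versions = defaultdict(list)
    for snapshot_dir in sorted(Path(snapshot_root).iterdir()):
        candidate = snapshot_dir / relative_path
        if candidate.is_file():
            versions[sha256_of(candidate)].append(snapshot_dir.name)
    return dict(versions)
```

Called as, say, `versions_across_snapshots("/tank/data/.zfs/snapshot", "docs/talk.ppt")` (both paths made up for illustration), a result with more than one key would mean the file's contents differ between snapshots, and the snapshot names under each key narrow down when the change happened.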
Robert Watzlavick
2011-Oct-22 19:08 UTC
[zfs-discuss] File contents changed with no ZFS error
On Oct 22, 2011, at 13:14, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> How can you rule out the possibility that "something changed the file"?
> Intentionally, not as a form of filesystem corruption.

I suppose that's possible but it seems unlikely. One byte of a file changing on disk with no corresponding change in the mod time seems unlikely. I did access that file for read sometime in the past few months but again, if it had accidentally been written to, the time would have been updated.

> If you have snapshots on your ZFS filesystem, you can use zhist (or whatever
> technique you want) to see in which snapshot(s) it changed, and find all the
> unique versions of it. 'Course that will only give you any valuable
> information if you have different versions of the file in different
> snapshots.

I only have one or two snapshots but I'll look.

Thanks,
-Bob
Robert Watzlavick
2011-Oct-22 19:13 UTC
[zfs-discuss] File contents changed with no ZFS error
On Oct 22, 2011, at 12:55, Garrett D'Amore <Garrett.DAmore at nexenta.com> wrote:

> You're using an *old* version of both OpenSolaris and zpool. There have been a few corruption bugs fixed since then. I'd recommend updating.
>
> - Garrett

I was looking for the changelist to see if any have been fixed but couldn't find it. This is 2009.06 with some updates but not the final set. It's just a fileserver so I was tempted to freeze the config since it's not on the net and it has been working so well (then again maybe not). Do you know if the final updates to 2009.06 can still be applied?

-Bob
Mark Sandrock
[zfs-discuss] File contents changed with no ZFS error
Why don't you see which byte differs, and how it does? Maybe that would suggest the "failure mode". Is it the same byte data in all affected files, for instance?

Mark
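One way to act on this suggestion, sketched in Python under the assumption that both copies (the one on the raidz and the one on the backup drive) are readable from the same machine. The function names are hypothetical. The single-bit check is only a rough heuristic: one flipped bit would point more toward memory or transport trouble, while a run of changed bytes in one region looks more like an application rewriting a field.

```python
def differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for each position where two files differ.

    Only offsets present in both files are compared; a difference in length
    has to be checked separately (for example with os.path.getsize).
    """
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a or not chunk_b:
                break
            for index, (a, b) in enumerate(zip(chunk_a, chunk_b)):
                if a != b:
                    yield offset + index, a, b
            offset += min(len(chunk_a), len(chunk_b))


def is_single_bit_flip(a, b):
    """True if the two byte values differ in exactly one bit."""
    xor = a ^ b
    return xor != 0 and xor & (xor - 1) == 0


def print_differences(path_a, path_b):
    """Print every differing offset with both byte values in hex."""
    for offset, a, b in differing_bytes(path_a, path_b):
        note = " (single bit)" if is_single_bit_flip(a, b) else ""
        print(f"offset {offset:#x}: {a:#04x} -> {b:#04x}{note}")
```

Running `print_differences` over each affected pair shows whether the changes land at the same offset or carry the same values across files, which is the comparison being asked for here.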
Robert Watzlavick
2011-Oct-23 20:36 UTC
[zfs-discuss] File contents changed with no ZFS error
On 10/22/2011 04:14 PM, Mark Sandrock wrote:

> Why don't you see which byte differs, and how it does?
> Maybe that would suggest the "failure mode". Is it the
> same byte data in all affected files, for instance?
>
> Mark

I found something interesting with the .ppt file. Apparently, just opening a .ppt file (not saving though) from MS Office will change bytes in the file and not update the modification time. I was able to duplicate the changes to an arbitrary PPT file (actually it was more than one byte when I looked closer). The MS-PPT spec shows it to be a section related to the current user's profile. The interesting thing is that it kept the length the same, changed bytes, but didn't update the Windows modify time. If I mark the file read-only, it doesn't get changed of course but in both cases, there was no indication from the UI of a change to the file.

The Unix stat command for this file shows:

Access: 2011-10-22 18:59:39.671340999 -0500
Modify: 2010-01-12 21:15:07.456360000 -0600
Change: 2011-03-09 20:16:22.094604104 -0600

I originally thought the change date of 2011-03-09 corresponded to some permission changes I made on that folder but that is around the time I would have last opened the file. So the mystery is solved for that one.

Now on to find out why the 3 Acronis Backup files got modified. This is good news so far...

-Bob
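The gap between the Modify and Change times above can be checked for programmatically. This is a minimal sketch assuming a Unix-like system, where `st_ctime` is the inode change time (on Windows the same field has a different meaning). The function names and the one-second tolerance are arbitrary choices for illustration.

```python
import os
from datetime import datetime, timezone


def timestamp_report(path):
    """Return the modify and change times of a file as UTC datetimes."""
    info = os.stat(path)
    return {
        "modify": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "change": datetime.fromtimestamp(info.st_ctime, tz=timezone.utc),
    }


def changed_after_modify(path, tolerance_seconds=1.0):
    """True if the inode change time is later than the modify time.

    This happens when something touched the file and the modify time was
    left alone or put back afterwards, but also after a chmod, chown, or
    rename, so a True result is a hint to look closer, not proof that the
    contents changed.
    """
    info = os.stat(path)
    return info.st_ctime - info.st_mtime > tolerance_seconds
```

A file flagged by `changed_after_modify` is a candidate for a checksum comparison against a backup copy; as with the permission-change theory above, the change time alone cannot say which kind of update happened.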
Edward Ned Harvey
2011-Oct-24 14:11 UTC
[zfs-discuss] File contents changed with no ZFS error
> From: Robert Watzlavick [mailto:robert at watzlavick.com]
> Sent: Sunday, October 23, 2011 4:36 PM
>
> Now on to find out why the 3 Acronis Backup files got modified. This is
> good news so far...

I expect you'll find the same thing for Acronis. Acronis updates those individual files to make them aware of each other. When you open file1.tib, it knows it's part of a backup set (version chain) that includes file0, file1, file2, file3. Of course, this is probably done by Acronis writing a little bit to file1.tib.

Heaven forefend you should want to back up a truecrypt file.

There is a reason why backups stopped depending on timestamps years ago. Welcome and praise zfs send.
Paul Kraus
[zfs-discuss] File contents changed with no ZFS error
On Sat, Oct 22, 2011 at 12:27 PM, Robert Watzlavick <robert at watzlavick.com> wrote:

> What failure scenario could have caused this? The file was obviously
> initially good on the raidz because it got backed up to the USB drive and
> that matches the "good" version from the web.

I ran into a similar "failure" with a ZFS filesystem shared via SAMBA. The data has ACLs on it to permit new data to be added, but nothing modified or removed. When testing the configuration the end users had no problems using Windows to copy data into the share. When they used a specific tool to copy (and then verify via checksum) the data, it was occasionally flagging a bad copy. Turns out that the tool was actually copying _more_ data than was in the original and then going back and removing the extra white space at the end of the file (the files matched byte for byte up until the original ended and the copy did not). The ZFS ACL was doing what the end user needed (no modifications). I reported the "problem" to the folks who wrote the tool and never heard anything back. My end users have stopped using that tool for copies; they still use it for verifications (and for that it is fine as it does not try to change the data).

This was originally reported to me as a problem with ZFS, SAMBA, or the ACLs I had set up.

It is amazing how much _changing_ of data goes on with no knowledge by the end users.

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Edward Ned Harvey
2011-Oct-24 14:42 UTC
[zfs-discuss] File contents changed with no ZFS error
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Watzlavick
>
> I have two external USB hard drives
> that I use to back up the contents of the ZFS raidz on alternating
> months. The USB hard drives use EXT3 so they are connected to a Linux
> box which in turn connects to the raidz over NFS. Occasionally, I use
> the checksum option on rsync (rsync -ainc) to make sure everything on
> the USB hard drives match before I perform the real rsync back from the
> raid to the USB disk and that's when I noticed the changes.

I would suggest finding a way to connect the external disks directly to the ZFS server, and start using zfs send instead.
Robert Watzlavick
2011-Oct-24 15:20 UTC
[zfs-discuss] File contents changed with no ZFS error
On Oct 24, 2011, at 9:42, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> I would suggest finding a way to connect the external disks directly to the
> ZFS server, and start using zfs send instead.

Since these were my offsite backups I was using Truecrypt which drove the use of ext3 and Linux. Also I wanted to be sure in the event of a disaster I could easily find a machine to read them. I had reservations about just any machine being able to boot the 2009.06 live cd and read them. But now that more distros are starting to support ZFS, I'll probably switch over to it for the external USB drive backups.

-Bob
Robert Watzlavick
2011-Oct-28 01:08 UTC
[zfs-discuss] File contents changed with no ZFS error
Just to close out the discussion, I wasn't able to prove any issues with ZFS. The files that were changed all seem to have plausible scenarios. I've moved my external USB drive backups over to ZFS directly connected to the file server and it's all working fine.

Thanks for everyone's help!
-Bob