A few questions on data replication:

Assuming I've created a pool named zfspool containing two unmirrored
disks, and I run:

  zfs create zfspool/test2
  zfs set copies=2 zfspool/test2

Will data copied in there be guaranteed to be replicated on both
devices? Or does ZFS just try its best to spread the data among
different devices?

Also: data replication begins on newly-written data AFTER the set
copies=n command. Is there a way to force a file system to "catch up"
to the new replication setting (i.e., copies=1 switching to copies=2
for ALL data in the filesystem), or is the way to do it to create a
new file system with copies=2 and move the data to it? Also,
conversely, copies=2 switching to copies=1, in order to recover disk
space...

Thanks,
-Tim
On Sun, Jan 18, 2009 at 4:36 PM, Timothy Renner <timothy.renner at gmail.com> wrote:
> Assuming I've created a pool named zfspool containing two unmirrored
> disks and I create:
>
>   zfs create zfspool/test2
>   zfs set copies=2 zfspool/test2
>
> Will data copied in there be guaranteed to be replicated on both
> devices? Or does ZFS just try its best to spread the data among
> different devices?

No, and not even that. It guarantees there will be two copies. It
doesn't even make any intentional effort to spread them across
multiple devices, from what I understand. The second copy goes where
it goes.

> Also: Data replication begins on newly-written data AFTER the set
> copies=? command. Is there a way to force a file system to "catch up"
> to the new replication settings? Also conversely, copies=2 switching
> to copies=1, in order to recover disk space...

Again, AFAIK, there is no way to do so for existing data short of
copying it.
Tim wrote:
> On Sun, Jan 18, 2009 at 4:36 PM, Timothy Renner <timothy.renner at gmail.com> wrote:
>> Will data copied in there be guaranteed to be replicated on both
>> devices? Or does ZFS just try its best to spread the data among
>> different devices?
>
> No, and not even that. It guarantees there will be two copies.
> It doesn't even make any intentional effort to spread it to multiple
> devices from what I understand. The second copy goes where it goes.

This is not quite correct. ZFS will attempt to place the copies on
different vdevs. On the same vdev, it will try to place them somewhere
non-contiguous (spatial diversity). I'm curious where you got the
information to the contrary? Perhaps there is a wiki somewhere that
needs an edit?

The copies property causes a lot of confusion and is difficult to
explain in words. That is why I like the web, you can use pictures :-)
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

> Again, AFAIK, no way to do so recursively short of the copying.

Yes. As for recovering space, that will work as long as the data is
not part of a snapshot.
 -- richard
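Richard's description of best-effort placement can be modeled as a toy
allocator (a hypothetical sketch for illustration only, NOT the real
ZFS metaslab code): try a vdev that doesn't yet hold a copy of the
block, and only fall back to a spatially distant offset on an
already-used vdev when there is no alternative.

```python
def place_copies(free_map, n_copies):
    """Toy model of best-effort copy placement (a sketch, NOT the
    real ZFS allocator). free_map: vdev name -> list of free offsets."""
    placements = []
    used = set()
    for _ in range(n_copies):
        # First choice: a vdev this block has no copy on yet.
        candidates = [v for v, offs in free_map.items()
                      if offs and v not in used]
        if not candidates:
            # Fallback: reuse a vdev, but pick an offset far from the
            # existing copy on it (spatial diversity).
            candidates = [v for v, offs in free_map.items() if offs]
            if not candidates:
                raise RuntimeError("pool full")
        vdev = candidates[0]
        prior = [off for v, off in placements if v == vdev]
        if prior:
            # Choose the free offset farthest from any prior copy.
            off = max(free_map[vdev],
                      key=lambda o: min(abs(o - q) for q in prior))
        else:
            off = free_map[vdev][0]
        free_map[vdev].remove(off)
        placements.append((vdev, off))
        used.add(vdev)
    return placements

# With free space on both vdevs, the two copies land on different ones.
two = place_copies({"disk0": list(range(8)), "disk1": list(range(8))}, 2)
assert {v for v, _ in two} == {"disk0", "disk1"}

# With only one vdev free, both copies land on it, but far apart.
one = place_copies({"disk0": list(range(8)), "disk1": []}, 2)
assert one == [("disk0", 0), ("disk0", 7)]
```

Note that the fallback path is exactly the "attempt, not guarantee"
being debated in this thread: the call still succeeds even when both
copies end up on the same device.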
On Sun, Jan 18, 2009 at 10:12 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> This is not quite correct. ZFS will attempt to place the copies on
> different vdevs. On the same vdev, it will try to place it somewhere
> which is not contiguous (spatial diversity). I'm curious where you
> got the information to the contrary? Perhaps there is a wiki somewhere
> that needs an edit?

Honestly, I believe this list... when other people have asked if they
could use copies= to avoid mirroring everything. I can't say I've
saved any of the threads, because they didn't seem of any particular
importance to me at the time.

Perhaps if I get motivated I'll search for some later this week.

--Tim
And that's why I hated blogs! Do you know what to read that is not
misleading, in a sea of blogs?! And you are wondering why western
folks don't like us?!

Best,
z

----- Original Message -----
From: Tim
To: Richard Elling
Cc: zfs-discuss at opensolaris.org
Sent: Sunday, January 18, 2009 11:56 PM
Subject: Re: [zfs-discuss] Understanding ZFS replication

[earlier message quoted in full; trimmed]
Beloved Tim,

You challenged me a while ago, as a friend. I did what you asked me to
do, in the honor of my father.

Best,
z

----- Original Message -----
From: JZ
To: Tim
Cc: zfs-discuss at opensolaris.org
Sent: Sunday, January 18, 2009 11:58 PM
Subject: Re: [zfs-discuss] Understanding ZFS replication

[earlier messages quoted in full; trimmed]
On 18-Jan-09, at 11:56 PM, Tim wrote:
> On Sun, Jan 18, 2009 at 10:12 PM, Richard Elling <Richard.Elling at sun.com> wrote:
>> The copies property causes a lot of confusion and is difficult to
>> explain in words. That is why I like the web, you can use pictures :-)
>> http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
>
> Honestly, I believe this list...

It's better to believe the source code.

--Toby
On Sun, 18 Jan 2009, Tim wrote:
> Honestly, I believe this list... when other people have asked if they
> can use the copies= to avoid mirroring everything. I can't say I've
> saved any of the threads because they didn't seem of any particular
> importance to me at the time.

The extra copies help avoid data loss, but if a disk is lost and there
is no disk-wise redundancy, then the pool will be lost.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
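Bob's distinction can be illustrated with a toy model (hypothetical,
not real ZFS behavior): copies=2 lets an individual block survive the
corruption of one copy, but losing an entire non-redundant top-level
device takes the pool with it regardless of where the copies landed.

```python
# Toy model: 100 blocks, each with two copies spread across two
# non-mirrored disks (the favorable case; not guaranteed in practice).
blocks = {i: [("disk0", i), ("disk1", i)] for i in range(100)}

def block_readable(copies, bad_sectors, dead_disks):
    """A block survives if at least one copy sits on a live disk and
    outside any corrupted sector."""
    return any(disk not in dead_disks and (disk, off) not in bad_sectors
               for disk, off in copies)

def pool_importable(vdevs, dead_disks):
    """A pool with no vdev-level redundancy needs every top-level
    vdev present in order to import."""
    return not (set(vdevs) & set(dead_disks))

# A single corrupted sector: the second copy saves the block.
assert block_readable(blocks[7], {("disk0", 7)}, set())

# A whole-disk loss: every block still has a surviving copy...
assert all(block_readable(c, set(), {"disk0"}) for c in blocks.values())
# ...but the non-redundant pool cannot be imported at all.
assert not pool_importable(["disk0", "disk1"], {"disk0"})
```

The last two assertions are the crux of the thread: per-block
survivability and pool importability are different failure domains.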
Bob Friesenhahn wrote:
> The extra copies help avoid data loss, but if a disk is lost and
> there is no disk-wise redundancy, then the pool will be lost.

I'm reading a lot of posts where folks don't seem to be understanding
each other, so let me try to re-phrase things.

If you set copies=n, where n > 1, ZFS will _attempt_ to put the copies
on different block devices. If it can't, it will _attempt_ to place
the copies "far" away from each other on the same block device.

The key word above is "attempt". Previous posters have shot this down
as "poor man's mirroring" because of the lack of guarantees. I suspect
these naysayers (and rightly so) are what Tim is recalling.

-- 
Carson
Z,

> Beloved Tim,
> You challenged me a while ago, as a friend.
> I did what you asked me to do, in the honor of my father.
>
> Best,
> z

Please don't post personal stuff like this, or links to Wikipedia or
other ephemera/apocrypha, to this or any list unless they are
relevant.

Thanks...
Sean.
This makes sense. Given a set of devices, ZFS can only write to free
blocks. If the only free blocks are close together or on the same
device, then the protection can't be as great. This is quite likely to
happen on a fullish disk. copies > 1, however, is still better than
nothing (a single dropped block in the right place can wreak havoc).

I personally like to use the 'copies' feature on machines where the
allocation priority for devices is my storage pool rather than my root
pool. I like the idea that I can have multiple copies of blocks on my
(single) boot device. This also works nicely because, on a Solaris
machine with lots of memory, I don't have to write to the disk much
after boot, so the performance penalty seems fairly small. I have this
running right now in one case. When I get the ability to mirror my
rpool, I can remove the copies property if I wish.

One other important caveat is that ZFS properties only apply to
newly-written data. So setting copies > 1 after an install won't make
copies of the blocks written during the initial install, just the
blocks written going forward.

cheers,
Blake

On Mon, Jan 19, 2009 at 1:04 AM, Carson Gaspar <carson at taltos.org> wrote:
> If you set copies=n, where n > 1, ZFS will _attempt_ to put the copies
> on different block devices. If it can't, it will _attempt_ to place
> the copies "far" away from each other on the same block device.
>
> The key word above is "attempt". Previous posters have shot this down
> as "poor man's mirroring" because of the lack of guarantees. I suspect
> these naysayers (and rightly so) are what Tim is recalling.
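Blake's caveat about properties only applying to new data suggests the
usual workaround for "catching up" existing files: rewrite each one so
its blocks are freshly allocated under the new copies= setting. A
rough sketch of such a hypothetical helper (with the obvious caveats:
it doesn't preserve ACLs or extended attributes, and it can't touch
blocks pinned by snapshots):

```python
import os
import shutil
import tempfile

def rewrite_file(path):
    """Rewrite a file via a temp copy so its data blocks are freshly
    allocated (and so pick up a new copies= setting).
    Hypothetical sketch: ignores ACLs, xattrs, and snapshots."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    os.close(fd)
    shutil.copy2(path, tmp)   # copy data plus basic metadata
    os.replace(tmp, path)     # swap the newly written blocks in

def rewrite_tree(root):
    """Rewrite every regular file under root; return the count."""
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rewrite_file(os.path.join(dirpath, name))
            count += 1
    return count

# Demo on a throwaway directory tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
with open(os.path.join(root, "a.txt"), "w") as f:
    f.write("hello")
with open(os.path.join(root, "sub", "b.txt"), "w") as f:
    f.write("world")

n = rewrite_tree(root)
with open(os.path.join(root, "a.txt")) as f:
    a = f.read()
```

This only frees the old single-copy blocks once no snapshot references
them, which matches Richard's note about recovering space.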
>>>>> "tr" == Timothy Renner <timothy.renner at gmail.com> writes:

    tr> zfs set copies=2 zfspool/test2

'copies=2' says things will be written twice, but regardless of the
discussion about where the two copies are written, copies=2 says
nothing at all about being able to *read back* your data if one of the
copies disappears. It only promises that the two copies will be
written. This does you no good at all if you can't import the pool,
which is probably what will happen to anyone who has relied on
copies=2 for redundancy.

The discussion about *where* the copies tend to be written is really
impractical and distracting, IMO.

The chance that the copies won't be written to separate vdevs is not
where the problem comes from. You can't import a pool unless it has
enough redundancy at vdev level to get all your data, so copies=2
doesn't add much. The best copies=2 will do is give you a slightly
better shot at evacuating the data from a slowly-failing drive. If
anyone at all should be using it, it certainly isn't someone with more
than one drive.
Timothy Renner
2009-Jan-19 19:50 UTC
[zfs-discuss] Feature Request Discussion (Was: Understanding ZFS replication)
So personally I find ZFS to be fantastic; it's only missing three
features from my ideal filesystem:

1) The ability to easily recover the portions of a filesystem that are
still intact after a catastrophic failure. (It looks like zpool scrub
can do this as long as a damaged pool can be imported, so this is
almost there, or it's hackable at the moment if a bit of drive
information has been kept around.)

2) The ability to push the data off a device and safely remove it from
a non-mirrored pool. (Marked as a future feature.)

3) File system level mirroring across devices, rather than device
level mirroring.

To raise this issue for discussion (pros/cons/not worth the effort,
ideas): it would be fantastic if ZFS could support another option for
copies that **guarantees** that it writes copies to different devices,
and if it cannot (due to free space constraints or a failing/failed
device), writes to the same device but raises an error/warning that
could be checked in zpool status or similar, much like a RAID5 losing
a disk: it's workable, simply degraded.

zpool scrub strikes me as the perfect tool to attempt to enforce the
copies=n attribute, as a way to bring the entire filesystem into line
with the current settings and ensure that old data meets the
requirement, rather than only affecting new data. An issue I
immediately see here is possibly needing to move data from one disk to
another in order to free up space for replication across devices,
which is likely non-trivial.

-Tim

Miles Nordin wrote:
> 'copies=2' says things will be written twice, but regardless of
> discussion about where the two copies are written, copies=2 says
> nothing at all about being able to *read back* your data if one of
> the copies disappears. It only promises that the two copies will be
> written.
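The guaranteed-spread behavior Tim proposes could look something like
this toy allocator (purely hypothetical, not anything ZFS implements):
place copies on distinct devices when possible, otherwise fall back to
the same device but flag the result as degraded, so a status check can
surface it the way zpool status surfaces a degraded RAID set.

```python
import warnings

class DegradedPlacement(Warning):
    """Signals that copies could not be spread across distinct
    devices: workable, but degraded (hypothetical, per the proposal)."""

def place_strict(free_map, n_copies):
    """Toy allocator for a hypothetical copies-spread guarantee.
    free_map: device name -> count of free blocks."""
    placements = []
    used = set()
    degraded = False
    for _ in range(n_copies):
        candidates = [d for d, free in free_map.items()
                      if free > 0 and d not in used]
        if not candidates:
            candidates = [d for d, free in free_map.items() if free > 0]
            if not candidates:
                raise RuntimeError("pool full")
            degraded = True   # had to double up on a device
        dev = candidates[0]
        free_map[dev] -= 1
        placements.append(dev)
        used.add(dev)
    if degraded:
        warnings.warn("copies not fully spread across devices",
                      DegradedPlacement)
    return placements, degraded

# Enough devices: copies spread cleanly, no degraded flag.
p, deg = place_strict({"disk0": 5, "disk1": 5}, 2)
assert sorted(p) == ["disk0", "disk1"] and not deg

# One device out of space: placement succeeds but is flagged degraded.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DegradedPlacement)
    p, deg = place_strict({"disk0": 5, "disk1": 0}, 2)
assert p == ["disk0", "disk0"] and deg
```

The degraded flag is the piece current copies=n lacks: today the
fallback is silent, which is exactly why earlier posters rejected it
as a mirroring substitute.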