Paul Hedderly
2007-Jun-14 11:45 UTC
[zfs-discuss] ZFS ditto 'mirroring' on JAOBOD? Please, pretty please!
Strikes me that at the moment the Sun/ZFS team is missing a great opportunity.

Imagine Joe Bloggs has a historical machine with Just Any Old Bunch Of Discs... (it's not me, no really).

He doesn't want to have to think too hard about pairing them up in mirrors or RAID sets - and sometimes they die, or are just too small and need to be swapped out - or maybe they are iSCSI/AoE targets that might disappear (say the 'spare space' on a thousand desktop PCs...).

What Joe really wants to say to ZFS is: "Here is a bunch of discs. Use them any way you like - but I'm setting 'copies=2' or 'stripes=5' and 'parity=2', so you just go allocating space on any of these discs, trying to make sure I always have resilience at the data level."

Now I can do that at the moment - well, the copies/ditto kind anyway - but if I lose or remove one of the discs, ZFS will not start the zpool. *That sucks!!!*

Because... if one disc has gone from a bunch of 10 or so, and I have all my data and metadata using dittos, then the data that was on that disc is replicated on the others - so losing one disc is not a problem (unless there wasn't space to store all the copies on the other discs, I know). ZFS should be able to start that zpool and give me the option to re-ditto the data that has lost copies on the dead/removed disc.

So I get nice flexible "mirroring" by just throwing a JAOBOD at ZFS, and it does all the hard work.

I really can't see this being difficult - but I guess it is dependent on the zpool remove <vdev> functionality being complete.

-- 
Paul
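[For concreteness, the setup being asked for is roughly the following - a minimal sketch only: the device names are hypothetical, and the exact failure reported on import will vary.]

  # a mixed bag of discs in one pool, no mirror/raidz pairing needed
  zpool create tank c0t0d0 c0t1d0 c0t2d0 c1t0d0

  # keep two copies of every data block (metadata already gets
  # ditto copies by default)
  zfs set copies=2 tank

  # later, c0t2d0 dies and is pulled; today the pool will not come
  # back up, even though ditto copies of its data exist elsewhere
  zpool import tank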
Will Murnane
2007-Jun-14 13:12 UTC
[zfs-discuss] ZFS ditto 'mirroring' on JAOBOD? Please, pretty please!
On 6/14/07, Paul Hedderly <paul+os at mjr.org> wrote:
> What Joe really wants to say to ZFS is: "Here is a bunch of discs. Use
> them any way you like - but I'm setting 'copies=2' or 'stripes=5' and
> 'parity=2', so you just go allocating space on any of these discs,
> trying to make sure I always have resilience at the data level."

I can think of one good way to mirror across several different-sized disks. Put the disk blocks in some order, then snip down the middle and mirror the two halves. If more redundancy is needed, cut into N pieces and mirror across the N chunks.

Attached (if the list lets them through, anyways) are diagrams of what I mean by "snip down the middle". drives.png shows the blocks of each disk in a different color. drives-mirrored.png shows what happens when you cut that image in half and put the two pieces on top of each other: pairs of blocks that are vertically aligned with each other would then contain the same data.

You could do this manually, or with a script: carve out chunks of the disks such that they line up with each other, then add mirror vdevs built from the matching chunks to your pool. Adding another device to this conglomeration would be tricky, though. I'll have to think about it some more.

> I really can't see this being difficult - but I guess it is dependent
> on the zpool remove <vdev> functionality being complete.

Agreed. From a Joe Bloggs point of view, if you know a disk is going bad, it'd be nice to be able to tell ZFS to get blocks off it rather than having to replace it.

On 6/14/07, Mario Goebbels <me at tomservo.cc> wrote:
> Say you have two disks, one 50GB and one 100GB; part of your data can
> only be ditto'd within the upper 50GB of the larger disk. If the larger
> one fails, it takes about 25GB of gross data with it, since the ditto
> blocks couldn't be spread to different disks.

Well, yes. If you don't have enough disk space to mirror the largest disk in your array, it's not going to work out very well. But suppose you had two 100GB disks and one 50GB disk. That's enough disk space that you should be able to get 125GB of mirrored space out of it, but conventional wisdom says leave the 50GB disk out and live with 100GB.

Will

[Attachment: drives.png (image/png, 333 bytes) - <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070614/e8b29171/attachment.png>]
[Attachment: drives-mirrored.png (image/png, 404 bytes) - <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070614/e8b29171/attachment-0001.png>]
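[As a concrete sketch of the manual approach Will describes, take his two-100GB-discs-plus-one-50GB-disc example: lay the 250GB of blocks end to end, cut at the 125GB mark, and mirror vertically aligned chunks. One possible carving - slice names are hypothetical, and the slices are assumed to have already been created at these sizes with format(1M):]

  # c0t0d0 (100GB): s0 = 75GB, s1 = 25GB
  # c0t1d0 (100GB): s0 = 25GB, s1 = 75GB
  # c0t2d0 ( 50GB): s0 = 25GB, s1 = 25GB
  #
  # each mirror pair spans two different physical discs, so any
  # single disc can fail without losing data
  zpool create tank \
      mirror c0t0d0s0 c0t1d0s1 \
      mirror c0t0d0s1 c0t2d0s0 \
      mirror c0t1d0s0 c0t2d0s1

  # usable mirrored space: 75 + 25 + 25 = 125GB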
Mario Goebbels
2007-Jun-14 15:53 UTC
[zfs-discuss] ZFS ditto 'mirroring' on JAOBOD? Please, pretty please!
A bunch of disks of different sizes makes this a problem. I wanted to post that idea to the mailing list before, but didn't, since it doesn't make too much sense.

Say you have two disks, one 50GB and one 100GB; part of your data can only be ditto'd within the upper 50GB of the larger disk. If the larger one fails, it takes about 25GB of gross data with it, since those ditto blocks couldn't be spread to different disks.

More than two disks adds to the chaos. It may reduce the gross loss from slack ditto blocks (slack as in space that can't be made physically redundant), but it still isn't truly redundant.

Note that I say gross data, because it may well happen that many of your files have most of their blocks ditto'd across physical disks, but not all of them. That one specific block in file A that's ditto'd on the same physical disk instead of being spread might just ruin your day.

-mg

> Strikes me that at the moment the Sun/ZFS team is missing a great
> opportunity.
>
> Imagine Joe Bloggs has a historical machine with Just Any Old Bunch Of
> Discs... (it's not me, no really).
>
> He doesn't want to have to think too hard about pairing them up in
> mirrors or RAID sets - and sometimes they die, or are just too small
> and need to be swapped out - or maybe they are iSCSI/AoE targets that
> might disappear (say the 'spare space' on a thousand desktop PCs...).
>
> What Joe really wants to say to ZFS is: "Here is a bunch of discs. Use
> them any way you like - but I'm setting 'copies=2' or 'stripes=5' and
> 'parity=2', so you just go allocating space on any of these discs,
> trying to make sure I always have resilience at the data level."
>
> Now I can do that at the moment - well, the copies/ditto kind anyway -
> but if I lose or remove one of the discs, ZFS will not start the zpool.
> *That sucks!!!*
>
> Because... if one disc has gone from a bunch of 10 or so, and I have
> all my data and metadata using dittos, then the data that was on that
> disc is replicated on the others - so losing one disc is not a problem
> (unless there wasn't space to store all the copies on the other discs,
> I know). ZFS should be able to start that zpool and give me the option
> to re-ditto the data that has lost copies on the dead/removed disc.
>
> So I get nice flexible "mirroring" by just throwing a JAOBOD at ZFS,
> and it does all the hard work.
>
> I really can't see this being difficult - but I guess it is dependent
> on the zpool remove <vdev> functionality being complete.
Matthew Ahrens
2007-Jun-14 17:18 UTC
[zfs-discuss] ZFS ditto 'mirroring' on JAOBOD? Please, pretty please!
Paul Hedderly wrote:
> Now I can do that at the moment - well, the copies/ditto kind anyway -
> but if I lose or remove one of the discs, ZFS will not start the zpool.
> *That sucks!!!*

Agreed, that is a bug (perhaps related to 6540322).

--matt
Richard Elling
2007-Jun-14 17:46 UTC
[zfs-discuss] ZFS ditto 'mirroring' on JAOBOD? Please, pretty please!
Paul Hedderly wrote:
> Strikes me that at the moment the Sun/ZFS team is missing a great
> opportunity.
>
> Imagine Joe Bloggs has a historical machine with Just Any Old Bunch Of
> Discs... (it's not me, no really).
>
> He doesn't want to have to think too hard about pairing them up in
> mirrors or RAID sets - and sometimes they die, or are just too small
> and need to be swapped out - or maybe they are iSCSI/AoE targets that
> might disappear (say the 'spare space' on a thousand desktop PCs...).
>
> What Joe really wants to say to ZFS is: "Here is a bunch of discs. Use
> them any way you like - but I'm setting 'copies=2' or 'stripes=5' and
> 'parity=2', so you just go allocating space on any of these discs,
> trying to make sure I always have resilience at the data level."
>
> Now I can do that at the moment - well, the copies/ditto kind anyway -
> but if I lose or remove one of the discs, ZFS will not start the zpool.
> *That sucks!!!*
>
> Because... if one disc has gone from a bunch of 10 or so, and I have
> all my data and metadata using dittos, then the data that was on that
> disc is replicated on the others - so losing one disc is not a problem
> (unless there wasn't space to store all the copies on the other discs,
> I know). ZFS should be able to start that zpool and give me the option
> to re-ditto the data that has lost copies on the dead/removed disc.
>
> So I get nice flexible "mirroring" by just throwing a JAOBOD at ZFS,
> and it does all the hard work.

There is a fundamental difference between mirroring and ditto blocks. The allocation of data onto devices is guaranteed to be on both sides of a mirror. The allocation of ditto blocks depends on the available space: two or more copies of the data could be placed on the same device. If that device is lost, then data is lost.

I expect the more common use of ditto blocks will be on single-disk systems such as laptops. This is a good thing for laptops.

> I really can't see this being difficult - but I guess it is dependent
> on the zpool remove <vdev> functionality being complete.

There is another issue here. As you note, figuring out the optimal placement of data for a given collection of devices can be more work than we'd like. We can solve this problem using proven mathematical techniques. If such a wizard were available, would you still want to go down the path of possible data loss due to a single device failure?
 -- richard
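[To make the mirroring-versus-ditto distinction concrete, a minimal sketch - device and pool names are hypothetical:]

  # mirror: every block is guaranteed to live on both devices
  zpool create mpool mirror c0t0d0 c0t1d0

  # ditto blocks: ZFS tries to put the two copies on different
  # devices, but depending on free space both copies can land on the
  # same disc - lose that disc and you lose the block
  zpool create dpool c0t0d0 c0t1d0
  zfs set copies=2 dpool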