Mike Brancato
2008-Dec-05 15:55 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
I've seen discussions as far back as 2006 saying that development is underway to allow the addition and removal of disks in a raidz vdev to grow or shrink the group. Meaning, if a 4x100GB raidz only used 150GB of space, one could do 'zpool remove tank c0t3d0' and the data residing on c0t3d0 would be migrated to the other disks in the raidz. Then, c0t3d0 would be free for removal and reuse.

What is the status of this support in nv101?

If a pool has multiple raidz vdevs, how would one add a disk to the second raidz vdev?
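For concreteness, the workflow I have in mind would look something like this (the shrink command is hypothetical and does not exist today; the device names are just examples):

  # hypothetical: migrate data off c0t3d0 and shrink the raidz to 3 disks
  zpool remove tank c0t3d0
  # the freed disk could then be reused, e.g. as a hot spare (this part works today)
  zpool add tank spare c0t3d0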
Richard Elling
2008-Dec-05 18:28 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
Mike Brancato wrote:
> I've seen discussions as far back as 2006 saying that development is underway to allow the addition and removal of disks in a raidz vdev to grow or shrink the group. Meaning, if a 4x100GB raidz only used 150GB of space, one could do 'zpool remove tank c0t3d0' and the data residing on c0t3d0 would be migrated to the other disks in the raidz. Then, c0t3d0 would be free for removal and reuse.
>
> What is the status of this support in nv101?

Not available. I predict that you will see it mentioned everywhere, billboards, graffiti, slashdot, etc., when it arrives.
 -- richard
Mike Brancato
2008-Dec-05 19:01 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
Well, I knew it wasn't available. I meant to ask what the status of the feature's development is. Not started, I presume. Is there no timeline?
Miles Nordin
2008-Dec-05 20:40 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
>>>>> "mb" == Mike Brancato <mike at mikebrancato.com> writes:

    mb> if a 4x100GB raidz only used 150GB of space, one could do
    mb> 'zpool remove tank c0t3d0' and data residing on c0t3d0 would
    mb> be migrated to other disks in the raidz.

That sounds like in-place changing of stripe width, which wasn't part of the discussion I remember. We were wishing for vdev removal, but you'd have to remove a whole vdev at a time. It would be analogous to 'zpool add': just as you can't add 1 disk to widen a 3-disk raidz vdev to 4 disks, you couldn't do the reverse even with the wished-for feature. To change from a 4x100GB raidz to a 3x100GB raidz, you'd have to:

  zpool add pool raidz disk5 disk6 disk7
  zpool evacuate pool raidz disk1 disk2 disk3 disk4

RFE 4852783 is to create something like 'zpool evacuate', removing the whole vdev at once and migrating onto other vdevs, not other disks. The feature's advantage as-is would be for pools with many vdevs. It could also be an advantage for pools with just one vdev that are humongous: you want to change the shape of the one vdev, but you need to do the copy/evacuation online because it takes a week. If not for the week, on a 1-vdev pool you could destroy the pool and make a new one without needing any more media than you would with the new feature.

For home storage with big, slow, cheap pools, what you want sounds nice. Someone once told me he'd gotten Veritas to change a plex's width with the vg online, but to me it seems scary because, if it crashed halfway through, I'm not sure how the system could communicate to me what's happening in a way I'd understand, much less recover from it. I'm not saying Veritas doesn't do both, just that I'd chuckle happily if I saw it actually work (which was the storyteller's response, too). For vdev removal, I think you could harmlessly stop the evacuation at any time with only O(1) quickie-import-time recovery, without needing to communicate anything. Much easier. So I like the RFE as-is, analogous to Linux LVM2's pvmove.
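For reference, the LVM2 analogue looks like this (these are real LVM2 commands, but the device and volume group names are made up for illustration):

  pvmove /dev/sdd            # migrate allocated extents off /dev/sdd to free space elsewhere in the VG
  vgreduce datavg /dev/sdd   # drop the now-empty physical volume from the volume group

A 'zpool evacuate' would ideally work the same way, per-vdev instead of per-physical-volume, and be safely interruptible.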
Ross
2008-Dec-06 22:59 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
If I remember right, the code needed for this has implications for a lot of things:

- defrag
- adding disks to raidz vdevs
- removing disks from vols
- restriping volumes (to give consistent performance after expansion)

In fact, I just found the question I asked a year or so back, which had a good reply from Jeff:
http://opensolaris.org/jive/message.jspa?messageID=186561

... and while typing this, I also found this blog post from Adam Leventhal from April, which is also related:
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z
Al Tobey
2008-Dec-06 23:55 UTC
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
They also mentioned this at some of the ZFS talks at LISA 2008. The general argument is that, while plenty of hobbyists are clamoring for this, not enough paying customers are asking for it to make it a high enough priority to get done. If you think about it, the code is not only complicated but will be incredibly hard to get right and _prove_ it's right.

Maybe the ZFS guys can just borrow the algorithm from Linux mdraid's experimental CONFIG_MD_RAID5_RESHAPE:

http://git.kernel.org/?p=linux/kernel/git/djbw/md.git;a=blob;f=drivers/md/raid5.c;h=224de022e7c5d6574cf46747947b3c9e326c8632;hb=HEAD#1885
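For comparison, the user-visible side of the mdraid reshape looks roughly like this (real mdadm commands; the device names are only examples, and reshaping a live array can take many hours):

  mdadm /dev/md0 --add /dev/sde            # add the new disk to the array as a spare
  mdadm --grow /dev/md0 --raid-devices=5   # restripe the existing 4-disk raid5 across 5 disks
  cat /proc/mdstat                         # reshape progress shows up here

Whether the same approach maps cleanly onto raidz, with its variable-width stripes, is another question.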