Hi,

I am searching for a roadmap for shrinking a pool. Is there a project for
this, where can I find information about it, and when will it be
implemented in Solaris 10?

Thanks
Regards
Bernhard

--
Bernhard Holzer
Sun Microsystems Ges.m.b.H.
Wienerbergstraße 3/7
A-1100 Vienna, Austria
Phone x60983/+43 1 60563 11983
Mobile +43 664 60563 11983
Fax +43 1 60563 11920
Email Bernhard.Holzer at Sun.COM
Handelsgericht Wien, Firmenbuch-Nr. FN 186250 y
Long story short: there isn't a project, there are no plans to start a
project, and don't expect to see it in Solaris 10 in this lifetime without
some serious pushback from large Sun customers. Even then, it's unlikely
to happen anytime soon due to the technical complications of doing so
reliably.

--Tim

On Mon, Aug 18, 2008 at 6:06 AM, Bernhard Holzer <Bernhard.Holzer at sun.com> wrote:
> I am searching for a roadmap for shrinking a pool. Is there a project
> for this, where can I find information about it, and when will it be
> implemented in Solaris 10?
WOW! This is quite a departure from what we've been told for the past 2
years... In fact, if your comments are true and we'll never be able to
shrink a ZFS pool, I will be, for lack of a better word, PISSED. Like
others have said, not being able to shrink is the one thing that truly
prevents us from replacing all of our Veritas... without being able to
shrink, ZFS will be stuck in our dev environment and our non-critical
systems...
> WOW! This is quite a departure from what we've been
> told for the past 2 years...

This must be misinformation.

The reason there's no project (yet) is very likely that pool shrinking
depends strictly on the availability of the bp_rewrite functionality,
which is still in development.

The last time the topic came up, maybe a few months ago, still in 2008,
the discussion indicated that it's still on the plan. But as said, it
relies on the aforementioned functionality being present.

-mg
Mario Goebbels wrote:
> The reason there's no project (yet) is very likely that pool shrinking
> depends strictly on the availability of the bp_rewrite functionality,
> which is still in development.
>
> The last time the topic came up, maybe a few months ago, still in 2008,
> the discussion indicated that it's still on the plan.

I agree, it's on the plan, but in addition to the dependency on that
feature it was at a very low priority. If I recall, the low priority was
based on the perceived low demand for the feature in enterprise
organizations. As I understood it, shrinking a pool is perceived as a
feature most desired by home/hobby/development users, and enterprises
mainly only grow their pools, not shrink them.

So if anyone in an enterprise has a need to shrink pools, they might want
to notify their Sun support people and make their voices heard.

Unless of course I'm wrong.... Which has been known to happen from time
to time. :)

-Kyle
Our "enterprise" is about 300TB.. maybe a bit more...

You are correct that most of the time we grow and not shrink... however,
we are fairly dynamic and occasionally do shrink. DBAs have been known to
be off on their space requirements/requests.

There is also the human error factor. If someone accidentally grows a
zpool there is no easy way to recover that space without downtime. Some of
my LUNs are in the 1TB range, and if one gets added to the wrong zpool
that space is basically stuck there until I can get a maintenance window.
And then I'm not sure that's even possible, since my windows are only 3
hours... For example, what if I add a LUN to a 20TB zpool? What would I do
to remove the LUN? I think I would have to create a new 20TB pool and move
the data from the original to the new zpool... so that assumes I have a
free 20TB and the downtime....
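For reference, the copy-and-move workaround John describes can be scripted
with snapshots and send/receive on builds where 'zfs send -R' is
available. This is a minimal sketch, not an online operation; the pool
names "tank" and "tank2" and the snapshot name are made up for
illustration, and it assumes a scratch pool with enough free space:

    # Snapshot every dataset in the pool, then replicate the whole tree
    # to the new pool (zfs send -R carries properties and descendants).
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs recv -Fd tank2

    # Only after verifying the copy: destroy the old pool and, if
    # desired, rename the new pool back to the old name on import.
    zpool destroy tank
    zpool export tank2
    zpool import tank2 tank

Even scripted, this needs the downtime and spare capacity John mentions,
which is exactly why in-place vdev removal keeps coming up.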
>>>>> "j" == John <fishingjts at chartermi.net> writes:

     j> There is also the human error factor. If someone accidentally
     j> grows a zpool

or worse, accidentally adds an unredundant vdev to a redundant pool. Once
you press return, all you can do is scramble to find mirrors for it.

vdev removal is also needed to, for example, change each vdev in a big
pool of JBOD devices from mirroring to raidz2. In general, it's needed for
reconfiguring pools' layouts without an outage, not just shrinking. This
online-layout-reconfig is also a Veritas selling point, yes? Or is that
Veritas feature considered too risky for actual use?

For my home user setup, the ability to grow a single vdev by replacing all
the disks within it with bigger ones, then export/import, is probably good
enough. Note however this is still not quite ``online'' because
export/import is needed to claim the space. Though IIRC some post here
said that's fixed in the latest Nevadas, one would have to look at the
whole stack to make sure it's truly online---can FC and iSCSI gracefully
handle a target's changing size and report it to ZFS, or does FC/iSCSI
need to be whacked, or is size change only noticed at zpool
replace/attach time?

The thing that really made me wish for 'pvmove' / RFE 4852783 at home so
far is the recovering-from-mistaken-add scenario.
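For the grow-by-replacing approach described above, a minimal sketch with
made-up pool and device names; each replace must finish resilvering before
the next one starts:

    # Swap each disk in the vdev for a larger one, one at a time.
    zpool replace tank c1t2d0 c1t4d0
    zpool status tank                 # wait for resilver to complete
    zpool replace tank c1t3d0 c1t5d0
    zpool status tank

    # On older bits the extra space only shows up after export/import:
    zpool export tank
    zpool import tank

Later builds added a pool property ('zpool set autoexpand=on tank') that
claims the space automatically, which addresses the not-quite-online
complaint for the local-disk case; the FC/iSCSI resize questions above are
a separate matter.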
John wrote:
> You are correct that most of the time we grow and not shrink...
> however, we are fairly dynamic and occasionally do shrink. DBAs have
> been known to be off on their space requirements/requests.

Isn't that one of the problems ZFS solves? Grow the pool to meet the
demand rather than size it for the estimated maximum usage. Even exported
vdevs can be thin provisioned.

Ian
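As a concrete example of the thin-provisioning point, a sparse zvol
reserves no space up front (dataset name and sizes made up):

    # -s creates a sparse volume: blocks are allocated from the pool only
    # as they are written, not reserved at creation time.
    zfs create -s -V 1T tank/lun0

    # The volume can be exported as a LUN and grown later if needed:
    zfs set volsize=2T tank/lun0

Shrinking volsize is possible too, but unlike growing, it risks truncating
data the initiator still references, so it carries the same flavor of risk
as shrinking a pool.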
On Wed, 20 Aug 2008, Miles Nordin wrote:
> or worse, accidentally adds an unredundant vdev to a redundant pool.
> Once you press return, all you can do is scramble to find mirrors for
> it.

Not to detract from the objective of being able to re-shuffle the zfs
storage layout, but any system administration related to storage is risky
business. Few people should be qualified to do it. Studies show that 36%
of data loss is due to human error. Once zfs mirroring, raidz, or raidz2
are used to virtually eliminate loss due to hardware or system
malfunction, this 36% grows into a much higher share of the remaining
losses. For example, if loss due to hardware or system malfunction is
reduced to just 1% (still a big number), then the human error factor
increases to a whopping 84%. Humans are like a ticking time bomb for data.

The errant command which accidentally adds a vdev could just as easily be
a command which scrambles up or erases all of the data.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Wed, Aug 20, 2008 at 18:40, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> The errant command which accidentally adds a vdev could just as easily
> be a command which scrambles up or erases all of the data.

True enough---but if there's a way to undo accidentally adding a vdev,
there's one source of disastrously bad human error eliminated. If the vdev
is removable, then typing "zpool evacuate c3t4d5" to fix the problem,
instead of getting backups up to date, destroying and recreating the pool,
and then restoring from backups, saves quite a bit of the cost associated
with human error in this case.

Think of it as the analogue of "zpool import -D": if you screw up, ZFS has
a provision to at least try to help. The recent discussion on accepting
partial 'zfs recv' streams is a similar measure. No system is perfectly
resilient to human error, but any simple ways in which the resilience
(especially of such a large unit as a pool!) can be improved should be
considered.

Will
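For comparison, the undo that already exists today: a destroyed pool can
be imported again as long as its devices haven't been reused (pool name
made up):

    zpool destroy tank     # the mistake
    zpool import -D        # list destroyed pools that are still recoverable
    zpool import -D tank   # bring it back

A "zpool evacuate"-style undo for a mistaken add would extend the same
philosophy to a much more common mistake.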
Kyle wrote:
> ... If I recall, the low priority was based on the perceived low demand
> for the feature in enterprise organizations. As I understood it, shrinking a
> pool is perceived as a feature most desired by home/hobby/development
> users, and enterprises mainly only grow their pools, not shrink them.

Although it's historically clear that data tends to grow to fill the
available storage, it should be equally clear that storage resources are
neither free nor inexhaustible. The flexibility to redeploy existing
storage from one system to another considered more critical may simply
require the ability to remove resources from one pool for transfer to
another, and this should be easily and efficiently accomplishable, without
having to manually shuffle data around like a shell game between multiple
storage configurations to achieve the desired result.

Equally valid: if it's determined that some storage resources within a
pool are beginning to fail, and their capacity is not strictly required at
the moment, it would seem like a good idea to have any data they contain
moved to other resources in the pool and simply remove them as storage
candidates, without having to replace them with alternates which may not
physically exist at the moment.

That being said, fixing bugs which would otherwise render the ZFS file
system unreliable should always trump "nice to have" features.
I've heard (though I'd be really interested to read the studies if someone
has a link) that a lot of this human error percentage comes at the
hardware level: replacing the wrong physical disk in a RAID-5 disk group,
bumping cables, etc.

-Aaron

On Wed, Aug 20, 2008 at 3:40 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> Studies show that 36% of data loss is due to human error.
John wrote:
> You are correct that most of the time we grow and not shrink...
> however, we are fairly dynamic and occasionally do shrink. DBAs have
> been known to be off on their space requirements/requests.

For the record, I agree with you and I'm waiting for this feature also. I
was only citing my recollection of the explanation given in the past.

To add more from my memory, I think the 'enterprise grows, not shrinks'
idea comes from the notion that with ZFS you should be creating fewer data
pools from a few specific-sized LUNs, and using ZFS to allocate
filesystems and zvols from the pool, instead of customizing LUN sizes to
create more pools, each for a different purpose. If true (if you can make
all your LUNs one size, and make a few [preferably one, I think] data
zpools per server host), then the need to reduce pool size is diminished.

That's not realistic in the home/hobby/developer market, and I'm not
convinced it's realistic in the enterprise either.

> There is also the human error factor. If someone accidentally grows a
> zpool there is no easy way to recover that space without downtime.

I agree here also, even with a single zpool per server. Consider a policy
where, when the pool grows, you always add a raidz2 of 10 200GB LUNs. So
your single data pool is currently 3 of these raidz2 vdevs, and an admin
goes to add 10 more but forgets the 'raidz2', so you end up with 3 raidz2
vdevs and 10 single-LUN, non-redundant vdevs. How do you fix that?

My suggestion still remains though: log your enterprise's wish for this
feature through as many channels as you have into Sun. This list, Sales,
Support, every way you can think of. Get it documented, so that when they
go to set priorities on RFEs there'll be more data on this one.

-Kyle
On Aug 20, 2008, at 6:39 PM, Kyle McDonald wrote:
> My suggestion still remains though: log your enterprise's wish for this
> feature through as many channels as you have into Sun. This list,
> Sales, Support, every way you can think of. Get it documented, so that
> when they go to set priorities on RFEs there'll be more data on this
> one.

Knock yourself out, but it's really unnecessary. As has been amply
documented, on this thread and others, this is already a very high
priority for us. It just happens to be rather difficult to do it right.
We're working on it. We've heard the message (years ago, actually, just
about as soon as we shipped ZFS in S10 6/06). Your further encouragement
is appreciated, but it's unlikely to speed up what is already deemed a
high priority.

My 2 cents,
Fred

--
Fred Zlotnick
Senior Director, Open Filesystem and Sharing Technologies
Sun Microsystems, Inc.
fred.zlotnick at sun.com
x81142/+1 650 352 9298
Zlotnick Fred wrote:
> Knock yourself out, but it's really unnecessary. As has been amply
> documented, on this thread and others, this is already a very high
> priority for us. It just happens to be rather difficult to do it right.

Cool. I love it when I'm wrong this way. :)

I don't know where I got it, but I really thought it wasn't seen as a big
deal for the larger storage customers. Glad to see I'm wrong, because it's
a real big deal for us little guys. :)

-Kyle
Can ADM ease the pain by migrating data from one pool to the other? I know
it's not what most of you want, but...

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at sun.com
| The errant command which accidentally adds a vdev could just as easily
| be a command which scrambles up or erases all of the data.

The difference between a mistaken command that accidentally adds a vdev
and the other ways to lose your data with ZFS is that the 'add a vdev'
accident is only one omitted word away from a command that you use
routinely. This is a very short distance, especially for fallible humans.
('zpool add ... mirror A B' and 'zpool add ... spare A'; omit either
'mirror' or 'spare' by accident and boom.)

- cks
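To make the one-word distance concrete, a sketch with made-up device
names; the dry-run flag is the main guard rail zpool offers today:

    # Intended commands, used routinely:
    zpool add tank mirror c3t4d0 c3t5d0   # add a new mirrored vdev
    zpool add tank spare c3t6d0           # add a hot spare

    # Omit the keyword and the disks join as independent, non-redundant
    # top-level vdevs. zpool does warn about a mismatched replication
    # level, but -f, habit, or an already-mixed pool gets past that, and
    # there is currently no way to remove them afterwards:
    zpool add tank c3t4d0 c3t5d0

    # -n prints the layout that would result without changing anything,
    # so the mistake can be caught before pressing return for real:
    zpool add -n tank mirror c3t4d0 c3t5d0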
Hello there,

I'm working for a bigger customer in Germany. The customer is some
thousand TB in size. The information that the zpool shrink feature will
not be implemented soon is no problem; we just keep using Veritas Storage
Foundation.

Shrinking a pool is not the only problem with ZFS. Try setting up a
jumpstart server with Solaris 10u7 with the media copy on a separate ZFS
filesystem: jumpstart puts a loopback mount into the vfstab, and the next
boot fails. Solaris will do the mountall before ZFS starts, so the
filesystem service fails and you don't even have an sshd to log in over
the network.

Best regards,
rapega
Ralf Gans wrote:
> Jumpstart puts a loopback mount into the vfstab,
> and the next boot fails.
>
> Solaris will do the mountall before ZFS starts,
> so the filesystem service fails and you don't even
> have an sshd to log in over the network.

This is why I don't use the mountpoint settings in ZFS. I set them all to
'legacy' and put them in /etc/vfstab myself.

I keep many .ISO files on a ZFS filesystem, and I lofi-mount them onto
subdirectories of the same ZFS tree, and then (since they are for
jumpstart) loopback-mount parts of each of the ISOs into /tftpboot. When
you've got to manage all this other stuff in /etc/vfstab anyway, it's
easier to manage ZFS there too. I don't see it as a hardship, and I don't
see the value of doing it in ZFS, to be honest (unless every filesystem
you have is in ZFS, maybe).

The same goes for sharing this stuff through NFS. Since the lofi mounts
are separate filesystems, I have to share them with share (or sharemgr),
and it's easier to share the ZFS directories through those commands at the
same time.

I must be missing something, but I'm not sure I get the rationale behind
duplicating all this admin stuff inside ZFS.

-Kyle
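A minimal sketch of the legacy-mountpoint setup described above (dataset
names, paths, and the lofi device number are made up):

    # Hand mount control over to /etc/vfstab:
    zfs set mountpoint=legacy tank/install

    # Attach the ISO to a lofi device (prints e.g. /dev/lofi/1):
    lofiadm -a /export/install/iso/s10u7.iso

    # /etc/vfstab then carries the ZFS filesystem, the lofi-mounted ISO,
    # and the lofs loopback of part of the ISO into /tftpboot:
    #   tank/install               -  /export/install        zfs   -  yes  -
    #   /dev/lofi/1                -  /export/install/s10u7  hsfs  -  yes  ro
    #   /export/install/s10u7/boot -  /tftpboot/s10u7        lofs  -  yes  ro

One wrinkle worth noting: lofi devices do not persist across reboots by
themselves, so the lofiadm step has to be rerun (or scripted) before
mountall processes those vfstab lines.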
Hello out there,

is there any progress on shrinking zpools, i.e. removing vdevs from a
pool?

Cheers,
Ralf
I talked with our enterprise systems people recently. I don't believe
they'd consider ZFS until it's more flexible. Shrink is a big one, as is
removing a slog. We also need to be able to expand a raidz, possibly by
striping it with a second one and then rebalancing the sizes.
On Feb 22, 2010, at 6:42 PM, Charles Hedrick wrote:
> I talked with our enterprise systems people recently. I don't believe
> they'd consider ZFS until it's more flexible. Shrink is a big one, as
> is removing a slog. We also need to be able to expand a raidz, possibly
> by striping it with a second one and then rebalancing the sizes.

So what file system do they use that has all of these features? :-P
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
On 23/02/2010 02:52, Richard Elling wrote:
> So what file system do they use that has all of these features? :-P

VxVM + VxFS?

--
Robert Milkowski
http://milek.blogspot.com
On Feb 23, 2010, at 5:10 AM, Robert Milkowski wrote:
> VxVM + VxFS?

I did know they still cost $$$, but I didn't know they implemented a
slog :-P
 -- richard
On 23/02/2010 17:20, Richard Elling wrote:
> I did know they still cost $$$, but I didn't know they implemented a
> slog :-P

You got me! :) I missed the reference to a slog.

--
Robert Milkowski
http://milek.blogspot.com