Hi, I know there is no single-command way to shrink a zpool (say, evacuate the data from a disk and then remove the disk from a pool), but is there a logical way? I.e. mirror the pool to a smaller pool and then split the mirror? In this case I'm not talking about disk size (moving from 4 X 72GB disks to 4 X 36GB disks) but rather moving from 4 X 73GB disks to 3 X 73GB disks, assuming the pool would fit on 3X. I haven't found a way but thought I might be missing something. Thanks.
On Mon, Apr 24, 2006 at 02:03:39PM -0700, Peter Baer Galvin wrote:
> Hi, I know there is no single-command way to shrink a zpool (say, evacuate the data from a disk
> and then remove the disk from a pool), but is there a logical way? I.e. mirror the pool to a
> smaller pool and then split the mirror? In this case I'm not talking about disk size (moving from
> 4 X 72GB disks to 4 X 36GB disks) but rather moving from 4 X 73GB disks to 3 X 73GB disks,
> assuming the pool would fit on 3X. I haven't found a way but thought I might be missing something. Thanks.

There is currently no way to do what you want. This is RFE 4852783, "reduce pool capacity". We've thought about this quite a bit over the years, but it's quite tricky to implement.

Sorry,
--matt
It's a bloody useful feature though. I've just been working with someone who was doing a migration of data from one set of disks to another on a Tru64 5.1B cluster using an AdvFS domain. The procedure was along the lines of: add disks to the SAN, create a RAID5 volume and present it to the cluster, add the volume to the AdvFS domain, call the command to remove the old volume, and wait for the I/O to complete. The data was migrated to the new set of disks at the OS level rather than the SAN level, and the old ones were free for reuse. All while the production Oracle database was still up.

That was pretty impressive,
Cheers,
Alan
Hello Alan,

Tuesday, April 25, 2006, 11:50:31 AM, you wrote:

AR> It's a bloody useful feature though. Just been working with
AR> someone who was doing a migration of data from one set of disks to
AR> another on a Tru64 5.1B cluster using an AdvFS domain. The
AR> procedure was along the lines of add disks to the SAN, create a
AR> RAID5 volume and present it to the cluster, add the volume to the
AR> AdvFS domain, call the command to remove the old volume and wait
AR> for the I/O to complete. The data was migrated to the new set of
AR> disks at the OS level rather than the SAN level and the old ones
AR> were free for reuse. All while the production Oracle database was still up.

AR> That was pretty impressive,

I was doing something similar with VxVM on Solaris. Although I rarely use it, this is a really great feature.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
I am the customer that prompted pbg to submit this query.

In our experience, the ability to grow and shrink logical volumes and filesystems has been crucial to the management of our disk environment.

In today's corporate environment, where the push is to do more with less, the ability to migrate SAN disk from machine to machine for temporary usage (within existing filesystems) helps us to save on operational expenses.

If we have to buy more disk for each temporary expansion (or worse, have to create new zpools, then take the filesystem offline so that we can copy to a new zpool, and re-mount to the original mount-point), it seems a step backwards to me.

I have to commend the zfs group for the efforts that they've taken so far with what appears to be a very robust and advanced filesystem / volume management system. However, I would also have to say that they've missed a most basic component of said management.

With jfs/jfs2/ufs/vxfs we currently have the ability to shrink or evacuate disks so that they can be re-allocated. It would be a shame to not be able to use zfs due to a lack of such a fundamental feature.

JMTCW - for what it's worth...
Yes, it is well known by the ZFS team that this is a missing feature. We made the judgement call, for better or worse, that it was acceptable to ship without this feature available. Since integration we've been focused on stability and performance in preparation for the S10U2 backport. Now that this work is completed, we'll be able to focus on some of the larger features, of which device removal is a major one. Two of the first features you'll see (hopefully in the next two weeks) will be hot spare support and double-parity RAID-Z.

Hope that makes our course a little clearer,

- Eric

On Tue, Apr 25, 2006 at 08:13:54AM -0700, Larry Becke wrote:
> I am the customer that prompted pbg to submit this query.
>
> In our experience, the ability to grow and shrink logical volumes and filesystems have been crucial to management of our disk environment.
>
> In today's corporate environment where the push is to do more with less, the ability to migrate SAN disk from machine to machine for temporary usage (within existing filesystems) helps us to save on operational expenses.
>
> If we have to buy more disk for each temporary expansion (or worse, have to create new zpools, then take the filesystem offline, so that we can copy to a new zpool, and re-mount to the original mount-point) it seems a step backwards to me.
>
> I have to commend the zfs group for the efforts that they've taken so far with what appears to be a very robust and advanced filesystem / volume management system. However, I would also have to say that they've missed a most basic component of said management.
>
> With jfs/jfs2/ufs/vxfs we currently have the ability to shrink or evacuate disks so that they can be re-allocated. It would be a shame to not be able to use zfs due to a lack of such a fundamental feature.
>
> JMTCW - for what it's worth...

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Tue, 2006-04-25 at 08:13 -0700, Larry Becke wrote:
> I am the customer that prompted pbg to submit this query.
>
> In our experience, the ability to grow and shrink logical volumes and filesystems have been crucial to management of our disk environment.
>
> In today's corporate environment where the push is to do more with less, the ability to migrate SAN disk from machine to machine for temporary usage (within existing filesystems) helps us to save on operational expenses.
>
> If we have to buy more disk for each temporary expansion (or worse, have to create new zpools, then take the filesystem offline, so that we can copy to a new zpool, and re-mount to the original mount-point) it seems a step backwards to me.
>
> I have to commend the zfs group for the efforts that they've taken so far with what appears to be a very robust and advanced filesystem / volume management system. However, I would also have to say that they've missed a most basic component of said management.

You are the 2nd customer I've ever heard of to use shrink. I think the ZFS team wants to implement shrink, but perhaps it is not on the top of the priority list, as so few customers actually use it. Grow is much more commonly used.

> With jfs/jfs2/ufs/vxfs we currently have the ability to shrink or evacuate disks so that they can be re-allocated. It would be a shame to not be able to use zfs due to a lack of such a fundamental feature.

UFS on Solaris does not have the ability to shrink.
 -- richard
I think I understand the need here, but I think it also helps to remember that ZFS is different. You don't need to grow or shrink the actual file systems at all; just use quotas and reservations if you need to.

However, I do get the need to change the "shape" and "makeup" of a pool, and this is one of the reasons why, when we add crypto to ZFS (which I'm working on now), the crypto policy is NOT at the level of a pool but at the level of a ZFS data set. This ensures that if/when ZFS gains the ability to reshape how a pool is built, the cryptographic integrity of the data remains.

Like Richard said though, you can't shrink UFS at all on Solaris, so what are you doing there?

--
Darren J Moffat
On 4/25/06, Richard Elling <Richard.Elling at sun.com> wrote:
>
> You are the 2nd customer I've ever heard of to use shrink.
> I think the ZFS team wants to implement shrink, but perhaps
> it is not on the top of the priority list, as so few customers
> actually use it. Grow is much more commonly used.

It's _commonly_ used by LVM customers on AIX (reducevg). When you choose a platform, you live with its limitations for other gains. That you don't hear many Solaris customers asking for this feature doesn't mean people are not using it on other platforms.

> With jfs/jfs2/ufs/vxfs we currently have the ability to shrink or evacuate
> disks so that they can be re-allocated. It would be a shame to not be able
> to use zfs due to a lack of such a fundamental feature.
>
> UFS on Solaris does not have the ability to shrink.

UFS/SVM also miss many other features compared to vxfs/lvm etc., so?

From Eric's post above, it's clear that the ZFS team knows the importance of this feature; it's simply a more difficult problem (given the design characteristics of ZFS -- I think). Let's hope the team can make it happen sooner rather than later.

Tao
It would be better, imho, if instead of saying, "We need a shrink utility," the discussion were upleveled a bit. Shrinking is a means to an end. What's the real requirement? I could venture some guesses but I'd rather not corrupt the discussion. At least not any more than I normally do. ;)
With ufs we do not; however, with metadevices we can create a new mirror set, attach it to the existing one, and then, once the mirrored device (with 1 less drive) is done synchronizing, we can detach the original set. So, in effect, we do get the ability to shrink, albeit with a little more legwork involved.

I appreciate the efforts involved in getting everything ready, stable, reliable. I just felt I should go ahead and state our position on this, as it affects our roll-out efforts.

Is there somewhere I can look to see a timeline on when certain features / requests may be worked on?

Thanks!
On Tue, 2006-04-25 at 02:50 -0700, Alan Romeril wrote:
> It's a bloody useful feature though. Just been working with someone who was doing a migration of data from one set of disks to another on a Tru64 5.1B cluster using an AdvFS domain. The procedure was along the lines of add disks to the SAN, create a RAID5 volume and present it to the cluster, add the volume to the AdvFS domain, call the command to remove the old volume and wait for the I/O to complete. The data was migrated to the new set of disks at the OS level rather than the SAN level and the old ones were free for reuse. All while the production Oracle database was still up.
> That was pretty impressive,

This isn't my definition of shrinking. I define shrinking as making an existing file system smaller by shrinking the size of the underlying partition (eg. VxFS). What you are describing is more along the lines of attaching and detaching a mirror.
 -- richard
My apologies, I was mistaken on the UFS.

It appears that the process involved using ufsdump/ufsrestore to get the copy created - which would have worked fine.

We were using vxfs on most of the Sun systems, which is why we've not had to shrink ufs filesystems very often on the Sun platforms.

I guess that if we need to shrink, we can utilize the same approach until such time as shrink is implemented.
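For anyone who has not done that kind of UFS evacuation, it usually boils down to a level-0 dump piped into a restore onto a freshly created (smaller) filesystem. A minimal sketch - the device names and mount point here are hypothetical, so adjust and test before relying on it:

    newfs /dev/rdsk/c1t0d0s6                                         # make the new, smaller UFS filesystem
    mount /dev/dsk/c1t0d0s6 /mnt                                     # mount it somewhere temporary
    cd /mnt && ufsdump 0f - /dev/rdsk/c0t0d0s6 | ufsrestore rf -     # copy the old filesystem into it

Then swap the mount points (and vfstab entries) once the copy is verified.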
We need the ability to remove drive(s) from the pool, without replacing them with anything.

I.e. - if there's enough space for the data on one (or more) drives to be moved to other drives within the pool, then empty/evacuate the requested drives, and detach them from the pool.
On Tue, Apr 25, 2006 at 11:16:42AM -0700, Larry Becke wrote:
> My apologies, I was mistaken on the UFS.
>
> It appears that the process involved using ufsdump/ufsrestore to get
> the copy created - which would have worked fine.
>
> We were using vxfs on most of the Sun systems, which is why we've not
> had to shrink ufs filesystems very often on the Sun platforms.
>
> I guess that if we need to shrink, we can utilize the same approach
> until such time as shrink is implemented.

Yep, and you can use 'zfs send/recv' to make this go a bit quicker. We'll be working on some enhancements to 'zfs send' so that you can send a whole pool with one command.

However, I'm still not entirely clear on *why* you want to shrink a zpool. Your original post mentioned moving disks from one machine to another, but I didn't follow that entirely. If we can understand what problem you're trying to solve, then maybe we can suggest a better solution, or at least design the 'shrink' feature to work well for you.

--matt
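P.S. To make that concrete, a send/recv-based migration to a smaller pool looks roughly like the sketch below. The pool, dataset, and device names are hypothetical, each filesystem currently has to be sent individually (the whole-pool send mentioned above does not exist yet), and the copy should be verified before destroying anything:

    zpool create newpool c2t0d0 c2t1d0 c2t2d0               # the smaller replacement pool
    zfs snapshot oldpool/data@migrate                       # snapshot each filesystem to be moved
    zfs send oldpool/data@migrate | zfs recv newpool/data   # repeat per filesystem
    zpool destroy oldpool                                   # only after verifying the copy (and taking a backup)
    zpool export newpool
    zpool import newpool oldpool                            # optionally re-import under the old name

The downside, of course, is the downtime and double space this needs - which is exactly why a real shrink/evacuate is being asked for.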
On 4/25/06, Richard Elling <Richard.Elling at sun.com> wrote:
>
> Thanks for the feedback. I've asked all my friends who
> run AIX shops, and they never use it (reducevg) either.
> Is there a common business case for this, or is it just
> another facet of data space management, such as reallocation?
> -- richard

Ok, I used the word "commonly" to contrast your "2nd customer I've ever heard" :-)

I do hear about it regularly as part of disk operations, when LVM users need to move disks among hosts and/or different volume groups, but that's because I am close to the LVM team, who support users around the globe.

What's the difference between "a common business case" and "just another facet of data space management"? For us in service, every customer situation is a business case.

Thanks,
Tao
On Tue, Apr 25, 2006 at 11:15:11AM -0700, Richard Elling wrote:
> On Tue, 2006-04-25 at 02:50 -0700, Alan Romeril wrote:
> > It's a bloody useful feature though. Just been working with someone who was doing a migration of data from one set of disks to another on a Tru64 5.1B cluster using an AdvFS domain. The procedure was along the lines of add disks to the SAN, create a RAID5 volume and present it to the cluster, add the volume to the AdvFS domain, call the command to remove the old volume and wait for the I/O to complete. The data was migrated to the new set of disks at the OS level rather than the SAN level and the old ones were free for reuse. All while the production Oracle database was still up.
> > That was pretty impressive,
>
> This isn't my definition of shrinking. I define shrinking
> as making an existing file system smaller by shrinking the
> size of the underlying partition (eg. VxFS). What you are
> describing is more along the lines of attaching and detaching
> a mirror.

Yeah, the subject is "Shrinking a zpool", which to me reads like "changing the vdev makeup of a zpool."

So we have two possible things being requested:

- changing the vdev makeup of a zpool

- changing the size of the LUN/partition/whatever underlying a vdev

I imagine the first one would be a lot easier to accomplish than the latter.

I also imagine that with pools and filesystem reservations/quotas no one should want to play games with LUNs and partitioning, which leaves us with the first thing.
On 4/25/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
>
> It would be better, imho, if instead of saying, "We need a shrink
> utility" the discussion be upleveled a bit. Shrinking is a means to an
> end. What's the real requirement? I could venture some guesses but I'd
> rather not corrupt the discussion. At least not anymore then I normally
> do. ;)

I see this discussion is already going the way you foresaw :).

What I expect "shrink zpool" to do is:

1) Remove a disk from a zpool.
2) If that disk already has user data, provide a way to move the data to other disks in the same zpool.

Example:
1) reducevg
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.doc/cmds/aixcmds4/reducevg.htm
2) migratepv
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.doc/cmds/aixcmds3/migratepv.htm

Tao
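P.S. For readers who have not touched AIX LVM, the flow those two commands give you is roughly the following; the volume group and hdisk names here are made up for illustration:

    migratepv hdisk4 hdisk2 hdisk3   # move every physical partition off hdisk4 onto other disks in the VG
    reducevg datavg hdisk4           # then drop the now-empty hdisk4 from the volume group

That combination (evacuate, then remove) is essentially what people are asking zpool to grow.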
Tao Chen wrote:
>
> On 4/25/06, *Torrey McMahon* <Torrey.McMahon at sun.com> wrote:
>
>     It would be better, imho, if instead of saying, "We need a shrink
>     utility" the discussion be upleveled a bit. Shrinking is a means to an
>     end. What's the real requirement? I could venture some guesses but I'd
>     rather not corrupt the discussion. At least not anymore then I normally
>     do. ;)
>
> I see this discussion is already going the way you foresaw :).
>
> What I expect "shrink zpool" to do is:

Actually, what are the requirements that suggest you need to shrink pools? I'm not arguing that it won't be a required feature, but I'd rather have the actual requirements before a solution is suggested. Is it because you over-allocated storage in the first place and need to remove some for another system? Need to reconfigure your storage array with new drives? Move data to and fro?
A similar feature exists on VxVM to re-layout storage. It has a lot of value in certain circumstances, mostly revolving around migrating from one storage solution to another.

However, that feature didn't arrive until years after the initial product rollout. I think that having a stable product is more important in the short term than having a product with every feature. Once ZFS has been out (and on the system disk!) a year, the feature should be in the product.

On Apr 25, 2006, at 12:39 PM, Nicolas Williams wrote:
> On Tue, Apr 25, 2006 at 11:15:11AM -0700, Richard Elling wrote:
>> On Tue, 2006-04-25 at 02:50 -0700, Alan Romeril wrote:
>>> It's a bloody useful feature though. Just been working with
>>> someone who was doing a migration of data from one set of disks
>>> to another on a Tru64 5.1B cluster using an AdvFS domain. The
>>> procedure was along the lines of add disks to the SAN, create a
>>> RAID5 volume and present it to the cluster, add the volume to the
>>> AdvFS domain, call the command to remove the old volume and wait
>>> for the I/O to complete. The data was migrated to the new set of
>>> disks at the OS level rather than the SAN level and the old ones
>>> were free for reuse. All while the production Oracle database
>>> was still up.
>>> That was pretty impressive,
>>
>> This isn't my definition of shrinking. I define shrinking
>> as making an existing file system smaller by shrinking the
>> size of the underlying partition (eg. VxFS). What you are
>> describing is more along the lines of attaching and detaching
>> a mirror.
>
> Yeah, the subject is "Shrinking a zpool" which to me reads like
> "changing the vdev makeup of a zpool."
>
> So we have two possible things being requested:
>
> - changing the vdev makeup of a zpool
>
> - changing the size of the LUN/partition/whatever underlying a vdev
>
> I imagine the first one would be a lot easier to accomplish than the
> latter.
>
> I also imagine that with pools and filesystem reservations/quotas noone
> should want to play games with LUNs and partitioning, which leaves us
> with the first thing.

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382              shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
On Tue, 2006-04-25 at 13:48 -0500, Tao Chen wrote:
> On 4/25/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
>     It would be better, imho, if instead of saying, "We need a
>     shrink utility" the discussion be upleveled a bit. Shrinking is a
>     means to an end. What's the real requirement? I could venture some guesses
>     but I'd rather not corrupt the discussion. At least not anymore then I
>     normally do. ;)
>
> I see this discussion is already going the way you foresaw :).

:-)

> What I expect "shrink zpool" to do is:
>
> 1) Remove a disk from a zpool.
> 2) If that disk already has user data, provide a way to move the data
> to other disks in the same zpool.

Like zpool detach without the current limitations?

Greg Shaw adds:
> A similar feature exists on VxVM to re-layout storage. It has a lot
> of value in certain circumstances mostly revolving around migrating
> from one storage solution to another.

Or, more like zpool replace without the current limitations?
 -- richard
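P.S. For anyone trying to pin down what "the current limitations" are: attach, detach, and replace all operate within a single top-level vdev today, roughly like this (the device and pool names are just examples):

    zpool attach tank c1t0d0 c2t0d0    # add c2t0d0 as a mirror of c1t0d0 and resilver
    zpool detach tank c1t0d0           # drop one side of an existing mirror
    zpool replace tank c1t0d0 c2t0d0   # swap one device for another of equal or greater size

None of them can remove an entire top-level vdev or migrate its data onto the remaining vdevs, which is what "shrink" in this thread really means.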
On 4/25/06, Richard Elling <Richard.Elling at sun.com> wrote:
>
> > What I expect "shrink zpool" to do is:
> >
> > 1) Remove a disk from a zpool.
> > 2) If that disk already has user data, provide a way to move the data
> > to other disks in the same zpool.
>
> Like zpool detach without the current limitations?
>
> Greg Shaw adds:
> > A similar feature exists on VxVM to re-layout storage. It has a lot
> > of value in certain circumstances mostly revolving around migrating
> > from one storage solution to another.
>
> Or, more like zpool replace without the current limitations?
> -- richard

Richard, Torrey,

I just mistakenly added my 1-TB LUN to a zpool, instead of the 18-GB one I really wanted to add. Boss is beating me ... gotta go rebuild the zpool from scratch! :-)

I worked late last night and need some rest to get ready for tonight's VMM user group meeting ;-) Will post later.

Tao
What I'd like to see is the ability for a zfs pool configuration to be as dynamic on the "outside" (in terms of its interaction with storage) as it is on the "inside" (in terms of its interaction with the host).

For instance, I can foresee wanting to take a pool consisting of a pile of disks in one replication configuration (say, mirrors) and reshuffle it into a different replication configuration (say, raidz with some fault-tolerance level) online, without applications noticing that this reshuffling was going on. (This would remove any reservations I'd have about preconfiguring pools in the factory -- because I'd have the freedom to reconfigure a pool without discarding its contents...)

Or moving a pool online from, say, a 44x72G JBOD pair to, say, a 12x300G JBOD with no outage to applications, without needing to pre-slice the 300G disks to simulate a bunch of smaller disks (which would probably confuse ZFS's I/O scheduler in the long run).

- Bill
On Tue, Apr 25, 2006 at 06:40:19PM -0400, Bill Sommerfeld wrote:
> What I'd like to see is the ability for a zfs pool configuration to be
> as dynamic on the "outside" (in terms of its interaction with storage)
> as it is on the "inside" (in terms of its interaction with the host).

Random thoughts:

Suppose you could mark vdevs as "read-only," "write-only," "replace-with-other-vdev."

"Write-only" vdevs would be useful for mirroring in pools where one mirror is a RAM disk (see recent posts on that). They might also be useful for one-way replication to remote iSCSI storage.

"Read-only" would be cool for creating a pool with a DVD-ROM as one mirror and some writable storage as another; then scrub and detach the DVD-ROM.

"Replace-with-other-vdev" vdevs might cause a scrub to copy all blocks from the old vdevs to the new ones and then would replace the old vdevs with the new ones in the pool.

Nico
--
On Tue, Apr 25, 2006 at 06:08:39PM -0500, Nicolas Williams wrote:
> Random thoughts:
>
> Suppose you could mark vdevs as "read-only," "write-only,"
> "replace-with-other-vdev."
>
> "Write-only" vdevs would be useful for mirroring in pools where one
> mirror is a RAM disk (see recent posts on that). They might also be
> useful for one-way replication to remote iSCSI storage.

And a combination of "read-only" and "write-only" might be used to further mirror/replicate remote replicas, provided that ZFS could ignore vdevs not locally available, or provided that the entire pool configuration needn't be stored in the self-same pool.
Hi Team ZFS and other folks, Let me try to summarize this discussion. It would be great to be able to reduce the number of disks in a zpool. If multiple steps were required (i.e. mirror to a smaller number of disks and then split the mirror, destroy original zpool) that would be okay if it could all be done live. No matter how you solve it, please do so as soon as all of the other great stuff that we want from ZFS is done (my personal favorite - bootable ZFS :-).
We use a boatload of storage here, and the migrations in this environment (where frames come and go almost monthly) are about the only constant. The churn of frames is a reality that every large storage consumer deals with, and with frames getting larger (yet keeping the same 3-4 year lifespan), this problem is going to get worse before it gets better.

The VxEvacuate command is an absolute lifesaver for these migrations, but that functionality can also be performed by zfs's normal replace-drive policy.... (I think we're covered there; not having this migration mechanism would have been a showstopper for us.)

Where I think there could be some 'stranded storage' is where data is ILM-ed out of the original high-speed/top-tier pool to some other pool, another system, tape, what have you. Now the problem is that you have (perhaps significant amounts of) 'stranded storage' where you are paying (chargeback etc.) for storage you're not using. (And you can't give it back to the frame for other systems/pools to use.) (It's also bad practice to start co-locating other customers' data into this pool, as you now have pieces of data in pools where they don't belong, which will cause trouble with zfs exports to other systems etc.)

What's needed is some way to remove a disk/disk-pair from a zpool. (The idea is that when the command is issued, some scrub runs to 'mirror/delete' any data/metadata onto other space in the pool. If it doesn't fit, the command fails, etc.) The challenge of course is how to deal with raidz groups with parity, snapshots and who knows what. This clearly is not a trivial problem, and at least the migration piece isn't driving this need. (With some of the larger or more dynamic storage users out there, this is going to be an often-requested RFE.)

I think for the time being it's going to be very important for storage admins to use cautious sizing forecasts, perhaps with some of the sparse/oversubscription features, to avoid any stranded storage situations. (It's very hard to explain to a client that they have to pay for storage they're not actually using, and that we cannot detach the data/storage they're not using... (Huh?) Without some potentially major downtime etc.)

Anyone have any thoughts on best practices for dealing with/avoiding stranded storage situations?

Interesting discussion (and I think a lot more common than first suspected),

-- MikeE

----- Original Message -----
From: zfs-discuss-bounces at opensolaris.org
To: Nicolas Williams <Nicolas.Williams at sun.com>
Cc: zfs-discuss at opensolaris.org; Alan Romeril <a.romeril at boanerges.info>; Richard Elling <Richard.Elling at sun.com>
Sent: Tue Apr 25 14:56:03 2006
Subject: Re: [zfs-discuss] Re: Shrinking a zpool?

A similar feature exists on VxVM to re-layout storage. It has a lot of value in certain circumstances mostly revolving around migrating from one storage solution to another.

However, that feature didn't arrive until years after the initial product rollout. I think that having a stable product is more important in the short term than having a product with every feature. Once ZFS has been out (and on the system disk!) a year, the feature should be in the product.

On Apr 25, 2006, at 12:39 PM, Nicolas Williams wrote:
> On Tue, Apr 25, 2006 at 11:15:11AM -0700, Richard Elling wrote:
>> On Tue, 2006-04-25 at 02:50 -0700, Alan Romeril wrote:
>>> It's a bloody useful feature though. Just been working with
>>> someone who was doing a migration of data from one set of disks
>>> to another on a Tru64 5.1B cluster using an AdvFS domain. The
>>> procedure was along the lines of add disks to the SAN, create a
>>> RAID5 volume and present it to the cluster, add the volume to the
>>> AdvFS domain, call the command to remove the old volume and wait
>>> for the I/O to complete. The data was migrated to the new set of
>>> disks at the OS level rather than the SAN level and the old ones
>>> were free for reuse. All while the production Oracle database
>>> was still up.
>>> That was pretty impressive,
>>
>> This isn't my definition of shrinking. I define shrinking
>> as making an existing file system smaller by shrinking the
>> size of the underlying partition (eg. VxFS). What you are
>> describing is more along the lines of attaching and detaching
>> a mirror.
>
> Yeah, the subject is "Shrinking a zpool" which to me reads like
> "changing the vdev makeup of a zpool."
>
> So we have two possible things being requested:
>
> - changing the vdev makeup of a zpool
>
> - changing the size of the LUN/partition/whatever underlying a vdev
>
> I imagine the first one would be a lot easier to accomplish than the
> latter.
>
> I also imagine that with pools and filesystem reservations/quotas noone
> should want to play games with LUNs and partitioning, which leaves us
> with the first thing.
One thing that may or may not make a difference here: all of our disk is via SAN, and is protected in the SAN. We will never be using RAID5, nor mirroring. At most, we will be doing simple striping.

I can fully understand having an issue with removing a disk from a striping-with-parity subset, and would hate to even think about the consequences of such an act. Evacuating a disk that is part of a stripe / concat set (yes, I'm showing my VxFS/VxVM mentality here) should, I would think, be easier to code than the other.
On Tue, Apr 25, 2006 at 01:41:20PM -0400, Torrey McMahon wrote:
> It would be better, imho, if instead of saying, "We need a shrink
> utility" the discussion be upleveled a bit. Shrinking is a means to an
> end. What's the real requirement? I could venture some guesses but I'd
> rather not corrupt the discussion. At least not anymore then I normally
> do. ;)

We, in our environment, use the arrays' internal hardware copy (EMC calls it BCV, Hitachi - ShadowImage). The hardware copy is based on LUNs: the "left" LUN (primary volume - PV) is the LUN which we are working on, and the right LUN (secondary volume - SV) is (re)synchronized every time we need an exact copy of the PV.

Let's say we have a new project involving a few zones on the same server. We are going to put each zone's data on its own volume. But at the very beginning we really don't know how each zone's data will grow in the future. We estimate that zone 1 will grow faster than zone 2:

  z1 - zone 1
  z2 - zone 2
  p1 - LUN with primary volume of zone 1
  p2 - LUN with primary volume of zone 2
  s1 - LUN with secondary volume of zone 1
  s2 - LUN with secondary volume of zone 2
  <-> (re)synchronization

        z1 z1 z1 z1 z2 z2
        z1 z1 z1 z1 z2 z2
  LUNs  p1 p1 p1 p1 p2 p2
   <->  s1 s1 s1 s1 s2 s2

After a month we realize that data in zone 2 grows much faster than data in zone 1. So we shrink zone 1's volume by removing one LUN (p1) from zone 1 (which still has plenty of space) and add it to zone 2:

        z1 z1 z1 z2 z2 z2
        z1 z1 z1 z2 z2 z2
  LUNs  p1 p1 p1 p2 p2 p2
   <->  s1 s1 s1 s2 s2 s2

So having the ability to shrink the filesystem is a *must* (we cannot get rid of the hardware copy). Currently we use VxVM/VxFS, but if ZFS could shrink a pool we could, in the future, jump on the ZFS wagon.

The above example is based on real-life cases.

Regards
przemol
On Wed, 2006-04-26 at 09:32 +0200, przemolicc at poczta.fm wrote:
> On Tue, Apr 25, 2006 at 01:41:20PM -0400, Torrey McMahon wrote:
> > It would be better, imho, if instead of saying, "We need a shrink
> > utility" the discussion be upleveled a bit. Shrinking is a means to an
> > end. What's the real requirement? I could venture some guesses but I'd
> > rather not corrupt the discussion. At least not anymore then I normally
> > do. ;)
>
> We, in our environment, use the arrays' internal hardware copy (EMC calls it BCV, Hitachi - ShadowImage).
> The hardware copy is based on LUNs: the "left" LUN (primary volume - PV) is the LUN
> which we are working on and the right LUN (secondary volume - SV) is (re)synchronized
> every time we need an exact copy of the PV.
> Let's say we have a new project involving a few zones on the same server. We are going
> to put each zone's data on its own volume. But at the very beginning we really don't know
> how each zone will grow in data in the future. We estimate that zone 1 will grow
> faster than zone 2:

I think that you are trying to use your existing terminology for ZFS. ZFS changes the rules because it removes the need for you to think of LUNs in the way you currently do. Rather, think of the pool of storage which can be allocated and reallocated as needed.

From what I know of your situation, ZFS would be ideal because:
 1. the on-disk data is always consistent -- ideal for BCVs
 2. the allocations can be changed on the fly without bothering the slice/LUN layer at all.

Sometimes, when the rules change, you get tripped up by your knowledge :-)
 -- richard
Richard Elling wrote:
> On Wed, 2006-04-26 at 09:32 +0200, przemolicc at poczta.fm wrote:
>> On Tue, Apr 25, 2006 at 01:41:20PM -0400, Torrey McMahon wrote:
>>> It would be better, imho, if instead of saying, "We need a shrink
>>> utility" the discussion be upleveled a bit. Shrinking is a means to an
>>> end. What's the real requirement? I could venture some guesses but I'd
>>> rather not corrupt the discussion. At least not anymore then I normally
>>> do. ;)
>>
>> We, in our environment, use the arrays' internal hardware copy (EMC calls it BCV, Hitachi - ShadowImage).
>> The hardware copy is based on LUNs: the "left" LUN (primary volume - PV) is the LUN
>> which we are working on and the right LUN (secondary volume - SV) is (re)synchronized
>> every time we need an exact copy of the PV.
>> Let's say we have a new project involving a few zones on the same server. We are going
>> to put each zone's data on its own volume. But at the very beginning we really don't know
>> how each zone will grow in data in the future. We estimate that zone 1 will grow
>> faster than zone 2:
>
> I think that you are trying to use your existing terminology for ZFS.
> ZFS changes the rules because it removes the need for you to think
> of LUNs in the way you currently do. Rather, think of the pool of
> storage which can be allocated and reallocated as needed.

The only caveat to that is that there do appear to be some bounds on how many LUNs should be in a pool with raid-z before there is some degradation in performance as ZFS spreads the load out.

Which tends to put paid to the idea of just continually adding more and more disks to a pool to increase disk space.

Darren
Hello Richard,

Wednesday, April 26, 2006, 8:50:20 PM, you wrote:

RE> From what I know of your situation, ZFS would be ideal because:
RE> 1. the on-disk data is always consistent -- ideal for BCVs

Well, I'm still not convinced with ZFS + BCVs - snapshots, no problem. But normal filesystems - well, they can get corrupted, and right now it could lead to a system panic.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Richard,
>
> Wednesday, April 26, 2006, 8:50:20 PM, you wrote:
>
> RE> From what I know of your situation, ZFS would be ideal because:
> RE> 1. the on-disk data is always consistent -- ideal for BCVs
>
> Well, I'm still not convinced with ZFS + BCVs - snapshots, no problem. But normal
> filesystems - well, they can get corrupted and right now it could lead
> to system panic.

The issue with taking a snapshot at the volume/LUN level is that all of the volumes/LUNs in the zpool would need to be snapped at the same time. Otherwise the pool itself would not be consistent within the snapshot.

If you have one BCV that backed your zpool then it should work. If you have ten LUNs in a zpool and you need to issue ten BCV operations, one to each LUN, you might not be able to do it in a fashion that would ensure consistency. (Not knowing exactly how BCVs work, I can't say one way or another.)

Here's a silly example: Put two LUNs in a zpool. Snapshot one. Wait ten minutes while doing I/O to the pool. Replace one, and only one, of the LUNs with its snapshot from ten minutes ago. Methinks you'll see some problems.
Hello Torrey,

Thursday, April 27, 2006, 5:42:50 AM, you wrote:

TM> Robert Milkowski wrote:
>> Hello Richard,
>>
>> Wednesday, April 26, 2006, 8:50:20 PM, you wrote:
>>
>> RE> From what I know of your situation, ZFS would be ideal because:
>> RE> 1. the on-disk data is always consistent -- ideal for BCVs
>>
>> Well, I'm still not convinced with ZFS + BCVs - snapshots, no problem. But normal
>> filesystems - well, they can get corrupted and right now it could lead
>> to system panic.

TM> The issue with taking a snapshot at the volume/LUN level is that all of
TM> the volumes/LUNs in the zpool would need to be snapped at the same time.
TM> Otherwise the pool itself would not be consistent within the snapshot.

TM> If you have one BCV that backed your zpool then it should work. If you
TM> have ten LUNs in a zpool and you need to issue ten BCV operations, one
TM> to each LUN, you might not be able to do it in a fashion that would
TM> ensure consistency. (Not knowing exactly how BCVs work I can't say one
TM> way or another.)

TM> Here's a silly example: Put two LUNs in a zpool. Snapshot one. Wait ten
TM> minutes while doing I/O to the pool. Replace one, and only one, of the
TM> LUNs with its snapshot from ten minutes ago. Methinks you'll see some
TM> problems.

Exactly my point. That's why I think ZFS is not well suited for BCV-like functionality - ok, I haven't checked it myself (yet), but it just can't work properly every time with many LUNs/BCVs.

And I don't think that a snapshot of a whole pool is needed - rather something like 'zpool lock pool_name', so the entire pool is guaranteed to be consistent on disk and no new modifications will occur (reads are ok) until one issues 'zpool unlock pool_name'. It works that way on EMC Celerra.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
On Wed, Apr 26, 2006 at 11:50:20AM -0700, Richard Elling wrote:
> I think that you are trying to use your existing terminology for ZFS.
> ZFS changes the rules because it removes the need for you to think
> of LUNs in the way you currently do. Rather, think of the pool of
> storage which can be allocated and reallocated as needed.
>
> From what I know of your situation, ZFS would be ideal because:
> 1. the on-disk data is always consistent -- ideal for BCVs

I haven't written anything about inconsistency of on-disk data. I haven't claimed that ZFS cannot cooperate with BCVs regarding data consistency. Someone just requested real-life examples of why we need to shrink a zpool.

> 2. the allocations can be changed on the fly without bothering
> the slice/LUN layer at all.

Allocation of what? LUNs? "Allocation" is a general term and you can allocate many things. :-)

> Sometimes, when the rules change, you get tripped up by your
> knowledge :-)

Indeed :-) But I cannot confirm (so far) that it is this time. :-)

przemol
more below [the eternally morphing thread :-)] ...

On Thu, 2006-04-27 at 09:05 +0200, przemolicc at poczta.fm wrote:
> On Wed, Apr 26, 2006 at 11:50:20AM -0700, Richard Elling wrote:
> > I think that you are trying to use your existing terminology for ZFS.
> > ZFS changes the rules because it removes the need for you to think
> > of LUNs in the way you currently do. Rather, think of the pool of
> > storage which can be allocated and reallocated as needed.
> >
> > From what I know of your situation, ZFS would be ideal because:
> > 1. the on-disk data is always consistent -- ideal for BCVs
>
> I haven't written anything about inconsistency of on-disk data. I haven't
> claimed that ZFS cannot cooperate with BCVs regarding data consistency.
> Someone just requested real-life examples of why we need to shrink a zpool.
>
> > 2. the allocations can be changed on the fly without bothering
> > the slice/LUN layer at all.
>
> Allocation of what? LUNs?

Space. There is not a 1:1 mapping of filesystem to LUN/slice in ZFS. You need to separate in your mind the management of the zpool from the management of the space and filesystem allocations (quotas, reservations, etc.) in the zpool.
 -- richard
On 27/04/2006, at 5:49 AM, Darren Reed wrote:
> Richard Elling wrote:
>> On Wed, 2006-04-26 at 09:32 +0200, przemolicc at poczta.fm wrote:
>>> On Tue, Apr 25, 2006 at 01:41:20PM -0400, Torrey McMahon wrote:
>>>> It would be better, imho, if instead of saying, "We need a
>>>> shrink utility" the discussion be upleveled a bit. Shrinking is
>>>> a means to an end. What's the real requirement? I could venture
>>>> some guesses but I'd rather not corrupt the discussion. At least
>>>> not anymore then I normally do. ;)
>>>
>>> We, in our environment, use the arrays' internal hardware copy (EMC
>>> calls it BCV, Hitachi - ShadowImage).
>>> The hardware copy is based on LUNs: the "left" LUN (primary volume - PV)
>>> is the LUN which we are working on and the right LUN (secondary volume - SV)
>>> is (re)synchronized every time we need an exact copy of the PV.
>>> Let's say we have a new project involving a few zones on the same
>>> server. We are going to put each zone's data on its own volume. But at
>>> the very beginning we really don't know how each zone will grow in data
>>> in the future. We estimate that zone 1 will grow faster than zone 2:
>>
>> I think that you are trying to use your existing terminology for ZFS.
>> ZFS changes the rules because it removes the need for you to think
>> of LUNs in the way you currently do. Rather, think of the pool of
>> storage which can be allocated and reallocated as needed.
>
> The only caveat to that is that there do appear to be some bounds
> on how many LUNs should be in a pool with raid-z before there is
> some degradation in performance as ZFS spreads the load out.
>
> Which tends to put paid to the idea of just continually adding more
> and more disks to a pool to increase disk space.

Surely it only puts paid to the idea of continually adding more and more disks to a raidz vdev, not a pool. Since you can't actually add disks (except for replacement) to a raidz vdev, it's a moot point.

Boyd
This is the fun of having filesystem and volume manager all in one :)

zpool should be able to support:
- removal of a top-level vdev
- replacement of a top-level vdev with one of a different size and/or layout
- online re-layout of vdevs

It is going to be an interesting one to see if the simplicity of zpool can be kept with some of the additions. How about, in an ideal world, we roll it all into one command, something like

  zpool relayout pool vdev ...

so that if we start with:

  zpool create raidpool raidz c1t0d0 c1t1d0 c1t3d0 c1t4d0 c1t5d0

we could be able to:

  zpool relayout raidpool raidz c1t0d0 c1t1d0 c1t3d0

to free up a couple of disks, or

  zpool relayout raidpool raidz c1t0d0 c1t1d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0

to add a couple,

  zpool relayout raidpool raidz c2t0d0 c2t1d0 c2t2d0

to move data and free disks, or

  zpool relayout raidpool mirror c3t0d0 c4t0d0

to move data and change the fault tolerance.

Of course, behind this the complexity of re-writing the whole pool's worth of data is serious, especially if it needs to be written back to the same disks it's coming from (I'm not sure if that's even possible, heh).

Cheers,
Alan
Hello there.

I do agree that in a small environment you normally do not need to shrink.

One reason to shrink is between keyboard and chair: you just added the wrong disk, 1 TB instead of 100 GB. What do you do? Ask the SAN team to provide space for a second pool of 15 TB to copy it all over into a temporary pool, create the right pool on the right disks and copy back? All inside your downtime window of 1 hour? Clear NOGO!

Another reason is storage consolidation. The customer was using 15 TB until now. He was able to reduce to 5 TB and is not willing to pay for the unused 10 TB. Sure, you can tell him this means a downtime of 10 hours - if you want to lose the customer......

About UFS: the missing ability to shrink is the reason for using VxFS in all big environments.

I would say: no 'home user' needs shrink. Every professional datacenter needs shrink. So for me at home ZFS is OK; for my datacenter customers it's still VxVM with VxFS, with all their drawbacks.

With kind regards,
Ralf Gans
> No 'home user' needs shrink.
> Every professional datacenter needs shrink.

I can think of a scenario. I have an n-disk RAID that I built with n newly purchased disks that are m GB. One dies. I buy a replacement disk, also m GB, but when I put it in, it's really (m - x) GB. I need to shrink my zpool because it's now smaller than when I originally built it.

I've gotten in the habit of not using a chunk of the disk to ensure the replacement disk will be at least as large. For 120 GB drives, I've seen maximum partitions from 110 GB to 119 GB. If my originals were 119 GB and the replacement was 110 GB, I couldn't use it as a replacement. How would ZFS handle this?
> No 'home user' needs shrink.

I strongly disagree with this. The ability to shrink can be useful in many specific situations, but in the more general sense, and in particular for home use, it allows you to plan much less rigidly. You can add/remove drives left and right at your leisure and won't work yourself into a corner with regards to drive sizes, lack of drive connectivity, or drive interfaces, as you can always perform an incremental migration/upgrade/downgrade.

> Every professional datacenter needs shrink.

Perhaps at some level. At the level of having 1-20 semi-structured servers with 5-20 or so terabytes each, you probably don't need it all that much - even if it would be nice (speaking from experience).

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Ralf Gans wrote:
> No 'home user' needs shrink.
> Every professional datacenter needs shrink.

Regardless of where you want or don't want to use shrink, we are actively working on this, targeting delivery in s10u5.

--matt

ps. To answer a later poster's question, replacing a disk with a smaller one can be accomplished by simply adding the smaller disk, then removing the larger one (once pool shrinking is supported).
> Ralf Gans wrote:
> > No 'home user' needs shrink.
> > Every professional datacenter needs shrink.
>
> Regardless of where you want or don't want to use shrink, we are
> actively working on this, targeting delivery in s10u5.

And I eagerly await the day I'll get to read a blog discussing how this works and what you had to do with respect to snapshot blocks. :-) (or will you have to remove snapshots?)

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
Darren Dunham wrote:
>> Ralf Gans wrote:
>>> No 'home user' needs shrink.
>>> Every professional datacenter needs shrink.
>> Regardless of where you want or don't want to use shrink, we are
>> actively working on this, targeting delivery in s10u5.
>
> And I eagerly await the day I'll get to read a blog discussing how this
> works and what you had to do with respect to snapshot blocks. :-) (or
> will you have to remove snapshots?)

Yeah, the implementation is nontrivial.

Of course, this won't have any impact on snapshots, clones, etc. and will happen on-line. Any other solution would be unacceptable.

--matt
> > And I eagerly await the day I'll get to read a blog discussing how this
> > works and what you had to do with respect to snapshot blocks. :-) (or
> > will you have to remove snapshots?)
>
> Yeah, the implementation is nontrivial.

I thought that might be the case from the tiny details I have picked up.

> Of course, this won't have any impact on snapshots, clones, etc. and
> will happen on-line. Any other solution would be unacceptable.
>
> --matt

That's excellent. I'd assumed as much, but I hadn't seen it stated yet. Very good to hear. Thanks for the update!

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
Jesus Cea
2007-Feb-15 13:00 UTC
[zfs-discuss] Re: Shrinking a zpool? (referring to "Meta data corruptions on ZFS.")
There are infinite use cases for storage shrinking.

A clear example is the "Meta data corruptions on ZFS." thread currently on the list. The issue is: my pool is full and I can't delete a file because the COW operation can't find enough free space (yes, I know this concrete issue would be solved if Solaris did some small reservation in the zpool). If we could shrink a zpool, the administrator could simply add a new small vdev (for example, a USB pendrive, or a remote NFS file) to provide some free space to the pool, delete some big files and *THEN* shrink the pool to remove the temporarily added spare space.

To me, a huge issue is when you try to add a 2-way mirror to a zpool but you add the two disks as separate vdevs by mistake. The only possible step then is to back up the zpool, destroy it and recreate it again. Not nice...

--
Jesus Cea Avion                         jcea at argo.es
http://www.argo.es/~jcea/               jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
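P.S. As a rough illustration of the first half of that idea - the paths here are hypothetical, and note that today the added file vdev can never be removed again, which is exactly the missing piece being discussed:

    mkfile 256m /var/tmp/spare        # create a small scratch file to act as a vdev
    zpool add -f tank /var/tmp/spare  # -f because the replication level will not match the rest of the pool
    rm /tank/some/huge/file           # with a little free space, the deletion's COW updates can complete

With shrink/remove, the last step would be something like 'zpool remove tank /var/tmp/spare'; without it, that scratch file is part of the pool forever.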
Solaris 10 update 5 was released 05/2008, but no zpool shrink :-( Any update?
On Tue, 6 May 2008, Brad Bender wrote:
> Solaris 10 update 5 was released 05/2008, but no zpool shrink :-( Any update?

IIRC, the ability to shrink a pool isn't even in Nevada yet, so it'll be *some time* before it'll be in an S10 update...

--
Rich Teer, SCSA, SCNA, SCSECA
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com
When on leased equipment and previously using VxVM, we were able to migrate even a lowly UFS filesystem from one storage array to another storage array via the evacuate process. I guess this makes us only the 3rd customer waiting for this feature.

It would be interesting to ask other users of ZFS on leased storage equipment what they plan on doing when their lease is up. I'd bet that those customers would probably answer something to the effect of "Let's migrate our storage to a SAN virtualizer, a la an IBM SVC or Hitachi USP-like device, and never have this problem again."
Nathan Galvin wrote:
> When on leased equipment and previously using VxVM we were able to migrate even a lowly UFS filesystem from one storage array to another storage array via the evacuate process. I guess this makes us only the 3rd customer waiting for this feature.

UFS cannot be shrunk, so clearly you were not using a "shrink" feature. For ZFS, you can migrate from one storage device to another using the "zpool replace" command.
 -- richard
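P.S. For the lease-expiry migration being described, the device-by-device approach looks roughly like this (device names are hypothetical; the replacement LUNs have to be at least as large as the originals, and each replace triggers a resilver):

    zpool replace tank c1t0d0 c5t0d0   # move one device onto the new array
    zpool status tank                  # wait for the resilver to finish, then repeat for the next device

It gets the data off the old frame online, but it is one-for-one replacement rather than a shrink.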
> Hi, I know there is no single-command way to shrink a
> zpool (say evacuate the data from a disk and then
> remove the disk from a pool), but is there a logical
> way? I.e mirror the pool to a smaller pool and then
> split the mirror? In this case I'm not talking about
> disk size (moving from 4 X 72GB disks to 4 X 36GB
> disks) but rather moving from 4 X 73GB disks to 3 X
> 73GB disks assuming the pool would fit on 3X. I
> haven't found a way but thought I might be missing
> something. Thanks.

Hi. You might try the following:

1) Create the new pool (your 3 x 73GB disk pool).

2) Create a snapshot of the main filesystem on your 4-disk pool:
   zfs snapshot -r poolname@migrate

3) Copy the snapshot to the new 3-disk pool:
   zfs send poolname@migrate | zfs recv newpool/tmp
   cd newpool/tmp
   mv * ../

4) Replace the old pool with the new one:
   zpool destroy poolname   (do a backup before this)
   zpool export newpool
   zpool import newpool poolname

This worked for me, but test it before using it on production pools.
Is this implemented in OpenSolaris 2008.11? I'm moving my filer's rpool to an SSD mirror to free up big-disk slots currently used by the OS, and need to shrink rpool from 40GB to 15GB (only using 2.7GB for the install).

thx
jake
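P.S. If not, the fallback I'm looking at is a send/receive copy of rpool rather than a shrink - roughly the following, untested, with device and boot-environment names purely hypothetical (and the grub step being x86-specific):

    zpool create -f rpool2 mirror c3t0d0s0 c3t1d0s0       # new, smaller pool on the SSD slices
    zfs snapshot -r rpool@move
    zfs send -R rpool@move | zfs recv -Fd rpool2          # replicate the whole rpool hierarchy
    zpool set bootfs=rpool2/ROOT/opensolaris rpool2       # point the boot property at the copied BE
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0   # make the new disks bootable

Then update the boot order, boot from the new pool, and retire the old one.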
> You are the 2nd customer I've ever heard of to use shrink.

This attitude seems to be a common theme in ZFS discussions: "No enterprise uses shrink, only grow."

Maybe. The enterprise I work for requires that every change be reversible and repeatable. Every change requires a backout plan, and that plan better be fast and nondisruptive.

Who are these enterprise admins who can honestly state that they have no requirement to reverse operations? Who runs a 24x7 storage system and will look you in the eye and state, "The storage decisions (parity count, number of devices in a stripe, etc.) that I make today will be valid until the end of time and will NEVER need nondisruptive adjustment. Every storage decision I made in 1993 when we first installed RAID is still correct and has needed no changes despite changes in our business models."

My experience is that this attitude about enterprise storage borders on insane.

Something does not compute.
Martin wrote:
>> You are the 2nd customer I've ever heard of to use shrink.
>
> This attitude seems to be a common theme in ZFS discussions: "No enterprise uses shrink, only grow."
>
> Maybe. The enterprise I work for requires that every change be reversible and repeatable. Every change requires a backout plan, and that plan better be fast and nondisruptive.
>
> Who are these enterprise admins who can honestly state that they have no requirement to reverse operations? Who runs a 24x7 storage system and will look you in the eye and state, "The storage decisions (parity count, number of devices in a stripe, etc.) that I make today will be valid until the end of time and will NEVER need nondisruptive adjustment. Every storage decision I made in 1993 when we first installed RAID is still correct and has needed no changes despite changes in our business models."
>
> My experience is that this attitude about enterprise storage borders on insane.

What's wrong with: make a new pool, safely copy the data, verify the data, and then delete the old pool? Who in the enterprise just allocates a massive pool and then one day wants to shrink it... For home NAS I could see this being useful. I'm not arguing there isn't a use case, but in terms of where my vote for the time/energy of the developers goes, I'd have to concur there are more useful things out there.

OTOH... once/if the block reallocation code is dropped (webrev?) the shrinking of a pool should be a lot easier. I don't mean to go off on a side rant, but afaik this code is written and should have been available. If we all pressured Green-bytes with an open letter it would maybe help...... The legal issues around this are what's holding it all up. @Sun people can't comment I'm sure, but this is what I speculate.

./C
C,

I appreciate the feedback and, like you, do not wish to start a side rant, but rather to understand this, because it is completely counter to my experience. Allow me to respond based on my anecdotal experience.

> What's wrong with making a new pool, safely copying the data, verifying it, and then deleting the old pool?

You missed a few steps. The actual process would be more like the following.

1. Write up the steps and get approval from all affected parties -- in truth, the change would not make it past step 1.
2. Make a new pool.
3. Quiesce the pool and cause a TOTAL outage during steps 4 through 9.
4. Safely make a copy of the data.
5. Verify the data.
6. Export the old pool.
7. Import the new pool.
8. Restart the server.
9. Confirm all services are functioning correctly.
10. Announce the outage has finished.
11. Delete the old pool.

Note step 3 and let me know which 24x7 operation would tolerate an extended outage (it would last for hours or days) on a critical production server. One solution is not to do this on critical enterprise storage, and that's the point I am trying to make.

> Who in the enterprise just allocates a massive pool

Everyone.

> and then one day [months or years later] wants to shrink it?

Business needs change. Technology changes. The project was a pilot and was canceled. The extended pool didn't meet verification requirements (e.g., performance) and the change must be backed out. Business growth estimates were grossly too high and the pool needs migration to a cheaper frame in order to keep costs in line with revenue. The pool was made of 40 of the largest disks at the time and now, 4 years later, only 10 disks are needed to accomplish the same thing, while the 40 original disks are at EOL and no longer supported. The list goes on and on.

> I'd have to concur there are more useful things out there. OTOH...

That's probably true, and I have not seen the priority list. I was merely amazed at the number of "Enterprises don't need this functionality" posts.

Thanks again,
Marty

-- This message posted from opensolaris.org
+1 Thanks for putting this in a real world perspective, Martin. I''m faced with this exact circumstance right now (see my post to the list from earlier today). Our ZFS filers are highly utilised, highly trusted components at the core of our enterprise and serve out OS images, mail storage, customer facing NFS mounts, CIFS mounts, etc. for nearly all of our critical services. Downtime is, essentially, a catastrophe and won''t get approval without weeks of painstaking social engineering.. jake -- This message posted from opensolaris.org
Preface: yes, shrink will be cool. But we''ve been running highly available, mission critical datacenters for more than 50 years without shrink being widely available. On Aug 5, 2009, at 9:17 AM, Martin wrote:>> You are the 2nd customer I''ve ever heard of to use shrink. > > This attitude seems to be a common theme in ZFS discussions: "No > enterprise uses shrink, only grow." > > Maybe. The enterprise I work for requires that every change be > reversible and repeatable. Every change requires a backout plan and > that plan better be fast and nondisruptive.Do it exactly the same way you do it for UFS. You''ve been using UFS for years without shrink, right? Surely you have procedures in place :-)> Who are these enterprise admins who can honestly state that they > have no requirement to reverse operations?Backout plans are not always simple reversals. A well managed site will have procedures for rolling upgrades.> Who runs a 24x7 storage system and will look you in the eye and > state, "The storage decisions (parity count, number of devices in a > stripe, etc.) that I make today will be valid until the end of time > and will NEVER need nondisruptive adjustment. Every storage > decision I made in 1993 when we first installed RAID is still > correct and has needed no changes despite changes in our business > models." > > My experience is that this attitude about enterprise storage borders > on insane. > > Something does not compute.There is more than one way to skin a cat. -- richard
Martin wrote:
> C,
>
> I appreciate the feedback and, like you, do not wish to start a side rant, but rather to understand this, because it is completely counter to my experience.
>
> Allow me to respond based on my anecdotal experience.
>
>> What's wrong with making a new pool, safely copying the data, verifying it, and then deleting the old pool?
>
> You missed a few steps. The actual process would be more like the following.
> 1. Write up the steps and get approval from all affected parties -- in truth, the change would not make it past step 1.

Maybe, but maybe not -- see below...

> 2. Make a new pool.
> 3. Quiesce the pool and cause a TOTAL outage during steps 4 through 9.

That's not entirely true. You can use ZFS send/recv to do the major first pass of #4 (and #5 against the snapshot) live, before the total outage. Then, after you quiesce everything, you could use an incremental send/recv to copy the changes since then quickly, reducing downtime. I'd probably run a second full verify anyway, but in theory, I believe the ZFS checksums are used in the send/recv process to ensure that there isn't any corruption, so after enough positive experience I might start to skip the second verify. This should greatly reduce the length of the downtime.

> Everyone.
>
>> and then one day [months or years later] wants to shrink it?
>
> Business needs change. Technology changes. The project was a pilot and was canceled. The extended pool didn't meet verification requirements (e.g., performance) and the change must be backed out.

In an enterprise, a change for performance should have been tested on another identical non-production system before being implemented on the production one.

>> I'd have to concur there are more useful things out there. OTOH...
>
> That's probably true, and I have not seen the priority list. I was merely amazed at the number of "Enterprises don't need this functionality" posts.

All that said, as a personal home user, this is a feature I'm hoping for all the time. :)

-Kyle

> Thanks again,
> Marty
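Kyle's staged approach -- a bulk pass while the service is still live, then a short incremental catch-up after quiescing -- might look roughly like the sketch below. The pool names oldpool and newpool are placeholders, and the flags shown are the usual ones rather than anything Kyle specified:

   # live, while the application is still running
   zfs snapshot -r oldpool@phase1
   zfs send -R oldpool@phase1 | zfs recv -Fd newpool

   # during the outage window, after quiescing writers
   zfs snapshot -r oldpool@phase2
   zfs send -R -i @phase1 oldpool@phase2 | zfs recv -Fd newpool

Only the blocks changed since @phase1 cross the wire in the second pass, which is what keeps the downtime short.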
Jacob Ritorto wrote:
> Is this implemented in OpenSolaris 2008.11? I'm moving my filer's rpool to an SSD mirror to free up big-disk slots currently used by the OS, and need to shrink rpool from 40GB to 15GB (only 2.7GB is used by the install).

Your best bet would be to install the new SSD drives, create a new pool, snapshot the existing pool, and use ZFS send/recv to migrate the data to the new pool. There are docs around about how to install grub and the boot blocks on the new devices as well. After that, remove (export! don't destroy yet!) the old drives and reboot to see how it works.

If you have no problems (and I don't think there's anything technical that would keep this from working), then you're good. Otherwise, put the old pool back in. :)

-Kyle

> thx
> jake
Kyle McDonald wrote:
> Jacob Ritorto wrote:
>> Is this implemented in OpenSolaris 2008.11? I'm moving my filer's rpool to an SSD mirror to free up big-disk slots currently used by the OS, and need to shrink rpool from 40GB to 15GB (only 2.7GB is used by the install).
>
> Your best bet would be to install the new SSD drives, create a new pool, snapshot the existing pool, and use ZFS send/recv to migrate the data to the new pool. There are docs around about how to install grub and the boot blocks on the new devices as well. After that, remove (export! don't destroy yet!) the old drives and reboot to see how it works.
>
> If you have no problems (and I don't think there's anything technical that would keep this from working), then you're good. Otherwise, put the old pool back in. :)

This thread discusses basically this same thing - he had a problem along the way, but Cindy answered it.

> Hi Nawir,
>
> I haven't tested these steps myself, but the error message
> means that you need to set this property:
>
> # zpool set bootfs=rpool/ROOT/BE-name rpool
>
> Cindy
>
> On 08/05/09 03:14, nawir wrote:
> Hi,
>
> I have a sol10u7 OS with a 73GB HD in c1t0d0.
> I want to clone it to a 36GB HD.
>
> These steps below are what come to my mind.
> STEPS TAKEN
> # zpool create -f altrpool c1t1d0s0
> # zpool set listsnapshots=on rpool
> # SNAPNAME=`date +%Y%m%d`
> # zfs snapshot -r rpool/ROOT@$SNAPNAME
> # zfs list -t snapshot
> # zfs send -R rpool@$SNAPNAME | zfs recv -vFd altrpool
> # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
> for x86 do
> # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
> # zpool export altrpool
> # init 5
> remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
> - insert solaris10 dvd
> ok boot cdrom -s
> # zpool import altrpool rpool
> # init 0
> ok boot disk1
>
> ERROR:
> Rebooting with command: boot disk1
> Boot device: /pci@1c,600000/scsi@2/disk@1,0  File and args:
> no pool_props
> Evaluating:
> The file just loaded does not appear to be executable.
> ok
>
> QUESTIONS:
> 1. what's wrong with my steps
> 2. any better idea
>
> thanks

-Kyle

>> thx
>> jake
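Folding Cindy's bootfs fix into the quoted steps gives roughly the sequence below. This is only a sketch (not tested here): it assumes SPARC as in the quoted post (x86 would use installgrub instead of installboot), reuses that post's device names, and <BE-name> stands for whatever the root boot environment dataset is actually called:

   zpool create -f altrpool c1t1d0s0
   zfs snapshot -r rpool@migrate       # snapshot the whole pool, not just rpool/ROOT, so the -R send has a matching snapshot
   zfs send -R rpool@migrate | zfs recv -vFd altrpool
   installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
   zpool export altrpool
   # swap the disks, boot single-user from the Solaris DVD (ok boot cdrom -s), then:
   zpool import altrpool rpool
   zpool set bootfs=rpool/ROOT/<BE-name> rpool
   init 0
   # ok boot disk1

Per Cindy's reply, the missing bootfs property is what produces the "no pool_props" failure seen in the quoted post.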
richard wrote:> Preface: yes, shrink will be cool. But we''ve been > running highly > available, > mission critical datacenters for more than 50 years > without shrink being > widely available.I would debate that. I remember batch windows and downtime delaying one''s career movement. Today we are 24x7 where an outage can kill an entire business.> Do it exactly the same way you do it for UFS. You''ve > been using UFS > for years without shrink, right? Surely you have > procedures in > place :-)While I haven''t taken a formal survey, everywhere I look I see JFS on AIX and VxFS on Solaris. I haven''t been in a production UFS shop this decade.> Backout plans are not always simple reversals. A > well managed site will > have procedures for rolling upgrades.I agree with everything you wrote. Today other technologies allow live changes to the pool, so companies use those technologies instead of ZFS.> There is more than one way to skin a cat.Which entirely misses the point. -- This message posted from opensolaris.org
Interesting, this is the same procedure I invented (with the exception that the zfs send came from the net) and used to hack OpenSolaris 2009.06 onto my home SunBlade 2000 since it couldn''t do AI due to low OBP rev.. I''ll have to rework it this way, then, which will unfortunately cause downtime for a multitude of dependent services, affect the entire universe here and make my department look inept. As much as it stings, I accept that this is the price I pay for adopting a new technology. Acknowledge and move on. Quite simply, if this happens too often, we know we''ve made the wrong decision on vendor/platform. Anyway, looking forward to shrink. Thanks for the tips. Kyle McDonald wrote:> Kyle McDonald wrote: >> Jacob Ritorto wrote: >>> Is this implemented in OpenSolaris 2008.11? I''m moving move my >>> filer''s rpool to an ssd mirror to free up bigdisk slots currently >>> used by the os and need to shrink rpool from 40GB to 15GB. (only >>> using 2.7GB for the install). >>> >>> >> Your best bet would be to install the new ssd drives, create a new >> pool, snapshot the exisitng pool and use ZFS send/recv to migrate the >> data to the new pool. There are docs around about how install grub and >> the boot blocks on the new devices also. After that remove (export!, >> don''t destroy yet!) >> the old drives, and reboot to see how it works. >> >> If you have no problems, (and I don''t think there''s anything technical >> that would keep this from working,) then you''re good. Otherwise put >> the old pool back in. :) >> > This thread dicusses basically this same thing - he had a problem along > the way, but Cindy answered it. > >> Hi Nawir, >> >> I haven''t tested these steps myself, but the error message >> means that you need to set this property: >> >> # zpool set bootfs=rpool/ROOT/BE-name rpool >> >> Cindy >> >> On 08/05/09 03:14, nawir wrote: >> Hi, >> >> I have sol10u7 OS with 73GB HD in c1t0d0. >> I want to clone it to 36GB HD >> >> These steps below is what come in my mind >> STEPS TAKEN >> # zpool create -f altrpool c1t1d0s0 >> # zpool set listsnapshots=on rpool >> # SNAPNAME=`date +%Y%m%d` >> # zfs snapshot -r rpool/ROOT@$SNAPNAME >> # zfs list -t snapshot >> # zfs send -R rpool@$SNAPNAME | zfs recv -vFd altrpool >> # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk >> /dev/rdsk/c1t1d0s0 >> for x86 do >> # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 >> # zpool export altrpool >> # init 5 >> remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 >> -insert solaris10 dvd >> ok boot cdrom -s >> # zpool import altrpool rpool >> # init 0 >> ok boot disk1 >> >> ERROR: >> Rebooting with command: boot disk1 >> Boot device: /pci at 1c,600000/scsi at 2/disk at 1,0 File and args: >> no pool_props >> Evaluating: >> The file just loaded does not appear to be executable. >> ok >> >> QUESTIONS: >> 1. what''s wrong what my steps >> 2. any better idea >> >> thanks > -Kyle > > >> >> -Kyle >> >>> thx >>> jake >>> >> >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Aug 5, 2009, at 1:06 PM, Martin wrote:> richard wrote: >> Preface: yes, shrink will be cool. But we''ve been >> running highly >> available, >> mission critical datacenters for more than 50 years >> without shrink being >> widely available. > > I would debate that. I remember batch windows and downtime delaying > one''s career movement. Today we are 24x7 where an outage can kill > an entire businessAgree.>> Do it exactly the same way you do it for UFS. You''ve >> been using UFS >> for years without shrink, right? Surely you have >> procedures in >> place :-) > > While I haven''t taken a formal survey, everywhere I look I see JFS > on AIX and VxFS on Solaris. I haven''t been in a production UFS shop > this decade.Then why are you talking on Solaris forum? All versions of Solaris prior to Solaris 10 10/08 only support UFS for boot.>> Backout plans are not always simple reversals. A >> well managed site will >> have procedures for rolling upgrades. > > I agree with everything you wrote. Today other technologies allow > live changes to the pool, so companies use those technologies > instead of ZFS.... and can continue to do so. If you are looking to replace a for-fee product with for-free, then you need to consider all ramifications. For example, a shrink causes previously written data to be re-written, thus exposing the system to additional failure modes. OTOH, a model of place once and never disrupt can provide a more reliable service. You will see the latter "pattern" repeated often for high assurance systems.> >> There is more than one way to skin a cat. > > Which entirely misses the point.Many cases where people needed to shrink were due to the inability to plan for future growth. This is compounded by the rather simplistic interface between a logical volume and traditional file system. ZFS allows you to dynamically grow the pool, so you can implement a process of only adding storage as needs dictate. Bottom line: shrink will be cool, but it is not the perfect solution for managing changing data needs in a mission critical environment. -- richard
I''m chiming in late, but have a mission critical need of this as well and posted as a non-member before. My customer was wondering when this would make it into Solaris 10. Their complete adoption depends on it. I have a customer that is trying to move from VxVM/VxFS to ZFS, however they have this same need. They want to save money and move to ZFS. They are charged by a separate group for their SAN storage needs. The business group storage needs grow and shrink over time, as it has done for years. They''ve been on E25K''s and other high power boxes with VxVM/VxFS as their encapsulated root disk for over a decade. They are/were a big Veritas shop. They rarely ever use UFS, especially in production. They absolutely require the shrink functionality to completely move off VxVM/VxFS to ZFS, and we''re talking $$millions. I think your statements below are from a technology standpoint, not a business standpoint. You say its poor planning, which is way off the mark. Business needs change daily. It takes several weeks to provision SAN with all the approvals, etc. and it it takes massive planning. That goes for increasing as well as decreasing their storage needs. Richard Elling wrote:> On Aug 5, 2009, at 1:06 PM, Martin wrote: > >> richard wrote: >>> Preface: yes, shrink will be cool. But we''ve been >>> running highly >>> available, >>> mission critical datacenters for more than 50 years >>> without shrink being >>> widely available. >> >> I would debate that. I remember batch windows and downtime delaying >> one''s career movement. Today we are 24x7 where an outage can kill an >> entire business > > Agree. > >>> Do it exactly the same way you do it for UFS. You''ve >>> been using UFS >>> for years without shrink, right? Surely you have >>> procedures in >>> place :-) >> >> While I haven''t taken a formal survey, everywhere I look I see JFS on >> AIX and VxFS on Solaris. I haven''t been in a production UFS shop this >> decade. > > Then why are you talking on Solaris forum? All versions of > Solaris prior to Solaris 10 10/08 only support UFS for boot. > >>> Backout plans are not always simple reversals. A >>> well managed site will >>> have procedures for rolling upgrades. >> >> I agree with everything you wrote. Today other technologies allow >> live changes to the pool, so companies use those technologies instead >> of ZFS. > > ... and can continue to do so. If you are looking to replace a > for-fee product with for-free, then you need to consider all > ramifications. For example, a shrink causes previously written > data to be re-written, thus exposing the system to additional > failure modes. OTOH, a model of place once and never disrupt > can provide a more reliable service. You will see the latter > "pattern" repeated often for high assurance systems. > >> >>> There is more than one way to skin a cat. >> >> Which entirely misses the point. > > Many cases where people needed to shrink were due to the > inability to plan for future growth. This is compounded by the > rather simplistic interface between a logical volume and traditional > file system. ZFS allows you to dynamically grow the pool, so you > can implement a process of only adding storage as needs dictate. > > Bottom line: shrink will be cool, but it is not the perfect solution for > managing changing data needs in a mission critical environment. > -- richard > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Aug 5, 2009, at 2:58 PM, Brian Kolaci wrote:
> I'm chiming in late, but have a mission critical need of this as well and posted as a non-member before. My customer was wondering when this would make it into Solaris 10. Their complete adoption depends on it.
>
> I have a customer that is trying to move from VxVM/VxFS to ZFS, however they have this same need. They want to save money and move to ZFS. They are charged by a separate group for their SAN storage needs. The business group storage needs grow and shrink over time, as it has done for years. They've been on E25Ks and other high power boxes with VxVM/VxFS as their encapsulated root disk for over a decade. They are/were a big Veritas shop. They rarely ever use UFS, especially in production.
>
> They absolutely require the shrink functionality to completely move off VxVM/VxFS to ZFS, and we're talking $$ millions. I think your statements below are from a technology standpoint, not a business standpoint.

If you look at it from Sun's business perspective, ZFS is $$ free, so Sun gains no $$ millions by replacing VxFS. Indeed, if the customer purchases VxFS from Sun, it makes little sense for Sun to eliminate a revenue source. OTOH, I'm sure if they are willing to give Sun $$ millions, it can help raise the priority of CR 4852783.
http://bugs.opensolaris.org/view_bug.do?bug_id=4852783

> You say it's poor planning, which is way off the mark. Business needs change daily. It takes several weeks to provision SAN storage with all the approvals, etc., and it takes massive planning. That goes for increasing as well as decreasing their storage needs.

I think you've identified the real business problem. A shrink feature in ZFS will do nothing to fix this. A business whose needs change faster than its ability to react has (as we say in business school) an unsustainable business model.
-- richard
On Wed, 5 Aug 2009, Brian Kolaci wrote:
> I have a customer that is trying to move from VxVM/VxFS to ZFS, however they have this same need. They want to save money and move to ZFS. They are charged by a separate group for their SAN storage needs. The business group storage needs grow and shrink over time, as it has done for years. They've been on E25Ks and other high power boxes with VxVM/VxFS as their encapsulated root disk for over a decade. They are/were a big Veritas shop. They rarely ever use UFS, especially in production.

ZFS is a storage pool and not strictly a filesystem. One may create filesystems or logical volumes out of this storage pool. The logical volumes can be exported via iSCSI or FC (COMSTAR). Filesystems may be exported via NFS or CIFS. ZFS filesystems support both quotas (a cap on consumption) and space reservations (a guaranteed minimum).

Perhaps the problem is one of educating the customer so that they can amend their accounting practices. Different business groups can share the same pool if necessary.

Bob
-- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
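To make Bob's point concrete, carving per-group allocations out of one shared pool might look like the following sketch; the pool, group and size values are invented for the example:

   # one shared pool, hypothetical devices
   zpool create bizpool c3t0d0 c3t1d0 c3t2d0 c3t3d0
   # a filesystem per business group: a hard cap plus a guaranteed floor
   zfs create bizpool/groupA
   zfs set quota=500G bizpool/groupA
   zfs set reservation=200G bizpool/groupA
   zfs set sharenfs=on bizpool/groupA
   # a group that wants block storage gets a zvol instead (exportable over iSCSI/FC)
   zfs create -V 100G bizpool/groupB_lun

Quotas and reservations can be raised or lowered at any time, which gives each group a grow-and-shrink knob inside the pool; whether that satisfies the chargeback and legal-separation constraints Brian describes is a separate question.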
Richard Elling wrote:
> On Aug 5, 2009, at 2:58 PM, Brian Kolaci wrote:
>> I'm chiming in late, but have a mission critical need of this as well and posted as a non-member before. My customer was wondering when this would make it into Solaris 10. Their complete adoption depends on it.
>>
>> I have a customer that is trying to move from VxVM/VxFS to ZFS, however they have this same need. They want to save money and move to ZFS. They are charged by a separate group for their SAN storage needs. The business group storage needs grow and shrink over time, as it has done for years. They've been on E25Ks and other high power boxes with VxVM/VxFS as their encapsulated root disk for over a decade. They are/were a big Veritas shop. They rarely ever use UFS, especially in production.
>>
>> They absolutely require the shrink functionality to completely move off VxVM/VxFS to ZFS, and we're talking $$ millions. I think your statements below are from a technology standpoint, not a business standpoint.
>
> If you look at it from Sun's business perspective, ZFS is $$ free, so Sun gains no $$ millions by replacing VxFS. Indeed, if the customer purchases VxFS from Sun, it makes little sense for Sun to eliminate a revenue source. OTOH, I'm sure if they are willing to give Sun $$ millions, it can help raise the priority of CR 4852783.
> http://bugs.opensolaris.org/view_bug.do?bug_id=4852783

They're probably on the list already, but I'll check to make sure.

What I meant by the $$ millions is that currently all Sun hardware purchases are on hold. Deploying on Solaris currently means not just the hardware, but the support, required certified third-party software such as EMC PowerPath, Veritas VxVM & VxFS, BMC monitoring, and more... Yes, I'm still working on MPxIO to replace PowerPath, but there are issues there too. They will not use UFS. Right now ZFS is OK for limited deployment and no production use. Their case on ZFS is that it's good for dealing with JBOD, but it's not yet "enterprise ready" for SAN use. Shrinking a volume is just one of a list of requirements to move toward "enterprise ready"; however, many issues have been fixed.

So Sun would see an increased hardware revenue stream if they would just listen to the customer... Without it, they look for alternative hardware/software vendors. While this is stalled, there have been several hundred systems that have been flipped to competitors (and this is still going on). So the lack of this feature will cause $$ millions to be lost...

>> You say it's poor planning, which is way off the mark. Business needs change daily. It takes several weeks to provision SAN storage with all the approvals, etc., and it takes massive planning. That goes for increasing as well as decreasing their storage needs.
>
> I think you've identified the real business problem. A shrink feature in ZFS will do nothing to fix this. A business whose needs change faster than its ability to react has (as we say in business school) an unsustainable business model.
> -- richard

Yes, hence a federal bail-out. However, a shrink feature will help them to be able to spend more with Sun.
Brian, CR 4852783 was updated again this week so you might add yourself or your customer to continue to be updated. In the meantime, a reminder is that a mirrored ZFS configuration is flexible in that devices can be detached (as long as the redundancy is not compromised) or replaced as long as the replacement disk is an equivalent size or larger. So, you can move storage around if you need to in a mirrored ZFS config and until 4852783 integrates. cs On 08/05/09 15:58, Brian Kolaci wrote:> I''m chiming in late, but have a mission critical need of this as well > and posted as a non-member before. My customer was wondering when this > would make it into Solaris 10. Their complete adoption depends on it. > > I have a customer that is trying to move from VxVM/VxFS to ZFS, however > they have this same need. They want to save money and move to ZFS. > They are charged by a separate group for their SAN storage needs. The > business group storage needs grow and shrink over time, as it has done > for years. They''ve been on E25K''s and other high power boxes with > VxVM/VxFS as their encapsulated root disk for over a decade. They > are/were a big Veritas shop. They rarely ever use UFS, especially in > production. > > They absolutely require the shrink functionality to completely move off > VxVM/VxFS to ZFS, and we''re talking $$millions. I think your statements > below are from a technology standpoint, not a business standpoint. You > say its poor planning, which is way off the mark. Business needs change > daily. It takes several weeks to provision SAN with all the approvals, > etc. and it it takes massive planning. That goes for increasing as well > as decreasing their storage needs. > > Richard Elling wrote: > >> On Aug 5, 2009, at 1:06 PM, Martin wrote: >> >>> richard wrote: >>> >>>> Preface: yes, shrink will be cool. But we''ve been >>>> running highly >>>> available, >>>> mission critical datacenters for more than 50 years >>>> without shrink being >>>> widely available. >>> >>> >>> I would debate that. I remember batch windows and downtime delaying >>> one''s career movement. Today we are 24x7 where an outage can kill an >>> entire business >> >> >> Agree. >> >>>> Do it exactly the same way you do it for UFS. You''ve >>>> been using UFS >>>> for years without shrink, right? Surely you have >>>> procedures in >>>> place :-) >>> >>> >>> While I haven''t taken a formal survey, everywhere I look I see JFS on >>> AIX and VxFS on Solaris. I haven''t been in a production UFS shop >>> this decade. >> >> >> Then why are you talking on Solaris forum? All versions of >> Solaris prior to Solaris 10 10/08 only support UFS for boot. >> >>>> Backout plans are not always simple reversals. A >>>> well managed site will >>>> have procedures for rolling upgrades. >>> >>> >>> I agree with everything you wrote. Today other technologies allow >>> live changes to the pool, so companies use those technologies instead >>> of ZFS. >> >> >> ... and can continue to do so. If you are looking to replace a >> for-fee product with for-free, then you need to consider all >> ramifications. For example, a shrink causes previously written >> data to be re-written, thus exposing the system to additional >> failure modes. OTOH, a model of place once and never disrupt >> can provide a more reliable service. You will see the latter >> "pattern" repeated often for high assurance systems. >> >>> >>>> There is more than one way to skin a cat. >>> >>> >>> Which entirely misses the point. 
>> >> >> Many cases where people needed to shrink were due to the >> inability to plan for future growth. This is compounded by the >> rather simplistic interface between a logical volume and traditional >> file system. ZFS allows you to dynamically grow the pool, so you >> can implement a process of only adding storage as needs dictate. >> >> Bottom line: shrink will be cool, but it is not the perfect solution for >> managing changing data needs in a mission critical environment. >> -- richard >> >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
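As a concrete illustration of the mirrored-pool flexibility Cindy describes above, the usual moves look like this; the pool and device names are made up for the example:

   # grow a mirror onto new storage, then drop the old side once it has resilvered
   zpool attach tank c1t2d0 c5t0d0     # c5t0d0 becomes a new mirror side of c1t2d0
   zpool status tank                   # wait for the resilver to finish
   zpool detach tank c1t2d0            # the data now lives on the new device

   # or swap a device in one step
   zpool replace tank c1t3d0 c5t1d0

The catch, as Cindy notes, is that the replacement has to be an equivalent size or larger, so this moves storage around but does not shrink the pool below its current size.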
Bob Friesenhahn wrote:> On Wed, 5 Aug 2009, Brian Kolaci wrote: >> >> I have a customer that is trying to move from VxVM/VxFS to ZFS, >> however they have this same need. They want to save money and move to >> ZFS. They are charged by a separate group for their SAN storage >> needs. The business group storage needs grow and shrink over time, as >> it has done for years. They''ve been on E25K''s and other high power >> boxes with VxVM/VxFS as their encapsulated root disk for over a >> decade. They are/were a big Veritas shop. They rarely ever use UFS, >> especially in production. > > ZFS is a storage pool and not strictly a filesystem. One may create > filesystems or logical volumes out of this storage pool. The logical > volumes can be exported via iSCSI or FC (COMSTAR). Filesystems may be > exported via NFS or CIFS. ZFS filesystems support quotas for both > maximum consumption, and minimum space reservation. > > Perhaps the problem is one of educating the customer so that they can > ammend their accounting practices. Different business groups can share > the same pool if necessary.They understand the technology very well. Yes, ZFS is very flexible with many features, and most are not needed in an enterprise environment where they have high-end SAN storage that is shared between Sun, IBM, linux, VMWare ESX and Windows. Local disk is only for the OS image. There is no need to have an M9000 be a file server. They have NAS for that. They use SAN across the enterprise and it gives them the ability to fail-over to servers in other data centers very quickly. Different business groups cannot share the same pool for many reasons. Each business group pays for their own storage. There are legal issues as well, and in fact cannot have different divisions on the same frame let alone shared storage. But they''re in a major virtualization push to the point that nobody will be allowed to be on their own physical box. So the big push is to move to VMware, and we''re trying to salvage as much as we can to move them to containers and LDoms. That being the case, I''ve recommended that each virtual machine on either a container or LDom should be allocated their own zpool, and the zonepath or LDom disk image be on their own zpool. This way when (not if) they need to migrate to another system, they have one pool to move over. They use fixed sized LUNs, so the granularity is a 33GB LUN, which can be migrated. This is also the case for their clusters as well as SRDF to their COB machines.
On Aug 5, 2009, at 4:06 PM, Cindy.Swearingen at Sun.COM wrote:> Brian, > > CR 4852783 was updated again this week so you might add yourself or > your customer to continue to be updated. > > In the meantime, a reminder is that a mirrored ZFS configuration > is flexible in that devices can be detached (as long as the redundancy > is not compromised) or replaced as long as the replacement disk is > an equivalent size or larger. So, you can move storage around if you > need to in a mirrored ZFS config and until 4852783 integrates.Thanks Cindy, This is another way to skin the cat. It works for simple volumes, too. But there are some restrictions, which could impact the operation when a large change in vdev size is needed. Is this planned to be backported to Solaris 10? CR 6844090 has more details. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090 -- richard
Cindy.Swearingen at Sun.COM wrote:> Brian, > > CR 4852783 was updated again this week so you might add yourself or > your customer to continue to be updated.Will do. I thought I was on it, but didn''t see any updates...> > In the meantime, a reminder is that a mirrored ZFS configuration > is flexible in that devices can be detached (as long as the redundancy > is not compromised) or replaced as long as the replacement disk is an > equivalent size or larger. So, you can move storage around if you need > to in a mirrored ZFS config and until 4852783 integrates.Yes, we''re trying to push that through now (make a ZFS root). But the case I was more concerned about was the back-end storage for LDom guests and zonepaths. All the SAN storage coming in is already RAID on EMC or Hitachi, and they just move the storage around through the SAN group.
Brian Kolaci wrote:> So Sun would see increased hardware revenue stream if they would just > listen to the customer... Without [pool shrink], they look for alternative > hardware/software vendors.Just to be clear, Sun and the ZFS team are listening to customers on this issue. Pool shrink has been one of our top priorities for some time now. It is unfortunately a very difficult problem, and will take some time to solve even with the application of all possible resources (including the majority of my time). We are updating CR 4852783 at least once a month with progress reports. --matt
On Wed, 5 Aug 2009, Richard Elling wrote:> > Thanks Cindy, > This is another way to skin the cat. It works for simple volumes, too. > But there are some restrictions, which could impact the operation when a > large change in vdev size is needed. Is this planned to be backported > to Solaris 10? > > CR 6844090 has more details. > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090A potential partial solution is to have a pool creation option where the tail device labels are set to a point much smaller than the device size rather than being written to the end of the device. As zfs requires more space, the tail device labels are moved to add sufficient free space that storage blocks can again be efficiently allocated. Since no zfs data is written beyond the tail device labels, the storage LUN could be truncated down to the point where the tail device labels are still left intact. This seems like minimal impact to ZFS and no user data would need to be migrated. If the user''s usage model tends to periodically fill the whole LUN rather than to gradually grow, then this approach won''t work. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> Preface: yes, shrink will be cool. But we've been running highly available, mission critical datacenters for more than 50 years without shrink being widely available.

Agreed, and shrink IS cool. I used it to migrate VxVM volumes from direct attached storage to slightly smaller SAN LUNs on a Solaris SPARC box. It sure is nice to add the new storage to the volume and mirror as opposed to copying to a new filesystem.

It will be cool when SSDs are released for my fully loaded X4540s; if I can migrate enough users off and shrink the pool, perhaps I can drop a couple of SATA disks and then add the SSDs, all on the fly.

Perhaps Steve Martin said it best, "Let's get real small!".

Thanks,
Jordan
Bob wrote:> Perhaps the problem is one of educating the customer > so that they can > ammend their accounting practices. Different > business groups can > share the same pool if necessary.Bob, while I don''t mean to pick on you, that statement captures a major thinking flaw in IT when it comes to sales. Yes, Brian should do everything possible to shape the customer''s expectations; that''s his job. At the same time, let''s face it. If the customer thinks he needs X (whether or not he really does) and Brian can''t get him to move away from it, Brian is sunk. Here Brian sits with a potential multi-million dollar sale which is stuck on a missing feature, and probably other obstacles. The truth is that the other obstacles are irrelevant as long as the customer can''t get past feature X, valid or not. So millions of dollars to Sun hang in the balance and these discussions revolve around whether or not the customer is planning optimally. Imagine how much rapport Brian will gain when he tells this guy, "You know, if you guys just planned better, you wouldn''t need feature X." Brian would probably not get his phone calls returned after that. You can rest assured that when the customer meets with IBM the next day, the IBM rep won''t let the customer get away from feature X that JFS has. The conversation might go like this. Customer: You know, we are really looking at Sun and ZFS. IBM: Of course you are, because that''s a wise thing to do. ZFS has a lot of exciting potential. Customer: Huh? IBM: ZFS has a solid base and Sun is adding features which will make it quite effective for your applications. Customer: So you like ZFS? IBM: Absolutely. At some point it will have the features you need. You mentioned you use feature X to provide the flexibility you have to continue to outperform your competition during this recession. I understand Sun is working hard to integrate that feature, even as we speak. Customer: Maybe we don''t need feature X. IBM: You would know more than I. When did you last use feature X? Customer: We used X last quarter when we scrambled to add FOO to our product mix so that we could beat our competition to market. IBM: How would it have been different if feature X was unavailable? Customer (mind racing): We would have found a way. IBM: Of course, as innovative as your company is, you would have found a way. How much of a delay? Customer (thinking through the scenarios): I don''t know. IBM: It wouldn''t have impacted the rollout, would it? Customer: I don''t know. IBM: Even if it did delay things, the delay wouldn''t blow back on you, right? Customer (sweating): I don''t think so. Imagine the land mine Brian now has to overcome when he tries to convince the customer that they don''t need feature X, and even if they do, Sun will have it "real soon now." Does anyone really think that Oracle made their money lecturing customers on how Table Partitions are stupid and if the customer would have planned their schema better, they wouldn''t need them anyway? Of course not. People wanted partitions (valid or not) and Oracle delivered. Marty -- This message posted from opensolaris.org
A lot of us have run *with* the ability to shrink because we were using Veritas. Once you have a feature, processes tend to expand to use it. Moving to ZFS was a good move for many reasons but I still missed being able to do something that used to be so easy...
And along those lines, why stop at SSDs? Get ZFS shrink working, and Sun could release a set of upgrade kits for X4500s and X4540s. Kits could range from a couple of SSD devices to crazy specs like 40 2TB drives and 8 SSDs. And zpool shrink would be a key facilitator driving sales of these.

As Jordan says, if you can shrink your pool down, you can create space to fit the SSD devices. However, shrinking the pool also allows you to upgrade the drives much more quickly. If you have a 46-disk zpool, you can't replace many disks at once, and the upgrade is high risk if you're running single-parity RAID. Provided the pool isn't full, however, if you can shrink it down to, say, 40 drives first, you can then upgrade in batches of 6 at once. The zpool replace is then an operation between two fully working disks, and doesn't affect pool integrity at all.
-- This message posted from opensolaris.org
> > It is unfortunately a very difficult problem, and will take some time to > solve even with the application of all possible resources (including the > majority of my time). ?We are updating CR 4852783 at least once a month with > progress reports.Matt, should these progress reports be visible via [1] ? Right now it doesn''t seem to be available. Moreover, it says the last update was 6-May-2009. May I suggest using this forum (zfs-discuss) to periodically report the progress ? Chances are that most of the people waiting for this feature reading this list. [1] http://bugs.opensolaris.org/view_bug.do?bug_id=4852783 -- Regards, Cyril
Brian Kolaci wrote:> > They understand the technology very well. Yes, ZFS is very flexible > with many features, and most are not needed in an enterprise > environment where they have high-end SAN storage that is shared > between Sun, IBM, linux, VMWare ESX and Windows. Local disk is only > for the OS image. There is no need to have an M9000 be a file > server. They have NAS for that. They use SAN across the enterprise > and it gives them the ability to fail-over to servers in other data > centers very quickly. > > Different business groups cannot share the same pool for many > reasons. Each business group pays for their own storage. There are > legal issues as well, and in fact cannot have different divisions on > the same frame let alone shared storage. But they''re in a major > virtualization push to the point that nobody will be allowed to be on > their own physical box. So the big push is to move to VMware, and > we''re trying to salvage as much as we can to move them to containers > and LDoms. That being the case, I''ve recommended that each virtual > machine on either a container or LDom should be allocated their own > zpool, and the zonepath or LDom disk image be on their own zpool. > This way when (not if) they need to migrate to another system, they > have one pool to move over. They use fixed sized LUNs, so the > granularity is a 33GB LUN, which can be migrated. This is also the > case for their clusters as well as SRDF to their COB machines. >If they accept virtualisation, why can''t they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I''d have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. -- Ian.
> If they accept virtualisation, why can''t they use individual filesystems (or > zvol) rather than pools? ?What advantage do individual pools have over > filesystems? ?I''d have thought the main disadvantage of pools is storage > flexibility requires pool shrink, something ZFS provides at the filesystem > (or zvol) level.You can move zpools between computers, you can''t move individual file systems. Remember that there is a SAN involved. The disk array does not run Solaris.
Hi Matt Thanks for this update, and the confirmation to the outside world that this problem is being actively worked on with significant resources. But I would like to support Cyril''s comment. AFAIK, any updates you are making to bug 4852783 are not available to the outside world via the normal bug URL. It would be useful if we were able to see them. I think it is frustrating for the outside world that it cannot see Sun''s internal source code repositories for work in progress, and only see the code when it is complete and pushed out. And so there is no way to judge what progress is being made, or to actively help with code reviews or testing. Best Regards Nigel Smith -- This message posted from opensolaris.org
Mattias Pantzare wrote:>> If they accept virtualisation, why can''t they use individual filesystems (or >> zvol) rather than pools? What advantage do individual pools have over >> filesystems? I''d have thought the main disadvantage of pools is storage >> flexibility requires pool shrink, something ZFS provides at the filesystem >> (or zvol) level. >> > > You can move zpools between computers, you can''t move individual file systems. > >send/receive? -- Ian.
On Thu, Aug 6, 2009 at 12:45, Ian Collins <ian at ianshome.com> wrote:
> Mattias Pantzare wrote:
>>> If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level.
>>
>> You can move zpools between computers, you can't move individual file systems.
>
> send/receive?

:-)

What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import? And you still need to shrink the pool.

Move a 100GB application from server A to server B using send/receive and you will have 100GB stuck on server A that you can't use on server B, where you really need it.
Nigel Smith wrote:
> Hi Matt
> Thanks for this update, and the confirmation to the outside world that this problem is being actively worked on with significant resources.
>
> But I would like to support Cyril's comment.
>
> AFAIK, any updates you are making to bug 4852783 are not available to the outside world via the normal bug URL. It would be useful if we were able to see them.
>
> I think it is frustrating for the outside world that it cannot see Sun's internal source code repositories for work in progress, and only see the code when it is complete and pushed out.

That is no different to the vast majority of Open Source projects either. Open Source and Open Development usually don't give you access to individuals' work in progress. Compare this to Linux kernel development: you usually don't get to see the partially implemented drivers or changes until they are requesting integration into the kernel.

-- Darren J Moffat
But with export / import, are you really saying that you''re going to physically move 100GB of disks from one system to another? -- This message posted from opensolaris.org
Ross wrote:> But with export / import, are you really saying that you''re going to physically move 100GB of disks from one system to another?zpool export/import would not move anything on disk. It just changes which host the pool is attached to. This is exactly how cluster failover works in the SS7000 systems. -- Darren J Moffat
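For anyone unfamiliar with the mechanics Darren is describing, the handoff on shared (SAN) storage is roughly the following; the pool name is a placeholder:

   # on the host giving up the pool
   zpool export apppool

   # on the host taking it over (once the LUNs are visible there)
   zpool import apppool
   # or, if the previous host died and never exported cleanly:
   zpool import -f apppool

No data is copied or moved; only the record of which host owns the pool changes, which is why keeping one pool per application (as suggested earlier in the thread) makes this kind of migration cheap.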
On Aug 6, 2009, at 5:36 AM, Ian Collins <ian at ianshome.com> wrote:> Brian Kolaci wrote: >> >> They understand the technology very well. Yes, ZFS is very >> flexible with many features, and most are not needed in an >> enterprise environment where they have high-end SAN storage that is >> shared between Sun, IBM, linux, VMWare ESX and Windows. Local disk >> is only for the OS image. There is no need to have an M9000 be a >> file server. They have NAS for that. They use SAN across the >> enterprise and it gives them the ability to fail-over to servers in >> other data centers very quickly. >> >> Different business groups cannot share the same pool for many >> reasons. Each business group pays for their own storage. There >> are legal issues as well, and in fact cannot have different >> divisions on the same frame let alone shared storage. But they''re >> in a major virtualization push to the point that nobody will be >> allowed to be on their own physical box. So the big push is to >> move to VMware, and we''re trying to salvage as much as we can to >> move them to containers and LDoms. That being the case, I''ve >> recommended that each virtual machine on either a container or LDom >> should be allocated their own zpool, and the zonepath or LDom disk >> image be on their own zpool. This way when (not if) they need to >> migrate to another system, they have one pool to move over. They >> use fixed sized LUNs, so the granularity is a 33GB LUN, which can >> be migrated. This is also the case for their clusters as well as >> SRDF to their COB machines. >> > If they accept virtualisation, why can''t they use individual > filesystems (or zvol) rather than pools? What advantage do > individual pools have over filesystems? I''d have thought the main > disadvantage of pools is storage flexibility requires pool shrink, > something ZFS provides at the filesystem (or zvol) level. > > -- > Ian. >For failover scenarios you need a pool per application so they can move the application between servers which may be in different datacenters and each app on one server can fail over to a different server. So the storage needs to be partitioned as such. The failover entails moving or rerouting San.
> What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import?

We have a similar situation. Our home directory storage is based on many X4540s. Currently, we use rsync to migrate volumes between systems, but our process could very easily be switched over to zfs send/receive (and very well may be in the near future).

What this looks like, if using zfs send/receive, is that we perform an initial send (to get the bulk of the data over) and then, at a planned downtime, do an incremental send to "catch up" the destination. This "catch up" phase is usually a very small fraction of the overall size of the volume. The only downtime required runs from just before the final snapshot you send (the last incremental) until the send finishes and whatever service(s) are brought up on the destination system. If the filesystem has a lot of write activity, you can run multiple incrementals to decrease the size of that last snapshot. As far as backing out goes, you can simply destroy the destination filesystem and continue running on the original system if all hell breaks loose (of course that never happens, right? :)

When everything checks out (which you can safely assume when the recv finishes, thanks to how ZFS send/recv works), you then just have to destroy the original filesystem. It is correct that this doesn't shrink the pool, but it's at least a workaround to be able to swing filesystems around to different systems. If you had only one filesystem in the pool, you could then safely destroy the original pool. This does mean you'd need 2x the size of the LUN during the transfer, though.

For replication of ZFS filesystems, we use a similar process, just with a lot of incremental sends.

Greg Mason
System Administrator
High Performance Computing Center
Michigan State University
On Thu, 6 Aug 2009, Cyril Plisko wrote:> > May I suggest using this forum (zfs-discuss) to periodically report > the progress ? Chances are that most of the people waiting for this > feature reading this list.Sun has placed themselves in the interesting predicament that being open about progress on certain high-profile "enterprise" features (such as shrink and de-duplication) could cause them to lose sales to a competitor. Perhaps this is a reason why Sun is not nearly as open as we would like them to be. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
But why do you have to attach to a pool? Surely you're just attaching to the root filesystem anyway? And as Richard says, since filesystems can be shrunk easily and it's just as easy to detach a filesystem from one machine and attach to it from another, why the emphasis on pools?

For once I'm beginning to side with Richard: I just don't understand why data has to be in separate pools to do this. The only argument I can think of is performance, since pools use completely separate sets of disks. I don't know if ZFS offers a way to throttle filesystems, but surely that could be managed at the network-interconnect level?

I have to say that I have no experience of enterprise-class systems; these questions are purely me playing devil's advocate as I learn :)

-- This message posted from opensolaris.org
On Aug 6, 2009, at 7:59 AM, Ross wrote:> But why do you have to attach to a pool? Surely you''re just > attaching to the root filesystem anyway? And as Richard says, since > filesystems can be shrunk easily and it''s just as easy to detach a > filesystem from one machine and attach to it from another, why the > emphasis on pools? > > For once I''m beginning to side with Richard, I just don''t understand > why data has to be in separate pools to do this.welcome to the dark side... bwahahahaa :-) The way I''ve always done such migrations in the past is to get everything ready in parallel, then restart the service pointing to the new data. The cost is a tiny bit and a restart, which isn''t a big deal for most modern system architectures. If you have a high availability cluster, just add it to the list of things to do when you do a weekly/monthly/quarterly failover. Now, if I was to work in a shrink, I would do the same because shrinking moves data and moving data is risky. Perhaps someone could explain how they do a rollback from a shrink? Snapshots? I think the problem at the example company is that they make storage so expensive that the (internal) customers spend way too much time and money trying to figure out how to optimally use it. The storage market is working against this model by reducing the capital cost of storage. ZFS is tackling many of the costs related to managing storage. Clearly, there is still work to be done, but the tide is going out and will leave expensive storage solutions high and dry. Consider how different the process would be as the total cost of storage approaches zero. Would shrink need to exist? The answer is probably no. But the way shrink is being solved in ZFS has another application. Operators can still make mistakes with "add" vs "attach" so the ability to remove a top-level vdev is needed. Once this is solved, shrink is also solved. -- richard
Greg Mason wrote:
>> What is the downtime for doing a send/receive? What is the downtime
>> for zpool export, reconfigure LUN, zpool import?
>>
> We have a similar situation. Our home directory storage is based on
> many X4540s. Currently, we use rsync to migrate volumes between
> systems, but our process could very easily be switched over to zfs
> send/receive (and very well may be in the near future).
>
> What this looks like, if using zfs send/receive, is we perform an
> initial send (to get the bulk of the data over), and then at a planned
> downtime, do an incremental send to "catch up" the destination. This
> "catch up" phase is usually a very small fraction of the overall size
> of the volume. The only downtime required is from just before the
> final snapshot you send (the last incremental) until the send
> finishes and you turn up whatever service(s) run on the destination
> system. If the filesystem has a lot of write activity, you can run
> multiple incrementals to decrease the size of that last snapshot. As
> far as backing out goes, you can simply destroy the destination
> filesystem, and continue running on the original system, if all hell
> breaks loose (of course that never happens, right? :)

That is how I migrate services (zones) and their data between hosts with one of my clients. The big advantage of zfs send/receive over rsync is that the final replication is very fast. Run a send/receive just before the migration, then top up after the service shuts down.

The last one we moved was a mail server with 1TB of small files, and the downtime was under 2 minutes. The biggest delay was sending the "start" and "done" text messages!

> When everything checks out (which you can safely assume when the recv
> finishes, thanks to how ZFS send/recv works), you then just have to
> destroy the original filesystem. It is correct that this doesn't
> shrink the pool, but it's at least a workaround to be able to swing
> filesystems around to different systems. If you had only one
> filesystem in the pool, you could then safely destroy the original
> pool. This does mean you'd need 2x the size of the LUN during the
> transfer though.
>
> For replication of ZFS filesystems, we use a similar process, with just a
> lot of incremental sends.

Same here.

--
Ian.
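The "lots of incremental sends" replication pattern mentioned above looks roughly like the following sketch. The dataset and host names (tank/mail, standby) and the snapshot naming are placeholders chosen for illustration; the commands are the standard zfs snapshot, send -i, and receive:

  # First pass: full send to seed the standby copy
  zfs snapshot tank/mail@repl-0
  zfs send tank/mail@repl-0 | ssh standby zfs receive tank/mail

  # Every later pass: incremental against the previous common snapshot
  zfs snapshot tank/mail@repl-1
  zfs send -i tank/mail@repl-0 tank/mail@repl-1 | ssh standby zfs receive -F tank/mail

  # Older snapshots can be pruned on both sides once a newer common
  # snapshot exists, e.g. zfs destroy tank/mail@repl-0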
Bob Friesenhahn wrote:
> Sun has placed themselves in the interesting predicament that being
> open about progress on certain high-profile "enterprise" features
> (such as shrink and de-duplication) could cause them to lose sales to
> a competitor. Perhaps this is a reason why Sun is not nearly as open
> as we would like them to be.

I agree that it is difficult for Sun, at this time, to be more 'open', especially for ZFS, as we still await the resolution of Oracle purchasing Sun, the court case with NetApp over patents, and now the GreenBytes issue! But I would say they are more likely to avoid losing sales by confirming which enhancements they are prioritising. I think people will wait if they know work is being done and progress is being made, although not indefinitely. I guess it depends on the rate of progress of ZFS compared to, say, btrfs.

I would say that maybe Sun should have held back on announcing the work on deduplication, as it just seems to have ramped up frustration now that no more news is forthcoming. It's easy to be wise after the event, and time will tell.

Thanks
Nigel Smith
--
This message posted from opensolaris.org
Hi Darren,

Darren J Moffat wrote:
> That is no different to the vast majority of Open Source projects
> either. Open Source and Open Development usually don't give you access
> to individuals' work in progress.

Yes, that's true. But there are more 'open' models for running an open source project. For instance, Sun's project for the COMSTAR iSCSI target:
http://www.opensolaris.org/os/project/iser/
..where there was an open mailing list, where you could see the developers making progress:
http://mail.opensolaris.org/pipermail/iser-dev/

Best Regards
Nigel Smith
--
This message posted from opensolaris.org
On Thu, 6 Aug 2009, Nigel Smith wrote:
> I guess it depends on the rate of progress of ZFS compared to say btrfs.

Btrfs is still an infant, whereas zfs is now into adolescence.

> I would say that maybe Sun should have held back on
> announcing the work on deduplication, as it just seems to

I still have not seen any formal announcement from Sun regarding deduplication. Everything has been based on remarks from code developers. It is not as concrete and definite as Apple's announcement of zfs inclusion in Snow Leopard Server.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Thu, Aug 6, 2009 at 16:59, Ross <no-reply at opensolaris.org> wrote:
> But why do you have to attach to a pool? Surely you're just attaching to the root
> filesystem anyway? And as Richard says, since filesystems can be shrunk easily
> and it's just as easy to detach a filesystem from one machine and attach to it from
> another, why the emphasis on pools?

What filesystems are you talking about? A zfs pool can be "attached" to one and only one computer at any given time. All file systems in that pool are "attached" to the same computer.

> For once I'm beginning to side with Richard, I just don't understand why data has to
> be in separate pools to do this.

All accounting for data and free blocks is done at the pool level. That is why you can share space between file systems. You could write code that made ZFS a cluster file system, maybe just for the pool, but that is a lot of work and would require all attached computers to talk to each other.
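To illustrate the one-host-at-a-time point: moving a whole pool between machines is done with export and import. The pool name below is a placeholder, and it is assumed the same LUNs or disks are visible to both hosts:

  # On the host that currently has the pool imported
  zpool export tank

  # On the new host, once the devices are visible to it
  zpool import tank

  # If the old host went down without a clean export, the import must be forced
  zpool import -f tank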
On 6 aug 2009, at 23.52, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> I still have not seen any formal announcement from Sun regarding
> deduplication. Everything has been based on remarks from code
> developers.

To be fair, the official "what's new" document for 2009.06 states that dedup will be part of the next OSOL release in 2010. Or at least that we should "look out" for it ;)

"We're already looking forward to the next release due in 2010. Look out for great new features like an interactive installation for SPARC, the ability to install packages directly from the repository during the install, offline IPS support, a new version of the GNOME desktop, ZFS deduplication and user quotas, cloud integration and plenty more! As always, you can follow active development by adding the dev/ repository."

Henrik
http://sparcv9.blogspot.com
On Fri, 7 Aug 2009, Henrik Johansson wrote:
> "We're already looking forward to the next release due in 2010. Look out for
> great new features like an interactive installation for SPARC, the ability to
> install packages directly from the repository during the install, offline IPS
> support, a new version of the GNOME desktop, ZFS deduplication and user
> quotas, cloud integration and plenty more! As always, you can follow active
> development by adding the dev/ repository."

Clearly I was wrong, and the ZFS deduplication announcement *is* as concrete as Apple's announcement of zfs support in Snow Leopard Server. Sorry about that.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Hey Richard,

I believe 6844090 would be a candidate for an s10 backport. The behavior of 6844090 worked nicely when I replaced a disk of the same physical size even though the disks were not identical.

Another flexible storage feature is George's autoexpand property (Nevada build 117), where you can attach or replace a disk in a pool with a LUN that is larger than the existing size of the pool, but keep the LUN size constrained with autoexpand set to off. Then, if you decide that you want to use the expanded LUN, you can set autoexpand to on, or you can just detach it to use in another pool where you need the expanded size. (The autoexpand feature description is in the ZFS Admin Guide on the opensolaris/...zfs/docs site.)

Contrasting the autoexpand behavior to current Solaris 10 releases, I noticed recently that you can use zpool attach/detach to attach a larger disk for eventual replacement purposes and the pool size is expanded automatically, even on a live root pool, without the autoexpand feature and with no import/export/reboot needed. (Well, I always reboot to see if the new disk will boot before detaching the existing disk.) I did this recently to expand a 16-GB root pool to a 68-GB root pool. See the example below.

Cindy

# zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
rpool  16.8G  5.61G  11.1G  33%  ONLINE  -

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          c1t18d0s0  ONLINE       0     0     0

errors: No known data errors

# zpool attach rpool c1t18d0s0 c1t1d0s0
# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h3m, 51.35% done, 0h3m to go
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t18d0s0  ONLINE       0     0     0
            c1t1d0s0   ONLINE       0     0     0

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0

<boot from new disk to make sure replacement disk boots>

# init 0
# zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
rpool  16.8G  5.62G  11.1G  33%  ONLINE  -

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t18d0s0  ONLINE       0     0     0
            c1t1d0s0   ONLINE       0     0     0

errors: No known data errors

# zpool detach rpool c1t18d0s0
# zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
rpool  68.2G  5.62G  62.6G   8%  ONLINE  -

# cat /etc/release
                       Solaris 10 5/09 s10s_u7wos_08 SPARC
           Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 30 March 2009

On 08/05/09 17:20, Richard Elling wrote:
> On Aug 5, 2009, at 4:06 PM, Cindy.Swearingen at Sun.COM wrote:
>
>> Brian,
>>
>> CR 4852783 was updated again this week so you might add yourself or
>> your customer to continue to be updated.
>>
>> In the meantime, a reminder is that a mirrored ZFS configuration
>> is flexible in that devices can be detached (as long as the redundancy
>> is not compromised) or replaced as long as the replacement disk is an
>> equivalent size or larger. So, you can move storage around if you
>> need to in a mirrored ZFS config until 4852783 integrates.
>
> Thanks Cindy,
> This is another way to skin the cat. It works for simple volumes, too.
> But there are some restrictions, which could impact the operation when a
> large change in vdev size is needed. Is this planned to be backported
> to Solaris 10?
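Since the autoexpand property is only described in prose above, here is a minimal sketch of how it would typically be used on a build that has it. The pool and device names are placeholders, not anything from the thread:

  # Keep the pool at its current size while swapping in a larger LUN
  zpool set autoexpand=off tank
  zpool replace tank c2t0d0 c2t4d0   # c2t4d0 is the larger LUN

  # Later, when you decide to use the extra capacity
  zpool set autoexpand=on tank
  zpool list tank                    # SIZE now reflects the larger LUN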
>
> CR 6844090 has more details.
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
> -- richard
>