With reference to Lori's blog posting [1] I'd like to throw out a few of
my thoughts on splitting up the namespace.

This is quite timely because only yesterday, when I was updating the ZFS
crypto document, I was thinking about this. I knew I needed ephemeral
key support for ZVOLs so we could swap on an encrypted ZVOL. However, I
chose not to make that option specific to ZVOLs but made it available to
all datasets. The rationale for this was having directories like
/var/tmp as separate encrypted datasets with an ephemeral key.

So yes, Lori, I completely agree /var should be a separate dataset; what's
more, I think we can identify certain points of the /var namespace that
should almost always be a separate dataset.

Other than /var/tmp, my short list for being separate ZFS datasets are:

/var/crash - because it can be big and we might want quotas.
/var/core  [ which we don't yet have by default but I'm considering
             submitting an ARC case for this. ] - as above.
/var/tm    Similar to the /var/log rationale.

There are obvious other places that would really benefit, but I think
having them as separate datasets really depends on what the machine is
doing. For example /var/apache if you really are a webserver, but then
why not go one better and split out cgi-bin and htdocs into separate
datasets too - that way you can set noexec on htdocs.

I think we have lots of options, but it might be nice to come up with a
short list of special/important directories that we should always
recommend be separate datasets - let's not hardcode that into the
installer though (heck, we still think /usr/openwin is special !)

One of the things I'm really interested in seeing is more appropriate
sharing with Zones, because we have more flexibility in the installer as
it becomes zone aware. What I'd love to see is that we completely
abandon the package based boundaries for Zones and instead use one based
only on the actual filesystem namespace, and use Zones to get the best
out of that.

A nitpick on the terminology: while I agree that some QoS things can be
set at the level of a dataset, there are others which are really only
available to the pool, though now with ditto blocks for data as well as
metadata that starts to blur a bit too.

[1] http://blogs.sun.com/lalt/entry/zfs_boot_issue_of_the

--
Darren J Moffat
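As a rough sketch of the kind of per-directory policy described above
(the pool name "rpool", the dataset names, and the quota size are all
assumptions chosen purely for illustration), the commands would look
something like:

  # a crash-dump area that can grow, capped with a quota
  zfs create -o mountpoint=/var/crash -o quota=2g rpool/var-crash

  # a web document root on which nothing should be executable
  zfs create -o mountpoint=/var/apache/htdocs -o exec=off rpool/htdocs

quota and exec are standard ZFS dataset properties; the layout above is
only an example of how the policy could be expressed.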
On Tue, 24 Apr 2007, Darren J Moffat wrote:

> There are obvious other places that would really benefit, but I think
> having them as separate datasets really depends on what the machine is
> doing. For example /var/apache if you really are a webserver, but then
> why not go one better and split out cgi-bin and htdocs into separate
> datasets too - that way you can set noexec on htdocs.

How specific do we want to get? I can see the benefit of splitting out
the various apache directories, but those decisions might be better made
by the appliance team. Creating a webserver would have different dataset
requirements from creating a NAS box, for example.

I believe we should stick to the most basic config for the default Solaris
installer. Certainly it should allow the admin to create whatever
datasets might be desired, but we should keep it simple for the default
case.

I've heard arguments for /tmp and /var/tmp. Your point about /var/crash
is a good one. /opt and /usr have also been given good reasons. That's
six already, including root.

Regards,
markm
On Tue, Apr 24, 2007 at 09:48:33AM -0400, Mark J Musante wrote:

> I believe we should stick to the most basic config for the default Solaris
> installer. Certainly it should allow the admin to create whatever
> datasets might be desired, but we should keep it simple for the default
> case.
>
> I've heard arguments for /tmp and /var/tmp. Your point about /var/crash
> is a good one. /opt and /usr have also been given good reasons. That's
> six already, including root.

I think, for the sake of argument, we should limit the required split-off
datasets to the ones that are only related to zfsboot, to make zfsboot
easier and more flexible (I look forward to raidz(2) boot ability), and
not over-think this. I certainly don't think there shouldn't be a set of
recommended datasets (/var/crash being a good example), but above and
beyond that it should be completely up to the administrator how far they
want to go.

Just my $.02. ;)

-brian
--
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly."
  -- Jonathan Patschke
Hello Darren,

Tuesday, April 24, 2007, 3:33:47 PM, you wrote:

DJM> With reference to Lori's blog posting[1] I'd like to throw out a few of
DJM> my thoughts on splitting up the namespace.
DJM>
DJM> This is quite timely because only yesterday when I was updating the ZFS
DJM> crypto document I was thinking about this. I knew I needed ephemeral
DJM> key support for ZVOLs so we could swap on an encrypted ZVOL. However I
DJM> chose not to make that option specific to ZVOLs but made it available to
DJM> all datasets. The rationale for this was having directories like
DJM> /var/tmp as separate encrypted datasets with an ephemeral key.
DJM>
DJM> So yes Lori I completely agree /var should be a separate dataset, what's
DJM> more I think we can identify certain points of the /var namespace that
DJM> should almost always be a separate dataset.
DJM>
DJM> Other than /var/tmp my short list for being separate ZFS datasets are:
DJM>
DJM> /var/crash - because it can be big and we might want quotas.

I agree - I've been doing this for some time (/ on UFS, rest of a disk
on zfs for zones and crash + core file systems with quota set).

DJM> /var/core [ which we don't yet have by default but I'm considering
DJM> submitting an ARC case for this. ] - as above.

Definitely - we're doing this in a jumpstart, but frankly it should
have been the default for years (even without zfs).

DJM> /var/tm Similar to the /var/log rationale.
DJM>
DJM> There are obvious other places that would really benefit but I think
DJM> having them as separate datasets really depends on what the machine is
DJM> doing. For example /var/apache if you really are a webserver, but then
DJM> why not go one better and split out cgi-bin and htdocs into separate
DJM> datasets too - that way you can set noexec on htdocs.
DJM>
DJM> I think we have lots of options but it might be nice to come up with a
DJM> short list of special/important directories that we should always
DJM> recommend be separate datasets - let's not hardcode that into the
DJM> installer though (heck we still think /usr/openwin is special !)

Definitely. We could scare people with a dozen or more file systems
mounted after a fresh install on their laptop.

However, some time ago there was a discussion here on 'zfs split|merge'
functionality. Is it going to happen? If it does, then maybe only a
minimum number of datasets should be created by default (/ /var /opt)
and later the admin can just 'zfs split root/var/log'?

While having lots of datasets is really nice, please do not overuse it,
at least not in the default configs, where it would probably introduce
more confusion for most users than do any good.

I would also consider disabling or changing the default config for autofs
so local users would go to /home, as most people expect by default, and
then also create /home as a separate file system.

So my short list is:

/
/var
/opt
/home

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Robert,

Tuesday, April 24, 2007, 4:59:31 PM, you wrote:

RM> Hello Darren,
RM> Tuesday, April 24, 2007, 3:33:47 PM, you wrote:

DJM>> With reference to Lori's blog posting[1] I'd like to throw out a few of
DJM>> my thoughts on splitting up the namespace.
DJM>> This is quite timely because only yesterday when I was updating the ZFS
DJM>> crypto document I was thinking about this. I knew I needed ephemeral
DJM>> key support for ZVOLs so we could swap on an encrypted ZVOL. However I
DJM>> chose not to make that option specific to ZVOLs but made it available to
DJM>> all datasets. The rationale for this was having directories like
DJM>> /var/tmp as separate encrypted datasets with an ephemeral key.
DJM>> So yes Lori I completely agree /var should be a separate dataset, what's
DJM>> more I think we can identify certain points of the /var namespace that
DJM>> should almost always be a separate dataset.
DJM>> Other than /var/tmp my short list for being separate ZFS datasets are:
DJM>> /var/crash - because it can be big and we might want quotas.

RM> I agree - I've been doing this for some time (/ on UFS, rest of a disk
RM> on zfs for zones and crash + core file systems with quota set).

DJM>> /var/core [ which we don't yet have by default but I'm considering
DJM>> submitting an ARC case for this. ] - as above.

RM> Definitely - we're doing this in a jumpstart, but frankly it should
RM> have been the default for years (even without zfs).

DJM>> /var/tm Similar to the /var/log rationale.
DJM>> There are obvious other places that would really benefit but I think
DJM>> having them as separate datasets really depends on what the machine is
DJM>> doing. For example /var/apache if you really are a webserver, but then
DJM>> why not go one better and split out cgi-bin and htdocs into separate
DJM>> datasets too - that way you can set noexec on htdocs.
DJM>> I think we have lots of options but it might be nice to come up with a
DJM>> short list of special/important directories that we should always
DJM>> recommend be separate datasets - let's not hardcode that into the
DJM>> installer though (heck we still think /usr/openwin is special !)

RM> Definitely. We could scare people with a dozen or more file systems
RM> mounted after a fresh install on their laptop.

RM> However, some time ago there was a discussion here on 'zfs split|merge'
RM> functionality. Is it going to happen? If it does, then maybe only a
RM> minimum number of datasets should be created by default (/ /var /opt)
RM> and later the admin can just 'zfs split root/var/log'?

RM> While having lots of datasets is really nice, please do not overuse it,
RM> at least not in the default configs, where it would probably introduce
RM> more confusion for most users than do any good.

RM> I would also consider disabling or changing the default config for autofs
RM> so local users would go to /home, as most people expect by default, and
RM> then also create /home as a separate file system.

RM> So my short list is:

RM> /
RM> /var
RM> /opt
RM> /home

/var/crash
/var/core

I think configuring Solaris by default to write crashdumps and cores to
the above locations should be considered; however, I would rather not
create separate datasets for them by default.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
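For reference, the knobs for pointing crash dumps and cores at those
locations already exist in Solaris; a minimal sketch, with the paths
taken from the list above and the core pattern chosen only as an
example:

  # savecore writes crash dumps under the named directory
  dumpadm -s /var/crash

  # global core dumps go to /var/core instead of each process's cwd
  coreadm -g /var/core/core.%f.%p -e global

Whether the directories behind these settings are separate datasets or
not is then an independent decision, which is Robert's point.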
We're also updating the EIS bootdisk standard, and are considering
similar recommendations. File systems are still not free. They have
costs in complexity and maintenance, especially backup/restore. One of
the benefits of a single namespace is that it is relatively simple to
back up and restore quickly. However, I don't want to get sidetracked by
the state of backup/restore today. One benefit of multiple file systems
is that you can apply different policies, so if we stick to discussing
policies (ok, including backup/restore policies) then we should be able
to arrive at a consensus relatively easily :-)

Darren J Moffat wrote:

> With reference to Lori's blog posting[1] I'd like to throw out a few of
> my thoughts on splitting up the namespace.
>
> This is quite timely because only yesterday when I was updating the ZFS
> crypto document I was thinking about this. I knew I needed ephemeral
> key support for ZVOLs so we could swap on an encrypted ZVOL. However I
> chose not to make that option specific to ZVOLs but made it available to
> all datasets. The rationale for this was having directories like
> /var/tmp as separate encrypted datasets with an ephemeral key.

cool

> So yes Lori I completely agree /var should be a separate dataset, what's
> more I think we can identify certain points of the /var namespace that
> should almost always be a separate dataset.
>
> Other than /var/tmp my short list for being separate ZFS datasets are:
>
> /var/crash - because it can be big and we might want quotas.

savecore already has a (sort of) quota implementation. I think the
policy driving this is backup/restore, not quota. I'd rather not spend a
bunch of time or tape backing up old cores.

> /var/core [ which we don't yet have by default but I'm considering
> submitting an ARC case for this. ] - as above.

ditto

> /var/tm Similar to the /var/log rationale.

[assuming /var/tmp]
It is not clear to me how people use /var/tmp. In other words, I'm
pretty sure that most people don't know /var/tmp exists, and those that
do know use it differently than I do. Perhaps the policy driving this
should be quota. methinks we need a table...

As Robert points out, life becomes so much easier if split/merge
existed :-)
 -- richard
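Presumably the "(sort of) quota" referred to here is savecore's minfree
mechanism, which can be set through dumpadm; a sketch, with the
percentage chosen only as an example:

  # keep at least 10% of the savecore file system free;
  # savecore will not write a dump that would go below that
  dumpadm -m 10%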
I left a comment on Lori's blog to the effect that splitting the
namespace would complicate LU tools.

Perhaps we need a zfs clone -r to match zfs snapshot -r?

Nico
--
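To make the gap concrete: zfs snapshot -r exists today, but the clones
still have to be made one at a time. A sketch, with dataset names
invented purely for illustration:

  # one consistent, recursive snapshot of the boot environment
  zfs snapshot -r rpool/ROOT@newbe

  # ...but each dataset must then be cloned individually
  zfs clone rpool/ROOT@newbe      rpool/ROOT2
  zfs clone rpool/ROOT/var@newbe  rpool/ROOT2/var
  zfs clone rpool/ROOT/opt@newbe  rpool/ROOT2/opt

A hypothetical "zfs clone -r" would collapse the last three commands
into one, which is what would make split-up namespaces friendlier to LU.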
Richard Elling wrote:

>> /var/tm Similar to the /var/log rationale.
>
> [assuming /var/tmp]

I intended to type /var/fm, not /var/tm or /var/tmp. The FMA state data
is, I believe, something that you would want to share between all boot
environments on a given bit of hardware, right?

--
Darren J Moffat
On 04/24/07 17:30, Darren J Moffat wrote:

> Richard Elling wrote:
>
>>> /var/tm Similar to the /var/log rationale.
>>
>> [assuming /var/tmp]
>
> I intended to type /var/fm, not /var/tm or /var/tmp. The FMA state data
> is, I believe, something that you would want to share between all boot
> environments on a given bit of hardware, right?

Yes, under normal production circumstances that is what you'd want. I
guess under some test circumstances you may want different state for
different BEs.

I'd also like to have compression turned on by default for /var/fm. It
will cost nothing in terms of cpu time, since additions to that tree are
at a very low rate and only small chunks of data at a time; but the
small chunks can add up on a system suffering solid errors if the
ereports are not throttled in some way, and they're eminently
compressible. There are a couple of CRs logged for this somewhere.

Gavin
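If /var/fm were its own dataset, enabling this would be a one-line
property change (dataset name below is an assumption), and compressratio
would show what it buys:

  zfs set compression=on rpool/var/fm
  zfs get compressratio  rpool/var/fm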
On 4/24/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:

> With reference to Lori's blog posting[1] I'd like to throw out a few of
> my thoughts on splitting up the namespace.

Just a plea with my sysadmin hat on - please don't go overboard and make
new filesystems just because we can. Each extra filesystem generates
more work for the administrator, if only for the effort to parse df
output (which is more than cluttered enough already).

In other words, let people have a system with just one filesystem.

> I think we have lots of options but it might be nice to come up with a
> short list of special/important directories that we should always
> recommend be separate datasets -

If there is such a list, explain *why*, so that admins can make
informed choices.

Or maybe even restructure the filesystem layout so that directories
with common properties could live under a common parent that could
be a separate filesystem, rather than creating separate filesystems
for each?

> let's not hardcode that into the
> installer though (heck we still think /usr/openwin is special !)

Ugh, yes!

> One of the things I'm really interested in seeing is more appropriate
> sharing with Zones because we have more flexibility in the installer as
> it becomes zone aware. What I'd love to see is that we completely
> abandon the package based boundaries for Zones and instead use one based
> only on the actual filesystem namespace and use Zones to get the best
> out of that.

Agreed, zones based on packaging cause too much pain all round.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Peter Tribble wrote:

> In other words, let people have a system with just one filesystem.

I'm fine with that.

>> I think we have lots of options but it might be nice to come up with a
>> short list of special/important directories that we should always
>> recommend be separate datasets -
>
> If there is such a list, explain *why*, so that admins can make
> informed choices.
>
> Or maybe even restructure the filesystem layout so that directories
> with common properties could live under a common parent that could
> be a separate filesystem, rather than creating separate filesystems
> for each?

Hmn, we have that already: /usr - mostly readonly executables and
support; /var - pretty much everything here needs to be written to and
can grow; /etc - can change, but that should be rare.

--
Darren J Moffat
On 4/26/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:

> > Or maybe even restructure the filesystem layout so that directories
> > with common properties could live under a common parent that could
> > be a separate filesystem, rather than creating separate filesystems
> > for each?
>
> Hmn, we have that already: /usr - mostly readonly executables and
> support; /var - pretty much everything here needs to be written to and
> can grow; /etc - can change, but that should be rare.

I should have said, but I was actually thinking of the /var/crash and
/var/core case - where the requirement (for quotas, maybe) and the
essential function is the same. A little bit of restructuring and we
could have 1 dataset instead of 2.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Peter Tribble wrote:

> On 4/24/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:
>> With reference to Lori's blog posting[1] I'd like to throw out a few of
>> my thoughts on splitting up the namespace.
>
> Just a plea with my sysadmin hat on - please don't go overboard
> and make new filesystems just because we can. Each extra
> filesystem generates more work for the administrator, if only
> for the effort to parse df output (which is more than cluttered enough
> already).

My first reaction to that is: yes, of course, extra file systems are extra
work. Don't require them, and don't even make them the default unless
they buy you a lot. But then I thought, no, let's challenge that a bit.

Why do administrators run 'df' commands? It's to find out how much space
is used or available in a single file system. That made sense when file
systems each had their own dedicated slice, but now it doesn't make that
much sense anymore. Unless you've assigned a quota to a zfs file system,
"space available" is meaningful more at the pool level. And if you DID
assign a quota to the file system, then you really did want that part of
the name space to be a separate, and separately manageable, file system.

With zfs, file systems are in many ways more like directories than what
we used to call file systems. They draw from pooled storage. They
have low overhead and are easy to create and destroy. File systems
are sort of like super-functional directories, with quality-of-service
control and cloning and snapshots. Many of the things that sysadmins
used to have to do with file systems just aren't necessary or even
meaningful anymore. And so maybe the additional work of managing
more file systems is actually a lot smaller than you might initially think.

In other words, think about ALL of the implications of using zfs,
not just some.

We've come up with a lot of good reasons for having multiple
file systems. So we know that there are benefits. We also know
that there are costs. But if we can figure out a way to keep the
costs low, the benefits might outweigh them.

> In other words, let people have a system with just one filesystem.

I think I can agree with this, but I'm not absolutely certain. On the
one hand, sure, more freedom is better. But I'm concerned that
our long-term install and upgrade strategies might be constrained
by having to support configurations that haven't been set up with
the granularity needed for some kinds of valuable storage management
features.

This conversation is great! I'm getting lots of good information
and I *really* want to figure out what's best, even if it challenges
some of my cherished notions.

Lori
> Peter Tribble wrote:
> > On 4/24/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:
> >> With reference to Lori's blog posting[1] I'd like to throw out a few of
> >> my thoughts on splitting up the namespace.
> >
> > Just a plea with my sysadmin hat on - please don't go overboard
> > and make new filesystems just because we can. Each extra
> > filesystem generates more work for the administrator, if only
> > for the effort to parse df output (which is more than cluttered enough
> > already).
>
> My first reaction to that is: yes, of course, extra file systems are extra
> work. Don't require them, and don't even make them the default unless
> they buy you a lot. But then I thought, no, let's challenge that a bit.
>
> Why do administrators run 'df' commands? It's to find out how much space
> is used or available in a single file system. That made sense when file
> systems each had their own dedicated slice, but now it doesn't make that
> much sense anymore. Unless you've assigned a quota to a zfs file system,
> "space available" is meaningful more at the pool level. And if you DID
> assign a quota to the file system, then you really did want that part of
> the name space to be a separate, and separately manageable, file system.

I'd like to put my sysadmin hat on and add to this:

Yes, if you start adding quotas, etc. you'll have to start looking at
doing df's again, but this is actually easier with zfs (zfs list). Now I
can see, very easily, where my space is being allocated and start diving
in from there, instead of the multiple "du -ks * | sort -n" recursive
rampages I do on one big filesystem.

Also, if I start using zfs and some of the other features (read only,
for example), I can start locking down some of these filesystems (/usr
perhaps???) so I no longer need to worry about the space being allocated
in /usr. Or setting reservations and quotas on file systems, basically
eliminating them from my constant monitoring and free-space shuffle of
"where did my space go".

> With zfs, file systems are in many ways more like directories than what
> we used to call file systems. They draw from pooled storage. They
> have low overhead and are easy to create and destroy. File systems
> are sort of like super-functional directories, with quality-of-service
> control and cloning and snapshots. Many of the things that sysadmins
> used to have to do with file systems just aren't necessary or even
> meaningful anymore. And so maybe the additional work of managing
> more file systems is actually a lot smaller than you might initially think.

I believe so. Just having zfs boot on my system for a couple of days and
breaking out the major food groups, I can easily see where my space is
at - again, zfs list is much faster than du -ks and I don't have to be
root for it to be 100% accurate - my postgres data files aren't owned by
me ;)

Another thing (I've mentioned this to Lori off alias) is the possible
ability to compress some file systems - again, possibly /usr and /opt???

Breaking out the namespace provides the flexibility of separate file
systems, and snapping/cloning/administering those as needed, with the
benefits of a single root file system - one disk and not having to get
the partition space right. But there is the matter of balance - too much
would be overkill. Perhaps the split and merge RFEs would bridge that
gap to provide yet more flexibility?

> In other words, think about ALL of the implications of using zfs,
> not just some.
> We've come up with a lot of good reasons for having multiple
> file systems. So we know that there are benefits. We also know
> that there are costs. But if we can figure out a way to keep the
> costs low, the benefits might outweigh them.
>
> > In other words, let people have a system with just one filesystem.
>
> I think I can agree with this, but I'm not absolutely certain. On the
> one hand, sure, more freedom is better. But I'm concerned that
> our long-term install and upgrade strategies might be constrained
> by having to support configurations that haven't been set up with
> the granularity needed for some kinds of valuable storage management
> features.
>
> This conversation is great! I'm getting lots of good information
> and I *really* want to figure out what's best, even if it challenges
> some of my cherished notions.
>
> Lori

This message posted from opensolaris.org
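A sketch of the per-filesystem policies mentioned in the reply above
(locked-down /usr, compressed /usr and /opt, reservation and quota on
/var); the dataset names and sizes are invented purely for illustration:

  # lock down /usr and squeeze /usr and /opt
  zfs set readonly=on    rpool/usr
  zfs set compression=on rpool/usr
  zfs set compression=on rpool/opt

  # guarantee space for /var but cap how large it can grow
  zfs set reservation=1g rpool/var
  zfs set quota=4g       rpool/var

readonly, compression, reservation, and quota are standard dataset
properties; whether settings like these belong in a default install is
exactly what is being debated in this thread.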
On 4/24/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:

> Other than /var/tmp my short list for being separate ZFS datasets are:
>
> /var/crash - because it can be big and we might want quotas.
> /var/core [ which we don't yet have by default but I'm considering
> submitting an ARC case for this. ] - as above.
> /var/tm Similar to the /var/log rationale.

How does this[1] play with live upgrade or a like technology?
Presumably a boot environment is created with "zfs snapshot -r". There
is very significant value in having the notion of "boot environment
data" versus app or user data on a server. By having this distinction,
it greatly takes away the significance of the question that I keep
getting from Sun folks: "how long are you OK between lucreate and
luactivate?"

One word of caution with "/var/core" is that it makes per-process core
dumps for a process with a cwd of /var impossible, assuming the
per-process core pattern is still "core". My approach to this is to
have a global core dump pattern of /var/cores/core-%... and have core
dump logging enabled. If /var/cores doesn't exist, I get a syslog
message saying that the core dump failed, which is usually all the
information I need (no, the sysadmin didn't kill your process, your
vendor's programmers did). In the relatively rare cases where I need to
capture a core, I can create /var/cores and then rename it once I have
good data.

[1] And the additional proliferation of file systems proposed in this
thread. Many file systems can be good, but too many become a headache.

> I think we have lots of options but it might be nice to come up with a
> short list of special/important directories that we should always
> recommend be separate datasets - let's not hardcode that into the
> installer though (heck we still think /usr/openwin is special !)

Most certainly. While I find separating file systems based upon
software management boundaries[2] useful, others feel there is more
benefit in different strategies[3]. The installer needs reasonable
defaults but should only enforce the creation of the set that is really
the minimum.

[2] /, /usr, /var, and /opt all belong together because everything there
is managed by pkgadd/pkgrm/patchadd/patchrm. Some random app that
installs in /opt/random-app through a custom installer (and as such is
likely administered by non-root, and is portable across boot
environments) gets its own file system. /var/tmp has no software in it
and can be abused to hurt the rest of the system - that's a good
candidate for another FS.

[3] Or are simply afraid to deviate from the advice they received in
1988 to have / and /usr as separate file systems.

> One of the things I'm really interested in seeing is more appropriate
> sharing with Zones because we have more flexibility in the installer as
> it becomes zone aware. What I'd love to see is that we completely
> abandon the package based boundaries for Zones and instead use one based
> only on the actual filesystem namespace and use Zones to get the best
> out of that.

I don't follow this. It seems to me that zones are very much based upon
file system name space and not on package boundaries. For example, a
package that has components in /etc and /usr (for better or for worse)
installs, uninstalls, and propagates into full and sparse zones
properly. I was actually quite impressed that this worked out so well.

What I would find really useful is something that allows me to create a
zone by cloning the global zone's file systems, then customizing as
required.
When I patch the server (global zone, propagate patches to local zones),
the global zone and the non-global zones should still be referencing the
same disk blocks for almost everything in /usr, /lib, etc. (not /etc :) ).
The best hope of this right now is some sort of de-duplication, which
seems not to be high on the list of coming features. This would give the
benefits of sparse zones (more efficient use of memory, etc.) without
the drawback of not being able to even create mount points for other
file systems.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
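For what it's worth, the core-pattern-plus-logging setup described two
paragraphs back maps onto coreadm roughly like this (the exact pattern
tokens are an assumption, since the original elides them):

  # send all global core dumps to a central location and log each attempt
  coreadm -g /var/cores/core.%f.%p -e global -e log

  # with /var/cores absent, the dump itself fails but the syslog entry
  # still records that the process dumped core, as described above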
> With zfs, file systems are in many ways more like directories than what
> we used to call file systems. They draw from pooled storage. They
> have low overhead and are easy to create and destroy. File systems
> are sort of like super-functional directories, with quality-of-service
> control and cloning and snapshots.

When you put it that way, I really look forward to an explorer.exe-style
file browser tree with pools at the top, maroon file systems underneath,
and yellow directories underneath those. I can see a time 5 years down
the road where ZFS file systems are actually called "superfolders"! :)

*mentally right-clicks on /pool/mydocuments and chooses "revert to
yesterday's snapshot"*

This message posted from opensolaris.org
On 4/26/07, Lori Alt <Lori.Alt at sun.com> wrote:

> Peter Tribble wrote:
> > On 4/24/07, Darren J Moffat <Darren.Moffat at sun.com> wrote:
> >> With reference to Lori's blog posting[1] I'd like to throw out a few of
> >> my thoughts on splitting up the namespace.
> >
> > Just a plea with my sysadmin hat on - please don't go overboard
> > and make new filesystems just because we can. Each extra
> > filesystem generates more work for the administrator, if only
> > for the effort to parse df output (which is more than cluttered enough
> > already).
>
> My first reaction to that is: yes, of course, extra file systems are extra
> work. Don't require them, and don't even make them the default unless
> they buy you a lot. But then I thought, no, let's challenge that a bit.
>
> Why do administrators run 'df' commands? It's to find out how much space
> is used or available in a single file system. That made sense when file
> systems each had their own dedicated slice, but now it doesn't make that
> much sense anymore. Unless you've assigned a quota to a zfs file system,
> "space available" is meaningful more at the pool level.

True, but it's actually quite hard to get at the moment. It's easy if
you have a single pool - it doesn't matter which line you look at.
But once you have 2 or more pools (and that's the way it would
work, I expect - a boot pool and 1 or more data pools) there's
an awful lot of output you may have to read. This isn't helped
by zpool and zfs giving different answers, with the one from zfs
being the one I want. The point is that every filesystem adds
additional output the administrator has to mentally filter. (For
one thing, you have to map a directory name to a containing
pool.)

> With zfs, file systems are in many ways more like directories than what
> we used to call file systems. They draw from pooled storage. They
> have low overhead and are easy to create and destroy. File systems
> are sort of like super-functional directories, with quality-of-service
> control and cloning and snapshots. Many of the things that sysadmins
> used to have to do with file systems just aren't necessary or even
> meaningful anymore. And so maybe the additional work of managing
> more file systems is actually a lot smaller than you might initially think.

Oh, I agree. The trouble is that sysadmins still have to work using
their traditional tools, including their brains, which are tooled up
for cases with a much lower filesystem count. What I don't see as
part of this are new tools (or enhancements to existing tools) that
make this easier to handle.

For example, backup tools are currently filesystem based.

Eventually, the tools will catch up. But my experience so far
is that while zfs is fantastic from the point of view of pooling,
once I've got large numbers of filesystems and snapshots
and clones thereof, and the odd zvol, it can be a devil of
a job to work out what's going on.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Sat, 2007-04-28 at 17:48 +0100, Peter Tribble wrote:

> On 4/26/07, Lori Alt <Lori.Alt at sun.com> wrote:
> > Peter Tribble wrote:
> <snip>
> > Why do administrators run 'df' commands? It's to find out how much space
> > is used or available in a single file system. That made sense when file
> > systems each had their own dedicated slice, but now it doesn't make that
> > much sense anymore. Unless you've assigned a quota to a zfs file system,
> > "space available" is meaningful more at the pool level.
>
> True, but it's actually quite hard to get at the moment. It's easy if
> you have a single pool - it doesn't matter which line you look at.
> But once you have 2 or more pools (and that's the way it would
> work, I expect - a boot pool and 1 or more data pools) there's
> an awful lot of output you may have to read. This isn't helped
> by zpool and zfs giving different answers, with the one from zfs
> being the one I want. The point is that every filesystem adds
> additional output the administrator has to mentally filter. (For
> one thing, you have to map a directory name to a containing
> pool.)

It's actually quite easy, and easier than the other alternatives (ufs,
veritas, etc):

# zfs list -rH -o name,used,available,refer rootdg

And now it's set up to be parsed by a script (-H) since the output is
tabbed. The -r says to recursively display children of the parent, and
the -o with the specified fields says to display only the fields
specified.

(output from one of my systems)

blast(9):> zfs list -rH -o name,used,available,refer rootdg
rootdg                   4.39G   44.1G   32K
rootdg/nvx_wos_62        4.38G   44.1G   503M
rootdg/nvx_wos_62/opt    793M    44.1G   793M
rootdg/nvx_wos_62/usr    3.01G   44.1G   3.01G
rootdg/nvx_wos_62/var    113M    44.1G   113M
rootdg/swapvol           16K     44.1G   16K

Even though the mount point is set up as a legacy mount point, I know
where each of them is mounted due to the vol name.

And yes, this system has more than one pool:

blast(10):> zpool list
NAME     SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
lpool    17.8G   11.4G   6.32G   64%   ONLINE   -
rootdg   49.2G   4.39G   44.9G   8%    ONLINE   -

> > With zfs, file systems are in many ways more like directories than what
> > we used to call file systems. They draw from pooled storage. They
> > have low overhead and are easy to create and destroy. File systems
> > are sort of like super-functional directories, with quality-of-service
> > control and cloning and snapshots. Many of the things that sysadmins
> > used to have to do with file systems just aren't necessary or even
> > meaningful anymore. And so maybe the additional work of managing
> > more file systems is actually a lot smaller than you might initially think.
>
> Oh, I agree. The trouble is that sysadmins still have to work using
> their traditional tools, including their brains, which are tooled up
> for cases with a much lower filesystem count. What I don't see as
> part of this are new tools (or enhancements to existing tools) that
> make this easier to handle.

Not sure I agree with this. Many times you end up dealing with multiple
vxvols and file systems. Anything over 12 filesystems and you're in
overload (at least for me ;) and I used my monitoring and scripting
tools to filter that for me.

Many of the systems I admin'd were set up quite differently based on
use, functionality, and disk size.
Most of my tools were set up to take most of these into consideration,
and the fact that we ran almost every flavor of UNIX possible, using the
features of each OS as appropriate.

Most of the tools will still work with zfs (if using df, etc), but it
actually makes things easier once you have a monitoring issue - running
out of space, for example.

Most tools have high and low water marks, so when a file system gets too
full you get a warning. ZFS makes this much easier to admin, as you can
see which file system is being the hog and go directly to that file
system and hunt, instead of first finding the file system - hence the
debate about the all-in-one / slice versus breaking up into the major OS
file systems.

The benefit of an all-in-one / is that you didn't have to guess at how
much space you needed for each slice, so you could upgrade or add
optional software without needing to grow/shrink the OS. The drawback:
if you filled up the file system, you had to hunt for where it was
filling up - /dev, /usr, /var/tmp, /var, / ???

The benefit of multiple slices was that one fs didn't affect the others
if you filled it up, and you could find which was the problem fs very
easily; but if you estimated incorrectly, you had wasted disk space in
one slice and not enough in another.

ZFS gives you the benefit of both all-in-one and partitioned, as it
draws from a single pool of storage but also allows you to find which fs
is being the problem and lock it down with quotas and reservations.

> For example, backup tools are currently filesystem based.

And this changes the scenario how? I've actually been pondering this
for quite some time now. Why do we back up the root disk? With many of
the tools out now, it makes far more sense to do a flar/incremental
flars of the systems and/or create custom jumpstart profiles to rebuild
the system.

The typical scenario for losing the root file systems (catastrophic) is
to restore the OS, install the backup software on the fresh install,
then restore the OS via the backup software to the mirror disk. Why not
just restore the OS from a base flar and apply the incremental?
Application data is what you really care about, plus any specific config
changes to the OS itself; the rest is a fairly generic OS install w/
patches.

The other scenario is ufsdump/restore. In that case, it doesn't really
change the scenario any, as the scripts iterate across the file systems
you want to dump anyway (at least mine do).

> Eventually, the tools will catch up. But my experience so far
> is that while zfs is fantastic from the point of view of pooling,
> once I've got large numbers of filesystems and snapshots
> and clones thereof, and the odd zvol, it can be a devil of
> a job to work out what's going on.

No more difficult than doing ufs/vxfs snapshots and Quick I/O, etc. The
only thing that really changes is the specific command for each, and if
you're doing that, then you've already got the infrastructure for it set
up.

But that's just my viewpoint...

--
Mike Dotson
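As a sketch of the kind of water-mark check described above (the pool
name and the 90% threshold are examples, and this assumes a zfs get that
supports -p for raw numeric output):

  #!/bin/sh
  # Warn when any dataset that has a quota set is more than 90% full.
  zfs list -rH -o name rpool | while read fs; do
      used=`zfs get -H -p -o value used "$fs"`
      quota=`zfs get -H -p -o value quota "$fs"`
      echo "$fs $used $quota"
  done | awk '$3 > 0 && $2 / $3 > 0.9 {
      printf "WARNING: %s is at %.0f%% of its quota\n", $1, 100 * $2 / $3
  }'

Datasets with no quota show a quota of 0 in parseable output and are
simply skipped by the awk filter.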
On 4/28/07, Mike Dotson <Mike.Dotson at sun.com> wrote:

> And this changes the scenario how? I've actually been pondering this
> for quite some time now. Why do we back up the root disk? With many of
> the tools out now, it makes far more sense to do a flar/incremental
> flars of the systems and/or create custom jumpstart profiles to rebuild
> the system.

I would love to see flash archive content that is the result of "zfs
send". Incrementals are easy to do so long as you keep the initial
(pristine) snapshot around that matches up exactly with the flar that
was initially applied.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
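A sketch of the incremental scheme described above, for a single dataset
(the dataset, snapshot, and file names are invented for illustration):

  # pristine snapshot taken when the flash archive is first applied
  zfs snapshot rpool/ROOT/sol@pristine

  # later: new snapshot, then an incremental stream against the pristine one
  zfs snapshot rpool/ROOT/sol@backup1
  zfs send -i @pristine rpool/ROOT/sol@backup1 > /backup/sol.backup1.zfs

  # replay the increment onto a system rebuilt from the same base flar
  zfs receive -F rpool/ROOT/sol < /backup/sol.backup1.zfs

The -i stream only applies if the receiving side still has the matching
@pristine snapshot, which is exactly the constraint noted above.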
Mike Dotson wrote:

> On Sat, 2007-04-28 at 17:48 +0100, Peter Tribble wrote:
>
>> On 4/26/07, Lori Alt <Lori.Alt at sun.com> wrote:
>>> Peter Tribble wrote:
>> <snip>
>>> Why do administrators run 'df' commands? It's to find out how much space
>>> is used or available in a single file system. That made sense when file
>>> systems each had their own dedicated slice, but now it doesn't make that
>>> much sense anymore. Unless you've assigned a quota to a zfs file system,
>>> "space available" is meaningful more at the pool level.
>>
>> True, but it's actually quite hard to get at the moment. It's easy if
>> you have a single pool - it doesn't matter which line you look at.
>> But once you have 2 or more pools (and that's the way it would
>> work, I expect - a boot pool and 1 or more data pools) there's
>> an awful lot of output you may have to read. This isn't helped
>> by zpool and zfs giving different answers, with the one from zfs
>> being the one I want. The point is that every filesystem adds
>> additional output the administrator has to mentally filter. (For
>> one thing, you have to map a directory name to a containing
>> pool.)
>
> It's actually quite easy, and easier than the other alternatives (ufs,
> veritas, etc):
>
> # zfs list -rH -o name,used,available,refer rootdg
>
> And now it's set up to be parsed by a script (-H) since the output is
> tabbed. The -r says to recursively display children of the parent, and
> the -o with the specified fields says to display only the fields
> specified.
>
> (output from one of my systems)
>
> blast(9):> zfs list -rH -o name,used,available,refer rootdg
> rootdg                   4.39G   44.1G   32K
> rootdg/nvx_wos_62        4.38G   44.1G   503M
> rootdg/nvx_wos_62/opt    793M    44.1G   793M
> rootdg/nvx_wos_62/usr    3.01G   44.1G   3.01G
> rootdg/nvx_wos_62/var    113M    44.1G   113M
> rootdg/swapvol           16K     44.1G   16K
>
> Even though the mount point is set up as a legacy mount point, I know
> where each of them is mounted due to the vol name.
>
> And yes, this system has more than one pool:
>
> blast(10):> zpool list
> NAME     SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
> lpool    17.8G   11.4G   6.32G   64%   ONLINE   -
> rootdg   49.2G   4.39G   44.9G   8%    ONLINE   -
>
>>> With zfs, file systems are in many ways more like directories than what
>>> we used to call file systems. They draw from pooled storage. They
>>> have low overhead and are easy to create and destroy. File systems
>>> are sort of like super-functional directories, with quality-of-service
>>> control and cloning and snapshots. Many of the things that sysadmins
>>> used to have to do with file systems just aren't necessary or even
>>> meaningful anymore. And so maybe the additional work of managing
>>> more file systems is actually a lot smaller than you might initially think.
>>
>> Oh, I agree. The trouble is that sysadmins still have to work using
>> their traditional tools, including their brains, which are tooled up
>> for cases with a much lower filesystem count. What I don't see as
>> part of this are new tools (or enhancements to existing tools) that
>> make this easier to handle.
>
> Not sure I agree with this. Many times you end up dealing with multiple
> vxvols and file systems. Anything over 12 filesystems and you're in
> overload (at least for me ;) and I used my monitoring and scripting
> tools to filter that for me.
>
> Many of the systems I admin'd were set up quite differently based on
> use, functionality, and disk size.
> Most of my tools were set up to take most of these into consideration,
> and the fact that we ran almost every flavor of UNIX possible, using the
> features of each OS as appropriate.
>
> Most of the tools will still work with zfs (if using df, etc), but it
> actually makes things easier once you have a monitoring issue - running
> out of space, for example.
>
> Most tools have high and low water marks, so when a file system gets too
> full you get a warning. ZFS makes this much easier to admin, as you can
> see which file system is being the hog and go directly to that file
> system and hunt, instead of first finding the file system - hence the
> debate about the all-in-one / slice versus breaking up into the major OS
> file systems.
>
> The benefit of an all-in-one / is that you didn't have to guess at how
> much space you needed for each slice, so you could upgrade or add
> optional software without needing to grow/shrink the OS. The drawback:
> if you filled up the file system, you had to hunt for where it was
> filling up - /dev, /usr, /var/tmp, /var, / ???
>
> The benefit of multiple slices was that one fs didn't affect the others
> if you filled it up, and you could find which was the problem fs very
> easily; but if you estimated incorrectly, you had wasted disk space in
> one slice and not enough in another.
>
> ZFS gives you the benefit of both all-in-one and partitioned, as it
> draws from a single pool of storage but also allows you to find which fs
> is being the problem and lock it down with quotas and reservations.
>
>> For example, backup tools are currently filesystem based.
>
> And this changes the scenario how? I've actually been pondering this
> for quite some time now. Why do we back up the root disk? With many of
> the tools out now, it makes far more sense to do a flar/incremental
> flars of the systems and/or create custom jumpstart profiles to rebuild
> the system.

Usually because "that's the way we've always done it" or "our operations
are such that changing is cost prohibitive" or ....