A sysadmin friend of mine at Oxford just posted the attached in his LiveJournal; I am forwarding it here for comments, since it swings between problem definition and rant, and today's Friday. I have my opinions regarding some of the points raised; I am seeking breadth of insight. At the very least his post raises some communications issues, and a few technical ones of implementation and migration strategy.

URL of original: http://mrod.livejournal.com/211908.html ; Steve has agreed it's OK for me to post this here, since he's not subscribed...

- alec

> ZFS: How its design seems to be more trouble than it's worth.
>
> Now, let me say this first; ZFS seems like a wonderful thing. In fact, it is wonderful except for a couple of things, which make it totally undeployable for our new server. Actually, let's put this another way. One thing makes it impossible, because the ZFS way of doing things is mutually exclusive with the way our system (and probably a huge number of other legacy systems) works.
>
> The main bugbear is what the ZFS development team laughably call quotas. They aren't quotas, they are merely filesystem size restraints. To get around this the developers use the "let them eat cake" mantra, "creating filesystems is easy": so create a new filesystem for each user, with a "quota" on it. This is the ZFS way.
>
> Unfortunately, this causes a number of problems (beyond the fact that there's no soft quota). Firstly, instead of having only a few filesystems mounted you have "system mounts + number of users" mounted filesystems, which makes df a pain to use. Secondly, there's no way of having a shared directory structure with individual users having separate file quotas within it. But finally, and this is the critical problem, each user's home directory is now a separate NFS share.
>
> At first look that final point doesn't seem to be much of a worry until you look at the implications it brings. To cope with a distributed system with a large number of users, the only manageable way of handling NFS mounts is via an automounter. The only alternative would be to have an fstab/vfstab file holding every filesystem any user might want. In the past this has been no problem at all: for all the user home directories on a server you could just export the parent directory holding them, add a line "users -rw,intr myserver:/disks/users", and it would work happily.
>
> Now, with each user having a separate filesystem, this breaks. The automounter will mount the parent filesystem as before, but all you will see are the stub directories ready for the ZFS daughter filesystems to mount onto; there's no way of consolidating the ZFS filesystem tree into one NFS share, and no rules in automount map files allow sub-directory mounting.
>
> Of course, the ZFS developers would argue that you should change the layout of your automounted filesystems to fit the new scheme. This would mean that users' home directories would appear directly below /home, say.
>
> The problem here is one of legacy code, which you'll find throughout the academic, and probably commercial, world. Basically, there's a lot of user-generated code with hard-coded paths, so any new system has to replicate what has gone before. (The current system here has automount map entries which map new disks to the names of old disks on machines long gone, e.g. /home/eeyore_data/ )
>
> The ZFS developers don't seem to see real-world problems, or maybe they don't WANT to see them, as it would make their lives more complicated. It's far easier to be arrogant and use the "let them eat cake" approach rather than engineer a real solution to the problem, such as actually programming a true quota system.
>
> As it is, it seems that for our new fileserver I'm going to have to back off from ZFS and use the old software device concatenation with UFS on top, which is a right pain and not very resilient.
Robert Thurlow
2007-Sep-07 19:39 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Alec Muffett wrote:

>> But finally, and this is the critical problem, each user's home directory is now a separate NFS share.
>>
>> At first look that final point doesn't seem to be much of a worry until you look at the implications it brings. To cope with a distributed system with a large number of users, the only manageable way of handling NFS mounts is via an automounter. The only alternative would be to have an fstab/vfstab file holding every filesystem any user might want. In the past this has been no problem at all: for all the user home directories on a server you could just export the parent directory holding them, add a line "users -rw,intr myserver:/disks/users", and it would work happily.
>>
>> Now, with each user having a separate filesystem, this breaks. The automounter will mount the parent filesystem as before, but all you will see are the stub directories ready for the ZFS daughter filesystems to mount onto; there's no way of consolidating the ZFS filesystem tree into one NFS share, and no rules in automount map files allow sub-directory mounting.

Sun's NFS team is close to putting back a fix to the Nevada NFS client for this, where a single mount of the root of a ZFS tree lets you wander into the daughter filesystems on demand, without automounter configuration. You have to be using NFSv4, since it relies on the server namespace protocol feature. Some other NFSv4 clients already do this. This has always been part of the plan to cope with more right-sized filesystems; we're just not there yet.

For NFSv2/v3, there are no easy answers. Some have experimented with executable automounter maps that build a list of filesystems on the fly, but ick. At some point, some of the global namespace ideas we kick around may benefit NFSv2/v3 as well.

Rob T
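[To make the mirror-mount behaviour described above concrete, here is a minimal sketch of what the workflow could look like once that client fix ships. The server name, pool and user names are invented for illustration:]

    server# zfs create pool0/home
    server# zfs set sharenfs=rw pool0/home
    server# zfs create pool0/home/alice       # daughter fs, inherits sharenfs

    client# mount -F nfs -o vers=4 server:/pool0/home /home
    client# ls /home/alice
    # With mirror mounts, crossing into /home/alice mounts the daughter
    # filesystem on demand; with NFSv3 you would only see an empty stub.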
On 9/7/07, Alec Muffett <Alec.Muffett at sun.com> wrote:

> > The main bugbear is what the ZFS development team laughably call quotas. They aren't quotas, they are merely filesystem size restraints. To get around this the developers use the "let them eat cake" mantra, "creating filesystems is easy": so create a new filesystem for each user, with a "quota" on it. This is the ZFS way.

Having worked in academia and multiple Fortune 100s, the problem seems to be most prevalent in academia, although possibly a minor inconvenience in some engineering departments in industry. In the .edu where I used to manage the UNIX environment, I would have a tough time weighing the complexities of quotas he mentions against the other niceties. My guess is that unless I had something that was really broken, I would stay with UFS or VxFS waiting for a fix.

It appears as though the author has not yet tried out snapshots. The fact that space used by a snapshot taken for the sysadmin's convenience counts against the user's quota is the real killer. This would force me into a disk-to-disk (rsync, because "zfs send | zfs recv" would require snapshots to stay around for incrementals) backup + snapshot scenario to be able to keep snapshots while minimizing their impact on users. That means double the disk space. Doubling the quota is not an option because without soft quotas there is no way to keep people from using all of their space. Frankly, that would be so much trouble I would be better off using tape for restores, just like with UFS or VxFS.

> > Now, with each user having a separate filesystem, this breaks. The automounter will mount the parent filesystem as before, but all you will see are the stub directories ready for the ZFS daughter filesystems to mount onto; there's no way of consolidating the ZFS filesystem tree into one NFS share, and no rules in automount map files allow sub-directory mounting.

While NFSv4 holds some promise here, it is not a solution today. It won't be until all OSes that came out before 2008 are gone. That will be a while. Use of macros (e.g. * server:/home/&) can go a long way. If that doesn't do it, an executable map that does the appropriate munging may be in order.

> > The problem here is one of legacy code, which you'll find throughout the academic, and probably commercial, world. Basically, there's a lot of user-generated code with hard-coded paths, so any new system has to replicate what has gone before. (The current system here has automount map entries which map new disks to the names of old disks on machines long gone, e.g. /home/eeyore_data/ )

Put such entries before the * entry and things should be OK.

For me, quotas are likely to be the pain point that prevents me from making good use of snapshots. Getting changes in application teams' understanding and behavior is just too much trouble. Others are:

1. There seems to be no integration with backup tools that are time+space+I/O efficient. If my storage is on a NetApp, I can use NDMP to do incrementals between snapshots. No such thing exists with ZFS.

2. Use of clones is out because I can't do a space-efficient restore.

3. ARC messes up my knowledge of how much RAM my machine is making good use of. After the first backup, vmstat says that I am just at the brink of not having enough RAM and that paging (file system and pager) will begin soon. This may be fine on a file server, but it really messes with me if it is a J2EE server and I'm trying to figure out how many more app servers I can add.

I have a lot of hopes for ZFS and have used it with success (and failures) in limited scope. I'm sure that with time the improvements will come that make that scope increase dramatically, but for now it is confined to the lab. :(

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
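[The executable-map idea mentioned above can be made concrete with a rough sketch. An executable autofs map is a script that automountd runs with the lookup key as its argument, expecting a map entry on stdout. Everything below -- server name, export paths, the crude parsing -- is invented for illustration and would need hardening in practice:]

    #!/bin/ksh
    # /etc/auto_home_exec -- sketch of an executable automounter map.
    # automountd passes the lookup key (here, a username) as $1 and
    # expects a map entry on stdout; no output means "no such key".
    key="$1"
    server="myserver"    # hypothetical home-directory server
    # Ask the server (via the MOUNT protocol) whether it shares a ZFS
    # home filesystem for this user; emit a normal map entry if so.
    if /usr/sbin/showmount -e "$server" 2>/dev/null | \
        grep "^/export/home/${key} " >/dev/null; then
        echo "-rw,intr ${server}:/export/home/${key}"
    fi
    exit 0

[It would be referenced from auto_master like any other indirect map, e.g. "/home /etc/auto_home_exec", with the script marked executable.]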
Nicolas Williams
2007-Sep-07 20:27 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
The complaint is not new, and the problem isn't quotas or lack thereof. The problem is that remote filesystem clients can't cope with frequent changes to a server's share list, which is just what ZFS's "filesystems are cheap" approach promotes.

Basically ZFS was ahead of everyone's implementation of NFSv4 client-side mount mirroring, which would very much help with the dynamic nature of ZFS usage. It does not help that no NFSv3 automounter is sufficiently dynamic to reasonably cope with filesystems coming and going.

Given the automounter pain, this customer would like to have one large filesystem and quotas. And that's how quotas are a secondary problem.
On 9/7/07, Mike Gerdts <mgerdts at gmail.com> wrote:

> For me, quotas are likely to be a pain point that prevents me from making good use of snapshots. Getting changes in application teams' understanding and behavior is just too much trouble. Others are:

not to mention there are smaller-scale users that want the data protection, checksumming and scalability that ZFS offers (although the whole zdev/zpool/etc. thing might wind up causing me to have to buy more disks to add more space, if i were to use it)

it would be nice to have a ZFS lite(tm) for those of us that just want easily expandable filesystems (as in, add a new disk/device and not have to think of some larger geometry) with inline checksumming/COW/metadata/ditto blocks/etc/etc goodness. basically like a home edition. i don't care about LUNs, send/receive, quotas, snapshots (for the most part), setting up different zpools to gain specific performance benefits, etc. i just want raid-z/raid-z2 with an easy way to add disks.

i have not actually used ZFS yet because i've been waiting for opensolaris/solaris (or even freebsd possibly) to support eSATA hardware or something related. the hardware support front for SOHO users has also been slow. that's not a shortcoming of ZFS though... but does make me wish i had the basic protection features of ZFS with hardware support like linux.

- my two cents
Mike Gerdts wrote:

> It appears as though the author has not yet tried out snapshots. The fact that space used by a snapshot for the sysadmin's convenience counts against the user's quota is the real killer.

Very soon there will be another way to specify quotas (and reservations) such that they only apply to the space used by the active dataset. This should make the effect of quotas more obvious to end users while allowing them to remain blissfully unaware of any snapshot activity by the sysadmin.

-Chris
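[If the feature Chris describes lands in the form of refquota/refreservation properties -- an assumption at this point, not something the post confirms -- usage would look something like this, with hypothetical dataset names and sizes:]

    # Cap the user's live data at 2 GB, leaving the sysadmin's snapshots
    # outside the user's quota (hypothetical, assumes a refquota property).
    zfs set refquota=2g pool0/home/alice
    zfs set quota=none pool0/home/alice    # or a larger overall cap, if desired
    zfs get refquota,quota pool0/home/alice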
Brian H. Nelson
2007-Sep-07 21:23 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:

> Having worked in academia and multiple Fortune 100s, the problem seems to be most prevalent in academia, although possibly a minor inconvenience in some engineering departments in industry. In the .edu where I used to manage the UNIX environment, I would have a tough time weighing the complexities of quotas he mentions against the other niceties. My guess is that unless I had something that was really broken, I would stay with UFS or VxFS waiting for a fix.

UFS on a zvol is a pretty good compromise. You get lots of the nice ZFS stuff (checksums, raidz/z2, snapshots, growable pool, etc.) with no changes in userland. There are a couple of gotchas, but as long as you're aware of them it works pretty well. We've been using it since January.

-Brian

--
---------------------------------------------------
Brian H. Nelson         Youngstown State University
System Administrator   Media and Academic Computing
              bnelson[at]cis.ysu.edu
---------------------------------------------------
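[For readers who have not tried the UFS-on-zvol approach, the basic recipe is short. This is a minimal sketch with made-up pool and size values; a fuller worked example, including the lockfs caveat for snapshots, appears later in this thread:]

    # Create a 100 GB zvol in an existing pool, put UFS on it, and mount it.
    zfs create -V 100g pool0/homes_vol
    newfs /dev/zvol/rdsk/pool0/homes_vol
    mkdir -p /export/home
    mount /dev/zvol/dsk/pool0/homes_vol /export/home
    # From here on, edquota/repquota and the rest of the UFS quota
    # machinery work exactly as they would on a plain disk slice.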
Stephen Usher
2007-Sep-07 22:25 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
I've just subscribed to this list after Alec's posting and reading the comments in the archive, and I have a couple of comments:

Mike Gerdts:

> While NFSv4 holds some promise here, it is not a solution today. It won't be until all OSes that came out before 2008 are gone. That will be a while.

Well, seeing as only a few days ago I put the last of our SPARCstation 1s into the recycle pile, have in daily use a DEC AlphaStation (circa 1996) running Digital UNIX 4.2C which the new server will need to support, and have just managed to migrate the last machine off Solaris 7 (I still have many, many machines on Solaris 8), I can see it being at least a decade until all the machines we have are at a level to handle NFSv4.

From your analysis it does look like UFS is the only way to go presently. However, this is likely to mean that I'm tied to UFS for the lifetime of the server, which is probably in the 7-10 year timescale.

Brian H. Nelson:

I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls.

Nicolas Williams:

Unfortunately for us at the coal face it's very rare that we can do the ideal thing. Quotas are part of the problem, but the main problem is that there is currently no way of overcoming the interoperability problems using the toolset offered by ZFS.

One way around this for NFSv2/3 clients would be if the ZFS NFS server could "consolidate" a tree of filesystems so that to the clients it looks like one filesystem. From outside the development group this seems like the 90% solution which would probably take less engineering effort than the full implementation of a user quota system. I'm not sure why the OS (outside the ZFS subsystem) would need to know that the directory tree it's seeing is composed of separate "filesystems" and is not just one big filesystem. (Unless, of course, there are tape archival programs which need to save and recreate ZFS sub-filesystems.) It would also have the added benefit of making df(1) usable again. ;-)

Believe me when I say that I'd love to use ZFS and would love to be able to recommend it to everyone as, other than this particular set of problems, it seems such a great system. My posting on Slashdot was the culmination of frustration and disappointment after a number of days trying every trick I could think of to get it working and failing.

Steve

--
---------------------------------------------------------------------------
Computer Systems Administrator,        E-Mail:- steve at earth.ox.ac.uk
Department of Earth Sciences,          Tel:-    +44 (0)1865 282110
University of Oxford, Parks Road,      Fax:-    +44 (0)1865 272072
Oxford, UK.
---------------------------------------------------------------------------
Nicolas Williams
2007-Sep-07 23:14 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Fri, Sep 07, 2007 at 11:25:38PM +0100, Stephen Usher wrote:

> Nicolas Williams:
>
> Unfortunately for us at the coal face it's very rare that we can do the ideal thing. Quotas are part of the problem, but the main problem is that there is currently no way of overcoming the interoperability problems using the toolset offered by ZFS.

Understood. I'll let the ZFS team answer this.

> One way around this for NFSv2/3 clients would be if the ZFS NFS server could "consolidate" a tree of filesystems so that to the clients it looks like one filesystem. From outside the development group this seems like the 90% solution which would probably take less engineering effort than the full implementation of a user quota system. I'm not sure why the OS (outside the ZFS subsystem) would need to know that the directory tree it's seeing is composed of separate "filesystems" and is not just one big filesystem. (Unless, of course, there are tape archival programs which need to save and recreate ZFS sub-filesystems.) It would also have the added benefit of making df(1) usable again. ;-)

Unfortunately there's no way to do this and preserve NFS and POSIX semantics (those preserved by NFS). Think of hard links, to name but one very difficult problem. Just the task of creating a uniform, persistent inode number space out of a multitude of distinct filesystems would be daunting indeed. That is, there are good technical reasons why what you propose is non-trivial.

The reason "why the OS ... would need to know that the directory tree it's seeing is composed of separate 'filesystems'" lies in POSIX semantics. And it's as true on the client side as on the server side. The problem you're running into is a limitation of the *client*, not of the server. The quota support you're asking for is to enable a server-side workaround for a client-side problem.

> Believe me when I say that I'd love to use ZFS and would love to be able to recommend it to everyone as, other than this particular set of problems, it seems such a great system. My posting on Slashdot was the culmination of frustration and disappointment after a number of days trying every trick I could think of to get it working and failing.

My view (remember, I'm not on the ZFS team) is that ZFS may simply not be applicable to your use case, and that you may find other use cases where it is applicable. If adding quota support is easy, if it's all you need to work around the automounter issue, and if my opinion mattered, then I'd say that we should have ZFS quotas.
On 9/7/07, Stephen Usher <Stephen.Usher at earth.ox.ac.uk> wrote:

> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls.

The UFS on zvols option sounds intriguing to me, but I would guess that the following could be problems:

1) Double buffering: Will ZFS store data in the ARC while UFS uses traditional file system buffers?

2) Boot order dependencies: How does the startup of ZFS compare to the processing of /etc/vfstab? I would guess that this is OK due to the legacy mount type supported by ZFS. If this is OK, then dfstab processing is probably OK too.

I say intriguing because it could give you improved data integrity checks and a bit more flexibility in how you do things like backups and restores. Snapshots of the zvols could be mounted as other UFS file systems to allow for self-service restores. Perhaps this would make it so that you can write data to tape a bit less frequently. If deduplication comes to ZFS, you may be able to get to a point where course project instructions that say "cp ~course/hugefile ~" become not so expensive - you would be charging quota to each user but only storing one copy.

Depending on the balance of CPU power vs. I/O bandwidth, compressed zvols could be a real win, more than paying back the space required to have a few snapshots around.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
Agreed on the quota issue. When you have 50K users, having a filesystem per user becomes unwieldy and effectively unusable.
Brian Hechinger
2007-Sep-08 01:01 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Fri, Sep 07, 2007 at 06:19:34PM -0500, Mike Gerdts wrote:

> backups and restores. Snapshots of the zvols could be mounted as other UFS file systems to allow for self-service restores. Perhaps this would make it so that you can write data to tape a bit less frequently.

This would be a huge win, I think. We do something similar with our mail system (NFS mounted to a NetApp). We quiesce all the DBs (BDB, essentially) and execute a snapshot. It takes mere moments. Then we back up from the snapshot. This allows us to perform a multi-hour backup without having to take the mail system offline at all.

To be able to apply this to other systems, especially ones that wouldn't even know any better (UFS, NTFS, etc.), would certainly be a nice way to go. In fact, I'll have to try this on the XP box on my desk that mounts iSCSI zvols the next time I'm in the office. ;)

-brian

--
"Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly."  -- Jonathan Patschke
Casper.Dik at Sun.COM
2007-Sep-08 12:54 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
> For NFSv2/v3, there are no easy answers. Some have experimented with executable automounter maps that build a list of filesystems on the fly, but ick. At some point, some of the global namespace ideas we kick around may benefit NFSv2/v3 as well.

The question for me is: why does this work for /net mounts (to a point, of course) and why can't we emulate this for other mount points?

Casper
Richard Elling
2007-Sep-08 18:30 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:

> I've just subscribed to this list after Alec's posting and reading the comments in the archive, and I have a couple of comments:

Welcome Steve,

I think you'll find that we rehash this about every quarter, with an extra kicker just before school starts in the fall.

Changing the topic slightly, the strategic question is: why are you providing disk space to students? When you solve this problem, the quota problem is moot.

NB. I managed a large University network for several years, and am fully aware of the costs involved. I do not believe that the 1960s timeshare model will survive in such environments.

-- richard
On 9/8/07, Richard Elling <Richard.Elling at sun.com> wrote:

> Changing the topic slightly, the strategic question is: why are you providing disk space to students?

For most programming and productivity work (e.g. word processing, etc.) people will likely be better served by having network access for their personal equipment with local storage.

For cases when specialized expensive tools ($10k+ per seat) are used, it is not practical to install them on hundreds or thousands of personal devices for a semester or two of work. The typical computing lab that provides such tools is not well equipped to deal with removable media such as flash drives. Further, such tools will often be used to do designs that require simulations to run as batch jobs under grid computing tools such as Grid Engine, Condor, LSF, etc.

Then, of course, there are files that need to be shared, have reliable backups, etc. Pushing that out to desktop or laptop machines is not really a good idea.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Stephen Usher
2007-Sep-08 19:33 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Richard Elling wrote:

> Stephen Usher wrote:
>> I've just subscribed to this list after Alec's posting and reading the comments in the archive, and I have a couple of comments:
>
> Welcome Steve,
> I think you'll find that we rehash this about every quarter, with an extra kicker just before school starts in the fall.
>
> Changing the topic slightly, the strategic question is: why are you providing disk space to students?

This is actually the research network, so this is for faculty, post-doctoral fellows and post-graduate students to do their research jobs. The only undergraduates involved are 4th-year ones doing research projects within the research teams.

The space being allocated is the basic resource supplied centrally by the Department, and for some it is the only resource they have, as they don't get any money for their own computing systems in their grants.

> When you solve this problem, the quota problem is moot.

Not really, not when you have few resources but have to give them out fairly.

Steve
> why are you providing disk space to students?
>
> When you solve this problem, the quota problem is moot.
>
> NB. I managed a large University network for several years, and am fully aware of the costs involved. I do not believe that the 1960s timeshare model will survive in such environments.

So are you saying you don't believe the network is the computer? :-)
Robert Thurlow
2007-Sep-08 21:25 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Casper.Dik at Sun.COM wrote:

>> For NFSv2/v3, there are no easy answers. Some have experimented with executable automounter maps that build a list of filesystems on the fly, but ick. At some point, some of the global namespace ideas we kick around may benefit NFSv2/v3 as well.
>
> The question for me is: why does this work for /net mounts (to a point, of course) and why can't we emulate this for other mount points?

Mounts under /net are derived from the filesystems actually shared from the servers; the automount daemon uses the MOUNT protocol to determine this. If you're looking at a path not already seen, the information will be fresh, but that's where the good news ends. We don't refresh this information reliably, so if you add a new share in a directory we've already scanned, you won't see it until the mounts time out and are removed. We should refresh this data more readily, no matter what the source of the data.

Rob T
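[For reference, the same MOUNT-protocol view that the -hosts (/net) map relies on is what showmount reports, so it shows exactly what /net will and won't pick up. The server and paths below are illustrative only:]

    $ /usr/sbin/showmount -e myserver
    export list for myserver:
    /export/home/alice (everyone)
    /export/home/bob   (everyone)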
Stephen Usher
2007-Sep-09 09:37 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:

> On 9/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
>> Changing the topic slightly, the strategic question is: why are you providing disk space to students?
>
> For most programming and productivity work (e.g. word processing, etc.) people will likely be better served by having network access for their personal equipment with local storage.

Local storage would be a nightmare for secure back-ups. Having said that, for those using Windows PCs and MacOS X we do let them have control of their machine and store things locally, but it's at their own risk. The central service merely provides a (smallish) home directory which we guarantee to back up. Quotas are needed in this case because users can't be trusted to play fair, especially if they don't realise how big the files that they are dragging and dropping are. These machines are also firewalled to hell and back.

For the rest of the researchers, who have Linux or Solaris machines, we do not allow them administrative access. All software and home directories are NFS mounted from the central server so that any machine a user logs into will give them the same set of tools, allowing them to do their work anywhere they need to. Their home directories need to be policed by the system because users can't be fully trusted to play fair, and secondly some software will try to cache lots of data in their home directories without the user knowing.

Now, in our current set-up all these users have a soft limit and a hard quota. Every night a cron job parses the output of repquota -a and informs those people who have gone over their soft or hard quota. The difference in size between the soft and hard quotas is enough that, in general, it doesn't affect the user's work and allows them to remediate the problem before it becomes critical (and important files suddenly get emptied or the user can't log in).

For large datasets the research groups have their own servers from which data etc. is available. As said previously, the central allocation of space is merely enough for day-to-day documents/theses/papers etc.

Oh, and our HPC grid is fully integrated into this set-up as well. The idea being a consistent experience throughout the research network.

Steve

--
---------------------------------------------------------------------------
Computer Systems Administrator,        E-Mail:- steve at earth.ox.ac.uk
Department of Earth Sciences,          Tel:-    +44 (0)1865 282110
University of Oxford, Parks Road,      Fax:-    +44 (0)1865 272072
Oxford, UK.
---------------------------------------------------------------------------
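[As a rough illustration of the kind of nightly job Steve describes, here is a minimal sketch. The mail wording, thresholds and flag parsing are invented; the actual Oxford script is not shown here, and a production version would parse repquota output far more carefully:]

    #!/bin/ksh
    # Nightly quota nag -- a sketch only, not the script described above.
    # repquota flags users over a block/file soft or hard limit with a '+'
    # in the flags column (e.g. "+-", "-+", "++"); mail each such user.
    /usr/sbin/repquota -a | while read user flags rest; do
        case "$flags" in
        *+*)
            printf "You are over quota on the central file server.\nUsage: %s\n" "$rest" | \
                mailx -s "Disk quota exceeded" "$user"
            ;;
        esac
    done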
Casper.Dik at Sun.COM
2007-Sep-09 10:14 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
> Mounts under /net are derived from the filesystems actually shared from the servers; the automount daemon uses the MOUNT protocol to determine this. If you're looking at a path not already seen, the information will be fresh, but that's where the good news ends. We don't refresh this information reliably, so if you add a new share in a directory we've already scanned, you won't see it until the mounts time out and are removed. We should refresh this data more readily, no matter what the source of the data.

I know that, yes, but why can't we put such an abstraction elsewhere in the name space? One thing I have always disliked about /net mounts is that they're too magical; it should be possible to replicate them in some form in other mount maps.

Casper
On Sep 7, 2007, at 18:25, Stephen Usher wrote:

> (I still have many, many machines on Solaris 8) I can see it being at least a decade until all the machines we have are at a level to handle NFSv4.

If you need to have a Solaris 8 environment, but want to minimize the number of machines you have to manage, the recently announced Project Etude may be of some interest to you:

http://blogs.sun.com/dp/entry/project_etude_revealed

It creates a Solaris 8 environment in a Solaris 10 container / zone.
>> Mounts under /net are derived from the filesystems actually shared from the servers; the automount daemon uses the MOUNT protocol to determine this. If you're looking at a path not already seen, the information will be fresh, but that's where the good news ends.
>
> I know that, yes, but why can't we put such an abstraction elsewhere in the name space? One thing I have always disliked about /net mounts is that they're too magical; it should be possible to replicate them in some form in other mount maps.

In short, you're proposing a solution to the zillions-of-nfs-exports issue which, instead of using a "wait for v4 to implement server-side export consolidation" thingy, would instead be a "better, smarter /net-alike on v2/v3, but give it a sensible name and better namespace semantics"?

I could go for that...

-a
Spencer Shepler
2007-Sep-10 03:06 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Sep 9, 2007, at 5:14 AM, Casper.Dik at Sun.COM wrote:

>> Mounts under /net are derived from the filesystems actually shared from the servers; the automount daemon uses the MOUNT protocol to determine this. If you're looking at a path not already seen, the information will be fresh, but that's where the good news ends. We don't refresh this information reliably, so if you add a new share in a directory we've already scanned, you won't see it until the mounts time out and are removed. We should refresh this data more readily, no matter what the source of the data.
>
> I know that, yes, but why can't we put such an abstraction elsewhere in the name space? One thing I have always disliked about /net mounts is that they're too magical; it should be possible to replicate them in some form in other mount maps.

There is nothing that would get in the way of this type of approach. A simple migration of the -hosts map (/net) functionality would be to take the prefix used for a regular mount (e.g. server:/a/b), and any share/export found at the server with the same prefix would be available at the client's mount point (e.g. /a/b/c -and- /a/b/d). This would allow the client to mount server:/export/home and all subordinate shares/exports under that single mount.

The upcoming NFSv4 client mirror-mounts project will provide this functionality exactly, without the need for automount changes (as has been mentioned).

Spencer
Stephen Usher
2007-Sep-10 06:54 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Alec Muffett wrote:

>>> Mounts under /net are derived from the filesystems actually shared from the servers; the automount daemon uses the MOUNT protocol to determine this. If you're looking at a path not already seen, the information will be fresh, but that's where the good news ends.
>>
>> I know that, yes, but why can't we put such an abstraction elsewhere in the name space? One thing I have always disliked about /net mounts is that they're too magical; it should be possible to replicate them in some form in other mount maps.
>
> In short, you're proposing a solution to the zillions-of-nfs-exports issue which, instead of using a "wait for v4 to implement server-side export consolidation" thingy, would instead be a "better, smarter /net-alike on v2/v3, but give it a sensible name and better namespace semantics"?
>
> I could go for that...

The only problem with this approach is that for current systems you would have to make sure that all vendors implemented the new scheme in their automounter, and that you could retrospectively add the ability to old systems still in use which have been orphaned by their vendors. I don't see that this is realistically possible. The only other option, therefore, is to somehow fudge it at the server end.

Steve
Brian H. Nelson
2007-Sep-10 15:25 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:

> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls.

I believe I ran into one or both of these bugs:

6429996 zvols don't reserve enough space for requisite meta data
6430003 record size needs to affect zvol reservation size on RAID-Z

Basically what happened was that the zpool filled to 100% and broke UFS with 'no space left on device' errors. This was quite strange to sort out, since the UFS zvol had 30GB of free space. I never got any replies to my request for more info and/or workarounds for the above bugs.

My workaround and recommendation is to leave a 'healthy' amount of un-allocated space in the zpool. I don't know what a good level for 'healthy' is. Currently I've left about 1% (2GB) on a 200GB raid-z pool.

-Brian

--
---------------------------------------------------
Brian H. Nelson         Youngstown State University
System Administrator   Media and Academic Computing
              bnelson[at]cis.ysu.edu
---------------------------------------------------
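[One way to keep that headroom from being eaten by accident is to park it in a reservation on an otherwise-empty dataset. This is only a sketch of that idea, with made-up names and sizes; it is not something taken from the bug reports above:]

    # Reserve ~2 GB of the pool so other datasets can never drive it to 100% full.
    zfs create pool0/headroom
    zfs set reservation=2g pool0/headroom
    zfs set mountpoint=none pool0/headroom   # nothing should ever be written here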
Brian H. Nelson
2007-Sep-10 15:36 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:

> The UFS on zvols option sounds intriguing to me, but I would guess that the following could be problems:
>
> 1) Double buffering: Will ZFS store data in the ARC while UFS uses traditional file system buffers?

This is probably an issue. You also have the journal+COW combination issue. I'm guessing that both would be performance concerns. My application is relatively low bandwidth, so I haven't dug deep into this area.

> 2) Boot order dependencies: How does the startup of ZFS compare to the processing of /etc/vfstab? I would guess that this is OK due to the legacy mount type supported by ZFS. If this is OK, then dfstab processing is probably OK too.

Zvols by nature are not available under ZFS automatic mounting. You would need to add the /dev/zvol/dsk/... lines to /etc/vfstab just as you would for any other /dev/dsk/... or /dev/md/dsk/... devices. If you are not using the zpool for anything else, I would remove the automatic mount point for it.

-Brian

--
---------------------------------------------------
Brian H. Nelson         Youngstown State University
System Administrator   Media and Academic Computing
              bnelson[at]cis.ysu.edu
---------------------------------------------------
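[For illustration, a vfstab entry for a UFS-on-zvol filesystem looks much like any other UFS entry; the pool name and mount point below are hypothetical:]

    #device to mount               device to fsck                  mount point   FS   pass  at boot  options
    /dev/zvol/dsk/pool0/homes_vol  /dev/zvol/rdsk/pool0/homes_vol  /export/home  ufs  2     yes      logging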
Brian H. Nelson
2007-Sep-10 15:41 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:

> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls.

Also, here's a link to the ufs-on-zvol blog where I originally found the idea:

http://blogs.sun.com/scottdickson/entry/fun_with_zvols_-_ufs

-Brian

--
---------------------------------------------------
Brian H. Nelson         Youngstown State University
System Administrator   Media and Academic Computing
              bnelson[at]cis.ysu.edu
---------------------------------------------------
Richard Elling
2007-Sep-10 16:33 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:

> On 9/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
>> Changing the topic slightly, the strategic question is: why are you providing disk space to students?
>
> For most programming and productivity work (e.g. word processing, etc.) people will likely be better served by having network access for their personal equipment with local storage.

Most students today are carrying around more storage in their pocket than they'll get from the university.

> For cases when specialized expensive tools ($10k+ per seat) are used, it is not practical to install them on hundreds or thousands of personal devices for a semester or two of work. The typical computing lab that provides such tools is not well equipped to deal with removable media such as flash drives.

I disagree; any lab machine bought in the past 5 years or so has a USB port, even SunRays.

> Further, such tools will often be used to do designs that require simulations to run as batch jobs under grid computing tools such as Grid Engine, Condor, LSF, etc.

Yes, but you won't have 15,000 students running grid engine. But even if you do, you can adopt the services models now prevalent in the industry. For example, rather than providing storage for a class, let Google or Yahoo do it.

> Then, of course, there are files that need to be shared, have reliable backups, etc. Pushing that out to desktop or laptop machines is not really a good idea.

Clearly the business of a university has different requirements than student instruction. But even then, it seems we're stuck in the 1960s rather than the 21st century. I think I might have some home directory somewhere at USC, where I currently attend, but I'm not really sure. I know I have a (Sun-based :-) email account with some sort of quota, but that isn't implemented as a file system quota. I keep my stuff in my pocket. This won't work entirely for situations like Steve's compute cluster, but it will for many.

There is also a long tail situation here, which is how I approached the problem at eng.Auburn.edu. 1% of the users will use > 90% of the space. For them, I had special places. For everyone else, they were lumped into large-ish buckets. A daily cron job easily identifies the 1% and we could proactively redistribute them as needed. Of course, quotas are also easily defeated, and the more clever students played a fun game of hide-and-seek, but I digress. There is more than one way to solve these allocation problems. The real PITA was cost accounting, especially for government contracts :-(

The cost of managing the storage is much greater than the cost of the storage, so the trend will inexorably be towards eliminating the management costs -- hence the management structure of ZFS is simpler than that of the previous solutions. The main gap for .edu sites is quotas, which will likely be solved some other way in the long run... Meanwhile, pile on http://bugs.opensolaris.org/view_bug.do?bug_id=6501037

-- richard
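[A one-liner is enough for the sort of daily "find the 1%" job Richard mentions; this is only a sketch, and the path and count are invented:]

    # Report the 20 largest home directories, largest last (sizes in KB).
    du -sk /export/home/* 2>/dev/null | sort -n | tail -20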
Darren J Moffat
2007-Sep-10 16:40 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Richard Elling wrote:

> There is also a long tail situation here, which is how I approached the problem at eng.Auburn.edu. 1% of the users will use > 90% of the space. For them, I had special places. For everyone else, they were lumped into large-ish buckets. A daily cron job easily identifies the 1% and we could proactively redistribute them as needed. Of course, quotas are also easily defeated, and the more clever students played a fun game of hide-and-seek, but I digress. There is more than one way to solve these allocation problems.

Ah, I remember those games well, and they are one of the reasons I'm now a Solaris developer! Though at Glasgow Uni's Comp Sci department it wasn't disk quotas (peer pressure was used for us) but print quotas, which were much more fun to try and bypass and environmentally responsible to quota in the first place.

--
Darren J Moffat
Wade.Stuart at fallon.com
2007-Sep-10 16:58 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
zfs-discuss-bounces at opensolaris.org wrote on 09/10/2007 11:40:16 AM:

> Richard Elling wrote:
>> There is also a long tail situation here, which is how I approached the problem at eng.Auburn.edu. 1% of the users will use > 90% of the space. For them, I had special places. For everyone else, they were lumped into large-ish buckets. A daily cron job easily identifies the 1% and we could proactively redistribute them as needed. Of course, quotas are also easily defeated, and the more clever students played a fun game of hide-and-seek, but I digress. There is more than one way to solve these allocation problems.
>
> Ah, I remember those games well, and they are one of the reasons I'm now a Solaris developer! Though at Glasgow Uni's Comp Sci department it wasn't disk quotas (peer pressure was used for us) but print quotas, which were much more fun to try and bypass and environmentally responsible to quota in the first place.

Very true, you could even pay people to track down heavy users and bonk them on the head. Why is everyone responding with alternate routes to a simple need? User quotas have been used in the past, and will be used in the future, because they work (well), are simple, are tied into many existing workflows/systems, and are very understandable for both end users and administrators. You can come up with 100 other ways to accomplish pseudo user quotas or end runs around the core issue (did we really have Google space farming suggested -- we are reading an FS mailing list here?), but quotas are tested and well understood fixes to these problems. Just because someone decided to call ZFS pool reservations quotas does not mean the need for real user quotas is gone.

User quotas are a KISS solution to space hogs. Zpool quotas (really pool reservations) are not, unless you can divvy up data slices into small fs mounts and have no user overlap in the partition.

user quotas + zfs quotas > zfs quotas;

-Wade
Darren J Moffat
2007-Sep-10 17:13 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Wade.Stuart at fallon.com wrote:

> Very true, you could even pay people to track down heavy users and bonk them on the head. Why is everyone responding with alternate routes to a simple need?

For the simple reason that sometimes it is good to challenge existing practice and try to find the real need rather than "I need X because I've always done it using X". We always used a vfstab and dfstab (or exportfs) file before, and used a separate software RAID and filesystem before too.

> User quotas have been used in the past, and will be used in the future, because they work (well), are simple, are tied into many existing workflows/systems, and are very understandable for both end users and administrators. You can come up with 100 other ways to accomplish pseudo user quotas or end runs around the core issue (did we really have Google space farming suggested -- we are reading an FS mailing list here?), but quotas are tested and well understood fixes to these problems. Just because someone decided to call ZFS pool reservations quotas does not mean the need for real user quotas is gone.

Reservations in ZFS are quite different from quotas; ZFS has both concepts. A reservation is a guaranteed minimum; a quota in ZFS is a guaranteed maximum.

--
Darren J Moffat
Wade.Stuart at fallon.com
2007-Sep-10 17:40 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Darren.Moffat at Sun.COM wrote on 09/10/2007 12:13:18 PM:

> Wade.Stuart at fallon.com wrote:
>> Very true, you could even pay people to track down heavy users and bonk them on the head. Why is everyone responding with alternate routes to a simple need?
>
> For the simple reason that sometimes it is good to challenge existing practice and try to find the real need rather than "I need X because I've always done it using X".

I am not against refactoring solutions, but ZFS quotas and the lack of user quotas in general leave people either trying to use ZFS quotas in lieu of user quotas, suggesting weak end runs around the problem (a cron job to calculate hogs), or belittling the need to actually limit disk usage per user id. All of these threads to this point have not answered the needs in any way close to the solution that user quotas allow.

> We always used a vfstab and dfstab (or exportfs) file before, and used a separate software RAID and filesystem before too.

Yes, and the replacements (when talking ZFS) are either parity or better -- that makes switching a win-win. ENOSUCH when talking user quotas.

>> User quotas have been used in the past, and will be used in the future, because they work (well), are simple, are tied into many existing workflows/systems, and are very understandable for both end users and administrators. You can come up with 100 other ways to accomplish pseudo user quotas or end runs around the core issue (did we really have Google space farming suggested -- we are reading an FS mailing list here?), but quotas are tested and well understood fixes to these problems. Just because someone decided to call ZFS pool reservations quotas does not mean the need for real user quotas is gone.
>
> Reservations in ZFS are quite different from quotas; ZFS has both concepts. A reservation is a guaranteed minimum; a quota in ZFS is a guaranteed maximum.

Reservations (the general term when talking about most of the disk virtualizing and pooling technologies in play today) usually cover both the floor (guaranteed space) and the ceiling (maximum allocatable space) for the pool volume, dynamic store, or backing store. ZFS quotas (reservations) can be called whatever you want -- it has just become frustrating when people start pushing ZFS quotas (reservations) as a drop-in replacement for user quotas. They are tools for different issues with some overlap. Even though one can pound in a nail with a screwdriver, I would rather have a hammer.
Richard Elling
2007-Sep-10 18:22 UTC
[zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Wade.Stuart at fallon.com wrote:

> All of these threads to this point have not answered the needs in any way close to the solution that user quotas allow.

I thought I did answer that... for some definition of "answer"...

>> The main gap for .edu sites is quotas, which will likely be solved some other way in the long run... Meanwhile, pile on http://bugs.opensolaris.org/view_bug.do?bug_id=6501037

Or, if you're so inclined, http://cvs.opensolaris.org/source/

The point being that it either isn't a high priority for the ZFS team, there are other solutions to the problem (which may not require changes to ZFS), or you can fix it on your own. You can impact any or all of these things.

-- richard
On 10 Sep 2007, at 16:41, Brian H. Nelson wrote:

> Stephen Usher wrote:
>> Brian H. Nelson:
>>
>> I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls.
>
> Also, here's a link to the ufs-on-zvol blog where I originally found the idea:
>
> http://blogs.sun.com/scottdickson/entry/fun_with_zvols_-_ufs

Not everything I've seen blogged about UFS and zvols fills me with warm fuzzies. For instance, the above takes no account of the fact that the UFS filesystem needs to be in a consistent state before a snapshot is taken - e.g. using lockfs(1M). Example:

Preparation ...

basket# zfs create -V 10m pool0/v1
basket# newfs /dev/zvol/rdsk/pool0/v1
newfs: /dev/zvol/rdsk/pool0/v1 last mounted as /tmp/v1
newfs: construct a new file system /dev/zvol/rdsk/pool0/v1: (y/n)? y
Warning: 4130 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/pool0/v1: 20446 sectors in 4 cylinders of 48 tracks, 128 sectors
        10.0MB in 1 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at: 32,
basket# mount -r /dev/zvol/dsk/pool0/v1 /tmp/v1

Scenario 1 ...

basket# date >/tmp/v1/f1; zfs snapshot pool0/v1@s1
basket# cat /tmp/v1/f1
Mon Sep 10 23:07:42 BST 2007
basket# mount -r /dev/zvol/dsk/pool0/v1@s1 /tmp/v1s1
basket# ls /tmp/v1s1
f1           lost+found/
basket# cat /tmp/v1s1/f1
basket# date >/tmp/v1/f1; zfs snapshot pool0/v1@s2
basket# mount -r /dev/zvol/dsk/pool0/v1@s2 /tmp/v1s2
basket# cat /tmp/v1s2/f1
Mon Sep 10 23:07:42 BST 2007
basket# cat /tmp/v1/f1
Mon Sep 10 23:09:19 BST 2007

Note: the first snapshot sees the file but not its contents, while the second snapshot sees stale data.

Scenario 2 ...

basket# date >/tmp/v1/f2; lockfs -wf /tmp/v1; zfs snapshot pool0/v1@s3; lockfs -u /tmp/v1
basket# mount -r /dev/zvol/dsk/pool0/v1@s3 /tmp/v1s3
mount: Mount point /tmp/v1s3 does not exist.
basket# mkdir /tmp/v1s3
basket# mount -r /dev/zvol/dsk/pool0/v1@s3 /tmp/v1s3
basket# cat /tmp/v1s3/f2
Mon Sep 10 23:18:17 BST 2007
basket# cat /tmp/v1/f2
Mon Sep 10 23:18:17 BST 2007
basket#

Note: the snapshot is consistent because of the lockfs(1M) calls.

Phil
On Sep 10, 2007, at 13:40, Wade.Stuart at fallon.com wrote:

> I am not against refactoring solutions, but ZFS quotas and the lack of user quotas in general leave people either trying to use ZFS quotas in lieu of user quotas, suggesting weak end runs around the problem (a cron job to calculate hogs), or belittling the need to actually limit disk usage per user id.

And let's not forget group ID....