-------- Original Message --------
Subject: Trial x4500, zfs with NFS and quotas.
Date: Tue, 27 Nov 2007 16:46:33 +0900
From: Jorgen Lundman <lundman at lundman.net>
To: zfs-discuss at opensolaris.org

Hello list;

We are users of NetApps, currently needing to expand. We thought to try an x4500 since Jonathan suggested it might be better/cheaper. Sun very kindly shipped us a trial x4500 unit, even though we are in Tokyo. We probably are not as big a company as Sun is used to, and would only buy two x4500s this cycle, but it would be nice to get some questions answered.

The version of Solaris (zfs) that the x4500 was shipped with was very old and painfully slow, so I installed the latest OpenSolaris I could find for my trials:

Solaris Express Developer Edition 9/07 snv_70b X86
SunOS x4500.unix 5.11 snv_70b i86pc i386 i86pc

The legacy setup on the current NetApps is in the style of:

/export/mail/m/e/0/0/meNNNNN00/
/export/mail/m/e/0/1/meNNNNN01/
. .
/export/mail/m/e/9/9/meNNNNN99/

Each user has a quota. I have used 30M during my tests, but it would differ in real life. The smallest volume we have on the NetApp (/export/mail/) has 194172 users.

*** NFS Option

Start:

Since we need a quota per user, I need to create a file-system of size=$quota for each user.

But NFS will not let you cross mount-points/file-systems, so mounting just "/export/mail/" means I will not see any directory below that.

(I don't suppose there is some hack to let me cross file-systems?)

***

On the NFS client side, this would mean I would have to do 194172 NFS mounts to see my file-system. Can normal Unix servers even do this? That just is not very realistic. This would cut out many old systems, and probably some Unix flavours.

***

From Googling, it seems suggested that I use automount, which would cut out any version of Unix without automount, either from the age of the OS (early Sun might be ok still?) or Unix flavours without automount.

So I would have to migrate the entire system to be 100% Sun, today and "forever". Not exactly ideal. I'm sure Sun would be very happy with such a commitment, but that just is not feasible right now.

***

Alright, let's try automount. It seems it does not really do /export/mail/m/e/X/Y very well, so I would have to list 0/0 -> 9/9 individually, i.e. 100 lines in automount. Probably that would be possible, just not very attractive.

* /export/mail/m/e/0/0/&
. .
* /export/mail/m/e/9/9/&

*** nfsv4

There was some noise about future support that will let you cross file-systems with nfsv4 (mirror mounts). But this doesn't seem to exist right now. It would also cut out any old systems, and any Unix flavour that does not yet do nfsv4 and mirror mounts.

*** iscsi?

Could I share the zpool with iscsi, and use ufs with ufs-quotas? But then I am limited to any OS with iscsi and ufs. Could this even do multiple writers?

*** software quotas?

Probably the only realistic option for zfs/x4500 at this time. It is "most likely" that we can do software quota with the specific software we are wanting to use, but we would always have to look out for smart people getting around it, software bugs, etc.

***

Answer: the x4500 can not replace the NetApp in our setup.

We would either have to go without quotas, or cut out any old Unix server versions, or non-Solaris Unix flavours. The x4500 is simply not a real NFS server (because we want quotas).

Ironically, this would probably work if we used Samba, as it doesn't care about file-systems.

Please tell me I am wrong. I really wanted this to work, as the price per GB is quite attractive.
Also, zfs is very neat.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
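As an aside on the 100-line map idea above: such a map does not have to be maintained by hand, a short script can emit it. This is only an illustrative sketch, not part of the actual setup; the map file names and the server name "x4500" are assumptions.

#!/bin/sh
# Emit 100 auto_master entries (one per hashed m/e/X/Y directory), each
# pointing at a one-line wildcard map for that directory. The "&" in the
# map entry is replaced by the looked-up key (the account name).
for x in 0 1 2 3 4 5 6 7 8 9; do
  for y in 0 1 2 3 4 5 6 7 8 9; do
    map=/etc/auto_mail_${x}${y}
    echo "* x4500:/export/mail/m/e/$x/$y/&" > "$map"
    echo "/export/mail/m/e/$x/$y $map"
  done
done >> /etc/auto_master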
> (I don't suppose there is some hack to let me cross file-systems?)

I believe that if you lofs mount the filesystems under, say, /export you can share that directory and have all the subdirectories appear. We certainly do that for a single directory at a time.

> On the NFS client side, this would mean I would have to do 194172 NFS
> mounts to see my file-system. Can normal Unix servers even do this? That
> just is not very realistic. This would cut out many old systems, and
> probably some Unix flavours.

I know that as of a couple of years ago Linux wouldn't have coped with that at all. It would use "devices" for each mount. You could increase the number of devices available. Our partial solution was to decrease the number of possible mounts and use an automounter. If a client became busy then it could still fail, but it has proved manageable for us.

> From Googling, it seems suggested that I use automount, which would cut
> out any version of Unix without automount, either from the age of the OS
> (early Sun might be ok still?) and Unix flavours without automount.

I'm not aware of a Unix for which you can't get an automounter.

> *** software quotas?
>
> Probably the only realistic option for zfs/x4500 at this time. It is
> "most likely" that we can do software quota with the specific software
> we are wanting to use, but we would always have to look out for smart
> people getting around it, software bugs etc.

Or you could group sets of people together so that they are quotad, but so that a single person couldn't fill the entire system.

> Please tell me I am wrong. I really wanted this to work, as the price
> per GB is quite attractive. Also, zfs is very neat.

I think you are wrong. I haven't, however, solved all your problems in the same single system.

> Jorgen Lundman | <lundman at lundman.net>

Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support
Are you using sendmail (or something like it)? It is well known that having mail clients NFS mount the mail spool is not scalable. That is one reason why IMAP is so popular in large scale e-mail systems.

I suggest you look at using something designed to scale, such as the Sun Java Communications Suite:
http://www.sun.com/software/communications_suite/index.xml

It works fine under ZFS and does implement quotas.
 -- richard

Jorgen Lundman wrote:
> [snip]
Jorgen Lundman wrote:

> *** NFS Option
>
> Start:
>
> Since we need quota per user, I need to create a file-system of
> size=$quota for each user.
>
> But NFS will not let you cross mount-point/file-systems so mounting just
> "/export/mail/" means I will not see any directory below that.

NFSv4 will let the client cross mount points transparently; this is implemented in Nevada build 77, and in Linux and AIX.

> On the NFS client side, this would mean I would have to do 194172 NFS
> mounts to see my file-system. Can normal Unix servers even do this?

If they all had to be mounted at the same time, I'd expect issues, but an automounter is a better idea if you need to support clients other than those listed above.

> From Googling, it seems suggested that I use automount, which would cut
> out any version of Unix without automount, either from the age of the OS
> (early Sun might be ok still?) and Unix flavours without automount.

As another poster said, automounters are pretty widespread - at least Linux, MacOS X, HP-UX and Suns of any vintage support the Sun automounter map format. What clients do you have?

> Alright, let's try automount. It seems it does not really do
> /export/mail/m/e/X/Y very well, so I would have to list 0/0 -> 9/9 each,
> so 100 lines in automount. Probably that would be possible, just not
> very attractive.
>
> * /export/mail/m/e/0/0/&
> . .
> * /export/mail/m/e/9/9/&

If your setup is very dynamic (with mail accounts created or deleted daily), it could be painful. If so, you could perhaps use a map computed on the fly instead of pushing one out through NIS or LDAP, or use /net, which is in effect computed on the fly.

> *** nfsv4
>
> There were some noise about future support that will let you cross
> file-systems to nfsv4 (mirror mounts). But this doesn't seem to exist
> right now. It would also cut out any old systems, and any Unix flavour
> that does not yet do nfsv4 and mirror mounts.

Nevada build 77 has this on Solaris; I can't give you versions you would need on other operating systems offhand, but could research it if you care to tell me what clients you run.

> Answer: x4500 can not replace NetApp in our setup.
>
> We would either have to go without quotas, or, cut out any old Unix
> server versions, or non-Solaris Unix flavours. x4500 is simply not a
> real NFS server (because we want quotas).

I think this conclusion is hasty.

Rob T
lundman at gmo.jp said:
> From Googling, it seems suggested that I use automount, which would cut out
> any version of Unix without automount, either from the age of the OS (early
> Sun might be ok still?) and Unix flavours without automount.

Some users have reported "solving" this issue by creating UFS filesystems within ZFS zvol devices, then sharing those UFS filesystems out over NFS. Regular UFS quotas are then available.

The downside is that you do lose some of the flexibility of ZFS, mainly that snapshots are now done on whole UFS filesystems (zvols), and access to snapshots is not available via the .zfs/snapshot/ path. ZFS ACLs at the individual file level are also not possible with this scenario.

Of course, ZFS is open-source. I wonder if anyone has started implementing user and/or group quotas for ZFS yet....

Regards,
Marion
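A rough sketch of the zvol-backed UFS workaround Marion describes, for a single volume. The pool name, volume name, size, mount point and account name below are made-up placeholders, not from this thread; the commands themselves are standard Solaris UFS quota administration.

# Create a zvol, put UFS on it, and use ordinary UFS quotas over NFS.
zfs create -V 500g zpool1/vol1
newfs /dev/zvol/rdsk/zpool1/vol1
mkdir -p /export/vol1
mount -F ufs -o quota /dev/zvol/dsk/zpool1/vol1 /export/vol1
touch /export/vol1/quotas && chmod 600 /export/vol1/quotas
edquota someuser                 # set the per-user limits (hypothetical account)
quotacheck -v /export/vol1       # build the quotas file from current usage
quotaon /export/vol1
share -F nfs -o rw /export/vol1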
J.P. King, Richard Elling, Robert Thurlow, Marion Hakanson,

Thank you for replying. My apologies if I was a bit extreme; the local Sun people do not speak English, it is my fault for not speaking sufficient Japanese, and the Sunsolve forums appear not to be the place to post questions to Sun.

For a perfect replacement, simply unmounting the NetApp "/export/mail/" and mounting the x4500's "/export/mail/" would have been nice, but a little engineering is required, that is understandable. Then it becomes a question of how much work it would be to use an x4500.

If we currently have about 500 servers hanging off the NetApp, the majority can most likely be solved once I figure out the right automount configuration files. But some Unix flavours might need additional software compiled. Some might even need kernel changes, and hence, reboots. This makes it a much larger task.

I do not even know how it would handle the "/export/www/" area. Currently, it is in the form of:

/export/www/com/e/p/example/ for "example.com". The quota is only at the "example/" level. But the complicated issue is that it could be any depth.

Can automount even do that? Guess my next stop is the automount documentation.

Software we use is the usual: Postfix with dovecot, apache with double-hash, https with TLS/SNI, LDAP for provisioning, pure-ftpd, DLZ, freeradius. No local config changes needed for any setup, just ldap and netapp.

Lund

Robert Thurlow wrote:
> [snip]

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
> I believe that if you lofs mount the filesystems under, say, /export you
> can share that directory and have all the subdirectories appear.

Wow, that is a neat idea, and crazy at the same time. But the mknod's minor value can be 0-262143, so it probably would be doable with some loss of memory and efficiency. But maybe not :) (I would need one lofi dev per filesystem, right?)

Definitely worth remembering if I need to do something small/quick.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
> NFSv4 will let the client cross mount points transparently;
> this is implemented in Nevada build 77, and in Linux and AIX.

Looks like I have 70b only. Wonder what the chances are of another release coming out in the 2 month trial period.

Does only the x4500 need to run Nevada 77, or would all NFS clients also need to support it?

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Marion Hakanson wrote:
> The downside is that you do lose some of the flexibility of ZFS, mainly
> that snapshots are now done on whole UFS filesystems (zvols), and access
> to snapshots is not available via the .zfs/snapshot/ path. ZFS ACLs on
> the individual file level are also not possible with this scenario.

That is an interesting solution.

> Of course, ZFS is open-source. I wonder if anyone has started implementing
> user and/or group quotas for ZFS yet....

I do not mind not having user quotas in zfs; the file-system way is interesting. But it is annoying that NFS is still bound by file-system, even in "these times". There probably is some implementation reason why it can't be "fixed". If it were just that statfs() reported incorrect values but write() failed with ENOSPC, that would be acceptable to me.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
> /export/www/com/e/p/example/ for "example.com". The quota is only at the
> "example/" level. But the complicated issue is that it could be any depth.
>
> Can automount even do that? Guess my next stop is automount documentation.

I should have played first, then sent the emails. If I use the /net style, it appears to just work.

# cd /net/x4500/export/mail/m/e/0/0/me118400/
# ls -l
drwxr-xr-x   2 nobody   nobody         2 Nov 27 11:03 mail

My apologies for the noise.

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Jorgen Lundman wrote:

> Software we use are the usual. Postfix with dovecot, apache with
> double-hash, https with TLS/SNI, LDAP for provisioning, pure-ftpd, DLZ,
> freeradius. No local config changes needed for any setup, just ldap and
> netapp.

I meant your client operating systems, actually. The apps don't care so much, especially if they already use NFS.

Rob T
Jorgen Lundman wrote:

>> NFSv4 will let the client cross mount points transparently;
>> this is implemented in Nevada build 77, and in Linux and AIX.
>
> Looks like I have 70b only. Wonder what the chances are of another
> release coming out in the 2 month trial period.
>
> Does only the x4500 need to run Nevada 77, or would all NFS clients also
> need to support it?

SXCE is coming out _very_ soon. But all of your clients need to support NFSv4 mount point crossing to make full use of it, unless the automounter works out well enough.

Rob T
> SXCE is coming out _very_ soon. But all of your clients need
> to support NFSv4 mount point crossing to make full use of it,
> unless the automounter works out well enough.

Ahh, that's a shame. Automounter works sufficiently at the moment, but it does not work well with discovering new file-systems. The only way for it to see newly created file-systems seems to be to modify /etc/auto_master and run "automount -v" so that it re-mounts /net/x4500/. This is a shame; I would have to connect to all servers after creating an account, and hope that /net/x4500 can be remounted.

# zfs create -o quota=30M -o atime=off zpool1/mail/m/e/0/0/me123456
# showmount -e | grep 123456
/export/mail/m/e/0/0/me123456 (everyone)

Client:

# cd /net/x4500/export/mail/m/e/0/0/me123456
# df | grep me123456
# touch /etc/auto_master && automount -v
# cd /net/x4500/export/mail/m/e/0/0/me123456
# df | grep me123456

Hmm no, that is not always enough for it to realise it needs to refresh.

I made the mistake of umount -f /net/x4500/export/mail, even when autofs was disabled, and now all I get is I/O errors.

Is it always this sensitive? So easy to mess up and confuse itself? I would hesitate to run automount in production if the slightest mis-use makes it fall apart, and I still can't make it see new file-systems :) Can I force it to flush/start-again somehow?

I will have to play with it some more.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
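One possible way around both the static 100-line map and the stale /net view is the "map computed on the fly" idea mentioned earlier in the thread: an executable automount map that derives the server path from the key at lookup time. This is only a sketch under two assumptions that are not from the thread: that the client automounter supports executable maps, and that the last two characters of the account name are the hash digits (e.g. me118400 -> m/e/0/0); adjust to the real naming scheme.

#!/bin/sh
# /etc/auto_mail (mode 0755): the automounter runs this with the looked-up
# key (e.g. "me118400") as $1 and expects a map entry on stdout.
key="$1"
x=$(expr "$key" : '.*\(.\).$')   # second-to-last character of the key
y=$(expr "$key" : '.*\(.\)$')    # last character of the key
echo "-rw,hard x4500:/export/mail/m/e/$x/$y/$key"

With an auto_master entry such as "/export/accounts /etc/auto_mail", a client would reach any account as /export/accounts/me118400 without per-account map maintenance, though note this flattens the client-side path compared to the NetApp layout.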
> Wow, that a neat idea, and crazy at the same time. But the mknod's minor
> value can be 0-262143 so it probably would be doable with some loss of
> memory and efficiency. But maybe not :) (I would need one lofi dev per
> filesystem right?)
>
> Definitely worth remembering if I need to do something small/quick.

You're confusing lofi and lofs, I think. Have a look at man lofs.

Now all _I_ would like is translucent options to that, and I'd solve one of my major headaches.

> Jorgen Lundman | <lundman at lundman.net>

Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support
> You're confusing lofi and lofs, I think. Have a look at man lofs.
>
> Now all _I_ would like is translucent options to that and I'd solve one
> of my major headaches.

That I am. I have never used lofs; it looks interesting.

Thanks.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Jorgen Lundman wrote:

>> You're confusing lofi and lofs, I think. Have a look at man lofs.
>>
>> Now all _I_ would like is translucent options to that and I'd solve one
>> of my major headaches.

I can not export lofs on NFS. It just gives "invalid path", and:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6578437

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
> I can not export lofs on NFS. Just gives invalid path,

Tell that to our mirror server.

-bash-3.00$ /sbin/mount -p | grep linux
/data/linux - /linux lofs - no ro
/data/linux - /export/ftp/pub/linux lofs - no ro
-bash-3.00$ grep linux /etc/dfs/sharetab
/linux  -       nfs     ro      Linux directories
-bash-3.00$ df -k /linux
Filesystem            1K-blocks       Used  Available Use% Mounted on
data                 3369027462 3300686151   68341312  98% /data

> and:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6578437

I'm using straight Solaris, not Solaris Express or equivalents:

-bash-3.00$ uname -a
SunOS leprechaun.csi.cam.ac.uk 5.10 Generic_127111-01 sun4u sparc SUNW,Sun-Fire-V240 Solaris

I can't comment on the bug, although I notice it is categorised under nfsv4, but the description doesn't seem to match that.

> Jorgen Lundman | <lundman at lundman.net>

Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support
Ah, it's a somewhat misleading error message:

bash-3.00# mount -F lofs /zpool1/test /export/test
bash-3.00# share -F nfs -o rw,anon=0 /export/test
Could not share: /export/test: invalid path
bash-3.00# umount /export/test
bash-3.00# zfs set sharenfs=off zpool1/test
bash-3.00# mount -F lofs /zpool1/test /export/test
bash-3.00# share -F nfs -o rw,anon=0 /export/test

So if any zfs file-system has sharenfs enabled, you will get "invalid path". If you disable sharenfs, then you can export the lofs.

Lund

J.P. King wrote:
> [snip]

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
On Wed, Nov 28, 2007 at 05:40:57PM +0900, Jorgen Lundman wrote:
> Ah it's a somewhat misleading error message:
>
> bash-3.00# mount -F lofs /zpool1/test /export/test
> bash-3.00# share -F nfs -o rw,anon=0 /export/test
> Could not share: /export/test: invalid path
> bash-3.00# umount /export/test
> bash-3.00# zfs set sharenfs=off zpool1/test
> bash-3.00# mount -F lofs /zpool1/test /export/test
> bash-3.00# share -F nfs -o rw,anon=0 /export/test
>
> So if any zfs file-system has sharenfs enabled, you will get "invalid
> path". If you disable sharenfs, then you can export the lofs.

I reported bug #6578437. We recently upgraded to b77 and this bug appears to be fixed now.

> [snip]

--
albert chin (china at thewrittenword.com)
Jorgen Lundman wrote:

>> SXCE is coming out _very_ soon. But all of your clients need
>> to support NFSv4 mount point crossing to make full use of it,
>> unless the automounter works out well enough.
>
> Ahh, that's a shame.. Automounter works sufficiently at the moment, but
> it does not work well with discovering new file-systems.

Yes, that's something I fight with as well. We tried to make the mount point crossing support better here.

> I made the mistake of umount -f /net/x4500/export/mail, even when autofs
> was disabled, and now all I get is I/O Errors.
>
> Is it always this sensitive?

"umount -f" is a power tool with no guard. If you had local apps using the filesystem, they would have seen I/O errors as well. The automounter is not making things worse here, so calling it "sensitive" doesn't sound right to me.

Rob T
From: Casper.Dik at Sun.COM
Date: 2007-Nov-28 13:08 UTC
Subject: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
>> I made the mistake of umount -f /net/x4500/export/mail, even when autofs
>> was disabled, and now all I get is I/O Errors.
>>
>> Is it always this sensitive?
>
> "umount -f" is a power tool with no guard. If you had local
> apps using the filesystem, they would have seen I/O errors
> as well. The automounter is not making things worse here,
> so calling it "sensitive" doesn't sound right to me.

The biggest issue here, though, is that the automounter's /net mounts for each host are read only once, when the mountpoint is first established. It's only ever refreshed when all the host's filesystems are unmounted.

Casper
Casper.Dik at sun.com wrote:

>>> I made the mistake of umount -f /net/x4500/export/mail, even when autofs
>>> was disabled, and now all I get is I/O Errors.
>>>
>>> Is it always this sensitive?
>>
>> "umount -f" is a power tool with no guard. If you had local
>> apps using the filesystem, they would have seen I/O errors
>> as well. The automounter is not making things worse here,
>> so calling it "sensitive" doesn't sound right to me.
>
> The biggest issue, though, here is that the automounter's /net
> mounts for each host are read only once when the mountpoint is
> first established. It's only ever refreshed when all the host's
> filesystems are unmounted.

Yes, and what's worse is that that can't be done manually in a reasonable way - the unmounts just fail unless they're driven by the unmount thread. I'd love to get this fixed sometime ...

Rob T
I am still having issues even with lofs. I have created 2329 home directories, each with a "mail" directory inside it.

zfs original:  /export/mail/
lofs mount:    /export/test/

# find /export/test/mail/m/e/0/0/ -name mail | wc -l
    2327

NFS client: mount /export/test/

# ls -l /export/test/mail/m/e/0/0
drwxr-xr-x   2 root     root           2 Nov 29 12:13 me649000
[snip]
# ls -l /export/test/mail/m/e/0/0 | wc -l
    2328
# find /export/test/mail/m/e/0/0/ -name mail | wc -l
    0

So I create the three following file-systems/directories:

drwxr-xr-x   3 root     root           3 Nov 29 12:17 this_is_a_local_dir
drwxr-xr-x   3 root     root           3 Nov 29 12:18 zfs_without_quota
drwxr-xr-x   3 root     root           3 Nov 29 12:19 zfs_without_compression

As seen from the NFS client:

drwxr-xr-x   3 root     root           3 Nov 29 12:17 this_is_a_local_dir
drwxr-xr-x   2 root     root           2 Nov 29 12:19 zfs_without_compression
drwxr-xr-x   2 root     root           2 Nov 29 12:18 zfs_without_quota

NFS client:

# find /export/test/m/e/0/0/ -name mail -ls
  4455    2 drwxr-xr-x   2 root     root     2 Nov 29 12:17 /export/test/m/e/0/0/this_is_a_local_dir/mail

So, even though the lofs mounted filesystem works just fine on the x4500 machine itself, once it is NFS exported I can not enter other ZFS file-systems inside that directory tree. All those file-systems just appear empty.

I also found this situation to be confusing:

x4500:
# cd zfs_without_quota
# mkdir test
# ls -l
drwxr-xr-x   2 root     root           2 Nov 29 12:18 mail
drwxr-xr-x   2 root     root           2 Nov 29 12:28 test

NFS client:
# cd zfs_without_quota
# mkdir foo
# ls -l
drwxr-xr-x   2 root     root           2 Nov 29 12:28 foo

x4500:
# ls -l
drwxr-xr-x   2 root     root           2 Nov 29 12:18 mail
drwxr-xr-x   2 root     root           2 Nov 29 12:28 test

Sooo.. what? Where did that "foo" directory get created, exactly?

# pwd
/export/test/m/e/0/0/zfs_without_quota
# df -h .
Filesystem             size   used  avail capacity  Mounted on
x4500:/export/test      17T   4.6M    17T     1%    /export/test

Jorgen Lundman wrote:
> [snip]

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Found them. They are all under the second-layer file-system.

# zfs set mountpoint=/mnt zpool1/mail/m/e/0/0/zfs_without_quota
# cd /export/mail/m/e/0/0/zfs_without_quota
# ls -l
drwxr-xr-x   2 root     root           2 Nov 29 12:28 foo
drwxr-xr-x   2 root     root           2 Nov 29 16:04 roger

So lofs works to export one zfs, but any other zfs file-systems inside that are ignored. So basically, lofs will not work either.

> x4500:
> # cd zfs_without_quota
> # mkdir test
> # ls -l
> drwxr-xr-x   2 root     root           2 Nov 29 12:18 mail
> drwxr-xr-x   2 root     root           2 Nov 29 12:28 test
>
> NFS client:
> # cd zfs_without_quota
> # mkdir foo
> # ls -l
> drwxr-xr-x   2 root     root           2 Nov 29 12:28 foo
>
> x4500:
> # ls -l
> drwxr-xr-x   2 root     root           2 Nov 29 12:18 mail
> drwxr-xr-x   2 root     root           2 Nov 29 12:28 test
>
> Sooo.. what? Where did that "foo" directory get created exactly?

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Thank you all for your input and ideas, it has been an interesting time. I have ended up with the following conclusions, some of which are specific to our circumstances.

* lofs

Having all zfs file-systems as one export would have been very attractive, but unfortunately lofs will only let you export one zfs. Any other file-system inside it will only appear as an empty directory after NFS export. Worse, you can write to this directory on the NFS clients, and the changes happen "under" the zfs file-system. This solution will not work here. If only it handled just 2 levels of file-systems!

* automount

Apart from my initial reservations about using automount, I did find that if I set the timeout to 1 minute or so, I successfully managed to rsync the NetApp users on the volumes I copied to the x4500 (during the copy the number of mounts stayed around ~500). With a higher timeout it reached about 2000 mounts before the NFS client stalled, and rsync died with errors.

However, I have not been able to find a way to make automount detect a newly created zfs. Not even forcing a re-mount appears to work if the mount is busy. Re-mounting, successful or not, on 500 servers when we create a new user is not acceptable.

* NFSv4 mirror-mounts

Untested, as it is not available (at least not as a Solaris install DVD from sun.com). However, upgrading 500 NFS client servers to NFSv4 means this is simply not feasible. [1]

* What I have not tested:

Since I fill users into LDAP, perhaps I can configure automount to read users' home-directory paths there, and detect new accounts quicker. Maybe?

I have run out of solutions to test, so unless someone can come up with other suggestions or solutions, we have to consider that the x4500 can not live up to the requirements of our current NetApp system.

Jorgen Lundman

[1] I am also unsure whether the nfsv4 idmapping will be an issue. The NetApp / NAS does not care about uids, nor should it, whereas Solaris appears to map "unknown" users to nobody with nfsv4. Having a complete uid->textname listing on a NAS should not be required, but maybe there is a way around it; I have not had time to test.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
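On the untested LDAP idea above: automount maps can be served from a directory server, so a new account's map entry becomes visible to clients at the next lookup. The sketch below is hypothetical; the DN suffix, map name and use of the common automountKey/automountInformation attributes are assumptions, and the exact schema varies between deployments.

# Hypothetical: publish a per-account automount entry in LDAP so clients
# that read their maps from the directory can resolve it immediately.
ldapadd -D "cn=admin,dc=example,dc=jp" -W <<'EOF'
dn: automountKey=me118400,automountMapName=auto_mail,dc=example,dc=jp
objectClass: automount
automountKey: me118400
automountInformation: -rw,hard x4500:/export/mail/m/e/0/0/me118400
EOF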
From: Robert Milkowski
Date: 2007-Dec-07 19:03 UTC
Subject: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
Hello Jorgen,

Honestly - I don't think zfs is a good solution to your problem.

What you could try to do, however, when it comes to the x4500 is:

1. Use SVM+UFS+user quotas

2. Use zfs and create several (like up to 20? so each stays below 1TB) ufs file systems on zvols and then apply user quotas at the ufs level - I would say you are risking a lot of possible strange interactions here - maybe it will work perfectly, maybe not. I would also be concerned about file system consistency.

--
Best regards,
Robert Milkowski                       mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Jorgen,
>
> Honestly - I don't think zfs is a good solution to your problem.
>
> What you could try to do however when it comes to x4500 is:
>
> 1. Use SVM+UFS+user quotas

I am now trying zfs -V 1Tb and newfs'ed ufs on that device. This looks like a potential solution at least. It even appears that I am allowed to enable "compression" on the volume.

Thanks

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
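Two follow-on notes on the zvol-backed UFS approach: compression is a property of the volume itself (transparent to the UFS on top), and the volume can later be grown and the UFS expanded into it. Pool/volume names, sizes and the mount point below are made-up, and growing a live, busy filesystem is something to test carefully first.

# Compression is set on the zvol; the UFS on top does not notice.
zfs set compression=on zpool1/vol1

# To grow later: enlarge the zvol, then expand the UFS to fill it.
zfs set volsize=2T zpool1/vol1
growfs -M /export/vol1 /dev/zvol/rdsk/zpool1/vol1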
From: Robert Milkowski
Date: 2007-Dec-10 20:14 UTC
Subject: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
Hello Jorgen,

Monday, December 10, 2007, 5:53:31 AM, you wrote:

JL> Robert Milkowski wrote:
>> Hello Jorgen,
>>
>> Honestly - I don't think zfs is a good solution to your problem.
>>
>> What you could try to do however when it comes to x4500 is:
>>
>> 1. Use SVM+UFS+user quotas

JL> I am now trying zfs -V 1Tb and newfs'ed ufs on that device. This looks
JL> like a potential solution at least. Even appears that I am allowed to
JL> enable "compression" on the volume.

JL> Thanks

I don't know... while it will work I'm not sure I would trust it. Maybe just use Solaris Volume Manager with Soft Partitioning + UFS and forget about ZFS in your case?

--
Best regards,
 Robert                                mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
> I don't know... while it will work I'm not sure I would trust it.
> Maybe just use Solaris Volume Manager with Soft Partitioning + UFS and
> forget about ZFS in your case?

Well, the idea was to see if it could replace the existing NetApps, as that was what Jonathan promised it could do, and we do use snapshots on the NetApps, so having zfs snapshots would be attractive, as would being able to grow the file-system easily as needed. (Although perhaps I can growfs with SVM as well.)

You may be correct about the trust issue though. I copied over a small volume from the netapp:

Filesystem             size   used  avail capacity  Mounted on
                       1.0T   8.7G  1005G     1%    /export/vol1

NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
zpool1  20.8T  5.00G  20.8T    0%  ONLINE  -

So 8.7Gb copied takes up 5Gb on the compressed volume. That is quite nice.

I enabled the same quotas for users, then ran quotacheck:

[snip]
#282759 fixed:  files 0 -> 4939    blocks 0 -> 95888
#282859 fixed:  files 0 -> 9       blocks 0 -> 144
Read from remote host x4500-test: Operation timed out
Connection to x4500-test closed.

and it has not come back - not a panic, just a complete hang. I'll have to get NOC staff to go power cycle it.

We are bending over backwards trying to get the x4500 to work in a simple NAS design, but honestly, the x4500 is not a NAS. Nor can it compete with NetApps. As a Unix server with lots of disks, it is very nice.

Perhaps one day it can, mind you; it just is not there today.

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Hello All;

While it is sometimes not possible, a ZFS+Thumper solution is not so far away from replacing expensive to buy and own NetApp-like equipment. What people can sometimes forget is that Thumper and Solaris are general purpose products that can be specialized with some effort. We have had some cases where we had to fine tune the X4500 and ZFS for more stability or performance. At the end of the day the benefits were well worth the effort.

Best regards

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at Sun.COM

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jorgen Lundman
Sent: Tuesday, December 11, 2007 4:22 AM
To: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Trial x4500, zfs with NFS and quotas.

[snip]
From: Robert Milkowski
Date: 2007-Dec-11 11:06 UTC
Subject: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
Hello Jorgen,

Tuesday, December 11, 2007, 2:22:07 AM, you wrote:

JL> [snip]

JL> We are bending over backwards trying to get the x4500 to work in a
JL> simple NAS design, but honestly, the x4500 is not a NAS. Nor can it
JL> compete with NetApps. As a Unix server with lots of disks, it is very nice.

JL> Perhaps one day it can mind you, it just is not there today.

Well, I can't agree with you. While it may not be suitable in your specific case, as I stated before, in many cases where user quotas are not needed x4500+zfs is a very compelling solution, and definitely cheaper and more flexible (except for user quotas) than NetApp.

While I don't need user quotas I can understand people who do - if you have only a couple (hundreds?) of file systems and you are not creating/destroying them, then the file-system-per-user approach could work (assuming you don't need users writing to common file systems while still having a user quota) - nevertheless it's just a workaround in some cases, and in others it won't work.

--
Best regards,
Robert Milkowski                       mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
J.P. King wrote:

>> Wow, that a neat idea, and crazy at the same time. But the mknod's minor
>> value can be 0-262143 so it probably would be doable with some loss of
>> memory and efficiency. But maybe not :) (I would need one lofi dev per
>> filesystem right?)
>>
>> Definitely worth remembering if I need to do something small/quick.
>
> You're confusing lofi and lofs, I think. Have a look at man lofs.
>
> Now all _I_ would like is translucent options to that and I'd solve one of
> my major headaches.

Check ast-open [1] for the 3d command that implements nDFS, the multiple dimension file system, allowing you to overlay directories. The 3d [2] utility allows you to run a command with all file system calls intercepted. Any writes will go into the top level directory, while reads pass through until a matching file is found. System calls are intercepted by an LD_PRELOAD library, so each process can have its own settings.

[1] http://www.research.att.com/~gsf/download/gen/ast-open.html
[2] http://www.research.att.com/~gsf/man/man1/3d.html
NOC staff couldn't reboot it after the quotacheck crash, and I only just got around to going to the datacenter. This time I disabled NFS, and the rsync that was running, and ran just quotacheck, and it completed successfully. The reason it didn't boot was that damned boot-archive again. Seriously!

Anyway, I did get a vmcore from the crash, but maybe it isn't so interesting. I will continue with the stress testing of UFS on a zpool, as it is the only solution that would be acceptable. Not given up yet, I have a few more weeks to keep trying. :)

-rw-r--r--   1 root     root        2345863 Dec 14 09:57 unix.0
-rw-r--r--   1 root     root     4741623808 Dec 14 10:05 vmcore.0

bash-3.00# adb -k unix.0 vmcore.0
physmem 3f9789
$c
top_end_sync+0xcb(ffffff0a5923d000, ffffff001f175524, b, 0)
ufs_fsync+0x1cb(ffffff62e757ad80, 10000, fffffffedd6d2020)
fop_fsync+0x51(ffffff62e757ad80, 10000, fffffffedd6d2020)
rfs3_setattr+0x3a3(ffffff001f1757c8, ffffff001f1758b8, ffffff1a0d942080, ffffff001f175b20, fffffffedd6d2020)
common_dispatch+0x444(ffffff001f175b20, ffffff0a5a4baa80, 2, 4, fffffffff7c7ea78, ffffffffc06003d0)
rfs_dispatch+0x2d(ffffff001f175b20, ffffff0a5a4baa80)
svc_getreq+0x1c6(ffffff0a5a4baa80, fffffffec7eda6c0)
svc_run+0x171(ffffff62becb72a0)
svc_do_run+0x85(1)
nfssys+0x748(e, fecf0fc8)
sys_syscall32+0x101()

BAD TRAP: type=e (#pf Page fault) rp=ffffff001f175320 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Jorgen,

You may want to try running 'bootadm update-archive', assuming that your boot-archive problem is an out-of-date boot-archive message at boot, and/or doing a clean reboot to let the system try to write an up-to-date boot-archive.

I would also encourage you to connect the LOM to the network in case you have such issues again; you should be able to recover remotely.

Shawn

On Dec 13, 2007, at 10:33 PM, Jorgen Lundman wrote:
> [snip]

--
Shawn Ferry              shawn.ferry at sun.com
Senior Primary Systems Engineer
Sun Managed Operations
571.291.4898
Shawn Ferry wrote:
> Jorgen,
>
> You may want to try running 'bootadm update-archive'
>
> Assuming that your boot-archive problem is an out of date boot-archive
> message at boot and/or doing a clean reboot to let the system try to
> write an up to date boot-archive.

Yeah, it is remembering to do so after something has changed that's hard. In this case, I had to break the mirror to install OpenSolaris. (Shame that the CD/DVD, and miniroot, does not have the md driver.)

It would be tempting to add the bootadm update-archive to the boot process, as I would rather have it come up half-assed than not come up at all.

And yes, other servers are on remote access, but since this was a temporary trial, we only ran 1 network cable, and 2x 200V cables. Should have done a proper job at the start, I guess. This time I made sure it was reboot-safe :)

Lund

> I would also encourage you to connect the LOM to the network in case you
> have such issues again, you should be able to recover remotely.
>
> Shawn

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
On Dec 14, 2007, at 12:27 AM, Jorgen Lundman wrote:

> Shawn Ferry wrote:
>> Jorgen,
>>
>> You may want to try running 'bootadm update-archive'
>>
>> Assuming that your boot-archive problem is an out of date boot-archive
>> message at boot and/or doing a clean reboot to let the system try to
>> write an up to date boot-archive.
>
> Yeah, it is remembering to do so after something has changed that's
> hard. In this case, I had to break the mirror to install OpenSolaris.
> (shame that the CD/DVD, and miniroot, doesn't have the md driver).
>
> It would be tempting to add the bootadm update-archive to the boot
> process, as I would rather have it come up half-assed, than not come up
> at all.

It is part of the shutdown process, you just need to stop crashing :)
Shawn Ferry wrote:

> It is part of the shutdown process, you just need to stop crashing :)

That looks like a good idea on paper, but what other unforeseen side-effects will we get from not crashing?!

Apart from the one crash with quotacheck, it is currently running quite well. It updates the quotas as you would expect, and I can create new accounts at any time and they appear for all clients. I have done multiple rsyncs from the NetApp to exercise the x4500 as much as possible, although each one takes 12 hours. Perhaps I should do more local & intensive tests as well.

At least there is one solution for us; now it is a matter of balancing the various numbers and coming to a decision.

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Shawn Ferry wrote:

>> It would be tempting to add the bootadm update-archive to the boot
>> process, as I would rather have it come up half-assed, than not come
>> up at all.
>
> It is part of the shutdown process, you just need to stop crashing :)

I put a cron entry that does it manually every night. It only took one crash after some FS work for me to come up with that solution. :)

Rob++

--
|Internet: windsor at warthog.com                    __o
|Life: Rob at Carrollton.Texas.USA.Earth            _`\<,_
|                                                 (_)/ (_)
|"They couldn't hit an elephant at this distance."
|                                    -- Major General John Sedgwick
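For anyone wanting to copy the nightly-cron approach Rob describes, a root crontab entry along these lines would do it; the hour is arbitrary and the path is an assumption, so check where bootadm lives on your build.

# root crontab: refresh the boot archive every night at 03:00
0 3 * * * /usr/sbin/bootadm update-archive > /dev/null 2>&1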
We've gone live, and the x4500 is working out rather well. 170,000 accounts so far, all with quota, are working.

But one day we found our quotas file went from 30M to:

-rw-------   1 root     root     137438953472 Mar 21 09:35 /export/zero/quotas

I assume it is sparse, and everything still works, so I have no idea what the deal is there. But apart from that, it is performing sufficiently for our needs. About 86% with compression on the mail volume. (ufs on zdev for quotas.)

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
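A quick way to confirm that such a quotas file really is sparse is to compare the apparent size with the blocks actually allocated; the path is the one from the message above.

# ls shows the apparent (137 GB) size, du shows what is actually allocated
ls -l /export/zero/quotas
du -k /export/zero/quotas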
From: Robert Milkowski
Date: 2008-Mar-25 08:33 UTC
Subject: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
Hello Jorgen,

Friday, March 21, 2008, 7:55:36 AM, you wrote:

JL> We've gone live, and the x4500 is working out rather well. 170,000
JL> accounts so far, all with quota are working.

JL> But one day we found our quotas file went from 30M to:

JL> -rw-------   1 root     root     137438953472 Mar 21 09:35 /export/zero/quotas

JL> I assume it is sparse, and everything still works, so I have no idea why
JL> the deal is there.

JL> But apart from that, it is performing sufficient for our needs.
JL> About 86% with compression on the mail volume. (ufs on zdev for quotas).

JL> Lund

Could you describe your config in more detail and share some comments on it? It looks like it is unique.

--
Best regards,
Robert Milkowski                       mailto:milek at task.gda.pl
                                       http://milek.blogspot.com
> Could you describe your config in more detail and share some
> comments on it? It looks like it is unique.

How much detail do you want? Nothing special, there was just a chance to do something right (or perhaps more sane). Not that there is anything wrong with the legacy system, but keeping users in passwd files, rsyncing configs out to cluster servers and similar setups do not scale well after a certain point.

The old system uses NFS with NetApps, with quotas. Each brand is mounted as a volume, for your usual ISP hosting (email, pop, imap, web, cgi, ftp, dns).

Sun's Jonathan sent out the blog post about wanting to prove they are worth looking at, and promised a free trial (and to their credit, usually these things are only for the US; we didn't think we'd get to trial an x4500 in Japan, but 2 weeks later we had a server). So why not try it.

But to do quotas, we found we really had to use zvolumes, with UFS formatted on top. zfs with quotas and automounter, or mirror mounts, just does not work (YET!)

Not that UFS is without issues. Volumes at 999GB are fine; at 1TB you get that insane inode problem, where everyone recommends compiling your own mkfs. No time for that right now.

We made a NetApp vs Sun shootout table, and made the decision to keep trying Sun. (Since we don't need to upgrade the current NetApps until June, we have a chance to run the Sun live until then.)

So, all clustered, no local configuration changes when adding accounts or domains. Provisioning is always running, so account creations take under a second. Same with all other changes, except buying new domains. The registrars are still slow.

That means the provisioning pulls the requests out of the DB, creates/changes the account data in LDAP, and creates/changes NFS directories only.

Email: postfix, dovecot, squirrelmail. Just LDAP provisioning, no local config changes needed.

Apache: double-hash the request; if the directory exists, serve it. No httpd.conf changes needed.

CGI: Slight patch to suexec to get uid/gid from the user's directory and execute (with extra sanity checks, of course). No httpd.conf changes needed.

FTP: pure-ftpd with ldap, no local conf changes needed.

DNS: bind with DLZ, using BDBHPT; updates are immediate, and no restarts needed.

Radius: FreeRadius, LDAP, (almost) no local config changes needed.

It was a mad race to migrate the first batch of users, but it went rather well. We did have some issues, for sure. The UFS default maximum number of quota nodes is something low like 2000: fix and reboot. OpenLDAP replication was randomly losing data: fixed. The biggest problem was essentially MySQL Cluster. It is just not quite ready. It runs, but only because I taped it up.

The giant quota file isn't all that interesting in the end. It is sparse, because one of the developers inserted into the provisioning table:

+email|email=test at testing.com|pass=test|uid=90000001|gid=2000
quota|email=test at testing.com|size=50

Sigh.

I also want to make it easy for customers to get apps on CGI installed. Tick the box for Gallery, and it is rolled out in their home directory. (For free, since the ISP model is generally disk space and network traffic.)

V2, or V3 maybe, we also want to offer Zones, so that provisioning should be fun.

Bet that was more than you wanted to know :)

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)