Paul B. Henson
2007-Sep-19  23:27 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
We are looking for a replacement enterprise file system to handle storage needs for our campus. For the past 10 years, we have been happily using DFS (the distributed file system component of DCE), but unfortunately IBM killed off that product and we have been running without support for over a year now. We have looked at a variety of possible options, none of which have proven fruitful. We are currently investigating the possibility of a Solaris 10/ZFS implementation. I have done a fair amount of reading and perusal of the mailing list archives, but I apologize in advance if I ask anything I should have already found in a FAQ or other repository.

Basically, we are looking to provide initially 5 TB of usable storage, potentially scaling up to 25-30TB of usable storage after successful initial deployment. We would have approximately 50,000 user home directories and perhaps 1000 shared group storage directories. Access to this storage would be via NFSv4 for our UNIX infrastructure, and CIFS for those annoying Windows systems you just can't seem to get rid of ;).

I read that initial versions of ZFS had scalability issues with such a large number of file systems, resulting in extremely long boot times and other problems. Supposedly a lot of those problems have been fixed in the latest versions of OpenSolaris, and many of the fixes have been backported to the official Solaris 10 update 4? Will that version of Solaris reasonably support 50-odd thousand ZFS file systems?

I saw a couple of threads in the mailing list archives regarding NFS not transitioning file system boundaries, requiring each and every ZFS filesystem (50 thousand-ish in my case) to be exported and mounted on the client separately. While that might be feasible with an automounter, it doesn't really seem desirable or efficient. It would be much nicer to simply have one mount point on the client with all the home directories available underneath it. I was wondering whether or not that would be possible with the NFSv4 pseudo-root feature. I saw one posting that indicated it might be, but it wasn't clear whether or not that was a current feature or something yet to be implemented. I have no requirements to support legacy NFSv2/3 systems, so a solution only available via NFSv4 would be acceptable.

I was planning to provide CIFS services via Samba. I noticed a posting a while back from a Sun engineer working on integrating NFSv4/ZFS ACL support into Samba, but I'm not sure if that was ever completed and shipped, either in the Sun version or pending inclusion in the official version. Does anyone happen to have an update on that? Also, I saw a patch proposing a different implementation of shadow copies that better supported ZFS snapshots; any thoughts on that would also be appreciated.

Is there any facility for managing ZFS remotely? We have a central identity management system that automatically provisions resources as necessary for users, as well as providing an interface for helpdesk staff to modify things such as quota. I'd be willing to implement some type of web service on the actual server if there is no native remote management; in that case, is there any way to directly configure ZFS via a programmatic API, as opposed to running binaries and parsing the output? Some type of perl module would be perfect.

We need high availability, so are looking at Sun Cluster. That seems to add an extra layer of complexity <sigh>, but there's no way I'll get signoff on a solution without redundancy. It would appear that ZFS failover is supported with the latest version of Solaris/Sun Cluster? I was speaking with a Sun SE who claimed that ZFS would actually operate active/active in a cluster, simultaneously writable by both nodes. From what I had read, ZFS is not a cluster file system, and would only operate in the active/passive failover capacity. Any comments?

The SE also told me that Sun Cluster requires hardware raid, which conflicts with the general recommendation to feed ZFS raw disk. It seems such a configuration would either require configuring vdevs directly on the raid LUNs, losing ZFS self-healing and checksum correction features, or losing space to not only the hardware raid level, but a partially redundant ZFS level as well. What is the general consensus on the best way to deploy ZFS under a cluster using hardware raid?

Any other thoughts/comments on the feasibility or practicality of a large-scale ZFS deployment like this?

Thanks much...

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Richard Elling
2007-Sep-20  18:36 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
a few comments below...

Paul B. Henson wrote:
> We are looking for a replacement enterprise file system to handle storage
> needs for our campus. For the past 10 years, we have been happily using DFS
> (the distributed file system component of DCE), but unfortunately IBM
> killed off that product and we have been running without support for over a
> year now. We have looked at a variety of possible options, none of which
> have proven fruitful. We are currently investigating the possibility of a
> Solaris 10/ZFS implementation. I have done a fair amount of reading and
> perusal of the mailing list archives, but I apologize in advance if I ask
> anything I should have already found in a FAQ or other repository.
>
> Basically, we are looking to provide initially 5 TB of usable storage,
> potentially scaling up to 25-30TB of usable storage after successful
> initial deployment. We would have approximately 50,000 user home
> directories and perhaps 1000 shared group storage directories. Access to
> this storage would be via NFSv4 for our UNIX infrastructure, and CIFS for
> those annoying Windows systems you just can't seem to get rid of ;).

50,000 directories aren't a problem, unless you also need 50,000 quotas and hence 50,000 file systems. Such a large, single storage pool system will be an outlier... significantly beyond what we have real world experience with.

> I read that initial versions of ZFS had scalability issues with such a
> large number of file systems, resulting in extremely long boot times and
> other problems. Supposedly a lot of those problems have been fixed in the
> latest versions of OpenSolaris, and many of the fixes have been backported
> to the official Solaris 10 update 4? Will that version of Solaris
> reasonably support 50 odd thousand ZFS file systems?

There have been improvements in performance and usability. Not all performance problems were in ZFS, but large numbers of file systems exposed other problems. However, I don't think that this has been characterized.

> I saw a couple of threads in the mailing list archives regarding NFS not
> transitioning file system boundaries, requiring each and every ZFS
> filesystem (50 thousand-ish in my case) to be exported and mounted on the
> client separately. While that might be feasible with an automounter, it
> doesn't really seem desirable or efficient. It would be much nicer to
> simply have one mount point on the client with all the home directories
> available underneath it. I was wondering whether or not that would be
> possible with the NFSv4 pseudo-root feature. I saw one posting that
> indicated it might be, but it wasn't clear whether or not that was a
> current feature or something yet to be implemented. I have no requirements
> to support legacy NFSv2/3 systems, so a solution only available via NFSv4
> would be acceptable.
>
> I was planning to provide CIFS services via Samba. I noticed a posting a
> while back from a Sun engineer working on integrating NFSv4/ZFS ACL support
> into Samba, but I'm not sure if that was ever completed and shipped either
> in the Sun version or pending inclusion in the official version, does
> anyone happen to have an update on that? Also, I saw a patch proposing a
> different implementation of shadow copies that better supported ZFS
> snapshots, any thoughts on that would also be appreciated.

This work is done and, AFAIK, has been integrated into S10 8/07.

> Is there any facility for managing ZFS remotely? We have a central identity
> management system that automatically provisions resources as necessary for
> users, as well as providing an interface for helpdesk staff to modify
> things such as quota. I'd be willing to implement some type of web service
> on the actual server if there is no native remote management; in that case,
> is there any way to directly configure ZFS via a programmatic API, as
> opposed to running binaries and parsing the output? Some type of perl
> module would be perfect.

This is a loaded question. There is a webconsole interface to ZFS which can be run from most browsers. But I think you'll find that the CLI is easier for remote management.

> We need high availability, so are looking at Sun Cluster. That seems to add
> an extra layer of complexity <sigh>, but there's no way I'll get signoff on
> a solution without redundancy. It would appear that ZFS failover is
> supported with the latest version of Solaris/Sun Cluster? I was speaking
> with a Sun SE who claimed that ZFS would actually operate active/active in
> a cluster, simultaneously writable by both nodes. From what I had read, ZFS
> is not a cluster file system, and would only operate in the active/passive
> failover capacity. Any comments?

Active/passive only. ZFS is not supported over pxfs and ZFS cannot be mounted simultaneously from two different nodes.

For most large file servers, people will split the file systems across servers such that under normal circumstances, both nodes are providing file service. This implies two or more storage pools.

> The SE also told me that Sun Cluster requires hardware raid, which
> conflicts with the general recommendation to feed ZFS raw disk. It seems
> such a configuration would either require configuring vdevs directly on the
> raid LUNs, losing ZFS self-healing and checksum correction features, or
> losing space to not only the hardware raid level, but a partially redundant
> ZFS level as well. What is the general consensus on the best way to deploy
> ZFS under a cluster using hardware raid?

The SE is mistaken. Sun^H^Holaris Cluster supports a wide variety of JBOD and RAID array solutions. For ZFS, I recommend a configuration which allows ZFS to repair corrupted data.

> Any other thoughts/comments on the feasibility or practicality of a
> large-scale ZFS deployment like this?

For today, quotas would be the main hurdle. I've read some blogs where people put UFS on ZFS zvols to overcome the quota problem. However, that seems to be too complicated for me, especially when high service availability is important.

 -- richard
Paul B. Henson
2007-Sep-20  19:49 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Richard Elling wrote:

> 50,000 directories aren't a problem, unless you also need 50,000 quotas
> and hence 50,000 file systems. Such a large, single storage pool system
> will be an outlier... significantly beyond what we have real world
> experience with.

Yes, considering that 45,000 of those users will be students, we definitely need separate quotas for each one :).

Hmm, I get a bit of a shiver down my spine at the prospect of deploying a critical central service in a relatively untested configuration 8-/. What is the maximum number of file systems in a given pool that has undergone some reasonable amount of real world deployment?

One issue I have is that our previous filesystem, DFS, completely spoiled me with its global namespace and location transparency. We had three fairly large servers, with the content evenly dispersed among them, but from the perspective of the client any user's files were available at /dfs/user/<username>, regardless of which physical server they resided on. We could even move them around between servers transparently.

Unfortunately, there aren't really any filesystems available with similar features and enterprise applicability. OpenAFS comes closest, we've been prototyping that but the lack of per file ACLs bites, and as an add-on product we've had issues with kernel compatibility across upgrades.

I was hoping to replicate a similar feel by just having one large file server with all the data on it. If I split our user files across multiple servers, we would have to worry about which server contained what files, which would be rather annoying.

There are some features in NFSv4 that seem like they might someday help resolve this problem, but I don't think they are readily available in servers and definitely not in the common client.

> > I was planning to provide CIFS services via Samba. I noticed a posting a
> > while back from a Sun engineer working on integrating NFSv4/ZFS ACL support
> > into Samba, but I'm not sure if that was ever completed and shipped either
> > in the Sun version or pending inclusion in the official version, does
> > anyone happen to have an update on that? Also, I saw a patch proposing a
> > different implementation of shadow copies that better supported ZFS
> > snapshots, any thoughts on that would also be appreciated.
>
> This work is done and, AFAIK, has been integrated into S10 8/07.

Excellent. I did a little further research myself on the Samba mailing lists, and it looks like ZFS ACL support was merged into the official 3.0.26 release. Unfortunately, the patch to improve shadow copy performance on top of ZFS still appears to be floating around the technical mailing list under discussion.

> > Is there any facility for managing ZFS remotely? We have a central identity
> > management system that automatically provisions resources as necessary for
[...]
> This is a loaded question. There is a webconsole interface to ZFS which can
> be run from most browsers. But I think you'll find that the CLI is easier
> for remote management.

Perhaps I should have been more clear -- a remote facility available via programmatic access, not manual user direct access. If I wanted to do something myself, I would absolutely login to the system and use the CLI. However, the question was regarding an automated process. For example, our Perl-based identity management system might create a user in the middle of the night based on the appearance in our authoritative database of that user's identity, and need to create a ZFS filesystem and quota for that user. So, I need to be able to manipulate ZFS remotely via a programmatic API.

> Active/passive only. ZFS is not supported over pxfs and ZFS cannot be
> mounted simultaneously from two different nodes.

That's what I thought, I'll have to get back to that SE. Makes me wonder as to the reliability of his other answers :).

> For most large file servers, people will split the file systems across
> servers such that under normal circumstances, both nodes are providing
> file service. This implies two or more storage pools.

Again though, that would imply two different storage locations visible to the clients? I'd really rather avoid that. For example, with our current Samba implementation, a user can just connect to '\\files.csupomona.edu\<username>' to access their home directory or '\\files.csupomona.edu\<groupname>' to access a shared group directory. They don't need to worry on which physical server it resides or determine what server name to connect to.

> The SE is mistaken. Sun^H^Holaris Cluster supports a wide variety of
> JBOD and RAID array solutions. For ZFS, I recommend a configuration
> which allows ZFS to repair corrupted data.

That would also be my preference, but if I were forced to use hardware RAID, the additional loss of storage for ZFS redundancy would be painful.

Would anyone happen to have any good recommendations for an enterprise scale storage subsystem suitable for ZFS deployment? If I recall correctly, the SE we spoke with recommended the StorageTek 6140 in a hardware raid configuration, and evidently mistakenly claimed that Cluster would not work with JBOD.

Thanks...

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
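
[Editorial illustration] The provisioning step described in the message above (create a ZFS filesystem with a quota when a new user appears, and let helpdesk staff change the quota later) could be sketched roughly as follows. This is not a native ZFS API: it only builds `zfs create` and `zfs set` command lines, which is the CLI route suggested earlier in the thread, and the thread's identity management system is Perl, not Python. The pool name `tank`, the `<pool>/home/<username>` layout, the username pattern, and every function name here are assumptions made for the example.

```python
import re
import shlex
import subprocess

# Hypothetical naming policy: short lowercase login names only, so a name can
# never look like an option or contain a path separator.
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{0,31}")


def dataset_for(pool, username):
    """Return the dataset name for a home (assumed layout: <pool>/home/<user>)."""
    if not USERNAME_RE.fullmatch(username):
        raise ValueError(f"refusing suspicious username: {username!r}")
    return f"{pool}/home/{username}"


def create_home_command(pool, username, quota):
    """argv that creates a home filesystem with its quota set at creation time."""
    return ["zfs", "create", "-o", f"quota={quota}", dataset_for(pool, username)]


def set_quota_command(pool, username, quota):
    """argv that changes the quota of an existing home filesystem."""
    return ["zfs", "set", f"quota={quota}", dataset_for(pool, username)]


def run(argv, dry_run=True):
    """Print the command (dry run) or execute it, raising if zfs reports failure."""
    if dry_run:
        print(" ".join(shlex.quote(arg) for arg in argv))
        return
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: shows what would be executed on the file server.
    run(create_home_command("tank", "jdoe", "2G"))
    run(set_quota_command("tank", "jdoe", "5G"))
```

Building an argument list and avoiding a shell keeps usernames from being interpreted as shell syntax; a remote front end (for example a small web service on the server, as mentioned earlier in the thread) would still need its own authentication, which is out of scope here.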
James F. Hranicky
2007-Sep-20  20:05 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> One issue I have is that our previous filesystem, DFS, completely spoiled
> me with its global namespace and location transparency. We had three fairly
> large servers, with the content evenly dispersed among them, but from the
> perspective of the client any user's files were available at
> /dfs/user/<username>, regardless of which physical server they resided on.
> We could even move them around between servers transparently.

This can be solved using an automounter as well. All home directories are specified as /nfs/home/user in the passwd map, then have a homes map that maps

    /nfs/home/user -> /nfs/homeXX/user

then have a map that maps

    /nfs/homeXX -> serverXX:/export/homeXX

You can have any number of servers serving up any number of homes filesystems. Moving users between servers means only changing the mapping in the homes map. The user never knows the difference, only seeing the homedir as /nfs/home/user (we used amd).

> Again though, that would imply two different storage locations visible to
> the clients? I'd really rather avoid that. For example, with our current
> Samba implementation, a user can just connect to
> '\\files.csupomona.edu\<username>' to access their home directory or
> '\\files.csupomona.edu\<groupname>' to access a shared group directory.
> They don't need to worry on which physical server it resides or determine
> what server name to connect to.

Samba can be configured to map homes drives to /nfs/home/%u . Let samba use the automounter setup and it's just as transparent on the CIFS side.

This is how we had things set up at my previous place of employment and it worked extremely well. Unfortunately, due to lack of BSD-style quotas and due to the fact that snapshots counted toward ZFS quota, I decided against using ZFS for filesystem service -- the automounter setup cannot mitigate the bunches-of-little-filesystems problem.

Jim
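
[Editorial illustration] The two-level indirection described above can be modelled with two lookup tables. The sketch below is not amd or autofs map syntax; it only mimics the lookups in Python to show why moving a user touches a single entry. The user names, the `home01`/`home02` volume names, and the `server01`/`server02` hosts are hypothetical.

```python
# Level 1 ("homes map"): which homeXX volume holds each user.
homes_map = {"alice": "home01", "bob": "home02"}

# Level 2: which server exports each homeXX volume.
volume_map = {
    "home01": "server01:/export/home01",
    "home02": "server02:/export/home02",
}


def resolve_home(user, homes, volumes):
    """Follow /nfs/home/<user> -> /nfs/homeXX/<user> -> serverXX:/export/homeXX/<user>."""
    volume = homes[user]          # first lookup: user -> homeXX
    export = volumes[volume]      # second lookup: homeXX -> serverXX:/export/homeXX
    return f"/nfs/{volume}/{user}", f"{export}/{user}"


def move_user(user, new_volume, homes, volumes):
    """Re-home a user: only the homes map entry changes (data copy not modelled)."""
    if new_volume not in volumes:
        raise KeyError(f"unknown volume: {new_volume}")
    homes[user] = new_volume


if __name__ == "__main__":
    print(resolve_home("alice", homes_map, volume_map))
    move_user("alice", "home02", homes_map, volume_map)
    print(resolve_home("alice", homes_map, volume_map))
```

In both cases the user-visible path stays /nfs/home/alice; only the intermediate path and the NFS location behind it change.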
Andy Lubel
2007-Sep-20  20:32 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07 3:49 PM, "Paul B. Henson" <henson at acm.org> wrote:

> On Thu, 20 Sep 2007, Richard Elling wrote:
>
>> 50,000 directories aren't a problem, unless you also need 50,000 quotas
>> and hence 50,000 file systems. Such a large, single storage pool system
>> will be an outlier... significantly beyond what we have real world
>> experience with.
>
> Yes, considering that 45,000 of those users will be students, we definitely
> need separate quotas for each one :).
>
> Hmm, I get a bit of a shiver down my spine at the prospect of deploying a
> critical central service in a relatively untested configuration 8-/. What
> is the maximum number of file systems in a given pool that has undergone
> some reasonable amount of real world deployment?

15,500 is the most I see in this article:

http://developers.sun.com/solaris/articles/nfs_zfs.html

Looks like it's completely scalable but your boot time may suffer the more you have. Just don't reboot :)

> One issue I have is that our previous filesystem, DFS, completely spoiled
> me with its global namespace and location transparency. We had three fairly
> large servers, with the content evenly dispersed among them, but from the
> perspective of the client any user's files were available at
> /dfs/user/<username>, regardless of which physical server they resided on.
> We could even move them around between servers transparently.

If it was so great why did IBM kill it? Did they have an alternative with the same functionality?

> [...]
>
> Would anyone happen to have any good recommendations for an enterprise
> scale storage subsystem suitable for ZFS deployment? If I recall correctly,
> the SE we spoke with recommended the StorageTek 6140 in a hardware raid
> configuration, and evidently mistakenly claimed that Cluster would not work
> with JBOD.

I really have to disagree, we have 6120 and 6130's and if I had the option to actually plan out some storage I would have just bought a thumper. You could probably buy 2 for the cost of that 6140.

> Thanks...

-Andy Lubel
Tim Spriggs
2007-Sep-20  20:41 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Andy Lubel wrote:
> On 9/20/07 3:49 PM, "Paul B. Henson" <henson at acm.org> wrote:
>
>> On Thu, 20 Sep 2007, Richard Elling wrote:
>>
>> That would also be my preference, but if I were forced to use hardware
>> RAID, the additional loss of storage for ZFS redundancy would be painful.
>>
>> Would anyone happen to have any good recommendations for an enterprise
>> scale storage subsystem suitable for ZFS deployment? If I recall correctly,
>> the SE we spoke with recommended the StorageTek 6140 in a hardware raid
>> configuration, and evidently mistakenly claimed that Cluster would not work
>> with JBOD.
>
> I really have to disagree, we have 6120 and 6130's and if I had the option
> to actually plan out some storage I would have just bought a thumper. You
> could probably buy 2 for the cost of that 6140.

We are in a similar situation. It turns out that buying two thumpers is cheaper per TB than buying more shelves for an IBM N7600. I don't know about power/cooling considerations yet though.
Gary Mills
2007-Sep-20  21:22 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, Sep 20, 2007 at 12:49:29PM -0700, Paul B. Henson wrote:
> On Thu, 20 Sep 2007, Richard Elling wrote:
>
> > 50,000 directories aren't a problem, unless you also need 50,000 quotas
> > and hence 50,000 file systems. Such a large, single storage pool system
> > will be an outlier... significantly beyond what we have real world
> > experience with.
>
> Hmm, I get a bit of a shiver down my spine at the prospect of deploying a
> critical central service in a relatively untested configuration 8-/. What
> is the maximum number of file systems in a given pool that has undergone
> some reasonable amount of real world deployment?

You should consider a Netapp filer. It will do both NFS and CIFS, supports disk quotas, and is highly reliable. We use one for 30,000 students and 3000 employees. Ours has never failed us.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Dickon Hood
2007-Sep-20  22:07 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, Sep 20, 2007 at 16:22:45 -0500, Gary Mills wrote:

: You should consider a Netapp filer. It will do both NFS and CIFS,
: supports disk quotas, and is highly reliable. We use one for 30,000
: students and 3000 employees. Ours has never failed us.

And they might only lightly sue you for contemplating zfs if you're really, really lucky...

--
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable. Normal service will be resumed as soon as possible. We apologise for the inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
Paul B. Henson
2007-Sep-20  22:17 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, James F. Hranicky wrote:

> This can be solved using an automounter as well.

Well, I'd say more "kludged around" than "solved" ;), but again unless you've used DFS it might not seem that way. It just seems rather involved, and relatively inefficient to continuously be mounting/unmounting stuff all the time. One of the applications to be deployed against the filesystem will be web service; I can't really envision a web server with tens of thousands of NFS mounts coming and going, seems like a lot of overhead.

I might need to pursue a similar route though if I can't get one large system to house everything in one place.

> Samba can be configured to map homes drives to /nfs/home/%u . Let samba use
> the automounter setup and it's just as transparent on the CIFS side.

I'm planning to use NFSv4 with strong authentication and authorization through, and intended to run Samba directly on the file server itself accessing storage locally. I'm not sure that Samba would be able to acquire local Kerberos credentials and switch between them for the users; without that, access via NFSv4 isn't very doable.

> and due to the fact that snapshots counted toward ZFS quota, I decided

Yes, that does seem to remove a bit of their value for backup purposes. I think they're planning to rectify that at some point in the future.

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-20  22:34 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Andy Lubel wrote:

> Looks like it's completely scalable but your boot time may suffer the more
> you have. Just don't reboot :)

I'm not sure if it's accurate, but the SE we were meeting with claimed that we could failover all of the filesystems to one half of the cluster, reboot the other half, fail them back, reboot the first half, and have rebooted both cluster members with no downtime. I guess that works as long as the active cluster member does not fail during the potentially lengthy downtime of the one rebooting.

> If it was so great why did IBM kill it?

I often daydreamed of a group of high-level IBM executives tied to chairs next to a table filled with rubber hoses ;), for the sole purpose of getting that answer.

I think they killed it because the market of technically knowledgeable and capable people that were able to use it to its full capacity was relatively limited, and the average IT shop was happy with Windoze :(.

> Did they have an alternative with the same functionality?

No, not really. Depending on your situation, they recommended transitioning to GPFS or NFSv4, but neither really met the same needs as DFS.

> I really have to disagree, we have 6120 and 6130's and if I had the option
> to actually plan out some storage I would have just bought a thumper. You
> could probably buy 2 for the cost of that 6140.

Thumper = x4500, right? You can't really cluster the internal storage of an x4500, so assuming high reliability/availability was a requirement that sort of rules that box out.

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
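
[Editorial illustration] The rolling reboot sequence the SE described (fail over, reboot one member, fail back, repeat for the other) can be written down as an ordered plan. The sketch below is a toy model in Python, not Sun Cluster tooling: it issues no cluster commands, it assumes exactly two nodes, and the node names, pool names, and `rolling_reboot` function are hypothetical. It generalizes slightly to the split-pool layout suggested earlier in the thread, where each node normally serves its own pool.

```python
def rolling_reboot(nodes, placement):
    """Plan a reboot of both nodes while every pool stays imported somewhere.

    nodes: list of exactly two node names.
    placement: dict mapping pool name -> node currently serving it (updated
    in place, and restored to its starting state by the end of the plan).
    Returns the ordered list of steps as tuples.
    """
    if len(nodes) != 2:
        raise ValueError("this sketch only models a two-node cluster")
    home = dict(placement)  # remember where each pool normally lives
    steps = []
    for node in nodes:
        peer = nodes[1] if node == nodes[0] else nodes[0]
        # Move everything off the node that is about to reboot.
        for pool in sorted(home):
            if placement[pool] == node:
                placement[pool] = peer
                steps.append(("failover", pool, node, peer))
        # Sanity check: the node must be idle before it goes down.
        assert node not in placement.values()
        steps.append(("reboot", node))
        # Return the pools that normally live on this node.
        for pool in sorted(home):
            if home[pool] == node:
                placement[pool] = node
                steps.append(("failback", pool, peer, node))
    return steps


if __name__ == "__main__":
    layout = {"pool1": "node-a", "pool2": "node-b"}
    for step in rolling_reboot(["node-a", "node-b"], layout):
        print(step)
```

The model makes the caveat above visible: between each "failover" and "failback" step a single node holds every pool, so a failure of that node during the peer's reboot would interrupt service.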
Paul B. Henson
2007-Sep-20  22:37 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Tim Spriggs wrote:

> We are in a similar situation. It turns out that buying two thumpers is
> cheaper per TB than buying more shelves for an IBM N7600. I don't know
> about power/cooling considerations yet though.

It's really a completely different class of storage though, right? I don't know offhand what an IBM N7600 is, but presumably some type of SAN device? Which can be connected simultaneously to multiple servers for clustering?

An x4500 looks great if you only want a bunch of storage with the reliability/availability provided by a relatively fault-tolerant server. But if you want to be able to withstand server failure, or continue to provide service while having one server down for maintenance/patching, it doesn't seem appropriate.

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-20  22:46 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Gary Mills wrote:

> You should consider a Netapp filer. It will do both NFS and CIFS,
> supports disk quotas, and is highly reliable. We use one for 30,000
> students and 3000 employees. Ours has never failed us.

We had actually just finished evaluating Netapp before I started looking into Solaris/ZFS. For a variety of reasons, it was not suitable to our requirements.

One, for example, was that it did not support simultaneous operation in an MIT Kerberos realm for NFS authentication while at the same time belonging to an Active Directory domain for CIFS authentication. Their workaround was to have the filer behave like an NT4 server rather than a Windows 2000+ server, which seemed pretty stupid. That also resulted in the filer not supporting NTLMv2, which was unacceptable.

Another issue we had was with access control. Their approach to ACLs was just flat out ridiculous. You had UNIX mode bits, NFSv4 ACLs, and CIFS ACLs, all disjoint, and which one was actually being used and how they interacted was extremely confusing and not even accurately documented. We wanted to be able to have the exact same permissions applied whether via NFSv4 or CIFS, and ideally allow changing permissions via either access protocol. That simply wasn't going to happen with Netapp.

Their Kerberos implementation only supported DES, not 3DES or AES, and their LDAP integration only supported the legacy posixGroup/memberUid attribute as opposed to the more modern groupOfNames/member attribute for group membership. They have some type of remote management API, but it just wasn't very clean IMHO.

As far as quotas, I was less than impressed with their implementation.
Paul B. Henson
2007-Sep-20  22:47 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Dickon Hood wrote:

> On Thu, Sep 20, 2007 at 16:22:45 -0500, Gary Mills wrote:
>
> : You should consider a Netapp filer. It will do both NFS and CIFS,
> : supports disk quotas, and is highly reliable. We use one for 30,000
> : students and 3000 employees. Ours has never failed us.
>
> And they might only lightly sue you for contemplating zfs if you're
> really, really lucky...

Don't even get me started on the subject of software patents ;)...
Tim Spriggs
2007-Sep-20  22:54 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> On Thu, 20 Sep 2007, Tim Spriggs wrote:
>
>> We are in a similar situation. It turns out that buying two thumpers is
>> cheaper per TB than buying more shelves for an IBM N7600. I don't know
>> about power/cooling considerations yet though.
>
> It's really a completely different class of storage though, right? I don't
> know offhand what an IBM N7600 is, but presumably some type of SAN device?
> Which can be connected simultaneously to multiple servers for clustering?
>
> An x4500 looks great if you only want a bunch of storage with the
> reliability/availability provided by a relatively fault-tolerant server.
> But if you want to be able to withstand server failure, or continue to
> provide service while having one server down for maintenance/patching, it
> doesn't seem appropriate.

It's an IBM re-branded NetApp, which we are using for NFS and iSCSI.
Chris Kirby
2007-Sep-20  23:10 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> On Thu, 20 Sep 2007, James F. Hranicky wrote:
>
>> and due to the fact that snapshots counted toward ZFS quota, I decided
>
> Yes, that does seem to remove a bit of their value for backup purposes. I
> think they're planning to rectify that at some point in the future.

We're adding a style of quota that only includes the bytes referenced by the active fs. Also, there will be a matching style for reservations.

"some point in the future" is very soon (weeks). :-)

-Chris
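
The difference between the two quota styles can be shown with a little arithmetic. What follows is a minimal sketch under a toy accounting model: the `Filesystem` class, its field names, and the sizes are illustrative assumptions, not ZFS internals or measured data.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Filesystem:
    """Toy space accounting for one file system (sizes in bytes)."""
    referenced: int      # bytes reachable from the live (active) file system
    snapshot_only: int   # bytes kept alive only because a snapshot holds them


def over_quota(fs: Filesystem, limit: int, count_snapshots: bool) -> bool:
    """Return True when the usage charged against the quota exceeds the limit."""
    charged = fs.referenced
    if count_snapshots:
        # Existing style: snapshot space counts toward the quota.
        charged += fs.snapshot_only
    return charged > limit


# Hypothetical home directory: 3 GiB live data, 2 GiB held by snapshots.
home = Filesystem(referenced=3 * GIB, snapshot_only=2 * GIB)

print(over_quota(home, 4 * GIB, count_snapshots=True))   # True: 5 GiB charged
print(over_quota(home, 4 * GIB, count_snapshots=False))  # False: 3 GiB charged
```

Under the existing style, a user can be pushed over quota by snapshots taken for backup purposes even though the live data fits; the style that counts only referenced bytes avoids that.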
Paul B. Henson
2007-Sep-20  23:31 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Tim Spriggs wrote:

> It's an IBM re-branded NetApp, which we are using for NFS and
> iSCSI.

Ah, I see.

Is it comparable storage though? Does it use SATA drives similar to the x4500, or more expensive/higher performance FC drives? Is it one of the models that allows connecting dual clustered heads and failing over the storage between them?

I agree the x4500 is a sweet looking box, but when making price comparisons sometimes it's more than just the raw storage... I wish I could just drop in a couple of x4500's and not have to worry about the complexity of clustering <sigh>...
Paul B. Henson
2007-Sep-20  23:33 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Chris Kirby wrote:

> We're adding a style of quota that only includes the bytes referenced by
> the active fs. Also, there will be a matching style for reservations.
>
> "some point in the future" is very soon (weeks). :-)

I don't think my management will let me run Solaris Express on a production server ;). How does that translate into availability in a released/supported version? Would that be something released as a patch to the just made available U4, or delayed until the next complete update release?
Tim Spriggs
2007-Sep-21  00:51 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> Is it comparable storage though? Does it use SATA drives similar to the
> x4500, or more expensive/higher performance FC drives? Is it one of the
> models that allows connecting dual clustered heads and failing over the
> storage between them?
>
> I agree the x4500 is a sweet looking box, but when making price comparisons
> sometimes it's more than just the raw storage... I wish I could just drop
> in a couple of x4500's and not have to worry about the complexity of
> clustering <sigh>...

It is configured with SATA drives and does support failover for NFS. iSCSI is another story at the moment.

The x4500 is very sweet and the only thing stopping us from buying two instead of another shelf is the fact that we have lost pools on Sol10u3 servers and there is no easy way of making two pools redundant (ie the complexity of clustering.) Simply sending incremental snapshots is not a viable option.

The pools we lost were pools on iSCSI (in a mirrored config) and they were mostly lost on zpool import/export. The lack of a recovery mechanism really limits how much faith we can put into our data on ZFS. It's safe as long as the pool is safe... but we've lost multiple pools.

-Tim
eric kustarz
2007-Sep-21  04:11 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sep 20, 2007, at 6:46 PM, Paul B. Henson wrote:

> On Thu, 20 Sep 2007, Gary Mills wrote:
>
>> You should consider a Netapp filer. It will do both NFS and CIFS,
>> supports disk quotas, and is highly reliable. We use one for 30,000
>> students and 3000 employees. Ours has never failed us.
>
> We had actually just finished evaluating Netapp before I started looking
> into Solaris/ZFS. For a variety of reasons, it was not suitable to our
> requirements.

<omitting some stuff>

> As far as quotas, I was less than impressed with their implementation.

Would you mind going into more details here?

eric
Andy Lubel
2007-Sep-21  15:55 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07 7:31 PM, "Paul B. Henson" <henson at acm.org> wrote:

> On Thu, 20 Sep 2007, Tim Spriggs wrote:
>
>> It's an IBM re-branded NetApp, which we are using for NFS and
>> iSCSI.

Yeah, it's fun to see IBM compete with its OEM provider Netapp.

> Ah, I see.
>
> Is it comparable storage though? Does it use SATA drives similar to the
> x4500, or more expensive/higher performance FC drives? Is it one of the
> models that allows connecting dual clustered heads and failing over the
> storage between them?
>
> I agree the x4500 is a sweet looking box, but when making price comparisons
> sometimes it's more than just the raw storage... I wish I could just drop
> in a couple of x4500's and not have to worry about the complexity of
> clustering <sigh>...

zfs send/receive.

Netapp is great, we have about 6 varieties in production here. But for what I pay in maintenance and up front cost on just 2 filers, I can buy an x4500 a year, and have a 3 year warranty each time I buy. It just depends on the company you work for.

I haven't played too much with anything but netapp and storagetek.. But once I got started on zfs I just knew it was the future; and I think netapp realizes that too. And if apple does what I think it will, it will only get better :)

Fast, Cheap, Easy - you only get 2. ZFS may change that.
James F. Hranicky
2007-Sep-21  16:26 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> On Thu, 20 Sep 2007, James F. Hranicky wrote:
>
>> This can be solved using an automounter as well.
>
> Well, I'd say more "kludged around" than "solved" ;), but again unless
> you've used DFS it might not seem that way.

Hey, I liked it :->

> It just seems rather involved, and relatively inefficient to continuously
> be mounting/unmounting stuff all the time. One of the applications to be
> deployed against the filesystem will be web service, I can't really
> envision a web server with tens of thousands of NFS mounts coming and
> going, seems like a lot of overhead.

Well, that's why ZFS wouldn't work for us :-( .

> I might need to pursue a similar route though if I can't get one large
> system to house everything in one place.
>
>> Samba can be configured to map homes drives to /nfs/home/%u . Let samba use
>> the automounter setup and it's just as transparent on the CIFS side.
>
> I'm planning to use NFSv4 with strong authentication and authorization
> through, and intended to run Samba directly on the file server itself
> accessing storage locally. I'm not sure that Samba would be able to acquire
> local Kerberos credentials and switch between them for the users, without
> that access via NFSv4 isn't very doable.

Makes sense -- in that case you would be looking at multiple SMB servers, though.

Jim
> The x4500 is very sweet and the only thing stopping us from buying two
> instead of another shelf is the fact that we have lost pools on Sol10u3
> servers and there is no easy way of making two pools redundant (ie the
> complexity of clustering.) Simply sending incremental snapshots is not a
> viable option.
>
> The pools we lost were pools on iSCSI (in a mirrored config) and they
> were mostly lost on zpool import/export. The lack of a recovery
> mechanism really limits how much faith we can put into our data on ZFS.
> It's safe as long as the pool is safe... but we've lost multiple pools.

Hello Tim,
did you try SNV60+ or S10U4?

Gino

This message posted from opensolaris.org
Tim Spriggs
2007-Sep-21  16:55 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server
Gino wrote:

>> [...]
>> The pools we lost were pools on iSCSI (in a mirrored config) and they
>> were mostly lost on zpool import/export. The lack of a recovery
>> mechanism really limits how much faith we can put into our data on ZFS.
>> It's safe as long as the pool is safe... but we've lost multiple pools.
>
> Hello Tim,
> did you try SNV60+ or S10U4 ?
>
> Gino

Hi Gino,

We need Solaris proper for these systems and we will have to schedule a significant downtime to patch update to U4.

-Tim
Mike Gerdts
2007-Sep-21  17:32 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07, Paul B. Henson <henson at acm.org> wrote:

> Again though, that would imply two different storage locations visible to
> the clients? I'd really rather avoid that. For example, with our current
> Samba implementation, a user can just connect to
> '\\files.csupomona.edu\<username>' to access their home directory or
> '\\files.csupomona.edu\<groupname>' to access a shared group directory.
> They don't need to worry on which physical server it resides or determine
> what server name to connect to.

MS-DFS could be helpful here. You could have a virtual samba instance that generates MS-DFS redirects to the appropriate spot. At one point in the past I wrote a script (long since lost - at a different job) that would automatically convert automounter maps into the appropriately formatted symbolic links used by the Samba MS-DFS implementation. It worked quite well for giving one place to administer the location mapping while providing transparency to the end-users.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
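
The original script is lost, so the following is only a rough sketch of the idea, not a reconstruction of it. It assumes a simple automounter map with one `server:/path` location per key, and it assumes (purely for illustration) that each target server exports a share named after the map key. Samba represents an MS-DFS referral as a symbolic link whose target has the form `msdfs:server\share`, placed inside a share configured as an MS-DFS root; the function names and sample host names here are hypothetical.

```python
import os
import tempfile


def parse_map(text):
    """Parse automounter map text into {key: 'server:/path'}.

    Handles comments and an optional '-options' field; replicated
    (multi-location) entries are not handled in this sketch.
    """
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        entries[fields[0]] = fields[-1]
    return entries


def msdfs_target(location, share):
    """Build the symlink target Samba reads as a referral to \\\\server\\share."""
    server = location.partition(":")[0]
    return "msdfs:" + server + "\\" + share


def write_links(dfs_root, entries):
    """Create (or refresh) one referral symlink per map key under dfs_root."""
    for key, location in entries.items():
        link = os.path.join(dfs_root, key)
        if os.path.islink(link):
            os.remove(link)
        os.symlink(msdfs_target(location, key), link)


SAMPLE_MAP = """\
# auto_home (hypothetical hosts)
alice   fs1:/export/home/alice
bob     -rw,nosuid fs2:/export/home/bob
"""

with tempfile.TemporaryDirectory() as dfs_root:
    write_links(dfs_root, parse_map(SAMPLE_MAP))
    for name in sorted(os.listdir(dfs_root)):
        print(name, "->", os.readlink(os.path.join(dfs_root, name)))
```

Keeping the automounter map as the single source of truth means the NFS and CIFS views of "where does this user live" cannot drift apart, which is the administrative benefit described above.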
Paul B. Henson
2007-Sep-21  19:11 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Tim Spriggs wrote:

> The x4500 is very sweet and the only thing stopping us from buying two
> instead of another shelf is the fact that we have lost pools on Sol10u3
> servers and there is no easy way of making two pools redundant (ie the
> complexity of clustering.) Simply sending incremental snapshots is not a
> viable option.
>
> The pools we lost were pools on iSCSI (in a mirrored config) and they
> were mostly lost on zpool import/export. The lack of a recovery
> mechanism really limits how much faith we can put into our data on ZFS.
> It's safe as long as the pool is safe... but we've lost multiple pools.

Lost data doesn't give me a warm fuzzy 8-/. Were you running an officially supported version of Solaris at the time? If so, what did Sun support have to say about this issue?
Tim Spriggs
2007-Sep-21  19:50 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> On Thu, 20 Sep 2007, Tim Spriggs wrote:
>
>> [...]
>> The pools we lost were pools on iSCSI (in a mirrored config) and they
>> were mostly lost on zpool import/export. The lack of a recovery
>> mechanism really limits how much faith we can put into our data on ZFS.
>> It's safe as long as the pool is safe... but we've lost multiple pools.
>
> Lost data doesn't give me a warm fuzzy 8-/. Were you running an officially
> supported version of Solaris at the time? If so, what did Sun support have
> to say about this issue?

Sol 10 with just about all patches up to date.

I joined this list in hope of a good answer. After answering a few questions over two days I had no hope of recovering the data. Don't import/export (especially between systems) without serious cause, at least not with U3. I haven't tried updating our servers yet and I don't intend to for a while now. The filesystems contained databases that were luckily redundant and could be rebuilt, but our DBA was not too happy to have to do that at 3:00am.

I still have a pool that can not be mounted or exported. It shows up with zpool list but nothing under zfs list. zpool export gives me an IO error and does nothing. On the next downtime I am going to attempt to yank the lun out from under its feet (as gently as I can) after I have stopped all other services.

Still, we are using ZFS but we are rethinking how to deploy/manage it. Our original model had us exporting/importing pools in order to move zone data between machines. We had done the same with UFS on iSCSI without a hitch. ZFS worked for about 8 zone moves and then killed 2 zones. The major operational difference between the moves involved a reboot of the global zones. The initial import worked but after the reboot the pools were in a bad state, reporting errors on both drives in the mirror. I exported one (bad choice) and attempted to gain access to the other. Now attempting to import the first pool will panic a solaris/opensolaris box very reliably. The second is in the state I described above. Also, the drive labels are intact according to zdb.

When we don't move pools around, zfs seems to be stable on both Solaris and OpenSolaris. I've done snapshots/rollbacks/sends/receives/clones/... without problems. We even have zvols exported as mirrored luns from an OpenSolaris box. It mirrors the luns that the IBM/NetApp box exports and seems to be doing fine with that. There are a lot of other people that seem to have the same opinion and use zfs with direct attached storage.

-Tim

PS: "when I have a lot of time" I might try to reproduce this by:

    m2# zpool create test mirror iscsi_lun1 iscsi_lun2
    m2# zpool export test
    m1# zpool import -f test
    m1# reboot
    m2# reboot
eric kustarz
2007-Sep-21  20:03 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sep 21, 2007, at 3:50 PM, Tim Spriggs wrote:

> [...]
> Our original model had us exporting/importing pools in order to move
> zone data between machines. We had done the same with UFS on iSCSI
> without a hitch. ZFS worked for about 8 zone moves and then killed 2
> zones. The major operational difference between the moves involved a
> reboot of the global zones. The initial import worked but after the
> reboot the pools were in a bad state reporting errors on both drives in
> the mirror.
> [...]
> PS: "when I have a lot of time" I might try to reproduce this by:
>
> m2# zpool create test mirror iscsi_lun1 iscsi_lun2
> m2# zpool export test
> m1# zpool import -f test
> m1# reboot
> m2# reboot

Since I haven't actually looked into what problem caused your pools to become damaged/lost, I can only guess that it's possibly due to the pool being actively imported on multiple machines (perhaps even accidentally).

If it is that, you'll be happy to note that we specifically no longer allow that to happen (unless you use the -f flag):
http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725

Looks like it just missed the s10u4 cut off, but should be in s10_u5.

In your above example, there should be no reason why you have to use the '-f' flag on import (the pool was cleanly exported) - when you're moving the pool from system to system, forcing the import can get you into trouble if things don't go exactly how you planned.

eric
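
To make the failure mode concrete, here is a toy model of an import guard of the kind described above. It is an assumption-laden sketch, not ZFS code: the `PoolLabel` class, the idea of recording an owning host ID in it, and the function names are all illustrative. It only shows why a forced import defeats the protection while a clean export followed by a plain import does not need it.

```python
from dataclasses import dataclass
from typing import Optional


class PoolInUseError(Exception):
    """Raised when a pool appears to be active on another host."""


@dataclass
class PoolLabel:
    """Toy stand-in for pool metadata shared by every host that sees the LUNs."""
    name: str
    owner_hostid: Optional[int] = None  # None means cleanly exported


def import_pool(label: PoolLabel, hostid: int, force: bool = False) -> None:
    """Claim the pool for hostid; refuse if another host owns it unless forced."""
    if label.owner_hostid not in (None, hostid) and not force:
        raise PoolInUseError(
            f"pool {label.name!r} may be in use by host {label.owner_hostid:#x}"
        )
    label.owner_hostid = hostid


def export_pool(label: PoolLabel, hostid: int) -> None:
    """Release the pool; only the current owner can export it."""
    if label.owner_hostid != hostid:
        raise PoolInUseError(f"pool {label.name!r} is not imported on this host")
    label.owner_hostid = None


M1, M2 = 0x1, 0x2  # hypothetical host IDs for the two machines

pool = PoolLabel("test")
import_pool(pool, M2)            # m2 creates and owns the pool
try:
    import_pool(pool, M1)        # m1 tries to grab it while m2 is active
except PoolInUseError as err:
    print("refused:", err)

export_pool(pool, M2)            # clean export on m2
import_pool(pool, M1)            # plain import on m1 now succeeds, no force
print("owner is now", hex(pool.owner_hostid))
```

With `force=True` the first import attempt from m1 would have succeeded while m2 still considered the pool its own, which is the two-writers situation suspected above.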
Tim Spriggs
2007-Sep-21  20:20 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
eric kustarz wrote:

> On Sep 21, 2007, at 3:50 PM, Tim Spriggs wrote:
>
>> m2# zpool create test mirror iscsi_lun1 iscsi_lun2
>> m2# zpool export test
>> m1# zpool import -f test
>> m1# reboot
>> m2# reboot
>
> Since I haven't actually looked into what problem caused your pools to
> become damaged/lost, I can only guess that it's possibly due to the
> pool being actively imported on multiple machines (perhaps even
> accidentally).
>
> If it is that, you'll be happy to note that we specifically no longer
> allow that to happen (unless you use the -f flag):
> http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725
>
> Looks like it just missed the s10u4 cut off, but should be in s10_u5.
>
> In your above example, there should be no reason why you have to use
> the '-f' flag on import (the pool was cleanly exported) - when you're
> moving the pool from system to system, this can get you into trouble
> if things don't go exactly how you planned.
>
> eric

That's a very possible diagnosis. Even when the pools are exported from one system, they are still marked as attached (thus the -f was necessary). Since I rebooted both systems at the same time I guess it's possible that they both made claim to the pool and corrupted it.

I'm glad this will be fixed in the future.

-Tim
Paul B. Henson
2007-Sep-22  00:15 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, eric kustarz wrote:

> > As far as quotas, I was less than impressed with their implementation.
>
> Would you mind going into more details here?

The feature set was fairly extensive. They supported volume quotas for users or groups, or "qtree" quotas, which, similar to the ZFS quota, would limit space for a particular directory and all of its contents regardless of user/group ownership.

But all quotas were set in a single flat text file. Anytime you added a new quota, you needed to turn off quotas, then turn them back on, and quota enforcement was disabled while it recalculated space utilization.

Like a lot of aspects of the filer, it seemed possibly functional but rather kludgy. I hate kludgy :(. I'd have to go review the documentation to recall the other issues I had with it; quotas were one of the last things we reviewed and I'd about given up taking notes at that point.
Paul B. Henson
2007-Sep-22  00:19 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Andy Lubel wrote:

> Yeah, it's fun to see IBM compete with its OEM provider Netapp.

Yes, we had both IBM and Netapp out as well. I'm not sure what the point was... We do have some IBM SAN equipment on site, I suppose if we had gone with the IBM variant we could have consolidated support.

> > sometimes it's more than just the raw storage... I wish I could just drop
> > in a couple of x4500's and not have to worry about the complexity of
> > clustering <sigh>...
>
> zfs send/receive.

If I understand correctly, that would be sort of a poor man's replication? So you would end up with a physical copy on server2 of all of the data on server1? What would you do when server1 crashed and died? One of the benefits of a real cluster would be the automatic failover, and fail back when the server recovered.
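
For reference, the "poor man's replication" being discussed boils down to a snapshot followed by a send piped into a receive on the other host. The sketch below only builds the command lines as strings and does not run anything; the dataset, snapshot, and host names are hypothetical, and transporting the stream over `ssh` and using `-F` on the receiving side are assumptions about one reasonable setup rather than anything stated in the thread.

```python
import shlex


def replication_commands(dataset, new_snap, target_host, prev_snap=None):
    """Return shell command lines that copy dataset@new_snap to target_host.

    With prev_snap set, an incremental stream (zfs send -i) is produced,
    so only the changes since prev_snap cross the wire.
    """
    snapshot = f"{dataset}@{new_snap}"
    send = ["zfs", "send"]
    if prev_snap is not None:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(snapshot)
    receive = ["ssh", target_host, "zfs", "receive", "-F", dataset]
    return [
        shlex.join(["zfs", "snapshot", snapshot]),
        shlex.join(send) + " | " + shlex.join(receive),
    ]


for line in replication_commands("tank/home", "tue", "server2", prev_snap="mon"):
    print(line)
```

This yields a second copy of the data that lags the source by one snapshot interval. It does not address the question raised above: nothing here detects that server1 has died, redirects clients to server2, or fails back afterwards, which is what a real cluster provides.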
Paul B. Henson
2007-Sep-22  00:25 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, James F. Hranicky wrote:

> > It just seems rather involved, and relatively inefficient to continuously
> > be mounting/unmounting stuff all the time. One of the applications to be
> > deployed against the filesystem will be web service, I can't really
> > envision a web server with tens of thousands of NFS mounts coming and
> > going, seems like a lot of overhead.
>
> Well, that's why ZFS wouldn't work for us :-( .

Although, I'm just saying that from my gut -- does anyone have any actual experience with automounting thousands of file systems? Does it work? Is it horribly inefficient? Poor performance? Resource intensive?

> Makes sense -- in that case you would be looking at multiple SMB servers,
> though.

Yes, with again the resultant problem of worrying about where a user's files are when they want to access them :(.
Paul B. Henson
2007-Sep-22  00:30 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Mike Gerdts wrote:

> MS-DFS could be helpful here. You could have a virtual samba instance
> that generates MS-DFS redirects to the appropriate spot. At one point in

That's true, although I rather detest Microsoft DFS (they stole the acronym from DCE/DFS, even though particularly the initial versions sucked feature-wise in comparison). Also, the current release version of MacOS X does not support CIFS DFS referrals. I'm not sure if the upcoming version is going to rectify that or not. Windows clients not belonging to the domain also occasionally have problems accessing shares across different servers.

Although it is definitely something to consider if I'm going to be unable to achieve my single namespace by having one large server...

Thanks...
Paul B. Henson
2007-Sep-22  00:34 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Tim Spriggs wrote:

> Still, we are using ZFS but we are re-thinking on how to deploy/manage
> it. Our original model had us exporting/importing pools in order to move
> zone data between machines. We had done the same with UFS on iSCSI
[...]
> When we don't move pools around, zfs seems to be stable on both Solaris
> and OpenSolaris. I've done snapshots/rollbacks/sends/receives/clones/...

Sounds like your problems are in an area we probably wouldn't be delving into... Thanks for the detail.
Ed Plese
2007-Sep-22  01:50 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, Sep 20, 2007 at 12:49:29PM -0700, Paul B. Henson wrote:

> > > I was planning to provide CIFS services via Samba. I noticed a posting a
> > > while back from a Sun engineer working on integrating NFSv4/ZFS ACL support
> > > into Samba, but I'm not sure if that was ever completed and shipped either
> > > in the Sun version or pending inclusion in the official version, does
> > > anyone happen to have an update on that? Also, I saw a patch proposing a
> > > different implementation of shadow copies that better supported ZFS
> > > snapshots, any thoughts on that would also be appreciated.
> >
> > This work is done and, AFAIK, has been integrated into S10 8/07.
>
> Excellent. I did a little further research myself on the Samba mailing
> lists, and it looks like ZFS ACL support was merged into the official
> 3.0.26 release. Unfortunately, the patch to improve shadow copy performance
> on top of ZFS still appears to be floating around the technical mailing
> list under discussion.

ZFS ACL support was going to be merged into 3.0.26, but 3.0.26 ended up being a security fix release and the merge got pushed back. The next release will be 3.2.0 and ACL support will be in there.

As others have pointed out though, Samba is included in Solaris 10 Update 4 along with support for ZFS ACLs, Active Directory, and SMF.

The patches for the shadow copy module can be found here:

http://www.edplese.com/samba-with-zfs.html

There are hopefully only a few minor changes that I need to make to them before submitting them again to the Samba team.

I recently compiled the module for someone to use with Samba as shipped with U4 and he reported that it worked well. I've made the compiled module available on this page as well if anyone is interested in testing it.

The patch no longer improves performance, in order to preserve backwards compatibility with the existing module, but it adds usability enhancements for both admins and end-users. It allows shadow copy functionality to "just work" with ZFS snapshots without having to create symlinks to each snapshot in the root of each share. For end-users it allows the "Previous Versions" list to be sorted chronologically to make it easier to use. If performance is an issue, the patch can be modified to improve performance like the original patch did, but this only affects directory listings and is likely negligible in most cases.

> > > Is there any facility for managing ZFS remotely? We have a central identity
> > > management system that automatically provisions resources as necessary for
> [...]
> > This is a loaded question. There is a webconsole interface to ZFS which can
> > be run from most browsers. But I think you'll find that the CLI is easier
> > for remote management.
>
> Perhaps I should have been more clear -- a remote facility available via
> programmatic access, not manual user direct access. If I wanted to do
> something myself, I would absolutely login to the system and use the CLI.
> However, the question was regarding an automated process. For example, our
> Perl-based identity management system might create a user in the middle of
> the night based on the appearance in our authoritative database of that
> user's identity, and need to create a ZFS filesystem and quota for that
> user. So, I need to be able to manipulate ZFS remotely via a programmatic
> API.

While it won't help you in your case since your users access the files using protocols other than CIFS, if you use only CIFS it's possible to configure Samba to automatically create a user's home directory the first time the user connects to the server. This is done using the "root preexec" share option in smb.conf and an example is provided at the above URL.

Ed Plese
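
Whether it is triggered from a "root preexec" hook or from an identity management system, the provisioning step itself comes down to creating a file system and setting a quota on it. The following is a minimal sketch that only builds the `zfs` argument lists; the parent dataset `tank/home`, the default quota, and the username rule are assumptions made for illustration, and how the commands reach the file server (run locally, over ssh, or otherwise) is left open.

```python
import re

# Conservative, hypothetical username rule: rejecting anything else keeps
# slashes, '@' and shell metacharacters out of dataset names.
VALID_USER = re.compile(r"^[a-z][a-z0-9_-]{0,31}$")


def provision_commands(user, parent="tank/home", quota="1G"):
    """Return the argv lists that create a per-user file system with a quota."""
    if not VALID_USER.match(user):
        raise ValueError(f"refusing to provision invalid username: {user!r}")
    dataset = f"{parent}/{user}"
    return [
        ["zfs", "create", dataset],
        ["zfs", "set", f"quota={quota}", dataset],
    ]


for argv in provision_commands("jdoe", quota="500M"):
    print(" ".join(argv))
```

Returning argument lists rather than a shell string means a caller can hand each one to something like `subprocess.run(argv, check=True)` without any quoting concerns, and the validation step matters because the username arrives from an external database.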
Peter Tribble
2007-Sep-22  10:28 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/22/07, Paul B. Henson <henson at acm.org> wrote:
> On Fri, 21 Sep 2007, James F. Hranicky wrote:
>
> > > It just seems rather involved, and relatively inefficient to continuously
> > > be mounting/unmounting stuff all the time. One of the applications to be
> > > deployed against the filesystem will be web service, I can't really
> > > envision a web server with tens of thousands of NFS mounts coming and
> > > going, seems like a lot of overhead.
> >
> > Well, that's why ZFS wouldn't work for us :-( .
>
> Although, I'm just saying that from my gut -- does anyone have any actual
> experience with automounting thousands of file systems? Does it work? Is it
> horribly inefficient? Poor performance? Resource intensive?

Used to do this for years with 20,000 filesystems automounted - each user home directory was automounted separately. Never caused any problems, either with NIS+ or the automounter or the NFS clients and server. And much of the time that was with hardware that would today be antique. So I wouldn't expect any issues on the automounting part. [Except one - see later.]

That was with a relatively small number of ufs filesystems on the server holding the data. When we first got hold of zfs I did try the exercise of one zfs filesystem per user on the server, just to see how it would work. While managing 20,000 filesystems with the automounter was trivial, the attempt to manage 20,000 zfs filesystems wasn't entirely successful. In fact, based on that experience I simply wouldn't go down the road of one user per filesystem.

[There is one issue with automounting a large number of filesystems on a Solaris 10 client. Every mount or unmount triggers SMF activity, and can drive SMF up the wall. We saw one of the svc daemons hog a whole cpu on our mailserver (constantly checking for .forward files in user home directories). This has been fixed, I believe, but only very recently in S10.]
-- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Jonathan Loran
2007-Sep-23  04:59 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul,

My gut tells me that you won't have much trouble mounting 50K file systems with ZFS. But who knows until you try. My question for you is: can you lab this out? You could build a commodity server with a ZFS pool on it. Heck, it could be a small pool, one disk, and then put your 50K file systems on that. Reboot, thrash about, and see what happens. Then the next step would be fooling with the client side of things. If you can get time on a chunk of your existing client systems, see if you can mount a bunch of those 50K file systems smoothly. Off hours, perhaps.

The next problem of course, and to be honest, this may be the killer: test with your name service in the loop. You may need netgroups to delineate permissions for your shares, and to define your automounter maps. In my personal experience, with about 1-2% as many shares and mount points as you need, the name servers get stressed out really fast. There have been some issues around LDAP port reuse in Solaris that can cause some headaches as well, but there are patches to help you too. Also, as you may know, Linux doesn't play well with hundreds of concurrent mount operations. If you use Linux NFS clients in your environment, be sure to lab that out as well.

At any rate, you may indeed be an outlier with so many file systems and NFS mounts, but I imagine many of us are waiting on the edge of our seats to see if you can make it all work. Speaking for myself, I would love to know how ZFS, NFS and LDAP scale up to such a huge system.

Regards,

Jon

Paul B. Henson wrote:
> On Fri, 21 Sep 2007, James F. Hranicky wrote:
>
>>> It just seems rather involved, and relatively inefficient to continuously
>>> be mounting/unmounting stuff all the time. One of the applications to be
>>> deployed against the filesystem will be web service, I can't really
>>> envision a web server with tens of thousands of NFS mounts coming and
>>> going, seems like a lot of overhead.
>>
>> Well, that's why ZFS wouldn't work for us :-( .
>
> Although, I'm just saying that from my gut -- does anyone have any actual
> experience with automounting thousands of file systems? Does it work? Is it
> horribly inefficient? Poor performance? Resource intensive?
>
>> Makes sense -- in that case you would be looking at multiple SMB servers,
>> though.
>
> Yes, with again the resultant problem of worrying about where a user's
> files are when they want to access them :(.

-- 
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  jloran at ssl.berkeley.edu
AST:7731^29u18e3
Richard Elling
2007-Sep-24  17:21 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:
> On Thu, 20 Sep 2007, James F. Hranicky wrote:
>
>> This can be solved using an automounter as well.
>
> Well, I'd say more "kludged around" than "solved" ;), but again unless
> you've used DFS it might not seem that way.
>
> It just seems rather involved, and relatively inefficient to continuously
> be mounting/unmounting stuff all the time. One of the applications to be
> deployed against the filesystem will be web service, I can't really
> envision a web server with tens of thousands of NFS mounts coming and
> going, seems like a lot of overhead.
>
> I might need to pursue a similar route though if I can't get one large
> system to house everything in one place.

I can't imagine a web server serving tens of thousands of pages. I think you should put a more scalable architecture in place, if that is your goal. BTW, there are many companies that do this: Google, Yahoo, etc. In no case do they have a single file system or single server dishing out thousands of sites.
 -- richard
Richard Elling
2007-Sep-24  17:35 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:
> On Fri, 21 Sep 2007, James F. Hranicky wrote:
>
>>> It just seems rather involved, and relatively inefficient to continuously
>>> be mounting/unmounting stuff all the time. One of the applications to be
>>> deployed against the filesystem will be web service, I can't really
>>> envision a web server with tens of thousands of NFS mounts coming and
>>> going, seems like a lot of overhead.
>> Well, that's why ZFS wouldn't work for us :-( .
>
> Although, I'm just saying that from my gut -- does anyone have any actual
> experience with automounting thousands of file systems? Does it work? Is it
> horribly inefficient? Poor performance? Resource intensive?

Yes. Sun currently has over 45,000 users with automounted home directories. I do not know how many servers are involved, though, in part because home directories are highly available services and thus their configuration is abstracted away from the clients. Suffice to say, there is more than one server. Measuring mount performance would vary based on where in the world you were, so it probably isn't worth the effort.
 -- richard
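To make the "abstracted away from the clients" point concrete, here is a minimal sketch of how per-user automounter map entries can hide which server holds a given home directory. It is not a description of Sun's internal setup: the server names, the /export/home path, and the user-to-server table are all invented for illustration, and the output uses the generic "key location" form of an indirect map.

```python
"""Sketch: build indirect automounter map entries so that every home
directory appears under one client path, whichever server holds it."""

# Hypothetical user-to-server assignment; in practice this would come from
# an identity management database or a directory service.
HOME_SERVERS = {
    "alice": "nfs1.example.edu",
    "bob": "nfs2.example.edu",
    "carol": "nfs1.example.edu",
}


def auto_home_entries(assignments, export_root="/export/home"):
    """Return one 'key<TAB>server:path' map line per user, sorted by name."""
    return [
        f"{user}\t{server}:{export_root}/{user}"
        for user, server in sorted(assignments.items())
    ]


if __name__ == "__main__":
    for line in auto_home_entries(HOME_SERVERS):
        print(line)
```

With a map like this, moving a user to another server means changing one map entry; the path clients use stays the same.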
Richard Elling
2007-Sep-24  17:56 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:
> On Thu, 20 Sep 2007, Richard Elling wrote:
>
>> 50,000 directories aren't a problem, unless you also need 50,000 quotas
>> and hence 50,000 file systems. Such a large, single storage pool system
>> will be an outlier... significantly beyond what we have real world
>> experience with.
>
> Yes, considering that 45,000 of those users will be students, we definitely
> need separate quotas for each one :).

Or groups... think long tail.

> Hmm, I get a bit of a shiver down my spine at the prospect of deploying a
> critical central service in a relatively untested configuration 8-/. What
> is the maximum number of file systems in a given pool that has undergone
> some reasonable amount of real world deployment?

Good question. I might have some field data on this, but won't be able to look at it for a month or three. Perhaps someone on the list will brag ;-)

> One issue I have is that our previous filesystem, DFS, completely spoiled
> me with its global namespace and location transparency. We had three fairly
> large servers, with the content evenly dispersed among them, but from the
> perspective of the client any user's files were available at
> /dfs/user/<username>, regardless of which physical server they resided on.
> We could even move them around between servers transparently.
>
> Unfortunately, there aren't really any filesystems available with similar
> features and enterprise applicability. OpenAFS comes closest, we've been
> prototyping that but the lack of per file ACLs bites, and as an add-on
> product we've had issues with kernel compatibility across upgrades.
>
> I was hoping to replicate a similar feel by just having one large file
> server with all the data on it. If I split our user files across multiple
> servers, we would have to worry about which server contained what files,
> which would be rather annoying.
>
> There are some features in NFSv4 that seem like they might someday help
> resolve this problem, but I don't think they are readily available in
> servers and definitely not in the common client.
>
>>> I was planning to provide CIFS services via Samba. I noticed a posting a
>>> while back from a Sun engineer working on integrating NFSv4/ZFS ACL support
>>> into Samba, but I'm not sure if that was ever completed and shipped either
>>> in the Sun version or pending inclusion in the official version, does
>>> anyone happen to have an update on that? Also, I saw a patch proposing a
>>> different implementation of shadow copies that better supported ZFS
>>> snapshots, any thoughts on that would also be appreciated.
>> This work is done and, AFAIK, has been integrated into S10 8/07.
>
> Excellent. I did a little further research myself on the Samba mailing
> lists, and it looks like ZFS ACL support was merged into the official
> 3.0.26 release. Unfortunately, the patch to improve shadow copy performance
> on top of ZFS still appears to be floating around the technical mailing
> list under discussion.
>
>>> Is there any facility for managing ZFS remotely? We have a central identity
>>> management system that automatically provisions resources as necessary for
> [...]
>> This is a loaded question. There is a webconsole interface to ZFS which can
>> be run from most browsers. But I think you'll find that the CLI is easier
>> for remote management.
>
> Perhaps I should have been more clear -- a remote facility available via
> programmatic access, not manual user direct access. If I wanted to do
> something myself, I would absolutely login to the system and use the CLI.
> However, the question was regarding an automated process. For example, our
> Perl-based identity management system might create a user in the middle of
> the night based on the appearance in our authoritative database of that
> user's identity, and need to create a ZFS filesystem and quota for that
> user. So, I need to be able to manipulate ZFS remotely via a programmatic
> API.

I'd argue that it isn't worth the trouble.
	zfs create
	zfs set
is all that would be required. If you are OK with inheritance, zfs create will suffice.

>> Active/passive only. ZFS is not supported over pxfs and ZFS cannot be
>> mounted simultaneously from two different nodes.
>
> That's what I thought, I'll have to get back to that SE. Makes me wonder as
> to the reliability of his other answers :).
>
>> For most large file servers, people will split the file systems across
>> servers such that under normal circumstances, both nodes are providing
>> file service. This implies two or more storage pools.
>
> Again though, that would imply two different storage locations visible to
> the clients? I'd really rather avoid that. For example, with our current
> Samba implementation, a user can just connect to
> '\\files.csupomona.edu\<username>' to access their home directory or
> '\\files.csupomona.edu\<groupname>' to access a shared group directory.
> They don't need to worry on which physical server it resides or determine
> what server name to connect to.

Yes, that sort of abstraction is achievable using several different technologies. In general, such services aren't scalable for a single server.

>> The SE is mistaken. Sun^H^Holaris Cluster supports a wide variety of
>> JBOD and RAID array solutions. For ZFS, I recommend a configuration
>> which allows ZFS to repair corrupted data.
>
> That would also be my preference, but if I were forced to use hardware
> RAID, the additional loss of storage for ZFS redundancy would be painful.
>
> Would anyone happen to have any good recommendations for an enterprise
> scale storage subsystem suitable for ZFS deployment? If I recall correctly,
> the SE we spoke with recommended the StorageTek 6140 in a hardware raid
> configuration, and evidently mistakenly claimed that Cluster would not work
> with JBOD.

Any. StorageTek products preferred, of course.
 -- richard
Paul B. Henson
2007-Sep-24  21:47 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Ed Plese wrote:
> ZFS ACL support was going to be merged into 3.0.26 but 3.0.26 ended up
> being a security fix release and the merge got pushed back. The next
> release will be 3.2.0 and ACL support will be in there.

Arg, you're right, I based that on the mailing list posting:

	http://marc.info/?l=samba-technical&m=117918697907120&w=2

but checking the actual release notes shows no ZFS mention. 3.0.26 to 3.2.0? That seems an odd version bump...

> As others have pointed out though, Samba is included in Solaris 10
> Update 4 along with support for ZFS ACLs, Active Directory, and SMF.

I usually prefer to use the version directly from the source, but depending on the timeliness of the release of 3.2.0 maybe I'll have to make an exception. SMF I know is the new Solaris service management framework replacing /etc/init.d scripts, but what additional Active Directory support does the Sun-branded Samba include over stock?

> The patches for the shadow copy module can be found here:
>
> http://www.edplese.com/samba-with-zfs.html

Ah, I thought I recognized your name :), I came across that page while researching ZFS. Thanks for your work on that patch; hopefully it will be accepted into mainstream Samba soon.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-24  21:51 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sat, 22 Sep 2007, Peter Tribble wrote:
> filesystem per user on the server, just to see how it would work. While
> managing 20,000 filesystems with the automounter was trivial, the attempt
> to manage 20,000 zfs filesystems wasn't entirely successful. In fact,
> based on that experience I simply wouldn't go down the road of one user
> per filesystem.

Really? Could you provide further detail about what problems you experienced? Our current filesystem based on DFS effectively utilizes a separate filesystem per user (although in DFS terminology they are called filesets), and we've never had a problem managing them.

> directories). This has been fixed, I believe, but only very recently in
> S10.]

As long as the fix has been included in U4 we should be good...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-24  21:58 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sat, 22 Sep 2007, Jonathan Loran wrote:
> My gut tells me that you won't have much trouble mounting 50K file
> systems with ZFS. But who knows until you try. My questions for you is
> can you lab this out?

Yeah, after this research phase has been completed, we're going to have to go into a prototyping phase. I should be able to get funding for a half dozen or so x4100 systems to play with. We standardized on those systems for our Linux deployment.

> test with your name service in the loop. You may need netgroups to
> delineate permissions for your shares, and to define your automounter
> maps.

We're planning to use NFSv4 with Kerberos authentication, so shouldn't need netgroups. Tentatively I think I'd put automounter maps in LDAP, although doing so for both Solaris and Linux at the same time based on a little quick research seems possibly problematic.

> Also, as you may know, Linux doesn't play well with hundreds of
> concurrent mount operations. If you use Linux NFS clients in your
> environment, be sure to lab that out as well.

I didn't know that -- we're currently using RHEL 4 and Gentoo distributions for a number of services. I've done some initial testing of NFSv4, but never tried lots of simultaneous mounts...

> At any rate, you may indeed be an outlier with so many file systems and
> NFS mounts, but I imagine many of us are waiting on the edge of our seats
> to see if you can make it all work. Speaking for my self, I would love
> to know how ZFS, NFS and LDAP scale up to such a huge system.

I don't necessarily mind being a pioneer, but not on this particular project -- it has a rather high visibility and it would not be good for it to blow chunks after deployment when use starts scaling up 8-/.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Mike Gerdts
2007-Sep-24  22:04 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/24/07, Paul B. Henson <henson at acm.org> wrote:
> but checking the actual release notes shows no ZFS mention. 3.0.26 to
> 3.2.0? That seems an odd version bump...

3.0.x and before are GPLv2. 3.2.0 and later are GPLv3.

http://news.samba.org/announcements/samba_gplv3/

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Paul B. Henson
2007-Sep-24  22:10 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Mon, 24 Sep 2007, Richard Elling wrote:
> I can't imagine a web server serving tens of thousands of pages. I think
> you should put a more scalable architecture in place, if that is your
> goal. BTW, there are many companies that do this: google, yahoo, etc.
> In no case do they have a single file system or single server dishing out
> thousands of sites.

Our current implementation already serves tens of thousands of pages, and it's for the most part running on 8-10 year old hardware. We have three core DFS servers housing files, and three web servers serving content. The only time we've ever had a problem was once when we got Slashdot'd by a staff member's personal project:

	http://www.csupomona.edu/~jelerma/springfield/map/index.html

Other than that, it's been fine. I can't imagine brand-new hardware running shiny new filesystems couldn't handle the same load that 10-year-old hardware has been handling. Although arguably, considering I can't find anything equivalent feature-wise to DFS, perhaps the current offerings aren't equivalent scalability-wise either :(...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-24  22:12 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Mon, 24 Sep 2007, Richard Elling wrote:
> Yes. Sun currently has over 45,000 users with automounted home
> directories. I do not know how many servers are involved, though, in part
> because home directories are highly available services and thus their
> configuration is abstracted away from the clients.

Hmm, highly available home directories -- that sounds like what I'm looking for ;).

Any other Sun employees on the list that might be able to provide further details of the internal Sun ZFS/NFS automounted home directory implementation?

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-24  22:15 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Mon, 24 Sep 2007, Richard Elling wrote:
> > Perhaps I should have been more clear -- a remote facility available via
> > programmatic access, not manual user direct access. If I wanted to do
>
> I'd argue that it isn't worth the trouble.
> 	zfs create
> 	zfs set
> is all that would be required. If you are ok with inheritance, zfs create
> will suffice.

Well, considering that some days we automatically create accounts for thousands of students, I wouldn't want to be the one stuck typing 'zfs create' a thousand times 8-/. And that still wouldn't resolve our requirement for our help desk staff to be able to manage quotas through our existing identity management system.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
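For what it's worth, the two positions here are not far apart: the "zfs create" and "zfs set" pair can be wrapped in a small script so nobody types it a thousand times. The following is only a sketch under assumptions. The parent dataset tank/home, the default quota, and the use of ssh to reach the file server are invented for illustration, and nothing here is an official ZFS remote-management API. It builds the command lines and, by default, only prints them.

```python
"""Sketch: batch-generate 'zfs create' / 'zfs set quota' commands for new
users and optionally run them on a file server over ssh."""
import re
import shlex
import subprocess

# Conservative account-name pattern so a name can never inject shell syntax.
_VALID_NAME = re.compile(r"[a-z_][a-z0-9_-]*")


def provision_commands(usernames, parent="tank/home", quota="1G"):
    """Return argv lists: one 'zfs create' and one 'zfs set quota' per user.

    'tank/home' and '1G' are hypothetical defaults, not values from the thread.
    """
    cmds = []
    for name in usernames:
        if not _VALID_NAME.fullmatch(name):
            raise ValueError(f"refusing unsafe user name: {name!r}")
        dataset = f"{parent}/{name}"
        cmds.append(["zfs", "create", dataset])
        cmds.append(["zfs", "set", f"quota={quota}", dataset])
    return cmds


def run(cmds, host=None, dry_run=True):
    """Print the commands (dry run) or execute them, remotely if host is set."""
    for cmd in cmds:
        full = ["ssh", host, shlex.join(cmd)] if host else cmd
        if dry_run:
            print(shlex.join(full))
        else:
            subprocess.run(full, check=True)


if __name__ == "__main__":
    run(provision_commands(["jdoe", "asmith"]), host="fileserver.example.edu")
```

A help desk quota change would be the same "zfs set quota=..." call with a different value, so an identity management system could reuse the same path for both tasks.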
Jonathan Loran
2007-Sep-24  22:20 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:
> On Sat, 22 Sep 2007, Jonathan Loran wrote:
>
>> My gut tells me that you won't have much trouble mounting 50K file
>> systems with ZFS. But who knows until you try. My questions for you is
>> can you lab this out?
>
> Yeah, after this research phase has been completed, we're going to have to
> go into a prototyping phase. I should be able to get funding for a half
> dozen or so x4100 systems to play with. We standardized on those systems
> for our Linux deployment.
>
>> test with your name service in the loop. You may need netgroups to
>> delineate permissions for your shares, and to define your automounter
>> maps.
>
> We're planning to use NFSv4 with Kerberos authentication, so shouldn't need
> netgroups. Tentatively I think I'd put automounter maps in LDAP, although
> doing so for both Solaris and Linux at the same time based on a little
> quick research seems possibly problematic.

We finally got autofs maps via LDAP working smoothly with both Linux (CentOS 4.x and 5.x) and Solaris (8, 9, 10). It took a lot of trial and error. We settled on the Fedora Directory Server, because that worked across the board. I'm not the admin who did the leg work on that though, so I can't really comment as to where we ran into problems. If you want, I can find out more on that and respond off the list.

>> Also, as you may know, Linux doesn't play well with hundreds of
>> concurrent mount operations. If you use Linux NFS clients in your
>> environment, be sure to lab that out as well.
>
> I didn't know that -- we're currently using RHEL 4 and Gentoo distributions
> for a number of services. I've done some initial testing of NFSv4, but
> never tried lots of simultaneous mounts...

Sort of an old problem, but using the insecure option in your exports/shares and mount options helps. May have been patched by now though. Too much Linux talk for this list ;)

>> At any rate, you may indeed be an outlier with so many file systems and
>> NFS mounts, but I imagine many of us are waiting on the edge of our seats
>> to see if you can make it all work. Speaking for my self, I would love
>> to know how ZFS, NFS and LDAP scale up to such a huge system.
>
> I don't necessarily mind being a pioneer, but not on this particular
> project -- it has a rather high visibility and it would not be good for it
> to blow chunks after deployment when use starts scaling up 8-/.

Good luck.

-- 
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  jloran at ssl.berkeley.edu
AST:7731^29u18e3
Dale Ghent
2007-Sep-25  04:12 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sep 24, 2007, at 6:15 PM, Paul B. Henson wrote:
> Well, considering that some days we automatically create accounts for
> thousands of students, I wouldn't want to be the one stuck typing 'zfs
> create' a thousand times 8-/. And that still wouldn't resolve our
> requirement for our help desk staff to be able to manage quotas
> through our existing identity management system.

Not to sway you away from ZFS/NFS considerations, but I'd like to add that people who in the past used DFS typically went on to replace it with AFS. Have you considered it?

/dale
James F. Hranicky
2007-Sep-25  14:47 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:
> But all quotas were set in a single flat text file. Anytime you added a new
> quota, you needed to turn off quotas, then turn them back on, and quota
> enforcement was disabled while it recalculated space utilization.

I believe in later versions of the OS 'quota resize' did this without the massive recalculation.

Jim
Peter Tribble
2007-Sep-25  21:27 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/24/07, Paul B. Henson <henson at acm.org> wrote:
> On Sat, 22 Sep 2007, Peter Tribble wrote:
>
> > filesystem per user on the server, just to see how it would work. While
> > managing 20,000 filesystems with the automounter was trivial, the attempt
> > to manage 20,000 zfs filesystems wasn't entirely successful. In fact,
> > based on that experience I simply wouldn't go down the road of one user
> > per filesystem.
>
> Really? Could you provide further detail about what problems you
> experienced? Our current filesystem based on DFS effectively utilizes a
> separate filesystem per user (although in DFS terminology they are called
> filesets), and we've never had a problem managing them.

This was some time ago (a very long time ago, actually). There are two fundamental problems:

1. Each zfs filesystem consumes kernel memory. Significant amounts: 64K is what we worked out at the time. For normal numbers of filesystems that's not a problem; multiply it by tens of thousands and you start to hit serious resource usage.

2. The zfs utilities didn't scale well as the number of filesystems increased.

I just kept on issuing zfs create until the machine had had enough. It got through the first 10,000 without too much difficulty (as I recall that took several hours), but soon got bogged down after that, to the point where it took a day to do anything. At which point (at about 15,000 filesystems on a 1G system) it ran out of kernel memory and died. After that it wouldn't even boot.

I know that some work has gone into improving the performance of the utilities, and things like in-kernel sharetab (we never even tried to share all those filesystems) are there to improve scalability. Perhaps I should find a spare machine and try repeating the experiment.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
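As a quick sanity check on the first point, the arithmetic can be sketched as follows. It assumes the 64K-per-filesystem figure quoted above holds, read as 64 KiB, and that it scales linearly; both are assumptions, and the figure was described as old even at the time of writing.

```python
"""Sketch: kernel memory implied by a fixed per-filesystem cost."""

PER_FS_BYTES = 64 * 1024  # the "64K" estimate from the message, taken as 64 KiB


def kernel_memory_gib(filesystems, per_fs_bytes=PER_FS_BYTES):
    """Return the total per-filesystem kernel memory in GiB."""
    return filesystems * per_fs_bytes / 1024**3


if __name__ == "__main__":
    for count in (15_000, 20_000, 50_000):
        print(f"{count:>6} filesystems -> {kernel_memory_gib(count):.2f} GiB")
```

Under those assumptions, 15,000 filesystems come to roughly 0.92 GiB, which is consistent with a 1G machine running out of kernel memory at about that point, and 50,000 filesystems come to roughly 3.05 GiB.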
Paul B. Henson
2007-Sep-25  22:43 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Mon, 24 Sep 2007, Dale Ghent wrote:
> Not to sway you away from ZFS/NFS considerations, but I'd like to add
> that people who in the past used DFS typically went on to replace it with
> AFS. Have you considered it?

You're right, AFS is the first choice coming to mind when replacing DFS. We actually implemented an OpenAFS prototype last year and have been running it for internal use only since then.

Unfortunately, like almost everything we've looked at, AFS is a step backwards from DFS. As the precursor to DFS, AFS has enough similarities to DFS to make the features it lacks almost more painful.

The lack of per-file access control lists is a serious bummer. Integration with Kerberos 5 rather than the internal kaserver is still at a bit of a duct tape level, and only supports DES. Having to maintain an additional repository of user/group information (pts) is a bit of a pain; while there are long-term goals to replace that with some type of LDAP integration, I don't see that happening anytime soon.

One of the most annoying things is that AFS requires integration at the kernel level, yet is not maintained by the same people that maintain the kernel. Frequently a Linux kernel upgrade will break AFS, and the developers need to scramble to release a patch or update to resolve it. While we are not currently using AFS under Solaris, based on mailing list traffic similar issues arise. One of the benefits of NFSv4 is that it is a core part of the operating system, unlikely to be lightly broken during updates.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul B. Henson
2007-Sep-25  22:47 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Tue, 25 Sep 2007, Peter Tribble wrote:
> This was some time ago (a very long time ago, actually). There are two
> fundamental problems:
>
> 1. Each zfs filesystem consumes kernel memory. Significant amounts, 64K
> is what we worked out at the time. For normal numbers of filesystems that's
> not a problem; multiply it by tens of thousands and you start to hit serious
> resource usage.

Every server we've bought for about the last year came with 4 GB of memory; the servers we would deploy for this would have at least 8 GB, if not 16 GB. Given the downtrend in memory prices, hopefully memory would not be an issue.

> 2. The zfs utilities didn't scale well as the number of filesystems
> increased.
[...]
> share all those filesystems) are there to improve scalability. Perhaps
> I should find a spare machine and try repeating the experiment.

There have supposedly been lots of improvements in scalability, based on my review of mailing list archives. If you do find the time to experiment again, I'd appreciate hearing what you find...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Vincent Fox
2007-Sep-26  00:32 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
> The SE also told me that Sun Cluster requires hardware raid, which
> conflicts with the general recommendation to feed ZFS raw disk. It seems
> such a configuration would either require configuring zdevs directly on the
> raid LUNs, losing ZFS self-healing and checksum correction features, or
> losing space to not only the hardware raid level, but a partially redundant
> ZFS level as well. What is the general consensus on the best way to deploy
> ZFS under a cluster using hardware raid?

I have a pair of 3510FC units, each exporting two RAID-5 (5-disk) LUNs.

On the T2000 I map a LUN from each array into a mirror set, then add the second set the same way into the ZFS pool. I guess it's RAID-5+1+0. Yes, we have a multipath SAN setup too.

e.g.

{cyrus1:vf5:133} zpool status -v
  pool: ms1
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        ms1                                        ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600C0FF0000000000A73D97F16461700d0  ONLINE       0     0     0
            c4t600C0FF0000000000A719D7C1126E500d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600C0FF0000000000A73D94517C4A900d0  ONLINE       0     0     0
            c4t600C0FF0000000000A719D38B93FD200d0  ONLINE       0     0     0

errors: No known data errors

Works great. Nothing beats having an entire 3510FC down and never having users notice there is a problem. I was replacing a controller in the second array and goofed up my cabling, taking the entire array offline. Not a hiccup in service, although I could see the problem in zpool status. I sorted everything out, plugged it up right, and everything was fine.

I like very much that the 3510 knows it has a global spare that is used for that array, and having that level of things handled locally. In ZFS, AFAICT, there is no way to specify what affinity a spare has, so if a spare from one array went hot to replace a disk in the other array, it would become an undesirable dependency.

This message posted from opensolaris.org
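The space cost of layering a ZFS mirror on top of hardware RAID-5, which was the concern raised in the quoted question, can be estimated with a little arithmetic. This is a rough sketch that assumes equal-sized disks, one parity disk's worth of space per RAID-5 LUN, and a two-way ZFS mirror across arrays as in the layout above; global spares and filesystem overhead are ignored.

```python
"""Sketch: usable fraction of raw disk for mirrored RAID-5 LUNs."""


def usable_fraction(disks_per_lun=5, parity_disks=1, mirror_ways=2):
    """Fraction of raw capacity left after RAID-5 parity and ZFS mirroring."""
    raid5_fraction = (disks_per_lun - parity_disks) / disks_per_lun
    return raid5_fraction / mirror_ways


def usable_disks(total_disks, **layout):
    """Usable capacity expressed in whole-disk equivalents."""
    return total_disks * usable_fraction(**layout)


if __name__ == "__main__":
    # Two arrays, each exporting two 5-disk RAID-5 LUNs: 20 disks in LUNs.
    print(f"usable fraction: {usable_fraction():.0%}")
    print(f"usable disks out of 20: {usable_disks(20):.0f}")
```

For four 5-disk LUNs mirrored in pairs, that works out to 40% of raw capacity, or about 8 disks' worth of space from 20 disks. That is the price paid for surviving the loss of an entire array.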
Vincent Fox
2007-Sep-26  00:39 UTC
[zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
> We need high availability, so are looking at Sun Cluster. That seems to add
> an extra layer of complexity <sigh>, but there's no way I'll get signoff on
> a solution without redundancy. It would appear that ZFS failover is
> supported with the latest version of Solaris/Sun Cluster? I was speaking
> with a Sun SE who claimed that ZFS would actually operate active/active in
> a cluster, simultaneously writable by both nodes. From what I had read, ZFS
> is not a cluster file system, and would only operate in the active/passive
> failover capacity. Any comments?

The SE is not correct. There are relatively few applications in Sun Cluster that run as scalable services. Most of them are "failover". ZFS is definitely not a global file system, so that's one problem. And NFS is a failover service.

This can actually be an asset to you. Think of it this way: you have a KNOWN capacity. You do not have to worry that a failure of one node at peak leaves you crippled.

Also, have you ever had Sun patches break things? We certainly have enough scars from that. You can patch the idle node, do a cluster switch so it's now the active node, and verify function for a few days before patching the other node. If a problem crops up due to some new patch, you switch back the other way until you sort that out.