Hi there,

I am working on the storage solution for our Cybera cloud in Alberta,
Canada. After testing several distributed file systems, including
MooseFS, Tahoe and Lustre, I concluded that Lustre is the best among
them. However, Lustre was originally designed to target HPC clusters,
i.e., systems on a single LAN. The cloud we are building, on the other
hand, is physically distributed across different cities in the province
of Alberta. I did a preliminary test of Lustre between the two
universities in Calgary and Edmonton, and the performance was
impressively good, partly due to the fast network we run in the
province. I also know that Lustre can use Kerberos for secure
authentication, which is critical in a WAN environment.

All that said, I would still like to hear some insider thoughts on the
possibility of implementing Lustre as a distributed storage solution for
a physically distributed cloud system.

Thank you very much.

--
Shi Jin, Ph.D.
Cloud Architect
Cybera Inc.
3-43 Computing Science Center
University of Alberta
Edmonton, AB, Canada
780-232-7681
shi.jin at cybera.ca
http://www.ualberta.ca/~sjin1/
On Mar 26, 2009 07:53 -0700, Shi Jin wrote:
> [ ... ] I
> also know that Lustre can use Kerberos to do secure authentication,
> which is critical in a WAN environment.
>
> Everything said, I still would like to hear some insider thoughts on
> the possibility of implementing Lustre as a distributed storage solution
> for a physically distributed cloud system.

This is very similar to the environment that is being used at Indiana
University. They have the Lustre servers at a central site, but several
labs/campuses in other cities are mounting the filesystem, and they can
saturate 10GigE links between the sites.

Note that Kerberos is not yet available in a production release, though
a preview release is available in CVS (use the v1_9_166 tag). Until
Kerberos is available, you should use a VPN or a physically secure link
for the WAN connection.

Depending on your level of commitment to deploying a Lustre solution,
there is the annual Lustre User Group in San Francisco on April 16-17,
which many Lustre customers will be attending.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> On Mar 26, 2009 07:53 -0700, Shi Jin wrote:
>> [ ... ]
>> Everything said, I still would like to hear some insider thoughts on
>> the possibility of implementing Lustre as a distributed storage solution
>> for a physically distributed cloud system.
>
> This is very similar to the environment that is being used at Indiana
> University. They have the Lustre servers at a central site, but
> several labs/campuses in other cities are mounting the filesystem and
> they can saturate 10GigE links between the sites.
>
> [ ... ]
>
> Depending on your level of commitment to deploying a Lustre solution,
> there is the annual Lustre User Group in San Francisco on April 16, 17,
> at which many Lustre customers will be attending.

We're experimenting with a similar environment at the University of
Florida, both across our campus research network (with storage
centralized at the UF HPC Center) and across the state via the Florida
Lambda Rail.
I'll be at the Lustre User Group meeting and am hoping to share
experiences with folks who have tried Lustre over a WAN.

Regarding the WAN work, in a nutshell: we've also been impressed with
some of the results we've had. There are some open issues I'd love to
hear how people are tackling: management issues like UID/GID domains
over a WAN, as well as helpful tunings. In particular, one application
we've tried over the WAN didn't fare very well out of the box: it does
lots of small random reads (read ~4k, seek a bunch, read ~4k, etc., ad
nauseam).

Cheers,
Craig
---
Craig Prescott
UF HPC Center
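A rough model shows why this access pattern suffers over a WAN. When each ~4 KiB read must complete before the next one is issued, a single stream's throughput is bounded by request size divided by round-trip time, however fast the link is. This is a back-of-envelope sketch only. It assumes a purely synchronous, single-stream pattern with no help from client caching or readahead. The RTT values are hypothetical illustration values, not measurements from UF, Cybera or Indiana.

```python
# Back-of-envelope model of synchronous small reads over a high-latency link.
# Assumption: the application waits for each reply before issuing the next
# request (one outstanding request at a time, no caching/readahead benefit).
# The RTT values below are hypothetical, chosen only for illustration.


def sync_read_throughput(request_bytes: int, rtt_s: float, link_bytes_per_s: float) -> float:
    """Upper bound on throughput (bytes/s) for back-to-back synchronous reads."""
    # Each request costs one full round trip plus the time to transmit its payload.
    per_request_s = rtt_s + request_bytes / link_bytes_per_s
    return request_bytes / per_request_s


LINK_10GBE = 10e9 / 8  # 10 Gbit/s in bytes/s (ignores protocol overhead)

if __name__ == "__main__":
    cases = [
        ("LAN, 0.1 ms RTT (hypothetical)", 0.1e-3),
        ("WAN, 5 ms RTT (hypothetical)", 5e-3),
        ("WAN, 20 ms RTT (hypothetical)", 20e-3),
    ]
    for label, rtt in cases:
        bw = sync_read_throughput(4096, rtt, LINK_10GBE)
        print(f"{label}: {bw / 1e6:.2f} MB/s per synchronous 4 KiB stream")
```

Under these assumptions, a 5 ms round trip caps one synchronous 4 KiB stream at well under 1 MB/s, even on a 10 Gbit/s link. Only issuing many reads concurrently or reading larger chunks moves that bound, which is why the link's bandwidth barely matters for this workload.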
[ ... ]
>>> Lustre is originally designed to target at HPC clusters,
>>> i.e., systems on a single LAN environment.

It is not so much a single LAN as streaming workloads and low latency.

>>> On the other hand, the cloud we are building is physically
>>> distributed at different cities in the province of Alberta.
>>> [ ... ] performance is impressively good, partly due to the
>>> fast network we are running in the province.

The question here is whether the *clients* and/or the *servers* are
physically distributed. If the servers are physically distributed, then
what are the resilience requirements? That largely comes down to the
redundancy strategy for the underlying storage, and to whether files are
striped across OSSes at different sites. In the example below the
servers are centralized, but maybe this is not what you mean by a
"cloud".

>> This is very similar to the environment that is being used at
>> Indiana University. They have the Lustre servers at a
>> central site, but several labs/campuses in other cities are
>> mounting the filesystem and they can saturate 10GigE links
>> between the sites.

That is quite plausible, but the relevant performance depends on whether
the access patterns are streaming or not, and on doing some decent TCP
setup to maximize link utilization.

> [ ... ] one application we've tried over the WAN didn't fare
> very well out of the box - lots of small random reads (read
> ~4k, seek a bunch, read ~4k, etc ad nauseam).

That looks more like some people having unrealistic expectations about
latency and synchronous IO than a problem specific to Lustre, even if
Lustre is mostly targeted at streaming workloads on low-latency networks
(and is not too bad in other circumstances).
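The "decent TCP setup" point largely comes down to the bandwidth-delay product (BDP). A single TCP connection can have at most one window of unacknowledged data in flight per round trip, so keeping a link full requires a window of at least bandwidth * RTT. The sketch below is arithmetic only. The 5 ms inter-site RTT and the window sizes are hypothetical illustration values, not figures from any of the sites mentioned in this thread.

```python
# Bandwidth-delay product (BDP) arithmetic for sizing TCP windows on a WAN.
# Assumptions (hypothetical): a 10 Gbit/s link and a 5 ms round-trip time.


def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link busy for one RTT."""
    return link_bits_per_s / 8 * rtt_s


def max_tcp_throughput_bits(window_bytes: int, rtt_s: float) -> float:
    """A single connection sends at most one window per round trip."""
    return window_bytes * 8 / rtt_s


LINK = 10e9  # 10GigE, in bits/s
RTT = 5e-3   # hypothetical inter-city round-trip time, in seconds

if __name__ == "__main__":
    need = bdp_bytes(LINK, RTT)
    print(f"BDP at 10 Gbit/s and 5 ms RTT: {need / 2**20:.1f} MiB")
    for window in (64 * 1024, 4 * 2**20, 8 * 2**20):
        # The link itself caps throughput once the window exceeds the BDP.
        rate = min(max_tcp_throughput_bits(window, RTT), LINK)
        print(f"window {window // 1024} KiB -> at most {rate / 1e9:.2f} Gbit/s")
```

With these assumptions the BDP is about 6 MiB, so a 64 KiB window would limit a single connection to roughly 105 Mbit/s. On Linux, raising the kernel's TCP buffer limits (for example the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem sysctls) is the usual starting point. Whether further Lustre-level settings apply depends on the release, so check the documentation for your version.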