Here's a good question for the list to ponder over the holiday weekend.

We've got a new cluster system in the works. It will be a heavy lustre user; our plan is to use lustre as the rootfs and as the primary storage for applications. I've been thinking through what configuration options we'll need to deal with for connecting the core of the system to the storage array(s) with the FC HBAs we're qualifying, since I'm going to need to be able to recommend at least one configuration.

It seems to me there are a couple of obvious strategies, each with many sub-options underneath it: either a bunch of essentially point-to-point links from server nodes to storage controllers, or a more traditional "cloud" implemented with some number of FC switches.

If one views the set of storage as one big filesystem, it's easy to think of it sort of like a SAN and decide that the right way to set it up is to stick some switches on the front and plug all the servers in there. But the way lustre uses the OSTs really isn't SAN-like at all; the metaphor is much closer to disk drives attached to individual servers. And indeed, the literature doesn't talk about anything mix-and-match-like; it talks about individual (point-to-point) connections from the OSSs to the OSTs. So to a first approximation, it seems much more sensible to structure the FC that way, i.e. just plug the HBAs straight into the controllers (not forgetting the criss-cross redundant connections to support failover) and be done with it.

A few people I've talked to (at least some from SAN backgrounds) have said many customers will want to structure it like a SAN because that's what they're used to thinking about. Separate from that, it's not unlikely that we will run into situations where customers have existing SANs, which may or may not be shared with other systems, and want to use them as the backing store for our lustre filesystems.

So I'd like to hear what folks are doing.

1. Does it make sense to structure the OSS<->OST connections as point-to-point
   links, or am I missing something?

2. Have you run into customer situations where they have an existing SAN that
   they want to run lustre on?

   a. If yes, what issues are involved in configuring it to appear as a bunch
      of disjoint luns and getting lustre set up on it?

   b. If no, and they instead tend to buy dedicated storage for lustre, do
      they still want to set it up like a SAN even though it isn't?

3. Starting from first principles, when talking about a new deployment, what
   do you recommend, and why?

Thanks in advance, and I hope you're all home with family rather than reading this list :-}
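P.S. For concreteness, here's roughly what I have in mind for the point-to-point layout, criss-cross links included. Node names and port assignments are just illustrative; assume two-port HBAs and dual-ported controllers:

    OSS1 hba port0 ---- ctlr A port0   (OSS1 normally serves A's LUNs)
    OSS1 hba port1 ---- ctlr B port1   (standby path, used on failover)
    OSS2 hba port0 ---- ctlr B port0   (OSS2 normally serves B's LUNs)
    OSS2 hba port1 ---- ctlr A port1   (standby path, used on failover)

Each OSS can see both controllers' LUNs, but only serves its own set unless its partner dies.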
Hi John,

On Fri, Dec 22, 2006 at 02:05:35PM -0500, John R. Dunning wrote:

> 1. Does it make sense to structure the OSS<->OST connections as point-to-point
>    links, or am I missing something?

Absolutely. This is, in fact, our recommended configuration. Doing it this way is less complex and has fewer points of failure compared to a SAN-like solution.

As you've probably figured out, the only thing Lustre requires is that storage be shared between two OSSes (and MDSes) for failover purposes. Since most FC controllers are (at least) dual-ported, this can be done without FC switches.

> 2. Have you run into customer situations where they have an existing SAN that
>    they want to run lustre on?
>
>    a. If yes, what issues are involved in configuring it to appear as a bunch
>       of disjoint luns and getting lustre set up on it?
>
>    b. If no, and they instead tend to buy dedicated storage for lustre, do
>       they still want to set it up like a SAN even though it isn't?

Yes, and I'm not aware of any issues. SAN administrators are usually good at setting things like this up :)

> 3. Starting from first principles, when talking about a new deployment, what
>    do you recommend, and why?

Keep it as simple as possible - have as few components as you can. Not using FC switches fits well with this strategy. Also, create LUNs that are as close to the 8 TB limit as possible so you have fewer OSTs. In general, Lustre doesn't care about OST count, but fewer OSTs will make things slightly easier to manage and reduce your chances of running into full-OST problems.

Cheers,
Jody
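P.S. In case it's useful for your docs: with the new 1.6-style tools, the shared-storage requirement shows up as a --failnode option when you format the target. A minimal sketch - fsname, node names, NIDs, and devices are all made up for illustration:

    # On oss1: format an OST on the controller-A LUN, naming oss2 as
    # the failover server for this target.
    mkfs.lustre --fsname=testfs --ost --mgsnode=mgs@tcp0 \
        --failnode=oss2@tcp0 /dev/sdb

    # Only one OSS mounts the OST at any given time.  Normally:
    mount -t lustre /dev/sdb /mnt/ost0     # on oss1

    # If oss1 fails, the failover software mounts the same LUN
    # (reached over the criss-cross link) on oss2 instead.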
From: Jody McIntyre <scjody@clusterfs.com>
Date: Wed, 27 Dec 2006 08:42:42 -0500

> Hi John,
>
> On Fri, Dec 22, 2006 at 02:05:35PM -0500, John R. Dunning wrote:
> > 1. Does it make sense to structure the OSS<->OST connections as
> >    point-to-point links, or am I missing something?
>
> Absolutely. This is, in fact, our recommended configuration. Doing it this
> way is less complex and has fewer points of failure compared to a SAN-like
> solution.
>
> As you've probably figured out, the only thing Lustre requires is that
> storage be shared between two OSSes (and MDSes) for failover purposes.
> Since most FC controllers are (at least) dual-ported, this can be done
> without FC switches.

Right. In fact, the HBA we're qualifying is a QLogic 2462 two-port 4Gb unit, which seems like it should be exactly right for this kind of application.

> > 2. Have you run into customer situations where they have an existing SAN
> >    that they want to run lustre on?
>
> Yes, and I'm not aware of any issues. SAN administrators are usually good
> at setting things like this up :)

Well, ok, so is it a fair statement to say that the recommended configuration is point-to-point, but if a customer, for whatever reason, wants to use a SAN-like topology, there's no real downside as long as the host-side software is allowed to treat it as if it were point-to-point? What I'm really after here is what we should put in our documentation, and what to point users to when they're trying to work out how to set up their systems.

> > 3. Starting from first principles, when talking about a new deployment,
> >    what do you recommend, and why?
>
> Keep it as simple as possible - have as few components as you can.

Agreed.

> Not using FC switches fits well with this strategy. Also, create LUNs that
> are as close to the 8 TB limit as possible so you have fewer OSTs.

Sure. In our case, it's unlikely that we'll get anywhere near 8TB/OST, because we expect to be far more concerned about bandwidth than capacity. But I understand the point; aside from anything else, I want to be able to recommend the simplest configuration that will get people the results they're looking for, and part of that means identifying the smallest set of controllers and such that will do the job.

Thanks...
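P.S. For anyone curious why bandwidth rather than capacity dominates the sizing for us, the back-of-envelope looks something like this (all numbers illustrative, not a spec):

    4 Gb/s FC link ~= 400 MB/s of payload after 8b/10b encoding
    e.g. a 10 GB/s aggregate target / 400 MB/s per link ~= 25 links

so it's link and controller count, not LUN size, that determines how many OSS/controller pairs we need.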
Hi John,

On Wed, Dec 27, 2006 at 09:04:00AM -0500, John R. Dunning wrote:

> Well, ok, so is it a fair statement to say that the recommended
> configuration is point-to-point, but if a customer, for whatever reason,
> wants to use a SAN-like topology, there's no real downside as long as the
> host-side software is allowed to treat it as if it were point-to-point?
> What I'm really after here is what we should put in our documentation, and
> what to point users to when they're trying to work out how to set up their
> systems.

Yes, that's fair. The actual requirement is that the storage needs to show up as a block device on the host (the OSS). Beyond that, we don't care how it got there, as long as only one OSS is serving it at any given time. This is usually controlled by your failover software (heartbeat, etc.)

Cheers,
Jody
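P.S. A minimal sketch of the heartbeat v1 side, in case it helps; node names, devices, and mount points are made up:

    # /etc/ha.d/haresources (identical copy on both nodes)
    # oss1 normally serves ost0, oss2 serves ost1; if one node dies,
    # heartbeat mounts the dead node's OST on the survivor.
    oss1 Filesystem::/dev/sdb::/mnt/ost0::lustre
    oss2 Filesystem::/dev/sdc::/mnt/ost1::lustre

One caveat: the device names have to refer to the same LUNs on both nodes (or use something stable like LVM names), since heartbeat just runs the same Filesystem resource on whichever node takes over.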
From: Jody McIntyre <scjody@clusterfs.com>
Date: Wed, 27 Dec 2006 09:40:41 -0500

> Yes, that's fair. The actual requirement is that the storage needs to show
> up as a block device on the host (the OSS). Beyond that, we don't care how
> it got there, as long as only one OSS is serving it at any given time.
> This is usually controlled by your failover software (heartbeat, etc.)

Yup, got it. I've been testing the failover software as we speak, and scribbling down notes that will turn into the doc. I suspect that most of the time the only interesting difference between the SAN and non-SAN cases is which luns are visible where. As long as the overall system configuration (including the failover stuff) has a clear idea of which luns get remounted where in the case of a failure, the rest can safely be left up to the sysadmin to do however he feels comfortable.
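One thing I'm planning to put in the doc: before wiring up the failover software, sanity-check that both partners really do see the same LUN, since /dev/sdX names aren't guaranteed to line up across nodes. Assuming a udev-era distro that populates /dev/disk/by-id (device name below is made up), something like:

    # Run on both failover partners; the by-id symlink encodes the
    # LUN's unique id, so it should resolve to the same LUN on both
    # nodes even if the sdX letter differs.
    ls -l /dev/disk/by-id/ | grep sdb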