Here's a good question for the list to ponder over the holiday weekend.

We've got a new cluster system in the works. It will be a heavy lustre user; our plan is to use lustre as the rootfs and as the primary storage for applications. I've been thinking through what configuration options we'll need to deal with for connecting the core of the system to the storage array(s) with the FC HBAs we're qualifying, since I'm going to need to be able to recommend at least one configuration.

It seems to me there are a couple of obvious strategies, each with many sub-options underneath it: either a bunch of essentially point-to-point links from server nodes to storage controllers, or a more traditional "cloud" implemented with some number of FC switches.

If one views the set of storage as one big filesystem, it's easy to think of it sort of like a SAN and decide that the right way to set it up is to stick some switches on the front and plug all the servers in there. But the way lustre uses the OSTs really isn't SAN-like at all; the metaphor is much closer to disk drives attached to individual servers. And indeed, the literature doesn't talk about anything mix-and-match-like; it talks about individual (point-to-point) connections from the OSSs to the OSTs. So to a first approximation, it seems much more sensible to structure the FC that way, i.e. just plug the HBAs straight into the controllers (not forgetting the criss-cross redundant connections to support failover) and be done with it.

A few people I've talked to (at least some from SAN backgrounds) have said many customers will want to structure it like a SAN because that's what they're used to thinking about. Separate from that, it's not unlikely that we will run into situations where customers have existing SANs, which may or may not be shared with other systems, and want to use them as the backing store for our lustre filesystems.

So I'd like to hear what folks are doing.

1. Does it make sense to structure the OSS<->OST connections as point-to-point
   links, or am I missing something?

2. Have you run into customer situations where they have an existing SAN that
   they want to run lustre on?

   a. If yes, what issues are involved in configuring it to appear as a bunch
      of disjoint luns and getting lustre set up on it?

   b. If no, and they instead tend to buy dedicated storage for lustre, do
      they still want to set it up like a SAN even though it isn't?

3. Starting from first principles, when talking about a new deployment, what
   do you recommend, and why?

Thanks in advance, and I hope you're all home with family rather than reading this list :-}
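P.S. For concreteness, here's roughly what I have in mind for the point-to-point layout, criss-cross links included. Node names and port assignments are just illustrative; assume two-port HBAs and dual-ported controllers:

    OSS1 hba port0 ---- ctlr A port0   (OSS1 normally serves A's LUNs)
    OSS1 hba port1 ---- ctlr B port1   (standby path, used on failover)
    OSS2 hba port0 ---- ctlr B port0   (OSS2 normally serves B's LUNs)
    OSS2 hba port1 ---- ctlr A port1   (standby path, used on failover)

Each OSS can see both controllers' LUNs, but only serves its own set unless its partner dies.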
Hi John,

On Fri, Dec 22, 2006 at 02:05:35PM -0500, John R. Dunning wrote:

> 1. Does it make sense to structure the OSS<->OST connections as point-to-point
>    links, or am I missing something?

Absolutely. This is, in fact, our recommended configuration. Doing it this way is less complex and has fewer points of failure compared to a SAN-like solution.

As you've probably figured out, the only thing Lustre requires is that storage be shared between two OSSes (and MDSes) for failover purposes. Since most FC controllers are (at least) dual-ported, this can be done without FC switches.

> 2. Have you run into customer situations where they have an existing SAN that
>    they want to run lustre on?
>
>    a. If yes, what issues are involved in configuring it to appear as a bunch
>       of disjoint luns and getting lustre set up on it?
>
>    b. If no, and they instead tend to buy dedicated storage for lustre, do
>       they still want to set it up like a SAN even though it isn't?

Yes, and I'm not aware of any issues. SAN administrators are usually good at setting things like this up :)

> 3. Starting from first principles, when talking about a new deployment, what
>    do you recommend, and why?

Keep it as simple as possible - have as few components as you can. Not using FC switches fits well with this strategy. Also, create LUNs that are as close to the 8 TB limit as possible so you have fewer OSTs. In general, Lustre doesn't care about OST count, but fewer OSTs will make things slightly easier to manage and reduce your chances of running into full-OST problems.

Cheers,
Jody
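P.S. In case it's useful for your docs: with the new 1.6-style tools, the shared-storage requirement shows up as a --failnode option when you format the target. A minimal sketch - fsname, node names, NIDs, and devices are all made up for illustration:

    # On oss1: format an OST on the controller-A LUN, naming oss2 as
    # the failover server for this target.
    mkfs.lustre --fsname=testfs --ost --mgsnode=mgs@tcp0 \
        --failnode=oss2@tcp0 /dev/sdb

    # Only one OSS mounts the OST at any given time.  Normally:
    mount -t lustre /dev/sdb /mnt/ost0     # on oss1

    # If oss1 fails, the failover software mounts the same LUN
    # (reached over the criss-cross link) on oss2 instead.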
From: Jody McIntyre <scjody@clusterfs.com>
Date: Wed, 27 Dec 2006 08:42:42 -0500

> Hi John,
>
> On Fri, Dec 22, 2006 at 02:05:35PM -0500, John R. Dunning wrote:
> > 1. Does it make sense to structure the OSS<->OST connections as
> >    point-to-point links, or am I missing something?
>
> Absolutely. This is, in fact, our recommended configuration. Doing it this
> way is less complex and has fewer points of failure compared to a SAN-like
> solution.
>
> As you've probably figured out, the only thing Lustre requires is that
> storage be shared between two OSSes (and MDSes) for failover purposes.
> Since most FC controllers are (at least) dual-ported, this can be done
> without FC switches.

Right. In fact, the HBA we're qualifying is a QLogic 2462 two-port 4Gb unit, which seems like it should be exactly right for this kind of application.

> > 2. Have you run into customer situations where they have an existing SAN
> >    that they want to run lustre on?
>
> Yes, and I'm not aware of any issues. SAN administrators are usually good
> at setting things like this up :)

Well, ok, so is it a fair statement to say that the recommended configuration is point-to-point, but if a customer, for whatever reason, wants to use a SAN-like topology, there's no real downside as long as the host-side software is allowed to treat it as if it were point-to-point? What I'm really after here is what we should put in our documentation, and what to point users to when they're trying to work out how to set up their systems.

> > 3. Starting from first principles, when talking about a new deployment,
> >    what do you recommend, and why?
>
> Keep it as simple as possible - have as few components as you can.

Agreed.

> Not using FC switches fits well with this strategy. Also, create LUNs that
> are as close to the 8 TB limit as possible so you have fewer OSTs.

Sure. In our case, it's unlikely that we'll get anywhere near 8TB/OST, because we expect to be far more concerned about bandwidth than capacity. But I understand the point; aside from anything else, I want to be able to recommend the simplest configuration that will get people the results they're looking for, and part of that means identifying the smallest set of controllers and such that will do the job.

Thanks...
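P.S. For anyone curious why bandwidth rather than capacity dominates the sizing for us, the back-of-envelope looks something like this (all numbers illustrative, not a spec):

    4 Gb/s FC link ~= 400 MB/s of payload after 8b/10b encoding
    e.g. a 10 GB/s aggregate target / 400 MB/s per link ~= 25 links

so it's link and controller count, not LUN size, that determines how many OSS/controller pairs we need.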
Hi John,

On Wed, Dec 27, 2006 at 09:04:00AM -0500, John R. Dunning wrote:

> Well, ok, so is it a fair statement to say that the recommended
> configuration is point-to-point, but if a customer, for whatever reason,
> wants to use a SAN-like topology, there's no real downside as long as the
> host-side software is allowed to treat it as if it were point-to-point?
> What I'm really after here is what we should put in our documentation, and
> what to point users to when they're trying to work out how to set up their
> systems.

Yes, that's fair. The actual requirement is that the storage needs to show up as a block device on the host (the OSS). Beyond that, we don't care how it got there, as long as only one OSS is serving it at any given time. This is usually controlled by your failover software (heartbeat, etc.)

Cheers,
Jody
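P.S. A minimal sketch of the heartbeat v1 side, in case it helps; node names, devices, and mount points are made up:

    # /etc/ha.d/haresources (identical copy on both nodes)
    # oss1 normally serves ost0, oss2 serves ost1; if one node dies,
    # heartbeat mounts the dead node's OST on the survivor.
    oss1 Filesystem::/dev/sdb::/mnt/ost0::lustre
    oss2 Filesystem::/dev/sdc::/mnt/ost1::lustre

One caveat: the device names have to refer to the same LUNs on both nodes (or use something stable like LVM names), since heartbeat just runs the same Filesystem resource on whichever node takes over.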
From: Jody McIntyre <scjody@clusterfs.com>
Date: Wed, 27 Dec 2006 09:40:41 -0500

> Yes, that's fair. The actual requirement is that the storage needs to show
> up as a block device on the host (the OSS). Beyond that, we don't care how
> it got there, as long as only one OSS is serving it at any given time.
> This is usually controlled by your failover software (heartbeat, etc.)

Yup, got it. I've been testing the failover software as we speak, and scribbling down notes that will turn into the doc. I suspect that most of the time the only interesting difference between the SAN and non-SAN cases is which luns are visible where. As long as the overall system configuration (including the failover stuff) has a clear idea of which luns get remounted where in the case of a failure, the rest can safely be left up to the sysadmin to do however he feels comfortable.
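One thing I'm planning to put in the doc: before wiring up the failover software, sanity-check that both partners really do see the same LUN, since /dev/sdX names aren't guaranteed to line up across nodes. Assuming a udev-era distro that populates /dev/disk/by-id (device name below is made up), something like:

    # Run on both failover partners; the by-id symlink encodes the
    # LUN's unique id, so it should resolve to the same LUN on both
    # nodes even if the sdX letter differs.
    ls -l /dev/disk/by-id/ | grep sdb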