Brent M. Clements wrote:
> 10 MDS/OST Servers each having:
> Dual Processor 2.8GHz Xeon machines w/1MB cache
> Dual GigE Network Adaptors
> 8 Gigs Memory
> 1 internal 73GB SCSI 15k Drive
> Fibre Channel Attached External Storage with 220GB
>
> Clients:
> 10 Dual Processor 2.4GHz Xeon machines
> 1 GigE Network Adaptor
> 4 Gigs Memory
> 1 internal 73GB SCSI 15k Drive
>
> Each MDS/OST server's dual-port network adapters will be bonded to the
> GigE switch.
>
> Three questions:
> 1. Is this a good configuration?
> 2. Should I just have one MDS taking care of the entire Lustre testbed,
> or should I do it like I have it designed above, where I have multiple
> MDSs (acting as backups for all the other MDSs), each one running
> alongside an OST?

Three notes about your OST/MDS configuration:

First, a single machine should not be a combination OST/MDS. Lustre 1.x can recover from a string of failures, as long as the recovery protocols have a chance to complete before the next failure occurs. If you put an MDS and an OST on the same machine, you have created a situation in which there will always be a double failure, which is not yet supported well.

Second, if you have separate machines for your MDSs, that might change your decision about whether to use one or two. A second MDS will not increase your file system's metadata performance, but it will allow for testing failover.

Third, given that they're separate machines, you can save yourself some money by building them differently. OSTs will, as a rule of thumb, not make very good use of a lot of memory. To improve performance, we don't do any server-side write caching, and even read caching may be scaled back soon. The main consumers of memory on the OST are request buffers and locking, which on a testbed are on the order of 100MB or less, not 8GB. In our production sites, OSTs tend to have 2GB or 4GB, and could do OK with less.

> 3. Am I missing any hardware, such as an additional machine to act as a
> portals router? Or anything else that I may be missing?

No, I think this will be a nice little test cluster.

These were good questions, and we'll make sure that the answers get rolled into the documentation.

Thanks--

-Phil
> > Each MDS/OST server's dual-port network adapters will be bonded to
> > the GigE switch.

In my experience, bonding two gigabit connections together rarely gives you the performance increase you expect. TCP is a hard enough strain on a Linux machine without the extra burden of reordering packets arriving from two different NICs. I admit this view is based on year-old first-hand experience. I would be interested to know if the Lustre developers or any other users have experimented with channel bonding and whether they noticed any caveats or performance issues.

Daire
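One straightforward way to settle this question is to measure raw TCP throughput before and after bonding. A minimal sketch using the common iperf tool, assuming iperf 1.x/2.x is installed on both ends and `server` is a placeholder hostname for the receiving machine:

```shell
# On the receiving host, start an iperf server:
iperf -s

# On the sending host, run 4 parallel TCP streams for 30 seconds.
# Run this once over a single NIC and once over the bonded interface,
# and compare the aggregate throughput:
iperf -c server -P 4 -t 30
```

Multiple parallel streams (`-P`) matter here: many bonding modes hash each TCP connection onto a single slave NIC, so a single stream will never exceed one link's speed even when the bond is working correctly. (No assertable output is given, since the numbers depend entirely on the hardware and network under test.)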
Daire,

I disagree; bonding two Ethernet channels together does in fact give you the performance boost. The issue is that you must design the infrastructure right. This means the following:

1. You must use GigE cards that can perform offloading, i.e. the GigE
   card takes care of most of the TCP/IP overhead. Most server-oriented
   cards can do this; generic GigE cards cannot.
2. You need to use jumbo frames (MTU 9000+).
3. You need a switch that has enough switching capacity and can run at
   wire speed.

Take Care,
Brent

Brent Clements
Linux Technology Specialist
Information Technology
Rice University

On Tue, 20 Jan 2004, Daire Byrne wrote:

> > > Each MDS/OST server's dual-port network adapters will be bonded to
> > > the GigE switch.
>
> In my experience, bonding 2 gigabit connections together rarely gives you
> the performance increase you expect. TCP is a hard enough strain for a
> Linux machine without the extra burden of ordering packets arriving from
> two different NICs. I admit this view is based on year-old first-hand
> experience. I would be interested to know if the Lustre developers or
> any other users have experimented with channel bonding and whether they
> noticed any caveats or performance issues.
>
> Daire
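For reference, the points above correspond to a bonding setup roughly like the following on a 2.4/2.6-era Linux kernel. This is a minimal sketch, not a definitive recipe: it assumes the two slave interfaces are `eth0` and `eth1`, the switch ports are configured for LACP, and the `ifenslave` utility from the kernel's bonding documentation is available; the IP address is a placeholder.

```shell
# Load the bonding driver in 802.3ad (LACP) mode with MII link
# monitoring every 100ms; the switch must aggregate these ports too.
modprobe bonding mode=802.3ad miimon=100

# Bring up the bond interface and enslave both NICs.
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1

# Enable jumbo frames (point 2 above); the switch and the peer
# must support a 9000-byte MTU end to end, or traffic will black-hole.
ifconfig bond0 mtu 9000
```

Note the mode choice matters: 802.3ad and balance-xor hash each flow onto one slave (avoiding the packet-reordering problem Daire describes), while balance-rr stripes packets across slaves and can trigger exactly that reordering burden.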
We are designing a testbed to test out the Lustre filesystem. We have the following configuration:

10 MDS/OST Servers, each having:
Dual Processor 2.8GHz Xeon machines w/1MB cache
Dual GigE Network Adaptors
8 Gigs Memory
1 internal 73GB SCSI 15k Drive
Fibre Channel Attached External Storage with 220GB

Clients:
10 Dual Processor 2.4GHz Xeon machines
1 GigE Network Adaptor
4 Gigs Memory
1 internal 73GB SCSI 15k Drive

Each MDS/OST server's dual-port network adapters will be bonded to the GigE switch.

Three questions:
1. Is this a good configuration?
2. Should I just have one MDS taking care of the entire Lustre testbed, or should I do it like I have it designed above, where I have multiple MDSs (acting as backups for all the other MDSs), each one running alongside an OST?
3. Am I missing any hardware, such as an additional machine to act as a portals router? Or anything else that I may be missing?

Thanks,
Brent Clements
Linux Technology Specialist
Information Technology
Rice University