Hello,

I am building a 2.3.x filesystem right now, and I am looking at setting up some active-active failover abilities for my OSSs. I have been looking at Dell's MD3xxx arrays, as they have redundant controllers and allow up to four hosts to connect to each controller.

I can see how Linux multipath can be used with redundant disk controllers. I can even (slightly) understand how Lustre fails over when an OSS goes down.

1. Is Lustre smart enough to use redundant paths, or to fail over OSSs if an OSS is congested? (it would be cool, no?)
2. Does the Linux multipath module slow performance?
3. How much does a RAID array such as the one listed above act as a bottleneck, say if I have as many volumes available on the RAID controllers as there are OSS hosts?
4. Are there arrays similar to Dell's model that would work?

Thanks!

--jason
On 19/12/2012 18:36, Jason Brooks wrote:
> 1. Is Lustre smart enough to use redundant paths, or to fail over OSSs if an OSS is congested? (it would be cool, no?)
> 2. Does the Linux multipath module slow performance?
> 3. How much does a RAID array such as the one listed above act as a bottleneck, say if I have as many volumes available on the RAID controllers as there are OSS hosts?
> 4. Are there arrays similar to Dell's model that would work?

I'm using one Dell MD3660F (60 x 3 TB disks) with Lustre, and also NEC, SGI, IBM and NetApp-LSI boxes that are the same hardware (NetApp 2660 ;-), on 1.8.8-wc1 over IB.

Two disk racks have been in production for more than one year now without any problem. We now have 5 racks like this, plus 2 JBOD extensions of 60 disks each, each JBOD connected to one primary rack.

With the Hyper Perf license, the controllers peak at 2 GB/s writes on 6 RAID6 8+2 volumes. To achieve this we used 3 servers connected directly to the disk rack with two FC ports each, serving over IB, each server with 2 OSTs on the rack. Write caching is disabled (cache mirroring divides performance by two).

In production we run 4 or 6 OSTs per OSS (sufficient for our needs). Failover is per OSS pair (not on congestion, only when an OSS goes down). I didn't notice much penalty with multipath (active/passive mode for this hardware).

--
Weill Philippe - Systems and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
Email: philippe.weill at latmos.ipsl.fr
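For reference, the "failover per OSS pair" arrangement described above is declared on the Lustre side when the targets are formatted, and an external HA stack does the actual mount moves. A minimal sketch follows; all addresses, hostnames and device names are invented for illustration, not taken from this thread:

  # 192.168.1.1 = MGS, 192.168.1.10 = oss01, 192.168.1.11 = oss02 (placeholders)
  mkfs.lustre --fsname=testfs --ost --index=0 \
      --mgsnode=192.168.1.1@tcp0 \
      --servicenode=192.168.1.10@tcp0 \
      --servicenode=192.168.1.11@tcp0 \
      /dev/mapper/ost0

  # Normal operation: oss01 mounts the OST.
  mount -t lustre /dev/mapper/ost0 /mnt/lustre/ost0

  # On failure, corosync/pacemaker (or similar) fences oss01 and mounts the
  # same shared LUN on oss02; clients reconnect to whichever service node
  # currently has it mounted. Lustre itself never moves the mount.

(--servicenode marks both nodes as equally valid servers for the target; older setups used --failnode instead.)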
Hi Jason,

1. You give Lustre a standard block device when formatting with mkfs.lustre. If you want Lustre to use the multipathed device, you'll need to set up Linux MPIO and then use the multipathed device path. Failover between redundant OSSs or MDSs is not controlled by Lustre either; you will need to set up corosync + pacemaker or a similar failover service.

2. Having two paths to your storage should speed things up. I'm guessing you'd have more than one LUN on the array, so you could do something as simple as splitting the LUNs between the two paths, or use round robin to balance the traffic between the two paths, etc.

3. Totally dependent on the whole system. Start sketching out the entire system, starting at the disks and going all the way to your clients. Figure out the best-case throughput numbers for each part of the system (disks -> disk interconnect -> array controller -> array interconnect to host -> FS throughput on OSS/MDS -> OSS/MDS network throughput -> switch throughput -> aggregate client network throughput, etc.). This will start giving you a basic idea of where your bottlenecks are. Adjust your design to relieve some of the identified bottlenecks if budget allows. Remember that vendors are likely to overestimate throughput numbers or give benchmarks that don't match your workload, so it's best to get your hands on the hardware and test it out yourself.

4. Many if not most storage arrays will functionally work with Lustre. Which will work best in your environment is largely dependent on your expected workload.

Ben
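To make the bottleneck exercise in point 3 concrete, here is a toy walk-through with invented numbers (not measurements of any hardware discussed in this thread):

  # Best-case, back-of-the-envelope figures for one OSS; all invented.
  #   60 NL-SAS disks x ~100 MB/s streaming each    -> ~6000 MB/s raw media
  #   6x RAID6 8+2 LUNs after parity and controller -> ~2000 MB/s
  #   2x 8 Gb/s FC host ports                       -> ~1600 MB/s
  #   1x QDR InfiniBand link to clients             -> ~3200 MB/s
  # End-to-end best case = min of the stages        -> ~1600 MB/s per OSS,
  # so in this made-up example the FC links, not the disks, are the first
  # thing to widen. Repeat the same min() walk for the aggregate client side.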
On 2012-12-19, at 11:22, "Allen, Benjamin S" <bsa at lanl.gov> wrote:
> Hi Jason,
>
> 2. Having two paths to your storage should speed things up. I'm guessing you'd have more than one LUN on the array, so you could do something as simple as splitting the LUNs between the two paths, or use round robin to balance the traffic between the two paths, etc.

Using round-robin is not a good idea. This will not increase bandwidth (which is already constrained by the disk and bus), but on some RAID controllers it will cause severe performance impact.

Cheers, Andreas
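In dm-multipath terms the difference is the path grouping policy. A hedged /etc/multipath.conf fragment to illustrate, defaults-level only; check your distribution's multipath.conf man page for the exact syntax your version accepts:

  defaults {
      # "multibus" would put every path in one group and round-robin I/O
      # across both controllers; "group_by_prio" (or "failover") keeps I/O
      # on the controller that owns the LUN and only switches when those
      # paths fail, which is what active/passive RAID firmware expects.
      path_grouping_policy group_by_prio
      failback             immediate
  }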
On 19/12/2012 22:38, Dilger, Andreas wrote:
> Using round-robin is not a good idea. This will not increase bandwidth
> (which is already constrained by the disk and bus) but on some RAID controllers
> will cause severe performance impact.
>
> Cheers, Andreas

I can confirm the Dell disk rack doesn't support round robin; instead you define a preferred controller for each RAID volume, like this:

[root at oss-locean ~]# multipath -ll
LOCEAN_OST5 (3690b11c0000154a40000069a50b86cfb) dm-7 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:5 sdn 8:208 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:5 sdg 8:96  [active][ghost]
LOCEAN_OST4 (3690b11c0000154b50000072350b86892) dm-6 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:4 sdf 8:80 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:4 sdm 8:192 [active][ghost]
LOCEAN_OST3 (3690b11c0000154a40000069750b86cc5) dm-5 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:3 sdl 8:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:3 sde 8:64 [active][ghost]
LOCEAN_OST2 (3690b11c0000154b50000072050b86851) dm-4 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:2 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:2 sdk 8:160 [active][ghost]
LOCEAN_OST1 (3690b11c0000154a40000069450b86c79) dm-3 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:1 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:1 sdc 8:32 [active][ghost]
LOCEAN_OST0 (3690b11c0000154b50000071d50b86052) dm-2 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:0 sdi 8:128 [active][ghost]

--
Weill Philippe - Systems and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
Tour 45/46 3e Etage B302 - 4 Place Jussieu - 75252 Paris Cedex 05 - FRANCE
Email: philippe.weill at latmos.ipsl.fr | tel: +33 0144274759 | Fax: +33 0144273776
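The two path groups above, one [active][ready] and one [active][ghost], come from the rdac hardware handler tracking which controller owns each LUN. A device stanza along these lines is typical for MD36xx-class arrays on RHEL 6-era multipath-tools; treat it as a sketch and verify the values your vendor ships rather than copying it verbatim:

  devices {
      device {
          vendor               "DELL"
          product              "MD36xxf"
          # Let the RDAC handler follow controller ownership of each LUN.
          hardware_handler     "1 rdac"
          path_grouping_policy group_by_prio
          prio                 rdac
          path_checker         rdac
          failback             immediate
          features             "2 pg_init_retries 50"
          no_path_retry        30
      }
  }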
In my experience, if there is a particular multipathing driver from the vendor, go for that. In our setup we have Oracle/Sun disk arrays, and with the standard Linux multipathing daemon I would get lots of weird I/O errors. It turned out the disk arrays had picked their preferred path, but Linux was trying to talk to the LUNs on both paths and would only receive a response on the preferred one.

There is an RDAC driver that can be installed. Simply disable the multipathing daemon, or configure it to ignore the disk arrays, and use the vendor solution. I had no more I/O errors (which had only served to slow down the boot-up process anyway).
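If you keep multipathd around for other devices, the "configure it to ignore the disk arrays" part is a blacklist entry in /etc/multipath.conf. A hedged sketch; the WWID and the vendor/product strings below are placeholders, use the values "multipath -ll" reports for your own array:

  blacklist {
      # Ignore LUNs that the vendor RDAC/MPP driver will manage instead.
      wwid "36000000000000000000000000000abcd"
      # ...or match the whole array family instead of individual LUNs:
      device {
          vendor  "VENDOR"
          product "ARRAY_MODEL"
      }
  }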
Hey, thanks you guys! I appreciate it a lot!

--jason