Hello,

I am building a 2.3.x filesystem right now, and I am looking at setting up some active-active failover abilities for my OSSs. I have been looking at Dell's MD3xxx arrays, as they have redundant controllers and allow up to four hosts to connect to each controller.

I can see how Linux multipath can be used with redundant disk controllers. I can even (slightly) understand how Lustre fails over when an OSS goes down.

1. Is Lustre smart enough to use redundant paths, or to fail over OSSs if an OSS is congested? (it would be cool, no?)
2. Does the Linux multipath module slow performance?
3. How much does a RAID array such as the one listed above act as a bottleneck, say if I have as many volumes available on the RAID controllers as there are OSS hosts?
4. Are there arrays similar to Dell's model that would work?

Thanks!

--jason
On 19/12/2012 18:36, Jason Brooks wrote:
> 1. Is Lustre smart enough to use redundant paths, or to fail over OSSs if an OSS is congested? (it would be cool, no?)
> 2. Does the Linux multipath module slow performance?
> 3. How much does a RAID array such as the one listed above act as a bottleneck, say if I have as many volumes available on the RAID controllers as there are OSS hosts?
> 4. Are there arrays similar to Dell's model that would work?

I'm using one Dell MD3660F (60 x 3 TB disks) with Lustre, and also NEC, SGI, IBM and NetApp-LSI boxes that are the same hardware (NetApp 2660 ;-), on 1.8.8-wc1 over IB.

Two disk racks have been in production for more than one year now without any problem. We now have 5 racks like this, plus 2 JBOD extensions of 60 disks each, each JBOD connected to one primary rack.

With the Hyper Perf license, the controllers peak at 2 GB/s writes on 6 RAID6 8+2 volumes. To achieve this we used 3 servers connected directly to the disk rack with two FC ports each, serving over IB, each server with 2 OSTs on the rack. Write caching is disabled (cache mirroring divides performance by two).

In production we run 4 or 6 OSTs per OSS (sufficient for our needs). Failover is per OSS pair (not on congestion, only when an OSS goes down). I didn't notice much penalty with multipath (active/passive mode for this hardware).

--
Weill Philippe - Systems and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
Email: philippe.weill at latmos.ipsl.fr
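For reference, the "failover per OSS pair" arrangement described above is declared on the Lustre side when the targets are formatted, and an external HA stack does the actual mount moves. A minimal sketch follows; all addresses, hostnames and device names are invented for illustration, not taken from this thread:

  # 192.168.1.1 = MGS, 192.168.1.10 = oss01, 192.168.1.11 = oss02 (placeholders)
  mkfs.lustre --fsname=testfs --ost --index=0 \
      --mgsnode=192.168.1.1@tcp0 \
      --servicenode=192.168.1.10@tcp0 \
      --servicenode=192.168.1.11@tcp0 \
      /dev/mapper/ost0

  # Normal operation: oss01 mounts the OST.
  mount -t lustre /dev/mapper/ost0 /mnt/lustre/ost0

  # On failure, corosync/pacemaker (or similar) fences oss01 and mounts the
  # same shared LUN on oss02; clients reconnect to whichever service node
  # currently has it mounted. Lustre itself never moves the mount.

(--servicenode marks both nodes as equally valid servers for the target; older setups used --failnode instead.)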
Hi Jason,

1. You give Lustre a standard block device when formatting with mkfs.lustre. If you want Lustre to use the multipathed device, you'll need to set up Linux MPIO and then use the multipathed device path. Failover between redundant OSSs or MDSs is not controlled by Lustre either; you will need to set up corosync + pacemaker or a similar failover service.

2. Having two paths to your storage should speed things up. I'm guessing you'd have more than one LUN on the array, so you could do something as simple as splitting the LUNs between the two paths, or use round robin to balance the traffic between the two paths, etc.

3. Totally dependent on the whole system. Start sketching out the entire system, starting at the disks and going all the way to your clients. Figure out the best-case throughput numbers for each part of the system (disks -> disk interconnect -> array controller -> array interconnect to host -> FS throughput on OSS/MDS -> OSS/MDS network throughput -> switch throughput -> aggregate client network throughput, etc.). This will start giving you a basic idea of where your bottlenecks are. Adjust your design to relieve some of the identified bottlenecks if budget allows. Remember that vendors are likely to overestimate throughput numbers or give benchmarks that don't match your workload, so it's best to get your hands on the hardware and test it out yourself.

4. Many if not most storage arrays will functionally work with Lustre. Which will work best in your environment is largely dependent on your expected workload.

Ben
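To make the bottleneck exercise in point 3 concrete, here is a toy walk-through with invented numbers (not measurements of any hardware discussed in this thread):

  # Best-case, back-of-the-envelope figures for one OSS; all invented.
  #   60 NL-SAS disks x ~100 MB/s streaming each    -> ~6000 MB/s raw media
  #   6x RAID6 8+2 LUNs after parity and controller -> ~2000 MB/s
  #   2x 8 Gb/s FC host ports                       -> ~1600 MB/s
  #   1x QDR InfiniBand link to clients             -> ~3200 MB/s
  # End-to-end best case = min of the stages        -> ~1600 MB/s per OSS,
  # so in this made-up example the FC links, not the disks, are the first
  # thing to widen. Repeat the same min() walk for the aggregate client side.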
On 2012-12-19, at 11:22, "Allen, Benjamin S" <bsa at lanl.gov> wrote:
> Hi Jason,
>
> 2. Having two paths to your storage should speed things up. I'm guessing you'd have more than one LUN on the array, so you could do something as simple as splitting the LUNs between the two paths, or use round robin to balance the traffic between the two paths, etc.

Using round-robin is not a good idea. This will not increase bandwidth (which is already constrained by the disk and bus), but on some RAID controllers it will cause severe performance impact.

Cheers, Andreas
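In dm-multipath terms the difference is the path grouping policy. A hedged /etc/multipath.conf fragment to illustrate, defaults-level only; check your distribution's multipath.conf man page for the exact syntax your version accepts:

  defaults {
      # "multibus" would put every path in one group and round-robin I/O
      # across both controllers; "group_by_prio" (or "failover") keeps I/O
      # on the controller that owns the LUN and only switches when those
      # paths fail, which is what active/passive RAID firmware expects.
      path_grouping_policy group_by_prio
      failback             immediate
  }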
On 19/12/2012 22:38, Dilger, Andreas wrote:
> Using round-robin is not a good idea. This will not increase bandwidth
> (which is already constrained by the disk and bus) but on some RAID controllers
> will cause severe performance impact.
>
> Cheers, Andreas

I can confirm the Dell disk rack doesn't support round robin; instead you define a preferred controller for each RAID volume, like this:

[root at oss-locean ~]# multipath -ll
LOCEAN_OST5 (3690b11c0000154a40000069a50b86cfb) dm-7 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:5 sdn 8:208 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:5 sdg 8:96  [active][ghost]
LOCEAN_OST4 (3690b11c0000154b50000072350b86892) dm-6 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:4 sdf 8:80 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:4 sdm 8:192 [active][ghost]
LOCEAN_OST3 (3690b11c0000154a40000069750b86cc5) dm-5 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:3 sdl 8:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:3 sde 8:64 [active][ghost]
LOCEAN_OST2 (3690b11c0000154b50000072050b86851) dm-4 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:2 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:2 sdk 8:160 [active][ghost]
LOCEAN_OST1 (3690b11c0000154a40000069450b86c79) dm-3 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 8:0:0:1 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:1 sdc 8:32 [active][ghost]
LOCEAN_OST0 (3690b11c0000154b50000071d50b86052) dm-2 DELL,MD36xxf
[size=22T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 7:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 8:0:0:0 sdi 8:128 [active][ghost]

--
Weill Philippe - Systems and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
Tour 45/46 3e Etage B302 - 4 Place Jussieu - 75252 Paris Cedex 05 - FRANCE
Email: philippe.weill at latmos.ipsl.fr | tel: +33 0144274759 | Fax: +33 0144273776
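The two path groups above, one [active][ready] and one [active][ghost], come from the rdac hardware handler tracking which controller owns each LUN. A device stanza along these lines is typical for MD36xx-class arrays on RHEL 6-era multipath-tools; treat it as a sketch and verify the values your vendor ships rather than copying it verbatim:

  devices {
      device {
          vendor               "DELL"
          product              "MD36xxf"
          # Let the RDAC handler follow controller ownership of each LUN.
          hardware_handler     "1 rdac"
          path_grouping_policy group_by_prio
          prio                 rdac
          path_checker         rdac
          failback             immediate
          features             "2 pg_init_retries 50"
          no_path_retry        30
      }
  }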
In my experience, if there is a particular multipathing driver from the vendor, go for that. In our setup we have Oracle/Sun disk arrays, and with the standard Linux multipathing daemon I would get lots of weird I/O errors. It turned out the disk arrays had picked their preferred path, but Linux was trying to talk to the LUNs on both paths and would only receive a response on the preferred one.

There is an RDAC driver that can be installed. Simply disable the multipathing daemon, or configure it to ignore the disk arrays, and use the vendor solution. I had no more I/O errors (which had only served to slow down the boot-up process anyway).
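If you keep multipathd around for other devices, the "configure it to ignore the disk arrays" part is a blacklist entry in /etc/multipath.conf. A hedged sketch; the WWID and the vendor/product strings below are placeholders, use the values "multipath -ll" reports for your own array:

  blacklist {
      # Ignore LUNs that the vendor RDAC/MPP driver will manage instead.
      wwid "36000000000000000000000000000abcd"
      # ...or match the whole array family instead of individual LUNs:
      device {
          vendor  "VENDOR"
          product "ARRAY_MODEL"
      }
  }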
Hey, thanks you guys! I appreciate it a lot!

--jason