Hi,

Is anybody actually using multiple IB ports on a client for an aggregated
connection? I.e. many OSSes with one QDR IB port each, and clients with 4
QDR IB ports. Assuming the normal issues with bus bandwidth etc., what sort
of performance can I expect? QDR is roughly 3-4 GB/s.

I'm trying to size a cluster and clients to get ~10 GB/s on *one* client
node. If I can aggregate IB linearly, the next step will be to try to
figure out how to get 10 GB/s to local storage :-(

Sometimes customers are crazy...

Brian O'Connor
-------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA http://www.sgi.com/support/services
-------------------------------------------------
On 2011-03-21, at 4:53 AM, Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?
>
> I.e. many OSSes with one QDR IB port each, and clients with 4 QDR IB
> ports. Assuming the normal issues with bus bandwidth etc., what sort of
> performance can I expect?
>
> QDR ~ 3-4 GB/s
>
> I'm trying to size a cluster and clients to get ~10 GB/s on *one*
> client node.

I believe this is possible to some limited extent today. The main issue is
that the primary NID addresses for the OST IB cards need to be on different
subnets, so that the clients will route the traffic to the OSTs via the
different IB HCAs. I don't have low-level details on this myself, but I
believe there are a couple of sites that have done this.

> If I can aggregate IB linearly, the next step will be to try to figure out
> how to get 10 GB/s to local storage :-(
>
> Sometimes customers are crazy...
>
> Brian O'Connor
> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
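A minimal sketch of the kind of setup Andreas describes, with each OSS NID
on its own LNet network/IP subnet so that a multi-HCA client routes to each
OSS through a different HCA. The interface names, network numbers and
subnets below are illustrative assumptions, not taken from any actual site:

    # modprobe option on oss1 (ib0 addressed in 10.10.1.0/24):
    options lnet networks="o2ib1(ib0)"
    # modprobe option on oss2 (ib0 addressed in 10.10.2.0/24):
    options lnet networks="o2ib2(ib0)"
    # modprobe option on the multi-HCA client, one LNet network per HCA,
    # so RPCs to each OSS leave through a different port:
    options lnet networks="o2ib1(ib0),o2ib2(ib1)"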
Hi Brian,

From my understanding (confirmation from more skilled people on the list
would be welcome), using multiple IB ports with a Lustre client will be
difficult to manage, and will probably not bring any performance
improvement. I was told by a colleague that there are currently too many
internal locks in the client to sustain a big throughput. Lustre is
designed for global throughput across many clients, not for individual
clients.

I can observe this on my site, where I have enough storage and servers to
reach 21 GB/s globally, but am unable to get more than 300 MB/s on a single
client even though the DDR IB network would sustain 800+ MB/s ...

________________________________
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
Sent: Monday, 21 March 2011 04:53
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Multiple IB ports

[...]
On 2011-03-21, at 10:18 AM, Sebastien Piechurski wrote:
> From my understanding (confirmation from more skilled people on the list
> would be welcome), using multiple IB ports with a Lustre client will be
> difficult to manage, and will probably not bring any performance
> improvement.
> I was told by a colleague that there are currently too many internal locks
> in the client to sustain a big throughput. Lustre is designed for global
> throughput across many clients, not for individual clients.
> I can observe this on my site, where I have enough storage and servers to
> reach 21 GB/s globally, but am unable to get more than 300 MB/s on a
> single client even though the DDR IB network would sustain 800+ MB/s ...

There must be something wrong with your configuration, or the code has some
bug, because we have had single clients doing 2 GB/s in the past. What
version of Lustre did you test on?

Is this a single-threaded write? With single-threaded IO the bottleneck
often happens in the kernel copy_{to,from}_user() that is copying data
to/from userspace in order to do data caching in the client. Having
multiple threads doing the IO allows multiple cores to do the data copying.

Is the Lustre debugging disabled? See if "lctl set_param debug=0" helps.

Is the Lustre network checksum disabled? "lctl set_param osc.*.checksums=0"
There is a patch to allow hardware-assisted checksums, but it needs some
debugging before it can be landed into the production release.

> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
> Sent: Monday, 21 March 2011 04:53
> Subject: [Lustre-discuss] Multiple IB ports
> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
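Gathering the client-side checks Andreas lists into one place (the
parameter names are exactly the ones quoted above; run these on the client
being benchmarked):

    # inspect current settings
    lctl get_param debug
    lctl get_param osc.*.checksums

    # disable debug logging and wire checksums while measuring
    # single-client bandwidth, then restore them afterwards if desired
    lctl set_param debug=0
    lctl set_param osc.*.checksums=0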
Hi Brian,

I don't think it's crazy to strive for that rate, especially when there are
machines on the market which can accommodate multiple TBs of memory.
Assuming my math is mostly correct, loading or unloading a 16 TB data set
into a single machine (with 16 TB of memory) would take about an hour and a
half with a single QDR interface:

    (16*1024^4)/(3.4*1000^3)/60 == 86 mins

The ratio of memory capacity to I/O bandwidth is a critical issue for most
large machines. Typically in HPC, we'd like to dump all of memory in 5 to
10 minutes.

thanks,
paul

Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?
> [...]
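Checking Paul's arithmetic and extending it to the 5-10 minute target he
mentions (pure arithmetic, reusing the ~3.4 GB/s per-QDR-port figure from
his example):

    awk 'BEGIN {
        bytes = 16 * 1024^4                    # 16 TiB of client memory
        printf "one QDR port at 3.4 GB/s: %.0f minutes\n", bytes/3.4e9/60
        printf "10-minute dump needs:     %.1f GB/s\n",    bytes/600/1e9
    }'
    # -> one QDR port at 3.4 GB/s: 86 minutes
    # -> 10-minute dump needs:     29.3 GB/s

So even the ~10 GB/s Brian is after only gets a 16 TB machine down to
roughly a half-hour dump, which is why the memory-to-bandwidth ratio
matters so much here.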
> > I was told by a colleague that there were currently too many internal
> > locks in the clients to sustain a big throughput. Lustre is designed for
> > global throughput on many clients, but not on individual clients.

The LNet SMP scaling fixes/enhancements should help, but I don't believe
they are coming until 2.1.

> > I can observe this on my site, where I have enough storage and servers
> > to reach 21 GB/s globally, but am unable to get more than 300 MB/s on a
> > single client even though the DDR IB network would sustain 800+ MB/s ...

You probably need to disable checksums, and a DDR link should be able to
sustain 1.5 GB/s. I've seen close to these rates with LNet self tests; I
don't usually see them in normal operation with the filesystem added on
top.

> There must be something wrong with your configuration or the code has some
> bug, because we have had single clients doing 2 GB/s in the past. What
> version of Lustre did you test on?

I've never seen as high as 2 GB/s from a single client, but I've only been
focused on single-threaded IO. For that I've seen between 1.3 and 1.4 GB/s
peak. I spent a little time trying to figure out what that was with
SystemTap, but I only looked at the read case. It looked like the per-page
locking penalty can be high. Monitoring each ll_readpage I was seeing a
median of 2.4 us for the read scenario while the mode was only 0.5 us. IIRC
it was the llap locking that accounted for most of the ll_readpage time. I
didn't look at the penalty for rebalancing the cache between the various
CPUs. Using those numbers:

    >>> ((1/.000002406) * 4096)/2**20
    1623.5453034081463

gives me a best-case scenario of ~1.6 GB/s. I thought about working on the
read case, but realized the effort probably wasn't worth putting into 1.8,
and I would have to wait until 2.0 to test more. Unfortunately I haven't
had the time to look at 2.0+ yet.

> Is this a single-threaded write? With single-threaded IO the bottleneck
> often happens in the kernel copy_{to,from}_user() that is copying data
> to/from userspace in order to do data caching in the client. Having
> multiple threads doing the IO allows multiple cores to do the data copying.

Even with copy_{to,from}_user() it should be able to provide at least
5 GB/s. I've seen about 5.5 GB/s reading cached data on a client with lots
of memory.

Jeremy
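Jeremy mentions LNet self tests; a minimal lnet_selftest run for measuring
raw LNet bandwidth between two nodes looks roughly like the sketch below.
The NIDs are placeholders, and the exact option set can vary a little
between Lustre versions, so treat it as a starting point rather than a
recipe:

    # lnet_selftest must be loaded on every node involved
    modprobe lnet_selftest
    export LST_SESSION=$$
    lst new_session read_bw
    lst add_group clients 192.168.10.1@o2ib     # placeholder client NID
    lst add_group servers 192.168.10.2@o2ib     # placeholder server NID
    lst add_batch bulk
    lst add_test --batch bulk --from clients --to servers brw read size=1M
    lst run bulk
    lst stat clients servers                    # Ctrl-C to stop the output
    lst stop bulk
    lst end_session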
Thanks for the correction. I guess I need to redo some benchmarks and go
through the tunables ...

> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> Sent: Monday, 21 March 2011 12:38
> To: Sebastien Piechurski
> Cc: Brian O'Connor; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Multiple IB ports
>
> [...]
Hi Brian,

With one 4x QDR IB port, you can achieve 2 GB/s on a single client with a
multi-threaded workload, provided that you have the right storage (with
enough bandwidth) at the other end. We have tested this multiple times at
DDN.

I have seen sites that do IB bonding across 2 ports, but mostly in a
failover configuration. Getting 10 GB/s to a single node requires
aggregating 5 QDR IB ports. You will need to confirm with your IB vendor
(Mellanox?), OS vendor (SGI/RedHat/Novell) and Lustre vendor whether they
support aggregating so many links. I think the challenge you will have is
finding a Lustre client node that has enough x8 PCIe slots to sustain 3
dual-port InfiniBand adapters at full rate (think multiple such nodes in a
typical Lustre filesystem, not so economical). The other alternative is to
find a server that supports an 8X or 12X QDR IB port on the motherboard to
get more bandwidth.

With a typical Lustre client memory of 24-64 GB and memory-to-CPU bandwidth
of 10 GB/s (with standard DDR3-1333MHz DIMMs), it is not possible to fit a
dataset larger than 2/3 of memory. If you still want to achieve 10 GB/s of
bandwidth between storage and memory, there are clever alternatives. You
will have to stage your data into memory beforehand, keep the memory pages
locked, and continue feeding data as those pages are consumed. It is a lot
harder than it seems on paper.

Cheers,
-Atul

From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
Sent: Monday, 21 March 2011 9:23 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Multiple IB ports

[...]
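A back-of-envelope version of that port count, using the ~2 GB/s per port
that Atul quotes as measured, and an optimistic ~3.4 GB/s per port figure
from earlier in the thread:

    awk 'BEGIN { printf "ports needed: %.2f (at 2 GB/s)  %.2f (at 3.4 GB/s)\n",
                 10/2.0, 10/3.4 }'
    # -> ports needed: 5.00 (at 2 GB/s)  2.94 (at 3.4 GB/s)

i.e. five ports at the realistically achievable per-port rate, and still
three even if every port ran close to line rate.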
On Tuesday, March 22, 2011 06:15:35 am Atul Vidwansa wrote:
> Hi Brian,
>
> With one 4x QDR IB port, you can achieve 2 GB/s on a single client with a
> multi-threaded workload, provided that you have the right storage (with
> enough bandwidth) at the other end. We have tested this multiple times at
> DDN.
>
> I have seen sites that do IB bonding across 2 ports, but mostly in a
> failover configuration. Getting 10 GB/s to a single node requires
> aggregating 5 QDR IB ports. You will need to confirm with your IB vendor
> (Mellanox?), OS vendor (SGI/RedHat/Novell) and Lustre vendor whether they
> support aggregating so many links. I think the challenge you will have is
> finding a Lustre client node that has enough x8 PCIe slots to sustain 3
> dual-port InfiniBand adapters at full rate

Just adding a small detail: a single port of QDR consumes all of the HCA's
PCI bandwidth, so you would need 5 x8 IB HCAs, for a total of 40 lanes of
PCI Express. This will of course change with the introduction of future PCI
Express generations...

/Peter

> (think multiple such nodes in a typical Lustre filesystem, not so
> economical). The other alternative is to find a server that supports an
> 8X or 12X QDR IB port on the motherboard to get more bandwidth.
> [...]
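Peter's point checks out on paper. The PCIe 2.0 and QDR signalling figures
below come from the respective specs, not from this thread, so treat this
as a rough sanity check only:

    awk 'BEGIN {
        pcie_x8 = 8 * 5e9  * 8/10 / 8    # 8 lanes * 5 GT/s * 8b/10b, bytes/s
        qdr_4x  = 4 * 10e9 * 8/10 / 8    # 4 lanes * 10 Gb/s * 8b/10b, bytes/s
        printf "PCIe 2.0 x8: %.1f GB/s   QDR 4x: %.1f GB/s\n",
               pcie_x8/1e9, qdr_4x/1e9
    }'
    # -> PCIe 2.0 x8: 4.0 GB/s   QDR 4x: 4.0 GB/s

One QDR port can saturate a gen2 x8 slot even before protocol overheads, so
a dual-port QDR HCA in an x8 slot cannot run both ports at full rate.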
I'm curious about the checksums.

The manual tells you how to turn both types of checksum on or off (client
in-memory, and wire/network):

    $ echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages

Then it tells you how to check the status of wire checksums:

    $ /usr/sbin/lctl get_param osc.*.checksums

It's not clear whether a 0 in the checksum_pages file overrides the
osc.*.checksums setting, or the opposite (assuming the result of get_param
shows all OSTs with "...checksums=1").

Also, what's the typical recommendation for 1.8 sites? In-memory off and
wire on?

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Peter Kjellström
Sent: Tuesday, March 22, 2011 7:24 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Multiple IB ports

[...]
On 2011-03-22, at 3:30 PM, Mike Hanby wrote:
> I'm curious about the checksums.
>
> The manual tells you how to turn both types of checksum on or off (client
> in-memory, and wire/network):
> $ echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages

This is enabling/disabling the in-memory page checksums, as well as the
network RPC checksums. The assumption is that there is no value in doing
the in-memory checksums without the RPC checksums. It is possible to
enable/disable the RPC checksums independently.

> Then it tells you how to check the status of wire checksums:
> $ /usr/sbin/lctl get_param osc.*.checksums
>
> It's not clear whether a 0 in the checksum_pages file overrides the
> osc.*.checksums setting,

Yes, it does.

> or the opposite (assuming the result of get_param shows all OSTs with
> "...checksums=1").
>
> Also, what's the typical recommendation for 1.8 sites? In-memory off and
> wire on?

The default is in-memory off, RPC checksums on, which is recommended. The
only time I suggest disabling the RPC checksums is if single-threaded IO
performance is a bottleneck for specific applications, and disabling the
checksum CPU usage is a significant performance boost.

> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
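Putting that answer into command form, using only the knobs already quoted
in this thread (<fsname> stays a placeholder for the actual filesystem
name):

    # wire (RPC) checksums: the knob that matters for single-client bandwidth
    lctl get_param osc.*.checksums
    lctl set_param osc.*.checksums=0   # only if checksum CPU cost is the bottleneck

    # in-memory page checksums; note that writing 0 here disables the RPC
    # checksums as well, per Andreas's explanation above
    cat /proc/fs/lustre/llite/<fsname>/checksum_pages
    echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages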
On Sun, 2011-03-20 at 22:53 -0500, Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?

I am trying to do something like what you mentioned. I am working on a
machine with multiple IB ports, but rather than trying to aggregate links,
I am just trying to direct Lustre traffic over different IB ports so there
will essentially be a single QDR IB link dedicated to each MDS/OSS server.
Below are some of the main details. (I can provide more detailed info if
you think it would be useful.)

The storage is a DDN SFA10K couplet with 28 LUNs. Each controller in the
couplet has 4 QDR IB ports, but only 2 on each controller are connected to
the IB fabric. There is a single MGS/MDS server and 4 OSS servers. All
servers have a single QDR IB port connected to the fabric. Each OSS node
does SRP login to a different DDN port and serves out 7 of the 28 OSTs.

The Lustre client is an SGI UV1000 (1024 cores, 4 TB RAM) with 24 QDR IB
ports (of which we are currently only using 5). The 5 MDS/OSS servers have
their single IB ports configured on 2 different LNets. All 5 servers have
o2ib0 configured, as well as a specific LNet for that server (oss1 =>
o2ib1, oss2 => o2ib2, ..., mds => o2ib5). The client has LNets o2ib[1-5]
configured (one on each of the 5 IB ports). I also had to configure some
static IP routes on the client so that each Lustre server could ping the
corresponding port on the client.

I am still doing performance testing and playing around with configuration
parameters. In general, I am getting performance that is better than using
a single QDR IB link, but it certainly is not scaling up linearly. I can't
say for sure where the bottleneck is. It could be a misconfiguration on my
part, some limitation I am hitting within Lustre, or just the natural
result of running Lustre on a giant single-system-image SMP machine.
(Although I am pretty sure that at least part of the problem is due to poor
NUMA remote memory access performance.)

--
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/
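For anyone wanting to reproduce this kind of layout, the LNet side of
Rick's description might look roughly like the following. The interface
names and the exact modprobe lines are assumptions based on his prose, not
his actual configuration files:

    # each server: the shared network plus its own per-server network,
    # both on the single HCA
    #   oss1: options lnet networks="o2ib0(ib0),o2ib1(ib0)"
    #   oss2: options lnet networks="o2ib0(ib0),o2ib2(ib0)"
    #   ...   oss3 -> o2ib3, oss4 -> o2ib4, mds -> o2ib5

    # UV1000 client: one LNet network per physical IB port
    options lnet networks="o2ib1(ib0),o2ib2(ib1),o2ib3(ib2),o2ib4(ib3),o2ib5(ib4)"

    # plus the static routes Rick mentions on the client, so that each
    # server's NID is reached via the matching client port (the exact
    # "ip route" entries depend on the site's IP addressing)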