Hi,

Is anybody actually using multiple IB ports on a client for an aggregated
connection? I.e. many OSSes with one QDR IB port each, and clients with 4
QDR IB ports. Assuming the normal issues with bus bandwidth etc., what sort
of performance can I expect? QDR is roughly 3-4 GB/s.

I'm trying to size a cluster and clients to get ~10 GB/s on *one* client
node. If I can aggregate IB linearly, the next step will be to try to
figure out how to get 10 GB/s to local storage :-(

Sometimes customers are crazy...

Brian O'Connor
-------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA http://www.sgi.com/support/services
-------------------------------------------------
On 2011-03-21, at 4:53 AM, Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?
>
> I.e. many OSSes with one QDR IB port each, and clients with 4 QDR IB
> ports. Assuming the normal issues with bus bandwidth etc., what sort of
> performance can I expect?
>
> QDR ~ 3-4 GB/s
>
> I'm trying to size a cluster and clients to get ~10 GB/s on *one*
> client node.

I believe this is possible to some limited extent today. The main issue is
that the primary NID addresses for the OST IB cards need to be on different
subnets, so that the clients will route the traffic to the OSTs via the
different IB HCAs. I don't have low-level details on this myself, but I
believe there are a couple of sites that have done this.

> If I can aggregate IB linearly, the next step will be to try to figure out
> how to get 10 GB/s to local storage :-(
>
> Sometimes customers are crazy...
>
> Brian O'Connor
> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
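A minimal sketch of the kind of setup Andreas describes, with each OSS NID
on its own LNet network/IP subnet so that a multi-HCA client routes to each
OSS through a different HCA. The interface names, network numbers and
subnets below are illustrative assumptions, not taken from any actual site:

    # modprobe option on oss1 (ib0 addressed in 10.10.1.0/24):
    options lnet networks="o2ib1(ib0)"
    # modprobe option on oss2 (ib0 addressed in 10.10.2.0/24):
    options lnet networks="o2ib2(ib0)"
    # modprobe option on the multi-HCA client, one LNet network per HCA,
    # so RPCs to each OSS leave through a different port:
    options lnet networks="o2ib1(ib0),o2ib2(ib1)"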
Hi Brian,

From my understanding (confirmation from more skilled people on the list
would be welcome), using multiple IB ports with a Lustre client will be
difficult to manage, and will probably not bring any performance
improvement. I was told by a colleague that there are currently too many
internal locks in the client to sustain a big throughput. Lustre is
designed for global throughput across many clients, not for individual
clients.

I can observe this on my site, where I have enough storage and servers to
reach 21 GB/s globally, but am unable to get more than 300 MB/s on a single
client even though the DDR IB network would sustain 800+ MB/s ...

________________________________
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
Sent: Monday, 21 March 2011 04:53
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Multiple IB ports

[...]
On 2011-03-21, at 10:18 AM, Sebastien Piechurski wrote:
> From my understanding (confirmation from more skilled people on the list
> would be welcome), using multiple IB ports with a Lustre client will be
> difficult to manage, and will probably not bring any performance
> improvement.
> I was told by a colleague that there are currently too many internal locks
> in the client to sustain a big throughput. Lustre is designed for global
> throughput across many clients, not for individual clients.
> I can observe this on my site, where I have enough storage and servers to
> reach 21 GB/s globally, but am unable to get more than 300 MB/s on a
> single client even though the DDR IB network would sustain 800+ MB/s ...

There must be something wrong with your configuration, or the code has some
bug, because we have had single clients doing 2 GB/s in the past. What
version of Lustre did you test on?

Is this a single-threaded write? With single-threaded IO the bottleneck
often happens in the kernel copy_{to,from}_user() that is copying data
to/from userspace in order to do data caching in the client. Having
multiple threads doing the IO allows multiple cores to do the data copying.

Is the Lustre debugging disabled? See if "lctl set_param debug=0" helps.

Is the Lustre network checksum disabled? "lctl set_param osc.*.checksums=0"
There is a patch to allow hardware-assisted checksums, but it needs some
debugging before it can be landed into the production release.

> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
> Sent: Monday, 21 March 2011 04:53
> Subject: [Lustre-discuss] Multiple IB ports
> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
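Gathering the client-side checks Andreas lists into one place (the
parameter names are exactly the ones quoted above; run these on the client
being benchmarked):

    # inspect current settings
    lctl get_param debug
    lctl get_param osc.*.checksums

    # disable debug logging and wire checksums while measuring
    # single-client bandwidth, then restore them afterwards if desired
    lctl set_param debug=0
    lctl set_param osc.*.checksums=0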
Hi Brian,

I don't think it's crazy to strive for that rate, especially when there are
machines on the market which can accommodate multiple TBs of memory.
Assuming my math is mostly correct, loading or unloading a 16 TB data set
into a single machine (with 16 TB of memory) would take about an hour and a
half with a single QDR interface:

    (16*1024^4)/(3.4*1000^3)/60 == 86 mins

The ratio of memory capacity to I/O bandwidth is a critical issue for most
large machines. Typically in HPC, we'd like to dump all of memory in 5 to
10 minutes.

thanks,
paul

Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?
> [...]
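Checking Paul's arithmetic and extending it to the 5-10 minute target he
mentions (pure arithmetic, reusing the ~3.4 GB/s per-QDR-port figure from
his example):

    awk 'BEGIN {
        bytes = 16 * 1024^4                    # 16 TiB of client memory
        printf "one QDR port at 3.4 GB/s: %.0f minutes\n", bytes/3.4e9/60
        printf "10-minute dump needs:     %.1f GB/s\n",    bytes/600/1e9
    }'
    # -> one QDR port at 3.4 GB/s: 86 minutes
    # -> 10-minute dump needs:     29.3 GB/s

So even the ~10 GB/s Brian is after only gets a 16 TB machine down to
roughly a half-hour dump, which is why the memory-to-bandwidth ratio
matters so much here.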
> > I was told by a colleague that there were currently too many internal
> > locks in the clients to sustain a big throughput. Lustre is designed for
> > global throughput on many clients, but not on individual clients.

The LNet SMP scaling fixes/enhancements should help, but I don't believe
they are coming until 2.1.

> > I can observe this on my site, where I have enough storage and servers
> > to reach 21 GB/s globally, but am unable to get more than 300 MB/s on a
> > single client even though the DDR IB network would sustain 800+ MB/s ...

You probably need to disable checksums, and a DDR link should be able to
sustain 1.5 GB/s. I've seen close to these rates with LNet self tests; I
don't usually see them in normal operation with the filesystem added on
top.

> There must be something wrong with your configuration or the code has some
> bug, because we have had single clients doing 2 GB/s in the past. What
> version of Lustre did you test on?

I've never seen as high as 2 GB/s from a single client, but I've only been
focused on single-threaded IO. For that I've seen between 1.3 and 1.4 GB/s
peak. I spent a little time trying to figure out what that was with
SystemTap, but I only looked at the read case. It looked like the per-page
locking penalty can be high. Monitoring each ll_readpage I was seeing a
median of 2.4 us for the read scenario while the mode was only 0.5 us. IIRC
it was the llap locking that accounted for most of the ll_readpage time. I
didn't look at the penalty for rebalancing the cache between the various
CPUs. Using those numbers:

    >>> ((1/.000002406) * 4096)/2**20
    1623.5453034081463

gives me a best-case scenario of ~1.6 GB/s. I thought about working on the
read case, but realized the effort probably wasn't worth putting into 1.8,
and I would have to wait until 2.0 to test more. Unfortunately I haven't
had the time to look at 2.0+ yet.

> Is this a single-threaded write? With single-threaded IO the bottleneck
> often happens in the kernel copy_{to,from}_user() that is copying data
> to/from userspace in order to do data caching in the client. Having
> multiple threads doing the IO allows multiple cores to do the data copying.

Even with copy_{to,from}_user() it should be able to provide at least
5 GB/s. I've seen about 5.5 GB/s reading cached data on a client with lots
of memory.

Jeremy
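Jeremy mentions LNet self tests; a minimal lnet_selftest run for measuring
raw LNet bandwidth between two nodes looks roughly like the sketch below.
The NIDs are placeholders, and the exact option set can vary a little
between Lustre versions, so treat it as a starting point rather than a
recipe:

    # lnet_selftest must be loaded on every node involved
    modprobe lnet_selftest
    export LST_SESSION=$$
    lst new_session read_bw
    lst add_group clients 192.168.10.1@o2ib     # placeholder client NID
    lst add_group servers 192.168.10.2@o2ib     # placeholder server NID
    lst add_batch bulk
    lst add_test --batch bulk --from clients --to servers brw read size=1M
    lst run bulk
    lst stat clients servers                    # Ctrl-C to stop the output
    lst stop bulk
    lst end_session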
Thanks for the correction. I guess I need to redo some benchmarks and go
through the tunables ...

> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> Sent: Monday, 21 March 2011 12:38
> To: Sebastien Piechurski
> Cc: Brian O'Connor; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Multiple IB ports
>
> [...]
Hi Brian,

With one 4x QDR IB port, you can achieve 2 GB/s on a single client with a
multi-threaded workload, provided that you have the right storage (with
enough bandwidth) at the other end. We have tested this multiple times at
DDN.

I have seen sites that do IB bonding across 2 ports, but mostly in a
failover configuration. Getting 10 GB/s to a single node requires
aggregating 5 QDR IB ports. You will need to confirm with your IB vendor
(Mellanox?), OS vendor (SGI/RedHat/Novell) and Lustre vendor whether they
support aggregating so many links. I think the challenge you will have is
finding a Lustre client node that has enough x8 PCIe slots to sustain 3
dual-port InfiniBand adapters at full rate (think multiple such nodes in a
typical Lustre filesystem, not so economical). The other alternative is to
find a server that supports an 8X or 12X QDR IB port on the motherboard to
get more bandwidth.

With a typical Lustre client memory of 24-64 GB and memory-to-CPU bandwidth
of 10 GB/s (with standard DDR3-1333MHz DIMMs), it is not possible to fit a
dataset larger than 2/3 of memory. If you still want to achieve 10 GB/s of
bandwidth between storage and memory, there are clever alternatives. You
will have to stage your data into memory beforehand, keep the memory pages
locked, and continue feeding data as those pages are consumed. It is a lot
harder than it seems on paper.

Cheers,
-Atul

From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian O'Connor
Sent: Monday, 21 March 2011 9:23 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Multiple IB ports

[...]
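A back-of-envelope version of that port count, using the ~2 GB/s per port
that Atul quotes as measured, and an optimistic ~3.4 GB/s per port figure
from earlier in the thread:

    awk 'BEGIN { printf "ports needed: %.2f (at 2 GB/s)  %.2f (at 3.4 GB/s)\n",
                 10/2.0, 10/3.4 }'
    # -> ports needed: 5.00 (at 2 GB/s)  2.94 (at 3.4 GB/s)

i.e. five ports at the realistically achievable per-port rate, and still
three even if every port ran close to line rate.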
On Tuesday, March 22, 2011 06:15:35 am Atul Vidwansa wrote:
> Hi Brian,
>
> With one 4x QDR IB port, you can achieve 2 GB/s on a single client with a
> multi-threaded workload, provided that you have the right storage (with
> enough bandwidth) at the other end. We have tested this multiple times at
> DDN.
>
> I have seen sites that do IB bonding across 2 ports, but mostly in a
> failover configuration. Getting 10 GB/s to a single node requires
> aggregating 5 QDR IB ports. You will need to confirm with your IB vendor
> (Mellanox?), OS vendor (SGI/RedHat/Novell) and Lustre vendor whether they
> support aggregating so many links. I think the challenge you will have is
> finding a Lustre client node that has enough x8 PCIe slots to sustain 3
> dual-port InfiniBand adapters at full rate

Just adding a small detail: a single port of QDR consumes all of the HCA's
PCI bandwidth, so you would need 5 x8 IB HCAs, for a total of 40 lanes of
PCI Express. This will of course change with the introduction of future PCI
Express generations...

/Peter

> (think multiple such nodes in a typical Lustre filesystem, not so
> economical). The other alternative is to find a server that supports an
> 8X or 12X QDR IB port on the motherboard to get more bandwidth.
> [...]
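Peter's point checks out on paper. The PCIe 2.0 and QDR signalling figures
below come from the respective specs, not from this thread, so treat this
as a rough sanity check only:

    awk 'BEGIN {
        pcie_x8 = 8 * 5e9  * 8/10 / 8    # 8 lanes * 5 GT/s * 8b/10b, bytes/s
        qdr_4x  = 4 * 10e9 * 8/10 / 8    # 4 lanes * 10 Gb/s * 8b/10b, bytes/s
        printf "PCIe 2.0 x8: %.1f GB/s   QDR 4x: %.1f GB/s\n",
               pcie_x8/1e9, qdr_4x/1e9
    }'
    # -> PCIe 2.0 x8: 4.0 GB/s   QDR 4x: 4.0 GB/s

One QDR port can saturate a gen2 x8 slot even before protocol overheads, so
a dual-port QDR HCA in an x8 slot cannot run both ports at full rate.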
I'm curious about the checksums.

The manual tells you how to turn both types of checksum on or off (client
in-memory, and wire/network):

    $ echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages

Then it tells you how to check the status of wire checksums:

    $ /usr/sbin/lctl get_param osc.*.checksums

It's not clear whether a 0 in the checksum_pages file overrides the
osc.*.checksums setting, or the opposite (assuming the result of get_param
shows all OSTs with "...checksums=1").

Also, what's the typical recommendation for 1.8 sites? In-memory off and
wire on?

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Peter Kjellström
Sent: Tuesday, March 22, 2011 7:24 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Multiple IB ports

[...]
On 2011-03-22, at 3:30 PM, Mike Hanby wrote:
> I'm curious about the checksums.
>
> The manual tells you how to turn both types of checksum on or off (client
> in-memory, and wire/network):
> $ echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages

This is enabling/disabling the in-memory page checksums, as well as the
network RPC checksums. The assumption is that there is no value in doing
the in-memory checksums without the RPC checksums. It is possible to
enable/disable the RPC checksums independently.

> Then it tells you how to check the status of wire checksums:
> $ /usr/sbin/lctl get_param osc.*.checksums
>
> It's not clear whether a 0 in the checksum_pages file overrides the
> osc.*.checksums setting,

Yes, it does.

> or the opposite (assuming the result of get_param shows all OSTs with
> "...checksums=1").
>
> Also, what's the typical recommendation for 1.8 sites? In-memory off and
> wire on?

The default is in-memory off, RPC checksums on, which is recommended. The
only time I suggest disabling the RPC checksums is if single-threaded IO
performance is a bottleneck for specific applications, and disabling the
checksum CPU usage is a significant performance boost.

> [...]

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
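Putting that answer into command form, using only the knobs already quoted
in this thread (<fsname> stays a placeholder for the actual filesystem
name):

    # wire (RPC) checksums: the knob that matters for single-client bandwidth
    lctl get_param osc.*.checksums
    lctl set_param osc.*.checksums=0   # only if checksum CPU cost is the bottleneck

    # in-memory page checksums; note that writing 0 here disables the RPC
    # checksums as well, per Andreas's explanation above
    cat /proc/fs/lustre/llite/<fsname>/checksum_pages
    echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages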
On Sun, 2011-03-20 at 22:53 -0500, Brian O'Connor wrote:
> Is anybody actually using multiple IB ports on a client for an aggregated
> connection?

I am trying to do something like what you mentioned. I am working on a
machine with multiple IB ports, but rather than trying to aggregate links,
I am just trying to direct Lustre traffic over different IB ports so there
will essentially be a single QDR IB link dedicated to each MDS/OSS server.
Below are some of the main details. (I can provide more detailed info if
you think it would be useful.)

The storage is a DDN SFA10K couplet with 28 LUNs. Each controller in the
couplet has 4 QDR IB ports, but only 2 on each controller are connected to
the IB fabric. There is a single MGS/MDS server and 4 OSS servers. All
servers have a single QDR IB port connected to the fabric. Each OSS node
does SRP login to a different DDN port and serves out 7 of the 28 OSTs.

The Lustre client is an SGI UV1000 (1024 cores, 4 TB RAM) with 24 QDR IB
ports (of which we are currently only using 5). The 5 MDS/OSS servers have
their single IB ports configured on 2 different LNets. All 5 servers have
o2ib0 configured, as well as a specific LNet for that server (oss1 =>
o2ib1, oss2 => o2ib2, ..., mds => o2ib5). The client has LNets o2ib[1-5]
configured (one on each of the 5 IB ports). I also had to configure some
static IP routes on the client so that each Lustre server could ping the
corresponding port on the client.

I am still doing performance testing and playing around with configuration
parameters. In general, I am getting performance that is better than using
a single QDR IB link, but it certainly is not scaling up linearly. I can't
say for sure where the bottleneck is. It could be a misconfiguration on my
part, some limitation I am hitting within Lustre, or just the natural
result of running Lustre on a giant single-system-image SMP machine.
(Although I am pretty sure that at least part of the problem is due to poor
NUMA remote memory access performance.)

--
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/
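For anyone wanting to reproduce this kind of layout, the LNet side of
Rick's description might look roughly like the following. The interface
names and the exact modprobe lines are assumptions based on his prose, not
his actual configuration files:

    # each server: the shared network plus its own per-server network,
    # both on the single HCA
    #   oss1: options lnet networks="o2ib0(ib0),o2ib1(ib0)"
    #   oss2: options lnet networks="o2ib0(ib0),o2ib2(ib0)"
    #   ...   oss3 -> o2ib3, oss4 -> o2ib4, mds -> o2ib5

    # UV1000 client: one LNet network per physical IB port
    options lnet networks="o2ib1(ib0),o2ib2(ib1),o2ib3(ib2),o2ib4(ib3),o2ib5(ib4)"

    # plus the static routes Rick mentions on the client, so that each
    # server's NID is reached via the matching client port (the exact
    # "ip route" entries depend on the site's IP addressing)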