I think this is a networking question.

We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.

If I do the following:

cp /lustre/largefile.h5 /tmp/

I get 117MB/s.

If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I also get 117MB/s.

If I go directly from /lustre to the archive, I get 50MB/s.

This is consistently reproducible. It doesn't matter whether I copy a large file from Lustre to Lustre, or use scp, or globus: whenever I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.

Anyone have ideas why I cannot do 1GigE full duplex?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734) 936-1985
Try testing it with LNet self-test and see what kind of results you get.

-cf

On 07/29/2011 11:33 AM, Brock Palen wrote:
> I think this is a networking question.
>
> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
>
> If I do the following:
>
> cp /lustre/largefile.h5 /tmp/
>
> I get 117MB/s.
>
> [...]
>
> Anyone have ideas why I cannot do 1GigE full duplex?
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734) 936-1985
On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> I think this is a networking question.
>
> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
>
> If I do the following:
>
> cp /lustre/largefile.h5 /tmp/
>
> I get 117MB/s.
>
> If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I get 117MB/s.
>
> If I go directly from /lustre to the archive, I get 50MB/s.

Strace your globus-url-copy and see what IO size it is using. "cp" was modified long ago to use the block size reported by stat(2) when copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is doing e.g. 4kB reads, that will be very inefficient for Lustre, but much less so when reading from /tmp.

> This is consistently reproducible. It doesn't matter whether I copy a large file from Lustre to Lustre, or use scp, or globus: whenever I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.
>
> Anyone have ideas why I cannot do 1GigE full duplex?

I don't think this has anything to do with "full duplex". 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
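For reference, checking the read size as suggested above could look roughly like the sketch below. This is only a sketch: the destination URL, paths, and output post-processing are placeholders, and strace's exact output format varies a little between versions.

# Trace read()/write() syscalls of globus-url-copy (and any children) into a file.
strace -f -e trace=read,write -o /tmp/guc.strace \
    globus-url-copy file:///lustre/largefile.h5 gsiftp://archive.example.org/scratch/

# The last field of each read() line is the number of bytes returned, so this
# gives a rough distribution of the read sizes used against the source file.
awk '/read\(/ {print $NF}' /tmp/guc.strace | sort -n | uniq -c

If the reads turn out to be tiny (4kB or so), experimenting with globus-url-copy's block-size option (-bs, if I remember the flag correctly) may be worthwhile.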
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734) 936-1985

On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:
> On 2011-07-29, at 11:33 AM, Brock Palen wrote:
>> [...]
>> If I go directly from /lustre to the archive, I get 50MB/s.
>
> Strace your globus-url-copy and see what IO size it is using. "cp" was modified long ago to use the block size reported by stat(2) when copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is doing e.g. 4kB reads, that will be very inefficient for Lustre, but much less so when reading from /tmp.
>
>> This is consistently reproducible. [...]
>>
>> Anyone have ideas why I cannot do 1GigE full duplex?
>
> I don't think this has anything to do with "full duplex". 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Ah, I guess I wasn't clear: I only get 117MB/s when the traffic goes in one direction on the network, e.g. copying from Lustre to /tmp (a local drive), or pushing from /tmp out with globus. It's just when the client is reading from Lustre and sending the data out at the same time that I only get 50MB/s.

Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?
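One quick way to see what the client NIC is actually doing during the Lustre-to-archive copy is to watch receive and transmit rates side by side while the transfer runs. A minimal sketch using sar from the sysstat package; the interface name eth0 is an assumption.

# Sample per-interface throughput once a second while globus-url-copy is running.
sar -n DEV 1
# Watch the eth0 row: rxkB/s is data arriving from the OSSes, txkB/s is data
# leaving for the archive. True full duplex would show both columns near line
# rate (~115 MB/s) at the same time rather than splitting it.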
> > On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> >> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
> >>
> >> [...]
> >>
> >> If I go directly from /lustre to the archive, I get 50MB/s.

...

> It's just when the client is reading from Lustre and sending the data out at the same time that I only get 50MB/s.
>
> Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?

Can your setup do wire-speed full duplex in the simplest case (never mind Lustre)? I'd try iperf or something similar before investing too much time looking for "lost" performance in higher layers.

/Peter
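As a point of reference, a simultaneous bidirectional test with the classic iperf 2.x syntax would look something like the following; the host name is a placeholder, and the duration is arbitrary.

# On the far end (e.g. the archive gateway):
iperf -s

# On the Lustre client: push traffic in both directions at once for 30 seconds.
iperf -c archive.example.org -d -t 30
# A healthy full-duplex GigE path reports roughly 940 Mbit/s in each direction
# simultaneously; if one direction collapses here, the problem is below Lustre.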
On Mon, Aug 01, 2011 at 02:52:07PM +0200, Peter Kjellström wrote:
> > > On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> > [...]
> > Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?

I believe yes. I remember that we once ran a test on 1GigE where one client read from, and another wrote to, the same server, and we observed about 223MB/s aggregate read/write throughput.

> Can your setup do wire-speed full duplex in the simplest case (never mind Lustre)? I'd try iperf or something similar before investing too much time looking for "lost" performance in higher layers.

Agreed. And if the iperf results look good, I'd suggest moving on to the LNet selftest; it will tell you whether the Lustre networking stack can saturate the link in both directions. Here's a script we once used, with its output:

[root@sata16 ~]# export LST_SESSION=$$
[root@sata16 ~]# lst new_session --timeout 100 read/write
SESSION: read/write TIMEOUT: 100 FORCE: No
[root@sata16 ~]# lst add_group servers sata14@tcp
sata14@tcp are added to session
[root@sata16 ~]# lst add_group readers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_group writers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_batch bulk_rw
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from readers --to servers brw read size=1M
Test was added successfully
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from writers --to servers brw write size=1M
Test was added successfully
[root@sata16 ~]# lst run bulk_rw
bulk_rw is running now
[root@sata16 ~]# lst stat servers
[LNet Rates of servers]
[R] Avg: 335 RPC/s  Min: 335 RPC/s  Max: 335 RPC/s
[W] Avg: 446 RPC/s  Min: 446 RPC/s  Max: 446 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 111.83 MB/s  Min: 111.83 MB/s  Max: 111.83 MB/s
[W] Avg: 111.23 MB/s  Min: 111.23 MB/s  Max: 111.23 MB/s

The script can easily be adapted to run on your system. Please load the lnet_selftest kernel module on all test nodes before running it; Lustre itself need not be running.

- Isaac
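In case it helps, the setup and teardown around that script is roughly as follows; this is a sketch assuming the lnet_selftest module and lst subcommands shipped with Lustre 1.8, with node names taken from the example above.

# On every node that takes part in the test (readers, writers, and servers):
modprobe lnet_selftest

# ...run the lst commands shown above, then stop the batch and close the session:
lst stop bulk_rw
lst end_session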