I think this is a networking question.

We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.

If I do the following:

cp /lustre/largefile.h5 /tmp/

I get 117MB/s.

If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I also get 117MB/s.

If I go directly from /lustre to the archive, I get 50MB/s.

This is consistently reproducible. It doesn't matter whether I copy a large file from Lustre to Lustre, or use scp, or globus: whenever I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.

Anyone have ideas why I cannot do 1GigE full duplex?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734) 936-1985
Try testing it with LNet self-test and see what kind of results you get.

-cf

On 07/29/2011 11:33 AM, Brock Palen wrote:
> I think this is a networking question.
>
> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
>
> If I do the following:
>
> cp /lustre/largefile.h5 /tmp/
>
> I get 117MB/s.
>
> [...]
>
> Anyone have ideas why I cannot do 1GigE full duplex?
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734) 936-1985
On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> I think this is a networking question.
>
> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
>
> If I do the following:
>
> cp /lustre/largefile.h5 /tmp/
>
> I get 117MB/s.
>
> If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I get 117MB/s.
>
> If I go directly from /lustre to the archive, I get 50MB/s.

Strace your globus-url-copy and see what IO size it is using. "cp" was modified long ago to use the block size reported by stat(2) when copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is doing e.g. 4kB reads, that will be very inefficient for Lustre, but much less so when reading from /tmp.

> This is consistently reproducible. It doesn't matter whether I copy a large file from Lustre to Lustre, or use scp, or globus: whenever I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.
>
> Anyone have ideas why I cannot do 1GigE full duplex?

I don't think this has anything to do with "full duplex". 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
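For reference, checking the read size as suggested above could look roughly like the sketch below. This is only a sketch: the destination URL, paths, and output post-processing are placeholders, and strace's exact output format varies a little between versions.

# Trace read()/write() syscalls of globus-url-copy (and any children) into a file.
strace -f -e trace=read,write -o /tmp/guc.strace \
    globus-url-copy file:///lustre/largefile.h5 gsiftp://archive.example.org/scratch/

# The last field of each read() line is the number of bytes returned, so this
# gives a rough distribution of the read sizes used against the source file.
awk '/read\(/ {print $NF}' /tmp/guc.strace | sort -n | uniq -c

If the reads turn out to be tiny (4kB or so), experimenting with globus-url-copy's block-size option (-bs, if I remember the flag correctly) may be worthwhile.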
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734) 936-1985

On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:
> On 2011-07-29, at 11:33 AM, Brock Palen wrote:
>> [...]
>> If I go directly from /lustre to the archive, I get 50MB/s.
>
> Strace your globus-url-copy and see what IO size it is using. "cp" was modified long ago to use the block size reported by stat(2) when copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is doing e.g. 4kB reads, that will be very inefficient for Lustre, but much less so when reading from /tmp.
>
>> This is consistently reproducible. [...]
>>
>> Anyone have ideas why I cannot do 1GigE full duplex?
>
> I don't think this has anything to do with "full duplex". 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Ah, I guess I wasn't clear: I only get 117MB/s when the traffic goes in one direction on the network, e.g. copying from Lustre to /tmp (a local drive), or pushing from /tmp out with globus. It's just when the client is reading from Lustre and sending the data out at the same time that I only get 50MB/s.

Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?
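One quick way to see what the client NIC is actually doing during the Lustre-to-archive copy is to watch receive and transmit rates side by side while the transfer runs. A minimal sketch using sar from the sysstat package; the interface name eth0 is an assumption.

# Sample per-interface throughput once a second while globus-url-copy is running.
sar -n DEV 1
# Watch the eth0 row: rxkB/s is data arriving from the OSSes, txkB/s is data
# leaving for the archive. True full duplex would show both columns near line
# rate (~115 MB/s) at the same time rather than splitting it.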
> > On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> >> We have Lustre 1.8 clients with 1GigE interfaces that, according to ethtool, are running full duplex.
> >>
> >> [...]
> >>
> >> If I go directly from /lustre to the archive, I get 50MB/s.

...

> It's just when the client is reading from Lustre and sending the data out at the same time that I only get 50MB/s.
>
> Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?

Can your setup do wire-speed full duplex in the simplest case (never mind Lustre)? I'd try iperf or something similar before investing too much time looking for "lost" performance in higher layers.

/Peter
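As a point of reference, a simultaneous bidirectional test with the classic iperf 2.x syntax would look something like the following; the host name is a placeholder, and the duration is arbitrary.

# On the far end (e.g. the archive gateway):
iperf -s

# On the Lustre client: push traffic in both directions at once for 30 seconds.
iperf -c archive.example.org -d -t 30
# A healthy full-duplex GigE path reports roughly 940 Mbit/s in each direction
# simultaneously; if one direction collapses here, the problem is below Lustre.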
On Mon, Aug 01, 2011 at 02:52:07PM +0200, Peter Kjellström wrote:
> > > On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> > [...]
> > Does that make sense? Is it even right to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently?

I believe yes. I remember that we once ran a test on 1GigE where one client read from, and another wrote to, the same server, and we observed about 223MB/s aggregate read/write throughput.

> Can your setup do wire-speed full duplex in the simplest case (never mind Lustre)? I'd try iperf or something similar before investing too much time looking for "lost" performance in higher layers.

Agreed. And if the iperf results look good, I'd suggest moving on to the LNet selftest; it will tell you whether the Lustre networking stack can saturate the link in both directions. Here's a script we once used, with its output:

[root@sata16 ~]# export LST_SESSION=$$
[root@sata16 ~]# lst new_session --timeout 100 read/write
SESSION: read/write TIMEOUT: 100 FORCE: No
[root@sata16 ~]# lst add_group servers sata14@tcp
sata14@tcp are added to session
[root@sata16 ~]# lst add_group readers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_group writers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_batch bulk_rw
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from readers --to servers brw read size=1M
Test was added successfully
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from writers --to servers brw write size=1M
Test was added successfully
[root@sata16 ~]# lst run bulk_rw
bulk_rw is running now
[root@sata16 ~]# lst stat servers
[LNet Rates of servers]
[R] Avg: 335 RPC/s  Min: 335 RPC/s  Max: 335 RPC/s
[W] Avg: 446 RPC/s  Min: 446 RPC/s  Max: 446 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 111.83 MB/s  Min: 111.83 MB/s  Max: 111.83 MB/s
[W] Avg: 111.23 MB/s  Min: 111.23 MB/s  Max: 111.23 MB/s

The script can easily be adapted to run on your system. Please load the lnet_selftest kernel module on all test nodes before running it; Lustre itself need not be running.

- Isaac
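In case it helps, the setup and teardown around that script is roughly as follows; this is a sketch assuming the lnet_selftest module and lst subcommands shipped with Lustre 1.8, with node names taken from the example above.

# On every node that takes part in the test (readers, writers, and servers):
modprobe lnet_selftest

# ...run the lst commands shown above, then stop the batch and close the session:
lst stop bulk_rw
lst end_session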