While performing a single-copy, single-client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write at about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different Lustre file systems. This is true even though the slower clients have the same processors and more RAM: 18GB for the slow writers versus 12GB for the fast writers. Both systems use OFED 1.3.1. All of the other benchmarks we run perform better on the slow-write clients, and read speed from the Lustre file systems is comparable across all clients.

max_rpcs_in_flight and max_pages_per_rpc are at their defaults on both systems. The clients are on the same IB network with the same QDR cards, and IB connectivity has been verified with the IB utilities; the two client types are nearly identical in bandwidth and latency. We are also using the same modprobe.conf and openibd.conf files on both systems. We use a 34GB file size on the 12GB and 18GB RAM systems, and a 137GB file on the 96GB RAM system, so this is not a matter of caching in RAM.

Are there known issues with our 2.6.18-92.1.10.el5-lustre-1.6.5.1 combination? This does not appear to be a problem with the Lustre file systems themselves, as we get the same kind of results no matter which of our three Lustre systems the test writes to.

Here are the summaries from several runs of ost-survey on our new Lustre system (one column per run). Please comment on the worst/best deltas of the read and write operations.

Number of Active OST devices : 96
Worst Read     38.167753  38.932928  39.006537  39.782153  38.717915
Best Read      61.704534  61.832461  63.284999  65.000491  61.836016
Read Average   51.433847  51.281630  51.297278  51.582327  51.318410
Worst Write    34.311237  49.009757  55.272744  51.532331  51.816523
Best Write     94.001170  96.033483  93.401792  93.081544  91.030717
Write Average  74.248683  71.831019  75.179863  74.723100  74.930529

/bob

Bob Hayes
System Administrator
SSG-DRD-DP
Office: 253-371-3040
Cell: 253-441-5482
e-mail: robert.n.hayes at Intel.Com
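For context, the kind of single-stream dd test and tunable check described above might look roughly like the sketch below; the mount point, file path, and block count are placeholders rather than the exact commands used in this report, and the /proc paths assume a 1.6-series client:

    # single-stream write, then read, of a ~34GB file (about 2x client RAM),
    # using 1MB blocks; /mnt/lustre/ddtest is a placeholder path
    dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=34816 conv=fsync
    dd if=/mnt/lustre/ddtest of=/dev/null bs=1M

    # confirm the RPC tunables really are at their defaults on both client types
    # (typically 8 RPCs in flight and 256 pages per RPC on 1.6 clients)
    cat /proc/fs/lustre/osc/*/max_rpcs_in_flight
    cat /proc/fs/lustre/osc/*/max_pages_per_rpc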
On May 11, 2009 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using dd,
> we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1
> write about half the speed of our Nehalem clients running
> 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.
>
> This is true even though the slower clients have the same processors and
> more RAM, 18GB for the slow writers and 12GB for the fast writers. Both
> systems use OFED 1.3.1. All benchmarks we use perform better on the
> slow-write clients and read speed from LFS is comparable across all
> clients.

Have you tried booting the slower-with-more-RAM clients using "mem=12G"
to see if the performance gets worse with more RAM? There is a known
performance bottleneck with the client-side cache in 1.6 clients, and
you may be triggering this...

If you have the luxury to do so, testing a 1.8.0 client's IO performance
against the same filesystems would also determine if the client-side
cache performance fixes therein will already solve your problems.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
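To illustrate the "mem=12G" suggestion: on a RHEL5-era client the limit can be appended to the kernel line in /boot/grub/grub.conf and the node rebooted. The stanza below is a sketch only; the kernel version string and root device are placeholders, not taken from this thread:

    title Lustre client, 12GB memory cap (2.6.18-92.1.10.el5_lustre.1.6.5.1)
            root (hd0,0)
            kernel /vmlinuz-2.6.18-92.1.10.el5_lustre.1.6.5.1 ro root=/dev/VolGroup00/LogVol00 mem=12G
            initrd /initrd-2.6.18-92.1.10.el5_lustre.1.6.5.1.img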
On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using
> dd, we are finding that our Nehalem clients running
> 2.6.18-92.1.10.el5-lustre-1.6.5.1
> write about half the speed of our Nehalem clients running
> 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file
> systems.

We've seen a fairly substantial block-level device throughput regression
going from -53 to -92 without involving Lustre, but I've not yet had
time to run down the changes to see what could be causing it.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
We will test the mem=12G suggestion. Before attempting the 1.8.0 client, can you confirm that a 1.8 client should work with a 1.6 server without causing any more complications?

/bob

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Monday, May 11, 2009 1:54 PM
To: Hayes, Robert N
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)
Dave,

Does the "substantial block-level device throughput regression" exist in 2.6.18-128?

/bob

-----Original Message-----
From: David Dillow [mailto:dillowda at ornl.gov]
Sent: Monday, May 11, 2009 2:20 PM
To: Hayes, Robert N
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)
On Mon, 2009-05-11 at 15:44 -0700, Hayes, Robert N wrote:
> Dave,
> Does the "substantial block-level device throughput regression" exist
> in 2.6.18-128?

I couldn't say; I haven't tested it. Our environment is SRP over IB for the storage, so that'll be my focus when I look at the changes between the two kernels.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
On May 11, 2009 15:44 -0700, Hayes, Robert N wrote:
> Does the "substantial block-level device throughput regression" exist
> in 2.6.18-128?

Note that "block-level device" is meaningless from the point of view of Lustre clients. If you changed the client software only, then this shouldn't be a factor.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
> We will test the mem=12G suggestion. Before attempting the 1.8.0 client,
> can you confirm that a 1.8 client should work with a 1.6 server without
> causing any more complications?

Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are
worried about testing this out during live system time then you can wait
for an outage window to test the 1.8 client in isolation. There is
nothing to do on the server, and just RPM upgrade/downgrades on the client.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
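A client-only switch to the 1.8.0 RPMs might look roughly like the sketch below; the mount point, MGS NID, file system name, and package file names are placeholders and would need to match the 1.8.0 client RPMs built for the running kernel and OFED stack:

    # on the client only -- nothing changes on the 1.6.x servers
    umount /mnt/lustre
    lustre_rmmod
    rpm -Uvh lustre-client-1.8.0-*.rpm lustre-client-modules-1.8.0-*.rpm
    mount -t lustre 192.168.1.10@o2ib:/fsname /mnt/lustre
    # to roll back, reinstall the 1.6.5.1 client RPMs with rpm -Uvh --oldpackage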
On May 11, 2009, at 8:07 PM, Andreas Dilger wrote:
> On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
>> We will test the mem=12G suggestion. Before attempting the 1.8.0 client,
>> can you confirm that a 1.8 client should work with a 1.6 server without
>> causing any more complications?
>
> Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are
> worried about testing this out during live system time then you can wait
> for an outage window to test the 1.8 client in isolation. There is
> nothing to do on the server, and just RPM upgrade/downgrades on the client.

And it's a beautiful thing. :)

Charlie Taylor
UF HPC Center