Andrei Maslennikov
2008-Jul-02 08:18 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
Some fresh numbers: one single thread writing over Infiniband (with RHEL4-supplied OFED 1.2++).

  lmdd of=/lustre/testX bs=1M time=20 fsync=1

  Lustre 1.6.4.3 - AMD 2210 (4 cores) - 396 MB/sec (official kernel)
  Lustre 1.6.5   - AMD 2210 (4 cores) - 169 MB/sec (official kernel)
  Lustre 1.6.5   - AMD 2210 (4 cores) - 165 MB/sec (patchless client)

  Lustre 1.6.4.3 - AMD 2354 (8 cores) - 725 MB/sec
  Lustre 1.6.5   - AMD 2354 (8 cores) - 246 MB/sec
Andreas Dilger
2008-Jul-04 04:52 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Jul 02, 2008 10:18 +0200, Andrei Maslennikov wrote:
> One single thread writing over Infiniband (with RHEL4-supplied OFED 1.2++).
>
> lmdd of=/lustre/testX bs=1M time=20 fsync=1
>
> Lustre 1.6.4.3 - AMD 2210 (4 cores) - 396 MB/sec (official kernel)
> Lustre 1.6.5 - AMD 2210 (4 cores) - 169 MB/sec (official kernel)
> Lustre 1.6.5 - AMD 2210 (4 cores) - 165 MB/sec (patchless client)
>
> Lustre 1.6.4.3 - AMD 2354 (8 cores) - 725 MB/sec
> Lustre 1.6.5 - AMD 2354 (8 cores) - 246 MB/sec

Can you try disabling checksumming on the client:

  lctl set_param osc.*.checksums=0

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andrei Maslennikov
2008-Jul-06 00:30 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
Thanks Andreas!

Disabling checksumming certainly leads to a big performance impact on the client side. However, it looks like we still have some performance gap between 1.6.4.3 and 1.6.5. I have repeated the tests, making sure that the file sizes are much larger than the available RAM on the client, to avoid any caching effects. Here is what came out:

Single stream writing: (lmdd of=/lustre/tstfileXX bs=1M time=200 fsync=1)

Client: AMD 2354 @ 2.21 GHz 2xQuad core, 16GB RAM, Infiniband, Servers: official 1.6.4.1
--------------------------------------------------------------------------------
Kernel: official  2.6.9-67.0.4.EL_lustre.1.6.4.3smp, Lustre: 1.6.4.3 official      : 681 MB/sec
Kernel: patchless 2.6.7-67.0.20.ELsmp x86_64,        Lustre: 1.6.5 (no checksum)   : 590 MB/sec
Kernel: patchless 2.6.7-67.0.20.ELsmp x86_64,        Lustre: 1.6.5 (with checksum) : 265 MB/sec

Client: Intel X5450 @ 3.00GHz 2xQuad core, 16GB RAM, Infiniband, Servers: official 1.6.4.1
--------------------------------------------------------------------------------
Kernel: official  2.6.9-67.0.4.EL_lustre.1.6.4.3smp, Lustre: 1.6.4.3               : 832 MB/sec
Kernel: official  2.6.9-67.0.7.EL_lustre.1.6.5smp,   Lustre: 1.6.5 (no checksum)   : 681 MB/sec
Kernel: patchless 2.6.7-67.0.20.ELsmp x86_64,        Lustre: 1.6.5 (no checksum)   : 675 MB/sec
Kernel: official  2.6.9-67.0.7.EL_lustre.1.6.5smp,   Lustre: 1.6.5 (with checksum) : 326 MB/sec
Kernel: patchless 2.6.7-67.0.20.ELsmp x86_64,        Lustre: 1.6.5 (with checksum) : 322 MB/sec

Here we see that 1.6.5.0 with fully patched client and no checksumming still performs worse than 1.6.4.3 with fully patched client (only 681 MB/sec against 832 MB/sec, almost 18% less).

Is there some other parameter to play with?

Regards - Andrei.
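The percentages in this comparison can be rechecked with a small helper. This is only a sketch: `pct_drop` is a hypothetical name introduced here, and the inputs are the Intel X5450 figures reported in the message above.

```shell
#!/bin/sh
# Relative throughput drop, in percent, from a baseline rate to a measured rate.
# Usage: pct_drop BASELINE_MBPS MEASURED_MBPS
# (pct_drop is an illustrative helper, not part of Lustre or lmdd.)
pct_drop() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.1f\n", (base - cur) * 100 / base }'
}

# Figures from the Intel X5450 client (MB/sec, official kernels).
pct_drop 832 681   # 1.6.4.3 vs 1.6.5 without checksums
pct_drop 681 326   # 1.6.5 without checksums vs with checksums
```

With these inputs the gap between releases works out to roughly 18%, while enabling checksums on 1.6.5 costs a little over half of the remaining throughput.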
Andreas Dilger
2008-Jul-08 18:23 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Jul 06, 2008 02:30 +0200, Andrei Maslennikov wrote:
> Disabling checksumming certainly leads to a big performance impact on the
> client side.
> However it looks like we still have some performance gap between 1.6.4.3 and
> 1.6.5.
> I have repeated the tests making sure that the file sizes are much larger
> than available
> RAM on the client, to avoid any caching effects. Here is what came out:
>
> Single stream writing: (lmdd of=/lustre/tstfileXX bs=1M time=200 fsync=1)

client   patched 1.6.4.3   patchless 1.6.5 (nocsum)   patchless 1.6.5 (csum)
2.2GHz   681 MB/sec        590 MB/sec                 265 MB/sec
3.0GHz   832 MB/sec        675 MB/sec                 322 MB/sec

> Here we see that 1.6.5.0 with fully patched client and no checksumming
> still performs worse than 1.6.4.3 with fully patched client (only
> 681 MB/sec against 832 MB/sec, almost 18% less).

Can you please check the CPU usage during these tests? Is there still more CPU usage on the client or server in 1.6.5 compared to 1.6.4.3, even with checksumming disabled? It is important to use something like "top" with the '1' option to list per-CPU usage, to see whether a single CPU is at 100% while the others are less busy, instead of using the average across all CPUs.

Is the test with only a single thread? Have you tried running with 2 or more threads on the client?

> Is there some other parameter to play with?

Do you have the same IB stack used with both the 1.6.5 and 1.6.4.3 releases? It would be very useful to test with LNET Self Test (LST) to see if the slowdown is related to the IB 1.3 in 1.6.5. LST is available in both of these releases, and details on running it are at:

  http://manual.lustre.org/manual/LustreManual16_HTML/LustreIOKit.html#50446382_pgfId-1290255

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
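For capturing the per-CPU view non-interactively during a run, one option is to diff two snapshots of /proc/stat. The following is a minimal sketch under stated assumptions: `per_cpu_busy` is a hypothetical helper name, the client is assumed to be Linux with a readable /proc/stat, and I/O wait is counted as not busy.

```shell
#!/bin/sh
# Print per-CPU busy percentage between two snapshots of /proc/stat.
# Usage: per_cpu_busy BEFORE_FILE AFTER_FILE
# (per_cpu_busy is an illustrative helper, not a Lustre or procps tool.)
per_cpu_busy() {
    awk '
        # Match only per-CPU lines (cpu0, cpu1, ...), not the aggregate "cpu" line.
        /^cpu[0-9]+ / {
            # Sum user..steal (fields 2-9); older kernels expose fewer fields.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6                 # idle + iowait are treated as not busy
            if (FNR == NR) {               # first file: remember the baseline
                t0[$1] = total; i0[$1] = idle
            } else {                       # second file: report the delta
                dt = total - t0[$1]; di = idle - i0[$1]
                if (dt > 0) printf "%s %.0f%%\n", $1, (dt - di) * 100 / dt
            }
        }
    ' "$1" "$2"
}

# Live sample over one second, only where /proc/stat exists.
if [ -r /proc/stat ]; then
    before=$(mktemp) && after=$(mktemp)
    cat /proc/stat > "$before"; sleep 1; cat /proc/stat > "$after"
    per_cpu_busy "$before" "$after"
    rm -f "$before" "$after"
fi
```

A single CPU near 100% while the others stay mostly idle would point at a single-threaded bottleneck, which is the case the average across all CPUs hides.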
Andrei Maslennikov
2008-Jul-08 18:48 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
Hello Andreas,

the tests were always done with only one thread, as I was interested in the unsmeared peak performance. I will also compare 1.6.4.3 and 1.6.5 with 2-4 simultaneous threads. BTW, during the tests I always had this setting on the client: "options ksocklnd irq_affinity=0".

In these tests, I always used the stock OFED 1.2+ that comes with the RHEL4 distribution. In the case of the official 1.6.5 kernel, which comes without IB support, I had to add it by hand, starting with the official kernel tree for this exact kernel. So OFED 1.3 can hardly be blamed and LST would reveal nothing, right?

I will try to check the CPU loads for both cases at the first opportunity, hopefully before the end of this week.

Regards - Andrei.
Andrei Maslennikov
2008-Jul-16 13:41 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
New performance numbers (1.6.5.1 vs 1.6.4.3):

---------------------------------------------------------------------------------------
Client : Intel X5450 @ 3.00GHz 2xQuad core, 16GB RAM, Infiniband, RHEL4 x86_64
Servers: Official 1.6.4.1
Single stream writing: (lmdd of=/lustre/tstfileXX bs=1M time=200 fsync=1)
---------------------------------------------------------------------------------------

2.6.9-67.0.20.ELsmp unmodified, OFED 1.2, Lustre 1.6.5.1 (with checksumming): 319 MB/sec
  Client loads: lmdd - 100% (1 CPU), ptlrpcd - 5%, pdflush - 15%
  On 2 OSS servers in use: circa 50% total sys (2 CPUs), circa 10% I/O wait.

2.6.9-67.0.7.EL_lustre.1.6.5.1smp, OFED 1.3, Lustre 1.6.5.1 (with checksumming): 340 MB/sec
  Client loads: lmdd - 100% (1 CPU), ptlrpcd - 5%, pdflush - 15%
  On 2 OSS servers in use: circa 50% total sys (2 CPUs), circa 12% I/O wait.

2.6.9-67.0.20.ELsmp unmodified, OFED 1.2, Lustre 1.6.5.1 (no checksumming): 671 MB/sec
  Client loads: lmdd - 100% (1 CPU), ptlrpcd - 15%, pdflush - 2-3%
  On 2 OSS servers in use: circa 35% total sys (2 CPUs), circa 35% I/O wait.

2.6.9-67.0.7.EL_lustre.1.6.5.1smp, OFED 1.3, Lustre 1.6.5.1 (no checksumming): 670 MB/sec
  Client loads: lmdd - 100% (1 CPU), ptlrpcd - 12%, pdflush - 2-3%
  On 2 OSS servers in use: circa 32% total sys (2 CPUs), circa 32% I/O wait.

2.6.9-67.0.4.EL_lustre.1.6.4.3smp, OFED 1.2, Lustre 1.6.4.3: 843 MB/sec
  Client loads: lmdd - 100% (1 CPU), ptlrpcd - 20%, pdflush - 1%
  On 2 OSS servers in use: circa 33% total sys (2 CPUs), circa 30% I/O wait.

---------------------------------------------------------------------------------------

Running several (2, 4) simultaneous jobs on the same 1.6.4.3 client does not improve the aggregate performance. I have seen 750 MB/sec aggregate with 4 streams, and 806 MB/sec aggregate with 2 streams.

With the 1.6.5.1 client with no checksumming I can get up to 800 MB/sec aggregate with 4 streams, and some 730 MB/sec with 2 streams.

But Lustre 1.6.5.1 is visibly (20%) slower on a single stream when compared with 1.6.4.3.

Andrei.
Johann Lombardi
2008-Jul-16 14:06 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Wed, Jul 16, 2008 at 03:41:17PM +0200, Andrei Maslennikov wrote:
> Client : Intel X5450 at 3.00GHz 2xQuad core, 16GB RAM,
> Infiniband, RHEL4 x86_64
> Servers: Official 1.6.4.1
> Single stream writing: (lmdd of=/lustre/tstfileXX bs=1M time=200 fsync=1)

Does it mean that you run 1.6.4.1 on the OSSs even when testing 1.6.5.1? Because 1.6.5.1 clients cannot use adler32 checksums if the OSSs don't support it and thus fall back to crc32. In 1.6.4.x, checksumming was disabled by default because crc32 was the only supported checksum type and it hurt performance (see bug 13805).

Johann
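The fallback described here can be pictured with a toy model. This is not Lustre code: `pick_checksum` is a hypothetical function, and the preference order (adler32 first, then crc32) is an assumption made for illustration, based only on the statement that a 1.6.5.1 client falls back to crc32 when the OSS lacks adler32.

```shell
#!/bin/sh
# Toy model of the checksum fallback; NOT how Lustre negotiates internally.
# Usage: pick_checksum "CLIENT_TYPES" "SERVER_TYPES"  (space-separated lists)
pick_checksum() {
    for want in adler32 crc32; do          # assumed client preference order
        # Skip types the client itself does not offer.
        case " $1 " in *" $want "*) ;; *) continue ;; esac
        # Use the first preferred type the server also supports.
        case " $2 " in *" $want "*) echo "$want"; return 0 ;; esac
    done
    echo none
    return 1
}

pick_checksum "adler32 crc32" "adler32 crc32"   # client and OSS both on 1.6.5.x
pick_checksum "adler32 crc32" "crc32"           # 1.6.5.1 client, 1.6.4.x OSS
```

In the second call the model lands on crc32, matching the mixed-version setup in this thread, where the slower checksum type would be in use whenever checksums are enabled.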
Andrei Maslennikov
2008-Jul-16 16:26 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Wed, Jul 16, 2008 at 4:06 PM, Johann Lombardi <johann at sun.com> wrote:
> Does it mean that you run 1.6.4.1 on the OSSs even when testing 1.6.5.1?
> Because 1.6.5.1 clients cannot use adler32 checksums if the OSSs don't
> support it and thus fall back to crc32. In 1.6.4.x, checksumming was
> disabled by default because crc32 was the only supported checksum type
> and it hurt performance (see bug 13805).

Right! This might be an explanation for the great drop in performance in the tests done with checksumming enabled. (BTW, I always stated that we have 1.6.4.1 on the servers, but nobody had mentioned so far that there might be a problem on that side...)

I however worry much more about the case when checksumming is disabled. Do you think that the 20% performance drop may also be explained by the combination of (1.6.4.1 server / 1.6.5.1 client)??

Andrei.
Mike Bui
2008-Oct-18 00:49 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
Hello,

I have Lustre 1.6.5.1 and want to downgrade to Lustre 1.6.4.3 for better performance. How would I do this? Please advise.

Thanks,
Mike
Johann Lombardi
2008-Oct-18 08:23 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Wed, Jul 16, 2008 at 06:26:05PM +0200, Andrei Maslennikov wrote:
> I however worry much more about the case when checksumming is
> disabled. Do you think that the 20% performance drop may also be explained by
> the combination of (1.6.4.1 server / 1.6.5.1 client)??

LRU resize is disabled by default in 1.6.4.3, whereas it is enabled in 1.6.5.1. We have found a problem with this new feature recently (i.e. the LRU shrinking code is running much too often, causing a performance drop):

  https://bugzilla.lustre.org/show_bug.cgi?id=17282

You can disable LRU resize in 1.6.5.1 by running the following command on the Lustre clients:

  lctl set_param ldlm.namespaces.*osc*.lru_size=$((100 * nbprocs))

(replace nbprocs with the number of CPUs on the node)

Johann
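The nbprocs substitution can be scripted rather than filled in by hand. A minimal sketch, assuming `getconf _NPROCESSORS_ONLN` is available on the client to report the number of online CPUs; the resulting command is only printed, not executed, so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Number of online CPUs on this node (assumes getconf supports
# _NPROCESSORS_ONLN, as it does on glibc-based Linux systems).
nbprocs=$(getconf _NPROCESSORS_ONLN)

# Fixed LRU size suggested in the message above: 100 locks per CPU.
lru_size=$((100 * nbprocs))

# Dry run: print the command instead of running it. The pattern is kept
# inside double quotes so the shell does not expand the '*' characters.
echo "lctl set_param ldlm.namespaces.*osc*.lru_size=$lru_size"
```

On an 8-core client like the ones benchmarked in this thread, this would print a command setting lru_size=800.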
Johann Lombardi
2008-Oct-20 05:59 UTC
[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?
On Fri, Oct 17, 2008 at 05:49:43PM -0700, Mike Bui wrote:
> I have Lustre 1.6.5.1 and want to downgrade to Lustre 1.6.4.3 for better
> performance, how would I do this ? Please advise.

If you are not using quotas, there should be no problem downgrading. However, I *suspect* that you will achieve the same results with 1.6.5.1 if you disable LRU resize (on a live system, see my previous email, or at configure time by using --disable-lru-resize).

Please also note that:
* checksumming is disabled by default in 1.6.4* whereas it is enabled in 1.6.5.1 (see bug 13805 for more details).
* the performance drop was only noticed with a single stream.

Johann