Hello all,

I am running tests between an X4150 and an SGI Altix with Lustre 1.6.6, and I can see a read performance drop on the SGI.

Numbers:

X4150
  dd if=/dev/zero of=/mnt/exec4_client_ib/mike/4/bigfile bs=4096k count=10000
    (336MB/s) (this is a folder with a stripe of 4, for all tests)
  dd if=/mnt/exec4_client_ib/mike/4/bigfile of=/dev/null bs=4096k
    (461MB/s)

SGI Altix
  dd if=/dev/zero of=/mnt/exec4_client_ib/mike/4/bigfile bs=4096k count=2000
    (371MB/s) 99% of the core is running dd
  dd if=/mnt/exec4_client_ib/mike/4/bigfile of=/dev/null bs=4096k
    (142MB/s) 20% is running dd and 80% is running ptlrpcd

The X4140 is running:
  2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
  Red Hat Enterprise Linux Server release 5.2 (Tikanga)

The SGI is running:
  SGI Altix systems 2.6.16.46-0.12-default #1 SMP Thu May 17 14:00:09 UTC 2007 4 ia64 ia64 GNU/Linux

  cat /etc/issue
  Welcome to SUSE Linux Enterprise Server 10 SP1 (ia64)

We compiled Lustre 1.6.6 on the SGI and everything works, but read performance is held back by ptlrpcd.

What should we look at to fix this?

Thanks,

Michel
What page size are you running on IA64? What networking?

Kevin
Page size is 16k (16384), and lnet is o2ib.

Michel

-----Original Message-----
From: Kevin.Vanmaren at Sun.COM [mailto:Kevin.Vanmaren at Sun.COM]
Sent: March 4, 2009 15:42
To: Michel Dionne
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] read performance on ia64

What page size are you running on IA64? What networking?

Kevin
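For anyone comparing the two clients, the page size reported above (16384) can be confirmed with getconf and set against the dd block size used in the tests. This is only a rough sketch: pages_per_block is a made-up helper, not a Lustre tool, and the 4 KiB figure for the x86_64 client is an assumption (the usual default there), not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): how many pages cover one I/O block.
pages_per_block() {
    # $1 = block size in bytes, $2 = page size in bytes
    echo $(( $1 / $2 ))
}

# Page size of the machine this runs on (reported as 16384 on the Altix).
page_size=$(getconf PAGESIZE)
echo "page size here: ${page_size} bytes"

# dd bs=4096k, as used in the tests in this thread.
block=$(( 4096 * 1024 ))

echo "pages per dd block here:      $(pages_per_block "$block" "$page_size")"
echo "pages per dd block at 16 KiB: $(pages_per_block "$block" 16384)"
# Assumed x86_64 default page size; not stated in the thread.
echo "pages per dd block at 4 KiB:  $(pages_per_block "$block" 4096)"
```

Running it on both clients shows whether they really differ in page size. It does not by itself explain the ptlrpcd CPU usage.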
On Mar 04, 2009 14:53 -0500, Michel Dionne wrote:
> I am running test between an X4150 and SGI Altix with Lustre 1.6.6 and I
> can see a read performance drop on the SGI.

How many cores on the SGI system?

> The SGI is running:
> SGI Altix systems 2.6.16.46-0.12-default #1 SMP Thu May 17 14:00:09 UTC
> 2007 4 ia64 ia64 GNU/Linux

Please try running the v1_8_0_RC2 from CVS - it has patches that should improve the scalability of the client when there are many CPUs.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
The SGI has 256 dual-core 1.6 GHz CPUs.

The write performance is up to 375MB/s on a single dd using 1 core and writing to a folder with lfs setstripe of 4. It is only on read that the CPU usage of this one core limits throughput to 142MB/s. This core is busy with ptlrpcd during the read.

Is version v1_8_0_RC2 production ready, or should we wait?

Thanks,

Michel

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: March 6, 2009 05:07
To: Michel Dionne
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] read performance on ia64

How many cores on the SGI system?

Please try running the v1_8_0_RC2 from CVS - it has patches that should improve the scalability of the client when there are many CPUs.

Cheers, Andreas
On Mar 09, 2009 13:57 -0400, Michel Dionne wrote:
> The sgi has 256 dual core 1.6Ghz cpus.

Is this considered a single system (e.g. there are 256 entries in /proc/cpuinfo) or is this a cluster of nodes? The client SMP scalability fix is only really significant when there are 8+ cores on a node.

> The write performance is up to 375MB/s on a single dd using 1 core and
> writing on a folder with lfs setstripe of 4.
> It is only on read that the cpu usage of this one core limits reads to
> 142MB/s. This core is busy with ptlrpcd during the read.
> Is version v1_8_0_RC2 production ready or should we wait?

The v1_8_0_RC2 is just a release candidate, and has since been replaced by v1_8_0_RC4, which is currently undergoing testing. There aren't expected to be serious problems remaining at this stage of testing, but this is not yet a fully supported release, so I would only advise using it for testing.

The SMP locking fixes previously referenced are only needed on the client node, so if you have a time when you can do testing with the RC4 code, this will tell you if the read locking is the problem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
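The /proc/cpuinfo check mentioned above can be sketched as follows. It is a minimal example that assumes each logical CPU appears as a line starting with "processor", as on typical x86_64 and ia64 Linux kernels. count_cpus is a made-up helper name, and the threshold of 8 is taken from the "8+ cores on a node" remark in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count logical CPUs listed in a cpuinfo-style file.
count_cpus() {
    # $1 = path to the file (defaults to /proc/cpuinfo)
    file=${1:-/proc/cpuinfo}
    # Report 0 if the file is missing or unreadable (e.g. not a Linux system).
    [ -r "$file" ] || { echo 0; return 0; }
    # Assumes one line beginning with "processor" per logical CPU.
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c '^processor' "$file" || true
}

n=$(count_cpus)
echo "logical CPUs seen by this kernel: $n"

# Threshold from the "8+ cores on a node" remark in this thread.
if [ "$n" -ge 8 ]; then
    echo "single system image with 8+ cores: the client SMP fixes may be relevant"
else
    echo "fewer than 8 cores visible on this node"
fi
```

On a cluster of smaller nodes, each node would report only its own CPUs, which answers the single-system-or-cluster question.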