While performing a single-copy, single-client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write at about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different Lustre file systems. This is true even though the slower clients have the same processors and more RAM: 18GB for the slow writers versus 12GB for the fast writers. Both systems use OFED 1.3.1. All of the other benchmarks we run perform better on the slow-write clients, and read speed from the Lustre file systems is comparable across all clients.

max_rpcs_in_flight and max_pages_per_rpc are at their defaults on both systems. The clients are on the same IB network with the same QDR cards, and IB connectivity has been verified with the IB utilities; the two client types are nearly identical in bandwidth and latency. We are also using the same modprobe.conf and openibd.conf files on both systems. We use a 34GB file size on the 12GB and 18GB RAM systems, and a 137GB file on the 96GB RAM system, so this is not a matter of caching in RAM.

Are there known issues with our 2.6.18-92.1.10.el5-lustre-1.6.5.1 combination? This does not appear to be a problem with the Lustre file systems themselves, as we get the same kind of results no matter which of our three Lustre systems the test writes to.

Here are the summaries from several runs of ost-survey on our new Lustre system (one column per run). Please comment on the worst/best deltas of the read and write operations.

Number of Active OST devices : 96
Worst Read     38.167753  38.932928  39.006537  39.782153  38.717915
Best Read      61.704534  61.832461  63.284999  65.000491  61.836016
Read Average   51.433847  51.281630  51.297278  51.582327  51.318410
Worst Write    34.311237  49.009757  55.272744  51.532331  51.816523
Best Write     94.001170  96.033483  93.401792  93.081544  91.030717
Write Average  74.248683  71.831019  75.179863  74.723100  74.930529

/bob

Bob Hayes
System Administrator
SSG-DRD-DP
Office: 253-371-3040
Cell: 253-441-5482
e-mail: robert.n.hayes at Intel.Com
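For context, the kind of single-stream dd test and tunable check described above might look roughly like the sketch below; the mount point, file path, and block count are placeholders rather than the exact commands used in this report, and the /proc paths assume a 1.6-series client:

    # single-stream write, then read, of a ~34GB file (about 2x client RAM),
    # using 1MB blocks; /mnt/lustre/ddtest is a placeholder path
    dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=34816 conv=fsync
    dd if=/mnt/lustre/ddtest of=/dev/null bs=1M

    # confirm the RPC tunables really are at their defaults on both client types
    # (typically 8 RPCs in flight and 256 pages per RPC on 1.6 clients)
    cat /proc/fs/lustre/osc/*/max_rpcs_in_flight
    cat /proc/fs/lustre/osc/*/max_pages_per_rpc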
On May 11, 2009 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using dd,
> we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1
> write about half the speed of our Nehalem clients running
> 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.
>
> This is true even though the slower clients have the same processors and
> more RAM, 18GB for the slow writers and 12GB for the fast writers. Both
> systems use OFED 1.3.1. All benchmarks we use perform better on the
> slow-write clients and read speed from LFS is comparable across all
> clients.

Have you tried booting the slower-with-more-RAM clients using "mem=12G"
to see if the performance gets worse with more RAM? There is a known
performance bottleneck with the client-side cache in 1.6 clients, and
you may be triggering this...

If you have the luxury to do so, testing a 1.8.0 client's IO performance
against the same filesystems would also determine if the client-side
cache performance fixes therein will already solve your problems.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
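To illustrate the "mem=12G" suggestion: on a RHEL5-era client the limit can be appended to the kernel line in /boot/grub/grub.conf and the node rebooted. The stanza below is a sketch only; the kernel version string and root device are placeholders, not taken from this thread:

    title Lustre client, 12GB memory cap (2.6.18-92.1.10.el5_lustre.1.6.5.1)
            root (hd0,0)
            kernel /vmlinuz-2.6.18-92.1.10.el5_lustre.1.6.5.1 ro root=/dev/VolGroup00/LogVol00 mem=12G
            initrd /initrd-2.6.18-92.1.10.el5_lustre.1.6.5.1.img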
On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using
> dd, we are finding that our Nehalem clients running
> 2.6.18-92.1.10.el5-lustre-1.6.5.1
> write about half the speed of our Nehalem clients running
> 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file
> systems.

We've seen a fairly substantial block-level device throughput regression
going from -53 to -92 without involving Lustre, but I've not yet had
time to run down the changes to see what could be causing it.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
We will test the mem=12G suggestion. Before attempting the 1.8.0 client, can you confirm that a 1.8 client should work with a 1.6 server without causing any more complications?

/bob

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Monday, May 11, 2009 1:54 PM
To: Hayes, Robert N
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)
Dave,

Does the "substantial block-level device throughput regression" exist in 2.6.18-128?

/bob

-----Original Message-----
From: David Dillow [mailto:dillowda at ornl.gov]
Sent: Monday, May 11, 2009 2:20 PM
To: Hayes, Robert N
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)
On Mon, 2009-05-11 at 15:44 -0700, Hayes, Robert N wrote:
> Dave,
> Does the "substantial block-level device throughput regression" exist
> in 2.6.18-128?

I couldn't say; I haven't tested it. Our environment is SRP over IB for the storage, so that'll be my focus when I look at the changes between the two kernels.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
On May 11, 2009 15:44 -0700, Hayes, Robert N wrote:
> Does the "substantial block-level device throughput regression" exist
> in 2.6.18-128?

Note that "block-level device" is meaningless from the point of view of Lustre clients. If you changed the client software only, then this shouldn't be a factor.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
> We will test the mem=12G suggestion. Before attempting the 1.8.0 client,
> can you confirm that a 1.8 client should work with a 1.6 server without
> causing any more complications?

Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are
worried about testing this out during live system time then you can wait
for an outage window to test the 1.8 client in isolation. There is
nothing to do on the server, and just RPM upgrade/downgrades on the client.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
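A client-only switch to the 1.8.0 RPMs might look roughly like the sketch below; the mount point, MGS NID, file system name, and package file names are placeholders and would need to match the 1.8.0 client RPMs built for the running kernel and OFED stack:

    # on the client only -- nothing changes on the 1.6.x servers
    umount /mnt/lustre
    lustre_rmmod
    rpm -Uvh lustre-client-1.8.0-*.rpm lustre-client-modules-1.8.0-*.rpm
    mount -t lustre 192.168.1.10@o2ib:/fsname /mnt/lustre
    # to roll back, reinstall the 1.6.5.1 client RPMs with rpm -Uvh --oldpackage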
On May 11, 2009, at 8:07 PM, Andreas Dilger wrote:
> On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
>> We will test the mem=12G suggestion. Before attempting the 1.8.0 client,
>> can you confirm that a 1.8 client should work with a 1.6 server without
>> causing any more complications?
>
> Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are
> worried about testing this out during live system time then you can wait
> for an outage window to test the 1.8 client in isolation. There is
> nothing to do on the server, and just RPM upgrade/downgrades on the client.

And it's a beautiful thing. :)

Charlie Taylor
UF HPC Center