We have a Lustre 1.6.x filesystem: 4 OSSes (3 x4500 and 1 DDN S2A6620). Each OSS has either 4 1GigE interfaces bonded or 1 10GigE interface.

I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and we are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).

This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).

* Is there a recommendation of a better way to do these sorts of jobs? The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.

* Is there a better way to tune? What should I be looking for to tune?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
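[For reference, the striping described above can be set with lfs. A minimal sketch, assuming the option syntax available on 1.6.x; the path is illustrative:]

  # Stripe a new file across all available OSTs (-c -1 means "use every OST");
  # /lustre/scratch/dataset.bin is a made-up example path.
  lfs setstripe -c -1 /lustre/scratch/dataset.bin

  # Verify the resulting layout.
  lfs getstripe /lustre/scratch/dataset.bin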
On 12/10/2010 11:42 AM, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,

1.6 has been dead for well over a year. End of life.

> 4 OSS, 3 x4500 and 1 DDN S2A6620
>
> Each OSS has 4 1GigE interfaces bonded, or 1 10GigE interface.
>
> I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).
>
> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).
>
> * Is there a recommendation of a better way to do these sorts of jobs?

Upgrade to the latest release of Lustre.

> The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.
>
> * Is there a better way to tune?

Yes: upgrade to the code that has all the tuning fixes/enhancements - Lustre 1.8.

> What should I be looking for to tune?

You are wasting your time tuning here. 1.8 supports many things, including cache on the OSTs, which would likely help bunches in your case.

cliffw
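[The 1.8 OSS read cache mentioned above is controlled through lctl on the OSS nodes. A sketch, assuming the 1.8 obdfilter parameter names; the 32G value is an example sized so the whole 16GB file fits in cache:]

  # Read cache is on by default in 1.8; confirm it per OST:
  lctl get_param obdfilter.*.read_cache_enable

  # Raise the largest file size the read cache will hold
  # (the default caps which files are kept in cache):
  lctl set_param obdfilter.*.readcache_max_filesize=32G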
On 2010-12-10, at 12:42, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,
>
> 4 OSS, 3 x4500 and 1 DDN S2A6620
>
> Each OSS has 4 1GigE interfaces bonded, or 1 10GigE interface.
>
> I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).

How big is the IO size? Are all the clients both reading and writing this same file? Presumably you see better performance when so many jobs are not running against the filesystem?

> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).
>
> * Is there a recommendation of a better way to do these sorts of jobs? The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the performance in this case. OSS read cache does not need a client-side upgrade to work, though of course I'd suggest upgrading the clients anyway. 1.8.5 was just released this week...

> * Is there a better way to tune? What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs. It should be reset before the job (echo 0 to each file) so you get stats relevant to that job only.

You can also check iostat on the OSS nodes to see how busy the disks are. They may be imbalanced due to being different hardware, and will only go as fast as the slowest OSTs.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
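[Spelled out, the procedure above looks roughly like this, run as root on each OSS; a sketch:]

  # Zero the per-OST IO histograms before the job starts
  # ("echo 0 to each file"):
  for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
      echo 0 > "$f"
  done

  # ...run the job, then look at the IO size and seek distributions:
  cat /proc/fs/lustre/obdfilter/*/brw_stats

  # Watch per-disk utilization; consistently busier disks behind one
  # OSS would point at the hardware imbalance described above:
  iostat -x 5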