Wojciech Turek
2010-Jul-08 18:03 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Hi,

Our Lustre filesystem (Lustre 1.8.3, RHEL5) has recently become very busy and users are noticing the slowness. The system consists of ~550 clients, and we currently have 50 different users running jobs. I can see that the OSS servers have load oscillating between 100 and 300, and collectl shows that there is a lot of I/O going on (mainly reads). I would like to find a good method of determining which Lustre clients are generating the I/O, so I can attribute the high load to particular jobs. I hope some Lustre users can share their experience in this matter.

Best regards,
--
Wojciech Turek
Craig Prescott
2010-Jul-08 18:52 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Hi Wojciech;

We run collectl on each compute node, and toss some interesting numbers from it into ganglia (r/s, w/s, throughputs, etc.). collectl can be found here:

    http://collectl.sourceforge.net/

There are also per-filesystem statistics on each client in the directories underneath /proc/fs/lustre/llite, and per-OST stats underneath /proc/fs/lustre/osc. You can feed the 'stats' files in these directories to the 'llstat' command to show stats at an interval of your choosing.

Cheers,
Craig Prescott
UF HPC Center

Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]
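For example, a minimal sketch of that llstat invocation (llstat's -i flag sets the refresh interval in seconds; the llite and osc directory names below use a made-up mount-instance suffix, so take the real path from ls /proc/fs/lustre/llite/):

    # client side: per-filesystem operation rates, refreshed every 5 s
    # ("testfs-ffff810012345678" is a hypothetical mount instance name)
    llstat -i 5 /proc/fs/lustre/llite/testfs-ffff810012345678/stats

    # client side: per-OST rates as seen through one osc device
    llstat -i 5 /proc/fs/lustre/osc/testfs-OST0000-osc-ffff810012345678/stats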
Andreas Dilger
2010-Jul-08 19:35 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 12:03, Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]

There are a number of ways to do this.

One way is to check the "/proc/fs/lustre/obdfilter/*/exports/*/stats" files, which contain per-client statistics. They can be cleared by writing "0" to the file; after some time, check for files with lots of operations.

Another way, which I have heard some sites use, is the "rpc history". They may already have a script to do this, but the basics are below:

    oss# lctl set_param ost.OSS.ost_io.req_buffer_history_max=10240
    {wait a few seconds to collect some history}
    oss# lctl get_param ost.OSS.ost_io.req_history

This will give you a list of the past (up to) 10240 RPCs for the "ost_io" RPC service, which is the service where you are observing the high load:

    3436037:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957534353:448:Complete:1278612656:0s(-6s) opc 3
    3436038:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536190:448:Complete:1278615489:1s(-41s) opc 3
    3436039:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536193:448:Complete:1278615490:0s(-6s) opc 3

This output is in the format:

    identifier:target_nid:source_nid:rpc_xid:rpc_size:rpc_status:arrival_time:service_time(deadline) opcode

Using some shell scripting, one can find the clients sending the most RPC requests:

    oss# lctl get_param ost.OSS.ost_io.req_history | tr ":" " " | cut -d" " -f3,9,10 | sort | uniq -c | sort -nr | head -20
     3443 12345-192.168.20.159@tcp opc 3
     1215 12345-192.168.20.157@tcp opc 3
      121 12345-192.168.20.157@tcp opc 4

This gives you a sorted list of the top 20 clients sending the most RPCs to the ost_io service, along with the operation being performed (3 = OST_READ, 4 = OST_WRITE).

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
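A hedged sketch that combines the two approaches above — zero the per-client export counters, let them accumulate, then rank client NIDs by bytes read. The field positions assume the stats path and line layout shown in this thread:

    # on an OSS: clear the per-client counters for all OSTs
    for f in /proc/fs/lustre/obdfilter/*/exports/*/stats; do
        echo 0 > "$f"
    done

    sleep 60    # let the counters accumulate under the live workload

    # sum the cumulative read-byte column per client NID, busiest first;
    # splitting on "/" and spaces, field 8 is the NID and the last field
    # is the byte total on each read_bytes line
    grep read_bytes /proc/fs/lustre/obdfilter/*/exports/*/stats |
        awk -F'[/ ]+' '{sum[$8] += $NF}
                       END {for (n in sum) printf "%15.0f %s\n", sum[n], n}' |
        sort -rn | head -20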
Guy Coates
2010-Jul-08 20:01 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 08/07/10 19:03, Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]

Try this script (it is from Bernd Schubert). It will parse the per-client proc stats on the MDS/OSS into something nice and human-readable. It is very useful.

Cheers,
Guy

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_client_stats.sh
Type: application/x-sh
Size: 796 bytes
Url: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100708/4d2e1979/attachment.sh
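The attachment itself was scrubbed from the archive. As a placeholder, here is a minimal reconstruction sketch of the idea — one summary line per client NID per OST target. This is not Bernd's original script, and the output columns are invented for illustration:

    #!/bin/sh
    # print one line per OST/client pair with cumulative read/write bytes
    # (a reconstruction sketch, not the original lustre_client_stats.sh)
    for dir in /proc/fs/lustre/obdfilter/*/exports/*; do
        [ -f "$dir/stats" ] || continue
        nid=$(basename "$dir")
        ost=$(basename "$(dirname "$(dirname "$dir")")")
        # last field of the read_bytes/write_bytes lines is the byte sum
        r=$(awk '/^read_bytes/  {print $NF}' "$dir/stats")
        w=$(awk '/^write_bytes/ {print $NF}' "$dir/stats")
        printf "%-14s %-22s read: %15s write: %15s\n" \
            "$ost" "$nid" "${r:-0}" "${w:-0}"
    done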
Andreas Dilger
2010-Jul-08 21:21 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 14:01, Guy Coates wrote:
> Try this script (it is from Bernd Schubert). It will parse the
> per-client proc stats on the MDS/OSS into something nice and
> human-readable. It is very useful.

I'm not sure I'd quite call it "human readable", but it does show that there is a need for something to print out stats for all of the clients.

===================== /proc/fs/lustre/obdfilter/myth-OST0004/exports =====================
0@lo
read_bytes      123343 samples [bytes] 1 1048576 64498717397
write_bytes      18457 samples [bytes] 1 1048576 3200834973
get_info             2 samples [reqs]
set_info_async       1 samples [reqs]
disconnect           3 samples [reqs]
create             420 samples [reqs]
destroy            883 samples [reqs]
setattr          13276 samples [reqs]
punch               15 samples [reqs]
preprw          141800 samples [reqs]
commitrw        141800 samples [reqs]
192.168.20.147@tcp
read_bytes         146 samples [bytes] 4096 1048576 114471161
write_bytes          7 samples [bytes] 163840 1048576 5244376
disconnect           6 samples [reqs]
preprw             153 samples [reqs]
commitrw           153 samples [reqs]
192.168.20.154@tcp
read_bytes         550 samples [bytes] 4096 1048576 270017490
write_bytes       1126 samples [bytes] 32 1048576 614266996
disconnect           2 samples [reqs]
preprw            1676 samples [reqs]
commitrw          1676 samples [reqs]
192.168.20.159@tcp
read_bytes       88745 samples [bytes] 0 1048576 61982699353
write_bytes      75428 samples [bytes] 16 1048576 27989934969
get_info             4 samples [reqs]
disconnect          22 samples [reqs]
destroy            113 samples [reqs]
setattr              1 samples [reqs]
punch              154 samples [reqs]
sync             81914 samples [reqs]
preprw          164173 samples [reqs]
commitrw        164173 samples [reqs]
==========================================================================================

Probably an equivalent script producing more readable output would be something like:

    egrep -v "snapshot|ping" /proc/fs/lustre/{mds,obdfilter}/*/exports/*/stats | cut -d/ -f 6,8,9

which will print something like:

    myth-MDT0000/0@lo/stats:open                       10 samples [reqs]
    myth-MDT0000/0@lo/stats:close                       2 samples [reqs]
    myth-MDT0000/0@lo/stats:getxattr                    1 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:open       3654 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:close      1827 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:unlink        1 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:getxattr  15674 samples [reqs]
    myth-OST0000/0@lo/stats:read_bytes               2137 samples [bytes]
    myth-OST0000/0@lo/stats:preprw                   2137 samples [reqs]
    :
    :

I would also recommend the "llstat" tool, which has been part of Lustre for ages; it does mostly the same thing, but can print the output like "vmstat", showing the current operation rates. The main difference is that the "lustre_client_stats.sh" script prints the output for all of the clients at once.
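As a concrete example of that llstat usage, it can be pointed at one of the per-export stats files from the listing above (device and NID taken from the sample output; -i sets the refresh interval):

    # rate view of a single client's activity against one OST,
    # refreshed every 2 seconds
    llstat -i 2 /proc/fs/lustre/obdfilter/myth-OST0004/exports/192.168.20.159@tcp/stats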
While we are on the topic, people may also be interested in "llobdstat", which prints an I/O-oriented status for any "stats" file containing the read_bytes and write_bytes entries:

    llobdstat myth-OST0000 2

    /usr/bin/llobdstat on obdfilter/myth-OST0000
    Processor counters run at 2800.419 MHz
    Read: 4.08846e+11, Write: 9.0329e+10, create/destroy: 1133/1996, stat: 12128, punch: 241
    [NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]

    Timestamp   Read-delta  ReadRate  Write-delta  WriteRate
    --------------------------------------------------------
    1278622955    21.00MB  10.48MB/s       0.00MB   0.00MB/s
    1278622957    23.00MB  11.48MB/s       0.00MB   0.00MB/s
    1278622959    22.33MB  11.14MB/s       0.00MB   0.00MB/s
    1278622961    11.68MB   5.83MB/s       0.00MB   0.00MB/s
    1278622963    18.45MB   9.20MB/s       0.00MB   0.00MB/s st:1
    1278622965    20.72MB  10.34MB/s       0.00MB   0.00MB/s st:1

It can also be used on a client stats file, such as /proc/fs/lustre/osc/myth-OST0000-osc-ffff81001f5d54d0/stats.

Bernd, would you (or anyone) be interested in enhancing those tools to show stats data from multiple files at once (each prefixed by the device name and/or client NID)? I don't think it makes sense to create separate tools for this.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Bernd Schubert
2010-Jul-08 22:11 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 07/08/2010 11:21 PM, Andreas Dilger wrote:
> On 2010-07-08, at 14:01, Guy Coates wrote:
>> Try this script (it is from Bernd Schubert). It will parse the
>> per-client proc stats on the MDS/OSS into something nice and
>> human-readable. It is very useful.
>
> I'm not sure I'd quite call it "human readable", but it does show
> that there is a need for something to print out stats for all of the
> clients.

Yeah, I agree, it is not perfect yet. In particular, it needs to be sorted by the clients doing the most I/O. That shouldn't be too difficult with the existing script (see the one-liner sketch below).

[...]

> Bernd, would you (or anyone) be interested in enhancing those tools
> to show stats data from multiple files at once (each prefixed by the
> device name and/or client NID)? I don't think it makes sense to
> create separate tools for this.

I'm not sure the existing Lustre tools are really what we need. If you have a cluster with 200 or more clients and want to figure out which clients are doing the most I/O, several lines per client is too much output. One line per client, sorted by I/O, seems better, IMHO.

I would be interested in enhancing the existing tools, but if I look at the number of open bugs I have, several of those have a higher priority (by the way, this script is on my bug list, bug 22469). Additionally, at least for the next couple of weeks, my time is very limited while I finish my thesis.

Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks
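For what it's worth, a hedged one-liner along those lines, assuming the reconstruction sketch posted earlier in this thread (where column 4 is the cumulative read-byte count):

    # rank clients by cumulative bytes read, busiest first
    ./lustre_client_stats.sh | sort -k4 -rn | head -20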
Andreas Dilger
2010-Jul-09 17:26 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 16:11, Bernd Schubert wrote:
>> Bernd, would you (or anyone) be interested in enhancing those tools
>> to show stats data from multiple files at once (each prefixed by the
>> device name and/or client NID)? I don't think it makes sense to
>> create separate tools for this.
>
> I'm not sure the existing Lustre tools are really what we need. If
> you have a cluster with 200 or more clients and want to figure out
> which clients are doing the most I/O, several lines per client is too
> much output.

I agree, but a 200-column line is also not very useful. I like the "llobdstat" output, which prints the I/O numbers and then appends only the abbreviated values that changed in that interval, instead of printing all of the values.

> One line per client, sorted by I/O, seems better, IMHO.

The commands I posted using the rpc_history file will print a summary of all client RPC counts, sorted by heaviest user. Something similar could be done by aggregating the per-client stats as well, though it would mean touching a lot more input files for each interval.

> I would be interested in enhancing the existing tools, but if I look
> at the number of open bugs I have, several of those have a higher
> priority (by the way, this script is on my bug list, bug 22469).

I was actually hoping that someone else might take it up. The llstat and llobdstat scripts are perl, and there should be a good number of people who can do a bit of perl hacking.

The scripts are currently "vmstat"- or "iostat"-like, in that they print out the parameters as they change over time. It might also be interesting (if someone has the perl-fu to do it) to have a "top" mode, where the tool resets the screen position each time and sorts the output from all of the clients.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
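Until someone adds that, a crude approximation is to wrap the rpc_history pipeline from earlier in the thread in watch(1), which redraws the sorted list in place every interval:

    # poor man's "top" mode: redraw the top-15 RPC senders every 2 seconds
    watch -n 2 'lctl get_param ost.OSS.ost_io.req_history | tr ":" " " |
        cut -d" " -f3,9,10 | sort | uniq -c | sort -nr | head -15'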
Wojciech Turek
2010-Jul-09 17:41 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Thank you all for the very useful suggestions. Andreas's approach using the rpc_history gave exactly what I was looking for, in a form that is quite easy to read.

On 9 July 2010 18:26, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> The commands I posted using the rpc_history file will print a summary
> of all client RPC counts, sorted by heaviest user. [...]

--
Wojciech Turek
Seger, Mark
2010-Jul-11 21:56 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Wojciech Turek <wjt27 at ...> writes:
> Thank you all for the very useful suggestions. Andreas's approach
> using the rpc_history gave exactly what I was looking for, in a form
> that is quite easy to read.

For what it's worth, you can get very detailed client-side stats from collectl. The way it figures out what the client is doing is to actually look at the OST-level stats and add them up. Why? Because that means you can replay the data and break things down by OST. There are also client-side switches to look at BRW stats, readahead stats, and even what's going on with metadata.

If you then plot the data with colplot, you can drill down and look at all kinds of things; for example, if you have data from multiple clients you can compare them side by side. Check out collectl-utils on sourceforge if you haven't yet.

Alas, I'm one of the few people (I think) who ever gets into this level of analysis, because I fear the number of switches tends to scare people off. ;)

-mark
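A sketch of what such a collectl invocation might look like; the subsystem letters and --lustopts values are assumptions based on collectl documentation of that era, so verify them against your local man page:

    # client-side Lustre summary stats every 5 seconds
    # (-s l selects the lustre subsystem -- assumed flag letter)
    collectl -sl -i 5

    # per-OST detail plus readahead and metadata counters
    # (--lustopts R = readahead, M = metadata -- assumed option letters)
    collectl -sL --lustopts RM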