Wojciech Turek
2010-Jul-08 18:03 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Hi,

Our Lustre filesystem (Lustre 1.8.3, RHEL5) has recently become very busy and users are noticing the slowness. The system consists of ~550 clients, and we currently have 50 different users running jobs. I can see that the OSS servers have load oscillating between 100 and 300, and collectl shows that there is a lot of I/O going on (mainly reads). I would like to find a good method of determining which Lustre clients are generating the I/O, so I can attribute the high load to particular jobs. I hope some Lustre users can share their experience in this matter.

Best regards,
--
Wojciech Turek
Craig Prescott
2010-Jul-08 18:52 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Hi Wojciech;

We run collectl on each compute node, and toss some interesting numbers from it into ganglia (r/s, w/s, throughputs, etc.). collectl can be found here:

    http://collectl.sourceforge.net/

There are also per-filesystem statistics on each client in the directories underneath /proc/fs/lustre/llite, and per-OST stats underneath /proc/fs/lustre/osc. You can feed the 'stats' files in these directories to the 'llstat' command to show stats at an interval of your choosing.

Cheers,
Craig Prescott
UF HPC Center

Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]
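For example, a minimal sketch of that llstat invocation (llstat's -i flag sets the refresh interval in seconds; the llite and osc directory names below use a made-up mount-instance suffix, so take the real path from ls /proc/fs/lustre/llite/):

    # client side: per-filesystem operation rates, refreshed every 5 s
    # ("testfs-ffff810012345678" is a hypothetical mount instance name)
    llstat -i 5 /proc/fs/lustre/llite/testfs-ffff810012345678/stats

    # client side: per-OST rates as seen through one osc device
    llstat -i 5 /proc/fs/lustre/osc/testfs-OST0000-osc-ffff810012345678/stats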
Andreas Dilger
2010-Jul-08 19:35 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 12:03, Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]

There are a number of ways to do this.

One way is to check the "/proc/fs/lustre/obdfilter/*/exports/*/stats" files, which contain per-client statistics. They can be cleared by writing "0" to the file; after some time, check for files with lots of operations.

Another way, which I have heard some sites use, is the "rpc history". They may already have a script to do this, but the basics are below:

    oss# lctl set_param ost.OSS.ost_io.req_buffer_history_max=10240
    {wait a few seconds to collect some history}
    oss# lctl get_param ost.OSS.ost_io.req_history

This will give you a list of the past (up to) 10240 RPCs for the "ost_io" RPC service, which is the service where you are observing the high load:

    3436037:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957534353:448:Complete:1278612656:0s(-6s) opc 3
    3436038:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536190:448:Complete:1278615489:1s(-41s) opc 3
    3436039:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536193:448:Complete:1278615490:0s(-6s) opc 3

This output is in the format:

    identifier:target_nid:source_nid:rpc_xid:rpc_size:rpc_status:arrival_time:service_time(deadline) opcode

Using some shell scripting, one can find the clients sending the most RPC requests:

    oss# lctl get_param ost.OSS.ost_io.req_history | tr ":" " " | cut -d" " -f3,9,10 | sort | uniq -c | sort -nr | head -20
     3443 12345-192.168.20.159@tcp opc 3
     1215 12345-192.168.20.157@tcp opc 3
      121 12345-192.168.20.157@tcp opc 4

This gives you a sorted list of the top 20 clients sending the most RPCs to the ost_io service, along with the operation being performed (3 = OST_READ, 4 = OST_WRITE).

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
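A hedged sketch that combines the two approaches above — zero the per-client export counters, let them accumulate, then rank client NIDs by bytes read. The field positions assume the stats path and line layout shown in this thread:

    # on an OSS: clear the per-client counters for all OSTs
    for f in /proc/fs/lustre/obdfilter/*/exports/*/stats; do
        echo 0 > "$f"
    done

    sleep 60    # let the counters accumulate under the live workload

    # sum the cumulative read-byte column per client NID, busiest first;
    # splitting on "/" and spaces, field 8 is the NID and the last field
    # is the byte total on each read_bytes line
    grep read_bytes /proc/fs/lustre/obdfilter/*/exports/*/stats |
        awk -F'[/ ]+' '{sum[$8] += $NF}
                       END {for (n in sum) printf "%15.0f %s\n", sum[n], n}' |
        sort -rn | head -20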
Guy Coates
2010-Jul-08 20:01 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 08/07/10 19:03, Wojciech Turek wrote:
> [...] I would like to find a good method of determining which Lustre
> clients are generating the I/O, so I can attribute the high load to
> particular jobs. [...]

Try this script (it is from Bernd Schubert). It will parse the per-client proc stats on the MDS/OSS into something nice and human-readable. It is very useful.

Cheers,
Guy

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_client_stats.sh
Type: application/x-sh
Size: 796 bytes
Url: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100708/4d2e1979/attachment.sh
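The attachment itself was scrubbed from the archive. As a placeholder, here is a minimal reconstruction sketch of the idea — one summary line per client NID per OST target. This is not Bernd's original script, and the output columns are invented for illustration:

    #!/bin/sh
    # print one line per OST/client pair with cumulative read/write bytes
    # (a reconstruction sketch, not the original lustre_client_stats.sh)
    for dir in /proc/fs/lustre/obdfilter/*/exports/*; do
        [ -f "$dir/stats" ] || continue
        nid=$(basename "$dir")
        ost=$(basename "$(dirname "$(dirname "$dir")")")
        # last field of the read_bytes/write_bytes lines is the byte sum
        r=$(awk '/^read_bytes/  {print $NF}' "$dir/stats")
        w=$(awk '/^write_bytes/ {print $NF}' "$dir/stats")
        printf "%-14s %-22s read: %15s write: %15s\n" \
            "$ost" "$nid" "${r:-0}" "${w:-0}"
    done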
Andreas Dilger
2010-Jul-08 21:21 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 14:01, Guy Coates wrote:
> Try this script (it is from Bernd Schubert). It will parse the
> per-client proc stats on the MDS/OSS into something nice and
> human-readable. It is very useful.

I'm not sure I'd quite call it "human readable", but it does show that there is a need for something to print out stats for all of the clients.

===================== /proc/fs/lustre/obdfilter/myth-OST0004/exports =====================
0@lo
read_bytes      123343 samples [bytes] 1 1048576 64498717397
write_bytes      18457 samples [bytes] 1 1048576 3200834973
get_info             2 samples [reqs]
set_info_async       1 samples [reqs]
disconnect           3 samples [reqs]
create             420 samples [reqs]
destroy            883 samples [reqs]
setattr          13276 samples [reqs]
punch               15 samples [reqs]
preprw          141800 samples [reqs]
commitrw        141800 samples [reqs]
192.168.20.147@tcp
read_bytes         146 samples [bytes] 4096 1048576 114471161
write_bytes          7 samples [bytes] 163840 1048576 5244376
disconnect           6 samples [reqs]
preprw             153 samples [reqs]
commitrw           153 samples [reqs]
192.168.20.154@tcp
read_bytes         550 samples [bytes] 4096 1048576 270017490
write_bytes       1126 samples [bytes] 32 1048576 614266996
disconnect           2 samples [reqs]
preprw            1676 samples [reqs]
commitrw          1676 samples [reqs]
192.168.20.159@tcp
read_bytes       88745 samples [bytes] 0 1048576 61982699353
write_bytes      75428 samples [bytes] 16 1048576 27989934969
get_info             4 samples [reqs]
disconnect          22 samples [reqs]
destroy            113 samples [reqs]
setattr              1 samples [reqs]
punch              154 samples [reqs]
sync             81914 samples [reqs]
preprw          164173 samples [reqs]
commitrw        164173 samples [reqs]
==========================================================================================

Probably an equivalent script producing more readable output would be something like:

    egrep -v "snapshot|ping" /proc/fs/lustre/{mds,obdfilter}/*/exports/*/stats | cut -d/ -f 6,8,9

which will print something like:

    myth-MDT0000/0@lo/stats:open                       10 samples [reqs]
    myth-MDT0000/0@lo/stats:close                       2 samples [reqs]
    myth-MDT0000/0@lo/stats:getxattr                    1 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:open       3654 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:close      1827 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:unlink        1 samples [reqs]
    myth-MDT0000/192.168.20.159@tcp/stats:getxattr  15674 samples [reqs]
    myth-OST0000/0@lo/stats:read_bytes               2137 samples [bytes]
    myth-OST0000/0@lo/stats:preprw                   2137 samples [reqs]
    :
    :

I would also recommend the "llstat" tool, which has been part of Lustre for ages; it does mostly the same thing, but can print the output like "vmstat", showing the current operation rates. The main difference is that the "lustre_client_stats.sh" script prints the output for all of the clients at once.
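As a concrete example of that llstat usage, it can be pointed at one of the per-export stats files from the listing above (device and NID taken from the sample output; -i sets the refresh interval):

    # rate view of a single client's activity against one OST,
    # refreshed every 2 seconds
    llstat -i 2 /proc/fs/lustre/obdfilter/myth-OST0004/exports/192.168.20.159@tcp/stats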
While we are on the topic, people may also be interested in "llobdstat", which prints an I/O-oriented status for any "stats" file containing the read_bytes and write_bytes entries:

    llobdstat myth-OST0000 2

    /usr/bin/llobdstat on obdfilter/myth-OST0000
    Processor counters run at 2800.419 MHz
    Read: 4.08846e+11, Write: 9.0329e+10, create/destroy: 1133/1996, stat: 12128, punch: 241
    [NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]

    Timestamp   Read-delta  ReadRate  Write-delta  WriteRate
    --------------------------------------------------------
    1278622955    21.00MB  10.48MB/s       0.00MB   0.00MB/s
    1278622957    23.00MB  11.48MB/s       0.00MB   0.00MB/s
    1278622959    22.33MB  11.14MB/s       0.00MB   0.00MB/s
    1278622961    11.68MB   5.83MB/s       0.00MB   0.00MB/s
    1278622963    18.45MB   9.20MB/s       0.00MB   0.00MB/s st:1
    1278622965    20.72MB  10.34MB/s       0.00MB   0.00MB/s st:1

It can also be used on a client stats file, such as /proc/fs/lustre/osc/myth-OST0000-osc-ffff81001f5d54d0/stats.

Bernd, would you (or anyone) be interested in enhancing those tools to show stats data from multiple files at once (each prefixed by the device name and/or client NID)? I don't think it makes sense to create separate tools for this.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Bernd Schubert
2010-Jul-08 22:11 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 07/08/2010 11:21 PM, Andreas Dilger wrote:
> On 2010-07-08, at 14:01, Guy Coates wrote:
>> Try this script (it is from Bernd Schubert). It will parse the
>> per-client proc stats on the MDS/OSS into something nice and
>> human-readable. It is very useful.
>
> I'm not sure I'd quite call it "human readable", but it does show
> that there is a need for something to print out stats for all of the
> clients.

Yeah, I agree, it is not perfect yet. In particular, it needs to be sorted by the clients doing the most I/O. That shouldn't be too difficult with the existing script (see the one-liner sketch below).

[...]

> Bernd, would you (or anyone) be interested in enhancing those tools
> to show stats data from multiple files at once (each prefixed by the
> device name and/or client NID)? I don't think it makes sense to
> create separate tools for this.

I'm not sure the existing Lustre tools are really what we need. If you have a cluster with 200 or more clients and want to figure out which clients are doing the most I/O, several lines per client is too much output. One line per client, sorted by I/O, seems better, IMHO.

I would be interested in enhancing the existing tools, but if I look at the number of open bugs I have, several of those have a higher priority (by the way, this script is on my bug list, bug 22469). Additionally, at least for the next couple of weeks, my time is very limited while I finish my thesis.

Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks
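For what it's worth, a hedged one-liner along those lines, assuming the reconstruction sketch posted earlier in this thread (where column 4 is the cumulative read-byte count):

    # rank clients by cumulative bytes read, busiest first
    ./lustre_client_stats.sh | sort -k4 -rn | head -20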
Andreas Dilger
2010-Jul-09 17:26 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
On 2010-07-08, at 16:11, Bernd Schubert wrote:
>> Bernd, would you (or anyone) be interested in enhancing those tools
>> to show stats data from multiple files at once (each prefixed by the
>> device name and/or client NID)? I don't think it makes sense to
>> create separate tools for this.
>
> I'm not sure the existing Lustre tools are really what we need. If
> you have a cluster with 200 or more clients and want to figure out
> which clients are doing the most I/O, several lines per client is too
> much output.

I agree, but a 200-column line is also not very useful. I like the "llobdstat" output, which prints the I/O numbers and then appends only the abbreviated values that changed in that interval, instead of printing all of the values.

> One line per client, sorted by I/O, seems better, IMHO.

The commands I posted using the rpc_history file will print a summary of all client RPC counts, sorted by heaviest user. Something similar could be done by aggregating the per-client stats as well, though it would mean touching a lot more input files for each interval.

> I would be interested in enhancing the existing tools, but if I look
> at the number of open bugs I have, several of those have a higher
> priority (by the way, this script is on my bug list, bug 22469).

I was actually hoping that someone else might take it up. The llstat and llobdstat scripts are perl, and there should be a good number of people who can do a bit of perl hacking.

The scripts are currently "vmstat"- or "iostat"-like, in that they print out the parameters as they change over time. It might also be interesting (if someone has the perl-fu to do it) to have a "top" mode, where the tool resets the screen position each time and sorts the output from all of the clients.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
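Until someone adds that, a crude approximation is to wrap the rpc_history pipeline from earlier in the thread in watch(1), which redraws the sorted list in place every interval:

    # poor man's "top" mode: redraw the top-15 RPC senders every 2 seconds
    watch -n 2 'lctl get_param ost.OSS.ost_io.req_history | tr ":" " " |
        cut -d" " -f3,9,10 | sort | uniq -c | sort -nr | head -15'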
Wojciech Turek
2010-Jul-09 17:41 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Thank you all for the very useful suggestions. Andreas's approach using the rpc_history gave exactly what I was looking for, in a form that is quite easy to read.

On 9 July 2010 18:26, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> The commands I posted using the rpc_history file will print a summary
> of all client RPC counts, sorted by heaviest user. [...]

--
Wojciech Turek
Seger, Mark
2010-Jul-11 21:56 UTC
[Lustre-discuss] How to determine which lustre clients are loading filesystem.
Wojciech Turek <wjt27 at ...> writes:
> Thank you all for the very useful suggestions. Andreas's approach
> using the rpc_history gave exactly what I was looking for, in a form
> that is quite easy to read.

For what it's worth, you can get very detailed client-side stats from collectl. The way it figures out what the client is doing is to actually look at the OST-level stats and add them up. Why? Because that means you can replay the data and break things down by OST. There are also client-side switches to look at BRW stats, readahead stats, and even what's going on with metadata.

If you then plot the data with colplot, you can drill down and look at all kinds of things; for example, if you have data from multiple clients you can compare them side by side. Check out collectl-utils on sourceforge if you haven't yet.

Alas, I'm one of the few people (I think) who ever gets into this level of analysis, because I fear the number of switches tends to scare people off. ;)

-mark
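A sketch of what such a collectl invocation might look like; the subsystem letters and --lustopts values are assumptions based on collectl documentation of that era, so verify them against your local man page:

    # client-side Lustre summary stats every 5 seconds
    # (-s l selects the lustre subsystem -- assumed flag letter)
    collectl -sl -i 5

    # per-OST detail plus readahead and metadata counters
    # (--lustopts R = readahead, M = metadata -- assumed option letters)
    collectl -sL --lustopts RM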