Hi.
I'm trying to set up some simple collection of I/O statistics for Lustre
using Ganglia, but I'm quite confused by the large number of stats files in
the /proc file system. There are lots of files with lots of numbers, and this
is only on the servers:
# find /proc/fs/lustre/ -name stats
/proc/fs/lustre/ldlm/services/ldlm_canceld/stats
/proc/fs/lustre/ldlm/services/ldlm_cbd/stats
/proc/fs/lustre/mdt/MDT/mds_readpage/stats
/proc/fs/lustre/mdt/MDT/mds_setattr/stats
/proc/fs/lustre/mdt/MDT/mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost7_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost6_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost5_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost4_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost3_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost2_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost1_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost0_work-mds/stats
/proc/fs/lustre/obdfilter/ost9/stats
/proc/fs/lustre/obdfilter/ost8/stats
/proc/fs/lustre/obdfilter/ost3/stats
/proc/fs/lustre/obdfilter/ost2/stats
/proc/fs/lustre/obdfilter/ost1/stats
/proc/fs/lustre/obdfilter/ost0/stats
/proc/fs/lustre/ost/OSS/ost_io/stats
/proc/fs/lustre/ost/OSS/ost_create/stats
/proc/fs/lustre/ost/OSS/ost/stats
Where can I find some info on what files to parse and what the numbers mean?
I just want to collect the bulk io in bytes per second on the server and
client side.
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd@cc.uit.no
Hi Roy,

On Fri, Oct 06, 2006 at 10:37:55PM +0200, Roy Dragseth wrote:
> Where can I find some info on what files to parse and what the numbers mean?

FYI, there's a section dedicated to /proc in the Lustre manual:
https://mail.clusterfs.com/wikis/attachments/LustreManual.html#Chapter_III-2._LustreProc

> I just want to collect the bulk io in bytes per second on the server and
> client side.

I think you should take a look at llobdstat.pl, e.g.:

# llobdstat.pl /proc/fs/lustre/obdfilter/testfs-OST0000/stats 2
/usr/sbin/llobdstat.pl on /proc/fs/lustre/obdfilter/testfs-OST0000/stats
Processor counters run at 1296.471995 MHz
Read: 0, Write: 0, create/destroy: 1/0, stat: 0, punch: 0
[NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]

Timestamp   Read-delta  ReadRate  Write-delta  WriteRate
--------------------------------------------------------
1160388933    0.00MB    0.00MB/s     0.00MB     0.00MB/s
1160388935    0.00MB    0.00MB/s    27.00MB    13.36MB/s cx:1 st:1
1160388937    0.00MB    0.00MB/s    71.00MB    35.15MB/s
1160388939    0.00MB    0.00MB/s    71.00MB    35.15MB/s

Besides, you can also use LLNL's LMT (http://sourceforge.net/projects/lmt/).

Cheers,
Johann
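
For anyone who would rather compute the rates directly than wrap llobdstat.pl, here is a minimal Python sketch of the same idea: read a stats file twice and divide the change in the byte counters by the elapsed time. It assumes (this is not confirmed anywhere in the thread) that each counter line has the layout "name count samples [unit] min max sum" and that the file starts with a "snapshot_time" line; check this against your own stats files first. The function names parse_stats and io_rates are made up for illustration.

```python
def parse_stats(text):
    """Return {counter_name: sum}, plus 'snapshot_time', from stats file text.

    Assumed line layout: name count "samples" [unit] min max sum
    Counters without a sum field (e.g. plain request counts) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "snapshot_time":
            stats["snapshot_time"] = float(fields[1])
        elif len(fields) >= 7 and fields[2] == "samples":
            stats[fields[0]] = int(fields[6])
    return stats


def io_rates(prev, curr):
    """Bytes per second read and written between two parsed snapshots."""
    elapsed = curr["snapshot_time"] - prev["snapshot_time"]
    if elapsed <= 0:
        # No usable time base; report zero rather than divide by zero.
        return {"read_bytes": 0.0, "write_bytes": 0.0}
    return {
        key: (curr.get(key, 0) - prev.get(key, 0)) / elapsed
        for key in ("read_bytes", "write_bytes")
    }
```

In practice you would read the same stats file at two points in time, pass each read through parse_stats, and hand both results to io_rates. The sketch does not handle counters being reset between samples.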
Roy,
The Lustre kits/code drops contain scripts called llobdstat.pl and llstat.pl
that are intended to present the respective stats info in a more meaningful form
including, at least on the server side, the sort of info you are looking for...
So they are probably a good place to start.
You usually run the scripts passing as argument the stats file it is meant to
monitor/summarise, e.g.
# llobdstat.pl /proc/fs/lustre/obdfilter/ost9/stats 1
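
If you want to cover every OST on a server rather than name each stats file by hand, something like the following Python sketch could enumerate them. It only assumes the /proc/fs/lustre/obdfilter/<ost>/stats layout shown in the find output above; the function name ost_stats_files is hypothetical.

```python
import glob
import os


def ost_stats_files(root="/proc/fs/lustre/obdfilter"):
    """Map each OST directory name under root to the path of its stats file."""
    paths = sorted(glob.glob(os.path.join(root, "*", "stats")))
    return {os.path.basename(os.path.dirname(path)): path for path in paths}
```

On a machine with no OSTs (or no Lustre at all) this simply returns an empty dict.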
Fergal.
--
Fergal.McCarthy@HP.com
-----Original Message-----
From: lustre-discuss-bounces@clusterfs.com
[mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Roy Dragseth
Sent: 06 October 2006 21:38
To: lustre-discuss@clusterfs.com
Subject: [Lustre-discuss] Collecting stats.
Hi, and thanks for the info.
It seems to me that it should be fairly easy to tweak the LMT collector
daemons to publish Ganglia metrics for the server side. I'm not so sure what
to do on the client side, but that should be doable too.
r.
Using llobdstat.pl as a guide, I was able to create a fairly simple reporter
module for the Ganglia system (www.ganglia.info) which reports I/O rates
for all the OSTs on a server. This gives utilization graphs over selectable
time scales through the use of rrdtool; see the attached picture for an example.
To make it more general, I would like this module to know which
filesystem (or rather which LOV, I guess) an OST belongs to. Is this
information available somewhere in the proc tree? I can find the LOV-to-OST
mapping on the server running the MDS, but is it somehow available on the
other hosts too?
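
For readers who want to try something similar, here is a rough Python sketch of the publishing step only. It is not the module described above: it just shells out to Ganglia's gmetric command-line tool once per OST and direction. The metric naming scheme (lustre_<ost>_<direction>_rate) and the function names are assumptions made for illustration.

```python
import subprocess


def gmetric_command(ost_name, direction, bytes_per_sec):
    """Build a gmetric command line for one OST and one direction."""
    metric = "lustre_{}_{}_rate".format(ost_name, direction)  # hypothetical naming
    return [
        "gmetric",
        "--name", metric,
        "--value", "{:.0f}".format(bytes_per_sec),
        "--type", "double",
        "--units", "bytes/sec",
    ]


def publish(rates_by_ost, run=subprocess.check_call):
    """Publish read and write rates for every OST.

    rates_by_ost maps an OST name to {"read": rate, "write": rate}.
    run is injectable so the commands can be inspected without gmetric installed.
    """
    for ost_name, rates in sorted(rates_by_ost.items()):
        for direction in ("read", "write"):
            run(gmetric_command(ost_name, direction, rates[direction]))
```

Passing run=print shows the commands that would be executed, which is handy for checking the metric names before pointing it at a live gmond.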
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd@cc.uit.no
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_iorates.png
Type: image/png
Size: 10462 bytes
Desc: not available
Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20070112/bfff5f22/lustre_iorates.png