Hi.

I'm trying to set up some simple collection of I/O statistics for Lustre using Ganglia, but I'm quite confused by the large number of stats files in the /proc file system. There are lots of files with a lot of numbers, and this is only on the servers:

# find /proc/fs/lustre/ -name stats
/proc/fs/lustre/ldlm/services/ldlm_canceld/stats
/proc/fs/lustre/ldlm/services/ldlm_cbd/stats
/proc/fs/lustre/mdt/MDT/mds_readpage/stats
/proc/fs/lustre/mdt/MDT/mds_setattr/stats
/proc/fs/lustre/mdt/MDT/mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost7_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost6_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost5_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost4_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost3_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost2_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost1_work-mds/stats
/proc/fs/lustre/osc/OSC_lustre-11-0.local_ost0_work-mds/stats
/proc/fs/lustre/obdfilter/ost9/stats
/proc/fs/lustre/obdfilter/ost8/stats
/proc/fs/lustre/obdfilter/ost3/stats
/proc/fs/lustre/obdfilter/ost2/stats
/proc/fs/lustre/obdfilter/ost1/stats
/proc/fs/lustre/obdfilter/ost0/stats
/proc/fs/lustre/ost/OSS/ost_io/stats
/proc/fs/lustre/ost/OSS/ost_create/stats
/proc/fs/lustre/ost/OSS/ost/stats

Where can I find some info on what files to parse and what the numbers mean? I just want to collect the bulk I/O in bytes per second on the server and client side.

r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd@cc.uit.no
Hi Roy,

On Fri, Oct 06, 2006 at 10:37:55PM +0200, Roy Dragseth wrote:
> Where can I find some info on what files to parse and what the numbers mean?

FYI, there's a section dedicated to /proc in the Lustre manual:
https://mail.clusterfs.com/wikis/attachments/LustreManual.html#Chapter_III-2._LustreProc

> I just want to collect the bulk io in bytes per second on the server and
> client side.

I think you should take a look at llobdstat.pl, e.g.:

# llobdstat.pl /proc/fs/lustre/obdfilter/testfs-OST0000/stats 2
/usr/sbin/llobdstat.pl on /proc/fs/lustre/obdfilter/testfs-OST0000/stats
Processor counters run at 1296.471995 MHz
Read: 0, Write: 0, create/destroy: 1/0, stat: 0, punch: 0
[NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]

Timestamp   Read-delta  ReadRate  Write-delta  WriteRate
--------------------------------------------------------
1160388933    0.00MB    0.00MB/s     0.00MB    0.00MB/s
1160388935    0.00MB    0.00MB/s    27.00MB   13.36MB/s cx:1 st:1
1160388937    0.00MB    0.00MB/s    71.00MB   35.15MB/s
1160388939    0.00MB    0.00MB/s    71.00MB   35.15MB/s

Besides, you can also use LLNL's LMT (http://sourceforge.net/projects/lmt/).

Cheers,
Johann
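For anyone who wants to compute the same read and write rates without llobdstat.pl, here is a minimal Python sketch. It assumes the obdfilter stats file carries lines of the form "read_bytes <count> samples [bytes] <min> <max> <sum>" (and likewise for write_bytes); that layout is an assumption not confirmed in this thread, so check it against your own stats files and the Lustre manual first. The function names are made up for illustration.

```python
import re

# Assumed line layout (verify against your Lustre version):
#   read_bytes  <count> samples [bytes] <min> <max> <sum>
BYTES_LINE = re.compile(
    r"^(read_bytes|write_bytes)\s+\d+\s+samples\s+\[bytes\]\s+\d+\s+\d+\s+(\d+)"
)


def parse_byte_totals(text):
    """Return the cumulative bytes read and written found in one stats snapshot."""
    totals = {"read_bytes": 0, "write_bytes": 0}
    for line in text.splitlines():
        match = BYTES_LINE.match(line.strip())
        if match:
            totals[match.group(1)] = int(match.group(2))
    return totals


def rates(previous, current, interval_seconds):
    """Bytes per second between two snapshots taken interval_seconds apart."""
    return {
        key: (current[key] - previous[key]) / interval_seconds
        for key in current
    }


def read_stats(path):
    """Read and parse a stats file such as /proc/fs/lustre/obdfilter/ost9/stats."""
    with open(path) as handle:
        return parse_byte_totals(handle.read())
```

To use it, call read_stats() on the same file twice, a known number of seconds apart, and pass both results to rates(). The counters are cumulative, so a negative rate would suggest the counters were reset between the two snapshots.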
Roy,

The Lustre kits/code drops contain scripts called llobdstat.pl and llstat.pl that present the respective stats info in a more meaningful form, including, at least on the server side, the sort of info you are looking for. So they are probably a good place to start.

You usually run a script by passing it the stats file it should monitor/summarise, followed by an interval, e.g.

# llobdstat.pl /proc/fs/lustre/obdfilter/ost9/stats 1

Fergal.
--
Fergal.McCarthy@HP.com
Hi, and thanks for the info.

It seems to me that it should be fairly easy to tweak the LMT collector daemons to publish Ganglia metrics for the server side. I'm not so sure what to do on the client side, but that should be doable too.

r.
By using llobdstat.pl as a guide, I was able to create a fairly simple reporter module for the Ganglia system (www.ganglia.info) which reports I/O rates for all the OSTs on a server. This gives utilization graphs over selectable time scales through the use of rrdtool; see the attached picture for an example.

To make it more general, I would like this module to know by itself which filesystem (or rather which LOV, I guess) an OST belongs to. Is this information available somewhere in the proc tree? I can find the LOV-to-OST mapping on the server running the MDS, but is it also available on the other hosts?

r.

[Attachment: lustre_iorates.png (image/png, 10462 bytes)
http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20070112/bfff5f22/lustre_iorates.png]
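The reporter module itself was not posted to the list, so the following is only a hedged Python sketch of how per-OST rates could be handed to Ganglia through its gmetric command-line tool. The metric naming scheme ("lustre_<ost>_<direction>_rate") and the function names are hypothetical, and the OST name is simply taken from the directory that holds the stats file.

```python
import os
import subprocess


def ost_name(stats_path):
    """Take the OST name from a path such as /proc/fs/lustre/obdfilter/ost9/stats."""
    return os.path.basename(os.path.dirname(stats_path))


def gmetric_command(stats_path, direction, bytes_per_second):
    """Build the gmetric argument list for one rate; direction is 'read' or 'write'."""
    # The metric name below is a made-up convention, not one used in the thread.
    metric = "lustre_%s_%s_rate" % (ost_name(stats_path), direction)
    return [
        "gmetric",
        "--name", metric,
        "--value", "%.0f" % bytes_per_second,
        "--type", "double",
        "--units", "bytes/sec",
    ]


def publish(stats_path, direction, bytes_per_second):
    """Run gmetric; requires the Ganglia tools to be installed on this host."""
    subprocess.check_call(gmetric_command(stats_path, direction, bytes_per_second))
```

Run from cron or a small loop, publish() would be called once per OST and direction with a freshly computed rate; rrdtool on the Ganglia side then takes care of the graphs.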