Is it possible to figure out which client is taking up the most I/O? We have 8 OSS and 200 clients and it seems 5 or 6 clients are taking up all the bandwidth and I am trying to figure out which ones they are... TIA
On 2009-12-04, at 20:18, Mag Gam wrote:
> Is it possible to figure out which client is taking up the most I/O?
> We have 8 OSS and 200 clients and it seems 5 or 6 clients are taking
> up all the bandwidth and I am trying to figure out which ones they
> are...

In newer versions of Lustre (1.8 definitely, and later 1.6.x) there is
an "exports" directory that contains statistics about each client,
including brw_stats.

Alternatively, you can just enable the RPC tracing, dump the debug
logs after a few seconds, and check for OST_READ (opcode 3) and
OST_WRITE (opcode 4) RPCs:

lctl set_param debug=+rpctrace
sleep 20
lctl dk /tmp/debug
grep "Handled.*:[34]$" /tmp/debug

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
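To turn the per-export counters into a per-client ranking, something along the following lines can be run on each OSS. This is a rough sketch, not from the original reply: it assumes the 1.8-style layout where each client's counters live in /proc/fs/lustre/obdfilter/<OST>/exports/<NID>/stats, and that the read_bytes/write_bytes lines end with the accumulated byte total; the path and field positions may differ on other Lustre versions, so check one stats file by hand first.

# Rough sketch (editorial, hedged): run on an OSS to sum bytes moved per
# client NID across all OSTs on that server, heaviest clients first.
# Assumes /proc/fs/lustre/obdfilter/<OST>/exports/<NID>/stats exists and
# that the read_bytes/write_bytes lines end with the accumulated byte count.
awk '/^(read|write)_bytes/ {
         n = split(FILENAME, path, "/");   # NID is the directory above "stats"
         nid = path[n-1];
         bytes[nid " " $1] += $NF;         # sum across all OSTs on this OSS
     }
     END { for (key in bytes) print key, bytes[key] }' \
    /proc/fs/lustre/obdfilter/*/exports/*/stats | sort -k3 -rn | head -20

Running it on every OSS and merging the outputs gives the cluster-wide picture of which of the 200 clients are generating the load.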
People may want to be a little careful when trying to use the
"exports" directory that you mention. I just tried to cat the stats
file for a random client on my MDS and it caused a kernel panic. The
exact command I used was:

cat mgs/MGS/exports/172.16.44.30\@o2ib/stats

Could this be a known bug, or should I file a bug report on it? We
are running Lustre 1.6.7.2 on RedHat 4.5. I didn't get a full kernel
dump, but I did get a picture of the screen before I rebooted. The
last line of the panic reads:

RIP [<ffffffff8868a07b>] :obdclass:lprocfs_stats_seq_show+0xcb/0x210

Mike Robbert

On Dec 4, 2009, at 11:39 PM, Andreas Dilger wrote:
> On 2009-12-04, at 20:18, Mag Gam wrote:
>> Is it possible to figure out which client is taking up the most I/O?
>> We have 8 OSS and 200 clients and it seems 5 or 6 clients are taking
>> up all the bandwidth and I am trying to figure out which ones they
>> are...
>
> In newer versions of Lustre (1.8 definitely, and later 1.6.x) there is
> an "exports" directory that contains statistics about each client,
> including brw_stats.
>
> Alternatively, you can just enable the RPC tracing, dump the debug
> logs after a few seconds, and check for OST_READ (opcode 3) and
> OST_WRITE (opcode 4) RPCs:
>
> lctl set_param debug=+rpctrace
> sleep 20
> lctl dk /tmp/debug
> grep "Handled.*:[34]$" /tmp/debug
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
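If reading the per-export stats files is risky on 1.6.7.2, the rpctrace approach from earlier in the thread sidesteps those proc files entirely. A rough sketch (editorial, not from the thread) of tallying the handled RPCs per client, assuming each matched "Handled RPC" line ends with ...:<client NID>:<opcode> as in the 1.6/1.8 debug log format; verify a few lines of /tmp/debug before trusting the field positions:

# Rough sketch: count OST_READ (3) and OST_WRITE (4) RPCs per client NID
# from the rpctrace dump.  Assumes the matched lines end in ...:<nid>:<opcode>;
# adjust the field math if your debug log format differs.
grep "Handled.*:[34]$" /tmp/debug | awk -F: '
    { nid = $(NF-1); opc = $NF;
      if (opc == 3) reads[nid]++; else writes[nid]++;
      seen[nid] = 1 }
    END { for (n in seen)
              printf "%-30s reads=%d writes=%d\n", n, reads[n], writes[n] }'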
On 2010-01-07, at 13:32, Michael Robbert wrote:
> People may want to be a little careful when trying to use the
> "exports" directory that you mention. I just tried to cat the stats
> file for a random client on my MDS and it caused a kernel panic. The
> exact command I used was:
>
> cat mgs/MGS/exports/172.16.44.30\@o2ib/stats
>
> Could this be a known bug, or should I file a bug report on it? We
> are running Lustre 1.6.7.2 on RedHat 4.5. I didn't get a full kernel
> dump, but I did get a picture of the screen before I rebooted. The
> last line of the panic reads:
>
> RIP [<ffffffff8868a07b>] :obdclass:lprocfs_stats_seq_show+0xcb/0x210

This looks like bug 21420, fixed in 1.8.2.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Will the fix be back-ported into 1.6?

--
Ben Evans
ben at terascala.com
Principal Engineer
Terascala Inc.
Office: 508-588-1501 x223
www.terascala.com

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Andreas Dilger
Sent: Thursday, January 07, 2010 6:10 PM
To: Michael Robbert
Cc: lustre-discuss at lists.lustre.org; Mag Gam
Subject: Re: [Lustre-discuss] client I/O

On 2010-01-07, at 13:32, Michael Robbert wrote:
> People may want to be a little careful when trying to use the
> "exports" directory that you mention. I just tried to cat the stats
> file for a random client on my MDS and it caused a kernel panic. The
> exact command I used was:
>
> cat mgs/MGS/exports/172.16.44.30\@o2ib/stats
>
> Could this be a known bug, or should I file a bug report on it? We
> are running Lustre 1.6.7.2 on RedHat 4.5. I didn't get a full kernel
> dump, but I did get a picture of the screen before I rebooted. The
> last line of the panic reads:
>
> RIP [<ffffffff8868a07b>] :obdclass:lprocfs_stats_seq_show+0xcb/0x210

This looks like bug 21420, fixed in 1.8.2.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.