Adrian Ulrich
2010-Jul-28 13:38 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
First: Sorry for the shameless self-advertising, but...

I uploaded two Lustre-related modules to the CPAN:

 #1: Lustre::Info provides easy access to the information located
     at /proc/fs/lustre. It also comes with a 'performance monitoring'
     script called 'lustre-info.pl'.

 #2: Lustre::LFS offers IO::Dir- and IO::File-like filehandles, but
     with additional Lustre-specific features ($dir_fh->set_stripe...).


Examples and details:

Lustre::Info and lustre-info.pl
---------------------------------------

Lustre::Info provides a Perl-OO interface to Lustre's procfs information.

(Confusing) example code to get the block device of all OSTs:

 #########################################################
 my $l = Lustre::Info->new;
 print join("\n", map( { $l->get_ost($_)->get_name.": ".$l->get_ost($_)->get_blockdevice }
                       @{$l->get_ost_list}), '' ) if $l->is_ost;
 #########################################################

..output:
 $ perl test.pl
 lustre1-OST001e: /dev/md17
 lustre1-OST0016: /dev/md15
 lustre1-OST000e: /dev/md13
 lustre1-OST0006: /dev/md11

The module also includes a script called 'lustre-info.pl' that can
be used to gather some live performance statistics:

Use `--ost-stats' to get a quick overview of what's going on:
 $ lustre-info.pl --ost-stats
 lustre1-OST0006 (@ /dev/md11) :  write=   5.594 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
 lustre1-OST000e (@ /dev/md13) :  write=   3.997 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  4.0 R/s
 lustre1-OST0016 (@ /dev/md15) :  write=   5.502 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
 lustre1-OST001e (@ /dev/md17) :  write=   5.905 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.7 R/s


You can also get client<->ost details via `--monitor=MODE':

 $ lustre-info.pl --monitor=ost --as-list  # this will only show clients where read+write >= 1MB/s
 > client nid       | lustre1-OST0006    | lustre1-OST000e    | lustre1-OST0016    | lustre1-OST001e    | +++ TOTALS +++ (MB/s)
 10.201.46.25@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   1.1 | read=   0.0, write=   1.1
 10.201.47.27@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   1.2 | r=   0.0, w=   2.0 | r=   0.0, w=   0.0 | read=   0.0, write=   3.2


There are many more options, check out `lustre-info.pl --help' for details!


Lustre::LFS::Dir and Lustre::LFS::File
---------------------------------------

These two packages behave like IO::File and IO::Dir, but both of
them add some Lustre-only features to the returned filehandle.

Quick example:
 my $fh = Lustre::LFS::File->new; # $fh is a normal IO::File-like FH
 $fh->open("> test") or die;
 print $fh "Foo Bar!\n";
 my $stripe_info = $fh->get_stripe or die "Not on a lustre filesystem?!\n";



Keep in mind that both Lustre modules are far from being complete:
Lustre::Info really needs some MDT support and Lustre::LFS is just a
wrapper for /usr/bin/lfs: An XS version would be much better.

But I'd love to hear some feedback if someone decides to play around
with these modules + lustre-info.pl :-)


Cheers,
 Adrian
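The one-liner in the announcement is labelled "(confusing)" by its own author, so here is a more readable loop as a minimal sketch. It uses only the method names that appear in the post (get_ost_list, get_ost, get_name, get_blockdevice). The Demo::Info and Demo::OST packages are hypothetical stand-ins, not part of Lustre::Info, added only so the sketch runs on a machine without Lustre; on a real OSS you would presumably pass a Lustre::Info->new object instead.

```perl
use strict;
use warnings;

# Build one "name: blockdevice" line per OST. $info is expected to respond to
# get_ost_list, get_ost, get_name and get_blockdevice (names taken from the post).
sub ost_blockdevice_lines {
    my ($info) = @_;
    my @lines;
    for my $ost_name (@{ $info->get_ost_list }) {
        my $ost = $info->get_ost($ost_name);    # look each OST up only once
        push @lines, sprintf("%s: %s", $ost->get_name, $ost->get_blockdevice);
    }
    return @lines;
}

# --- Hypothetical stand-ins (assumption): they mimic the interface above ---
{
    package Demo::OST;
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }
    sub get_name        { return $_[0]{name} }
    sub get_blockdevice { return $_[0]{dev} }
}
{
    package Demo::Info;
    sub new {
        my ($class, %devices) = @_;
        my $self = { devices => { %devices } };
        return bless $self, $class;
    }
    sub get_ost_list { return [ sort keys %{ $_[0]{devices} } ] }
    sub get_ost {
        my ($self, $name) = @_;
        return Demo::OST->new(name => $name, dev => $self->{devices}{$name});
    }
}

package main;

my $info = Demo::Info->new(
    'lustre1-OST0006' => '/dev/md11',
    'lustre1-OST000e' => '/dev/md13',
);
print "$_\n" for ost_blockdevice_lines($info);
```

Fetching the OST object once per iteration avoids the double get_ost call of the one-liner, and returning a list keeps the printing separate from the lookup.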
Larry
2010-Jul-29 13:00 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Good job, I'll download and learn from them.

On Wed, Jul 28, 2010 at 9:38 PM, Adrian Ulrich <adrian at blinkenlights.ch> wrote:
> [snip]
Frederik Ferner
2010-Jul-29 15:08 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Adrian,

thanks for sharing these with us.

Adrian Ulrich wrote:
> I uploaded two lustre-related modules to the CPAN:
>
> #1: Lustre::Info provides easy access to information located
>     at /proc/fs/lustre, it also comes with a 'performance monitoring'
>     script called 'lustre-info.pl'

I did have a bit of a play with the lustre-info.pl script on our test
file system and it seems to work nicely.

If you've got a lot of OSTs on your server, you need a wide monitor for
some of the options, like --monitor=ost-patterns for all OSTs...

We are currently running Lustre 1.6.7.2 (+ a few patches) on our OSTs,
in case this makes a difference for my issues below.

[snip]
> Examples and details:
>
> Lustre::Info and lustre-info.pl
> ---------------------------------------
[snip]
> The module also includes a script called 'lustre-info.pl' that can
> be used to gather some live performance statistics:
>
> Use `--ost-stats' to get a quick overview on what's going on:
> $ lustre-info.pl --ost-stats

In our case this looks like this (on a very quiet file system):

> play01-OST0000 (@ /dev/sdb) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  0.0 R/s
> play01-OST0001 (@ /dev/sdc) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
> , setattr=  0.0 R/s, preprw=  0.0 R/s
> play01-OST0002 (@ /dev/sdd) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  0.0 R/s
> play01-OST0003 (@ /dev/sde) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  0.0 R/s
> play01-OST0004 (@ /dev/sdf) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
> , setattr=  0.0 R/s, preprw=  0.0 R/s
> play01-OST0005 (@ /dev/sdg) :  write=   0.000 MB/s, read=   0.000 MB/s, create=  0.0 R/s, destroy=  0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
> , setattr=  0.0 R/s, preprw=  0.0 R/s

Note the 'Use of uninitialized value in division...' errors. Looking at
the code it seems the value for 'setattr' is missing from the stats file
for some of our OSTs. Looking at the stats file, indeed the setattr line
is missing for some OSTs.

Has anyone seen this before? What could have caused this?

> You can also get client<->ost details via `--monitor=MODE'
>
> $ lustre-info.pl --monitor=ost --as-list  # this will only show clients where read+write >= 1MB/s
> > client nid       | lustre1-OST0006    | lustre1-OST000e    | lustre1-OST0016    | lustre1-OST001e    | +++ TOTALS +++ (MB/s)
> 10.201.46.25@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   1.1 | read=   0.0, write=   1.1
> 10.201.47.27@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   1.2 | r=   0.0, w=   2.0 | r=   0.0, w=   0.0 | read=   0.0, write=   3.2

'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
please wait..." for a very long time until I killed it. I have not had
the time to debug this yet.

Kind regards,
Frederik

-- 
Frederik Ferner
Computer Systems Administrator          phone: +44 1235 77 8624
Diamond Light Source Ltd.               mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
Andreas Dilger
2010-Jul-29 16:33 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
On 2010-07-29, at 09:08, Frederik Ferner wrote:
> Note the 'Use of uninitialized value in division...' errors. Looking at
> the code it seems the value for 'setattr' is missing from the stats file
> for some of our OSTs. Looking at the stats file, indeed the setattr line
> is missing for some OSTs.
>
> Has anyone seen this before? What could have caused this?

The statistics code for OBD devices is generic. For operations that have
never been done on a particular target there is no stats value printed.
Otherwise, the "stats" file would be 60 lines long and mostly be filled
with counters that are all "0".

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
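To make Andreas's point concrete, here is a minimal sketch of a parser for such a "stats" file that treats an absent counter as "operation never performed". The sample text and its line layout ("<name> <count> samples [<unit>] ...") are assumptions for illustration, not copied from a real OST; parse_stats is a hypothetical helper and is not part of Lustre::Info.

```perl
use strict;
use warnings;

# Parse stats text into { counter_name => sample_count }.
# Assumed line layout: "<name> <count> samples [<unit>] ..."
sub parse_stats {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        # The snapshot_time line has no "samples" field and is skipped.
        if ($line =~ /^(\S+)\s+(\d+)\s+samples\b/) {
            $count{$1} = $2;
        }
    }
    return \%count;
}

# Illustrative sample (assumption): note that there is no 'setattr' line.
my $sample = <<'END';
snapshot_time             1280419980.123456 secs.usecs
write_bytes               340 samples [bytes] 4096 1048576 356515840
create                    2 samples [reqs]
destroy                   7 samples [reqs]
preprw                    352 samples [reqs]
END

my $stats = parse_stats($sample);
for my $type (qw(create destroy setattr preprw)) {
    if (exists $stats->{$type}) {
        printf "%-8s %d samples\n", $type, $stats->{$type};
    }
    else {
        printf "%-8s not in file (operation never performed)\n", $type;
    }
}
```

Any consumer of the file therefore has to decide what a missing key means. Here it is reported explicitly instead of being fed into arithmetic as undef.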
Tina Friedrich
2010-Jul-29 16:55 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Thanks Andreas, that explains that.

So the warning can be made to go away by changing line 183 in
lustre-info.pl from

 printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice)

to be

 if (exists $stats->{$type}) {
   printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice);
 }

(patch attached)

Tina

Andreas Dilger wrote:
> On 2010-07-29, at 09:08, Frederik Ferner wrote:
>> Note the 'Use of uninitialized value in division...' errors. Looking at
>> the code it seems the value for 'setattr' is missing from the stats file
>> for some of our OSTs. Looking at the stats file, indeed the setattr line
>> is missing for some OSTs.
>>
>> Has anyone seen this before? What could have caused this?
>
> The statistics code for OBD devices is generic. For operations that have
> never been done on a particular target there is no stats value printed.
> Otherwise, the "stats" file would be 60 lines long and mostly be filled
> with counters that are all "0".
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.

-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lustre-info.pl_patch
Url: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100729/aa8e61a3/attachment.pl
Adrian Ulrich
2010-Jul-29 17:02 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> If you've got a lot of OSTs on your server you need a wide monitor for some
> of the options like --monitor=ost-patterns for all OSTs...

The output format is not ideal, but it's a good reason to upgrade your
workstation to a dualhead configuration ;-)

> Looking at the code it seems the value for 'setattr' is missing from the stats file
> for some of our OSTs. Looking at the stats file, indeed the setattr line
> is missing for some OSTs.

As Andreas already said: If 'setattr' is missing, there was no setattr
operation (yet).

Changing
> printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice);
into
> printf(", %s=%5.1f R/s",$type,(($stats->{$type}||0)/$slice) );
should fix the warning. (The totals are OK, because in Perl undef/$x == 0/$x.)

> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
> please wait..." for a very long time until I killed it, I have not had
> the time to debug this yet.

I never tested it with anything other than 1.8.1.1, but this should be
trivial to fix:

Could you mail me the output of
/proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats ?

Regards,
 Adrian

-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
        a different presentation, regardless of whether it works.
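The two fixes proposed in this thread (Tina's `exists` guard and Adrian's `||0` default) both silence the warning but print different things. The sketch below puts them side by side. The format string and the `$stats->{$type}/$slice` expression come from the messages above. The function names, the sample counter values and the reading of `$slice` as a sampling interval in seconds are assumptions for illustration.

```perl
use strict;
use warnings;

# Adrian's variant: treat a missing counter as 0, so every field is printed.
sub format_rate_default {
    my ($stats, $type, $slice) = @_;
    return sprintf(", %s=%5.1f R/s", $type, (($stats->{$type} || 0) / $slice));
}

# Tina's variant: emit nothing at all when the counter is missing.
sub format_rate_guarded {
    my ($stats, $type, $slice) = @_;
    return '' unless exists $stats->{$type};
    return sprintf(", %s=%5.1f R/s", $type, $stats->{$type} / $slice);
}

my %stats = (create => 0, destroy => 12, preprw => 60);   # no 'setattr' key
my $slice = 10;    # assumed: seconds covered by the counters above

my @types = qw(create destroy setattr preprw);
print "default:", (map { format_rate_default(\%stats, $_, $slice) } @types), "\n";
print "guarded:", (map { format_rate_guarded(\%stats, $_, $slice) } @types), "\n";
```

With the default variant every OST line keeps the same set of columns, which is easier to scan in a table. The guarded variant makes it visible that the counter was never present, at the cost of lines with differing fields.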
Frederik Ferner
2010-Jul-29 17:11 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Adrian Ulrich wrote:
>> If you've got a lot of OSTs on your server you need a wide monitor for some
>> of the options like --monitor=ost-patterns for all OSTs...
>
> The output format is not ideal, but it's a good reason to upgrade your
> workstation to a dualhead configuration ;-)

;-)

[snip]
>> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
>> please wait..." for a very long time until I killed it, I have not had
>> the time to debug this yet.
>
> I never tested it with anything else than 1.8.1.1 but this should be
> trivial to fix:
>
> Could you mail me the output of
> /proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats ?

See attached.

Thanks!
Frederik

-- 
Frederik Ferner
Computer Systems Administrator          phone: +44 1235 77 8624
Diamond Light Source Ltd.               mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: oss8_sample_brw_stats
Url: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100729/425fba90/attachment.pl
Adrian Ulrich
2010-Jul-30 07:42 UTC
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,

`io-size' reads its data from the 'disk I/O size' part of brw_stats:

>                            read      |     write
> disk I/O size          ios   % cum % |  ios   % cum %

..and in your case there are no stats (for reasons unknown to me...),
that's why lustre-info.pl cannot display anything.

Otherwise the file looks fine: e.g. `--monitor=io-time' should work.

Regards,
 Adrian
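As a rough illustration of the situation Adrian describes, the sketch below pulls the rows out of the 'disk I/O size' section of a brw_stats text and reports an empty section instead of waiting for data. The section header is the one quoted above. The row layout ("<size>: <ios> <%> <cum %> | <ios> <%> <cum %>"), the sample numbers and the function name disk_io_size_rows are assumptions, and this is not how lustre-info.pl itself is implemented.

```perl
use strict;
use warnings;

# Return { size => [read_ios, write_ios] } for the "disk I/O size" section.
# Assumed row layout: "<size>: <ios> <%> <cum %> | <ios> <%> <cum %>"
sub disk_io_size_rows {
    my ($text) = @_;
    my (%rows, $in_section);
    for my $line (split /\n/, $text) {
        if ($line =~ /^disk I\/O size\b/) { $in_section = 1; next; }
        next unless $in_section;
        # The first line that is not a data row ends the section.
        last unless $line =~ /^(\S+):\s+(\d+)\s+\d+\s+\d+\s+\|\s+(\d+)/;
        $rows{$1} = [ $2, $3 ];
    }
    return \%rows;
}

# Illustrative samples (assumption): one with data, one with an empty section.
my $with_data = <<'END';
                           read      |     write
disk I/O time (1/1000s)  ios   % cum % |  ios   % cum %
1:                        10 100 100   |   12 100 100

                           read      |     write
disk I/O size            ios   % cum % |  ios   % cum %
4K:                        3  30  30   |    0   0   0
1M:                        7  70 100   |   12 100 100
END

my $empty = <<'END';
                           read      |     write
disk I/O size            ios   % cum % |  ios   % cum %
END

for my $case ([ busy => $with_data ], [ idle => $empty ]) {
    my ($label, $text) = @$case;
    my $rows = disk_io_size_rows($text);
    if (%$rows) {
        printf "%s: %d size buckets\n", $label, scalar keys %$rows;
    }
    else {
        print "$label: no 'disk I/O size' data, nothing to display\n";
    }
}
```

Checking for an empty result up front lets a monitoring loop print a clear message and move on, rather than sitting at "collecting data" indefinitely.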