I ask because the LMT project seems to be quite moribund. Is anyone else out there doing something?

/andreas
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"
We use Ganglia with collectl. These are the only versions I could get to work together:

Sep 30 13:35 [root at wn125:~]# rpm -qa | grep collectl
collectl-3.4.2-5
Sep 30 13:35 [root at wn125:~]# rpm -qa | grep ganglia
ganglia-gmond-3.1.7-1

We are quite happy with it.

Thanks,

Jason
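For reference, one way to wire the two together is collectl's Ganglia export module, which pushes samples straight to gmond. A rough sketch only: the host name and port are placeholders, and the exact export syntax may differ on your collectl version, so check its documentation first.

# sample Lustre stats (-sl) every 10 seconds and send them to gmond in Ganglia's native format
collectl -sl -i 10 --export gexpr,gmond-host.example.com:8649

# or just record locally to files for later playback with "collectl -p"
collectl -sl -i 10 -f /var/log/collectl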
We use:

LMT as well as Ganglia + collectl.
Nagios for system health, hardware health, and cluster health (crm_mon -s).
Splunk for monitoring and reviewing log messages.

Erik
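For anyone curious about the crm_mon piece: crm_mon -s prints a one-line cluster summary intended for exactly this kind of plugin use, so hooking it into Nagios can be as simple as the sketch below. The command name and file paths are placeholders; adjust for your own NRPE setup.

# /etc/nagios/nrpe.cfg on each Lustre server failover pair (example only)
command[check_crm]=/usr/sbin/crm_mon -s

# quick manual check; an unhealthy cluster shows up in the summary line and the exit status
/usr/sbin/crm_mon -s; echo "exit=$?"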
Erik, if you use both LMT and Ganglia + collectl, there must be some difference in what is being monitored. Can you give a quick summary of what those differences are?

lisa
The latest LMT release (lmt-2.6.4-2) was updated on Sep 17, 2010. Why call it moribund?
A lot of the data overlaps. The LMT tools do provide more fine-grained stats about MDS/MDT activity, though. The Ganglia + collectl setup is nice because it presents the Lustre data in the context of the rest of the machine's activity, and it also provides multiple levels of resolution (from the cluster of machines down to the specific services).

I wouldn't say the LMT project is moribund. 2.4 was released recently. It could use some more documentation though, especially in the cerebro area.

All in all, more data about, and more views of, the systems is better than less.
On Thursday 30 September 2010 16:08:08 Larry wrote:
> the latest lmt (lmt-2.6.4-2) is updated on Sep 17, 2010. Why say moribund?

Well, the code might be released often, but the volume and quality of responses on the mailing list have been very indicative of a moribund project. Sometimes there's just a slump in activity, though, so I wanted to check here among the serious Lustre users whether LMT is in fact being used, even if community support seems thin.

/andreas
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"
Get all your OSSs to syslog to the MDS and then just run

tail -f /var/log/messages

on the MDS. I notice problems way before our monitoring software does.

--
Dr Stuart Midgley
sdm900 at gmail.com
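For reference, the forwarding side is a one-line change in classic syslogd on each OSS; the destination host name below is just a placeholder (the MDS here, or a dedicated log box as discussed further down), and rsyslog accepts the same line.

# /etc/syslog.conf on each OSS: send everything to the central log host
*.*     @loghost.example.com

# reload syslogd afterwards
service syslog restart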
Make sure you have enough space on the log volume if everyone is logging to your MDS!
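A logrotate stanza keeps that volume bounded. A rough sketch with placeholder retention values; most distros already ship a stanza for /var/log/messages, so merge with that rather than duplicating it.

# /etc/logrotate.d/syslog (example values only)
/var/log/messages {
    weekly
    rotate 8
    compress
    missingok
    postrotate
        /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
    endscript
}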
Wouldn't it be better to log to a non-Lustre machine? If your MDS goes down, you lose all your logs, which would make things a bit trickier.
Yes, absolutely. You also run the risk of choking the MDS if the OSSs start freaking out and logging a lot of data. A dedicated syslog/management host is probably the best bet.

Erik
syslog-ng

bob
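For anyone who hasn't used it: a dedicated log host running syslog-ng only needs a few lines to collect per-host files from the Lustre servers. A minimal sketch, with the port and paths as assumptions; the exact syntax varies a bit between syslog-ng versions.

# /etc/syslog-ng/syslog-ng.conf on the dedicated log host (example only)
source s_lustre_net {
    udp(ip(0.0.0.0) port(514));
};
destination d_lustre {
    # one file per sending host, e.g. /var/log/lustre/oss01.log
    file("/var/log/lustre/$HOST.log" create_dirs(yes));
};
log {
    source(s_lustre_net);
    destination(d_lustre);
};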
Splunk also includes a syslog server. It's free as long as you index < 500 MB of log data a day.

Erik
On the lustre-discuss subject of "is LMT moribund":

LMT2 definitely had a burst of activity a few years ago and then dropped off dramatically as the original developers, some of whom were "borrowed" from another group, moved on to other things. I've been maintaining it to a degree in my spare time, but I acknowledge that support has been somewhat lacking.

A couple of months ago, I began a serious attempt to address some of the issues that come up repeatedly on the lmt-discuss mailing list. This turned into a rewrite of the back end (the cerebro and mysql interface) and the ltop utility. The new code will appear in lmt3, to be released sometime in October.

Highlights of lmt3:
* New ltop that works directly with cerebro and has an expanded display.
* Auto-configuration of the mysql database (the lustre config is determined on the fly).
* Improved error handling and logging (configurable).
* New config file.
* Code improvements for maintainability.

Since ltop is now written in C/curses and talks directly to cerebro, the mysql database for storing historical data and the lwatch java client, which can graph historical data, are optional.

So in summary, LMT was never abandoned, and although support has been fairly thin and the code somewhat fragile, there are good prospects for that improving in the future.

LMT Google code page: http://code.google.com/p/lmt/

Jim Garlick
LLNL
Hi,

I use collectl with a tool I've been writing called msica for charting torque/pbs/moab/collectl data. The charts are all defined in XML, with plugins for supplying data, charting the data, applying functions to the data, changing how the data is displayed, etc. Eventually I want to make it into an interactive data-exploration tool, but right now it's all command-line driven. It's open source, but I haven't got it into a 1.0 release state yet (maybe in the next couple of weeks).

If you are interested in looking, the current trunk code is available at:
http://code.google.com/p/msica

Here's some sample output I took while doing sweeping IOR runs across our OSTs (test repeated 5 times):
http://www.msi.umn.edu/~mark/msica/2GB-block_64MB_directIO_posix_nocache.png

And some parsing of moab logs to monitor job statistics on one of our smaller clusters:
http://www.msi.umn.edu/~mark/msica/mirror_20090928-20100927.png

Mark

--
Mark Nelson, Lead Software Developer
Minnesota Supercomputing Institute
Phone: (612)626-4479
Email: mark at msi.umn.edu
Hey Jim,

This is great news. Thanks for all of your hard work.

Erik
On Thursday 30 September 2010 22:08:55 Jim Garlick wrote:
> So in summary, LMT was never abandoned, and although support has been
> fairly thin and the code somewhat fragile, there are good prospects for
> that improving in the future.

Thanks Jim. Nice summary.

/andreas
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"