Hi,

After a discussion started on lustre-discuss@ [1], I'd like to join
other users [2] in making an official feature request about the Lustre
SNMP module.

I believe it could be extremely useful for Lustre system administrators
to get more than just the amount of free space and the number of
available objects from the SNMP module. For instance, it could be
interesting to get the following live stats through SNMP:
  on clients: /proc/fs/lustre/llite/*/stats
  on OSSes:   /proc/fs/lustre/obdfilter/*/stats
  on MDSes:   /proc/fs/lustre/mds/*/stats

But it would be especially interesting not to limit the SNMPable values
to just a subset of what's available in /proc/fs/lustre. Since it looks
like some work has begun to rework the Lustre /proc structure [3],
maybe this would be the right opportunity to integrate SNMP more
closely into the new UI. The idea is to translate everything
available in /proc into SNMP variables, so that future variables could
be exported too, without having to explicitly add them to the SNMP
code.

I have little idea how easily this can be achieved, but it would be
an excellent foundation for upcoming Lustre monitoring systems.

[1] http://lists.lustre.org/pipermail/lustre-discuss/2008-March/005277.html
[2] http://lists.lustre.org/pipermail/lustre-devel/2008-January/001504.html,
    and bug #14729
[3] http://lists.lustre.org/pipermail/lustre-devel/2008-January/001475.html

Thanks!
--
Kilian

PS: I also created bug #15197 to keep track of this.
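To make the request a bit more concrete, here is a minimal Python sketch of the first step any exporter would need: turning the text of a stats file into named counters. It assumes the usual "name count samples [unit] min max sum" line layout; the sample text, counter values, and the parse_stats() helper are made up for illustration and are not part of the Lustre SNMP module.

```python
import re

# Hypothetical sample in the layout commonly seen in Lustre stats files;
# the counter values are invented for illustration.
SAMPLE_STATS = """\
snapshot_time             1205330405.123456 secs.usecs
read_bytes                1200 samples [bytes] 4096 1048576 629145600
write_bytes               800 samples [bytes] 4096 1048576 419430400
open                      350 samples [regs]
close                     349 samples [regs]
"""

# name, sample count, unit, and an optional min/max/sum triple.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<count>\d+)\s+samples\s+\[(?P<unit>[^\]]+)\]"
    r"(?:\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<sum>\d+))?"
)


def parse_stats(text):
    """Return {counter name: {samples, unit[, min, max, sum]}}."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skips snapshot_time and any unrecognised line
        entry = {"samples": int(m["count"]), "unit": m["unit"]}
        if m["min"] is not None:
            entry.update(min=int(m["min"]), max=int(m["max"]), sum=int(m["sum"]))
        counters[m["name"]] = entry
    return counters


if __name__ == "__main__":
    for name, entry in parse_stats(SAMPLE_STATS).items():
        print(name, entry)
```

Because the parser keys on the line layout rather than on a fixed list of counter names, a counter added to the file later would show up without any change to the code, which is the property asked for above.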
patrice.lucas at cea.fr
2008-Mar-12 14:00 UTC
[Lustre-devel] Feature request: expand SNMP scope
Hi,

> After a discussion started on lustre-discuss@ [1], I'd like to join
> other users [2] to make an official feature request about the Lustre
> SNMP module.
>
> I believe it could be extremely useful for Lustre systems administrators
> to get more than just the number of free space and available objects
> from the SNMP module. For instance, it could be interesting to get the
> following live stats through SNMP:
> on clients: /proc/fs/lustre/llite/*/stats
> on OSSes: /proc/fs/lustre/obdfilter/*/stats
> on MDSes: /proc/fs/lustre/mds/*/stats on MDSes.

Kilian, as you noticed from my previous mail and patch, I definitely
agree with you.

> But it would be especially interesting to not limit the SNMPable values
> to just a subset of what's available in /proc/fs/lustre. Since it looks
> like some work has begun to rework the Lustre /proc structure [3],
> maybe it would be the right opportunity to incorporate SNMP more
> closely into the new UI. The idea being to translate everything
> available in /proc into SNMP variables, so that future variables could
> be exported too, without having to explicitly add them to the SNMP
> code.
>
> I have little idea on how easily this can be achieved, but that would be
> an excellent foundation stone for next-to-come Lustre monitoring
> systems.

In the patch for bug #14729, I just added a new external access from the
SNMP agent to a /proc entry. I created this patch as an example of what
could easily be done. The goal was to start a discussion around the need
to improve access to monitoring data. The patch was accepted by the
Lustre team, but without discussion. This method is not integrated into
the inner Lustre code. If people change /proc entries, the SNMP agent
code must clearly be rewritten. I agree with you when you emphasize the
need to link the SNMP code to the rest of the Lustre development.

From a more integrated point of view, do you think it could be a good
idea to use Lustre itself to deliver monitoring data? Lustre is a
parallel filesystem. Data delivered by Lustre can be accessed by remote
clients. Instead of using "/proc", can Lustre benefit from its
capabilities as a distributed filesystem to deliver monitoring data? By
doing that, we could lose the advantage SNMP has of interfacing with the
many common SNMP network monitoring tools.

> [1]http://lists.lustre.org/pipermail/lustre-discuss/2008-March/005277.html
> [2]http://lists.lustre.org/pipermail/lustre-devel/2008-January/001504.html,
> and bug #14729
> [3]http://lists.lustre.org/pipermail/lustre-devel/2008-January/001475.html
>
> Thanks!
> --
> Kilian
>
> PS: I also created bug #15197 to keep track of this.

Thanks,
Patrice LUCAS
Bonjour Patrice,

On Wednesday 12 March 2008 07:00:05 am patrice.lucas at cea.fr wrote:
> This method is not
> integrated to the inner Lustre code. If people change /proc entries,
> the snmp agent code must clearly be rewrite. I agree with you when
> you emphasize the need to link the snmp code to the rest of the
> Lustre development.

Yes, that's what Brian first pointed out, and I think that's really the
cornerstone here. Manually editing the SNMP code and the corresponding
MIB files each time a new metric is added, removed or renamed will
rapidly become a nightmare.

> From a more integrated point of view, do you think it could be a
> good idea to benefit from Lustre itself to deliver monitoring data ?
> Lustre is a parallel filesystem. Data delivered by Lustre can be
> accessed by remote client. Instead of using "/proc", can Lustre
> benefits from its capability of distributed filesystem to deliver
> monitoring data ? By doing that, we could lose the advantage of snmp
> to interface with many available common snmp network monitoring
> tools.

Well, yes, actually, that sounds like a very reasonable approach too.
The main advantages of SNMP, from my standpoint, are the following:

1. It's a network protocol, so the monitored system doesn't have to be
   the same as the monitoring one. This allows remote collection of
   metrics, aggregation, and central administration.

2. It's an industry standard (even if vendors sometimes tend to have a
   proprietary interpretation of what is a 'standard'), so it can be
   used across a large variety of monitoring systems. Interoperability
   is always a good thing.

But only point 1 is really required to allow easier Lustre monitoring.
If all the lnet/client/oss/mds data could be accessed from clients,
that would be enough. One specific client (potentially patchless) could
be dedicated to monitoring, with almost the same advantages as an SNMP
host.

That looks like the OFED approach: SNMP is not a priority for
OpenFabrics, since the IB counters from all over the fabric can be
gathered with a single perfquery, from a simple IB node.

And this may also be easier to implement than mapping SNMP exports to
the Lustre stats files.

Cheers,
--
Kilian
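As a rough illustration of how the manual MIB editing could be avoided, the Python sketch below assigns numeric sub-identifiers to /proc-style path components the first time they are seen, so that existing entries keep their numbers when new ones appear. Everything in it is an assumption made for the example: the OidRegistry class, the base OID (an arbitrary placeholder, not Lustre's real enterprise number), and the "fs0"/"ost0" path names. A real agent would also have to persist the assignments across restarts.

```python
class OidRegistry:
    """Assigns numeric sub-identifiers to path components on first sight."""

    def __init__(self, base_oid):
        self.base_oid = tuple(base_oid)
        # parent path (tuple of names) -> {component name: sub-identifier}
        self._children = {}

    def oid_for(self, path):
        """Return a dotted OID string for a '/'-separated relative path."""
        parts = tuple(p for p in path.split("/") if p)
        oid = list(self.base_oid)
        for depth, name in enumerate(parts):
            siblings = self._children.setdefault(parts[:depth], {})
            if name not in siblings:
                # New entry: take the next free number under this parent,
                # leaving previously assigned numbers untouched.
                siblings[name] = len(siblings) + 1
            oid.append(siblings[name])
        return ".".join(str(n) for n in oid)


if __name__ == "__main__":
    # Placeholder base OID, for illustration only.
    registry = OidRegistry((1, 3, 6, 1, 4, 1, 99999))
    for path in ("llite/fs0/stats",
                 "obdfilter/ost0/stats",
                 "llite/fs0/max_cached_mb"):
        print(path, "->", registry.oid_for(path))
```

The point of assigning numbers on first sight, rather than by sorted position, is that adding or renaming one entry does not renumber its neighbours.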
On 3/12/08 1:51 PM, "Kilian CAVALOTTI" <kilian at stanford.edu> wrote:

> Bonjour Patrice,
>
> On Wednesday 12 March 2008 07:00:05 am patrice.lucas at cea.fr wrote:
>> This method is not
>> integrated to the inner Lustre code. If people change /proc entries,
>> the snmp agent code must clearly be rewrite. I agree with you when
>> you emphasize the need to link the snmp code to the rest of the
>> Lustre development.
>
> Yes, that's what Brian first pointed out, and I think that's really the
> cornerstone here. Manually editing the SNMP code and the corresponding
> MIB files each time a new metric is added, removed or renamed, will
> rapidly get to be a nightmare.
>
>> From a more integrated point of view, do you think it could be a
>> good idea to benefit from Lustre itself to deliver monitoring data ?
>> Lustre is a parallel filesystem. Data delivered by Lustre can be
>> accessed by remote client. Instead of using "/proc", can Lustre
>> benefits from its capability of distributed filesystem to deliver
>> monitoring data ? By doing that, we could lose the advantage of snmp
>> to interface with many available common snmp network monitoring
>> tools.

There are already some /proc files for Lustre that actually make an RPC
when read. We have often talked about greatly expanding this and, in
addition, letting servers report on client state as well.

So a monitoring node would poll the servers, and the servers would
export their own data, including data for each client that is connected
to the server.

Generating SNMP info from this is then easy, and it would hook very
nicely into the various management tools too, and work on non-IP
networked computers (if there are any left).

- Peter -

> Well, yes, actually, that sounds like a very reasonnable approach too.
> The main advantages for SNMP, from my standpoint are the following:
>
> 1. It's a network protocol, so the monitored system doesn't have to be
>    the same as the monitoring one. This allows remote collection of
>    metrics, aggregation, and central administration.
>
> 2. It's an industry standard (even if vendors sometimes tend to have a
>    proprietary interpretation of what is a 'standard'), so it can be
>    used across a large variety of monitoring systems. Interoperability
>    is always a good thing
>
> But only point 1. is really required to allow easier Lustre monitoring.
> If all the lnet/client/oss/mds data could be accessed from clients,
> that would be enough. One specific client (potentially patchless) could
> be dedicated for monitoring with almost the same advantage as a SNMP
> host.
>
> That looks like the OFED approach: SNMP is not a priority for
> OpenFabrics, since the IB counters from all over the fabric can be
> gathered with a single perfquery, from a simple IB node.
>
> And this may also be easier to implement than mapping SNMP exports to
> the Lustre stats files.
>
> Cheers,
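The polling model described in the reply above (a monitoring node collects reports from the servers, and each server includes data for its connected clients) could be sketched in Python roughly as follows. The report layout, the server and client names, and the merge_reports() helper are all hypothetical; how a server would actually export this data is not specified in this thread.

```python
from collections import defaultdict


def merge_reports(reports):
    """Combine {server: {client: {counter: value}}} into per-client totals."""
    totals = defaultdict(lambda: defaultdict(int))
    for per_client in reports.values():
        for client, counters in per_client.items():
            for name, value in counters.items():
                totals[client][name] += value
    # Convert nested defaultdicts to plain dicts for easier consumption.
    return {client: dict(counters) for client, counters in totals.items()}


if __name__ == "__main__":
    # Invented reports, as a monitoring node might gather them from two servers.
    reports = {
        "oss1": {"client-a": {"read_bytes": 100, "write_bytes": 40},
                 "client-b": {"read_bytes": 10}},
        "oss2": {"client-a": {"read_bytes": 50}},
    }
    print(merge_reports(reports))
```

Once the data is in this per-client form on the monitoring node, feeding it to an SNMP agent or any other management tool becomes a presentation step rather than something each monitored node has to implement.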