I have a small number of boxes in different locations, and currently have a fairly crude cron job running on each, which pings one or more of the other boxes. If the ping fails, it emails me to say the other box might be down. It then emails me again the next time the other box appears to be up.

Of course, this can't distinguish between the remote box really being down and there being a network problem somewhere between the local and remote boxes.

I've been mulling over the idea of a more sophisticated scheme, where a number of boxes send each other messages, indicating not only their presence, but also which other boxes they believe to be up. Then if a box goes down, the other boxes all see it has gone and agree that it really is down. However, if there is instead a network outage or routing flap, so that a box is reachable from some places but not all, it might be possible to distinguish this case.

So my question is: does anyone know of an existing tool that does this sort of thing?

Cheers
Tony
-- 
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org
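The peer-voting idea in the post above could be sketched roughly as follows. This is only an illustration in Python, not the behaviour of any tool mentioned in this thread. The function name `classify`, the `views` structure and the result labels are all made up for the example, and it assumes each box has already collected the other boxes' reports by some means.

```python
from typing import Dict


def classify(target: str, views: Dict[str, Dict[str, bool]]) -> str:
    """Combine peer reports about one box into a single verdict.

    views[observer][box] is True if `observer` could reach `box`.
    Boxes that sent no report at all simply have no entry in `views`.
    """
    # Ignore the target's opinion of itself; use only peers that reported on it.
    reports = [
        seen[target]
        for observer, seen in views.items()
        if observer != target and target in seen
    ]
    if not reports:
        return "unknown"   # nobody has said anything about this box
    if all(reports):
        return "up"        # every reporting peer can reach it
    if not any(reports):
        return "down"      # no reporting peer can reach it
    return "partial"       # reachable from some places but not all


# Hypothetical boxes "a", "b" and "c"; "c" has stopped reporting.
views = {
    "a": {"b": True, "c": False},
    "b": {"a": True, "c": False},
}
print(classify("c", views))   # down

# Same boxes, but "b" can still reach "c": more likely a network problem.
views["b"]["c"] = True
print(classify("c", views))   # partial
```

A real implementation would also need to age out stale reports and decide how many peers must agree before anyone is emailed; the sketch leaves both out.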
Tony Mountifield wrote:
> [...]
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

Tony,

Nagios, maybe. http://www.nagios.org/

Not familiar with it, but there has been a lot of talk about it on the list.

Bob...
Tony Mountifield wrote:
> [...]
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

It might be overkill for this case, but OpenNMS (http://www.opennms.org) has a concept of "path outage" to limit the notifications for things beyond a network link that is down. It can also maintain graphs of any values you can obtain via SNMP, such as bandwidth and CPU use.

-- 
Les Mikesell
lesmikesell at gmail.com
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On Behalf Of Tony Mountifield
Sent: Tuesday, May 29, 2007 3:24 AM
To: centos at centos.org
Subject: [CentOS] Remote system up/down monitoring tool?

[...]
So my question is: does anyone know of an existing tool that does this sort of thing?

Cheers
Tony
_______________________________________________

Check out Hobbit, which supports many platforms (son of Big Brother): http://sourceforge.net/projects/hobbit/

Frank M. Ramaekers Jr.
Systems Programmer; MCP, MCP+I, MCSE & RHCE
American Income Life Insurance Company
Phone: (254) 761-6649 Fax: (254) 741-5777
----------------------------------------
This message contains information which is privileged and confidential and is solely for the use of the intended recipient. If you are not the intended recipient, be aware that any review, disclosure, copying, distribution, or use of the contents of this message is strictly prohibited. If you have received this in error, please destroy it immediately and notify us at PrivacyAct at ailife.com.
On 5/29/07, Tony Mountifield <tony at softins.clara.co.uk> wrote:
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

Perhaps running SmokePing on multiple systems? http://oss.oetiker.ch/smokeping/

-- 
Dave K
Unix Systems & Network Administrator
Mount Laurel NJ
On May 29, 2007, at 4:24 PM, Tony Mountifield wrote:
> [...]
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

Nagios does this, although it can be a bit much to configure. What you're particularly looking for seems to be "dependency" support, i.e. if your gateway is down, you don't want to be notified that every server you reach through that gateway is also down.

A nice basic tutorial for Nagios I found is at: http://www2.maxsworld.org/howtos/nagios.html

It doesn't delve into dependencies much, but setting them up shouldn't be that difficult.

dex
----------
Mobile: +63 (917) 5357191, Office: +63 (2) 6312718
i4 Asia Incorporated - http://www.i4asiacorp.com/
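The dependency idea in the message above can be illustrated with a few lines of Python. This is not Nagios configuration or Nagios code, just a sketch of the suppression logic under simple assumptions: each host has at most one parent, and a parent with unknown status is treated as up. The names `hosts_to_report`, `status` and `parent`, and the host names, are hypothetical.

```python
from typing import Dict, List, Optional


def hosts_to_report(status: Dict[str, bool],
                    parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the down hosts that are worth a notification.

    status[host] is True if the host answered the check.
    parent[host] is the host it is reached through, or None.
    A down host is suppressed if anything on its path is also down.
    """
    def blocked(host: str) -> bool:
        seen = set()                 # guards against accidental loops
        p = parent.get(host)
        while p is not None and p not in seen:
            if not status.get(p, True):
                return True          # an upstream host is down
            seen.add(p)
            p = parent.get(p)
        return False

    return sorted(h for h, up in status.items() if not up and not blocked(h))


parent = {"gateway": None, "web": "gateway", "mail": "gateway", "dns": None}

# Gateway outage: only the gateway itself is reported.
print(hosts_to_report(
    {"gateway": False, "web": False, "mail": False, "dns": True}, parent))
# ['gateway']

# Gateway fine, one server behind it down: that server is reported.
print(hosts_to_report(
    {"gateway": True, "web": False, "mail": True, "dns": True}, parent))
# ['web']
```

The point of the design is that the hosts behind a failed gateway are not declared healthy; they are just left out of the notification until the path to them is back.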
Tony Mountifield
2007-May-30  07:35 UTC
[CentOS] Re: Remote system up/down monitoring tool? [SUMMARY]
In article <f3gnv9$k18$1 at softins.clara.co.uk>, Tony Mountifield <tony at softins.clara.co.uk> wrote:
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

Thanks for all the responses. To summarise: I had several recommendations for Nagios, some for Hobbit, and one each for OpenNMS and SmokePing.

Time to go and investigate them!

Cheers
Tony
-- 
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org