Jens Braeuer

2011-Dec-06 10:49 UTC

[Puppet Users] MCollective discovery - we did not discover any nodes

Hi everyone,

I run MCollective 1.2.1 together with ActiveMQ 5.5 on Scientific Linux 6.1 on
Amazon EC2. Overall it works like a charm, but sometimes (roughly 1 run in 30)
discovery fails. The exit code of mco is still 0, which is a problem for me
because I use MCollective, among other things, to trigger deployments from
Jenkins.

I would like to ask for feedback on the following ideas that could fix this
problem.

a) Increase discovery timeout
mco offers an option to tweak the discovery timeout. What is your experience
with increasing this value? When running "mco ping", I see ping times of
130 ms, so 2 seconds (the default) should be enough, right?
Is there a way to configure it globally?

b) Mco should exit != 0 when no nodes are found
I would like to see a "--batch" or "--non-interactive" mode in which mco exits
with a code different from 0 when no nodes are found.

c) Add "expected count" to mco command
I think there are some situations where one knows the number of MCollective
nodes. So what about adding an "--expect-number" option to mco, where I can
give either a count or a range of expected nodes?

d) Is this normal at all?
I have no experience with MCollective in a datacenter, so: is this problem
cloud/EC2 related, or does it happen in non-cloud setups too? How could I debug
what makes the discovery fail?
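
Until mco itself offers something like b) or c), a small wrapper can approximate both from the outside. The following is a minimal sketch in Python, not an existing mco feature. It assumes that "mco find" prints one node identity per line; the helper names (parse_expected, parse_nodes, exit_code, run_mco_find) are made up for illustration.

```python
import subprocess


def parse_expected(spec):
    """Parse "5" or "3-10" into an inclusive (low, high) tuple."""
    low, sep, high = spec.partition("-")
    return (int(low), int(high) if sep else int(low))


def parse_nodes(output):
    """Return one entry per non-empty output line (assumed: one node per line)."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def exit_code(nodes, low=1, high=None):
    """Return 0 if low <= len(nodes) <= high (no upper bound if high is None), else 1."""
    count = len(nodes)
    if count < low or (high is not None and count > high):
        return 1
    return 0


def run_mco_find():
    """Run discovery via "mco find" and return the discovered node identities."""
    result = subprocess.run(["mco", "find"], capture_output=True, text=True)
    return parse_nodes(result.stdout)
```

In a Jenkins step, the build could be gated on something like sys.exit(exit_code(run_mco_find(), *parse_expected("3-10"))) before the real mco command runs. With the defaults (low=1, no upper bound) the wrapper only fails when no nodes are found at all, which corresponds to idea b).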

I really appreciate your feedback. You can find me as jbraeuer on #puppet.
Cheers,
Jens

-- 
You received this message because you are subscribed to the Google Groups
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to
puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/puppet-users?hl=en.

R.I.Pienaar

2011-Dec-07 09:56 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

hello,

----- Original Message -----
> Hi everyone,
> 
> I run MCollective 1.2.1 together with ActiveMQ 5.5 under Scientific
> Linux 6.1 on Amazon EC2. Overall it works like a
> charm, but sometimes (eg. 1/30) discovery fails. Still the exit-code
> of mco will be 0, which is a problem for me as I
> use MCollective e.g. to trigger deployments from Jenkins.

Does mco ping exhibit the same behavior if you run it often? Does it tend to
happen more after a period of the collective being idle, or just really
randomly?

> 
> I would like to ask for some feedback on the following ideas, that
> could fix this problem.
> 
> a) Increase discovery timeout
> mco offers a option to tweak the discovery timeout. What is your
> experience with increasing this value? When running
> "mco ping", I see ping times of 130ms, so 2 seconds (the default
> should be enough), or?
> Is there a way to configure is global?

It's not global; it's a client setting. With those ping times the default
should be sufficient. Discovery does exactly what mco ping does, so mco ping
is a good way to diagnose this.

It might be worth enabling verbose GC logging on your ActiveMQ. It's possible
that during these times it just did a big full garbage collection, which would
block it, and that might indicate some tuning is needed.

> 
> b) Mco should exit != 0 when no nodes are found
> I would like to see a "--batch" or "--non-interactive"
> mode, where mco has a exit code different from 0, when no nodes
> are found.

OK, you can file tickets for this.

> c) Add "expected count" to mco command
> I thing there are some situation, where one knows the number of
> MCollective nodes. So what about adding a options
> "--expect-number" option to mco, where I can either give a count
> or range of expected nodes.

MCollective 1.3.x, which will soon become 2.0, has a new mode of communication
where you can provide it a host list etc. and it will bypass discovery. This is
meant to be used for things like deployers, where you know which machines you
wish to affect, and it will probably help.

> 
> d) Is this normal at all?
> I have no experience with MCollective in a datacenter, so: Is this
> problem cloud/EC2 related or does it happen in
> non-cloud setups too? How could I debug what makes the discovery
> fail?

It shouldn't happen, but I've seen it happen when:

 - ActiveMQ is doing long full garbage collections
 - the network is interrupted after long periods of idle time
 - ActiveMQ is idle for a long time and was swapped out etc.
 - you have very busy machines that do not respond at all (unlikely in your
   case)

There are probably other reasons too, but these are the rough likely causes.
Amazon has a pretty aggressive idle connection timeout though, so you might
enable registration just to keep the STOMP connections from being idle too long.

Jens Braeuer

2011-Dec-08 14:50 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

Hi everyone,

Thanks, R.I.P., for the pointers. I already planned to add more monitoring to
ActiveMQ, so I'll take this as a chance. I don't think Amazon network settings
are the root cause in my case, as I have the registration agent enabled on all
machines. This should keep the connection busy.

I filed a bug regarding the exit code. Is there any chance the fix will make it
into 1.2.2?

Thanks,
Jens

-- 
Jens Bräuer
Senior Systems Engineer
Dipl. Inf.
NumberFour AG
Schönhauser Allee 8
10119 Berlin
Germany
Mobile: +49 175 221 88 34
Phone: +49 30 40505411
Fax: +49 30 40505410
jens@numberfour.eu
 
numberfour.eu
facebook.com/NumberFour
twitter.com/numberfourag

R.I.Pienaar

2011-Dec-08 14:53 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

----- Original Message -----
> Hi everyone,
> 
> thank R.I.P. for the pointers. I already planed to add more
> monitoring to ActiveMQ, so I'll take this as a chance. I
> dont think Amazon network settings are the root cause in my case, as
> I have registration agent enabled on all machines.
> This should keep the connection busy.
> 
> I filed a bug regarding the exit-code. Is there any chance the fix
> will make it in 1.2.2?

No, behavior changes can't go into the current production branch, so this will
go into 1.3.x, which should become 2.0.0 very soon, maybe in January.


Yaakov Nemoy

2011-Dec-08 22:22 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

If you're looking for monitoring tips, one thing we do is run an 'mco find' on
our entire network. A Nagios page goes out if the count ('wc -l') drops below a
threshold.
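
A minimal sketch of such a check in Python (not the script Yaakov's team runs). It assumes that "mco find" prints one node per line, which is what piping it into "wc -l" relies on, and it uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the split into a warning and a critical threshold are illustrative.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_nodes(output):
    """Count non-empty lines, roughly what piping "mco find" into "wc -l" does."""
    return sum(1 for line in output.splitlines() if line.strip())


def nagios_status(count, warn, crit):
    """Map a node count to (exit_code, message); expects crit <= warn."""
    if count < crit:
        return CRITICAL, f"CRITICAL - {count} nodes discovered (fewer than {crit})"
    if count < warn:
        return WARNING, f"WARNING - {count} nodes discovered (fewer than {warn})"
    return OK, f"OK - {count} nodes discovered"


def check(warn, crit):
    """Run "mco find" and return (exit_code, message) for Nagios."""
    try:
        result = subprocess.run(
            ["mco", "find"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN - could not run mco find: {exc}"
    return nagios_status(count_nodes(result.stdout), warn, crit)
```

A Nagios command would call check() with thresholds that fit the size of the collective, print the message and exit with the returned code.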

-Yaakov

Jens Braeuer

2011-Dec-09 09:08 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

Hi Yaakov,

Would you be so kind as to share your Nagios script with us? That would be cool :-)

Jens

R.I.Pienaar

2011-Dec-09 09:29 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

----- Original Message -----
> Hi Yaakov,
> 
> would you be so kind and share your Nagios script with us? Would be
> cool :-)

I have some ActiveMQ monitoring stuff at
https://github.com/ripienaar/monitoring-scripts/tree/master/activemq
It's not quite what you're after, but it will help.

You can also use the registration system and the file-based monitor to
notify you of machines that are not functional:

http://projects.puppetlabs.com/projects/mcollective-plugins/wiki/AgentRegistrationMonitor

And indeed, the check Yaakov proposes would also be useful.

Yaakov Nemoy

2011-Dec-09 21:28 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

Absolutely. Attached.

Jayapandian Ponraj

2012-Nov-26 09:10 UTC

Re: [Puppet Users] MCollective discovery - we did not discover any nodes

> > Overall it works like a
> > charm, but sometimes (eg. 1/30) discovery fails.
>
> does mco ping exhibit the same behavior if you run it often?  Does it
> tend to happen more after a period of the collective being idle or just
> really randomly?

I am facing the same problem: discovery fails about 1 time in 4. I have a
network of 2 ActiveMQ brokers, and the client and nodes fail over from one to
the other. mco ping also exhibits the problem, mostly after being idle and when
one of the broker boxes is heavily loaded. I have nodes with ping times of
4000 or 5000 ms and nodes with ping times of 60 or 70 ms, but when I run
mco ping with the default timeout I get no nodes discovered. At least it should
discover the low-ping boxes, right?

*I have configured failover as described in the documentation, yet I still see
these discovery problems, especially when one of the brokers is loaded. Can
MCollective be tuned to use the less loaded broker?*

> > 
> > I would like to ask for some feedback on the following ideas, that
> > could fix this problem.
> > 
> > a) Increase discovery timeout
> > mco offers a option to tweak the discovery timeout. What is your
> > experience with increasing this value? When running
> > "mco ping", I see ping times of 130ms, so 2 seconds (the
> > default should be enough), or?
> > Is there a way to configure is global?
>
> it's not global - its a client setting but with those ping times it
> should be sufficient.  Discovery does exactly what mco ping does so its
> a good way to diagnose

According to this documentation, it is a server config:
http://docs.puppetlabs.com/mcollective/reference/basic/configuration.html

When I add the command-line option --dt 5, it works fine. I don't know why the
low-ping nodes are not discovered within the default discovery timeout. I found
this option:

export MCOLLECTIVE_EXTRA_OPTS="--dt 5 --timeout 3 --config /home/you/mcollective.cfg"

*Is there a way to configure the default discovery timeout for the client in
client.cfg?*

Can anyone clarify this whole discovery timeout thing for me?
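
Since passing --dt 5 makes discovery succeed here, one client-side workaround is to retry discovery with progressively longer timeouts instead of raising the timeout for every call. A minimal sketch in Python, assuming that "mco find" accepts --dt in the same way as in the MCOLLECTIVE_EXTRA_OPTS line above and prints one node per line. The function names and the timeout sequence 2, 5, 10 are illustrative, not MCollective defaults.

```python
import subprocess


def run_mco_find(discovery_timeout):
    """Run "mco find --dt N" and return the node identities, one per output line."""
    result = subprocess.run(
        ["mco", "find", "--dt", str(discovery_timeout)],
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def discover_with_retry(timeouts=(2, 5, 10), find=run_mco_find):
    """Try each discovery timeout in turn.

    Returns (timeout_used, nodes) for the first attempt that finds any node,
    or (None, []) if every attempt comes back empty.
    """
    for seconds in timeouts:
        nodes = find(seconds)
        if nodes:
            return seconds, nodes
    return None, []
```

The find parameter exists only so the retry logic can be exercised without a running collective. If the longer timeouts are regularly the ones that succeed, that points to a slow broker rather than slow nodes.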
 
 
> Might be worth enabling verbose gc logging on your activemq its possible
> that during these times it just did a big full garbage collection which
> would block it and that might indicate some tuning is needed
>
> > 
> > b) Mco should exit != 0 when no nodes are found
> > I would like to see a "--batch" or "--non-interactive" mode, where
> > mco has a exit code different from 0, when no nodes
> > are found.
>
> ok, you can file tickets for this

I haven't set up any MCollective monitoring. Is checking that mco ping works
fine enough?
 
> > c) Add "expected count" to mco command
> > I thing there are some situation, where one knows the number of
> > MCollective nodes. So what about adding a options
> > "--expect-number" option to mco, where I can either give a
> > count or range of expected nodes.
>
> mcollective 1.3.x which will soon become 2.0 have a new mode of communications
> where you can provide it a host list etc and it will bypass discovery, this
> is ment to be used for things like deployers where you know what machines you
> wish to affect and it will probably help

I am using the direct addressing stuff and it works just fine.

>  - activemq doing long full garbage collections
>  - network is interrupted after long periods of idle time
>  - activemq is idle for a long time and was swapped etc
>  - you have very busy machines that do not respond at all - unlikely in
> your case
>
> there are probably other reasons too but these are the rough likely causes.
> Amazon has a pretty aggressive idle connection timeout though so you might
> enable registration just to keep the stomp connections from being idle too
> long

Mine is a data centre setup. This is my first experience with message brokers,
so can I get some ideas for fine-tuning ActiveMQ, or will using RabbitMQ solve
any of the problems?
 
 


Jayapandian Ponraj

2012-Dec-11 21:24 UTC

[Puppet Users] Re: MCollective discovery - we did not discover any nodes

Any reply to my questions would be hugely appreciated.
