David Mesler
2012-Dec-14 20:15 UTC
[Puppet Users] puppet agent periodically not running after 3.0 upgrade
I''ve recently upgraded from 2.6.9 to 3.0.1 and have noticed an oddity. Our puppet agents are configured with a runinterval of 900 and a splaylimit of 450. Since upgrading I''ve noticed that once or twice a day our puppet agents simply won''t run for about an hour or so. Has anyone else experienced anything like this? --david -- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/VvlxFFuXDYgJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
David Mesler
2012-Dec-19 00:00 UTC
[Puppet Users] Re: puppet agent periodically not running after 3.0 upgrade
I''ve noticed when I strace a puppet agent that has failed to run after
its
900 second runinterval, it''s blocking on a really long select call.
Like:
select(1, [], [], [], {1032, 350000} <unfinished ...>
When that''s done it finally re-reads pupet.conf and stars a catalog
run. I
have no idea where that long select call comes from.
On Friday, December 14, 2012 3:15:25 PM UTC-5, David Mesler
wrote:>
> I''ve recently upgraded from 2.6.9 to 3.0.1 and have noticed an
oddity. Our
> puppet agents are configured with a runinterval of 900 and a splaylimit of
> 450. Since upgrading I''ve noticed that once or twice a day our
puppet
> agents simply won''t run for about an hour or so. Has anyone else
> experienced anything like this?
>
> --david
>
--
You received this message because you are subscribed to the Google Groups
"Puppet Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/puppet-users/-/1JiAUgNJFmQJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to
puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/puppet-users?hl=en.
David Mesler
2012-Dec-20 21:09 UTC
[Puppet Users] Re: puppet agent periodically not running after 3.0 upgrade
I’ve added some debugging messages to run_event_loop and figured out what was going on. I’m going to reference line numbers in https://github.com/puppetlabs/puppet/blob/master/lib/puppet/daemon.rb to make it easier on me to explain. I found that occasionally the “if” statement on line 180 was failing because the value of “now” was one second behind “next_reparse”. I believe this is because line 168 uses “to_i” when it should use “ceil”. I’m deploying a patched version of puppet with this change to my servers now. But I think a bigger issue is line 175 where next_event is set to the current time plus one hour. That’s a pretty arbitrary and unpleasant value. Why not set next_event to the lower vale of :runinterval or :filetimeout? The way it is now, if line 180 fails, you’re stuck with an hour long wait even though your :runinterval may be far less. Also, line 197 is backwards. It should be “next_agent_run += new_run_interval – agent_run_interval”. On Tuesday, December 18, 2012 7:00:42 PM UTC-5, David Mesler wrote:> > I''ve noticed when I strace a puppet agent that has failed to run after its > 900 second runinterval, it''s blocking on a really long select call. Like: > select(1, [], [], [], {1032, 350000} <unfinished ...> > > When that''s done it finally re-reads pupet.conf and stars a catalog run. I > have no idea where that long select call comes from. > > On Friday, December 14, 2012 3:15:25 PM UTC-5, David Mesler wrote: >> >> I''ve recently upgraded from 2.6.9 to 3.0.1 and have noticed an oddity. >> Our puppet agents are configured with a runinterval of 900 and a splaylimit >> of 450. Since upgrading I''ve noticed that once or twice a day our puppet >> agents simply won''t run for about an hour or so. Has anyone else >> experienced anything like this? >> >> --david >> >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/dBF1U0izljQJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
David Mesler
2012-Dec-21 16:48 UTC
[Puppet Users] Re: puppet agent periodically not running after 3.0 upgrade
My ceil theory didn''t pan out. But here''s a working fix:
--- puppet-3.0.1.orig/lib/puppet/daemon.rb 2012-10-20 00:57:50.000000000
-0400
+++ puppet-3.0.1/lib/puppet/daemon.rb 2012-12-21 08:27:17.000000000 -0500
@@ -164,6 +164,8 @@
next_reparse = 0
reparse_interval = Puppet[:filetimeout]
+ next_event = 0
+
loop do
now = Time.now.to_i
@@ -172,7 +174,9 @@
# the math messes up or something; it should be inexpensive enough to
# wake once an hour, then go back to sleep after doing nothing, if
# someone only wants listen mode.
- next_event = now + 60 * 60
+ if now >= next_event || next_event == 0
+ next_event = now + 60 * 60
+ end
# Handle reparsing of configuration files, if desired and required.
# `reparse` will just check if the action is required, and would be
@@ -194,7 +198,7 @@
# next time it is scheduled to run, just in case. In the event that
# we made no change the result will be a zero second adjustment.
new_run_interval = [Puppet[:runinterval], 0].max
- next_agent_run += agent_run_interval - new_run_interval
+ next_agent_run += new_run_interval - agent_run_interval
agent_run_interval = new_run_interval
end
On Thursday, December 20, 2012 4:09:09 PM UTC-5, David Mesler
wrote:>
> I’ve added some debugging messages to run_event_loop and figured out what
> was going on. I’m going to reference line numbers in
> https://github.com/puppetlabs/puppet/blob/master/lib/puppet/daemon.rb to
> make it easier on me to explain. I found that occasionally the “if”
> statement on line 180 was failing because the value of “now” was one second
> behind “next_reparse”. I believe this is because line 168 uses “to_i” when
> it should use “ceil”. I’m deploying a patched version of puppet with this
> change to my servers now.
>
> But I think a bigger issue is line 175 where next_event is set to the
> current time plus one hour. That’s a pretty arbitrary and unpleasant value.
> Why not set next_event to the lower vale of :runinterval or :filetimeout?
> The way it is now, if line 180 fails, you’re stuck with an hour long wait
> even though your :runinterval may be far less.
>
> Also, line 197 is backwards. It should be “next_agent_run +=
> new_run_interval – agent_run_interval”.
>
> On Tuesday, December 18, 2012 7:00:42 PM UTC-5, David Mesler wrote:
>>
>> I''ve noticed when I strace a puppet agent that has failed to
run after
>> its 900 second runinterval, it''s blocking on a really long
select call.
>> Like:
>> select(1, [], [], [], {1032, 350000} <unfinished ...>
>>
>> When that''s done it finally re-reads pupet.conf and stars a
catalog run.
>> I have no idea where that long select call comes from.
>>
>> On Friday, December 14, 2012 3:15:25 PM UTC-5, David Mesler wrote:
>>>
>>> I''ve recently upgraded from 2.6.9 to 3.0.1 and have
noticed an oddity.
>>> Our puppet agents are configured with a runinterval of 900 and a
splaylimit
>>> of 450. Since upgrading I''ve noticed that once or twice a
day our puppet
>>> agents simply won''t run for about an hour or so. Has
anyone else
>>> experienced anything like this?
>>>
>>> --david
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"Puppet Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/puppet-users/-/mcnppG9SNo4J.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to
puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/puppet-users?hl=en.