Jonathon Anderson
2010-Aug-02 11:07 UTC
[Puppet Users] wrong facts going into storeconfigs, 0.25+2.6
I'm re-posting this because I'm not sure that it got through the first time. If someone could at least echo back that this is reaching the list, I'd appreciate it. (I'm new to the list.)

Sometimes (with variable frequency) storeconfigs stores the wrong data in the fact_values table. This has the end result that exported resources, when collected, have invalid configuration.

The most recent example: the "hostname" fact for one of our nodes got, instead, the value that should have gone in the "processorcount" fact. This had the end result that the node's nagios configuration started trying to monitor a host "8" rather than "cn19", and ssh keys for cn19 were collected at other nodes as "8,8.example.com <keytext>" instead of "cn19,cn19.example.com <keytext>". The hostname fact is the only destination that I've noticed the corrupted data in, but the source has been swapfree/swapsize, processor[n], operatingsystem, operatingsystemrelease, kernelrelease, and others.

I realize that I don't have much of a "simple, repeatable, minimal" test case here, but I've been trying to figure it out for months to no avail. I had hoped that an upgrade to 2.6 would make this problem go away, but no: we've just now experienced it again. For the record, we've seen it since sometime in the 0.24.x branch (when we started using it).

It might have something to do with an appropriately high load on storeconfigs. I ran it for 2 days with nodes exporting data (but not collecting) to see if it would happen again, and I didn't notice any corruption. Then, today, I enabled collection (e.g., ssh_known_hosts) on all (~138) hosts, and soon after found a corrupt nagios configuration. (Then again, it might just be that it's more probable with more nodes doing the collection.)

I've never seen the actual facter command return one of these bits of misplaced data: the furthest back I've been able to trace it is to the fact_values table.
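[Editor's note: one way to audit the database for this kind of cross-contamination is to join the stored "hostname" facts back against the host records. The sketch below assumes the 0.25-era storeconfigs Rails schema (hosts, fact_names, fact_values tables with host_id/fact_name_id foreign keys) — verify the names against your actual database — and demonstrates the query on an in-memory SQLite mock rather than the real Postgres instance.]

```python
# Sketch: flag hosts whose stored "hostname" fact does not match the
# name Puppet registered for the host. Table/column names follow the
# assumed 0.25-era storeconfigs schema; demonstrated on a SQLite mock.
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_names (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_values (id INTEGER PRIMARY KEY, host_id INTEGER,
                          fact_name_id INTEGER, value TEXT);
INSERT INTO hosts VALUES (1, 'cn19.example.com'), (2, 'cn20.example.com');
INSERT INTO fact_names VALUES (1, 'hostname'), (2, 'processorcount');
-- cn19's hostname fact has been overwritten with a processorcount value
INSERT INTO fact_values VALUES (1, 1, 1, '8'),
                               (2, 2, 1, 'cn20'),
                               (3, 1, 2, '8');
""")

# A healthy host's registered name should start with its hostname fact.
suspects = c.execute("""
    SELECT h.name, fv.value
    FROM hosts h
    JOIN fact_values fv ON fv.host_id = h.id
    JOIN fact_names fn ON fn.id = fv.fact_name_id
    WHERE fn.name = 'hostname'
      AND h.name NOT LIKE fv.value || '%'
""").fetchall()
print(suspects)  # -> [('cn19.example.com', '8')]
```

The same SELECT should run unchanged against Postgres (both support `||` concatenation and LIKE); only the mock setup is SQLite-specific.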
We're using a single puppet master, with storeconfigs storing to a postgresql database on a different host from the puppet master host. Everything works in the majority of cases, but fails just often enough to make it really, really annoying.

Any help anyone can provide, including insight into where I might look to track down the cause even further, would be much appreciated. Thanks.

~jon

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Brice Figureau
2010-Aug-02 12:18 UTC
Re: [Puppet Users] wrong facts going into storeconfigs, 0.25+2.6
Hi,

On Mon, 2010-08-02 at 14:07 +0300, Jonathon Anderson wrote:
> I'm re-posting this because I'm not sure that it got through the first
> time. If someone could at least echo back that this is reaching the
> list, I'd appreciate it. (I'm new to the list.)

I don't know if your first message went through, but I can confirm this one did.

> Sometimes (with variable frequency) storeconfigs stores the wrong data
> in the fact_values table. This has the end result that exported
> resources, when collected, have invalid configuration.
>
> The most recent example: the "hostname" fact for one of our nodes got,
> instead, the value that should have gone in the "processorcount" fact.
> This had the end result that the node's nagios configuration started
> trying to monitor a host "8" rather than "cn19", and ssh keys for cn19
> were collected at other nodes as "8,8.example.com <keytext>" instead
> of "cn19,cn19.example.com <keytext>". The hostname fact is the only
> destination that I've noticed the corrupted data in, but the source
> has been swapfree/swapsize, processor[n], operatingsystem,
> operatingsystemrelease, kernelrelease, and others.
>
> I realize that I don't have much of a "simple, repeatable, minimal"
> test case here, but I've been trying to figure it out for months to no
> avail. I had hoped that an upgrade to 2.6 would make this problem go
> away, but no: we've just now experienced it again. For the record,
> we've seen it since sometime in the 0.24.x branch (when we started
> using it).

So that's an "old" issue, not something introduced in the brand-new 2.6.

> It might have something to do with an appropriately high load on
> storeconfigs. I ran it for 2 days with nodes exporting data (but not
> collecting) to see if it would happen again, and I didn't notice any
> corruption. Then, today, I enabled collection (e.g., ssh_known_hosts)
> on all (~138) hosts, and soon after found a corrupt nagios
> configuration. (Then again, it might just be that it's more probable
> with more nodes doing the collection.)

Which seems logical.

> I've never seen the actual facter command return one of these bits of
> misplaced data: the furthest back I've been able to trace it is to the
> fact_values table.
>
> We're using a single puppet master, with storeconfigs storing to a
> postgresql database on a different host from the puppet master host.
> Everything works in the majority of cases, but fails just often enough
> to make it really, really annoying.
>
> Any help anyone can provide, including insight into where I might look
> to track down the cause even further, would be much appreciated.
> Thanks.

So, the real question is where the issue comes from. As I see it, the facts the node sends to the puppetmaster are correct, otherwise the received catalog wouldn't apply correctly. So the issue is, to my understanding, a pure storeconfigs issue.

The first thing you should check is the version of ActiveRecord or the postgres library you are using. Try to upgrade those; maybe the issue has already been fixed (assuming the issue is not on the Puppet side).

Next, you should try to analyse where the issue comes from by having a look at the SQL queries ActiveRecord generates:

1) clean up the mess so that you start with a good database
2) activate the ActiveRecord log on your master (set rails_loglevel=debug and railslog=/path/to/rails.log)
3) let it run until you notice the issue
4) read the rails log to find the culprit SQL request

Maybe that could give you more information; at least we'll know what it tries to save.

Then, I'd add debug statements to the puppetmaster (check lib/puppet/rails/host.rb, especially the merge_facts method). By correlating this debug information with the query log, you might be able to notice a pattern, or at least find out whether the problem comes from an issue in the data Puppet has, or whether it is created in the AR layer.
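[Editor's note: concretely, step 2 above amounts to a puppet.conf fragment along the following lines. The rails_loglevel and railslog settings are the ones named in the reply; the [main] section placement and the log path are only examples — adjust for your version and layout.]

```ini
# puppet.conf on the puppetmaster -- enable ActiveRecord query logging.
# The railslog path is a placeholder; point it anywhere writable by the
# master process.
[main]
    storeconfigs   = true
    rails_loglevel = debug
    railslog       = /var/log/puppet/rails.log
```

With ~138 nodes checking in, a debug-level query log grows quickly, so it is worth reverting rails_loglevel once the culprit request has been captured.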
You should also file a bug report with all the information you'll find.

Hope that helps,
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!