Jon Ward
2013-Feb-20 16:03 UTC
[Puppet Users] hiera-gpg causing big increase in catalog compile time
Hi,

I've been using hiera for a little while and have started using the hiera-gpg back-end for passwords etc.

The problem is, I am seeing a massive increase in catalog compile time when using hiera-gpg. On one particular node where there is quite a lot going on, catalog compile time jumped from <5s to >90s. My typical compile times have gone from around 2s to around 30s.

I have the gpg backend listed underneath yaml in my hiera.yaml file, so from what I understand my .gpg config files should only be interrogated if no answer is found in the .yaml files. I only have half a dozen or so vars stored in the .gpg files.

By simply removing the gpg backend from hiera.yaml the compile times go back down to normal.

I'm using Puppet 3.1.0 installed from apt.puppetlabs.com on Debian Squeeze & hiera-gpg 1.1.0 installed from Rubygems.

Would appreciate any tips for debugging this problem, thanks in advance.

Jon

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users+unsubscribe@googlegroups.com. To post to this group, send email to puppet-users@googlegroups.com. Visit this group at http://groups.google.com/group/puppet-users?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Aaron Mills
2013-May-08 16:38 UTC
[Puppet Users] Re: hiera-gpg causing big increase in catalog compile time
Thought I'd drag this topic back to life rather than open a new one for the same issue. I'm seeing pretty much the exact same behavior on my catalog compile times. With puppet 3.1.0 and hiera-gpg 1.1.0 I'm seeing compile times usually in the 60-90-second range. This is causing a lot of agent runs to time out or get an "end of file" error. Consolidating GPG-encrypted data into a single file doesn't seem to have any bearing on compile times.

Running the master in debug mode doesn't seem to surface any obvious issues. Has anyone made any headway on this issue?

-Aaron
jcbollinger
2013-May-08 19:44 UTC
[Puppet Users] Re: hiera-gpg causing big increase in catalog compile time
I assumed the first time around that the increased compile times were a function of the relatively large computational cost of cryptography. Nevertheless, there might be some inefficiencies in the way hiera-gpg works internally, in the way it works together with the hiera framework, and in the behavior your manifests provoke from it.

Hiera-gpg decrypts each target file it consults, in its entirety, whenever it is queried for a key. How expensive that is depends on the number and size of the files, on the position in the hierarchy where target keys are typically found, and on which hiera access function you actually use. If you use many parameterized classes, then Puppet 3's automatic class parameter binding will tend to aggravate that problem, especially if you typically allow class parameters to take default values or DSL-specified values (so that hiera has lots of complete misses on the parameter names).

There are several things you could do to try to mitigate this, among them:

- Use the :gpg: back-end only for data that really need to be encrypted. Use the plain :yaml: backend for everything else, and give it higher priority.
- Minimize use of parameterized classes, or else ensure that all class parameters are recorded in your hiera data files, even if they take default values.
- If you use hiera for data other than class parameters, then avoid looking up the same key multiple times. Instead, read the data into some class's variables, and have everyone else get the data from those variables.
- Avoid the need for hiera_hash() and hiera_array(), each of which will decrypt every one of the :gpg: backend's files on every call. (And do be sure you know the difference between using those functions and using the plain hiera() function to retrieve hashes and arrays.)
- If you're willing to get a bit intrusive, then restructure your data and parameters so that fewer overall lookups are required. For instance, combine multiple individual values into hashes, so that you can perform a single lookup for the hash instead of a separate lookup for each component.

Good luck,

John
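The cost model John describes can be sketched as a toy simulation. This is not hiera-gpg's code (which is Ruby); it is a minimal Python model under the assumptions stated in the thread: every level consulted costs one full-file decryption, a priority lookup like hiera() stops at the first level that has the key, and a merging lookup like hiera_array() consults every level. The class name `GpgBackendModel` and its methods are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpgBackendModel:
    """Toy model of a backend that decrypts a whole file on every query."""

    # Hierarchy levels, highest priority first; each one maps key -> value.
    levels: list
    decrypt_calls: int = 0

    def _decrypt(self, index):
        # Stand-in for a full-file GPG decryption (assumed to be the costly step).
        self.decrypt_calls += 1
        return self.levels[index]

    def lookup(self, key):
        """Priority lookup: stop at the first level that holds the key."""
        for index in range(len(self.levels)):
            data = self._decrypt(index)
            if key in data:
                return data[key]
        return None  # complete miss: every level was decrypted

    def lookup_merged(self, key):
        """Merging lookup: always consult every level."""
        found = []
        for index in range(len(self.levels)):
            data = self._decrypt(index)
            if key in data:
                found.append(data[key])
        return found


backend = GpgBackendModel(levels=[{"db_password": "s3cret"}, {}, {"ntp": "pool"}])
backend.lookup("db_password")    # hit at the top level: 1 decryption
backend.lookup("apache::port")   # complete miss: 3 decryptions
backend.lookup_merged("ntp")     # merging lookup: 3 decryptions
print(backend.decrypt_calls)     # 7
```

In this model a class parameter that is left at its default behaves like the `apache::port` lookup: it is never found, so it pays for every file in the hierarchy. That is why the number of complete misses, rather than the number of encrypted values, dominates the total.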
Aaron Mills
2013-May-09 15:18 UTC
[Puppet Users] Re: hiera-gpg causing big increase in catalog compile time
Hmm... it seems like a pretty basic use case is an accompanying gpg file for each level of a hierarchy, just to store things like passwords or sensitive data. Minimizing the use of things like hiera's 3.x data bindings to gain speed in hiera-gpg lookups feels like throwing the baby out with the bathwater.

I wonder how difficult (read: secure) it would be to cache the data across calls. An md5sum could be used to determine whether the contents of a .gpg file have changed since the last lookup. Instead of decrypting each file for every call, hiera-gpg can do something like:

- Calculate an md5sum of each .gpg file, and store it together with the decrypted data from that file in memory, redis, or wherever.
- When asked for a variable, do an md5sum of the .gpg file and, if the values are the same, return the data from memory.
- If the hash values don't match, reload the data from the .gpg file.

Seems like this would be slightly faster than having to fully decrypt the contents of each file for every parameter lookup.
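The three steps above can be sketched roughly as follows. This is an illustration in Python, not a patch for hiera-gpg: `ChecksumCache` and the `decrypt` callable are hypothetical names, and the sketch assumes an in-memory cache only. Note that it keeps decrypted data in process memory between calls, which is exactly the security trade-off raised above.

```python
import hashlib
from pathlib import Path


class ChecksumCache:
    """Cache decrypted file contents, revalidated by an MD5 of the ciphertext."""

    def __init__(self, decrypt):
        # decrypt: hypothetical callable taking the encrypted bytes and
        # returning the parsed data (a dict). The real backend would call GPG.
        self._decrypt = decrypt
        self._entries = {}  # path -> (digest of ciphertext, decrypted data)

    def load(self, path):
        raw = Path(path).read_bytes()
        digest = hashlib.md5(raw).hexdigest()
        cached = self._entries.get(str(path))
        if cached is not None and cached[0] == digest:
            # Checksums match: reuse the data already held in memory.
            return cached[1]
        # First sight of this file, or its contents changed: decrypt again.
        data = self._decrypt(raw)
        self._entries[str(path)] = (digest, data)
        return data
```

Each call still reads and hashes the whole encrypted file, so the saving is only the decryption step itself.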
Jon Ward
2013-May-10 15:18 UTC
[Puppet Users] Re: hiera-gpg causing big increase in catalog compile time
> If you use many parameterized classes, then Puppet 3's automatic class
> parameter binding will tend to aggravate that problem, especially if you
> typically allow class parameters to take default values or DSL-specified
> values (so that hiera has lots of complete misses on the parameter names).

Yep, this was what I eventually worked out was causing my massive increase in compile times - I wasn't aware of the new data binding stuff in Puppet 3 and was making heavy use of default values in parameterised classes without realizing that each one was triggering a Hiera lookup through all of my yaml and then gpg data files.

I ended up changing most of my parameterised classes back to regular classes, and not using any default values in the remaining ones - to be honest, with Hiera I didn't really need them any more as I put my default values in common.yaml - and this reduced my catalog compile time back to a tolerable level.

That said, my compile times are still a lot higher than they were prior to using Hiera and the GPG backend, and I was shocked when I ran Puppetmaster in debug mode to see it opening each file in turn for every single lookup. Caching the data would seem the sensible way to go.
jcbollinger
2013-May-10 17:34 UTC
[Puppet Users] Re: hiera-gpg causing big increase in catalog compile time
I think caching might be a viable way to go, but there is the issue of recognizing when to invalidate cache entries. I don't like the md5 approach very well, because it's still a fairly expensive computation to perform so frequently. Better, I think, would be to simply clear the cache once at the beginning of each catalog compilation. I would not be worried about changes between two lookups during the same catalog run, because it is not a clear win to always pull the very freshest value for each item at the cost of possibly getting inconsistent data by pulling items from different versions of the same file. Indeed, my inclination would be to prefer consistency.

I don't use the :gpg: backend myself, but if someone cares enough to write it up, this looks like a viable enhancement request. I'm not sure where that would need to go, however, since hiera-gpg is not a PuppetLabs project. Craig hangs around here, so maybe he'll see this, but I would not assume he will.

John
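John's alternative, clearing the cache once per catalog compilation instead of checksumming on every lookup, might look something like the sketch below. It is a Python illustration under stated assumptions, not hiera-gpg code: `PerCompileCache`, `begin_compile` and the `decrypt_file` callable are hypothetical, and the thread does not say how the backend would learn that a new compilation has started.

```python
class PerCompileCache:
    """Decrypt each file at most once per catalog compilation."""

    def __init__(self, decrypt_file):
        # decrypt_file: hypothetical callable taking a path and returning
        # the decrypted, parsed contents of that file as a dict.
        self._decrypt_file = decrypt_file
        self._files = {}  # path -> decrypted data for the current compilation

    def begin_compile(self):
        """Drop everything cached; call once when a compilation starts."""
        self._files.clear()

    def lookup(self, path, key, default=None):
        if path not in self._files:
            self._files[path] = self._decrypt_file(path)
        return self._files[path].get(key, default)
```

Within one compilation every lookup sees the same snapshot of each file, which is the consistency property John prefers. A change made on disk mid-compile only becomes visible after the next `begin_compile()`.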