Hi there,

I'm running into slow catalog runs because of the many files that are managed. I was thinking about some optimizations of this functionality.

1. On the puppetmaster:
For files with "source => 'puppet:///modules...'", the puppetmaster should calculate the md5 up front and send it along with the catalog.

2. On the managed node:
As the md5s for the files are already there once the catalog is received, there is no need for x HTTPS calls (x being the number of files managed with the source => parameter).

3. Puppetmaster md5 cache:
This would of course put some strain on the puppetmaster, which would then benefit from some sort of file md5 cache:
- when an md5 is calculated, put it into the cache, keyed by filename; also store the file's mtime and the time of the cache insert
- on each catalog request, for each file in the catalog, check whether the mtime has changed; if so, recalculate the md5 hash, otherwise just retrieve it from the cache
- some sort of stale cache entry removal, based on cache insert time, perhaps at the end of each puppet catalog compilation, perhaps triggered with probability 1:100 or something

Do you have any comments about these optimizations? They will be greatly appreciated... really :)

b.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/_2z4LaLw_IoJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
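The cache described in point 3 could be sketched roughly like this in Ruby. All names and the structure here are purely illustrative, not Puppet's actual internals:

```ruby
require 'digest/md5'

# Illustrative md5 cache keyed by filename, invalidated when the file's
# mtime changes. Stale entries are purged occasionally, triggered with
# a small probability on each lookup (as proposed in point 3).
class Md5Cache
  Entry = Struct.new(:md5, :mtime, :inserted_at)

  def initialize(max_age: 24 * 3600, purge_probability: 0.01)
    @cache = {}
    @max_age = max_age
    @purge_probability = purge_probability
  end

  def md5_for(path)
    mtime = File.mtime(path)
    entry = @cache[path]
    if entry.nil? || entry.mtime != mtime
      # File unseen, or mtime changed: recalculate and cache
      entry = Entry.new(Digest::MD5.file(path).hexdigest, mtime, Time.now)
      @cache[path] = entry
    end
    purge_stale if rand < @purge_probability # roughly 1:100 of calls
    entry.md5
  end

  private

  def purge_stale
    cutoff = Time.now - @max_age
    @cache.delete_if { |_, e| e.inserted_at < cutoff }
  end
end
```

A real implementation inside the master would additionally have to be safe across concurrent catalog compilations, which is where much of the complexity would lie.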
Hi,

On Mon, 2012-10-22 at 12:09 -0700, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are
> managed. I was thinking about some optimizations of this
> functionality.

Your suggestions look reasonable to me, but I'm not a Puppet Labs person, so I can't make an official comment.

Turn the question around for a moment: why do you have so many file resources?

Cheers,
--
Stephen Gran
Senior Systems Integrator - guardian.co.uk
On 23/10/12 01:39, Nikola Petrov wrote:
> On Mon, Oct 22, 2012 at 12:09:45PM -0700, Bostjan Skufca wrote:
>> [...]
>
> When using puppet I found that it is a far better idea to serve files
> with something else, like sftp or ssh. My conclusion came from the fact
> that we were trying to import a big dump (2GB seems ok to me, but who
> knows) and puppet just died, because it does not stream the file but
> loads it *fully* into memory.

This assertion is not true anymore since at least 2.6.0.

Also, for big files you can activate http compression on the client; this might help (or not, YMMV).

--
Brice Figureau
My Blog: http://www.masterzen.fr/
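If memory serves, the knob Brice refers to is the http_compression setting in the agent's puppet.conf (off by default in Puppet of this era; verify the name against your version's configuration reference before relying on it):

```ini
# /etc/puppet/puppet.conf on the agent
[agent]
http_compression = true
```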
Hi,

For development questions, feel free to post in puppet-dev :)

You're not the first to be irritated by those md5 computations taking time. That's something I've wanted to really optimize for a loooong time. It's simply quite difficult.

On 22/10/12 21:09, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are
> managed. I was thinking about some optimizations of this functionality.
>
> 1: On puppetmaster:
> For files with "source => 'puppet:///modules...'" puppetmaster should
> already calculate md5 and send it with the catalog.

That's what the static compiler does, if I'm not mistaken. The static compiler has been part of puppet since 2.7.

> 2: On managed node:
> As md5s for files are already there once catalog is received, there is
> no need for x https calls (x is the number of files managed with
> source => parameter)
>
> 3. Puppetmaster md5 cache
> This would of course put some strain on puppetmaster, which would then
> benefit from some sort of file md5 cache:
> - when md5 is calculated, put it into cache, key is filename. Also add
> file mtime and time of cache insert.
> - on each catalog request, for each file in the catalog check if mtime
> has changed, and if so, recalculate md5 hash, else just retrieve md5
> hash from cache
> - some sort of stale cache entries removal, based on cache insert time,
> maybe at the end of each puppet catalog compilation, maybe controlled
> with probability 1:100 or something

Actually, checking the mtime/size prior to doing any md5 computation could be a big win. But that's not all; in fact there are 3 md5 computations per file taking place during a puppet run:
* one by the master when computing file metadata
* one by the agent on the existing file (this helps to know if the file changed)
* and finally one after writing the change to the file, to make sure we wrote it correctly.

A potential solution would be to implement a different checksum type (maybe less powerful than md5, but faster).

> Do you have any comments about these optimizations? They will be greatly
> appreciated... really :)

Well, I believe we're (at least I myself am) very aware of those issues. The fact that it never got fixed (except by the static compiler) is that it's complex stuff. Last time I tried to fiddle with the checksumming, I never quite got anywhere :)

As I said in the preamble, feel free to chime in on puppet-dev to talk about this, and check the various redmine tickets regarding those issues.

--
Brice Figureau
My Blog: http://www.masterzen.fr/
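As an aside, the relative cost behind the "weaker but faster checksum" idea is easy to measure. Here is a quick Ruby comparison using crc32 purely as an example of a cheaper digest; it is not what Puppet uses, and crc32 is not collision resistant, so this only illustrates the speed gap:

```ruby
require 'benchmark'
require 'digest/md5'
require 'zlib'

# Compare md5 against crc32 on a chunk of sample "file" content.
data = 'x' * (8 * 1024 * 1024) # 8 MiB

Benchmark.bm(6) do |b|
  b.report('md5')   { Digest::MD5.hexdigest(data) }
  b.report('crc32') { Zlib.crc32(data) }
end
```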
Inline

On Monday, 22 October 2012 23:28:03 UTC+2, Brice Figureau wrote:
> Hi,
>
> For development questions, feel free to post in puppet-dev :)
>
> You're not the first irritated by those md5 computations taking time.
> That's something I'd like to really optimize since a loooong time.

Damn it, two Google Groups tabs open; this topic was intended to go to puppet-dev :/

> > 1: On puppetmaster:
> > For files with "source => 'puppet:///modules...'" puppetmaster should
> > already calculate md5 and send it with the catalog.
>
> That's what the static compiler does, if I'm not mistaken. The static
> compiler is part of puppet since 2.7.

Documentation is scarce at best. From what I could find, I see that it maintains another copy of each file in the file bucket, which is not the best option for my use case (though if someone wants an even speedier approach, this is the way to go).

b.
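For anyone else looking into this: the static compiler is enabled on the master via the catalog terminus setting in puppet.conf. This is a sketch for Puppet 2.7-era masters; check the docs for your version before relying on it:

```ini
# /etc/puppet/puppet.conf on the master
[master]
catalog_terminus = static_compiler
```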
Hi Stephen,

On Monday, 22 October 2012 21:28:23 UTC+2, Stephen Gran wrote:
> Turn the question around for a moment: why do you have so many file
> resources?

These systems are puppet-controlled from /etc/inittab through the whole boot process and each and every service startup file definition, along with their packages, which are compiled (unfortunately, this requires at least one file resource - the install script) and the service configuration files.

Why? Let's put it this way: if you have a cluster of redundant machines, you can do a rolling upgrade to newer OSes etc. If not, then uptime must not be disturbed, and this is the only way we can run recent/fresh software on quite old distributions (install a bare! distro, no libs, then compile everything in a controlled manner).

I hope I have given you a decent answer, because Borat tends to disagree with me :)
http://twitter.com/DEVOPS_BORAT/status/209720453881798656

b.
On Mon, Oct 22, 2012 at 12:09:45PM -0700, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are managed.
> I was thinking about some optimizations of this functionality.
> [...]
> Do you have any comments about these optimizations? They will be greatly
> appreciated... really :)
>
> b.

Hi,

When using puppet I found that it is a far better idea to serve files with something else; you will be far better off with something like sftp or ssh for this job. My conclusion came from the fact that we were trying to import a big dump (2GB seems ok to me, but who knows) and puppet just died, because it does not stream the file but loads it *fully* into memory.

Apart from that, these ideas seem reasonable, but I am not a committer.

Best,
Nikola
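One common pattern for keeping large files out of Puppet's file-serving path, as suggested above, is to let an exec resource fetch them out of band. A sketch, where the command, host, and paths are purely illustrative:

```puppet
# Fetch a large dump over rsync/ssh instead of puppet:///...
exec { 'fetch-big-dump':
  command => '/usr/bin/rsync -a backup.example.com:/dumps/big.sql.gz /var/tmp/big.sql.gz',
  creates => '/var/tmp/big.sql.gz',
}
```

The creates parameter makes the exec idempotent: rsync runs only while the target file is absent.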
On Mon, Oct 22, 2012 at 11:17:52PM +0200, Brice Figureau wrote:
> On 23/10/12 01:39, Nikola Petrov wrote:
> > [...]
> > When using puppet I found that it is a far better idea to serve files
> > with something else. [...] puppet just died because they are not
> > streaming the file but it is *fully* loaded into memory.
>
> This assertion is not true anymore since at least 2.6.0.
>
> Also, for big files you can activate http compression on the client,
> this might help (or not, YMMV).

Nice to hear that. It is true that the version I tested this with was somewhat old.

Best,
Nikola