Hi there,

I'm running into slow catalog runs because of the many files that are managed. I was thinking about some optimizations of this functionality.

1. On the puppetmaster:
For files with "source => 'puppet:///modules...'", the puppetmaster should calculate the md5 up front and send it along with the catalog.

2. On the managed node:
As the md5s for the files are already there once the catalog is received, there is no need for x HTTPS calls (x being the number of files managed with the source => parameter).

3. Puppetmaster md5 cache:
This would of course put some strain on the puppetmaster, which would then benefit from some sort of file md5 cache:
- when an md5 is calculated, put it into the cache, keyed by filename; also store the file's mtime and the time of the cache insert
- on each catalog request, for each file in the catalog, check whether the mtime has changed; if so, recalculate the md5 hash, otherwise just retrieve it from the cache
- some sort of stale cache entry removal, based on cache insert time, perhaps at the end of each puppet catalog compilation, perhaps triggered with probability 1:100 or something

Do you have any comments about these optimizations? They will be greatly appreciated... really :)

b.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/_2z4LaLw_IoJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
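The cache described in point 3 could be sketched roughly like this in Ruby. All names and the structure here are purely illustrative, not Puppet's actual internals:

```ruby
require 'digest/md5'

# Illustrative md5 cache keyed by filename, invalidated when the file's
# mtime changes. Stale entries are purged occasionally, triggered with
# a small probability on each lookup (as proposed in point 3).
class Md5Cache
  Entry = Struct.new(:md5, :mtime, :inserted_at)

  def initialize(max_age: 24 * 3600, purge_probability: 0.01)
    @cache = {}
    @max_age = max_age
    @purge_probability = purge_probability
  end

  def md5_for(path)
    mtime = File.mtime(path)
    entry = @cache[path]
    if entry.nil? || entry.mtime != mtime
      # File unseen, or mtime changed: recalculate and cache
      entry = Entry.new(Digest::MD5.file(path).hexdigest, mtime, Time.now)
      @cache[path] = entry
    end
    purge_stale if rand < @purge_probability # roughly 1:100 of calls
    entry.md5
  end

  private

  def purge_stale
    cutoff = Time.now - @max_age
    @cache.delete_if { |_, e| e.inserted_at < cutoff }
  end
end
```

A real implementation inside the master would additionally have to be safe across concurrent catalog compilations, which is where much of the complexity would lie.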
Hi,

On Mon, 2012-10-22 at 12:09 -0700, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are
> managed. I was thinking about some optimizations of this
> functionality.

Your suggestions look reasonable to me, but I'm not a Puppet Labs person, so I can't make an official comment.

Turn the question around for a moment: why do you have so many file resources?

Cheers,
--
Stephen Gran
Senior Systems Integrator - guardian.co.uk
On 23/10/12 01:39, Nikola Petrov wrote:
> On Mon, Oct 22, 2012 at 12:09:45PM -0700, Bostjan Skufca wrote:
>> [...]
>
> When using puppet I found that it is a far better idea to serve files
> with something else, like sftp or ssh. My conclusion came from the fact
> that we were trying to import a big dump (2GB seems ok to me, but who
> knows) and puppet just died, because it does not stream the file but
> loads it *fully* into memory.

This assertion is not true anymore since at least 2.6.0.

Also, for big files you can activate http compression on the client; this might help (or not, YMMV).

--
Brice Figureau
My Blog: http://www.masterzen.fr/
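If memory serves, the knob Brice refers to is the http_compression setting in the agent's puppet.conf (off by default in Puppet of this era; verify the name against your version's configuration reference before relying on it):

```ini
# /etc/puppet/puppet.conf on the agent
[agent]
http_compression = true
```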
Hi,

For development questions, feel free to post in puppet-dev :)

You're not the first to be irritated by those md5 computations taking time. That's something I've wanted to really optimize for a loooong time. It's simply quite difficult.

On 22/10/12 21:09, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are
> managed. I was thinking about some optimizations of this functionality.
>
> 1: On puppetmaster:
> For files with "source => 'puppet:///modules...'" puppetmaster should
> already calculate md5 and send it with the catalog.

That's what the static compiler does, if I'm not mistaken. The static compiler has been part of puppet since 2.7.

> 2: On managed node:
> As md5s for files are already there once catalog is received, there is
> no need for x https calls (x is the number of files managed with
> source => parameter)
>
> 3. Puppetmaster md5 cache
> This would of course put some strain on puppetmaster, which would then
> benefit from some sort of file md5 cache:
> - when md5 is calculated, put it into cache, key is filename. Also add
> file mtime and time of cache insert.
> - on each catalog request, for each file in the catalog check if mtime
> has changed, and if so, recalculate md5 hash, else just retrieve md5
> hash from cache
> - some sort of stale cache entries removal, based on cache insert time,
> maybe at the end of each puppet catalog compilation, maybe controlled
> with probability 1:100 or something

Actually, checking the mtime/size prior to doing any md5 computation could be a big win. But that's not all; in fact there are 3 md5 computations per file taking place during a puppet run:
* one by the master when computing file metadata
* one by the agent on the existing file (this helps to know if the file changed)
* and finally one after writing the change to the file, to make sure we wrote it correctly.

A potential solution would be to implement a different checksum type (maybe less powerful than md5, but faster).

> Do you have any comments about these optimizations? They will be greatly
> appreciated... really :)

Well, I believe we're (at least I myself am) very aware of those issues. The fact that it never got fixed (except by the static compiler) is that it's complex stuff. Last time I tried to fiddle with the checksumming, I never quite got anywhere :)

As I said in the preamble, feel free to chime in on puppet-dev to talk about this, and check the various redmine tickets regarding those issues.

--
Brice Figureau
My Blog: http://www.masterzen.fr/
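As an aside, the relative cost behind the "weaker but faster checksum" idea is easy to measure. Here is a quick Ruby comparison using crc32 purely as an example of a cheaper digest; it is not what Puppet uses, and crc32 is not collision resistant, so this only illustrates the speed gap:

```ruby
require 'benchmark'
require 'digest/md5'
require 'zlib'

# Compare md5 against crc32 on a chunk of sample "file" content.
data = 'x' * (8 * 1024 * 1024) # 8 MiB

Benchmark.bm(6) do |b|
  b.report('md5')   { Digest::MD5.hexdigest(data) }
  b.report('crc32') { Zlib.crc32(data) }
end
```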
Inline

On Monday, 22 October 2012 23:28:03 UTC+2, Brice Figureau wrote:
> Hi,
>
> For development questions, feel free to post in puppet-dev :)
>
> You're not the first irritated by those md5 computations taking time.
> That's something I'd like to really optimize since a loooong time.

Damn it, two Google Groups tabs open; this topic was intended to go to puppet-dev :/

> > 1: On puppetmaster:
> > For files with "source => 'puppet:///modules...'" puppetmaster should
> > already calculate md5 and send it with the catalog.
>
> That's what the static compiler does, if I'm not mistaken. The static
> compiler is part of puppet since 2.7.

Documentation is scarce at best. From what I could find, I see that it maintains another copy of each file in the file bucket, which is not the best option for my use case (though if someone wants an even speedier approach, this is the way to go).

b.
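For anyone else looking into this: the static compiler is enabled on the master via the catalog terminus setting in puppet.conf. This is a sketch for Puppet 2.7-era masters; check the docs for your version before relying on it:

```ini
# /etc/puppet/puppet.conf on the master
[master]
catalog_terminus = static_compiler
```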
Hi Stephen,

On Monday, 22 October 2012 21:28:23 UTC+2, Stephen Gran wrote:
> Turn the question around for a moment: why do you have so many file
> resources?

These systems are puppet-controlled from /etc/inittab through the whole boot process and each and every service startup file definition, along with their packages, which are compiled (unfortunately, this requires at least one file resource - the install script) and the service configuration files.

Why? Let's put it this way: if you have a cluster of redundant machines, you can do a rolling upgrade to newer OSes etc. If not, then uptime must not be disturbed, and this is the only way we can run recent/fresh software on quite old distributions (install a bare! distro, no libs, then compile everything in a controlled manner).

I hope I have given you a decent answer, because Borat tends to disagree with me :)
http://twitter.com/DEVOPS_BORAT/status/209720453881798656

b.
On Mon, Oct 22, 2012 at 12:09:45PM -0700, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are managed.
> I was thinking about some optimizations of this functionality.
> [...]
> Do you have any comments about these optimizations? They will be greatly
> appreciated... really :)
>
> b.

Hi,

When using puppet I found that it is a far better idea to serve files with something else; you will be far better off with something like sftp or ssh for this job. My conclusion came from the fact that we were trying to import a big dump (2GB seems ok to me, but who knows) and puppet just died, because it does not stream the file but loads it *fully* into memory.

Apart from that, these ideas seem reasonable, but I am not a committer.

Best,
Nikola
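One common pattern for keeping large files out of Puppet's file-serving path, as suggested above, is to let an exec resource fetch them out of band. A sketch, where the command, host, and paths are purely illustrative:

```puppet
# Fetch a large dump over rsync/ssh instead of puppet:///...
exec { 'fetch-big-dump':
  command => '/usr/bin/rsync -a backup.example.com:/dumps/big.sql.gz /var/tmp/big.sql.gz',
  creates => '/var/tmp/big.sql.gz',
}
```

The creates parameter makes the exec idempotent: rsync runs only while the target file is absent.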
On Mon, Oct 22, 2012 at 11:17:52PM +0200, Brice Figureau wrote:
> On 23/10/12 01:39, Nikola Petrov wrote:
> > [...]
> > When using puppet I found that it is a far better idea to serve files
> > with something else. [...] puppet just died because they are not
> > streaming the file but it is *fully* loaded into memory.
>
> This assertion is not true anymore since at least 2.6.0.
>
> Also, for big files you can activate http compression on the client,
> this might help (or not, YMMV).

Nice to hear that. It is true that the version I tested this with was somewhat old.

Best,
Nikola