Sorry for the re-post, but I'm new to the mailing list and wanted to bring back up an old topic I saw in the archives:

http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html

I think a patch to delay garbage collection and run it later is pretty important for high performance web applications. I do understand the trade-offs of explicit vs. implicit garbage collection, and would much prefer to off-load my garbage collection until a later point (when users are not waiting for a request). I agree with the earlier points that this could very well be rails-specific, but isn't this a feature that would benefit all of the frameworks that use mongrel?

This could easily be added as a configuration option to run after N requests, or to let the GC behave as normal and run when needed - the default, of course, being to let the GC run whenever it deems necessary. The explicit collection would happen after processing a request, but before listening for any new requests.

- scott
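For concreteness, the kind of thing being proposed might look like this (just a sketch, not a patch; the 50-request threshold is arbitrary, and how it gets called from the server is exactly the open question):

class PeriodicGC
  def initialize(interval = 50)
    @interval = interval
    @count = 0
  end

  # Meant to be called once per request, after the response has gone out
  # and before the server starts listening for the next request.
  def request_finished
    @count += 1
    if @count >= @interval
      @count = 0
      GC.start   # explicit full sweep while no client is waiting
    end
  end
end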
On Fri, Mar 21, 2008 at 12:12 PM, Scott Windsor <swindsor at gmail.com> wrote:
> Sorry for the re-post, but I'm new to the mailing list and wanted to bring
> back up an old topic I saw in the archives.
>
> http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html
>
> I think a patch to delay garbage collection and run it later is pretty
> important for high performance web applications. I do understand the

In the vast majority of cases you are going to do a worse job of determining when and how often to run the GC than even MRI Ruby's simple algorithms. MRI garbage collection stops the world -- nothing else happens while the GC runs -- so when talking about overall throughput on an application, you don't want it to run any more than necessary.

I don't use Rails, but in the past I have experimented with this quite a lot under IOWA, and in my normal applications (i.e. not using RMagick) I could never come up with an algorithm of self-managed GC.disable/GC.enable/GC.start that gave the same overall level of throughput that I got by letting Ruby start the GC according to its own algorithms. That experience makes me skeptical of the approach in the general case, though there are occasional specific cases where it can be useful.

Kirk Haines
On Fri, Mar 21, 2008 at 11:49 AM, Kirk Haines <wyhaines at gmail.com> wrote:
> In the vast majority of cases you are going to do a worse job of
> determining when and how often to run the GC than even MRI Ruby's
> simple algorithms. MRI garbage collection stops the world -- nothing
> else happens while the GC runs -- so when talking about overall
> throughput on an application, you don't want it to run any more than
> necessary.

I understand that the GC is quite knowledgeable about when to run garbage collection when examining the heap. But the GC doesn't know anything about my application or its state. The fact that everything stops when the GC runs is exactly why I'd prefer to limit when the GC will run. I'd rather it run outside of serving a web request than right in the middle of serving requests.

I know that the ideal situation is to not need to run the GC, but the reality is that I'm using various gems and plugins, and not all are well behaved and free of memory leaks. Rails itself may also have regular leaks from time to time, and I'd prefer to have my application be consistently slow rather than randomly (and unexpectedly) slow. The alternative is to terminate your application after N requests and never run the GC, which I'm not a fan of.

- scott
On Fri, Mar 21, 2008 at 1:23 PM, Scott Windsor <swindsor at gmail.com> wrote:
> I understand that the GC is quite knowledgeable about when to run garbage
> collection when examining the heap. But the GC doesn't know anything about
> my application or its state. The fact that when the GC runs everything
> stops is why I'd prefer to limit when the GC will run. I'd rather it run
> outside of serving a web request rather than when it's right in the middle
> of serving requests.

It doesn't matter, if one is looking at overall throughput. And how long do your GC runs take? If you have a GC invocation that is noticeable on a single request, your processes must be gigantic, which would suggest to me that there's a more fundamental problem with the app.

> I know that the ideal situation is to not need to run the GC, but the
> reality is that I'm using various gems and plugins and not all are well
> behaved and free of memory leaks. Rails itself may also have regular leaks

No, it's impractical to never run the GC. The ideal situation, at least where execution performance and throughput on a high performance app is concerned, is to intelligently reduce how often it needs to run by paying attention to your object creation -- in particular, to throwaway object creation.

> from time to time and I'd prefer to have my application consistently be slow
> than randomly (and unexpectedly) be slow. The alternative is to terminate
> your application after N number of requests and never run the GC, which I'm
> not a fan of.

If your goal is to deal with memory leaks, then you really need to define what that means in a GC'd language like Ruby. To me, a leak is something that consumes memory in a way that eludes the GC's ability to track it and reuse it. The fundamental nature of that sort of thing is that the GC can't help you with it.

If by leaks you mean code that just creates a lot of objects that the GC needs to clean up, then those aren't leaks. It may be inefficient code, but it's not a memory leak.

And in the end, while disabling GC over the course of a request may result in processing that one request more quickly than it would have been processed otherwise, the disable/enable dance is going to cost you something. You'll likely either end up using more RAM than you otherwise would have in between GC calls, resulting in bigger processes, or you end up calling GC more often than you otherwise would have, reducing your high performance app's throughput. And for the general case, that's not an advantageous situation.

To be more specific: if excessive RAM usage and GC costs that are noticeable to the user during requests are a common thing for Rails apps, and the reason for that is bad code in Rails and not just bad user code, then the Rails folks should be the targets of a conversation on the matter. Mongrel itself, though, does not need to be, and should not be, playing manual memory management games on behalf of a web framework.

Kirk Haines
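For reference, the "disable/enable dance" being weighed here is roughly this pattern (a sketch only; dispatch_request is a hypothetical stand-in for whatever actually handles the request in a given framework):

def serve_with_deferred_gc(request, response)
  GC.disable                          # hold the collector off while a client is waiting
  dispatch_request(request, response)
ensure
  GC.enable
  GC.start                            # pay the collection cost between requests instead
end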
> The alternative is to terminate your application after N number of requests and never run the
> GC, which I'm not a fan of.

WSGI (Python) can do that, and it's a pretty nice alternative to having Monit kill a leaky app that may have a bunch of requests queued up (Mongrel soft shutdown notwithstanding).

Evan
> You'll likely either end up using more RAM than you otherwise would
> have in between GC calls, resulting in bigger processes

This is definitely true. Keep in mind that the in-struct mark phase means that the entire process has to lurch out of swap whenever the GC runs. Since the process is now much bigger, and the pages idled longer and are more likely to be swapped out, that can be a pretty brutal hit.

Evan
--
Evan Weaver
Cloudburst, LLC
At 01:19 PM 3/21/2008, Kirk Haines wrote:
> On Fri, Mar 21, 2008 at 1:23 PM, Scott Windsor <swindsor at gmail.com> wrote:
> > I understand that the GC is quite knowledgeable about when to run garbage
> > collection when examining the heap. But the GC doesn't know anything about
> > my application or its state. The fact that when the GC runs everything
> > stops is why I'd prefer to limit when the GC will run. I'd rather it run
> > outside of serving a web request rather than when it's right in the middle
> > of serving requests.
>
> It doesn't matter, if one is looking at overall throughput.

Hi Kirk,

One thought on this - would it be possible to schedule GC to run just after all the html has been rendered to the client from Rails, but while leaving open the connection (so that mongrel is blocked on Rails)?

If so, it seems like if one were using something like nginx fair proxy, then the mongrel would be running its garbage collection AFTER the client got all its html but BEFORE any new requests were sent to it.

In a fully loaded server it wouldn't matter at all, but most environments have a little headroom at least, so nginx fair proxy would just route around the mongrel that is still running a GC at the end of its Rails loop. So total throughput for a given (non-max) volume of requests might be unaffected, since nothing would ever pile up behind a Rails process that has slowed down to run GC (and the client would be happy since they got all their html before the GC started).

I have no idea if this is meaningful, but I've been playing with some performance tests against mongrel + nginx fair proxy and it occurs to me that this might be relevant.

Best,
Steve
On 22/03/2008, at 8:19 AM, Steve Midgley wrote:
> If so, it seems like if one were using something like nginx fair proxy,
> then the mongrel would be running its garbage collection AFTER the
> client got all its html but BEFORE any new requests were sent to it.
>
> In a fully loaded server it wouldn't matter at all, but most
> environments have a little headroom at least, so that nginx fair proxy
> would just route around the mongrel that is still running a GC at the
> end of its Rails loop.

That would only be true if you set the connect timeout on the backend to 1 second AND your GC pass took longer than 1 second.
On Fri, Mar 21, 2008 at 1:19 PM, Kirk Haines <wyhaines at gmail.com> wrote:
> It doesn't matter, if one is looking at overall throughput. And how
> long do your GC runs take? If you have a GC invocation that is
> noticeable on a single request, your processes must be gigantic, which
> would suggest to me that there's a more fundamental problem with the
> app.

Right now, my processes aren't gigantic... I'm preparing for a 'worst case' scenario when I have extremely large processes or memory usage. This can easily happen in specific applications such as an image server (using image magick) or one parsing/creating large xml payloads (a large REST server). For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

> No, it's impractical to never run the GC. The ideal situation, at
> least where execution performance and throughput on a high performance
> app is concerned, is to just intelligently reduce how often it needs
> to run by paying attention to your object creation. In particular,
> pay attention to the throwaway object creation.

There may be perfectly good reasons to have intermediate object creation (good encapsulation, usage of another library/gem you can't modify, large operations that you need to keep atomic). While ideally you'd fix the memory usage problem, this doesn't solve all cases.

> If your goal is to deal with memory leaks, then you really need to
> define what that means in a GC'd language like Ruby.
> To me, a leak is something that consumes memory in a way that eludes
> the GC's ability to track it and reuse it. The fundamental nature of
> that sort of thing is that the GC can't help you with it.

Yes, for Ruby (and other GC'd languages) it's much harder to leak memory such that the GC can never clean it up - but it does (and has) happened. This case I'm less concerned about, as a leak of this magnitude should be considered a bug and fixed.

> If by leaks, you mean code that just creates a lot of objects that the
> GC needs to clean up, then those aren't leaks. It may be inefficient
> code, but it's not a memory leak.

Inefficient it may be - but it might just be optimizing for a different problem. For example, take ActiveRecord's association cache and its query cache. If you're doing a large number of queries each page load, ActiveRecord is still going to cache them for each request - this is far better than further round trips to the database, but may lead to a large amount of memory consumed per request.

> And in the end, while disabling GC over the course of a request may
> result in processing that one request more quickly than it would have
> been processed otherwise, the disable/enable dance is going to cost
> you something.

Agreed. But again, I'd rather it be a constant cost outside of processing a request than a variable cost inside of processing a request.

> You'll likely either end up using more RAM than you otherwise would
> have in between GC calls, resulting in bigger processes, or you end up
> calling GC more often than you otherwise would have, reducing your
> high performance app's throughput.
>
> And for the general case, that's not an advantageous situation.

This can vary from application to application - all the more reason to make this a configurable option (and not the default).

> To be more specific, if excessive RAM usage and GC costs that are
> noticeable to the user during requests are a common thing for Rails
> apps, and the reason for that is bad code in Rails and not just bad
> user code, then the Rails folks should be the targets of a
> conversation on the matter. Mongrel itself, though, does not need to
> be, and should not be, playing manual memory management games on
> behalf of a web framework.

I still disagree on this point - I doubt that Rails is the only web framework that would benefit from being able to control when the GC is run. This is going to be a common problem across frameworks whenever web applications are consuming and then releasing large amounts of memory - I'd say it can be a pretty common use case for certain types of web applications.

- scott
If you plan on regularly killing your application (for whatever reason), then this is a pretty good option. This is a pretty common practice for apache modules and fastcgi applications as a hold-over from dealing with older leaky C apps. I'd personally prefer for my Ruby web apps to re-run the GC rather than have to pay the startup/shutdown/parse configs/connect to external resources costs, but that's because they are far less likely to leak memory that the GC can't catch or get into an unstable state.

- scott

On Fri, Mar 21, 2008 at 1:22 PM, Evan Weaver <evan at cloudbur.st> wrote:
> > The alternative is to terminate your application after N number of
> > requests and never run the GC, which I'm not a fan of.
>
> WSGI (Python) can do that, and it's a pretty nice alternative to
> having Monit kill a leaky app that may have a bunch of requests queued
> up (Mongrel soft shutdown not withstanding).
>
> Evan
On Sat, Mar 22, 2008 at 3:39 AM, Dave Cheney <dave at cheney.net> wrote:
> That would only be true if you set the connect timeout on the backend to 1
> second AND your GC pass took longer than 1 second.

Yes, but the worst case here is that another request gets delayed before processing. Still potentially better (IMHO) than dealing with this delay while processing a request.

- scott
At 08:21 AM 3/24/2008, Scott Windsor wrote:
> Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> scenario when I have extremely large processes or memory usage. This can
> easily happen in specific applications such as an image server (using image
> magick) or parsing/creating large xml payloads (a large REST server). For
> those applications, I may have a large amount of memory used for each
> request, which will increase until the GC is run.
>
> [snip]
>
> There may be perfectly good reasons to have intermediate object creation
> (good encapsulation, usage of another library/gem you can't modify, large
> operations that you need to keep atomic). While ideally you'd fix the
> memory usage problem, this doesn't solve all cases.

Hi Scott,

I hope this somewhat OT post is ok (feedback welcome). I've had memory problems with image magick too - even when it runs out of process. On certain (rare but reasonably sized) image files it seems to go memory haywire, eating too much memory and throwing my app stack into swap.

So I wrote this simple rails plug-in which is very limited in function, but does mostly what I needed from an image processor. Notably for your issue above, it lets you easily specify limits on how much memory image magick is allowed to consume while doing its work (thanks to Ara Howard for initial direction on that one). It might be of interest to you:

http://www.misuse.org/science/2008/01/30/mojomagick-ruby-image-library-for-imagemagick/

Best,
Steve
On Mon, Mar 24, 2008 at 9:21 AM, Scott Windsor <swindsor at gmail.com> wrote:
> Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> scenario when I have extremely large processes or memory usage. This can
> easily happen in specific applications such as an image server (using image
> magick) or parsing/creating large xml payloads (a large REST server). For
> those applications, I may have a large amount of memory used for each
> request, which will increase until the GC is run.

(*nod*) image magick is a well known bad citizen. Either don't use it at all, or use it in an external process from your web app processes. And if, for whatever reason, you must use it inside of your web app process, and your use case really does create processes so enormous that you can perceive a response lag from a manual GC.start inside of your request processing, then create a custom Rails handler that does it. You can trivially alter it to do whatever GC.foo actions you desire. The code is simple and easy to follow, so just make your own Mongrel::Rails::RailsHandlerWithParanoidGCManagement.

> There may be perfectly good reasons to have intermediate object creation
> (good encapsulation, usage of another library/gem you can't modify, large
> operations that you need to keep atomic). While ideally you'd fix the
> memory usage problem, this doesn't solve all cases.

Obviously. It's easy and convenient to ignore the issue, and often the issue doesn't matter for a given piece of code. But if memory usage or execution speed becomes an issue for one's code, going back and taking a look at the throwaway object creation, and addressing it, can net considerable improvements.

> Yes, for Ruby (and other GC'd languages), it's much harder to leak memory
> such that the GC can never clean it up - but it does (and has) happened.
> This case I'm less concerned about as a leak of this magnitude should be
> considered a bug and fixed.

Oh, I know. That's why I brought it up, though. You were talking about memory leaks, so I wanted to make a distinction. Real leaks, like the Array#shift bug, or leaky continuations, or badly behaved Ruby extensions, aren't affected by GC manipulations.

> Inefficient it may be - but it might be just optimizing for a different
> problem. For example, take ActiveRecord's association cache and its query
> cache. If you're doing a large number of queries each page load,
> ActiveRecord is still going to cache them for each request - this is far
> better than further round trips to the database, but may lead to a large
> amount of memory consumed per each request.

Sure. And if it's optimizing for a different problem, then that's fine, so long as the optimization isn't creating a worse problem than the issue it's trying to address.

But that's also largely irrelevant, I think. I just did a quick test. I created a program that creates 10 million objects. It has a footprint of about a gigabyte of RAM usage. It takes Ruby 0.5 seconds to walk those 10 million objects on my server. If you have a web app that has processes anywhere near that large, you have bigger problems to deal with. And if you have a more reasonably large, million-object app, then on my server the GC cost would be 0.05 seconds. Given the typical speed of Rails apps, an occasional 0.05 second delay is going to be unnoticeable.

> Agreed. But again, I'd rather it be a constant cost outside of processing a
> request than a variable cost inside of processing a request.

You're worrying about something that just isn't a problem in the vast, vast majority of cases. Again, testing on my server, even with a very simple, very fast piece of code creating objects, it takes almost 20x as long to create the objects as to GC them.

> This can vary from application to application - all the more reason to make
> this a configurable option (and not the default).

It's still my position that it's not Mongrel's job to be implementing a manual memory management scheme that is almost always going to be a performance loser over just leaving it alone. It's still my position that if one has an application that, through testing, has been shown to have a use case where it can actually benefit from manual GC.foo management, then one can trivially create a Mongrel handler that will do this for you.

> I still disagree on this point - I doubt that Rails is the only web
> framework that would benefit from being able to control when the GC is run.
> This is going to be a common problem across frameworks whenever web
> applications are consuming then releasing large amounts of memory - I'd say
> it can be a pretty common use case for certain types of web applications.

My point is that if it is _Rails_ code that is causing the problem, that's a _Rails_ problem. My point is also that manual GC.foo management is going to cause more problems than it helps for the vast majority of applications. GC cycles aren't that slow, especially compared to the speed of a typical Rails app, and certainly not when compared to the speed of a Rails request that makes a lot of objects and does any sort of intensive, time consuming operations.

Kirk Haines
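Kirk's measurement is easy to reproduce with something along these lines (the exact numbers will obviously vary by machine and Ruby version):

require 'benchmark'

objects = Array.new(10_000_000) { Object.new }   # roughly ten million live objects
puts Benchmark.realtime { GC.start }             # time one full stop-the-world sweep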
On Tue, Mar 25, 2008 at 11:53 AM, Zed A. Shaw <zedshaw at zedshaw.com> wrote:
> On Mon, 24 Mar 2008 08:21:52 -0700
> "Scott Windsor" <swindsor at gmail.com> wrote:
> > Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> > scenario when I have extremely large processes or memory usage. This can
> > easily happen in specific applications such as an image server (using image
> > magick) or parsing/creating large xml payloads (a large REST server). For
> > those applications, I may have a large amount of memory used for each
> > request, which will increase until the GC is run.
>
> Well, does that mean you DO have this problem or DO NOT have this
> problem? If you aren't actually facing a real problem that could be
> solved by this change then you most likely won't get very far. Any
> imagined scenario you come up with could easily just be avoided.

Right now my current deployment configuration for all my rails applications is using apache + fastcgi. With this deployment strategy, if I don't set the garbage collection in my dispatch.fcgi, any rails application I use that uses image magick (for resizing/effects/etc) eats memory like a hog. In my dispatch...

http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

I usually set this to around 50 executions per GC run and my rails apps seem pretty happy.

This has been working great for me thus far, but using mod_fastcgi leaves zombie processes occasionally during restart. Checking the docs, mod_fastcgi is more or less deprecated and mod_fcgid is preferred; mod_fcgid has all sorts of issues (random 500s and the like), and to boot the documentation is quite poor. So, I've decided to move my apps over to nginx proxying to mongrel. The decision to move to nginx is pretty minor (it's lighter weight and easier to configure), but my decision to move to mongrel warranted a bit of research. I do want to ensure that all of my applications behave properly in terms of memory consumption, and the first thing I've noticed is that mongrel doesn't have the same options available for customizing when the GC runs. This leads me to believe that either there's something specific to rails running under FastCGI that requires the GC to be disabled/enabled during process execution, or mongrel hasn't implemented the feature yet.

> If you want to do this then you'll have to write code and you'll have
> to learn how to make a Mongrel handler that is registered before and
> after the regular rails handler. All you do is have this before handler
> analyze the request and disable the GC on the way in. In the after
> handler you just have to re-enable the GC and make it do the work.
>
> It's pretty simple, but *you* will have to go read Mongrel code and
> understand it first. Otherwise you're just wasting your time really.
>
> --
> Zed A. Shaw
> - Hate: http://savingtheinternetwithhate.com/
> - Good: http://www.zedshaw.com/
> - Evil: http://yearofevil.com/

Sounds good to me - I don't mind writing code, I just want to see, if I do spend the time, whether it's something the mongrel community would accept...

Quick question about the code change: counting the number of requests served and determining the GC behavior should be done inside a mutex (or we start to run the risk of running the GC twice or mis-counting the number of requests processed). I don't see any common mutex used for all mongrel dispatchers; the logic is specific within each type of http handler (rails, camping, etc). Would it make sense then to put the optional GC run check (and GC run, if applicable) within the synchronize block for each http handler, or is this something that should live in the base HttpHandler class?

- scott
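A sketch of the handler pair Zed describes might look like the following (untested; it assumes Mongrel's behavior of running every handler registered on a URI in order, and it keeps its own mutex as Scott suggests rather than relying on the Rails dispatcher's lock; the 50-request interval is arbitrary):

require 'mongrel'
require 'thread'

# Registered before the Rails handler: turn the collector off on the way in.
class GCDisableHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.disable
  end
end

# Registered after the Rails handler: turn the collector back on and,
# every N requests, do the actual work between requests.
class GCEnableHandler < Mongrel::HttpHandler
  def initialize(interval = 50)
    @interval = interval
    @count = 0
    @lock = Mutex.new
  end

  def process(request, response)
    GC.enable
    run = @lock.synchronize do
      @count += 1
      if @count >= @interval
        @count = 0
        true
      else
        false
      end
    end
    GC.start if run
  end
end

Presumably these would be attached to the same URI as the Rails handler in the configurator, one in front of it and one behind - but that wiring is exactly the part that needs checking against the Mongrel source, as Zed says.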
On Mon, Mar 24, 2008 at 3:58 PM, Scott Windsor <swindsor at gmail.com> wrote:
> Right now my current deployment configuration for all my rails applications
> is using apache + fastcgi.
>
> With this deployment strategy, if I don't set the garbage collection in my
> dispatch.fcgi, any rails application I use that uses image magick (for
> resizing/effects/etc) eats memory like a hog. In my dispatch...
> http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi
> I usually set this to around 50 executions per GC run and my rails apps seem
> pretty happy.

You're using *RMagick*, not ImageMagick directly. If you used the latter (via system calls) there would be no memory leakage to worry about.

> This has been working great for me thus far, but using mod_fastcgi leaves
> zombie processes occasionally during restart. Checking the docs,
> mod_fastcgi is more or less deprecated, and mod_fcgid is preferred.
> mod_fcgid has all sorts of issues (random 500s and the like), and to boot
> the documentation is quite poor.

Moving from FastCGI to Mongrel will also require you to monitor your cluster processes with external tools, since you're using things that leak too much memory, like RMagick, and that requires restarts of the process. To make it clear: the memory leaked by RMagick cannot be recovered with the garbage collection mechanism. I tried that several times, but in the long run it required restarting and hunting down all the zombie processes left by Apache.

> So, I've decided to move my apps over to nginx proxying to mongrel. The
> decision to move to nginx is pretty minor (it's lighter weight and easier to
> configure), but my decision to move to mongrel warranted a bit of research.
> I do want to ensure that all of my applications behave properly in terms of
> memory consumption and the first thing I've noticed is that mongrel doesn't
> have the same options available for customizing when the GC runs.

Can you tell me how you addressed the "schedule" of the garbage collection execution in your previous scenario? AFAIK most frameworks or servers don't impose on the user how often GC should be performed.

> This leads me to believe that either there's something specific to rails
> running under FastCGI that requires the GC to be disabled/enabled during
> process execution or mongrel hasn't implemented the feature yet.

I'll bet it is rails specific, or you should take a look at the fcgi ruby extension, since it is responsible, ruby-side, for bridging both worlds.

On a personal note, I believe it is not the responsibility of Mongrel, as a webserver, to take care of the garbage collection and leakage issues of the VM on which your application runs. In any case, the GC of the VM (MRI Ruby) should be enhanced to work better with heavy load and long running environments.

--
Luis Lavena
Multimedia systems
-
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams
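The system-call route Luis mentions amounts to something like this (a sketch; the convert arguments and paths are made up for illustration):

# Let the ImageMagick command-line tool do the work in a child process,
# so whatever memory it churns through goes back to the OS when it exits.
def resize_image(src, dst, geometry = '640x480')
  system('convert', src, '-resize', geometry, dst) or
    raise "convert failed for #{src}"
end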
On Mon, Mar 24, 2008 at 12:18 PM, Luis Lavena <luislavena at gmail.com> wrote:
> You're using *RMagick*, not ImageMagick directly. If you used the
> latter (via system calls) there would be no memory leakage to worry
> about.

You're correct - I'm using 'RMagick' - and it uses a large amount of memory. But that's not really the overall point. My overall point is how to properly handle a rails app that uses a great deal of memory during each request. I'm pretty sure this happens in other rails applications that don't happen to use 'RMagick'.

> Moving from FastCGI to Mongrel will also require you to monitor your
> cluster processes with external tools, since you're using things that
> leak too much memory like RMagick and require restarts of the process.

Yes, although all monitoring will be able to do is kill off a misbehaved application. I'd much rather run garbage collection than kill off my application.

> To make it clear: the memory leaked by RMagick cannot be recovered
> with the garbage collection mechanism. I tried that several times but in
> the long run it required restarting and hunting down all the zombie
> processes left by Apache.

So far, running the GC under fastcgi has given me pretty good results. The zombieing issue with fastcgi is a known issue with mod_fastcgi, and I'm pretty sure it is unrelated to RMagick or garbage collection.

> Can you tell me how you addressed the "schedule" of the garbage
> collection execution in your previous scenario? AFAIK most frameworks
> or servers don't impose on the user how often GC should be
> performed.

In the previous scenario I was using fastcgi with rails. In my previous reply I provided a link to the rails fastcgi dispatcher:

http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

In addition, in other languages and other language web frameworks there are provisions to control garbage collection (for languages that have garbage collection, of course).

> I'll bet it is rails specific, or you should take a look at the fcgi ruby
> extension, since it is responsible, ruby-side, for bridging both
> worlds.

This is done in the Rails FastCGI dispatcher. I believe that the equivalent of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails dispatcher is distributed as a part of Mongrel, I'd say this code is owned by Mongrel, which bridges these two worlds when using mongrel as a webserver.

> On a personal note, I believe it is not the responsibility of Mongrel, as a
> webserver, to take care of the garbage collection and leakage issues of
> the VM on which your application runs. In any case, the GC of the VM
> (MRI Ruby) should be enhanced to work better with heavy load and long
> running environments.

Ruby provides an API to access and call the garbage collector. This gives ruby application developers the ability to control when garbage collection is run, because in some cases there may be an application-specific reason to prevent or explicitly run the GC. Web servers are a good example of applications where state may help determine a better time to run the GC. As you're serving each request, you're generally allocating a number of objects, then rendering output, then moving on to the next request.

By limiting the GC to run in between requests rather than during requests, you are trading request time for latency between requests. This is a trade-off that I think web application developers should decide, but by no means should this be a default or a silver bullet for all. My position is that this should just be an option within Mongrel as a web server.

- scott
On Mon, Mar 24, 2008 at 4:59 PM, Scott Windsor <swindsor at gmail.com> wrote:
> You're correct - I'm using 'RMagick' - and it uses a large amount of memory.
> But that's not really the overall point. My overall point is how to
> properly handle a rails app that uses a great deal of memory during each
> request. I'm pretty sure this happens in other rails applications that
> don't happen to use 'RMagick'.

Yes, I faced huge memory usage issues with other things not related to image processing, and found that a good thing was to move them out of the request-response cycle and into an out-of-band background job.

> So far, running the GC under fastcgi has given me pretty good results. The
> zombieing issue with fastcgi is a known issue with mod_fastcgi, and I'm
> pretty sure it is unrelated to RMagick or garbage collection.

Yes, but even if you "reclaim" the memory with GC, there will be pieces that won't ever be GC'ed, since they leaked on the C side, outside GC control (some of the RMagick and ImageMagick mysteries).

> This is done in the Rails FastCGI dispatcher. I believe that the equivalent
> of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails
> dispatcher is distributed as a part of Mongrel, I'd say this code is owned
> by Mongrel, which bridges these two worlds when using mongrel as a
> webserver.

Then you could provide a different Mongrel Handler that could perform that, or even a series of GemPlugins that provide a gc:start instead of the plain 'start' command mongrel_rails provides.

> Ruby provides an API to access and call the garbage collector. [...]
>
> By limiting the GC to run in between requests rather than during requests,
> you are trading request time for latency between requests. This is a
> trade-off that I think web application developers should decide, but by no
> means should this be a default or a silver bullet for all. My position is
> that this should just be an option within Mongrel as a web server.

--gc-interval maybe?

Now that you've convinced me and proved your point, having the option to perform it (optionally, not forced) will be something good to have.

Patches are Welcome ;-)

--
Luis Lavena
Multimedia systems
-
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams
Forgive me for not having read the whole thread, however, there is one thing that seems to be really important, and that is, ruby hardly ever runs the damned GC. It certainly doesn''t do full runs nearly often enough (IMO). Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions correctly. I don''t know what rmagick is doing under the hood in this area, but having been generating large portions of country maps with it (and moving away from it very rapidly), I know the GC doesn''t do "The Right Thing". First call of address is GC_MALLOC_LIMIT and friends. For any small script that doesn''t breach that value, the GC simply doesn''t run. More than this, RMagick, in it''s apparent ''wisdom'' never frees memory if the GC never runs. Seriously, check it out. Make a tiny script, and make a huge image with it. Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will reach your code before the GC calls on RMagick to free. Now, add a call to GC.start, and no OOME. Despite the limitations of it (ruby performance only IMO), most of the above experience was built up on windows, and last usage was about 6 months ago, FYI. On 24 Mar 2008, at 20:37, Luis Lavena wrote:> On Mon, Mar 24, 2008 at 4:59 PM, Scott Windsor <swindsor at gmail.com> > wrote: >> On Mon, Mar 24, 2008 at 12:18 PM, Luis Lavena >> <luislavena at gmail.com> wrote: >> >>> >>> >>> On Mon, Mar 24, 2008 at 3:58 PM, Scott Windsor >>> <swindsor at gmail.com> wrote: >>> >>> >>> >>> You''re using *RMagick*, not ImageMagick directly. If you used the >>> later (via system calls) there will no be memory leakage you can >>> worry >>> about. >> >> You''re correct - I''m using ''RMagick'' - and it uses a large amount >> of memory. >> But that''s not really the overall point. My overall point is how to >> properly handle a rails app that uses a great deal of memory during >> each >> request. I''m pretty sure this happens in other rails applications >> that >> don''t happen to use ''RMagick''.Personally, I''ll simply say call the GC more often. Seriously. I mean it. It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside of by ubygems.rb due to stuff I have observed before: http://blog.ra66i.org/archives/informatics/2007/10/05/calling-on-the-gc-after-rubygems/ - N.B. This isn''t "FIXED" it''s still a good idea (gem 1.0.1). http://zdavatz.wordpress.com/2007/07/18/heap-fragmentation-in-a-long-running-ruby-process/ Now, by my reckoning (and a few production apps seem to be showing emperically (purely emperical, sorry)) we should be calling on the GC whilst loading up the apps. I mean come on, when are a really serious number of temporary objects being created. Actually, it''s when rubygems loads, and that''s the first thing that happens in, hmm, probably over 90% of ruby processes out there.> > Yes, I faced huge memory usage issues with other things non related to > image processing and found that a good thing was move them out of the > request-response cycle and into a out-of-bound background job. > >> >> So far, running the GC under fastcgi has given me pretty good >> results. The >> zombing issue with fast cgi is a known issue with mod_fastcgi and >> I''m pretty >> sure unrelated to RMagick or garbage collection. >> > > Yes, but even you "reclaim" the memory with GC, there will be pieces > that wouldn''t be GC''ed ever, since the leaked in the C side, outside > GC control (some of the RMagick and ImageMagick mysteries).Sure, but leaks are odd things. 
Some processes that appear to be leaking are really just fragmenting (allocating more ram due to lack of ''usable'' space on ''the heap''. Call the GC more often, take a 0.01% performance hit, and monitor. I bet it''ll get better. In fact, you can drop fragmentation the first allocated segment significantly just by calling GC.start after a rubygems load, if you have more than a few gems.>>> >>> Can you tell me how you addressed the "schedule" of the garbage >>> collection execution on your previous scenario? AFAIK most of the >>> frameworks or servers don''t impose to the user how often GC should >>> be >>> performed.In fact there are many rubyists who hate the idea of splatting GC.start into processes. Given what I''ve seen, I''m willing to reject that notion completely. Test yourself, YMMV. FYI, even on windows under the OCI, where performance for the interpreter sucks, really really hard, I couldn''t reliably measure the runtime of a call to GC.start after loading rubygems. I don''t know what kind of ''performance'' people are after, but I can''t see the point in not running the GC more often, especially for ''more common'' daemon load. Furthermore, hitting the kernel for more allocations more often, is actually pretty slow too, so this may actually even result in faster processes under *certain* conditions. Running a lib like RMagick, I would say you *should* be doing this, straight up, no arguments.>> >> In the previous scenario I was using fast_cgi with rails. In my >> previous >> reply I provided a link to the rails fastcgi dispatcher. >> >> http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi >> >> In addtion, in other languages and other language web frameworks >> there are >> provisions to control garbage collection (for languages that have >> garbage >> collections, of course). >> >> >>> I''ll bet is rails specific, or you should take a look at the fcgi >>> ruby >>> extension, since it is responsible, ruby-side, of bridging both >>> worlds. >>> >> >> This is done in the Rails FastCGI dispatcher. I believe that the >> equivalent >> of this in Mongrel is the Mongrel Rails dispatcher. Since the >> Mongrel Rails >> dispatcher is distributed as a part of Mongrel, I''d say this code >> is owned >> by Mongrel, which bridges these two worlds when using mongrel as a >> webserver.It doesn''t *really* matter where you run the GC. It matters that it runs, how often, and what it''s doing. If you''re actually calling on the GC and freeing nothing, that''s stupid, but if you''ve run RMagick up, just call GC.start anyway, and I''m pretty sure it''ll help. There''s certainly no harm in investigating this, unless you''re doing something silly with weakrefs.> Then you could provide a different Mongrel Handler that could perform > that, or even a series of GemPlugins that provide a gc:start instead > of plain ''start'' command mongrel_rails scripts provides.$occasional_gc_run_counter = 0 before_filter :occasional_gc_run def occasional_gc_run $occasional_gc_run_counter += 1 if $occasional_gc_run_counter > 1_000 $occasional_gc_run_counter = 0 GC.start end end Or whatever. It doesn''t really matter that much where you do this, or when, it just needs to happen every now and then. 
More importantly, add a GC.start to the end of environment.rb, and you will have literally half the number of objects in ObjectSpace.>>> On a personal note, I believe is not responsibility of Mongrel, as a >>> webserver, take care of the garbage collection and leakage issues of >>> the Vm on which your application runs. In any case, the GC of the VM >>> (MRI Ruby) should be enhanced to work better with heavy load and >>> long >>> running environments.Right, and it''s not just the interpreter, although indirection around this stuff can help. (such as compacting).>> >> Ruby provides an API to access and call the Garbage Collector. This >> provides ruby application developers the ability to control when >> the garbage >> collection is run because in some cases, there may be an >> application-specific reason to prevent or explicity run the GC. >> Web servers >> are a good example of these applications where state may help >> determine a >> better time to run the GC. As you''re serving each request, you''re >> generally >> allocating a number of objects, then rendering output, then moving >> on to the >> next request. >> >> By limiting the GC to run in between requests rather than during >> requests >> you are trading request time for latency between requests. This is a >> trade-off that I think web application developers should deciede, >> but by no >> means should this be a default or silver bullet for all. My >> position is >> that this just be an option within Mongrel as a web server. >>Right, I think this is important too. You''re absolutely right that there''s no specific place to provide a generic solution. In rails the answer may be simple, but that''s because rails outer architecture is simplistic. No threads, no out-of-request processing, and so on.> > --gc-interval maybe? > > Now that you convinced me and proved your point, having the option to > perform it (optionally, not forced) will be something good to have.Surely you can just: require ''thread'' Thread.new { loop { sleep GC_FORCE_INTERVAL; GC.start } } In environment.rb in that case. Of course, this is going to kill performance under evented_mongrel, thin and so on. I''d stay away from threaded solutions. _why blogged years ago about the GC, trying to remind people that we actually have control. I know ruby is supposed to abstract memory problems etc away from us, and for the most part it does, but hey, no one''s perfect, right? :-) http://whytheluckystiff.net/articles/theFullyUpturnedBin.html> Patches are Welcome ;-)Have fun! :o) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mongrel-users/attachments/20080325/75800781/attachment-0001.html
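Concretely, the two suggestions above boil down to a few lines at the bottom of environment.rb. This is only a sketch: GC_FORCE_INTERVAL is a made-up constant, and the background-thread variant is exactly the one warned against under evented servers, so treat it as optional.

require 'thread'

# run once, at the very end of environment.rb, to sweep the garbage
# left over from loading rubygems and the framework
GC.start

# optional crude periodic sweep; skip this under evented_mongrel/thin,
# where the extra green thread costs more than it saves
GC_FORCE_INTERVAL = 60  # seconds -- an arbitrary value
Thread.new do
  loop do
    sleep GC_FORCE_INTERVAL
    GC.start
  end
end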
On Tue, Mar 25, 2008 at 4:40 AM, James Tucker <jftucker at gmail.com> wrote:> Forgive me for not having read the whole thread, however, there is one thing > that seems to be really important, and that is, ruby hardly ever runs the > damned GC. It certainly doesn''t do full runs nearly often enough (IMO).There''s only one kind of garbage collection sweep. And yeah, depending on what''s happening, GC may not run very often. That''s not generally a problem.> Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions > correctly. I don''t know what rmagick is doing under the hood in this area, > but having been generating large portions of country maps with it (and > moving away from it very rapidly), I know the GC doesn''t do "The Right > Thing".There should be no difference between a GC run that is initiated by the interpreter and one that is initiated by one''s code. It ends up calling the same thing in gc.c. Extensions can easily mismanage memory, though, and I have a hunch about what''s happening with rmagick.> First call of address is GC_MALLOC_LIMIT and friends. For any small script > that doesn''t breach that value, the GC simply doesn''t run. More than this, > RMagick, in it''s apparent ''wisdom'' never frees memory if the GC never runs. > Seriously, check it out. Make a tiny script, and make a huge image with it. > Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will > reach your code before the GC calls on RMagick to free. > > Now, add a call to GC.start, and no OOME. Despite the limitations of it > (ruby performance only IMO), most of the above experience was built up on > windows, and last usage was about 6 months ago, FYI.My hunch is that rmagick is allocating large amounts of RAM ouside of Ruby. It registers its objects with the interpreter, but the RAM usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because Ruby didn''t allocate it, so doesn''t know about it. So, it uses huge amounts of RAM, but doesn''t use huge numbers of objects. Thus you never trigger a GC cycle by exceeding the GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d have to go look at the code to be sure, but the theory fits the behavior that is described very well. I don''t think this is a case for building GC.foo memory management into Mongrel, though. As I think you are suggesting, just call GC.start yourself in your code when necessary. In a typical Rails app doing big things with rmagick, the extra time to do GC.start at the end of the image manipulation, in the request handling, isn''t going to be noticable.> But that''s not really the overall point. My overall point is how to > properly handle a rails app that uses a great deal of memory during each > request. I''m pretty sure this happens in other rails applications that > don''t happen to use ''RMagick''. > > Personally, I''ll simply say call the GC more often. Seriously. I mean it. > It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside > of by ubygems.rb due to stuff I have observed before:I completely concur with this. If there are issues with huge memory use (most likely caused by extensions making RAM allocations outside of Ruby''s accounting, so implicit GC isn''t triggered), just call GC.start in one''s own code.> Now, by my reckoning (and a few production apps seem to be showing > emperically (purely emperical, sorry)) we should be calling on the GC whilst > loading up the apps. 
I mean come on, when are a really serious number of > temporary objects being created. Actually, it''s when rubygems loads, and > that''s the first thing that happens in, hmm, probably over 90% of ruby > processes out there.Just as a tangent, I do this in Swiftiply. I make an explicit call to GC.start after everything is loaded and all configs are parsed, just to make sure execution is going into the main event loop with as much junk cleaned out as possible.> Or whatever. It doesn''t really matter that much where you do this, or when, > it just needs to happen every now and then. More importantly, add a GC.start > to the end of environment.rb, and you will have literally half the number of > objects in ObjectSpace.This makes sense to me. I could also see providing a 2nd Rails handler that had some GC management stuff in it, along with some documentation on what it actually does or does not do, so people can make an explicit choice to use it, if they need it. I''m still completely against throwing code into Mongrel itself for this sort of thing. I just prefer not to throw more things into Mongrel than we really _need_ to, when there is no strong argument for them being inside of Mongrel itself. GC.start stuff is simple enough to put into one''s own code at appropriate locations, or to put into a customized Mongrel handler if one needs it. Maybe this simply needs to be documented in the body of Mongrel documentation? Kirk Haines
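For the rmagick case Kirk describes, the cheap fix is to sweep right where the large allocations happen. A rough sketch of a Rails action doing so (controller name, file path and image geometry are illustrative, not from any real app; RMagick is assumed to be required in environment.rb):

class ImagesController < ApplicationController
  def thumbnail
    img = Magick::Image.read('/data/uploads/big_image.png').first  # hypothetical path
    thumb = img.resize(200, 200)
    send_data thumb.to_blob, :type => 'image/png', :disposition => 'inline'
  ensure
    # the wrapper objects may already be unreachable, but the pixel buffers
    # held by ImageMagick only come back once those wrappers are actually swept
    GC.start
  end
end

Calling destroy! on the images before the sweep, as mentioned later in the thread, releases the buffers even earlier without waiting on the GC at all.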
> My hunch is that rmagick is allocating large amounts of RAM ouside of > Ruby. It registers its objects with the interpreter, but the RAM > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > Ruby didn''t allocate it, so doesn''t know about it.It''s allocating opaque objects on the Ruby heap but not using Ruby''s built-in malloc? That seems pretty evil. Evan> > So, it uses huge amounts of RAM, but doesn''t use huge numbers of > objects. Thus you never trigger a GC cycle by exceeding the > GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d > have to go look at the code to be sure, but the theory fits the > behavior that is described very well. > > I don''t think this is a case for building GC.foo memory management > into Mongrel, though. As I think you are suggesting, just call > GC.start yourself in your code when necessary. In a typical Rails app > doing big things with rmagick, the extra time to do GC.start at the > end of the image manipulation, in the request handling, isn''t going to > be noticable. > > > > But that''s not really the overall point. My overall point is how to > > properly handle a rails app that uses a great deal of memory during each > > request. I''m pretty sure this happens in other rails applications that > > don''t happen to use ''RMagick''. > > > > Personally, I''ll simply say call the GC more often. Seriously. I mean it. > > It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside > > of by ubygems.rb due to stuff I have observed before: > > I completely concur with this. If there are issues with huge memory > use (most likely caused by extensions making RAM allocations outside > of Ruby''s accounting, so implicit GC isn''t triggered), just call > GC.start in one''s own code. > > > > Now, by my reckoning (and a few production apps seem to be showing > > emperically (purely emperical, sorry)) we should be calling on the GC whilst > > loading up the apps. I mean come on, when are a really serious number of > > temporary objects being created. Actually, it''s when rubygems loads, and > > that''s the first thing that happens in, hmm, probably over 90% of ruby > > processes out there. > > Just as a tangent, I do this in Swiftiply. I make an explicit call to > GC.start after everything is loaded and all configs are parsed, just > to make sure execution is going into the main event loop with as much > junk cleaned out as possible. > > > > Or whatever. It doesn''t really matter that much where you do this, or when, > > it just needs to happen every now and then. More importantly, add a GC.start > > to the end of environment.rb, and you will have literally half the number of > > objects in ObjectSpace. > > This makes sense to me. > > I could also see providing a 2nd Rails handler that had some GC > management stuff in it, along with some documentation on what it > actually does or does not do, so people can make an explicit choice to > use it, if they need it. I''m still completely against throwing code > into Mongrel itself for this sort of thing. I just prefer not to > throw more things into Mongrel than we really _need_ to, when there is > no strong argument for them being inside of Mongrel itself. GC.start > stuff is simple enough to put into one''s own code at appropriate > locations, or to put into a customized Mongrel handler if one needs > it. > > Maybe this simply needs to be documented in the body of Mongrel documentation? 
> Kirk Haines

-- Evan Weaver Cloudburst, LLC
At 03:41 AM 3/25/2008, mongrel-users-request at rubyforge.org wrote:>Date: Tue, 25 Mar 2008 10:40:50 +0000 >From: James Tucker <jftucker at gmail.com> >Subject: Re: [Mongrel] mongrel garbage collection >To: mongrel-users at rubyforge.org >Message-ID: <E5223BA4-90BD-4976-8651-33226BC23299 at gmail.com> >Content-Type: text/plain; charset="us-ascii" >[snip] >Also, implicit OOMEs or GC runs quite often DO NOT affect the >extensions correctly. I don''t know what rmagick is doing under the >hood in this area, but having been generating large portions of >country maps with it (and moving away from it very rapidly), I know >the GC doesn''t do "The Right Thing". >[snip]Hi James, My understanding with RMagick is that it is hooking the Imagemagick libs directly in process. As a result, memory is not always freed when you''d expect it to be. I haven''t read up on the details, having chosen to just use out of process image management, but you might find this link interesting - in it, there is a claim that the latest releases of RMagick do *not* in fact leak any memory and that running a full GC manually will reclaim all memory it uses after the references are out of scope. http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618 Steve
On Tue, Mar 25, 2008 at 11:02 AM, Evan Weaver <evan at cloudbur.st> wrote:> > My hunch is that rmagick is allocating large amounts of RAM ouside of > > Ruby. It registers its objects with the interpreter, but the RAM > > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > > Ruby didn''t allocate it, so doesn''t know about it. > > It''s allocating opaque objects on the Ruby heap but not using Ruby''s > built-in malloc? That seems pretty evil.Not really. It''s pretty common in extensions. You alloc your structures in whatever way is appropriate for the library you are using, then use Data_Wrap_Struct with a mark and a free function to hook your stuff into the Ruby garbage collector. Your objects are thus known to Ruby as Ruby objects, but you have potentially large chunks of memory that Ruby itself knows nothing about. Kirk Haines
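A throwaway script makes that accounting gap visible; the image dimensions and loop count below are arbitrary, and it assumes rmagick is installed:

require 'RMagick'

20.times do
  img = Magick::Image.new(4000, 4000)  # tens of MB of pixels, allocated by ImageMagick itself
  img = nil                            # the tiny Ruby wrapper is now garbage
end
# Ruby itself allocated almost nothing, so GC_MALLOC_LIMIT is never reached,
# no sweep happens, and the resident size of the process keeps climbing.

GC.start  # sweeping the wrappers finally runs their free functions and releases the pixel buffers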
On 25 Mar 2008, at 15:26, Kirk Haines wrote:> On Tue, Mar 25, 2008 at 4:40 AM, James Tucker <jftucker at gmail.com> > wrote: >> Forgive me for not having read the whole thread, however, there is >> one thing >> that seems to be really important, and that is, ruby hardly ever >> runs the >> damned GC. It certainly doesn''t do full runs nearly often enough >> (IMO). > > There''s only one kind of garbage collection sweep. And yeah, > depending on what''s happening, GC may not run very often. That''s not > generally a problem.Sure, inside ruby there''s only one kind of run, but....>> Also, implicit OOMEs or GC runs quite often DO NOT affect the >> extensions >> correctly. I don''t know what rmagick is doing under the hood in >> this area, >> but having been generating large portions of country maps with it >> (and >> moving away from it very rapidly), I know the GC doesn''t do "The >> Right >> Thing". > > There should be no difference between a GC run that is initiated by > the interpreter and one that is initiated by one''s code. It ends up > calling the same thing in gc.c. Extensions can easily mismanage > memory, though, and I have a hunch about what''s happening with > rmagick.I just realised the obvious truth, that ruby isn''t actually running the GC under those OOME conditions.>> First call of address is GC_MALLOC_LIMIT and friends. For any small >> script >> that doesn''t breach that value, the GC simply doesn''t run. More >> than this, >> RMagick, in it''s apparent ''wisdom'' never frees memory if the GC >> never runs. >> Seriously, check it out. Make a tiny script, and make a huge image >> with it. >> Hell, make 20, get an OOME, and watch for a run of the GC. The OOME >> will >> reach your code before the GC calls on RMagick to free. >> >> Now, add a call to GC.start, and no OOME. Despite the limitations >> of it >> (ruby performance only IMO), most of the above experience was built >> up on >> windows, and last usage was about 6 months ago, FYI. > > My hunch is that rmagick is allocating large amounts of RAM ouside of > Ruby. It registers its objects with the interpreter, but the RAM > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > Ruby didn''t allocate it, so doesn''t know about it.Yup, it''s ImageMagick, un-patched and they don''t provide afaik a callback to replace malloc, or maybe that''s an rmagick issue.> So, it uses huge amounts of RAM, but doesn''t use huge numbers of > objects. Thus you never trigger a GC cycle by exceeding the > GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d > have to go look at the code to be sure, but the theory fits the > behavior that is described very well.Right, in fact, I think the OOME actually comes from outside of ruby (unverified), and ruby can''t or won''t run the GC before going down. As the free() calls inside RMagick / ImageMagick aren''t happening without calling GC.start. The GC.start call, somewhere/how is being used to trigger frees in the framework. Personally, this is bad design, and the really common complaints may also suggest so, however, I don''t know what their domain specific issues and limitations are. Maybe it''s an ImageMagick thing. Creating an OOME inside ruby, the interpreter calls on GC.start prior to going down. I started talking to zenspider about this stuff, and eventually he just pointed me at gc.c, fair enough. I still hold the opinion that an OOME hitting the interpreter (from whatever source) should attempt to invoke the GC. 
Of course, a hell of a lot of software doesn''t check the result of a call to malloc(), tut tut. Tool: http://ideas.water-powered.com/projects/libgreat> I don''t think this is a case for building GC.foo memory management > into Mongrel, though. As I think you are suggesting, just call > GC.start yourself in your code when necessary. In a typical Rails app > doing big things with rmagick, the extra time to do GC.start at the > end of the image manipulation, in the request handling, isn''t going to > be noticable.Absolutely right, and yes, this is my opinion.>> But that''s not really the overall point. My overall point is how to >> properly handle a rails app that uses a great deal of memory during >> each >> request. I''m pretty sure this happens in other rails applications >> that >> don''t happen to use ''RMagick''. >> >> Personally, I''ll simply say call the GC more often. Seriously. I >> mean it. >> It''s not *that* slow, not at all. In fact, I call GC.start >> explicitly inside >> of by ubygems.rb due to stuff I have observed before: > > I completely concur with this. If there are issues with huge memory > use (most likely caused by extensions making RAM allocations outside > of Ruby''s accounting, so implicit GC isn''t triggered), just call > GC.start in one''s own code. > >> Now, by my reckoning (and a few production apps seem to be showing >> emperically (purely emperical, sorry)) we should be calling on the >> GC whilst >> loading up the apps. I mean come on, when are a really serious >> number of >> temporary objects being created. Actually, it''s when rubygems >> loads, and >> that''s the first thing that happens in, hmm, probably over 90% of >> ruby >> processes out there. > > Just as a tangent, I do this in Swiftiply. I make an explicit call to > GC.start after everything is loaded and all configs are parsed, just > to make sure execution is going into the main event loop with as much > junk cleaned out as possible.I''ve done similar in anything that is running as a fire and forget style daemon. You know, the kinds of things that get setup once, and run for 1 to 20 years. There are several that I have never restarted. No rails, though. These kinds of things I also simply don''t want to waste the ram to silly fragmentation, the next allocation takes you up to a registerable percentage on medium aged machines. IIRC there''s one in my copy of analogger too, or maybe you had that in there already :-)>> Or whatever. It doesn''t really matter that much where you do this, >> or when, >> it just needs to happen every now and then. More importantly, add a >> GC.start >> to the end of environment.rb, and you will have literally half the >> number of >> objects in ObjectSpace. > > This makes sense to me. > > I could also see providing a 2nd Rails handler that had some GC > management stuff in it, along with some documentation on what it > actually does or does not do, so people can make an explicit choice to > use it, if they need it. I''m still completely against throwing code > into Mongrel itself for this sort of thing. I just prefer not to > throw more things into Mongrel than we really _need_ to, when there is > no strong argument for them being inside of Mongrel itself. GC.start > stuff is simple enough to put into one''s own code at appropriate > locations, or to put into a customized Mongrel handler if one needs > it.If it wasn''t app specific I''d say put it in mongrel. It is though, and peoples tendency to pre-optimize probably makes this pointless. 
I mean the cost of doing it in a thread under eventmachine is way higher than the ram usage costs for pure ruby apps, at least for my pure ruby apps. 20-40mb vs. lots of req. / sec. But then, one could check for better alternatives, like add_timer(), etc, but that route tends towards bloat, so your original assertion of put it in the app configuration, is what I would choose.> Maybe this simply needs to be documented in the body of Mongrel > documentation?Maybe not even there. I think research needs to be done into the longer running effects of the GC under real environments. I know some people have done some (including myself), but the results are never released in public. The GC also seems to be one of those topics, as it''s so close to performance, where people are happy to see how high up the wall they can go, prior to doing research. With regard to mongrel and this stuff, it''s really not a mongrel issue. Mongrel is a great citizen wrt the GC (at least by comparison to a lot of other code). Particularly bad citizens in this area include: - Every single pure ruby pdf lib I''ve seen - rubygems (by way of the spec loading semantics, not rubygems itself, kinda (lets just say, I''d do it different, but by design, not implementation)) - rails - rmagick> > > > Kirk Haines > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users
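For completeness, the timer route mentioned above (and then argued against) looks roughly like this inside an EventMachine reactor; the interval is arbitrary:

require 'eventmachine'

EventMachine.run do
  # sweep on the reactor instead of a sleeping green thread; in an evented
  # mongrel or thin process the reactor is already running, so only the
  # add_periodic_timer line would be needed
  EventMachine.add_periodic_timer(60) { GC.start }
end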
On 25 Mar 2008, at 17:05, Steve Midgley wrote:> At 03:41 AM 3/25/2008, mongrel-users-request at rubyforge.org wrote: >> Date: Tue, 25 Mar 2008 10:40:50 +0000 >> From: James Tucker <jftucker at gmail.com> >> Subject: Re: [Mongrel] mongrel garbage collection >> To: mongrel-users at rubyforge.org >> Message-ID: <E5223BA4-90BD-4976-8651-33226BC23299 at gmail.com> >> Content-Type: text/plain; charset="us-ascii" >> [snip] >> Also, implicit OOMEs or GC runs quite often DO NOT affect the >> extensions correctly. I don''t know what rmagick is doing under the >> hood in this area, but having been generating large portions of >> country maps with it (and moving away from it very rapidly), I know >> the GC doesn''t do "The Right Thing". >> [snip] > > Hi James, > > My understanding with RMagick is that it is hooking the Imagemagick > libs directly in process. As a result, memory is not always freed when > you''d expect it to be. I haven''t read up on the details, having chosen > to just use out of process image management, but you might find this > link interesting - in it, there is a claim that the latest releases of > RMagick do *not* in fact leak any memory and that running a full GC > manually will reclaim all memory it uses after the references are out > of scope.Thank you for kindly ensuring that I got this. We actually moved completely away from anything ImageMagick based. There really wasn''t any sensible way to ''fix'' it. Whilst destroy! looks ok, even when doing what we were (high res tiling, covering around 250 square miles), we found performance was fine and we could avoid all allocation issues by using the crazy thread solution (Thread.new { loop { sleep some_time; GC.start } }). This is all good in most scenarios but then there are times when running a framework like eventmachine, where threads (yes, even green ones) can be total performance killers too. Mind you, under rails, there''s always a linear reaction run, so I''m not going to speculate more on that detail. It''s also OT for here, mostly...> http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618 > > SteveThanks again, James. P.S. Personally, if I was coming up against this problem today, I''d drop out to a separate process, driven by something like background job if under rails. If under pure ruby, I''d use drb or eventmachine + a marshalling protocol, depending on specific requirements. The biggest issue for our old project was hitting swap / page file. Image rendering, when you''re already working on the per-pixel layer, is really easy to divide up, though, so optimizing for speed is pretty easy really. When it comes to background concurrent scheduling, staying away from ACID can really help, too, but that really is another topic for another time. Lets just say, allow slack, and life will be easier if you ever hit a silly scale. I''ve seen people trying to recover broken ACID implementations by trawling logs, and my god, tearful.
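For anyone taking the out-of-process route without a job queue, the simplest form is shelling out to ImageMagick directly, as Luis suggested earlier in the thread; the paths and geometry here are illustrative:

# no ImageMagick memory ever lives inside the Rails process this way
def make_thumbnail(src, dst, geometry = '200x200')
  system('convert', src, '-resize', geometry, dst) or
    raise "convert failed for #{src}"
end

make_thumbnail('/data/uploads/big_image.png', '/data/uploads/big_image_thumb.png')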
On Mon, 24 Mar 2008 08:21:52 -0700 "Scott Windsor" <swindsor at gmail.com> wrote:> Right now, my processes aren''t gigantic... I''m preparing for a ''worst case'' > scenario when I have a extremely large processes or memory usage. This can > easily happen on specific applications such as an image server (using image > magick) or parsing/creating large xml payloads (a large REST server). For > those applications, I may have a large amount of memory used for each > request, which will increase until the GC is run.Well, does that mean you DO have this problem or DO NOT have this problem? If you aren''t actually facing a real problem that could be solved by this change then you most likely won''t get very far. Any imagined scenario you come up with could easily just be avoided. If you want to do this then you''ll have to write code and you''ll have to learn how to make a Mongrel handler that is registered before and after the regular rails handler. All you do is have this before handler analyze the request and disable the GC on the way in. In the after handler you just have to renable the GC and make it do the work. It''s pretty simple, but *you* will have to go read Mongrel code and understand it first. Otherwise you''re just wasting your time really. -- Zed A. Shaw - Hate: http://savingtheinternetwithhate.com/ - Good: http://www.zedshaw.com/ - Evil: http://yearofevil.com/
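A rough sketch of the handler pair Zed describes might look like the following. The class names and request threshold are made up, and it assumes Mongrel runs the handlers registered for a URI in registration order, with the Rails handler sandwiched between the two; check the handler documentation before relying on that.

require 'mongrel'

# registered before the Rails handler: keep the GC out of the request
class GCDisableHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.disable
  end
end

# registered after the Rails handler: re-enable the GC and sweep every N requests
class GCEnableHandler < Mongrel::HttpHandler
  def initialize(every = 50)
    @every = every
    @count = 0
  end

  def process(request, response)
    GC.enable
    @count += 1
    if @count >= @every
      @count = 0
      GC.start
    end
  end
end

All of the interesting policy (skipping the sweep for cheap requests, only sweeping when the listener is otherwise idle, and so on) would live in those two process methods, which is also where the per-application nature of the problem shows up.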
You''re right, ok. So the memory causing the OOM error isn''t actually on the Ruby heap, but it can''t get freed until the opaque object gets GC''d. Evan On Tue, Mar 25, 2008 at 1:20 PM, Kirk Haines <wyhaines at gmail.com> wrote:> On Tue, Mar 25, 2008 at 11:02 AM, Evan Weaver <evan at cloudbur.st> wrote: > > > My hunch is that rmagick is allocating large amounts of RAM ouside of > > > Ruby. It registers its objects with the interpreter, but the RAM > > > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > > > Ruby didn''t allocate it, so doesn''t know about it. > > > > It''s allocating opaque objects on the Ruby heap but not using Ruby''s > > built-in malloc? That seems pretty evil. > > Not really. It''s pretty common in extensions. You alloc your > structures in whatever way is appropriate for the library you are > using, then use Data_Wrap_Struct with a mark and a free function to > hook your stuff into the Ruby garbage collector. > > Your objects are thus known to Ruby as Ruby objects, but you have > potentially large chunks of memory that Ruby itself knows nothing > about. > > > > > Kirk Haines > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users >-- Evan Weaver Cloudburst, LLC