Sorry for the re-post, but I'm new to the mailing list and wanted to bring back up an old topic I saw in the archives:

http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html

I think a patch to delay garbage collection and run it later is pretty important for high performance web applications. I do understand the trade-offs of explicit vs. implicit garbage collection, and would much prefer to off-load my garbage collection until a later point (when users are not waiting for a request). I agree with the earlier points that this could very well be rails-specific, but isn't this a feature that would benefit all of the frameworks that use mongrel?

This could easily be added as a configuration option to run after N requests, or to let the GC behave as normal and run when needed - the default, of course, being to let the GC run whenever it deems necessary. The explicit collection would happen after processing a request, but before listening for any new requests.

- scott
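For concreteness, the kind of thing being proposed might look like this (just a sketch, not a patch; the 50-request threshold is arbitrary, and how it gets called from the server is exactly the open question):

class PeriodicGC
  def initialize(interval = 50)
    @interval = interval
    @count = 0
  end

  # Meant to be called once per request, after the response has gone out
  # and before the server starts listening for the next request.
  def request_finished
    @count += 1
    if @count >= @interval
      @count = 0
      GC.start   # explicit full sweep while no client is waiting
    end
  end
end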
On Fri, Mar 21, 2008 at 12:12 PM, Scott Windsor <swindsor at gmail.com> wrote:
> Sorry for the re-post, but I'm new to the mailing list and wanted to bring
> back up an old topic I saw in the archives.
>
> http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html
>
> I think a patch to delay garbage collection and run it later is pretty
> important for high performance web applications. I do understand the

In the vast majority of cases you are going to do a worse job of determining when and how often to run the GC than even MRI Ruby's simple algorithms. MRI garbage collection stops the world -- nothing else happens while the GC runs -- so when talking about overall throughput on an application, you don't want it to run any more than necessary.

I don't use Rails, but in the past I have experimented with this quite a lot under IOWA, and in my normal applications (i.e. not using RMagick) I could never come up with an algorithm of self-managed GC.disable/GC.enable/GC.start that gave the same overall level of throughput that I got by letting Ruby start the GC according to its own algorithms. That experience makes me skeptical of the approach in the general case, though there are occasional specific cases where it can be useful.

Kirk Haines
On Fri, Mar 21, 2008 at 11:49 AM, Kirk Haines <wyhaines at gmail.com> wrote:
> In the vast majority of cases you are going to do a worse job of
> determining when and how often to run the GC than even MRI Ruby's
> simple algorithms. MRI garbage collection stops the world -- nothing
> else happens while the GC runs -- so when talking about overall
> throughput on an application, you don't want it to run any more than
> necessary.

I understand that the GC is quite knowledgeable about when to run garbage collection when examining the heap. But the GC doesn't know anything about my application or its state. The fact that everything stops when the GC runs is exactly why I'd prefer to limit when the GC will run. I'd rather it run outside of serving a web request than right in the middle of serving requests.

I know that the ideal situation is to not need to run the GC, but the reality is that I'm using various gems and plugins, and not all are well behaved and free of memory leaks. Rails itself may also have regular leaks from time to time, and I'd prefer to have my application be consistently slow rather than randomly (and unexpectedly) slow. The alternative is to terminate your application after N requests and never run the GC, which I'm not a fan of.

- scott
On Fri, Mar 21, 2008 at 1:23 PM, Scott Windsor <swindsor at gmail.com> wrote:
> I understand that the GC is quite knowledgeable about when to run garbage
> collection when examining the heap. But the GC doesn't know anything about
> my application or its state. The fact that when the GC runs everything
> stops is why I'd prefer to limit when the GC will run. I'd rather it run
> outside of serving a web request rather than when it's right in the middle
> of serving requests.

It doesn't matter, if one is looking at overall throughput. And how long do your GC runs take? If you have a GC invocation that is noticeable on a single request, your processes must be gigantic, which would suggest to me that there's a more fundamental problem with the app.

> I know that the ideal situation is to not need to run the GC, but the
> reality is that I'm using various gems and plugins and not all are well
> behaved and free of memory leaks. Rails itself may also have regular leaks

No, it's impractical to never run the GC. The ideal situation, at least where execution performance and throughput on a high performance app is concerned, is to intelligently reduce how often it needs to run by paying attention to your object creation -- in particular, to throwaway object creation.

> from time to time and I'd prefer to have my application consistently be slow
> than randomly (and unexpectedly) be slow. The alternative is to terminate
> your application after N number of requests and never run the GC, which I'm
> not a fan of.

If your goal is to deal with memory leaks, then you really need to define what that means in a GC'd language like Ruby. To me, a leak is something that consumes memory in a way that eludes the GC's ability to track it and reuse it. The fundamental nature of that sort of thing is that the GC can't help you with it.

If by leaks you mean code that just creates a lot of objects that the GC needs to clean up, then those aren't leaks. It may be inefficient code, but it's not a memory leak.

And in the end, while disabling GC over the course of a request may result in processing that one request more quickly than it would have been processed otherwise, the disable/enable dance is going to cost you something. You'll likely either end up using more RAM than you otherwise would have in between GC calls, resulting in bigger processes, or you end up calling GC more often than you otherwise would have, reducing your high performance app's throughput. And for the general case, that's not an advantageous situation.

To be more specific: if excessive RAM usage and GC costs that are noticeable to the user during requests are a common thing for Rails apps, and the reason for that is bad code in Rails and not just bad user code, then the Rails folks should be the targets of a conversation on the matter. Mongrel itself, though, does not need to be, and should not be, playing manual memory management games on behalf of a web framework.

Kirk Haines
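For reference, the "disable/enable dance" being weighed here is roughly this pattern (a sketch only; dispatch_request is a hypothetical stand-in for whatever actually handles the request in a given framework):

def serve_with_deferred_gc(request, response)
  GC.disable                          # hold the collector off while a client is waiting
  dispatch_request(request, response)
ensure
  GC.enable
  GC.start                            # pay the collection cost between requests instead
end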
> The alternative is to terminate your application after N number of requests and never run the
> GC, which I'm not a fan of.

WSGI (Python) can do that, and it's a pretty nice alternative to having Monit kill a leaky app that may have a bunch of requests queued up (Mongrel soft shutdown notwithstanding).

Evan
> You'll likely either end up using more RAM than you otherwise would
> have in between GC calls, resulting in bigger processes

This is definitely true. Keep in mind that the in-struct mark phase means that the entire process has to lurch out of swap whenever the GC runs. Since the process is now much bigger, and the pages idled longer and are more likely to be swapped out, that can be a pretty brutal hit.

Evan
--
Evan Weaver
Cloudburst, LLC
At 01:19 PM 3/21/2008, Kirk Haines wrote:
> On Fri, Mar 21, 2008 at 1:23 PM, Scott Windsor <swindsor at gmail.com> wrote:
> > I understand that the GC is quite knowledgeable about when to run garbage
> > collection when examining the heap. But the GC doesn't know anything about
> > my application or its state. The fact that when the GC runs everything
> > stops is why I'd prefer to limit when the GC will run. I'd rather it run
> > outside of serving a web request rather than when it's right in the middle
> > of serving requests.
>
> It doesn't matter, if one is looking at overall throughput.

Hi Kirk,

One thought on this - would it be possible to schedule GC to run just after all the html has been rendered to the client from Rails, but while leaving open the connection (so that mongrel is blocked on Rails)?

If so, it seems like if one were using something like nginx fair proxy, then the mongrel would be running its garbage collection AFTER the client got all its html but BEFORE any new requests were sent to it.

In a fully loaded server it wouldn't matter at all, but most environments have a little headroom at least, so nginx fair proxy would just route around the mongrel that is still running a GC at the end of its Rails loop. So total throughput for a given (non-max) volume of requests might be unaffected, since nothing would ever pile up behind a Rails process that has slowed down to run GC (and the client would be happy since they got all their html before the GC started).

I have no idea if this is meaningful, but I've been playing with some performance tests against mongrel + nginx fair proxy and it occurs to me that this might be relevant.

Best,
Steve
On 22/03/2008, at 8:19 AM, Steve Midgley wrote:
> If so, it seems like if one were using something like nginx fair proxy,
> then the mongrel would be running its garbage collection AFTER the
> client got all its html but BEFORE any new requests were sent to it.
>
> In a fully loaded server it wouldn't matter at all, but most
> environments have a little headroom at least, so that nginx fair proxy
> would just route around the mongrel that is still running a GC at the
> end of its Rails loop.

That would only be true if you set the connect timeout on the backend to 1 second AND your GC pass took longer than 1 second.
On Fri, Mar 21, 2008 at 1:19 PM, Kirk Haines <wyhaines at gmail.com> wrote:
> It doesn't matter, if one is looking at overall throughput. And how
> long do your GC runs take? If you have a GC invocation that is
> noticeable on a single request, your processes must be gigantic, which
> would suggest to me that there's a more fundamental problem with the
> app.

Right now, my processes aren't gigantic... I'm preparing for a 'worst case' scenario when I have extremely large processes or memory usage. This can easily happen in specific applications such as an image server (using image magick) or one parsing/creating large xml payloads (a large REST server). For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

> No, it's impractical to never run the GC. The ideal situation, at
> least where execution performance and throughput on a high performance
> app is concerned, is to just intelligently reduce how often it needs
> to run by paying attention to your object creation. In particular,
> pay attention to the throwaway object creation.

There may be perfectly good reasons to have intermediate object creation (good encapsulation, usage of another library/gem you can't modify, large operations that you need to keep atomic). While ideally you'd fix the memory usage problem, this doesn't solve all cases.

> If your goal is to deal with memory leaks, then you really need to
> define what that means in a GC'd language like Ruby.
> To me, a leak is something that consumes memory in a way that eludes
> the GC's ability to track it and reuse it. The fundamental nature of
> that sort of thing is that the GC can't help you with it.

Yes, for Ruby (and other GC'd languages) it's much harder to leak memory such that the GC can never clean it up - but it does (and has) happened. This case I'm less concerned about, as a leak of this magnitude should be considered a bug and fixed.

> If by leaks, you mean code that just creates a lot of objects that the
> GC needs to clean up, then those aren't leaks. It may be inefficient
> code, but it's not a memory leak.

Inefficient it may be - but it might just be optimizing for a different problem. For example, take ActiveRecord's association cache and its query cache. If you're doing a large number of queries each page load, ActiveRecord is still going to cache them for each request - this is far better than further round trips to the database, but may lead to a large amount of memory consumed per request.

> And in the end, while disabling GC over the course of a request may
> result in processing that one request more quickly than it would have
> been processed otherwise, the disable/enable dance is going to cost
> you something.

Agreed. But again, I'd rather it be a constant cost outside of processing a request than a variable cost inside of processing a request.

> You'll likely either end up using more RAM than you otherwise would
> have in between GC calls, resulting in bigger processes, or you end up
> calling GC more often than you otherwise would have, reducing your
> high performance app's throughput.
>
> And for the general case, that's not an advantageous situation.

This can vary from application to application - all the more reason to make this a configurable option (and not the default).

> To be more specific, if excessive RAM usage and GC costs that are
> noticeable to the user during requests are a common thing for Rails
> apps, and the reason for that is bad code in Rails and not just bad
> user code, then the Rails folks should be the targets of a
> conversation on the matter. Mongrel itself, though, does not need to
> be, and should not be, playing manual memory management games on
> behalf of a web framework.

I still disagree on this point - I doubt that Rails is the only web framework that would benefit from being able to control when the GC is run. This is going to be a common problem across frameworks whenever web applications are consuming and then releasing large amounts of memory - I'd say it can be a pretty common use case for certain types of web applications.

- scott
If you plan on regularly killing your application (for whatever reason), then this is a pretty good option. This is a pretty common practice for apache modules and fastcgi applications as a hold-over from dealing with older leaky C apps. I'd personally prefer for my Ruby web apps to re-run the GC rather than have to pay the startup/shutdown/parse configs/connect to external resources costs, but that's because they are far less likely to leak memory that the GC can't catch or get into an unstable state.

- scott

On Fri, Mar 21, 2008 at 1:22 PM, Evan Weaver <evan at cloudbur.st> wrote:
> > The alternative is to terminate your application after N number of
> > requests and never run the GC, which I'm not a fan of.
>
> WSGI (Python) can do that, and it's a pretty nice alternative to
> having Monit kill a leaky app that may have a bunch of requests queued
> up (Mongrel soft shutdown not withstanding).
>
> Evan
On Sat, Mar 22, 2008 at 3:39 AM, Dave Cheney <dave at cheney.net> wrote:
> That would only be true if you set the connect timeout on the backend to 1
> second AND your GC pass took longer than 1 second.

Yes, but the worst case here is that another request gets delayed before processing. Still potentially better (IMHO) than dealing with this delay while processing a request.

- scott
At 08:21 AM 3/24/2008, Scott Windsor wrote:
> Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> scenario when I have extremely large processes or memory usage. This can
> easily happen in specific applications such as an image server (using image
> magick) or parsing/creating large xml payloads (a large REST server). For
> those applications, I may have a large amount of memory used for each
> request, which will increase until the GC is run.
>
> [snip]
>
> There may be perfectly good reasons to have intermediate object creation
> (good encapsulation, usage of another library/gem you can't modify, large
> operations that you need to keep atomic). While ideally you'd fix the
> memory usage problem, this doesn't solve all cases.

Hi Scott,

I hope this somewhat OT post is ok (feedback welcome). I've had memory problems with image magick too - even when it runs out of process. On certain (rare but reasonably sized) image files it seems to go memory haywire, eating too much memory and throwing my app stack into swap.

So I wrote this simple rails plug-in which is very limited in function, but does mostly what I needed from an image processor. Notably for your issue above, it lets you easily specify limits on how much memory image magick is allowed to consume while doing its work (thanks to Ara Howard for initial direction on that one). It might be of interest to you:

http://www.misuse.org/science/2008/01/30/mojomagick-ruby-image-library-for-imagemagick/

Best,
Steve
On Mon, Mar 24, 2008 at 9:21 AM, Scott Windsor <swindsor at gmail.com> wrote:
> Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> scenario when I have extremely large processes or memory usage. This can
> easily happen in specific applications such as an image server (using image
> magick) or parsing/creating large xml payloads (a large REST server). For
> those applications, I may have a large amount of memory used for each
> request, which will increase until the GC is run.

(*nod*) image magick is a well known bad citizen. Either don't use it at all, or use it in an external process from your web app processes. And if, for whatever reason, you must use it inside of your web app process, and your use case really does create processes so enormous that you can perceive a response lag from a manual GC.start inside of your request processing, then create a custom Rails handler that does it. You can trivially alter it to do whatever GC.foo actions you desire. The code is simple and easy to follow, so just make your own Mongrel::Rails::RailsHandlerWithParanoidGCManagement.

> There may be perfectly good reasons to have intermediate object creation
> (good encapsulation, usage of another library/gem you can't modify, large
> operations that you need to keep atomic). While ideally you'd fix the
> memory usage problem, this doesn't solve all cases.

Obviously. It's easy and convenient to ignore the issue, and often the issue doesn't matter for a given piece of code. But if memory usage or execution speed becomes an issue for one's code, going back and taking a look at the throwaway object creation, and addressing it, can net considerable improvements.

> Yes, for Ruby (and other GC'd languages), it's much harder to leak memory
> such that the GC can never clean it up - but it does (and has) happened.
> This case I'm less concerned about as a leak of this magnitude should be
> considered a bug and fixed.

Oh, I know. That's why I brought it up, though. You were talking about memory leaks, so I wanted to make a distinction. Real leaks, like the Array#shift bug, or leaky continuations, or badly behaved Ruby extensions, aren't affected by GC manipulations.

> Inefficient it may be - but it might be just optimizing for a different
> problem. For example, take ActiveRecord's association cache and its query
> cache. If you're doing a large number of queries each page load,
> ActiveRecord is still going to cache them for each request - this is far
> better than further round trips to the database, but may lead to a large
> amount of memory consumed per each request.

Sure. And if it's optimizing for a different problem, then that's fine, so long as the optimization isn't creating a worse problem than the issue it's trying to address.

But that's also largely irrelevant, I think. I just did a quick test. I created a program that creates 10 million objects. It has a footprint of about a gigabyte of RAM usage. It takes Ruby 0.5 seconds to walk those 10 million objects on my server. If you have a web app that has processes anywhere near that large, you have bigger problems to deal with. And if you have a more reasonably large, million-object app, then on my server the GC cost would be 0.05 seconds. Given the typical speed of Rails apps, an occasional 0.05 second delay is going to be unnoticeable.

> Agreed. But again, I'd rather it be a constant cost outside of processing a
> request than a variable cost inside of processing a request.

You're worrying about something that just isn't a problem in the vast, vast majority of cases. Again, testing on my server, even with a very simple, very fast piece of code creating objects, it takes almost 20x as long to create the objects as to GC them.

> This can vary from application to application - all the more reason to make
> this a configurable option (and not the default).

It's still my position that it's not Mongrel's job to be implementing a manual memory management scheme that is almost always going to be a performance loser over just leaving it alone. It's still my position that if one has an application that, through testing, has been shown to have a use case where it can actually benefit from manual GC.foo management, then one can trivially create a Mongrel handler that will do this for you.

> I still disagree on this point - I doubt that Rails is the only web
> framework that would benefit from being able to control when the GC is run.
> This is going to be a common problem across frameworks whenever web
> applications are consuming then releasing large amounts of memory - I'd say
> it can be a pretty common use case for certain types of web applications.

My point is that if it is _Rails_ code that is causing the problem, that's a _Rails_ problem. My point is also that manual GC.foo management is going to cause more problems than it helps for the vast majority of applications. GC cycles aren't that slow, especially compared to the speed of a typical Rails app, and certainly not when compared to the speed of a Rails request that makes a lot of objects and does any sort of intensive, time consuming operations.

Kirk Haines
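Kirk's measurement is easy to reproduce with something along these lines (the exact numbers will obviously vary by machine and Ruby version):

require 'benchmark'

objects = Array.new(10_000_000) { Object.new }   # roughly ten million live objects
puts Benchmark.realtime { GC.start }             # time one full stop-the-world sweep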
On Tue, Mar 25, 2008 at 11:53 AM, Zed A. Shaw <zedshaw at zedshaw.com> wrote:
> On Mon, 24 Mar 2008 08:21:52 -0700
> "Scott Windsor" <swindsor at gmail.com> wrote:
> > Right now, my processes aren't gigantic... I'm preparing for a 'worst case'
> > scenario when I have extremely large processes or memory usage. This can
> > easily happen in specific applications such as an image server (using image
> > magick) or parsing/creating large xml payloads (a large REST server). For
> > those applications, I may have a large amount of memory used for each
> > request, which will increase until the GC is run.
>
> Well, does that mean you DO have this problem or DO NOT have this
> problem? If you aren't actually facing a real problem that could be
> solved by this change then you most likely won't get very far. Any
> imagined scenario you come up with could easily just be avoided.

Right now my current deployment configuration for all my rails applications is using apache + fastcgi. With this deployment strategy, if I don't set the garbage collection in my dispatch.fcgi, any rails application I use that uses image magick (for resizing/effects/etc) eats memory like a hog. In my dispatch...

http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

I usually set this to around 50 executions per GC run and my rails apps seem pretty happy.

This has been working great for me thus far, but using mod_fastcgi leaves zombie processes occasionally during restart. Checking the docs, mod_fastcgi is more or less deprecated and mod_fcgid is preferred; mod_fcgid has all sorts of issues (random 500s and the like), and to boot the documentation is quite poor. So, I've decided to move my apps over to nginx proxying to mongrel. The decision to move to nginx is pretty minor (it's lighter weight and easier to configure), but my decision to move to mongrel warranted a bit of research. I do want to ensure that all of my applications behave properly in terms of memory consumption, and the first thing I've noticed is that mongrel doesn't have the same options available for customizing when the GC runs. This leads me to believe that either there's something specific to rails running under FastCGI that requires the GC to be disabled/enabled during process execution, or mongrel hasn't implemented the feature yet.

> If you want to do this then you'll have to write code and you'll have
> to learn how to make a Mongrel handler that is registered before and
> after the regular rails handler. All you do is have this before handler
> analyze the request and disable the GC on the way in. In the after
> handler you just have to re-enable the GC and make it do the work.
>
> It's pretty simple, but *you* will have to go read Mongrel code and
> understand it first. Otherwise you're just wasting your time really.
>
> --
> Zed A. Shaw
> - Hate: http://savingtheinternetwithhate.com/
> - Good: http://www.zedshaw.com/
> - Evil: http://yearofevil.com/

Sounds good to me - I don't mind writing code, I just want to see, if I do spend the time, whether it's something the mongrel community would accept...

Quick question about the code change: counting the number of requests served and determining the GC behavior should be done inside a mutex (or we start to run the risk of running the GC twice or mis-counting the number of requests processed). I don't see any common mutex used for all mongrel dispatchers; the logic is specific within each type of http handler (rails, camping, etc). Would it make sense then to put the optional GC run check (and GC run, if applicable) within the synchronize block for each http handler, or is this something that should live in the base HttpHandler class?

- scott
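A sketch of the handler pair Zed describes might look like the following (untested; it assumes Mongrel's behavior of running every handler registered on a URI in order, and it keeps its own mutex as Scott suggests rather than relying on the Rails dispatcher's lock; the 50-request interval is arbitrary):

require 'mongrel'
require 'thread'

# Registered before the Rails handler: turn the collector off on the way in.
class GCDisableHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.disable
  end
end

# Registered after the Rails handler: turn the collector back on and,
# every N requests, do the actual work between requests.
class GCEnableHandler < Mongrel::HttpHandler
  def initialize(interval = 50)
    @interval = interval
    @count = 0
    @lock = Mutex.new
  end

  def process(request, response)
    GC.enable
    run = @lock.synchronize do
      @count += 1
      if @count >= @interval
        @count = 0
        true
      else
        false
      end
    end
    GC.start if run
  end
end

Presumably these would be attached to the same URI as the Rails handler in the configurator, one in front of it and one behind - but that wiring is exactly the part that needs checking against the Mongrel source, as Zed says.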
On Mon, Mar 24, 2008 at 3:58 PM, Scott Windsor <swindsor at gmail.com> wrote:
> Right now my current deployment configuration for all my rails applications
> is using apache + fastcgi.
>
> With this deployment strategy, if I don't set the garbage collection in my
> dispatch.fcgi, any rails application I use that uses image magick (for
> resizing/effects/etc) eats memory like a hog. In my dispatch...
> http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi
> I usually set this to around 50 executions per GC run and my rails apps seem
> pretty happy.

You're using *RMagick*, not ImageMagick directly. If you used the latter (via system calls) there would be no memory leakage to worry about.

> This has been working great for me thus far, but using mod_fastcgi leaves
> zombie processes occasionally during restart. Checking the docs,
> mod_fastcgi is more or less deprecated, and mod_fcgid is preferred.
> mod_fcgid has all sorts of issues (random 500s and the like), and to boot
> the documentation is quite poor.

Moving from FastCGI to Mongrel will also require you to monitor your cluster processes with external tools, since you're using things that leak too much memory, like RMagick, and that requires restarts of the process. To make it clear: the memory leaked by RMagick cannot be recovered with the garbage collection mechanism. I tried that several times, but in the long run it required restarting and hunting down all the zombie processes left by Apache.

> So, I've decided to move my apps over to nginx proxying to mongrel. The
> decision to move to nginx is pretty minor (it's lighter weight and easier to
> configure), but my decision to move to mongrel warranted a bit of research.
> I do want to ensure that all of my applications behave properly in terms of
> memory consumption and the first thing I've noticed is that mongrel doesn't
> have the same options available for customizing when the GC runs.

Can you tell me how you addressed the "schedule" of the garbage collection execution in your previous scenario? AFAIK most frameworks or servers don't impose on the user how often GC should be performed.

> This leads me to believe that either there's something specific to rails
> running under FastCGI that requires the GC to be disabled/enabled during
> process execution or mongrel hasn't implemented the feature yet.

I'll bet it is rails specific, or you should take a look at the fcgi ruby extension, since it is responsible, ruby-side, for bridging both worlds.

On a personal note, I believe it is not the responsibility of Mongrel, as a webserver, to take care of the garbage collection and leakage issues of the VM on which your application runs. In any case, the GC of the VM (MRI Ruby) should be enhanced to work better with heavy load and long running environments.

--
Luis Lavena
Multimedia systems
-
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams
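The system-call route Luis mentions amounts to something like this (a sketch; the convert arguments and paths are made up for illustration):

# Let the ImageMagick command-line tool do the work in a child process,
# so whatever memory it churns through goes back to the OS when it exits.
def resize_image(src, dst, geometry = '640x480')
  system('convert', src, '-resize', geometry, dst) or
    raise "convert failed for #{src}"
end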
On Mon, Mar 24, 2008 at 12:18 PM, Luis Lavena <luislavena at gmail.com> wrote:
> You're using *RMagick*, not ImageMagick directly. If you used the
> latter (via system calls) there would be no memory leakage to worry
> about.

You're correct - I'm using 'RMagick' - and it uses a large amount of memory. But that's not really the overall point. My overall point is how to properly handle a rails app that uses a great deal of memory during each request. I'm pretty sure this happens in other rails applications that don't happen to use 'RMagick'.

> Moving from FastCGI to Mongrel will also require you to monitor your
> cluster processes with external tools, since you're using things that
> leak too much memory like RMagick and require restarts of the process.

Yes, although all monitoring will be able to do is kill off a misbehaved application. I'd much rather run garbage collection than kill off my application.

> To make it clear: the memory leaked by RMagick cannot be recovered
> with the garbage collection mechanism. I tried that several times but in
> the long run it required restarting and hunting down all the zombie
> processes left by Apache.

So far, running the GC under fastcgi has given me pretty good results. The zombieing issue with fastcgi is a known issue with mod_fastcgi, and I'm pretty sure it is unrelated to RMagick or garbage collection.

> Can you tell me how you addressed the "schedule" of the garbage
> collection execution in your previous scenario? AFAIK most frameworks
> or servers don't impose on the user how often GC should be
> performed.

In the previous scenario I was using fastcgi with rails. In my previous reply I provided a link to the rails fastcgi dispatcher:

http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

In addition, in other languages and other language web frameworks there are provisions to control garbage collection (for languages that have garbage collection, of course).

> I'll bet it is rails specific, or you should take a look at the fcgi ruby
> extension, since it is responsible, ruby-side, for bridging both
> worlds.

This is done in the Rails FastCGI dispatcher. I believe that the equivalent of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails dispatcher is distributed as a part of Mongrel, I'd say this code is owned by Mongrel, which bridges these two worlds when using mongrel as a webserver.

> On a personal note, I believe it is not the responsibility of Mongrel, as a
> webserver, to take care of the garbage collection and leakage issues of
> the VM on which your application runs. In any case, the GC of the VM
> (MRI Ruby) should be enhanced to work better with heavy load and long
> running environments.

Ruby provides an API to access and call the garbage collector. This gives ruby application developers the ability to control when garbage collection is run, because in some cases there may be an application-specific reason to prevent or explicitly run the GC. Web servers are a good example of applications where state may help determine a better time to run the GC. As you're serving each request, you're generally allocating a number of objects, then rendering output, then moving on to the next request.

By limiting the GC to run in between requests rather than during requests, you are trading request time for latency between requests. This is a trade-off that I think web application developers should decide, but by no means should this be a default or a silver bullet for all. My position is that this should just be an option within Mongrel as a web server.

- scott
On Mon, Mar 24, 2008 at 4:59 PM, Scott Windsor <swindsor at gmail.com> wrote:
> You're correct - I'm using 'RMagick' - and it uses a large amount of memory.
> But that's not really the overall point. My overall point is how to
> properly handle a rails app that uses a great deal of memory during each
> request. I'm pretty sure this happens in other rails applications that
> don't happen to use 'RMagick'.

Yes, I faced huge memory usage issues with other things not related to image processing, and found that a good thing was to move them out of the request-response cycle and into an out-of-band background job.

> So far, running the GC under fastcgi has given me pretty good results. The
> zombieing issue with fastcgi is a known issue with mod_fastcgi, and I'm
> pretty sure it is unrelated to RMagick or garbage collection.

Yes, but even if you "reclaim" the memory with GC, there will be pieces that won't ever be GC'ed, since they leaked on the C side, outside GC control (some of the RMagick and ImageMagick mysteries).

> This is done in the Rails FastCGI dispatcher. I believe that the equivalent
> of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails
> dispatcher is distributed as a part of Mongrel, I'd say this code is owned
> by Mongrel, which bridges these two worlds when using mongrel as a
> webserver.

Then you could provide a different Mongrel Handler that could perform that, or even a series of GemPlugins that provide a gc:start instead of the plain 'start' command mongrel_rails provides.

> Ruby provides an API to access and call the garbage collector. [...]
>
> By limiting the GC to run in between requests rather than during requests,
> you are trading request time for latency between requests. This is a
> trade-off that I think web application developers should decide, but by no
> means should this be a default or a silver bullet for all. My position is
> that this should just be an option within Mongrel as a web server.

--gc-interval maybe?

Now that you've convinced me and proved your point, having the option to perform it (optionally, not forced) will be something good to have.

Patches are Welcome ;-)

--
Luis Lavena
Multimedia systems
-
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams
Forgive me for not having read the whole thread, however, there is one thing that seems to be really important, and that is, ruby hardly ever runs the damned GC. It certainly doesn''t do full runs nearly often enough (IMO). Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions correctly. I don''t know what rmagick is doing under the hood in this area, but having been generating large portions of country maps with it (and moving away from it very rapidly), I know the GC doesn''t do "The Right Thing". First call of address is GC_MALLOC_LIMIT and friends. For any small script that doesn''t breach that value, the GC simply doesn''t run. More than this, RMagick, in it''s apparent ''wisdom'' never frees memory if the GC never runs. Seriously, check it out. Make a tiny script, and make a huge image with it. Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will reach your code before the GC calls on RMagick to free. Now, add a call to GC.start, and no OOME. Despite the limitations of it (ruby performance only IMO), most of the above experience was built up on windows, and last usage was about 6 months ago, FYI. On 24 Mar 2008, at 20:37, Luis Lavena wrote:> On Mon, Mar 24, 2008 at 4:59 PM, Scott Windsor <swindsor at gmail.com> > wrote: >> On Mon, Mar 24, 2008 at 12:18 PM, Luis Lavena >> <luislavena at gmail.com> wrote: >> >>> >>> >>> On Mon, Mar 24, 2008 at 3:58 PM, Scott Windsor >>> <swindsor at gmail.com> wrote: >>> >>> >>> >>> You''re using *RMagick*, not ImageMagick directly. If you used the >>> later (via system calls) there will no be memory leakage you can >>> worry >>> about. >> >> You''re correct - I''m using ''RMagick'' - and it uses a large amount >> of memory. >> But that''s not really the overall point. My overall point is how to >> properly handle a rails app that uses a great deal of memory during >> each >> request. I''m pretty sure this happens in other rails applications >> that >> don''t happen to use ''RMagick''.Personally, I''ll simply say call the GC more often. Seriously. I mean it. It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside of by ubygems.rb due to stuff I have observed before: http://blog.ra66i.org/archives/informatics/2007/10/05/calling-on-the-gc-after-rubygems/ - N.B. This isn''t "FIXED" it''s still a good idea (gem 1.0.1). http://zdavatz.wordpress.com/2007/07/18/heap-fragmentation-in-a-long-running-ruby-process/ Now, by my reckoning (and a few production apps seem to be showing emperically (purely emperical, sorry)) we should be calling on the GC whilst loading up the apps. I mean come on, when are a really serious number of temporary objects being created. Actually, it''s when rubygems loads, and that''s the first thing that happens in, hmm, probably over 90% of ruby processes out there.> > Yes, I faced huge memory usage issues with other things non related to > image processing and found that a good thing was move them out of the > request-response cycle and into a out-of-bound background job. > >> >> So far, running the GC under fastcgi has given me pretty good >> results. The >> zombing issue with fast cgi is a known issue with mod_fastcgi and >> I''m pretty >> sure unrelated to RMagick or garbage collection. >> > > Yes, but even you "reclaim" the memory with GC, there will be pieces > that wouldn''t be GC''ed ever, since the leaked in the C side, outside > GC control (some of the RMagick and ImageMagick mysteries).Sure, but leaks are odd things. 
Some processes that appear to be leaking are really just fragmenting (allocating more ram due to lack of ''usable'' space on ''the heap''. Call the GC more often, take a 0.01% performance hit, and monitor. I bet it''ll get better. In fact, you can drop fragmentation the first allocated segment significantly just by calling GC.start after a rubygems load, if you have more than a few gems.>>> >>> Can you tell me how you addressed the "schedule" of the garbage >>> collection execution on your previous scenario? AFAIK most of the >>> frameworks or servers don''t impose to the user how often GC should >>> be >>> performed.In fact there are many rubyists who hate the idea of splatting GC.start into processes. Given what I''ve seen, I''m willing to reject that notion completely. Test yourself, YMMV. FYI, even on windows under the OCI, where performance for the interpreter sucks, really really hard, I couldn''t reliably measure the runtime of a call to GC.start after loading rubygems. I don''t know what kind of ''performance'' people are after, but I can''t see the point in not running the GC more often, especially for ''more common'' daemon load. Furthermore, hitting the kernel for more allocations more often, is actually pretty slow too, so this may actually even result in faster processes under *certain* conditions. Running a lib like RMagick, I would say you *should* be doing this, straight up, no arguments.>> >> In the previous scenario I was using fast_cgi with rails. In my >> previous >> reply I provided a link to the rails fastcgi dispatcher. >> >> http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi >> >> In addtion, in other languages and other language web frameworks >> there are >> provisions to control garbage collection (for languages that have >> garbage >> collections, of course). >> >> >>> I''ll bet is rails specific, or you should take a look at the fcgi >>> ruby >>> extension, since it is responsible, ruby-side, of bridging both >>> worlds. >>> >> >> This is done in the Rails FastCGI dispatcher. I believe that the >> equivalent >> of this in Mongrel is the Mongrel Rails dispatcher. Since the >> Mongrel Rails >> dispatcher is distributed as a part of Mongrel, I''d say this code >> is owned >> by Mongrel, which bridges these two worlds when using mongrel as a >> webserver.It doesn''t *really* matter where you run the GC. It matters that it runs, how often, and what it''s doing. If you''re actually calling on the GC and freeing nothing, that''s stupid, but if you''ve run RMagick up, just call GC.start anyway, and I''m pretty sure it''ll help. There''s certainly no harm in investigating this, unless you''re doing something silly with weakrefs.> Then you could provide a different Mongrel Handler that could perform > that, or even a series of GemPlugins that provide a gc:start instead > of plain ''start'' command mongrel_rails scripts provides.$occasional_gc_run_counter = 0 before_filter :occasional_gc_run def occasional_gc_run $occasional_gc_run_counter += 1 if $occasional_gc_run_counter > 1_000 $occasional_gc_run_counter = 0 GC.start end end Or whatever. It doesn''t really matter that much where you do this, or when, it just needs to happen every now and then. 
More importantly, add a GC.start to the end of environment.rb, and you will have literally half the number of objects in ObjectSpace.>>> On a personal note, I believe is not responsibility of Mongrel, as a >>> webserver, take care of the garbage collection and leakage issues of >>> the Vm on which your application runs. In any case, the GC of the VM >>> (MRI Ruby) should be enhanced to work better with heavy load and >>> long >>> running environments.Right, and it''s not just the interpreter, although indirection around this stuff can help. (such as compacting).>> >> Ruby provides an API to access and call the Garbage Collector. This >> provides ruby application developers the ability to control when >> the garbage >> collection is run because in some cases, there may be an >> application-specific reason to prevent or explicity run the GC. >> Web servers >> are a good example of these applications where state may help >> determine a >> better time to run the GC. As you''re serving each request, you''re >> generally >> allocating a number of objects, then rendering output, then moving >> on to the >> next request. >> >> By limiting the GC to run in between requests rather than during >> requests >> you are trading request time for latency between requests. This is a >> trade-off that I think web application developers should deciede, >> but by no >> means should this be a default or silver bullet for all. My >> position is >> that this just be an option within Mongrel as a web server. >>Right, I think this is important too. You''re absolutely right that there''s no specific place to provide a generic solution. In rails the answer may be simple, but that''s because rails outer architecture is simplistic. No threads, no out-of-request processing, and so on.> > --gc-interval maybe? > > Now that you convinced me and proved your point, having the option to > perform it (optionally, not forced) will be something good to have.Surely you can just: require ''thread'' Thread.new { loop { sleep GC_FORCE_INTERVAL; GC.start } } In environment.rb in that case. Of course, this is going to kill performance under evented_mongrel, thin and so on. I''d stay away from threaded solutions. _why blogged years ago about the GC, trying to remind people that we actually have control. I know ruby is supposed to abstract memory problems etc away from us, and for the most part it does, but hey, no one''s perfect, right? :-) http://whytheluckystiff.net/articles/theFullyUpturnedBin.html> Patches are Welcome ;-)Have fun! :o) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mongrel-users/attachments/20080325/75800781/attachment-0001.html
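Concretely, the two suggestions above boil down to a few lines at the bottom of environment.rb. This is only a sketch: GC_FORCE_INTERVAL is a made-up constant, and the background-thread variant is exactly the one warned against under evented servers, so treat it as optional.

require 'thread'

# run once, at the very end of environment.rb, to sweep the garbage
# left over from loading rubygems and the framework
GC.start

# optional crude periodic sweep; skip this under evented_mongrel/thin,
# where the extra green thread costs more than it saves
GC_FORCE_INTERVAL = 60  # seconds -- an arbitrary value
Thread.new do
  loop do
    sleep GC_FORCE_INTERVAL
    GC.start
  end
end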
On Tue, Mar 25, 2008 at 4:40 AM, James Tucker <jftucker at gmail.com> wrote:> Forgive me for not having read the whole thread, however, there is one thing > that seems to be really important, and that is, ruby hardly ever runs the > damned GC. It certainly doesn''t do full runs nearly often enough (IMO).There''s only one kind of garbage collection sweep. And yeah, depending on what''s happening, GC may not run very often. That''s not generally a problem.> Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions > correctly. I don''t know what rmagick is doing under the hood in this area, > but having been generating large portions of country maps with it (and > moving away from it very rapidly), I know the GC doesn''t do "The Right > Thing".There should be no difference between a GC run that is initiated by the interpreter and one that is initiated by one''s code. It ends up calling the same thing in gc.c. Extensions can easily mismanage memory, though, and I have a hunch about what''s happening with rmagick.> First call of address is GC_MALLOC_LIMIT and friends. For any small script > that doesn''t breach that value, the GC simply doesn''t run. More than this, > RMagick, in it''s apparent ''wisdom'' never frees memory if the GC never runs. > Seriously, check it out. Make a tiny script, and make a huge image with it. > Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will > reach your code before the GC calls on RMagick to free. > > Now, add a call to GC.start, and no OOME. Despite the limitations of it > (ruby performance only IMO), most of the above experience was built up on > windows, and last usage was about 6 months ago, FYI.My hunch is that rmagick is allocating large amounts of RAM ouside of Ruby. It registers its objects with the interpreter, but the RAM usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because Ruby didn''t allocate it, so doesn''t know about it. So, it uses huge amounts of RAM, but doesn''t use huge numbers of objects. Thus you never trigger a GC cycle by exceeding the GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d have to go look at the code to be sure, but the theory fits the behavior that is described very well. I don''t think this is a case for building GC.foo memory management into Mongrel, though. As I think you are suggesting, just call GC.start yourself in your code when necessary. In a typical Rails app doing big things with rmagick, the extra time to do GC.start at the end of the image manipulation, in the request handling, isn''t going to be noticable.> But that''s not really the overall point. My overall point is how to > properly handle a rails app that uses a great deal of memory during each > request. I''m pretty sure this happens in other rails applications that > don''t happen to use ''RMagick''. > > Personally, I''ll simply say call the GC more often. Seriously. I mean it. > It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside > of by ubygems.rb due to stuff I have observed before:I completely concur with this. If there are issues with huge memory use (most likely caused by extensions making RAM allocations outside of Ruby''s accounting, so implicit GC isn''t triggered), just call GC.start in one''s own code.> Now, by my reckoning (and a few production apps seem to be showing > emperically (purely emperical, sorry)) we should be calling on the GC whilst > loading up the apps. 
I mean come on, when are a really serious number of > temporary objects being created. Actually, it''s when rubygems loads, and > that''s the first thing that happens in, hmm, probably over 90% of ruby > processes out there.Just as a tangent, I do this in Swiftiply. I make an explicit call to GC.start after everything is loaded and all configs are parsed, just to make sure execution is going into the main event loop with as much junk cleaned out as possible.> Or whatever. It doesn''t really matter that much where you do this, or when, > it just needs to happen every now and then. More importantly, add a GC.start > to the end of environment.rb, and you will have literally half the number of > objects in ObjectSpace.This makes sense to me. I could also see providing a 2nd Rails handler that had some GC management stuff in it, along with some documentation on what it actually does or does not do, so people can make an explicit choice to use it, if they need it. I''m still completely against throwing code into Mongrel itself for this sort of thing. I just prefer not to throw more things into Mongrel than we really _need_ to, when there is no strong argument for them being inside of Mongrel itself. GC.start stuff is simple enough to put into one''s own code at appropriate locations, or to put into a customized Mongrel handler if one needs it. Maybe this simply needs to be documented in the body of Mongrel documentation? Kirk Haines
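For the rmagick case Kirk describes, the cheap fix is to sweep right where the large allocations happen. A rough sketch of a Rails action doing so (controller name, file path and image geometry are illustrative, not from any real app; RMagick is assumed to be required in environment.rb):

class ImagesController < ApplicationController
  def thumbnail
    img = Magick::Image.read('/data/uploads/big_image.png').first  # hypothetical path
    thumb = img.resize(200, 200)
    send_data thumb.to_blob, :type => 'image/png', :disposition => 'inline'
  ensure
    # the wrapper objects may already be unreachable, but the pixel buffers
    # held by ImageMagick only come back once those wrappers are actually swept
    GC.start
  end
end

Calling destroy! on the images before the sweep, as mentioned later in the thread, releases the buffers even earlier without waiting on the GC at all.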
> My hunch is that rmagick is allocating large amounts of RAM ouside of > Ruby. It registers its objects with the interpreter, but the RAM > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > Ruby didn''t allocate it, so doesn''t know about it.It''s allocating opaque objects on the Ruby heap but not using Ruby''s built-in malloc? That seems pretty evil. Evan> > So, it uses huge amounts of RAM, but doesn''t use huge numbers of > objects. Thus you never trigger a GC cycle by exceeding the > GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d > have to go look at the code to be sure, but the theory fits the > behavior that is described very well. > > I don''t think this is a case for building GC.foo memory management > into Mongrel, though. As I think you are suggesting, just call > GC.start yourself in your code when necessary. In a typical Rails app > doing big things with rmagick, the extra time to do GC.start at the > end of the image manipulation, in the request handling, isn''t going to > be noticable. > > > > But that''s not really the overall point. My overall point is how to > > properly handle a rails app that uses a great deal of memory during each > > request. I''m pretty sure this happens in other rails applications that > > don''t happen to use ''RMagick''. > > > > Personally, I''ll simply say call the GC more often. Seriously. I mean it. > > It''s not *that* slow, not at all. In fact, I call GC.start explicitly inside > > of by ubygems.rb due to stuff I have observed before: > > I completely concur with this. If there are issues with huge memory > use (most likely caused by extensions making RAM allocations outside > of Ruby''s accounting, so implicit GC isn''t triggered), just call > GC.start in one''s own code. > > > > Now, by my reckoning (and a few production apps seem to be showing > > emperically (purely emperical, sorry)) we should be calling on the GC whilst > > loading up the apps. I mean come on, when are a really serious number of > > temporary objects being created. Actually, it''s when rubygems loads, and > > that''s the first thing that happens in, hmm, probably over 90% of ruby > > processes out there. > > Just as a tangent, I do this in Swiftiply. I make an explicit call to > GC.start after everything is loaded and all configs are parsed, just > to make sure execution is going into the main event loop with as much > junk cleaned out as possible. > > > > Or whatever. It doesn''t really matter that much where you do this, or when, > > it just needs to happen every now and then. More importantly, add a GC.start > > to the end of environment.rb, and you will have literally half the number of > > objects in ObjectSpace. > > This makes sense to me. > > I could also see providing a 2nd Rails handler that had some GC > management stuff in it, along with some documentation on what it > actually does or does not do, so people can make an explicit choice to > use it, if they need it. I''m still completely against throwing code > into Mongrel itself for this sort of thing. I just prefer not to > throw more things into Mongrel than we really _need_ to, when there is > no strong argument for them being inside of Mongrel itself. GC.start > stuff is simple enough to put into one''s own code at appropriate > locations, or to put into a customized Mongrel handler if one needs > it. > > Maybe this simply needs to be documented in the body of Mongrel documentation? 
> Kirk Haines

-- Evan Weaver Cloudburst, LLC
At 03:41 AM 3/25/2008, mongrel-users-request at rubyforge.org wrote:>Date: Tue, 25 Mar 2008 10:40:50 +0000 >From: James Tucker <jftucker at gmail.com> >Subject: Re: [Mongrel] mongrel garbage collection >To: mongrel-users at rubyforge.org >Message-ID: <E5223BA4-90BD-4976-8651-33226BC23299 at gmail.com> >Content-Type: text/plain; charset="us-ascii" >[snip] >Also, implicit OOMEs or GC runs quite often DO NOT affect the >extensions correctly. I don''t know what rmagick is doing under the >hood in this area, but having been generating large portions of >country maps with it (and moving away from it very rapidly), I know >the GC doesn''t do "The Right Thing". >[snip]Hi James, My understanding with RMagick is that it is hooking the Imagemagick libs directly in process. As a result, memory is not always freed when you''d expect it to be. I haven''t read up on the details, having chosen to just use out of process image management, but you might find this link interesting - in it, there is a claim that the latest releases of RMagick do *not* in fact leak any memory and that running a full GC manually will reclaim all memory it uses after the references are out of scope. http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618 Steve
On Tue, Mar 25, 2008 at 11:02 AM, Evan Weaver <evan at cloudbur.st> wrote:> > My hunch is that rmagick is allocating large amounts of RAM ouside of > > Ruby. It registers its objects with the interpreter, but the RAM > > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > > Ruby didn''t allocate it, so doesn''t know about it. > > It''s allocating opaque objects on the Ruby heap but not using Ruby''s > built-in malloc? That seems pretty evil.Not really. It''s pretty common in extensions. You alloc your structures in whatever way is appropriate for the library you are using, then use Data_Wrap_Struct with a mark and a free function to hook your stuff into the Ruby garbage collector. Your objects are thus known to Ruby as Ruby objects, but you have potentially large chunks of memory that Ruby itself knows nothing about. Kirk Haines
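A throwaway script makes that accounting gap visible; the image dimensions and loop count below are arbitrary, and it assumes rmagick is installed:

require 'RMagick'

20.times do
  img = Magick::Image.new(4000, 4000)  # tens of MB of pixels, allocated by ImageMagick itself
  img = nil                            # the tiny Ruby wrapper is now garbage
end
# Ruby itself allocated almost nothing, so GC_MALLOC_LIMIT is never reached,
# no sweep happens, and the resident size of the process keeps climbing.

GC.start  # sweeping the wrappers finally runs their free functions and releases the pixel buffers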
On 25 Mar 2008, at 15:26, Kirk Haines wrote:> On Tue, Mar 25, 2008 at 4:40 AM, James Tucker <jftucker at gmail.com> > wrote: >> Forgive me for not having read the whole thread, however, there is >> one thing >> that seems to be really important, and that is, ruby hardly ever >> runs the >> damned GC. It certainly doesn''t do full runs nearly often enough >> (IMO). > > There''s only one kind of garbage collection sweep. And yeah, > depending on what''s happening, GC may not run very often. That''s not > generally a problem.Sure, inside ruby there''s only one kind of run, but....>> Also, implicit OOMEs or GC runs quite often DO NOT affect the >> extensions >> correctly. I don''t know what rmagick is doing under the hood in >> this area, >> but having been generating large portions of country maps with it >> (and >> moving away from it very rapidly), I know the GC doesn''t do "The >> Right >> Thing". > > There should be no difference between a GC run that is initiated by > the interpreter and one that is initiated by one''s code. It ends up > calling the same thing in gc.c. Extensions can easily mismanage > memory, though, and I have a hunch about what''s happening with > rmagick.I just realised the obvious truth, that ruby isn''t actually running the GC under those OOME conditions.>> First call of address is GC_MALLOC_LIMIT and friends. For any small >> script >> that doesn''t breach that value, the GC simply doesn''t run. More >> than this, >> RMagick, in it''s apparent ''wisdom'' never frees memory if the GC >> never runs. >> Seriously, check it out. Make a tiny script, and make a huge image >> with it. >> Hell, make 20, get an OOME, and watch for a run of the GC. The OOME >> will >> reach your code before the GC calls on RMagick to free. >> >> Now, add a call to GC.start, and no OOME. Despite the limitations >> of it >> (ruby performance only IMO), most of the above experience was built >> up on >> windows, and last usage was about 6 months ago, FYI. > > My hunch is that rmagick is allocating large amounts of RAM ouside of > Ruby. It registers its objects with the interpreter, but the RAM > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > Ruby didn''t allocate it, so doesn''t know about it.Yup, it''s ImageMagick, un-patched and they don''t provide afaik a callback to replace malloc, or maybe that''s an rmagick issue.> So, it uses huge amounts of RAM, but doesn''t use huge numbers of > objects. Thus you never trigger a GC cycle by exceeding the > GC_MALLOC_LIMIT nor by running our of object slots in the heap. I''d > have to go look at the code to be sure, but the theory fits the > behavior that is described very well.Right, in fact, I think the OOME actually comes from outside of ruby (unverified), and ruby can''t or won''t run the GC before going down. As the free() calls inside RMagick / ImageMagick aren''t happening without calling GC.start. The GC.start call, somewhere/how is being used to trigger frees in the framework. Personally, this is bad design, and the really common complaints may also suggest so, however, I don''t know what their domain specific issues and limitations are. Maybe it''s an ImageMagick thing. Creating an OOME inside ruby, the interpreter calls on GC.start prior to going down. I started talking to zenspider about this stuff, and eventually he just pointed me at gc.c, fair enough. I still hold the opinion that an OOME hitting the interpreter (from whatever source) should attempt to invoke the GC. 
Of course, a hell of a lot of software doesn''t check the result of a call to malloc(), tut tut. Tool: http://ideas.water-powered.com/projects/libgreat> I don''t think this is a case for building GC.foo memory management > into Mongrel, though. As I think you are suggesting, just call > GC.start yourself in your code when necessary. In a typical Rails app > doing big things with rmagick, the extra time to do GC.start at the > end of the image manipulation, in the request handling, isn''t going to > be noticable.Absolutely right, and yes, this is my opinion.>> But that''s not really the overall point. My overall point is how to >> properly handle a rails app that uses a great deal of memory during >> each >> request. I''m pretty sure this happens in other rails applications >> that >> don''t happen to use ''RMagick''. >> >> Personally, I''ll simply say call the GC more often. Seriously. I >> mean it. >> It''s not *that* slow, not at all. In fact, I call GC.start >> explicitly inside >> of by ubygems.rb due to stuff I have observed before: > > I completely concur with this. If there are issues with huge memory > use (most likely caused by extensions making RAM allocations outside > of Ruby''s accounting, so implicit GC isn''t triggered), just call > GC.start in one''s own code. > >> Now, by my reckoning (and a few production apps seem to be showing >> emperically (purely emperical, sorry)) we should be calling on the >> GC whilst >> loading up the apps. I mean come on, when are a really serious >> number of >> temporary objects being created. Actually, it''s when rubygems >> loads, and >> that''s the first thing that happens in, hmm, probably over 90% of >> ruby >> processes out there. > > Just as a tangent, I do this in Swiftiply. I make an explicit call to > GC.start after everything is loaded and all configs are parsed, just > to make sure execution is going into the main event loop with as much > junk cleaned out as possible.I''ve done similar in anything that is running as a fire and forget style daemon. You know, the kinds of things that get setup once, and run for 1 to 20 years. There are several that I have never restarted. No rails, though. These kinds of things I also simply don''t want to waste the ram to silly fragmentation, the next allocation takes you up to a registerable percentage on medium aged machines. IIRC there''s one in my copy of analogger too, or maybe you had that in there already :-)>> Or whatever. It doesn''t really matter that much where you do this, >> or when, >> it just needs to happen every now and then. More importantly, add a >> GC.start >> to the end of environment.rb, and you will have literally half the >> number of >> objects in ObjectSpace. > > This makes sense to me. > > I could also see providing a 2nd Rails handler that had some GC > management stuff in it, along with some documentation on what it > actually does or does not do, so people can make an explicit choice to > use it, if they need it. I''m still completely against throwing code > into Mongrel itself for this sort of thing. I just prefer not to > throw more things into Mongrel than we really _need_ to, when there is > no strong argument for them being inside of Mongrel itself. GC.start > stuff is simple enough to put into one''s own code at appropriate > locations, or to put into a customized Mongrel handler if one needs > it.If it wasn''t app specific I''d say put it in mongrel. It is though, and peoples tendency to pre-optimize probably makes this pointless. 
I mean the cost of doing it in a thread under eventmachine is way higher than the ram usage costs for pure ruby apps, at least for my pure ruby apps. 20-40mb vs. lots of req. / sec. But then, one could check for better alternatives, like add_timer(), etc, but that route tends towards bloat, so your original assertion of put it in the app configuration, is what I would choose.> Maybe this simply needs to be documented in the body of Mongrel > documentation?Maybe not even there. I think research needs to be done into the longer running effects of the GC under real environments. I know some people have done some (including myself), but the results are never released in public. The GC also seems to be one of those topics, as it''s so close to performance, where people are happy to see how high up the wall they can go, prior to doing research. With regard to mongrel and this stuff, it''s really not a mongrel issue. Mongrel is a great citizen wrt the GC (at least by comparison to a lot of other code). Particularly bad citizens in this area include: - Every single pure ruby pdf lib I''ve seen - rubygems (by way of the spec loading semantics, not rubygems itself, kinda (lets just say, I''d do it different, but by design, not implementation)) - rails - rmagick> > > > Kirk Haines > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users
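For completeness, the timer route mentioned above (and then argued against) looks roughly like this inside an EventMachine reactor; the interval is arbitrary:

require 'eventmachine'

EventMachine.run do
  # sweep on the reactor instead of a sleeping green thread; in an evented
  # mongrel or thin process the reactor is already running, so only the
  # add_periodic_timer line would be needed
  EventMachine.add_periodic_timer(60) { GC.start }
end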
On 25 Mar 2008, at 17:05, Steve Midgley wrote:> At 03:41 AM 3/25/2008, mongrel-users-request at rubyforge.org wrote: >> Date: Tue, 25 Mar 2008 10:40:50 +0000 >> From: James Tucker <jftucker at gmail.com> >> Subject: Re: [Mongrel] mongrel garbage collection >> To: mongrel-users at rubyforge.org >> Message-ID: <E5223BA4-90BD-4976-8651-33226BC23299 at gmail.com> >> Content-Type: text/plain; charset="us-ascii" >> [snip] >> Also, implicit OOMEs or GC runs quite often DO NOT affect the >> extensions correctly. I don''t know what rmagick is doing under the >> hood in this area, but having been generating large portions of >> country maps with it (and moving away from it very rapidly), I know >> the GC doesn''t do "The Right Thing". >> [snip] > > Hi James, > > My understanding with RMagick is that it is hooking the Imagemagick > libs directly in process. As a result, memory is not always freed when > you''d expect it to be. I haven''t read up on the details, having chosen > to just use out of process image management, but you might find this > link interesting - in it, there is a claim that the latest releases of > RMagick do *not* in fact leak any memory and that running a full GC > manually will reclaim all memory it uses after the references are out > of scope.Thank you for kindly ensuring that I got this. We actually moved completely away from anything ImageMagick based. There really wasn''t any sensible way to ''fix'' it. Whilst destroy! looks ok, even when doing what we were (high res tiling, covering around 250 square miles), we found performance was fine and we could avoid all allocation issues by using the crazy thread solution (Thread.new { loop { sleep some_time; GC.start } }). This is all good in most scenarios but then there are times when running a framework like eventmachine, where threads (yes, even green ones) can be total performance killers too. Mind you, under rails, there''s always a linear reaction run, so I''m not going to speculate more on that detail. It''s also OT for here, mostly...> http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618 > > SteveThanks again, James. P.S. Personally, if I was coming up against this problem today, I''d drop out to a separate process, driven by something like background job if under rails. If under pure ruby, I''d use drb or eventmachine + a marshalling protocol, depending on specific requirements. The biggest issue for our old project was hitting swap / page file. Image rendering, when you''re already working on the per-pixel layer, is really easy to divide up, though, so optimizing for speed is pretty easy really. When it comes to background concurrent scheduling, staying away from ACID can really help, too, but that really is another topic for another time. Lets just say, allow slack, and life will be easier if you ever hit a silly scale. I''ve seen people trying to recover broken ACID implementations by trawling logs, and my god, tearful.
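For anyone taking the out-of-process route without a job queue, the simplest form is shelling out to ImageMagick directly, as Luis suggested earlier in the thread; the paths and geometry here are illustrative:

# no ImageMagick memory ever lives inside the Rails process this way
def make_thumbnail(src, dst, geometry = '200x200')
  system('convert', src, '-resize', geometry, dst) or
    raise "convert failed for #{src}"
end

make_thumbnail('/data/uploads/big_image.png', '/data/uploads/big_image_thumb.png')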
On Mon, 24 Mar 2008 08:21:52 -0700 "Scott Windsor" <swindsor at gmail.com> wrote:> Right now, my processes aren''t gigantic... I''m preparing for a ''worst case'' > scenario when I have a extremely large processes or memory usage. This can > easily happen on specific applications such as an image server (using image > magick) or parsing/creating large xml payloads (a large REST server). For > those applications, I may have a large amount of memory used for each > request, which will increase until the GC is run.Well, does that mean you DO have this problem or DO NOT have this problem? If you aren''t actually facing a real problem that could be solved by this change then you most likely won''t get very far. Any imagined scenario you come up with could easily just be avoided. If you want to do this then you''ll have to write code and you''ll have to learn how to make a Mongrel handler that is registered before and after the regular rails handler. All you do is have this before handler analyze the request and disable the GC on the way in. In the after handler you just have to renable the GC and make it do the work. It''s pretty simple, but *you* will have to go read Mongrel code and understand it first. Otherwise you''re just wasting your time really. -- Zed A. Shaw - Hate: http://savingtheinternetwithhate.com/ - Good: http://www.zedshaw.com/ - Evil: http://yearofevil.com/
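A rough sketch of the handler pair Zed describes might look like the following. The class names and request threshold are made up, and it assumes Mongrel runs the handlers registered for a URI in registration order, with the Rails handler sandwiched between the two; check the handler documentation before relying on that.

require 'mongrel'

# registered before the Rails handler: keep the GC out of the request
class GCDisableHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.disable
  end
end

# registered after the Rails handler: re-enable the GC and sweep every N requests
class GCEnableHandler < Mongrel::HttpHandler
  def initialize(every = 50)
    @every = every
    @count = 0
  end

  def process(request, response)
    GC.enable
    @count += 1
    if @count >= @every
      @count = 0
      GC.start
    end
  end
end

All of the interesting policy (skipping the sweep for cheap requests, only sweeping when the listener is otherwise idle, and so on) would live in those two process methods, which is also where the per-application nature of the problem shows up.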
You''re right, ok. So the memory causing the OOM error isn''t actually on the Ruby heap, but it can''t get freed until the opaque object gets GC''d. Evan On Tue, Mar 25, 2008 at 1:20 PM, Kirk Haines <wyhaines at gmail.com> wrote:> On Tue, Mar 25, 2008 at 11:02 AM, Evan Weaver <evan at cloudbur.st> wrote: > > > My hunch is that rmagick is allocating large amounts of RAM ouside of > > > Ruby. It registers its objects with the interpreter, but the RAM > > > usage in rmagick itself doesn''t count against GC_MALLOC_LIMIT because > > > Ruby didn''t allocate it, so doesn''t know about it. > > > > It''s allocating opaque objects on the Ruby heap but not using Ruby''s > > built-in malloc? That seems pretty evil. > > Not really. It''s pretty common in extensions. You alloc your > structures in whatever way is appropriate for the library you are > using, then use Data_Wrap_Struct with a mark and a free function to > hook your stuff into the Ruby garbage collector. > > Your objects are thus known to Ruby as Ruby objects, but you have > potentially large chunks of memory that Ruby itself knows nothing > about. > > > > > Kirk Haines > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users >-- Evan Weaver Cloudburst, LLC