Monit 4.9, Mongrel 1.0.1, Rails 1.2.6, Mac OS X 10.4.11 (PPC)

I don't know whether this is a mongrel issue or a monit issue.

I'm trying to poke my way around a system set up by someone else. At this point I have no more experience w/ mongrel than local Rails dev, plus a conceptual understanding of how monit works. I have the Deploying Rails beta book, and I'm muddling my way through the mongrel and monit docs, but I think some hints as to direction would be useful.

I suspect all is not well with this setup, because monit sends dozens of messages a day, and occasionally hundreds. The worst day was 1400 alerts. Yes, 1400.

The bulk comes from there being 3 clusters (staging, beta, production), 10 mongrels per cluster, and two servers. So we can reduce the total quantity by these factors, I get that part, but there's still an awful lot of "this stopped" and "that does not exist" even with the redundancy factored out.

I don't understand the implications of each of these. Mongrel keeps crashing? Rails crashing? Monit crashing?

Thanks for any clues you can offer.

Sample messages I get are:

-- (A) ----------------------------------
Monit instance changed Service [domain snipped]
Date: Tue, 08 Jan 2008 14:41:50 -0800
Action: alert
Host: [domain snipped]
Description: Monit stopped

-- (B) ----------------------------------
Does not exist Service mongrel-production-8300
Date: Tue, 08 Jan 2008 15:30:04 -0800
Action: restart
Host: [domain snipped]
Description: 'mongrel-production-8300' process is not running

-- (C) ----------------------------------
Execution failed Service mongrel-production-8301
Date: Tue, 08 Jan 2008 15:30:34 -0800
Action: alert
Host: [domain snipped]
Description: 'mongrel-production-8301' failed to start

-- 
Posted via http://www.ruby-forum.com/.
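To see which alert type dominates, and what the count looks like per process once the redundancy is factored out, a short script over the saved alert bodies can help. The Python sketch below assumes each alert body has the layout of the samples above (a first line of the form "<event> Service <name>" followed by "Key: value" lines); the function names and the cluster/server defaults are illustrative and not part of monit.

```python
import re
from collections import Counter

# Assumed layout (taken from the sample alerts): a first line
# "<event> Service <name>", then "Key: value" lines.
HEADER = re.compile(r"^(?P<event>.+?)\s+Service\s+(?P<service>.+?)\s*$")
FIELD = re.compile(r"^\s*(?P<key>Date|Action|Host|Description):\s*(?P<value>.*?)\s*$")


def parse_alert(text):
    """Turn one alert body into a dict: event, service, date, action, ..."""
    alert = {}
    for line in text.strip().splitlines():
        if "event" not in alert:
            header = HEADER.match(line)
            if header:
                alert.update(header.groupdict())
                continue
        field = FIELD.match(line)
        if field:
            alert[field.group("key").lower()] = field.group("value")
    return alert


def summarize(alert_texts, clusters=3, mongrels_per_cluster=10, servers=2):
    """Count alerts per event type and per monitored mongrel process."""
    alerts = [parse_alert(t) for t in alert_texts]
    by_event = Counter(a.get("event", "unparsed") for a in alerts)
    processes = clusters * mongrels_per_cluster * servers
    return by_event, processes, len(alerts) / processes
```

With the defaults there are 3 × 10 × 2 = 60 monitored mongrels, so the worst day works out to 1400 / 60 ≈ 23 alerts per process, roughly one an hour per process even with the redundancy factored out. Type (A) alerts come from monit itself rather than from a mongrel, so they are probably better counted separately.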
Sounds like you have a number of issues. Starting with mongrel: what do the mongrel logs for the pids that have stopped running say? Also check /var/log/system.log for monit messages.

It may be worth upgrading to monit 4.10.1, which includes a number of fixes for running monit under OS X.

Cheers
Dave

On 09/01/2008, at 11:58 AM, Greg Willits wrote:
> I don't understand the implications of what each of these means.
> Mongrel keep crashing? Rails crashing? Monit crashing?
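Dave's suggestion to look in /var/log/system.log can be narrowed to monit's own lines with a small filter. This Python sketch assumes the traditional syslog line layout ("Mon dd hh:mm:ss host tag[pid]: message") and that monit logs to syslog (for example via `set logfile syslog` in the monitrc). If `set logfile` points at a separate file instead, monit's lines go there in monit's own format and this filter won't apply. The function name is hypothetical.

```python
import re

# Assumes classic syslog lines such as "Jan  8 14:41:50 host monit[123]: text".
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<msg>.*)$"
)


def monit_lines(lines, tag="monit"):
    """Yield (timestamp, message) for each syslog line written by `tag`."""
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if match and match.group("tag") == tag:
            yield match.group("stamp"), match.group("msg")
```

Typical use would be iterating over `open("/var/log/system.log")` and printing each `(stamp, msg)` pair. The timestamps can then be lined up against the Date: fields in the alert emails.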
At Wed, 9 Jan 2008 01:58:58 +0100,
Greg Willits <lists at ruby-forum.com> wrote:
> Monit 4.9, Mongrel 1.0.1, Rails 1.2.6, Mac OS X 10.4.11 (PPC)
>
> I don't know whether this is a mongrel issue or a monit issue.
> [...]

I have seen a similar situation here. What happened was (more or less; this is from memory) a mongrel instance would get locked up on an HTTP response that took a long time to complete. Because requests would just queue up behind this one, monit would fail to get a response in a reasonable time, assume the process was non-responsive, and try to restart it gracefully (using mongrel_rails stop). Mongrel would take a long time to shut down because it was still processing that long-running response, so we would get a message that monit couldn't shut it down and that it failed to start (or something like that). Finally the long-running Rails process would complete, mongrel would restart, and monit would let us know that the process was back up.

The solution was to make sure that responses come back in a reasonable amount of time.

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
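Erik's scenario comes down to queueing arithmetic, which the toy Python model below makes explicit. It assumes requests to one mongrel are dispatched to Rails one at a time (Mongrel guards Rails dispatch with a lock, since Rails 1.2 is not thread-safe), that monit's HTTP probe has to wait its turn like any other request, and it ignores network time. The function and parameter names are made up for illustration.

```python
def probe_outcome(busy_until, probe_at, probe_timeout, probe_cost=0.0):
    """Toy model of one monit HTTP probe against a single mongrel.

    busy_until    -- time (s) when the work queued ahead of the probe finishes
                     (e.g. a request stuck waiting on an external service)
    probe_at      -- time (s) when monit sends its probe
    probe_timeout -- seconds monit waits for an answer before giving up
    probe_cost    -- seconds needed to serve the probe once it is at the front
    Returns (ok, waited); ok is False when monit would treat the process as
    unresponsive and schedule a restart.
    """
    queued = max(0.0, busy_until - probe_at)  # time spent stuck in the queue
    waited = queued + probe_cost
    return waited <= probe_timeout, waited
```

For instance, if a request that depends on a slow external service starts at t = 0 and needs 90 s, a probe sent at t = 20 with a 30 s timeout waits 70 s, so `probe_outcome(90, 20, 30)` returns `(False, 70.0)` and monit would try a restart. The following `mongrel_rails stop` then has to wait for the same 90 s request, which is consistent with the "couldn't shut it down / failed to start" sequence Erik describes.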
Make sure your Monit check interval (not sure about the default) is greater than your Mongrel request timeout interval (default 60 seconds).

Evan

On Jan 8, 2008 8:18 PM, Erik Hetzner <erik.hetzner at ucop.edu> wrote:
> I have seen a similar situation here. [...]

-- 
Evan Weaver
Cloudburst, LLC
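Evan's rule of thumb can be enforced when the monitrc is generated rather than remembered afterwards. The Python sketch below emits one `set daemon` line and a `check process` stanza per port, named to match the services in the alerts (mongrel-production-8300 and so on), and refuses to produce a config whose polling interval is not longer than the mongrel timeout. The paths, the mongrel_rails location, and the exact start/stop flags are assumptions to check against `mongrel_rails start -h` and the existing monitrc on the server; the 60 s default simply mirrors the value Evan quotes.

```python
def monitrc(ports, app_dir, env="production", daemon_interval=120,
            mongrel_timeout=60, mongrel_rails="/usr/local/bin/mongrel_rails"):
    """Build a monitrc fragment for a mongrel cluster (hypothetical paths/flags).

    Enforces the rule of thumb from the thread: monit's polling interval
    should be longer than the mongrel request timeout.
    """
    if daemon_interval <= mongrel_timeout:
        raise ValueError(
            f"monit interval ({daemon_interval}s) must be longer than the "
            f"mongrel timeout ({mongrel_timeout}s)")
    lines = [f"set daemon {daemon_interval}", ""]   # global: emitted once
    for port in ports:
        name = f"mongrel-{env}-{port}"              # matches the alert names
        pidfile = f"{app_dir}/log/mongrel.{port}.pid"
        lines += [
            f"check process {name} with pidfile {pidfile}",
            f'  start program = "{mongrel_rails} start -d -e {env} -p {port} '
            f'-c {app_dir} -P {pidfile}"',
            f'  stop program = "{mongrel_rails} stop -c {app_dir} -P {pidfile}"',
            f"  if failed port {port} protocol http then restart",
            "  group mongrel",
            "",
        ]
    return "\n".join(lines)
```

For example, `print(monitrc(range(8300, 8310), "/path/to/app"))` would print a ten-stanza fragment for one cluster, while `monitrc([8300], "/path/to/app", daemon_interval=30)` raises ValueError.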
Thanks for the ideas so far. I'll look into the latest monit.

Message (A) is starting to look like a monit crash to me. It is always followed by a bunch of similar messages that look like monit stopping/starting all the mongrels.

It looks like the logs have little or no date/time stamps, so they're semi-useless for correlating with the email alerts.

I do have some requests that can take a while to process (depending on response time from external services), so that's a valid lead.

Evan Weaver wrote:
> Make sure your Monit check interval (not sure abou the default) is
> greater than your Mongrel request timeout interval (default 60
> seconds).

I have looked everywhere I can think of, and I don't see any mention of this timeout value in the Mongrel docs. This page (http://mongrel.rubyforge.org/docs/howto.html) mentions a -t (timeout) option, but the description doesn't match what you're referring to. It reads like a delay between finishing the response to request A and starting to handle request B, not a limit on when to give up on A.

I guess I'll assume 60 secs and play with monit accordingly.

-- gw
That page is out of date. The RDoc is probably better. And there's always the source... Soon we'll do some work on the state of the documentation.

Evan

On Jan 9, 2008 1:55 PM, Greg Willits <lists at ruby-forum.com> wrote:
> This page (http://mongrel.rubyforge.org/docs/howto.html) mentions a
> -t (timeout), but the description doesn't match what you're referring
> to. [...]

-- 
Evan Weaver
Cloudburst, LLC
At Wed, 9 Jan 2008 19:55:27 +0100,
Greg Willits <lists at ruby-forum.com> wrote:
> Thanks for the ideas so far. I'll look into the latest monit. Message
> (A) is starting to look like a monit crash to me. It is always followed
> by a bunch of similar messages that monit maybe stopping/starting all
> the mongrels.
> [...]

I doubt a monit crash. This is the message I get when I start monit with the "-I quit" option. It sounds like something (a cron job?) is restarting monit, and monit is not noticing that the mongrels are running when it restarts, so it tries to bring the mongrels up. Fool around with the monitrc: perhaps monit is failing to notice the pid files that exist for mongrel?

best,
Erik Hetzner
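To test Erik's theory that monit isn't noticing the existing pid files, each pid file can be checked by hand: does it exist, does it hold a number, and is that process alive? The Python sketch below does roughly that; sending signal 0 only probes for the process, nothing is delivered. The <app_dir>/log/mongrel.<port>.pid layout is a hypothetical default; use whatever paths the monitrc's `with pidfile` lines actually name.

```python
import os


def pidfile_status(path):
    """Classify a pid file as 'missing', 'invalid', 'stale' or 'running'."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"          # empty file or non-numeric contents
    if pid <= 0:
        return "invalid"          # 0 or negative would address process groups
    try:
        os.kill(pid, 0)           # signal 0: existence check only
    except ProcessLookupError:
        return "stale"            # file exists but no such process
    except PermissionError:
        return "running"          # process exists but belongs to another user
    return "running"


def report(app_dir, ports):
    """Map each port to the status of its (hypothetically located) pid file."""
    return {port: pidfile_status(os.path.join(app_dir, "log", f"mongrel.{port}.pid"))
            for port in ports}
```

A "stale" or "missing" result for a mongrel that is in fact answering on its port would suggest that the pid files and the running processes have drifted apart. Monit would then keep trying to start a mongrel that is already up, and that second start would likely fail, which would fit alerts like (C).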
Erik Hetzner wrote:
> I doubt a monit crash. This is the message I get when I start monit
> with the "-I quit" option. It sounds like something (a cron job?) is
> restarting monit....

Yeah, we have launchd monitoring monit, so that could explain it.

When it was all set up, it was explained to me that "mongrel/rails crashes/has leaks, so we use monit to keep an eye on that, but monit crashes/has leaks, so we'll use launchd to monitor monit". Sounded like a house of cards to me, but I wasn't in a position to argue it at the time.

IIRC the monit thing may have been a leak specific to OS X at the time, so hopefully the recent versions are the solution to that. I should get a chance to look into that tonight.

Thanks.

-- gw
On Jan 9, 2008, at 3:09 PM, Greg Willits wrote:
> Yeah we have launchd monitoring monit, so that could explain that.

Y'know, you can just have launchd monitor mongrel. That probably makes more sense than launchd watching monit watching mongrel ;-)

-n
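Nathan's suggestion of letting launchd supervise mongrel directly amounts to one launchd job per mongrel, and Python's standard plistlib can write the job file. In the sketch below the label prefix, paths and mongrel_rails location are all hypothetical. Two details matter. First, mongrel is started without -d, since launchd can only supervise a process that stays in the foreground. Second, the launchd in Mac OS X 10.4 keeps a job running via `OnDemand` set to false, whereas 10.5 replaced that key with `KeepAlive`.

```python
import plistlib


def mongrel_launchd_job(port, app_dir, env="production", tiger=True,
                        mongrel_rails="/usr/local/bin/mongrel_rails",
                        label_prefix="com.example"):
    """Build a launchd job dict that keeps one mongrel running (hypothetical paths).

    mongrel is started *without* -d: launchd supervises the process it
    launched, so the job must stay in the foreground.
    """
    log = f"{app_dir}/log/launchd-mongrel.{port}.log"
    job = {
        "Label": f"{label_prefix}.mongrel-{env}-{port}",
        "ProgramArguments": [mongrel_rails, "start", "-e", env,
                             "-p", str(port), "-c", app_dir],
        "WorkingDirectory": app_dir,
        "RunAtLoad": True,
        "StandardOutPath": log,
        "StandardErrorPath": log,
    }
    if tiger:
        job["OnDemand"] = False   # 10.4 spelling of "keep this job running"
    else:
        job["KeepAlive"] = True   # 10.5+ replacement for OnDemand
    return job


def to_xml(job):
    """Serialize the job as an XML property list string."""
    return plistlib.dumps(job).decode("utf-8")
```

The resulting XML could be saved under /Library/LaunchDaemons/ and loaded with `launchctl load`. One trade-off to keep in mind: launchd only restarts a job when its process exits, so unlike monit's HTTP check it would not notice a mongrel that is alive but stuck on a slow request, as in Erik's scenario.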
Nathan Vack wrote:
> On Jan 9, 2008, at 3:09 PM, Greg Willits wrote:
>
>> Yeah we have launchd monitoring monit, so that could explain that.
>
> Y'know, you can just have launchd monitor mongrel. That probably
> makes more sense than launchd watching monit watching mongrel ;-)

Yep. Now that I've been poking around, getting more familiar with this setup, and seeing that launchd can monitor those details, that seemed like the logical thing to me, so now I have a "second" :-) The original guy was just learning OS X at the time and was more familiar with monit as part of his overall Rails deployment package.

-- gw