I'm running a site that gets ~30k to 40k page hits per day. In the last 4 days my mongrel processes have been jumping into high CPU usage a couple of times a day, to the point where my site becomes unresponsive (the database is on a different machine with no load). The only way for me to resolve the problem and reduce load on the machine is to delete my Rails cache directory (I have plenty of space and I'm not hitting any filesystem limit in terms of number of files and directories). I initially thought it was the DoS vulnerability posted a couple of days ago, but I upgraded to ruby 1.8.5p2 with mongrel 0.3.18 and we are still experiencing the same problem multiple times a day.

Is there some way to debug where in our application/Rails/mongrel the CPU spike is coming from? I'm just not sure how to proceed further in debugging this.

Pen
Mongrel 0.3.18 (5 running)
ruby 1.8.5p2

Thanks,
Dallas
Not sure I have an answer to your question, but we are running mongrel (x5) and Apache 2.0 (mod_proxy_balancer), also separately from the database, on a single 2-CPU low-end server with about 30K page hits a day, with no caching at all, and the load on the box is nearly zero all the time.

I see that you are using Pen, which is a TCP-based load balancer. Does it serve your static files, or does mongrel do that? If you are serving static files with mongrel, I can see how you could be overloading your box so easily. We have 40+ Apache processes in front of mongrel, which serve two purposes:

1. They serve static files very quickly and free up mongrel to handle only actual RoR actions.
2. They push the data over the network, so that slow clients do not hold up scarce mongrel processes.

Also note that each Apache process is about 2-3 MB; our mongrel is about 80 MB in comparison. Again, it's a lot cheaper to serve as much content as possible without hitting mongrel.

Hope this helps,
Konstantin

On Dec 6, 2006, at 1:56 PM, Dallas DeVries wrote:
> [snip]
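For concreteness, a minimal front-end along the lines Konstantin describes might look like the sketch below. This is an illustration, not his actual config: the ports, paths, and hostname are assumptions, and it presumes an Apache build with mod_proxy_balancer loaded (bundled from Apache 2.2 onward).

    # Hypothetical Apache vhost: serve static files from public/,
    # proxy everything else to mongrels on ports 8000-8004.
    <VirtualHost *:80>
      ServerName example.com
      DocumentRoot /var/www/app/public

      <Proxy balancer://mongrel_cluster>
        BalancerMember http://127.0.0.1:8000
        BalancerMember http://127.0.0.1:8001
        BalancerMember http://127.0.0.1:8002
        BalancerMember http://127.0.0.1:8003
        BalancerMember http://127.0.0.1:8004
      </Proxy>

      RewriteEngine On
      # If the requested file exists on disk, Apache serves it directly;
      # otherwise the request is proxied to the mongrel cluster.
      RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
      RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]
    </VirtualHost>

The RewriteCond is what delivers point 1 above: anything that exists under public/ (images, stylesheets, page-cached HTML) never touches a mongrel.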
Thanks Konstantin,

We will give that a try (Apache + mod_proxy). We try to cache everything we can (using fragment caching); it gets to be tens of thousands of files quickly (though never too many in one directory). It just seemed bizarre to us that load goes way up periodically (and only recently) and that removing our cache directory fixes it. Just restarting mongrel seemingly has no effect until we do remove the cache. Anyway, we will try with a clean install and use Apache. Thanks for the info.

Cheers,
Dallas

On 12/6/06, kigsteronline at mac.com <kigsteronline at mac.com> wrote:
> [snip]
Dallas,

One more thing to consider. I remember doing research on file system management for common OS choices a few years back. There was a paper published by someone at NetApp R&D that analyzed the time it takes to open a file by name when there are thousands of files in the same directory. As I understand it, this requires scanning the directory file until you find the given entry, determining its i-node, and then reading the file blocks directly from the i-node. This should tell you that how the directory file is structured, and how the search is conducted, will have a dramatic effect on performance. Their whole point was that in standard Linux file system implementations (and most other UNIXes) the access time increases steeply with the number of files in the directory: when you reach 10-100K files in a single directory, you may notice the slowdown without a stopwatch. NetApp's own file system was designed to have a constant access time regardless of the number of files in the directory (that was their claim to fame, anyway).

Bottom line: I wouldn't rule out a directory with 10K+ files as the main reason for your performance degradation.

Hope this helps,
Konstantin

On Dec 6, 2006, at 3:13 PM, Dallas DeVries wrote:
> [snip]
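The effect is easy to measure on your own filesystem with a throwaway script along these lines (a sketch: the file counts and scratch path are arbitrary, and OS caching of directory entries can mask the difference on repeated runs):

    require 'benchmark'
    require 'fileutils'

    # Create n empty files in a scratch directory, then time repeated
    # opens of one of them by name. On filesystems that scan directories
    # linearly, the open time grows with n.
    dir = '/tmp/dirscan_test'

    [1_000, 10_000, 50_000].each do |n|
      FileUtils.rm_rf(dir)
      FileUtils.mkdir_p(dir)
      n.times { |i| FileUtils.touch(File.join(dir, "fragment_#{i}.cache")) }

      target = File.join(dir, "fragment_#{n - 1}.cache")
      time = Benchmark.realtime do
        1_000.times { File.open(target) { |f| } }
      end
      puts "#{n} files: #{(time * 1000).round} ms for 1000 opens"
    end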
> Bottom line: I wouldn't rule out a directory with 10K+ files as the
> main reason for your performance degradation.

Hi Konstantin,

You're right on.

Dallas: ReiserFS is designed to perform well in exactly this situation, with a lot of small files in the same directory. Can you switch to ReiserFS? If not, can you switch to memcached? Are you expiring your sessions?

BR,
~A
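Switching Rails fragment caching to memcached was, in the Rails 1.x era, roughly a one-line configuration change plus the memcache-client gem. A sketch; the exact setter name is from memory and the host and port are assumptions:

    # config/environments/production.rb (Rails 1.x era)
    # Requires the memcache-client gem; host/port are hypothetical.
    ActionController::Base.fragment_cache_store = :mem_cache_store, 'localhost:11211'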
Hi guys, thanks for the responses.

Yeah, I initially figured that the number of files in the directory was the problem, and it was the first thing I changed earlier this week. I changed my caching to use many more subdirectories, so I don't have 10,000+ files in any one directory anymore. Right now my largest directory has about 2000-3000 files, and most of the rest are much smaller than that. Unfortunately that change did not help the problem. I can have upwards of 50,000 files, so I'm not sure how feasible memcached is at the moment. Anyway, I'm almost done with the clean apache/mongrel_cluster install mentioned by Konstantin, so hopefully that solves my problem.

Cheers,
Dallas

On 12/6/06, anjan bacchu <anjan.summit at gmail.com> wrote:
> [snip]
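For anyone wanting the same kind of fan-out, one common trick is to hash the fragment key and use leading characters of the digest as nested directory names. The helper below is hypothetical, for illustration only, not Dallas's actual code:

    require 'digest/md5'

    # Spread cache files across 256 * 256 subdirectories by keying the
    # path off an MD5 digest, instead of piling everything into one dir.
    def sharded_cache_path(root, key)
      digest = Digest::MD5.hexdigest(key)
      File.join(root, digest[0, 2], digest[2, 2], digest)
    end

    sharded_cache_path('/var/www/app/tmp/cache', 'views/posts/123')
    # => something like "/var/www/app/tmp/cache/b6/1a/b61a..." (varies with the key)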
Hi,

On Thu 07.12.2006 09:30, Dallas DeVries wrote:
> Anyway, I'm almost done with the clean apache/mongrel_cluster install
> mentioned by Konstantin, so hopefully that solves my problem.

How about looking into nginx with mongrel?

http://brainspl.at/articles/2006/08/23/nginx-my-new-favorite-front-end-for-mongrel-cluster
http://www.brainspl.at/articles/2006/09/12/new-nginx-conf-with-rails-caching

The documentation wiki ;-)
http://wiki.codemongers.com/

Regards
Aleks
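A minimal nginx front-end for a mongrel cluster, in the spirit of those articles, looks roughly like the sketch below; the ports, paths, and server name are assumptions:

    # Hypothetical nginx config: static files from public/, everything
    # else proxied to mongrels on ports 8000-8002.
    upstream mongrel_cluster {
      server 127.0.0.1:8000;
      server 127.0.0.1:8001;
      server 127.0.0.1:8002;
    }

    server {
      listen 80;
      server_name example.com;
      root /var/www/app/public;

      location / {
        # Serve the file directly if it exists on disk (this also picks
        # up Rails page-cached HTML); otherwise hand off to a mongrel.
        if (-f $request_filename) {
          break;
        }
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_pass http://mongrel_cluster;
      }
    }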
Alright, so I started clean on a different machine (3 GHz, 2 GB RAM):

- pen/mongrel (tried prerelease and stable, 7 instances) -> apache/mongrel_cluster/mongrel prerelease (9 instances)
- debian sarge -> ubuntu stable

Unfortunately I'm still getting hangs. Things run fine for hours at a time and tens of thousands of page hits, then:

- Plenty of RAM left
- No swap being used
- CPU hanging at 80-95% on mongrel processes
- Killing the mongrel processes and restarting does not fix it until I delete the cache directory

A couple of the many things I tried:

- lsof -i -P | grep CLOSE_WAIT: no mongrels in the list
- killall -USR1 mongrel_rails to run the server in debug mode:

  ** USR1 received, toggling $mongrel_debug_client to true (x 9)

  However, this toggle didn't seem to provide any extra information beyond what is normally output. I'm not sure if I'm using it wrong or not.

If anyone has any insight or other suggestions to try, it would be greatly appreciated!

Cheers,
Dallas

On 12/7/06, Dallas DeVries <dallas.devries at gmail.com> wrote:
> [snip]
On Wed, Dec 13, 2006 at 09:41:27PM -0500, Dallas DeVries wrote:
> Unfortunately I'm still getting hangs. Things run fine for hours at a
> time and tens of thousands of page hits.

Are you using mentalguy's fastthread? If not, try it out and see if it helps.

--
Cheers,
- Jacob Atzen
Thanks, I will give it a whirl. I assume I just need to do this with the mongrel 0.3.18pr I have:

gem install fastthread --source=http://mongrel.rubyforge.org/releases/

Thanks!
Dallas

On 12/14/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:
> [snip]
Alright, so I got fastthread installed yesterday. Unfortunately the massive load spike happened again (until the Rails cache was deleted, of course). I'm just wondering if I'm using killall -USR1 mongrel_rails the proper way. What sort of messages should I be looking for? They are supposed to be in mongrel.log, right? Other than the toggling of debug mode to true, I don't seem to be getting any extra info in mongrel.log or production.log. Is there some lower-level thing I can try? If it really is a cache thing, is there a way for me to see it stuck in some loop loading or building a fragment?

Are the 0.3.19pr items pretty much centered around the MIME type stuff, or should I try upgrading to that from .18pr?

Thanks again,
Dallas

On 12/14/06, Dallas DeVries <dallas.devries at gmail.com> wrote:
> [snip]
On Fri, 15 Dec 2006 12:36:47 -0500, "Dallas DeVries" <dallas.devries at gmail.com> wrote:
> [snip]

If you turn on USR1 and don't see any log messages, then most likely it's a problem that isn't triggering one of the exceptions. You're turning it on right, and you should see a little log message saying it's on.

If you've got a process that's "stuck" for some reason, try strace -p PID. Using strace you can see what system calls the process is making and whether it's sitting in some loop.

Another option, which is more advanced, is to attach to the process with gdb and then interrupt it and step through. I haven't used this yet, but check out Jamis Buck's blog and a few others for handy macros that can dump the Ruby call stack.

> Are the 0.3.19pr items pretty much centered around the MIME type stuff, or
> should I try upgrading to that from .18pr?

Yeah, it's the MIME work that went into 0.3.19, plus a few fastthread changes.

--
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://www.awprofessional.com/title/0321483502 -- The Mongrel Book
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
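One version of the gdb trick that circulated at the time is sketched below. It is an illustration rather than Zed's exact recipe: it assumes Ruby 1.8, relies on the interpreter's C-level rb_backtrace() function, and uses a placeholder PID.

    # Attach to a spinning mongrel (replace 12345 with the real PID).
    gdb -p 12345

    # Inside gdb: ask the Ruby 1.8 interpreter for its current stack.
    # The backtrace is written to the process's stderr, so look wherever
    # the mongrel's output is redirected (e.g. its log file).
    (gdb) call rb_backtrace()
    (gdb) detach
    (gdb) quit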
Thanks for the strace tip, Zed. It helped me figure out what was wrong. I noticed stat64() flying by on pretty much every cache file I have. Apparently expire_fragment with a regular expression as the argument will actually go through every single cache file, and by midday I have 30,000+ of them, so I'm pretty sure that's the cause. I thought it only tried matching files in the controller and action it was called from (or the one specified), but this does not appear to be the case. So I guess I'll have to write a plugin to change caching, or find someone else who has solved the same problem. Thanks for the help.

Cheers,
Dallas

On 12/15/06, Zed A. Shaw <zedshaw at zedshaw.com> wrote:
> [snip]
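For readers hitting the same wall: a file-based fragment store can only implement regexp expiry by walking every file under the cache root, so the usual workaround is to expire exact string keys that you track yourself. A sketch; the key names and helper are invented for illustration:

    # Regexp expiry stats every file under the cache root:
    #   expire_fragment(%r{posts/show/\d+})
    #
    # Expiring exact string keys touches only the named files.
    # Hypothetical helper: track which IDs were cached, then expire
    # each corresponding fragment directly.
    def expire_post_fragments(post_ids)
      post_ids.each do |id|
        expire_fragment("posts/show/#{id}")
      end
    end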