Two ideas, one more controversial than the other.

First: auto-killing bloated workers. My current app has some memory
leakage that wasn't really visible on our older passenger setup, since
the auto-scaling meant that bloated workers got killed periodically.
In a perfect world, we'd find and patch all of the leaks, but in the
meantime (and as a safety net) I'd like to get the bloated workers
auto-killed. It looks like it'd be simple to add a bloated-worker
check at the same point where we check for timeout violations, and it
could be hidden behind a config setting. Alternately, I could write
this as a separate script.

Pros: might be a useful built-in feature, and the killing looks easy
to implement.
Cons: getting the memory usage might actually be surprisingly
difficult. Judging by passenger's memory management code, which uses
platform-specific system calls, we might end up with a sizeable
quantity of code that we don't want dirtying up the unicorn
internals. Also, some methods of checking appear to have performance
risks.

Second: in my use case, I have webservers running as VMs, sharing a
physical box with backend utility servers. The util servers run lots
of very CPU- and memory-hungry jobs, mostly at night; the webservers
handle requests, mostly in the daytime. Currently, most of these
webservers are running passenger, which is very polite about not using
more resources than it needs to handle requests. Unicorn, by contrast
(and by design) is very resource-greedy, what with the "scale to what
you can theoretically handle" strategy. If I spin down my number of
unicorn workers when they're not needed, I free up resources for my
util servers, which is important. TTOU and TTIN signals give me a
(very nice) means to write an auto-scaling module outside of unicorn,
but it might be nice to have it included as an optional component. (I
expect this will get voted down, as I expect the dev team is not
interested in it.)
Happy to work on implementing these myself, just wanted to poll to see
if it'd be worth developing them as part of unicorn proper rather than
as standalone scripts.

-ben
We use the following at the bottom of a God config. I believe it's in
an example somewhere. A bloated worker gets sent a QUIT, finishes
processing its current request, and exits gracefully. We never really
use TTOU and TTIN, since the number of workers is basically determined
by the RAM in the machine in question.

```ruby
# unicorn workers
unicorn_worker_memory_limit = 220_000

Thread.new do
  loop do
    begin
      workers = `ps -e -www -o pid,rss,command | grep '[u]nicorn_rails worker'`
      workers.split("\n").each do |line|
        parts = line.split(' ')
        if parts[1].to_i > unicorn_worker_memory_limit
          # tell the worker to die after it finishes serving its request
          ::Process.kill('QUIT', parts[0].to_i)
        end
      end
    rescue Object
      # don't die ever once we've tested this
      nil
    end
    sleep 30
  end
end
```

Clifton

On Mar 1, 2012, at 5:52 PM, Ben Somers wrote:

> Two ideas, one more controversial than the other.
> First: auto-killing bloated workers. [...]
> _______________________________________________
> Unicorn mailing list - mongrel-unicorn at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-unicorn
> Do not quote signatures (like this one) or top post when replying
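To make the parsing step in Clifton's God-config snippet above concrete, here is a small, self-contained sketch of the same RSS check run against made-up `ps` output (the sample lines and pids are illustrative, not real output):

```ruby
# Same limit as in the God config above: RSS in KB, as printed by ps.
unicorn_worker_memory_limit = 220_000

# Made-up ps output, one "pid rss command" triple per line.
sample = "1234 180000 unicorn_rails worker[0]\n" \
         "1235 250000 unicorn_rails worker[1]\n"

# Collect the pids of workers whose RSS exceeds the limit.
bloated = sample.split("\n")
                .map { |line| line.split(' ') }
                .select { |_pid, rss, *| rss.to_i > unicorn_worker_memory_limit }
                .map { |pid, *| pid.to_i }

# each pid in `bloated` would then get ::Process.kill('QUIT', pid)
```

Only worker 1235 (250 MB RSS) is over the 220 MB limit here, so it alone would be sent QUIT.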
Ben Somers <somers.ben at gmail.com> wrote:
> Two ideas, one more controversial than the other.

Neither is really controversial.

> First: auto-killing bloated workers. [...]
> Cons: Getting the memory usage might actually be surprisingly
> difficult. [...] Also, some methods of checking appear to have
> performance risks.

You can try something like the following middleware (totally untested,
but I've done similar things here and there). I don't know about
non-Linux, but I suspect /proc/#$$/* is likely to have something
similar...

```ruby
class MemCheckLinux < Struct.new(:app)
  def call(env)
    # a faster, but less-readable version may use /proc/#$$/stat or
    # /proc/#$$/statm but those aren't as human-friendly as
    # /proc/#$$/status
    if /VmRSS:\s+(\d+)\s/ =~ File.read("/proc/#$$/status")
      # gracefully kill ourselves if we exceed ~100M
      Process.kill(:QUIT, $$) if $1.to_i > 100_000
    end
    app.call(env)
  end
end

use MemCheckLinux
run Rack::Lobster.new
```

Sadly, setrlimit(:RLIMIT_AS) only causes SIGSEGV to get raised, and
Ruby controls that signal for itself. I sometimes use
setrlimit(:RLIMIT_CPU) + trap(:XCPU) to kill runaway processes.
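The setrlimit(:RLIMIT_CPU) + trap(:XCPU) combination mentioned above might look something like this minimal sketch (the 60/75-second limits are illustrative values, not a recommendation):

```ruby
# When the soft CPU-time limit is crossed, the kernel delivers SIGXCPU;
# if the process is still alive at the hard limit, it gets SIGKILL.
trap(:XCPU) do
  # ask ourselves to finish the current request and exit gracefully,
  # the same way unicorn handles SIGQUIT
  Process.kill(:QUIT, $$)
end

soft, hard = 60, 75 # CPU seconds; illustrative values
Process.setrlimit(:CPU, soft, hard)
```

This retires a runaway worker without cutting off the request it is serving, provided the request loop itself is not what is burning the CPU.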
Apache has long had a similar parameter where you could just tell a
worker to gracefully die after X number of requests. That'd also be
trivial to implement with middleware using SIGQUIT.

> Second: in my use case, I have webservers running as VMs, sharing a
> physical box with backend utility servers. [...] TTOU and TTIN
> signals give me a (very nice) means to write an auto-scaling module
> outside of unicorn, but it might be nice to have it included as an
> optional component.

If you can make an effort to support it when it breaks, I wouldn't
mind including a script in the examples/ section or as an optional
module. It's definitely not going to ever be the default.

Auto-scaling is hard (if not impossible) to get right. In my
experience, it always gets things wrong by default or gets configured
wrong, making things more difficult to fix. Dedicated servers will
always be the primary target of unicorn.

(And unicorn of course only scales to server + backend resources,
nginx handles scaling to client connections :)

> Happy to work on implementing these myself, just wanted to poll to see
> if it'd be worth developing them as part of unicorn proper rather than
> standalone scripts.

If you're willing to help support users of these scripts/modules, I'd
have no reservations about distributing them along with unicorn.
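An external TTIN/TTOU scaler along the lines discussed above could be as small as the following sketch (the pid-file path and worker counts are assumptions for illustration, not unicorn defaults):

```ruby
# TTIN asks the unicorn master to spawn one more worker,
# TTOU asks it to retire one. Compute the signal list needed
# to move from `current` workers to `target` workers.
def scale_signals(current, target)
  Array.new((target - current).abs, target > current ? :TTIN : :TTOU)
end

# Hypothetical driver: read the master's pid and send the signals.
def scale_to(current, target, pid_file = "/var/run/unicorn.pid")
  master = File.read(pid_file).to_i
  scale_signals(current, target).each { |sig| Process.kill(sig, master) }
end
```

A cron job could then call something like `scale_to(8, 2)` at night to hand resources back to the util servers, and `scale_to(2, 8)` in the morning.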
Ben Somers <somers.ben at gmail.com> wrote:
> First: auto-killing bloated workers. My current app has some memory
> leakage that wasn't really visible on our older passenger setup, [...]

Btw, you reported issues with memory usage on an Ubuntu system a few
months ago, is this the same system? Are you using stock malloc() or
tcmalloc()? (tcmalloc comes standard with REE afaik and never releases
memory to the kernel.)

For glibc malloc (ptmalloc) users I mentioned MALLOC_MMAP_THRESHOLD_,
but forgot about MALLOC_ARENA_MAX. Since MRI is mostly single-threaded
(especially the memory allocation portions), I'm tending to think the
per-thread optimizations in glibc malloc do not help in most cases and
will only lead to internal fragmentation. So perhaps setting
MALLOC_ARENA_MAX=1 (perhaps along with the mmap threshold) in the
environment will reduce fragmentation and memory usage.

It doesn't look like the MALLOC_ARENA_* environment variables are in
the manpages, yet: http://udrepper.livejournal.com/20948.html
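Since glibc reads these variables at process startup, they have to be in the environment before unicorn is exec'd. A hypothetical Ruby launcher sketch (the threshold value and the `launch_unicorn` helper are illustrative assumptions):

```ruby
# glibc malloc tunables, applied via the environment.
# MALLOC_ARENA_MAX=1 disables per-thread arenas, as suggested above;
# the mmap threshold value (in bytes) is illustrative.
MALLOC_ENV = {
  "MALLOC_ARENA_MAX"       => "1",
  "MALLOC_MMAP_THRESHOLD_" => "131072",
}

# Hypothetical helper: replace this process with unicorn, with the
# tunables already set in its environment.
def launch_unicorn(args = %w[-c config/unicorn.rb])
  exec(MALLOC_ENV, "unicorn", *args)
end
```

The same effect can be had from a shell wrapper that exports the variables before starting unicorn; the point is only that setting them after the workers are running does nothing.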