Hi

I've migrated from Passenger to Unicorn about a week ago. It's great.
Great transparency and management, thanks for this great software!

A few of my Rails applications start background jobs using Kernel#fork.
Of course, the ActiveRecord connections are closed and reopened again
in the parent and child processes. The child process also does its job.

Unfortunately, it seems that the parent (a Unicorn worker) waits for
the child (background job) to finish before serving any new requests.
Process.detach is done on the child. Process.setsid is not done. The
child's STDOUT, STDERR and the Rails logger are redirected to their
own files right after forking.

Software used:
  ruby 1.9.1p376
  Rubygems 1.8.17
  Linux 2.6.16.60-0.21-smp (SUSE 10.2)
  unicorn 4.2.1
  nginx 0.8.53

The problem persists even when multiple workers are started. And the
problem was not present in the old setup with Passenger.

My question: Does Unicorn somehow check/wait for child processes
forked by the worker processes?

Thanks in advance for your help.

-- paddor
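A minimal sketch of the pattern described above (the method names and
log paths are invented for illustration; this is not paddor's actual
code):

  def start_background_job
    # parent drops its DB connection so the child can open a fresh one
    ActiveRecord::Base.connection.disconnect!

    pid = fork do
      # child: own DB connection, own log files
      ActiveRecord::Base.establish_connection
      STDOUT.reopen("log/job.out", "a")
      STDERR.reopen("log/job.err", "a")
      Rails.logger = Logger.new("log/job.log")
      run_the_job # stand-in for the long-running work
    end

    Process.detach(pid) # reap the child in the background; no setsid
    ActiveRecord::Base.establish_connection # parent reconnects
  end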
paddor <paddor at gmail.com> wrote:
> Hi
>
> I've migrated from Passenger to Unicorn about a week ago. It's great.
> Great transparency and management, thanks for this great software!

:>

> A few of my Rails applications start background jobs using
> Kernel#fork. Of course, the ActiveRecord connections are closed and
> reopened again in the parent and child processes. The child process
> also does its job.

OK, that's good.

> Unfortunately, it seems that the parent (a Unicorn worker) waits for
> the child (background job) to finish before serving any new requests.
> Process.detach is done on the child. Process.setsid is not done. The
> child's STDOUT, STDERR and the Rails logger are redirected to their
> own files right after forking.

So you're only calling fork and not exec (or system/popen), right? It
may be the case that the client socket is kept alive in the background
process.

The client socket has the close-on-exec flag (FD_CLOEXEC) set, but
there's no close-on-fork flag, so you might have to find + close it
yourself. Here's a nasty workaround for the child process:

  ObjectSpace.each_object(Kgio::Socket) do |io|
    io.close unless io.closed?
  end

> The problem persists even when multiple workers are started. And the
> problem was not present in the old setup with Passenger.
>
> My question: Does Unicorn somehow check/wait for child processes
> forked by the worker processes?

Unicorn workers do not explicitly wait on child processes themselves;
unicorn workers even set trap(:CHLD, "DEFAULT") after forking (the
unicorn master must handle SIGCHLD, of course).

The difference between nginx+unicorn and Passenger is probably: nginx
relies on unicorn generating an EOF to signal the end-of-response
(nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
Passenger uses a protocol which can signal the end-of-request inline
without relying on an EOF on the socket (Hongli can correct me on this
if I'm wrong).

However, nginx can still forward subsequent requests to the same
unicorn (even the same unicorn worker), because as far as the unicorn
worker is concerned (but not the OS), it's done with the original
request. It's just the original request (perhaps the original client)
that is stuck waiting for the background process to finish.

I can probably write up a better explanation (perhaps on usp.ruby, the
Unix Systems Programming for Ruby mailing list) if this doesn't make
sense.
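In context, the workaround would run in the child right after the
fork, before the job starts (a sketch; run_the_job is a stand-in name
from the earlier example):

  pid = fork do
    # close the inherited unicorn client socket so the kernel can
    # send EOF once the parent worker closes its own copy
    ObjectSpace.each_object(Kgio::Socket) do |io|
      io.close unless io.closed?
    end
    run_the_job
  end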
On Thu, Apr 12, 2012 at 10:39 PM, Eric Wong <normalperson at yhbt.net> wrote:
>> The problem persists even when multiple workers are started. And the
>> problem was not present in the old setup with Passenger.
>>
>> My question: Does Unicorn somehow check/wait for child processes
>> forked by the worker processes?
>
> Unicorn workers do not explicitly wait on child processes themselves;
> unicorn workers even set trap(:CHLD, "DEFAULT") after forking (the
> unicorn master must handle SIGCHLD, of course).
>
> The difference between nginx+unicorn and Passenger is probably: nginx
> relies on unicorn generating an EOF to signal the end-of-response
> (nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
> Passenger uses a protocol which can signal the end-of-request inline
> without relying on an EOF on the socket (Hongli can correct me on
> this if I'm wrong).

We don't. We call #close_write on the socket to prevent this problem.
#close_write calls shutdown(fd, SHUT_WR), which isn't affected by the
number of processes that have inherited the socket handle.

-- 
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: info at phusion.nl

Chamber of commerce no: 08173483 (The Netherlands)
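Hongli's point is easy to verify with a standalone script (a
demonstration of the shutdown(2) semantics only, not Passenger's
actual code): the EOF arrives immediately after #close_write even
though a forked child still holds the descriptor.

  require 'socket'

  server = TCPServer.new(0)
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  conn   = server.accept

  pid = fork { sleep } # child inherits +conn+, raising its refcount

  conn.close_write # shutdown(fd, SHUT_WR) acts on the socket itself,
                   # so the FIN goes out despite the child's reference
  p client.read    # => "" immediately; no waiting for the child

  Process.kill(:TERM, pid)
  Process.wait(pid)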
Hongli Lai <hongli at phusion.nl> wrote:
> On Thu, Apr 12, 2012 at 10:39 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > paddor wrote:
> >> The problem persists even when multiple workers are started. And the
> >> problem was not present in the old setup with Passenger.
> >
> > The difference between nginx+unicorn and Passenger is probably: nginx
> > relies on unicorn generating an EOF to signal the end-of-response
> > (nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
> > Passenger uses a protocol which can signal the end-of-request inline
> > without relying on an EOF on the socket (Hongli can correct me on
> > this if I'm wrong).
>
> We don't. We call #close_write on the socket to prevent this problem.
> #close_write calls shutdown(fd, SHUT_WR), which isn't affected by the
> number of processes that have inherited the socket handle.

Ah, thanks for that clarification.

It's an extra syscall for an uncommon case, but I doubt most apps will
notice the hit... Might as well shutdown(SHUT_RDWR), too:

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index ede6264..f942e2f 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -536,6 +536,7 @@ class Unicorn::HttpServer
     end
     @request.headers? or headers = nil
     http_response_write(client, status, headers, body)
+    client.shutdown # in case of fork() in Rack app
     client.close # flush and uncork socket immediately, no keepalive
   rescue => e
     handle_error(client, e)
Thanks for the answers.

> So you're only calling fork and not exec (or system/popen), right? It
> may be the case that the client socket is kept alive in the background
> process.

Yes, I'm only calling Kernel#fork. @Eric, your guess makes sense to me.

> The client socket has the close-on-exec flag (FD_CLOEXEC) set, but
> there's no close-on-fork flag, so you might have to find + close it
> yourself. Here's a nasty workaround for the child process:
>
>   ObjectSpace.each_object(Kgio::Socket) do |io|
>     io.close unless io.closed?
>   end

Isn't there another way to retrieve the right socket?

Here is some additional info that might bring some clarification.
Another action in the same controller, which does about the same thing
regarding the background job, works (check.run!). The only differences
I see are:

1) it's called via AJAX
2) the response is nothing (render :nothing => true) instead of a
   redirect (redirect_to checks_path)

I reckon the second difference kind of confirms Eric's guess, as the
client socket probably isn't considered anymore with render :nothing
=> true.

> However, nginx can still forward subsequent requests to the same
> unicorn (even the same unicorn worker), because as far as the unicorn
> worker is concerned (but not the OS), it's done with the original
> request. It's just the original request (perhaps the original client)
> that is stuck waiting for the background process to finish.
>
> I can probably write up a better explanation (perhaps on usp.ruby, the
> Unix Systems Programming for Ruby mailing list) if this doesn't make
> sense.

Yeah, I don't really understand this part. The "hanging" Unicorn worker
can read another request because the client socket wasn't closed,
because it's still open in the child process? I would appreciate a
better explanation, thank you.
Patrik Wenger <paddor at gmail.com> wrote:
> Thanks for the answers.

No problem.

> Isn't there another way to retrieve the right socket?

Actually, I think my proposed patch (in reply to Hongli) at
http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
should fix your issue.

> > I can probably write up a better explanation (perhaps on usp.ruby,
> > the Unix Systems Programming for Ruby mailing list) if this doesn't
> > make sense.
>
> Yeah, I don't really understand this part. The "hanging" Unicorn worker
> can read another request because the client socket wasn't closed,
> because it's still open in the child process? I would appreciate a
> better explanation, thank you.

Basically, fork() has a similar effect as dup() in that it creates
multiple references to the same kernel object (the client socket).

close() basically lowers the refcount of a kernel object; when the
refcount is zero, resources inside the kernel are freed. When the
refcount of a kernel object reaches zero, a shutdown(SHUT_RDWR) is
implied.

This works for 99% of Rack apps since they don't fork() nor call dup()
on the client socket, so refcount==1 when unicorn calls close(),
leading to unicorn setting refcount=0 upon close() => everything is
freed.

However, since the client socket increments refcount via fork(),
close() in the parent (unicorn worker) no longer implies
shutdown(SHUT_RDWR).

  parent timeline                  | child timeline
  ------------------------------------------------------------------
  accept() -> sockfd created       | (child doesn't exist, yet)
  sockfd.refcount == 1             |
                                   |
  fork()                           | child exists, now
                                   |
  sockfd is shared by both processes now: sockfd.refcount == 2
  if either the child or parent forks again: sockfd.refcount += 1
                                   |
  close() => sockfd.refcount -= 1  | child continues running

Since sockfd.refcount == 1 at this point, the socket is still
considered "alive" by the kernel. If the child calls close() (or
exits), sockfd.refcount is decremented again (and now reaches zero).

Now, to write this as a commit message :>
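The timeline above can be reproduced outside unicorn with a few lines
of Ruby (a self-contained demonstration; the 2-second sleep is an
arbitrary stand-in for a background job):

  require 'socket'

  server = TCPServer.new(0)
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  conn   = server.accept    # sockfd.refcount == 1

  pid = fork { sleep 2 }    # sockfd.refcount == 2

  conn.close                # refcount 2 -> 1: no EOF is sent yet
  start = Time.now
  client.read               # blocks until the child exits and the
                            # refcount finally drops to zero
  p Time.now - start        # => ~2.0 seconds, not ~0.0

  Process.wait(pid)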
> Actually, I think my proposed patch (in reply to Hongli) at
> http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
> should fix your issue.

Oh, this sounds great! I see it's just one line. I'll try adding this
line and see if it fixes my problem. Thanks already!

> Basically, fork() has a similar effect as dup() in that it creates
> multiple references to the same kernel object (the client socket).
>
> close() basically lowers the refcount of a kernel object; when the
> refcount is zero, resources inside the kernel are freed. When the
> refcount of a kernel object reaches zero, a shutdown(SHUT_RDWR) is
> implied.
>
> This works for 99% of Rack apps since they don't fork() nor call dup()
> on the client socket, so refcount==1 when unicorn calls close(),
> leading to unicorn setting refcount=0 upon close() => everything is
> freed.
>
> However, since the client socket increments refcount via fork(),
> close() in the parent (unicorn worker) no longer implies
> shutdown(SHUT_RDWR).

I don't know much about system calls (honestly, I had never heard of
shutdown() before) but I think I understand. The kernel then doesn't
free the allocated client socket, and thus Unicorn (or Nginx? That's
the other end of the client socket, I think...) thinks there's still
more to come.
Patrik Wenger <paddor at gmail.com> wrote:
> > Actually, I think my proposed patch (in reply to Hongli) at
> > http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
> > should fix your issue.
>
> Oh, this sounds great! I see it's just one line. I'll try adding this
> line and see if it fixes my problem. Thanks already!

No problem. I've also pushed it out as a commit + test case:

  http://bogomips.org/unicorn.git/patch?id=b26d3e2c4387707c

I'm looking at releasing unicorn 4.3.0 early next week, probably
Tuesday (including REQUEST_PATH/REQUEST_URI length limit tweaks).

> I don't know much about system calls (honestly, I had never heard of
> shutdown() before) but I think I understand. The kernel then doesn't
> free the allocated client socket, and thus Unicorn (or Nginx? That's
> the other end of the client socket, I think...) thinks there's still
> more to come.

Sorta, I think I conflated shutdown vs. releasing resources a bit.
Releasing kernel resources happens for all file descriptors. However,
connected sockets may negotiate a graceful termination of the
connection, and that's what shutdown() can do. However, close() will
do it implicitly.

If a kernel were implemented in Ruby, close() would be something like
this, showing how shutdown() can get called automatically:

  def SYS_close(fd)
    # assume @fd_map is an array mapping fd to file/socket objects
    io_object = @fd_map[fd]

    return Errno::EBADF if io_object.nil?

    # allow +fd+ to be reused immediately
    @fd_map[fd] = nil

    io_object.refcount -= 1

    # if there are no more references, do other work:
    if io_object.refcount == 0
      if io_object.kind_of?(Socket)
        # assume this is idempotent
        SYS_shutdown(fd, SHUT_RDWR)
      end

      # free memory and any other resources allocated
      io_object.destroy!
    end
  end
> No problem. I've also pushed it out as a commit + test case:
>
>   http://bogomips.org/unicorn.git/patch?id=b26d3e2c4387707c

Thank you (and Hongli Lai) so much. It works like a charm :-)

> If a kernel were implemented in Ruby, close() would be something like
> this, showing how shutdown() can get called automatically:
>
>   def SYS_close(fd)
>     # assume @fd_map is an array mapping fd to file/socket objects
>     io_object = @fd_map[fd]
>
>     return Errno::EBADF if io_object.nil?
>
>     # allow +fd+ to be reused immediately
>     @fd_map[fd] = nil
>
>     io_object.refcount -= 1
>
>     # if there are no more references, do other work:
>     if io_object.refcount == 0
>       if io_object.kind_of?(Socket)
>         # assume this is idempotent
>         SYS_shutdown(fd, SHUT_RDWR)
>       end
>
>       # free memory and any other resources allocated
>       io_object.destroy!
>     end
>   end

Thanks for this explanation. I understand now. :-)

-- paddor