Hi

I've migrated from Passenger to Unicorn about a week ago. It's great.
Great transparency and management, thanks for this great software!

A few of my Rails applications start background jobs using Kernel#fork.
Of course, the ActiveRecord connections are closed and reopened again
in the parent and child processes. The child process also does its job.

Unfortunately, it seems that the parent (a Unicorn worker) waits for
the child (background job) to finish before serving any new requests.
Process.detach is done on the child. Process.setsid is not done. The
child's STDOUT, STDERR and the Rails logger are redirected to their
own files right after forking.

Software used:
  ruby 1.9.1p376
  Rubygems 1.8.17
  Linux 2.6.16.60-0.21-smp (SUSE 10.2)
  unicorn 4.2.1
  nginx 0.8.53

The problem persists even when multiple workers are started. And the
problem was not present in the old setup with Passenger.

My question: Does Unicorn somehow check/wait for child processes
forked by the worker processes?

Thanks in advance for your help.

-- paddor
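A minimal sketch of the pattern described above (the method names and
log paths are invented for illustration; this is not paddor's actual
code):

  def start_background_job
    # parent drops its DB connection so the child can open a fresh one
    ActiveRecord::Base.connection.disconnect!

    pid = fork do
      # child: own DB connection, own log files
      ActiveRecord::Base.establish_connection
      STDOUT.reopen("log/job.out", "a")
      STDERR.reopen("log/job.err", "a")
      Rails.logger = Logger.new("log/job.log")
      run_the_job # stand-in for the long-running work
    end

    Process.detach(pid) # reap the child in the background; no setsid
    ActiveRecord::Base.establish_connection # parent reconnects
  end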
paddor <paddor at gmail.com> wrote:
> Hi
>
> I've migrated from Passenger to Unicorn about a week ago. It's great.
> Great transparency and management, thanks for this great software!

:>

> A few of my Rails applications start background jobs using
> Kernel#fork. Of course, the ActiveRecord connections are closed and
> reopened again in the parent and child processes. The child process
> also does its job.

OK, that's good.

> Unfortunately, it seems that the parent (a Unicorn worker) waits for
> the child (background job) to finish before serving any new requests.
> Process.detach is done on the child. Process.setsid is not done. The
> child's STDOUT, STDERR and the Rails logger are redirected to their
> own files right after forking.

So you're only calling fork and not exec (or system/popen), right? It
may be the case that the client socket is kept alive in the background
process.

The client socket has the close-on-exec flag (FD_CLOEXEC) set, but
there's no close-on-fork flag, so you might have to find + close it
yourself. Here's a nasty workaround for the child process:

  ObjectSpace.each_object(Kgio::Socket) do |io|
    io.close unless io.closed?
  end

> The problem persists even when multiple workers are started. And the
> problem was not present in the old setup with Passenger.
>
> My question: Does Unicorn somehow check/wait for child processes
> forked by the worker processes?

Unicorn workers do not explicitly wait on child processes themselves;
unicorn workers even set trap(:CHLD, "DEFAULT") after forking (the
unicorn master must handle SIGCHLD, of course).

The difference between nginx+unicorn and Passenger is probably: nginx
relies on unicorn generating an EOF to signal the end-of-response
(nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
Passenger uses a protocol which can signal the end-of-request inline
without relying on an EOF on the socket (Hongli can correct me on this
if I'm wrong).

However, nginx can still forward subsequent requests to the same
unicorn (even the same unicorn worker), because as far as the unicorn
worker is concerned (but not the OS), it's done with the original
request. It's just the original request (perhaps the original client)
that is stuck waiting for the background process to finish.

I can probably write up a better explanation (perhaps on usp.ruby, the
Unix Systems Programming for Ruby mailing list) if this doesn't make
sense.
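In context, the workaround would run in the child right after the
fork, before the job starts (a sketch; run_the_job is a stand-in name
from the earlier example):

  pid = fork do
    # close the inherited unicorn client socket so the kernel can
    # send EOF once the parent worker closes its own copy
    ObjectSpace.each_object(Kgio::Socket) do |io|
      io.close unless io.closed?
    end
    run_the_job
  end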
On Thu, Apr 12, 2012 at 10:39 PM, Eric Wong <normalperson at yhbt.net> wrote:
>> The problem persists even when multiple workers are started. And the
>> problem was not present in the old setup with Passenger.
>>
>> My question: Does Unicorn somehow check/wait for child processes
>> forked by the worker processes?
>
> Unicorn workers do not explicitly wait on child processes themselves;
> unicorn workers even set trap(:CHLD, "DEFAULT") after forking (the
> unicorn master must handle SIGCHLD, of course).
>
> The difference between nginx+unicorn and Passenger is probably: nginx
> relies on unicorn generating an EOF to signal the end-of-response
> (nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
> Passenger uses a protocol which can signal the end-of-request inline
> without relying on an EOF on the socket (Hongli can correct me on
> this if I'm wrong).

We don't. We call #close_write on the socket to prevent this problem.
#close_write calls shutdown(fd, SHUT_WR), which isn't affected by the
number of processes that have inherited the socket handle.

-- 
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: info at phusion.nl

Chamber of commerce no: 08173483 (The Netherlands)
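Hongli's point is easy to verify with a standalone script (a
demonstration of the shutdown(2) semantics only, not Passenger's
actual code): the EOF arrives immediately after #close_write even
though a forked child still holds the descriptor.

  require 'socket'

  server = TCPServer.new(0)
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  conn   = server.accept

  pid = fork { sleep } # child inherits +conn+, raising its refcount

  conn.close_write # shutdown(fd, SHUT_WR) acts on the socket itself,
                   # so the FIN goes out despite the child's reference
  p client.read    # => "" immediately; no waiting for the child

  Process.kill(:TERM, pid)
  Process.wait(pid)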
Hongli Lai <hongli at phusion.nl> wrote:
> On Thu, Apr 12, 2012 at 10:39 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > paddor wrote:
> >> The problem persists even when multiple workers are started. And the
> >> problem was not present in the old setup with Passenger.
> >
> > The difference between nginx+unicorn and Passenger is probably: nginx
> > relies on unicorn generating an EOF to signal the end-of-response
> > (nginx <-> unicorn uses HTTP/1.0), this I'm sure about. I think
> > Passenger uses a protocol which can signal the end-of-request inline
> > without relying on an EOF on the socket (Hongli can correct me on
> > this if I'm wrong).
>
> We don't. We call #close_write on the socket to prevent this problem.
> #close_write calls shutdown(fd, SHUT_WR), which isn't affected by the
> number of processes that have inherited the socket handle.

Ah, thanks for that clarification.

It's an extra syscall for an uncommon case, but I doubt most apps will
notice the hit... Might as well shutdown(SHUT_RDWR), too:

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index ede6264..f942e2f 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -536,6 +536,7 @@ class Unicorn::HttpServer
     end
     @request.headers? or headers = nil
     http_response_write(client, status, headers, body)
+    client.shutdown # in case of fork() in Rack app
     client.close # flush and uncork socket immediately, no keepalive
   rescue => e
     handle_error(client, e)
Thanks for the answers.

> So you're only calling fork and not exec (or system/popen), right? It
> may be the case that the client socket is kept alive in the background
> process.

Yes, I'm only calling Kernel#fork. @Eric, your guess makes sense to me.

> The client socket has the close-on-exec flag (FD_CLOEXEC) set, but
> there's no close-on-fork flag, so you might have to find + close it
> yourself. Here's a nasty workaround for the child process:
>
>   ObjectSpace.each_object(Kgio::Socket) do |io|
>     io.close unless io.closed?
>   end

Isn't there another way to retrieve the right socket?

Here is some additional info that might bring some clarification.
Another action in the same controller, which does about the same thing
regarding the background job, works (check.run!). The only differences
I see are:

1) it's called via AJAX
2) the response is nothing (render :nothing => true) instead of a
   redirect (redirect_to checks_path)

I reckon the second difference kind of confirms Eric's guess, as the
client socket probably isn't considered anymore with render :nothing
=> true.

> However, nginx can still forward subsequent requests to the same
> unicorn (even the same unicorn worker), because as far as the unicorn
> worker is concerned (but not the OS), it's done with the original
> request. It's just the original request (perhaps the original client)
> that is stuck waiting for the background process to finish.
>
> I can probably write up a better explanation (perhaps on usp.ruby, the
> Unix Systems Programming for Ruby mailing list) if this doesn't make
> sense.

Yeah, I don't really understand this part. The "hanging" Unicorn worker
can read another request because the client socket wasn't closed,
because it's still open in the child process? I would appreciate a
better explanation, thank you.
Patrik Wenger <paddor at gmail.com> wrote:
> Thanks for the answers.

No problem.

> Isn't there another way to retrieve the right socket?

Actually, I think my proposed patch (in reply to Hongli) at
http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
should fix your issue.

> > I can probably write up a better explanation (perhaps on usp.ruby,
> > the Unix Systems Programming for Ruby mailing list) if this doesn't
> > make sense.
>
> Yeah, I don't really understand this part. The "hanging" Unicorn worker
> can read another request because the client socket wasn't closed,
> because it's still open in the child process? I would appreciate a
> better explanation, thank you.

Basically, fork() has a similar effect as dup() in that it creates
multiple references to the same kernel object (the client socket).

close() basically lowers the refcount of a kernel object; when the
refcount is zero, resources inside the kernel are freed. When the
refcount of a kernel object reaches zero, a shutdown(SHUT_RDWR) is
implied.

This works for 99% of Rack apps since they don't fork() nor call dup()
on the client socket, so refcount==1 when unicorn calls close(),
leading to unicorn setting refcount=0 upon close() => everything is
freed.

However, since the client socket increments refcount via fork(),
close() in the parent (unicorn worker) no longer implies
shutdown(SHUT_RDWR).

  parent timeline                  | child timeline
  ------------------------------------------------------------------
  accept() -> sockfd created       | (child doesn't exist, yet)
  sockfd.refcount == 1             |
                                   |
  fork()                           | child exists, now
                                   |
  sockfd is shared by both processes now: sockfd.refcount == 2
  if either the child or parent forks again: sockfd.refcount += 1
                                   |
  close() => sockfd.refcount -= 1  | child continues running

Since sockfd.refcount == 1 at this point, the socket is still
considered "alive" by the kernel. If the child calls close() (or
exits), sockfd.refcount is decremented again (and now reaches zero).

Now, to write this as a commit message :>
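The timeline above can be reproduced outside unicorn with a few lines
of Ruby (a self-contained demonstration; the 2-second sleep is an
arbitrary stand-in for a background job):

  require 'socket'

  server = TCPServer.new(0)
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  conn   = server.accept    # sockfd.refcount == 1

  pid = fork { sleep 2 }    # sockfd.refcount == 2

  conn.close                # refcount 2 -> 1: no EOF is sent yet
  start = Time.now
  client.read               # blocks until the child exits and the
                            # refcount finally drops to zero
  p Time.now - start        # => ~2.0 seconds, not ~0.0

  Process.wait(pid)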
> Actually, I think my proposed patch (in reply to Hongli) at
> http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
> should fix your issue.

Oh, this sounds great! I see it's just one line. I'll try adding this
line and see if it fixes my problem. Thanks already!

> Basically, fork() has a similar effect as dup() in that it creates
> multiple references to the same kernel object (the client socket).
>
> close() basically lowers the refcount of a kernel object; when the
> refcount is zero, resources inside the kernel are freed. When the
> refcount of a kernel object reaches zero, a shutdown(SHUT_RDWR) is
> implied.
>
> This works for 99% of Rack apps since they don't fork() nor call dup()
> on the client socket, so refcount==1 when unicorn calls close(),
> leading to unicorn setting refcount=0 upon close() => everything is
> freed.
>
> However, since the client socket increments refcount via fork(),
> close() in the parent (unicorn worker) no longer implies
> shutdown(SHUT_RDWR).

I don't know much about system calls (honestly, I had never heard of
shutdown() before) but I think I understand. The kernel then doesn't
free the allocated client socket, and thus Unicorn (or Nginx? That's
the other end of the client socket, I think...) thinks there's still
more to come.
Patrik Wenger <paddor at gmail.com> wrote:
> > Actually, I think my proposed patch (in reply to Hongli) at
> > http://mid.gmane.org/20120412221022.GA20640 at dcvr.yhbt.net
> > should fix your issue.
>
> Oh, this sounds great! I see it's just one line. I'll try adding this
> line and see if it fixes my problem. Thanks already!

No problem. I've also pushed it out as a commit + test case:

  http://bogomips.org/unicorn.git/patch?id=b26d3e2c4387707c

I'm looking at releasing unicorn 4.3.0 early next week, probably
Tuesday (including REQUEST_PATH/REQUEST_URI length limit tweaks).

> I don't know much about system calls (honestly, I had never heard of
> shutdown() before) but I think I understand. The kernel then doesn't
> free the allocated client socket, and thus Unicorn (or Nginx? That's
> the other end of the client socket, I think...) thinks there's still
> more to come.

Sorta, I think I conflated shutdown vs. releasing resources a bit.
Releasing kernel resources happens for all file descriptors. However,
connected sockets may negotiate a graceful termination of the
connection, and that's what shutdown() can do. However, close() will
do it implicitly.

If a kernel were implemented in Ruby, close() would be something like
this, showing how shutdown() can get called automatically:

  def SYS_close(fd)
    # assume @fd_map is an array mapping fd to file/socket objects
    io_object = @fd_map[fd]

    return Errno::EBADF if io_object.nil?

    # allow +fd+ to be reused immediately
    @fd_map[fd] = nil

    io_object.refcount -= 1

    # if there are no more references, do other work:
    if io_object.refcount == 0
      if io_object.kind_of?(Socket)
        # assume this is idempotent
        SYS_shutdown(fd, SHUT_RDWR)
      end

      # free memory and any other resources allocated
      io_object.destroy!
    end
  end
> No problem. I've also pushed it out as a commit + test case:
>
>   http://bogomips.org/unicorn.git/patch?id=b26d3e2c4387707c

Thank you (and Hongli Lai) so much. It works like a charm :-)

> If a kernel were implemented in Ruby, close() would be something like
> this, showing how shutdown() can get called automatically:
>
>   def SYS_close(fd)
>     # assume @fd_map is an array mapping fd to file/socket objects
>     io_object = @fd_map[fd]
>
>     return Errno::EBADF if io_object.nil?
>
>     # allow +fd+ to be reused immediately
>     @fd_map[fd] = nil
>
>     io_object.refcount -= 1
>
>     # if there are no more references, do other work:
>     if io_object.refcount == 0
>       if io_object.kind_of?(Socket)
>         # assume this is idempotent
>         SYS_shutdown(fd, SHUT_RDWR)
>       end
>
>       # free memory and any other resources allocated
>       io_object.destroy!
>     end
>   end

Thanks for this explanation. I understand now. :-)

-- paddor