thr3ads.net - mongrel unicorn - Issues with PID file renaming [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Jimmy Soho

2013-Nov-26 01:00 UTC

Issues with PID file renaming

Hi Eric,

Since we upgraded from 4.6.2 to 4.7.0 we''re regularly having issues
where one or more unicorn master processes would not upgrade
correctly, resulting in an (old) master process.

What I notice is the following: when upgrading with 4.6.2 the file
unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file
is updated (the (new) master process). After a while the worker pid
files are updated and the unicorn.pid.oldbin file disappears, and all
is fine.

When upgrading with 4.7.0 though the file unicorn.pid.oldbin is
created, but there is no unicorn.pid file. After a while (when the new
master process has finished initialising, and is ready to fork the
workers) the worker pid files are updated and a unicorn.pid file
appears, and then the unicorn.pid.oldbin file disappears.

So there is a relatively long period where there is no unicorn.pid file.

I think the problem for us is caused by monit, our process monitor,
which monitors the unicorn.pid file:

check process unicorn with pidfile
/srv/app.itrp-staging.com/shared/pids/unicorn.pid
  start program = "/etc/init.d/unicorn start"
  stop program = "/etc/init.d/unicorn stop"
  ...

Because the unicorn.pid file is absent for a relatively long period
monit will try to start unicorn, while an upgrade is in progress.
(which might be another issue: the start action should recognise a
start or upgrade process is already underway; sadly this check too
relies on the existence of the unicorn.pid file)

I think the way 4.6.2 worked is better. There should be a pid file for
the new master process the moment it''s created.

What do you think?

-- 
Jim
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Eric Wong

2013-Nov-26 01:20 UTC

head link

Re: Issues with PID file renaming

Jimmy Soho <jimmy.soho@gmail.com> wrote:> Hi Eric,
> 
> Since we upgraded from 4.6.2 to 4.7.0 we''re regularly having
issues
> where one or more unicorn master processes would not upgrade
> correctly, resulting in an (old) master process.
> 
> What I notice is the following: when upgrading with 4.6.2 the file
> unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file
> is updated (the (new) master process). After a while the worker pid
> files are updated and the unicorn.pid.oldbin file disappears, and all
> is fine.
Ugh.  This is what I feared...  Slow startup time of most Ruby web apps
doesn''t help.
> When upgrading with 4.7.0 though the file unicorn.pid.oldbin is
> created, but there is no unicorn.pid file. After a while (when the new
> master process has finished initialising, and is ready to fork the
> workers) the worker pid files are updated and a unicorn.pid file
> appears, and then the unicorn.pid.oldbin file disappears.
> 
> So there is a relatively long period where there is no unicorn.pid file.
> 
> I think the problem for us is caused by monit, our process monitor,
> which monitors the unicorn.pid file:
> 
> check process unicorn with pidfile
> /srv/app.itrp-staging.com/shared/pids/unicorn.pid
>   start program = "/etc/init.d/unicorn start"
>   stop program = "/etc/init.d/unicorn stop"
>   ...
Is there an alternate way to monitor unicorn with monit?
But I''d like to keep your case working if possible.
> Because the unicorn.pid file is absent for a relatively long period
> monit will try to start unicorn, while an upgrade is in progress.
> (which might be another issue: the start action should recognise a
> start or upgrade process is already underway; sadly this check too
> relies on the existence of the unicorn.pid file)
> 
> I think the way 4.6.2 worked is better. There should be a pid file for
> the new master process the moment it''s created.
> What do you think?
How about having the old process create a hard link to .oldbin,
and having the new one override the pid if Process.ppid == pid file?
The check is still racy, but that''s what pid files are :<
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Michael Fischer

2013-Nov-26 01:40 UTC

head link

Re: Issues with PID file renaming

On Mon, Nov 25, 2013 at 5:20 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> What I notice is the following: when upgrading with 4.6.2 the file
>> unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file
>> is updated (the (new) master process). After a while the worker pid
>> files are updated and the unicorn.pid.oldbin file disappears, and all
>> is fine.
>
> Ugh.  This is what I feared...  Slow startup time of most Ruby web apps
> doesn''t help.
Why doesn''t the new master write its pid file immediately upon
forking?  It shouldn''t have to wait for everything else to load,
should it?
> How about having the old process create a hard link to .oldbin,
> and having the new one override the pid if Process.ppid == pid file?
> The check is still racy, but that''s what pid files are :<
If you decide to implement this, please make it optional.  The current
behavior is correct IMO, and it will work much better if the above
issue is addressed.

--Michael
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Michael Fischer

2013-Nov-26 01:42 UTC

head link

Re: Issues with PID file renaming

On Mon, Nov 25, 2013 at 5:00 PM, Jimmy Soho <jimmy.soho@gmail.com> wrote:
> I think the problem for us is caused by monit, our process monitor,
> which monitors the unicorn.pid file:
>
> check process unicorn with pidfile
> /srv/app.itrp-staging.com/shared/pids/unicorn.pid
>   start program = "/etc/init.d/unicorn start"
>   stop program = "/etc/init.d/unicorn stop"
>   …
I''d suggest that you monitor Unicorn by issuing a test request to it
via its listening socket instead.  Ultimately, you''re more likely
concerned about whether Unicorn is serving requests, not whether its
pid file exists.  (Such a check can also lead to false positives;
consider what might happen if an admin or the Linux OOM killer sends
it a SIGKILL, leaving the pid file intact.)

Best regards,

--Michael
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Eric Wong

2013-Nov-26 02:07 UTC

head link

Re: Issues with PID file renaming

Michael Fischer <mfischer@zendesk.com> wrote:> On Mon, Nov 25, 2013 at 5:20 PM, Eric Wong <normalperson@yhbt.net>
wrote:
> 
> >> What I notice is the following: when upgrading with 4.6.2 the file
> >> unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid
file
> >> is updated (the (new) master process). After a while the worker
pid
> >> files are updated and the unicorn.pid.oldbin file disappears, and
all
> >> is fine.
> >
> > Ugh.  This is what I feared...  Slow startup time of most Ruby web
apps
> > doesn''t help.
> 
> Why doesn''t the new master write its pid file immediately upon
> forking?  It shouldn''t have to wait for everything else to load,
> should it?
It used to do this in 4.6, but I wanted to support dropping the PID
later in case of config failures.  I''ll revert that part of the change.
> > How about having the old process create a hard link to .oldbin,
> > and having the new one override the pid if Process.ppid == pid file?
> > The check is still racy, but that''s what pid files are :<
> 
> If you decide to implement this, please make it optional.  The current
> behavior is correct IMO, and it will work much better if the above
> issue is addressed.
I''ve decided against that :)

Just pushed this out to git://bogomips.org/unicorn
as 795c3527337ff4f03ae6db08c5df01141565ed96

---------------------------- 8< -----------------------------
Subject: [PATCH] always write PID file early for compatibility

This reduces the window for a non-existent PID for folks who monitor
PIDs (not a great idea anyways).  Unfortunately, this change also brings
us back to the case where having a PID later (for other process monitors)
is beneficial but more unicorn releases exist where we write the PID
early.

Thanks to Jimmy Soho for reporting this issue.
ref: <CAHStS5gFYcPBDxkVizAHrOeDKAkjT69kruFdgaY0CbB+vLbK8Q@mail.gmail.com>

This partially reverts 7d6ac0c17eb29a00a5b74099dbb3d4d015999f27

Folks: please monitor your app with HTTP requests rather than checking
processes, a stuck/wedged Ruby VM is still a running one.
---
 lib/unicorn/http_server.rb | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index 402f133..f15c8a7 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -136,20 +136,15 @@ class Unicorn::HttpServer
     trap(:CHLD) { awaken_master }
 
     # write pid early for Mongrel compatibility if we''re not
inheriting sockets
-    # This was needed for compatibility with some health checker a long time
-    # ago.  This unfortunately has the side effect of clobbering valid PID
-    # files.
-    self.pid = config[:pid] unless ENV["UNICORN_FD"]
+    # This is needed for compatibility some Monit setups at least.
+    # This unfortunately has the side effect of clobbering valid PID if
+    # we upgrade and the upgrade breaks during preload_app==true &&
build_app!
+    self.pid = config[:pid]
 
     self.master_pid = $$
     build_app! if preload_app
     bind_new_listeners!
 
-    # Assuming preload_app==false, we drop the pid file after the app is ready
-    # to process requests.  If binding or build_app! fails with
-    # preload_app==true, we''ll never get here and the parent will
recover
-    self.pid = config[:pid] if ENV["UNICORN_FD"]
-
     spawn_missing_workers
     self
   end
-- 
EW

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Jimmy Soho

2013-Nov-26 04:55 UTC

head link

Re: Issues with PID file renaming

On Tue, Nov 26, 2013 at 12:42 PM, Michael Fischer <mfischer@zendesk.com>
wrote:
> I''d suggest that you monitor Unicorn by issuing a test request to
it
> via its listening socket instead.  Ultimately, you''re more likely
> concerned about whether Unicorn is serving requests, not whether its
> pid file exists.  (Such a check can also lead to false positives;
> consider what might happen if an admin or the Linux OOM killer sends
> it a SIGKILL, leaving the pid file intact.)
Great suggestion, thanks. I''ve added that, as shown below, in case
others are interested. Though perhaps not necessary, am still checking
the PID file as well so it will start asap on absence of the pid file.

@Eric: Thanks for that change.

check process unicorn
  with pidfile /srv/app.example.com/shared/pids/unicorn.pid
  group web
  start program = "/etc/init.d/unicorn start"
  stop program = "/etc/init.d/unicorn stop"
  if mem > 300.0 MB for 1 cycles then restart
  if cpu > 50% for 2 cycles then alert
  if cpu > 80% for 3 cycles then restart
  if failed unixsocket /tmp/.unicorn_sock type tcp protocol http
    and request ''/ping'' hostheader
''app.example.com''
    with timeout 20 seconds for 2 cycles
    then restart


--
Jimmy
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Petteri Räty

2013-Dec-10 12:46 UTC

head link

Re: Issues with PID file renaming

On 26.11.2013 3.20, Eric Wong wrote:
>>
>> I think the way 4.6.2 worked is better. There should be a pid file for
>> the new master process the moment it''s created.
> 
>> What do you think?
> 
> How about having the old process create a hard link to .oldbin,
> and having the new one override the pid if Process.ppid == pid file?
> The check is still racy, but that''s what pid files are :<
> 
Isn''t it possible to always keep a valid pid file by using the fact
that
mv is atomic? Basically the new process writes the pid first to a temp
file and then moves it over the old pid file after having hard linked
the file to .oldbin?

$ echo "1" > foo.pid
$ ln foo.pid foo.oldpid
$ echo "2" > tmp
$ mv tmp foo.pid
$ cat *pid
1
2

Regards,
Petteri

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Eric Wong

2013-Dec-10 19:52 UTC

head link

Re: Issues with PID file renaming

Petteri Räty <betelgeuse@gentoo.org> wrote:> On 26.11.2013 3.20, Eric Wong wrote:
> > How about having the old process create a hard link to .oldbin,
> > and having the new one override the pid if Process.ppid == pid file?
> > The check is still racy, but that's what pid files are :<
> 
> Isn't it possible to always keep a valid pid file by using the fact
that
> mv is atomic? Basically the new process writes the pid first to a temp
> file and then moves it over the old pid file after having hard linked
> the file to .oldbin?
> 
> $ echo "1" > foo.pid
> $ ln foo.pid foo.oldpid
> $ echo "2" > tmp
> $ mv tmp foo.pid
> $ cat *pid
> 1
> 2
It's possible, but is it worth it?  Having both pid files point to the
same pid is still wrong.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Petteri Räty

2013-Dec-10 22:44 UTC

head link

Re: Issues with PID file renaming

On 10.12.2013 21.52, Eric Wong wrote:> Petteri Räty <betelgeuse@gentoo.org> wrote:
>> On 26.11.2013 3.20, Eric Wong wrote:
>>> How about having the old process create a hard link to .oldbin,
>>> and having the new one override the pid if Process.ppid == pid
file?
>>> The check is still racy, but that's what pid files are :<
>>
>> Isn't it possible to always keep a valid pid file by using the fact
that
>> mv is atomic? Basically the new process writes the pid first to a temp
>> file and then moves it over the old pid file after having hard linked
>> the file to .oldbin?
>>
>> $ echo "1" > foo.pid
>> $ ln foo.pid foo.oldpid
>> $ echo "2" > tmp
>> $ mv tmp foo.pid
>> $ cat *pid
>> 1
>> 2
> 
> It's possible, but is it worth it?  Having both pid files point to the
> same pid is still wrong.
> 
At least for pid based monitoring tools it is (I do agree with others
that you should also be monitoring http though). For example monit
requires that you give it a pid file. Why is it wrong for them to point
to the same pid? Seems ok to me especially since they share the inode.
Any way I think this is better than there being no pid file. After all
we are doing a zero downtime deploy which means there's always a valid
process.

Regards,
Petteri
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Michael Fischer

2013-Dec-11 13:54 UTC

head link

Re: Issues with PID file renaming

On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty <betelgeuse@gentoo.org>
wrote:
> At least for pid based monitoring tools it is (I do agree with others
> that you should also be monitoring http though). For example monit
> requires that you give it a pid file. Why is it wrong for them to point
> to the same pid?
Monit doesn''t require a pid, never has:

http://mmonit.com/monit/documentation/monit.html#connection_testing

To answer your question, though, the reason the pid files must contain
different PIDs is that the two processes (previous and
current-generation masters) have different PIDs.  And some of us do
care that they differ :)

--Michael
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Petteri Räty

2013-Dec-12 19:15 UTC

head link

Re: Issues with PID file renaming

On 11.12.2013 15.54, Michael Fischer wrote:> On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty
<betelgeuse@gentoo.org> wrote:
> 
>> At least for pid based monitoring tools it is (I do agree with others
>> that you should also be monitoring http though). For example monit
>> requires that you give it a pid file. Why is it wrong for them to point
>> to the same pid?
> 
> Monit doesn''t require a pid, never has:
> 
> http://mmonit.com/monit/documentation/monit.html#connection_testing
> 
If you only want to do connection testing. What if you want to monitor
the memory usage of unicorn processes?
> To answer your question, though, the reason the pid files must contain
> different PIDs is that the two processes (previous and
> current-generation masters) have different PIDs.  And some of us do
> care that they differ :)
> 
Eric''s original comment and my response was about the pid files having
the same content (albeit shortly).

Regards,
Petteri

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Eric Wong

2013-Dec-12 20:51 UTC

head link

Re: Issues with PID file renaming

Petteri Räty <betelgeuse@gentoo.org> wrote:> On 11.12.2013 15.54, Michael Fischer wrote:
> > On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty
<betelgeuse@gentoo.org> wrote:
> > 
> >> At least for pid based monitoring tools it is (I do agree with
others
> >> that you should also be monitoring http though). For example monit
> >> requires that you give it a pid file. Why is it wrong for them to
point
> >> to the same pid?
> > 
> > Monit doesn't require a pid, never has:
> > 
> > http://mmonit.com/monit/documentation/monit.html#connection_testing
> > 
> 
> If you only want to do connection testing. What if you want to monitor
> the memory usage of unicorn processes?
Your monitoring has to adapt to the new processes anyways.  Can it
really not deal with a PID file being temporarily absent?

There's a plethora of tools in the wild which deal with PID files.
Consider this case:

It's likely a tool will see foo.pid.oldbin existing, and wait for
foo.pid to become available.  If foo.pid is already available from a
hardlink, it'll be pointing to the old PID, and any tool which backs out
of the upgrade by intending to kill the new PID will end up hitting the
old one.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

mongrel unicorn - Nov 2013 - Issues with PID file renaming

Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming

Re: Issues with PID file renaming