Hi Eric, Since we upgraded from 4.6.2 to 4.7.0 we''re regularly having issues where one or more unicorn master processes would not upgrade correctly, resulting in an (old) master process. What I notice is the following: when upgrading with 4.6.2 the file unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file is updated (the (new) master process). After a while the worker pid files are updated and the unicorn.pid.oldbin file disappears, and all is fine. When upgrading with 4.7.0 though the file unicorn.pid.oldbin is created, but there is no unicorn.pid file. After a while (when the new master process has finished initialising, and is ready to fork the workers) the worker pid files are updated and a unicorn.pid file appears, and then the unicorn.pid.oldbin file disappears. So there is a relatively long period where there is no unicorn.pid file. I think the problem for us is caused by monit, our process monitor, which monitors the unicorn.pid file: check process unicorn with pidfile /srv/app.itrp-staging.com/shared/pids/unicorn.pid start program = "/etc/init.d/unicorn start" stop program = "/etc/init.d/unicorn stop" ... Because the unicorn.pid file is absent for a relatively long period monit will try to start unicorn, while an upgrade is in progress. (which might be another issue: the start action should recognise a start or upgrade process is already underway; sadly this check too relies on the existence of the unicorn.pid file) I think the way 4.6.2 worked is better. There should be a pid file for the new master process the moment it''s created. What do you think? -- Jim _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
Jimmy Soho <jimmy.soho@gmail.com> wrote:> Hi Eric, > > Since we upgraded from 4.6.2 to 4.7.0 we''re regularly having issues > where one or more unicorn master processes would not upgrade > correctly, resulting in an (old) master process. > > What I notice is the following: when upgrading with 4.6.2 the file > unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file > is updated (the (new) master process). After a while the worker pid > files are updated and the unicorn.pid.oldbin file disappears, and all > is fine.Ugh. This is what I feared... Slow startup time of most Ruby web apps doesn''t help.> When upgrading with 4.7.0 though the file unicorn.pid.oldbin is > created, but there is no unicorn.pid file. After a while (when the new > master process has finished initialising, and is ready to fork the > workers) the worker pid files are updated and a unicorn.pid file > appears, and then the unicorn.pid.oldbin file disappears. > > So there is a relatively long period where there is no unicorn.pid file. > > I think the problem for us is caused by monit, our process monitor, > which monitors the unicorn.pid file: > > check process unicorn with pidfile > /srv/app.itrp-staging.com/shared/pids/unicorn.pid > start program = "/etc/init.d/unicorn start" > stop program = "/etc/init.d/unicorn stop" > ...Is there an alternate way to monitor unicorn with monit? But I''d like to keep your case working if possible.> Because the unicorn.pid file is absent for a relatively long period > monit will try to start unicorn, while an upgrade is in progress. > (which might be another issue: the start action should recognise a > start or upgrade process is already underway; sadly this check too > relies on the existence of the unicorn.pid file) > > I think the way 4.6.2 worked is better. There should be a pid file for > the new master process the moment it''s created.> What do you think?How about having the old process create a hard link to .oldbin, and having the new one override the pid if Process.ppid == pid file? The check is still racy, but that''s what pid files are :< _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On Mon, Nov 25, 2013 at 5:20 PM, Eric Wong <normalperson@yhbt.net> wrote:>> What I notice is the following: when upgrading with 4.6.2 the file >> unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file >> is updated (the (new) master process). After a while the worker pid >> files are updated and the unicorn.pid.oldbin file disappears, and all >> is fine. > > Ugh. This is what I feared... Slow startup time of most Ruby web apps > doesn''t help.Why doesn''t the new master write its pid file immediately upon forking? It shouldn''t have to wait for everything else to load, should it?> How about having the old process create a hard link to .oldbin, > and having the new one override the pid if Process.ppid == pid file? > The check is still racy, but that''s what pid files are :<If you decide to implement this, please make it optional. The current behavior is correct IMO, and it will work much better if the above issue is addressed. --Michael _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On Mon, Nov 25, 2013 at 5:00 PM, Jimmy Soho <jimmy.soho@gmail.com> wrote:> I think the problem for us is caused by monit, our process monitor, > which monitors the unicorn.pid file: > > check process unicorn with pidfile > /srv/app.itrp-staging.com/shared/pids/unicorn.pid > start program = "/etc/init.d/unicorn start" > stop program = "/etc/init.d/unicorn stop" > …I''d suggest that you monitor Unicorn by issuing a test request to it via its listening socket instead. Ultimately, you''re more likely concerned about whether Unicorn is serving requests, not whether its pid file exists. (Such a check can also lead to false positives; consider what might happen if an admin or the Linux OOM killer sends it a SIGKILL, leaving the pid file intact.) Best regards, --Michael _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
Michael Fischer <mfischer@zendesk.com> wrote:> On Mon, Nov 25, 2013 at 5:20 PM, Eric Wong <normalperson@yhbt.net> wrote: > > >> What I notice is the following: when upgrading with 4.6.2 the file > >> unicorn.pid is copied to unicorn.pid.oldbin and the unicorn.pid file > >> is updated (the (new) master process). After a while the worker pid > >> files are updated and the unicorn.pid.oldbin file disappears, and all > >> is fine. > > > > Ugh. This is what I feared... Slow startup time of most Ruby web apps > > doesn''t help. > > Why doesn''t the new master write its pid file immediately upon > forking? It shouldn''t have to wait for everything else to load, > should it?It used to do this in 4.6, but I wanted to support dropping the PID later in case of config failures. I''ll revert that part of the change.> > How about having the old process create a hard link to .oldbin, > > and having the new one override the pid if Process.ppid == pid file? > > The check is still racy, but that''s what pid files are :< > > If you decide to implement this, please make it optional. The current > behavior is correct IMO, and it will work much better if the above > issue is addressed.I''ve decided against that :) Just pushed this out to git://bogomips.org/unicorn as 795c3527337ff4f03ae6db08c5df01141565ed96 ---------------------------- 8< ----------------------------- Subject: [PATCH] always write PID file early for compatibility This reduces the window for a non-existent PID for folks who monitor PIDs (not a great idea anyways). Unfortunately, this change also brings us back to the case where having a PID later (for other process monitors) is beneficial but more unicorn releases exist where we write the PID early. Thanks to Jimmy Soho for reporting this issue. ref: <CAHStS5gFYcPBDxkVizAHrOeDKAkjT69kruFdgaY0CbB+vLbK8Q@mail.gmail.com> This partially reverts 7d6ac0c17eb29a00a5b74099dbb3d4d015999f27 Folks: please monitor your app with HTTP requests rather than checking processes, a stuck/wedged Ruby VM is still a running one. --- lib/unicorn/http_server.rb | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 402f133..f15c8a7 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -136,20 +136,15 @@ class Unicorn::HttpServer trap(:CHLD) { awaken_master } # write pid early for Mongrel compatibility if we''re not inheriting sockets - # This was needed for compatibility with some health checker a long time - # ago. This unfortunately has the side effect of clobbering valid PID - # files. - self.pid = config[:pid] unless ENV["UNICORN_FD"] + # This is needed for compatibility some Monit setups at least. + # This unfortunately has the side effect of clobbering valid PID if + # we upgrade and the upgrade breaks during preload_app==true && build_app! + self.pid = config[:pid] self.master_pid = $$ build_app! if preload_app bind_new_listeners! - # Assuming preload_app==false, we drop the pid file after the app is ready - # to process requests. If binding or build_app! fails with - # preload_app==true, we''ll never get here and the parent will recover - self.pid = config[:pid] if ENV["UNICORN_FD"] - spawn_missing_workers self end -- EW _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On Tue, Nov 26, 2013 at 12:42 PM, Michael Fischer <mfischer@zendesk.com> wrote:> I''d suggest that you monitor Unicorn by issuing a test request to it > via its listening socket instead. Ultimately, you''re more likely > concerned about whether Unicorn is serving requests, not whether its > pid file exists. (Such a check can also lead to false positives; > consider what might happen if an admin or the Linux OOM killer sends > it a SIGKILL, leaving the pid file intact.)Great suggestion, thanks. I''ve added that, as shown below, in case others are interested. Though perhaps not necessary, am still checking the PID file as well so it will start asap on absence of the pid file. @Eric: Thanks for that change. check process unicorn with pidfile /srv/app.example.com/shared/pids/unicorn.pid group web start program = "/etc/init.d/unicorn start" stop program = "/etc/init.d/unicorn stop" if mem > 300.0 MB for 1 cycles then restart if cpu > 50% for 2 cycles then alert if cpu > 80% for 3 cycles then restart if failed unixsocket /tmp/.unicorn_sock type tcp protocol http and request ''/ping'' hostheader ''app.example.com'' with timeout 20 seconds for 2 cycles then restart -- Jimmy _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On 26.11.2013 3.20, Eric Wong wrote:>> >> I think the way 4.6.2 worked is better. There should be a pid file for >> the new master process the moment it''s created. > >> What do you think? > > How about having the old process create a hard link to .oldbin, > and having the new one override the pid if Process.ppid == pid file? > The check is still racy, but that''s what pid files are :< >Isn''t it possible to always keep a valid pid file by using the fact that mv is atomic? Basically the new process writes the pid first to a temp file and then moves it over the old pid file after having hard linked the file to .oldbin? $ echo "1" > foo.pid $ ln foo.pid foo.oldpid $ echo "2" > tmp $ mv tmp foo.pid $ cat *pid 1 2 Regards, Petteri _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
Petteri Räty <betelgeuse@gentoo.org> wrote:> On 26.11.2013 3.20, Eric Wong wrote: > > How about having the old process create a hard link to .oldbin, > > and having the new one override the pid if Process.ppid == pid file? > > The check is still racy, but that's what pid files are :< > > Isn't it possible to always keep a valid pid file by using the fact that > mv is atomic? Basically the new process writes the pid first to a temp > file and then moves it over the old pid file after having hard linked > the file to .oldbin? > > $ echo "1" > foo.pid > $ ln foo.pid foo.oldpid > $ echo "2" > tmp > $ mv tmp foo.pid > $ cat *pid > 1 > 2It's possible, but is it worth it? Having both pid files point to the same pid is still wrong. _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On 10.12.2013 21.52, Eric Wong wrote:> Petteri Räty <betelgeuse@gentoo.org> wrote: >> On 26.11.2013 3.20, Eric Wong wrote: >>> How about having the old process create a hard link to .oldbin, >>> and having the new one override the pid if Process.ppid == pid file? >>> The check is still racy, but that's what pid files are :< >> >> Isn't it possible to always keep a valid pid file by using the fact that >> mv is atomic? Basically the new process writes the pid first to a temp >> file and then moves it over the old pid file after having hard linked >> the file to .oldbin? >> >> $ echo "1" > foo.pid >> $ ln foo.pid foo.oldpid >> $ echo "2" > tmp >> $ mv tmp foo.pid >> $ cat *pid >> 1 >> 2 > > It's possible, but is it worth it? Having both pid files point to the > same pid is still wrong. >At least for pid based monitoring tools it is (I do agree with others that you should also be monitoring http though). For example monit requires that you give it a pid file. Why is it wrong for them to point to the same pid? Seems ok to me especially since they share the inode. Any way I think this is better than there being no pid file. After all we are doing a zero downtime deploy which means there's always a valid process. Regards, Petteri _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty <betelgeuse@gentoo.org> wrote:> At least for pid based monitoring tools it is (I do agree with others > that you should also be monitoring http though). For example monit > requires that you give it a pid file. Why is it wrong for them to point > to the same pid?Monit doesn''t require a pid, never has: http://mmonit.com/monit/documentation/monit.html#connection_testing To answer your question, though, the reason the pid files must contain different PIDs is that the two processes (previous and current-generation masters) have different PIDs. And some of us do care that they differ :) --Michael _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
On 11.12.2013 15.54, Michael Fischer wrote:> On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty <betelgeuse@gentoo.org> wrote: > >> At least for pid based monitoring tools it is (I do agree with others >> that you should also be monitoring http though). For example monit >> requires that you give it a pid file. Why is it wrong for them to point >> to the same pid? > > Monit doesn''t require a pid, never has: > > http://mmonit.com/monit/documentation/monit.html#connection_testing >If you only want to do connection testing. What if you want to monitor the memory usage of unicorn processes?> To answer your question, though, the reason the pid files must contain > different PIDs is that the two processes (previous and > current-generation masters) have different PIDs. And some of us do > care that they differ :) >Eric''s original comment and my response was about the pid files having the same content (albeit shortly). Regards, Petteri _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
Petteri Räty <betelgeuse@gentoo.org> wrote:> On 11.12.2013 15.54, Michael Fischer wrote: > > On Tue, Dec 10, 2013 at 10:44 PM, Petteri Räty <betelgeuse@gentoo.org> wrote: > > > >> At least for pid based monitoring tools it is (I do agree with others > >> that you should also be monitoring http though). For example monit > >> requires that you give it a pid file. Why is it wrong for them to point > >> to the same pid? > > > > Monit doesn't require a pid, never has: > > > > http://mmonit.com/monit/documentation/monit.html#connection_testing > > > > If you only want to do connection testing. What if you want to monitor > the memory usage of unicorn processes?Your monitoring has to adapt to the new processes anyways. Can it really not deal with a PID file being temporarily absent? There's a plethora of tools in the wild which deal with PID files. Consider this case: It's likely a tool will see foo.pid.oldbin existing, and wait for foo.pid to become available. If foo.pid is already available from a hardlink, it'll be pointing to the old PID, and any tool which backs out of the upgrade by intending to kill the new PID will end up hitting the old one. _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying