Hi,

I have an application which is dying horrible deaths (i.e. segmentation
faults) in mid-flight, in production... And of course, I should fix it.
But while I find and fix the bugs, I found something I think should be
different - I can work on submitting a patch, as it is quite simple, but
I might be missing something in my rationale.

When Mongrel segfaults, it does not - obviously - get to clean up after
itself, so it does not remove the PID files. As an example:

$ sudo /etc/init.d/mongrel-cluster start
Starting mongrel-cluster: Starting all mongrel_clusters...
mongrel-cluster.
$ sudo cat tmp/pids/mongrel.8203.pid | xargs kill -9
$ sudo /etc/init.d/mongrel-cluster status
(...)
found pid_file: tmp/pids/mongrel.8203.pid
missing mongrel_rails: port 8203
(...)
$ sudo /etc/init.d/mongrel-cluster restart
Restarting mongrel-cluster: Restarting all mongrel_clusters...
** !!! PID file tmp/pids/mongrel.8203.pid already exists. Mongrel could be
running already. Check your log/mongrel.8203.log for errors.
** !!! Exiting with error. You must stop mongrel and clear the .pid before
I'll attempt a start.
mongrel-cluster.

So, what's the solution? I must manually do:

$ sudo rm tmp/pids/mongrel.8203.pid
$ sudo /etc/init.d/mongrel-cluster restart

And now it works.

What should happen? Well, 'status' already found that there is a stale
PID. Of course, the 'status' action means exactly that: get the status,
do nothing else. But the 'stop' action should clean up PID files whose
processes no longer exist, and the 'start' action should check whether
the process with that PID is alive, and ignore the PID file if it's not.
At the very least, this behaviour should be specifiable via the
configuration file.

What do you think?

--
Gunnar Wolf - gwolf at iiec.unam.mx - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
use the mongrel_cluster --clean option

On 6/5/08, Gunnar Wolf <gwolf at gwolf.org> wrote:
> [...]
On Thu, 5 Jun 2008 16:08:06 -0500
Gunnar Wolf <gwolf at gwolf.org> wrote:

> What should happen? Well, 'status' already found that there is a stale
> PID. [...]

That would be the ideal situation, but Ruby doesn't have good enough
process management APIs to do this portably. To make it work, you'd have
to be able to portably take a PID and see if there's a mongrel running
with that PID.

You can't use /proc or /sys because those are Linux-only. You can't use
`ps` because the OSX morons changed everything, Solaris has a different
format, etc.

If you were to do this, you'd have to dip into C code to pull it off.

Now, if you're only on Linux then you could write yourself a small
little hack to the mongrel_rails script that did this with info out
of /proc.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
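A minimal sketch of the Linux-only /proc hack Zed describes - untested,
and the helper name plus matching on "mongrel_rails" in the command line
are assumptions, not anything mongrel ships:

------------------------------------------------------------
#!/usr/bin/ruby
# Linux-only sketch: decide whether the process named in a PID file is
# still a mongrel by reading its command line from /proc.
def stale_pid_file?(pid_file)
  pid = File.read(pid_file).to_i
  cmdline = "/proc/#{pid}/cmdline"
  return true unless File.exist?(cmdline)  # no such process: stale
  # /proc/<pid>/cmdline separates arguments with NUL bytes
  args = File.read(cmdline).split("\0")
  # alive, but the PID may have been recycled by an unrelated process
  !args.any? { |a| a.include?("mongrel_rails") }
end

puts stale_pid_file?("tmp/pids/mongrel.8203.pid") ? "stale" : "running"
------------------------------------------------------------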
At Thu, 5 Jun 2008 16:08:06 -0500,
Gunnar Wolf <gwolf at gwolf.org> wrote:

> I have an application which is dying horrible deaths
> (i.e. segmentation faults) in mid-flight, in production...
> [...]

I use the following bit in my Capistrano scripts before I start
Mongrel:

( [ -f pid_file ] && ( kill -0 `cat pid_file` >& /dev/null || rm pid_file ) )

which handles the typical cases (in which no process with a given pid
is running, or a process is running with a different owner from the
mongrel owner) but not the edge case where a process is running, with
the same owner, but is no longer a mongrel process. You could
supplement this with Linux/Solaris-specific stuff to check if the
process running is actually a mongrel.

best,
Erik Hetzner

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
> kill -0 `cat pid_file` >& /dev/null

more like

kill -0 $(<pid_file) >& /dev/null

regards,
Istvan

Erik Hetzner wrote:
> [...]
Zed A. Shaw dijo [Fri, Jun 06, 2008 at 01:01:32AM -0400]:
> That would be the ideal situation, but Ruby doesn't have good enough
> process management APIs to do this portably. To make it work, you'd have
> to be able to portably take a PID and see if there's a mongrel running
> with that PID.
> [...]

Oh, silly me... I thought Ruby's Process class dealt with the
architectural incompatibilities... What I wrote to check for the status
is quite straightforward:

------------------------------------------------------------
#!/usr/bin/ruby
require 'yaml'

confdir     = '/etc/mongrel-cluster/sites-enabled'
restart_cmd = '/etc/init.d/mongrel-cluster restart'
needs_restart = false

(Dir.open(confdir).entries - ['.', '..']).each do |site|
  conf = YAML.load_file "#{confdir}/#{site}"
  pid_location = [conf['cwd'], conf['pid_file']].join('/').gsub(/\.pid$/, '*.pid')
  pid_files = Dir.glob(pid_location)
  pid_files.each do |pidf|
    pid = File.read(pidf)
    begin
      Process.getpgid(pid.to_i)
    rescue Errno::ESRCH
      warn "Process #{pid} (cluster #{site}) is dead!"
      File.unlink pidf
      needs_restart = true
    end
  end
end

system(restart_cmd) if needs_restart
------------------------------------------------------------

(periodically run via cron)

I guess this works in any Unixy environment... I have no idea whether
Windows implements something similar to Process.getpgid, or for that
matter, anything about Windows' process management.

Greetings,

--
Gunnar Wolf - gwolf at gwolf.org - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
Tikhon Bernstam dijo [Thu, Jun 05, 2008 at 07:29:22PM -0700]:
> use the mongrel_cluster --clean option

Very good addition to the overall logic, keeps things cleaner :-)

--
Gunnar Wolf - gwolf at gwolf.org - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
Gunnar Wolf <gwolf at gwolf.org> wrote:
> Zed A. Shaw dijo [Fri, Jun 06, 2008 at 01:01:32AM -0400]:
> > [...]
>
> I guess this works in any Unixy environment... I have no idea whether
> Windows implements something similar to Process.getpgid, or for that
> matter, anything about Windows' process management.

Process.kill(0, pid) also works and is (in my experience) more widely
used.

--
Eric Wong
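A minimal sketch of that check - the helper name is made up, and treating
EPERM as "alive, but owned by another user" is an assumption about what
the caller wants:

------------------------------------------------------------
# Sketch: is there any process with this PID? Signal 0 delivers
# nothing; the kernel only performs the existence/permission check.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH   # no such process
  false
rescue Errno::EPERM   # process exists, but belongs to another user
  true
end

pid_file = 'tmp/pids/mongrel.8203.pid'  # placeholder path
pid = File.read(pid_file).to_i
File.unlink(pid_file) unless process_alive?(pid)
------------------------------------------------------------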
Zed A. Shaw wrote:
> That would be the ideal situation, but Ruby doesn't have good enough
> process management APIs to do this portably.

Erik Hetzner:
> ... but not the edge case where a process is running, with
> the same owner, but is no longer a mongrel process.

I feel obligated to reply. :)

PID files suck. I think it's really stupid that modern operating systems
don't provide some kind of mechanism to automatically delete a file when
a process exits (even when it exits abnormally).

Anyway, I've written a fair share of daemons in the past. What I tend to
do is to combine PID files with a number of lock files:

- foo.pid. This is obviously the PID file.
- foo.lock. This is a lock file whose lock is held for the lifetime of
  the daemon. If the daemon exits, whether normally or abnormally, the
  lock on that file is released.

To check whether foo.pid is stale, we simply check whether foo.lock is
locked. The only way to check whether foo.lock is locked is to try to
lock it with the non-blocking flag. If locking fails then it's already
locked, meaning that the PID file is not stale.

However, this could result in a race condition. Suppose that you are
starting a daemon while simultaneously checking whether the daemon is
already started:

1. The checker acquires a non-blocking lock on foo.lock. This succeeds,
   so it knows that the PID file is stale. It prints "stale PID file
   detected" on screen, and is about to release the lock on foo.lock.
2. All of a sudden, before the lock is released, a context switch
   occurs. The daemon that is being started tries to acquire a lock on
   foo.lock. This fails because the checker still holds the lock, so the
   daemon thinks that there's already a daemon running, and exits.

So we need another lock file to serialize all PID-file-related actions:

- foo.global.lock

So the code for checking whether the daemon's running is something like
this:

def check():
    lock(foo.global.lock)
    if try_lock(foo.lock):
        # Locking succeeded, so we have a stale PID file here.
        unlock(foo.lock)
        unlock(foo.global.lock)
        return nil
    else:
        # Locking failed. Process is still running.
        # Of course, your code should also check whether the PID file
        # actually exists.
        pid = read_pid_file(foo.pid)
        unlock(foo.global.lock)
        return pid

Daemon code:

lock(foo.global.lock)
write_pid_file(foo.pid)
lock(foo.lock)
unlock(foo.global.lock)

main_loop()

lock(foo.global.lock)
delete_file(foo.pid)
unlock(foo.lock)
unlock(foo.global.lock)

NOTE: lock() creates the lock file if it doesn't already exist.

This works great, even on Windows. The only gotchas are:
- flock() doesn't work over NFS. You'll have to use some kind of fcntl()
  call to lock files over NFS, but I'm not sure whether Ruby provides an
  API for that.
- foo.global.lock is never deleted. You cannot safely delete it without
  creating some kind of race condition.
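A Ruby rendering of the check() half of that scheme, using File#flock -
an untested sketch, with placeholder file names:

------------------------------------------------------------
# Sketch of check() using flock(2). Returns the PID if the daemon is
# alive, nil if the PID file is stale. Closing a file releases its
# flock, so the block form of File.open doubles as unlock().
GLOBAL_LOCK   = 'foo.global.lock'
LIFETIME_LOCK = 'foo.lock'
PID_FILE      = 'foo.pid'

def check
  File.open(GLOBAL_LOCK, 'w') do |global|
    global.flock(File::LOCK_EX)  # serialize all PID-file actions
    File.open(LIFETIME_LOCK, 'w') do |lt|
      if lt.flock(File::LOCK_EX | File::LOCK_NB)
        # We got the lifetime lock: the daemon is gone, PID file stale.
        nil
      else
        # Lock held by the daemon: it is still running.
        File.exist?(PID_FILE) ? File.read(PID_FILE).to_i : nil
      end
    end
  end
end
------------------------------------------------------------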
Hongli Lai wrote:
> This works great, even on Windows. The only gotchas are:
> - flock() doesn't work over NFS. You'll have to use some kind of fcntl()
>   call to lock files over NFS, but I'm not sure whether Ruby provides an
>   API for that.
> - foo.global.lock is never deleted. You cannot safely delete it without
>   creating some kind of race condition.

I forgot to mention that it is safe to delete foo.lock. So the shutdown
part of the daemon code should look like this:

lock(foo.global.lock)
delete_file(foo.pid)
unlock(foo.lock)
delete_file(foo.lock)  # added this line
unlock(foo.global.lock)
On Wed, Jun 11, 2008 at 01:25:41AM +0200, Hongli Lai wrote:
> PID files suck.

Agreed. Just use daemontools or runit or some other process manager - no
pidfiles or complicated locking code needed.

--
Jos Backus
jos at catnook.com
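The setup under such a process manager is small; a sketch of what a
daemontools service directory's run script could look like for one
mongrel (the path, port, and environment are assumptions - the one real
requirement is that the mongrel stays in the foreground, i.e. no -d
flag, so supervise can watch it):

------------------------------------------------------------
#!/usr/bin/ruby
# Sketch of /service/myapp-8203/run (hypothetical service directory).
# supervise runs this script and restarts it whenever it dies; since it
# tracks its child directly, no PID file is involved.
Dir.chdir '/var/www/myapp'  # hypothetical application root
exec 'mongrel_rails', 'start', '-e', 'production', '-p', '8203'
------------------------------------------------------------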
Has anyone considered turning mongrel_cluster into a process manager
daemon? I know that generally many people rely on other applications
(such as monit) to ensure that mongrels are up and running, but it seems
that integrated process management out of the box would be a large win.

The mongrel_cluster could remain running (rather than exiting) and keep
track of the running mongrels (potentially restarting them if they die
or zombie). At that point, pid files become unneeded for tracking
running mongrels. The only exception would be if the mongrel cluster
itself dies - at that point it would orphan the child processes and it
would be up to the cluster to kill off (or resume ownership of) any
orphaned processes.

thoughts?

- scott

On Tue, Jun 10, 2008 at 4:50 PM, Jos Backus <jos at catnook.com> wrote:
> [...]
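The core of such a manager would be a fork/wait loop; a toy sketch (the
ports are placeholders, and real code would add restart rate-limiting
and signal handling):

------------------------------------------------------------
# Toy supervisor sketch: keep one mongrel per port alive, no PID files.
# The parent learns each child's PID from fork, and wait2 reports which
# child died, so there is never a stale PID to clean up.
PORTS = [8203, 8204]  # placeholder ports

def spawn_mongrel(port)
  fork do
    exec 'mongrel_rails', 'start', '-e', 'production', '-p', port.to_s
  end
end

children = {}  # pid => port
PORTS.each { |port| children[spawn_mongrel(port)] = port }

loop do
  pid, status = Process.wait2           # blocks until some child exits
  port = children.delete(pid) or next
  warn "mongrel on port #{port} died (#{status.inspect}); restarting"
  children[spawn_mongrel(port)] = port  # real code would rate-limit this
end
------------------------------------------------------------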
On Tue, Jun 10, 2008 at 06:24:58PM -0700, Scott Windsor wrote:
> Has anyone considered turning mongrel_cluster into a process manager
> daemon?

I'm not using this myself (I use standalone daemontools) but
mongrel_runit should fit the bill at least somewhat:

https://wiki.hjksolutions.com/display/MR/Home

--
Jos Backus
jos at catnook.com
On Tue, 10 Jun 2008 16:50:39 -0700
Jos Backus <jos at catnook.com> wrote:

> Agreed. Just use daemontools or runit or some other process manager - no
> pidfiles or complicated locking code needed.

You ever read the code to runit? I wouldn't touch that thing with a
10' pole. Haven't used daemontools though.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
On Wed, Jun 11, 2008 at 04:23:10PM -0400, Zed A. Shaw wrote:
> You ever read the code to runit? I wouldn't touch that thing with a
> 10' pole. Haven't used daemontools though.

Haven't looked at the runit code, no. Daemontools so far has worked
great for me for over a decade.

--
Jos Backus
jos at catnook.com