Hello,

We have a stale lock file that is preventing nsdc from running. From
the log file our cron job produces:

Wed Oct 15 05:40:01 UTC 2008
 5:40AM  up 16:27, 0 users, load averages: 6.33, 8.02, 7.31
ns0a 25383 1 8175
/opt/prod/nsd/sbin/nsdc: line 138: /opt/nshome/ns0a/var/nsd.db.lock: cannot overwrite existing file
database locked by PID: 78717
aborting...
ns0a 25383 1 8175
Wed Oct 15 08:40:00 UTC 2008
 8:40AM  up 19:27, 0 users, load averages: 9.36, 5.78, 5.18
ns0a 25383 1 8175
/opt/prod/nsd/sbin/nsdc: line 138: /opt/nshome/ns0a/var/nsd.db.lock: cannot overwrite existing file
database locked by PID: 78717
aborting...
ns0a 25383 1 8175

This lock file does exist, and does point to process 78717:

[root@app7 /opt/nshome/ns0a/var]# ls -l
total 1639596
-rw-r--r--  1 ns0a  ns0a  601899131 Oct 15 09:09 ixfr.db
-rw-r--r--  1 root  ns0a  426287130 Oct 14 06:01 nsd.db
-rw-r--r--  1 root  ns0a          0 Oct 14 08:49 nsd.db.78717
-r--r--r--  1 root  ns0a         30 Oct 14 08:49 nsd.db.lock
-rw-r--r--  1 ns0a  ns0a          6 Oct 14 20:31 nsd.pid
-rw-r--r--  1 root  ns0a       1079 Jul 23 02:50 org.afilias-nst.info.zone
-rw-r--r--  1 root  ns0a       1075 Jul 23 02:50 org.afilias-nst.org.zone
-rw-r--r--  1 root  ns0a  649832393 Oct 14 08:49 org.zone
-rw-r--r--  1 ns0a  ns0a       2414 Sep  5 00:21 xfrd.state
[root@app7 /opt/nshome/ns0a/var]# cat nsd.db.lock
database locked by PID: 78717

But the process is not running:

[root@app7 /opt/nshome/ns0a/var]# ps ax | grep 78717
78079  p0  S+     0:00.00 grep 78717

As with the signal() case reported a few months ago, nsdc.sh needs a
bit of love. The lock() function needs to be improved so it handles
stale locks. Something like this would probably work (and is even
NFS-safe), but requires that everything that writes to the lock use
the PID and not "database locked by PID: $$" as the contents:

lock() {
    # create a temporary file based on our PID
    TEMPFILE="${dbfile}.$$"
    # braces, not parentheses, so that exit leaves the script
    # rather than just a subshell
    echo $$ > "$TEMPFILE" || {
        echo "error creating temporary file, aborting..."
        exit 1
    }

    # try to lock using this file
    if ln "$TEMPFILE" "${lockfile}" 2>/dev/null; then
        rm -f "$TEMPFILE"
        return
    fi

    # if that did not work, see if the locking process exists
    PID=$(cat "${lockfile}")
    if kill -0 "$PID" 2>/dev/null; then
        rm -f "$TEMPFILE"
        echo "database locked by PID: $PID"
        exit 1
    fi

    # if the locking process does not exist, consider the lock stale
    echo "removing stale lockfile"
    rm -f "${lockfile}"

    # lock the database
    if ! ln "$TEMPFILE" "${lockfile}" 2>/dev/null; then
        rm -f "$TEMPFILE"
        echo "unable to lock database"
        exit 1
    fi
    rm -f "$TEMPFILE"
}

Bad things happen to good processes. :)

Cheers,

--
Shane