Matthijs,
On Jul 9, 2008, at 12:22 +0200, Matthijs Mekking wrote:
>
> Hi Shane,
>
>> [1214740996] nsd[93921]: warning: nsd is already running as 93888, continuing
>> [1214740996] nsd[93922]: error: can't bind the socket: Address already in use
>> [1214741027] nsd[94418]: error: can't bind the socket: Address already in use
>> [1214741057] nsd[94932]: error: can't bind the socket: Address already in use
>
> This occurs when you call nsd manually (e.g. without nsdc, the NSD control
> script). Because NSD is already running, the new process can't bind the
> socket, and server initialization for that process fails. Because server
> initialization fails, it tries to remove the pidfile (which belongs to the
> instance that is already running). Hence, later you will only see the
> socket bind error and no longer the 'already running' warning (and
> therefore "nsdc running" will tell you NSD is not running).
>
> I changed nsd.c so that the pidfile is written only after server
> initialization succeeds.
Cool.
>> I think this is because we have a script that monitors NSD to make sure
>> it is running at all times and attempts to start it... even though NSD
>> is already running.
>
> What script do you use for monitoring NSD? nsdc can also be used for
> this: run "nsdc running" to check whether nsd is running, and if it
> returns 1 (not running), run "nsdc start".
We use nsdc for this. The script basically does:
    while true; do
        if ! nsdc running; then
            nsdc start
        fi
        sleep 15
    done
>> In the nsdc.sh script we see the following:
>>
>>
>> signal() {
>>     if [ -s ${pidfile} ]
>>     then
>>         kill -"$1" `cat ${pidfile}` && return 0
>>     else
>>         echo "nsd is not running"
>>     fi
>>     return 1
>> }
>>
>>
>> But it seems like NSD restarts itself regularly, getting a new process
>> ID each time. In this case, the following can happen:
>>
>> - nsdc.sh reads the contents of pidfile
>>
>> - NSD restarts, getting a new PID
>>
>> - nsdc.sh sends a signal to test NSD using the old PID, which fails, so
>>   nsdc claims NSD is not running
>>
>> Is this possible?
>
> As far as I know, when NSD restarts (because it received a dedicated
> signal), it takes care of updating the pidfile.
When you use "nsdc patch", you get an implicit "nsdc reload". We run
this from a cron job.
nsdc reload issues a SIGHUP to NSD. This eventually ends up in the
server_main() function in server.c, which calls fork() and therefore
gets a new PID, which it then writes into the pidfile.
So, the scenario is:
Time 1: NSD, running as PID A, writes PID A into the pidfile
Time 2: nsdc reads PID A from the pidfile
Time 3: NSD gets a SIGHUP, forks a new process with PID B, and the old
        process exits
Time 4: nsdc sends a signal to PID A, which no longer exists
Time 5: nsdc reports that NSD is not running, even though the server
        is running
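To convince myself the timeline is real, here is a small simulation of
it. This is only a sketch: two throwaway "sleep" processes stand in for
the old and new NSD, and nothing here talks to a real NSD or to nsdc.

```sh
#!/bin/sh
# Simulation of the race above. "sleep" processes stand in for the two
# NSD incarnations; nothing here talks to a real NSD.
pidfile=$(mktemp)

sleep 60 &                    # Time 1: "NSD" runs as PID A ...
echo $! > "$pidfile"          #         ... and writes PID A to the pidfile

pid_a=$(cat "$pidfile")       # Time 2: the control script reads PID A

sleep 60 &                    # Time 3: a replacement starts as PID B,
echo $! > "$pidfile"          #         rewrites the pidfile,
kill "$pid_a"                 #         and the old process exits
wait "$pid_a" 2>/dev/null     #         (reaped, so PID A is really gone)

# Time 4/5: signalling the stale PID A fails, so the script says "not running"
if kill -0 "$pid_a" 2>/dev/null; then stale="running"; else stale="not running"; fi

# Re-reading the pidfile gives PID B, which is alive
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then fresh="running"; else fresh="not running"; fi

echo "stale PID check: $stale"    # prints "stale PID check: not running"
echo "fresh PID check: $fresh"    # prints "fresh PID check: running"

# Clean up the stand-in for PID B and the temporary pidfile
kill "$(cat "$pidfile")"
wait 2>/dev/null
rm -f "$pidfile"
```

The "wait" after killing PID A matters: an unreaped child is a zombie,
and "kill -0" still succeeds on a zombie.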
>> It is possible to work around this with a little more sophistication, I
>> think:
>>
>> signal() {
>>     while true
>>     do
>>         # if there is no PID file, NSD is not running
>>         if [ ! -s "${pidfile}" ]
>>         then
>>             return 1
>>         fi
>>
>>         # if we can send the signal to the PID, then NSD is running
>>         # (or some other process with that PID, but we hope not...)
>>         PID=`cat "${pidfile}"`
>>         if kill -"$1" "$PID"
>>         then
>>             return 0
>>         fi
>>
>>         # double-check NSD did not restart between the time we read the PID
>>         # and the time we sent the signal (string compare, so an empty
>>         # or vanished pidfile does not make [ ] fail with an error)
>>         CHECK_PID=`cat "${pidfile}" 2>/dev/null`
>>         if [ "$PID" = "$CHECK_PID" ]
>>         then
>>             echo "nsd is not running"
>>             return 1
>>         fi
>>     done
>> }
>
> Could you try the trunk release? I think it already fixes this issue.
> Make sure your control script first checks if nsd is running (nsdc
> running) and if not start it (nsdc start).
The fix you made makes sense, and should be included.
But I am reasonably sure there is nothing that the server can do to
fix this problem (mind you I am a bit sleep-deprived right now, so no
promises). ;)
I think the script needs to work like I coded it here, where it checks
that the PID of the server did not change between reading the pidfile
and sending the signal.
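One possible refinement, as a hypothetical variant only: the same
re-read check, but giving up after a few attempts instead of looping
forever if the pidfile keeps changing. The limit of 3 and the "pidfile"
value set in the demo are my assumptions; nsdc.sh sets its own pidfile,
and a "sleep" process stands in for NSD.

```sh
#!/bin/sh
# Hypothetical bounded variant of signal(): re-read the pidfile after a
# failed kill, but give up after 3 attempts instead of looping forever.
signal() {
    tries=0
    while [ "$tries" -lt 3 ]; do
        tries=$((tries + 1))

        # No (or empty) pidfile: NSD is not running
        [ -s "$pidfile" ] || { echo "nsd is not running"; return 1; }

        PID=$(cat "$pidfile")
        if kill -"$1" "$PID" 2>/dev/null; then
            return 0
        fi

        # The signal failed; only trust that if the pidfile did not change
        CHECK_PID=$(cat "$pidfile" 2>/dev/null)
        if [ "$PID" = "$CHECK_PID" ]; then
            echo "nsd is not running"
            return 1
        fi
    done
    echo "nsd pidfile keeps changing; giving up"
    return 1
}

# Demo: a "sleep" process stands in for NSD (assumption, not real NSD)
pidfile=$(mktemp)
sleep 60 &
echo $! > "$pidfile"
if signal 0; then live=yes; else live=no; fi

# Kill and reap the stand-in, leaving a stale PID in the pidfile
kill "$(cat "$pidfile")"
wait 2>/dev/null
if signal 0 >/dev/null; then stale=yes; else stale=no; fi
rm -f "$pidfile"
```

Signal 0 sends nothing; it only checks that the PID exists and can be
signalled, which is what a "running" check needs.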
--
Shane