Summary
======
After an unclean shutdown the file /var/run/dovecot/master.pid remained
behind. This prevented dovecot from starting, and gave a misleading error
message.

To be more resilient and fault-tolerant, I recommend that dovecot also
check the validity of the PID in /var/run/dovecot/master.pid in order to
determine whether or not another dovecot process is running.

Detail
=====
In testing out my automatic UPS shutdown I inadvertently shut down my
system uncleanly ... oops! As the system rebooted, I saw that dovecot did
not start properly, with an error message:

Fatal: Invalid configuration in /etc/dovecot.conf

After the system came up, I tried to start dovecot manually. It turns out
that there was an invalid PID in /var/run/dovecot/master.pid

-----
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: Error: Dovecot is already running with PID 1965
(read from /var/run/dovecot/master.pid)
Fatal: Invalid configuration in /etc/dovecot.conf
                                                           [FAILED]
[root@mykiss5 mth]# ps 1965
  PID TTY      STAT   TIME COMMAND
[root@mykiss5 mth]# rm /var/run/dovecot/master.pid
rm: remove regular file `/var/run/dovecot/master.pid'? y
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap:                                     [ OK ]
[root@mykiss5 mth]# service dovecot stop
Stopping Dovecot Imap:                                     [ OK ]
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap:                                     [ OK ]
[root@mykiss5 mth]# dovecot --version
1.0.7
[root@mykiss5 mth]#
-----

This leads me to believe that dovecot is only checking for the existence
of /var/run/dovecot/master.pid. It seems to me that it would be more
fault-tolerant if it also checked the validity of the PID that is in
/var/run/dovecot/master.pid.

Michael

On Wed, 2008-01-16 at 11:34 -0500, Michael wrote:
> Summary
> ======
> After an unclean shutdown the file /var/run/dovecot/master.pid remained
> behind. This prevented dovecot from starting, and gave a misleading error
> message.
>
> To be more resilient and fault-tolerant, I recommend that dovecot also
> check the validity of the PID in /var/run/dovecot/master.pid in order to
> determine whether or not another dovecot process is running.

It does check if a process is running with that PID. Doing any further
checks to make sure that the PID is a dovecot process would probably be
more trouble than worth.

> Starting Dovecot Imap: Error: Dovecot is already running with PID 1965
> (read from /var/run/dovecot/master.pid)
> Fatal: Invalid configuration in /etc/dovecot.conf

This "Invalid configuration" message is bad. Removed:
http://hg.dovecot.org/dovecot/rev/805d0831deb6

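For readers following the thread, the kind of check described in the reply
(is any process running with the PID stored in the file?) is commonly done
on POSIX systems by sending signal 0 to that PID. What follows is a minimal
Python sketch of that idea under stated assumptions. It is not Dovecot's
actual code (Dovecot is written in C), and the function names read_pid,
pid_is_running and another_instance_running are hypothetical, chosen only
for illustration.

```python
import os


def read_pid(pid_path):
    """Return the PID stored in pid_path, or None if the file is missing
    or does not contain a positive integer. (Hypothetical helper.)"""
    try:
        with open(pid_path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return None
    if not text.isdigit():
        return None
    pid = int(text)
    return pid if pid > 0 else None


def pid_is_running(pid):
    """Probe a PID with signal 0. No signal is delivered; the call only
    reports whether a process with that PID exists. (POSIX only.)"""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no process has this PID
    except PermissionError:
        return True   # EPERM: the process exists but belongs to another user
    return True


def another_instance_running(pid_path):
    """True only if the PID file names a process that is currently alive.
    A leftover file with a dead PID is treated as stale."""
    pid = read_pid(pid_path)
    return pid is not None and pid_is_running(pid)


if __name__ == "__main__":
    # Path taken from the thread; adjust for your system.
    print(another_instance_running("/var/run/dovecot/master.pid"))
```

Note that a check of this kind only shows that some process holds the PID.
It cannot tell whether that process is dovecot, which is the limitation the
"further checks" mentioned in the reply would address.
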
> It does check if a process is running with that PID. Doing any further
> checks to make sure that the PID is a dovecot process would probably be
> more trouble than worth.

Hmmm ... something strange must have happened, because it sure looks to me
like the test failed. Dovecot reported that the old PID was 1965. Per my
previous message, I did a 'ps 1965' and saw no processes:

----
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: Error: Dovecot is already running with PID 1965
(read from /var/run/dovecot/master.pid)
Fatal: Invalid configuration in /etc/dovecot.conf
                                                           [FAILED]
[root@mykiss5 mth]# ps 1965
  PID TTY      STAT   TIME COMMAND
[root@mykiss5 mth]# rm /var/run/dovecot/master.pid
rm: remove regular file `/var/run/dovecot/master.pid'? y
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap:                                     [ OK ]
----

Once I manually removed /var/run/dovecot/master.pid it started up.

*** 5 minutes later ***

Well, I just tried to reproduce this by hand, but was not able to.

----
[root@mykiss5 mth]# cd /var/run/dovecot/
[root@mykiss5 dovecot]# ls
auth-worker.3053  dict-server  login  master.pid
[root@mykiss5 dovecot]# ls -l
total 12
srw------- 1 root root       0 2008-01-16 10:53 auth-worker.3053
srwxrwxrwx 1 root root       0 2008-01-16 10:53 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 10:53 login
-rw------- 1 root root       5 2008-01-16 10:53 master.pid
[root@mykiss5 dovecot]# cp master.pid master.pid.backup
[root@mykiss5 dovecot]# service dovecot stop
Stopping Dovecot Imap:                                     [ OK ]
[root@mykiss5 dovecot]# ls -l
total 12
srwxrwxrwx 1 root root       0 2008-01-16 10:53 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 11:49 login
-rw------- 1 root root       5 2008-01-16 11:49 master.pid.backup
[root@mykiss5 dovecot]# ps `cat master.pid.backup`
  PID TTY      STAT   TIME COMMAND
[root@mykiss5 dovecot]# mv master.pid.backup master.pid
[root@mykiss5 dovecot]# service dovecot start
Starting Dovecot Imap:                                     [ OK ]
[root@mykiss5 dovecot]# ls -l
total 12
srw------- 1 root root       0 2008-01-16 11:50 auth-worker.3855
srwxrwxrwx 1 root root       0 2008-01-16 11:50 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 11:50 login
-rw------- 1 root root       5 2008-01-16 11:50 master.pid
[root@mykiss5 dovecot]#
----

I have no idea why it failed after the unclean shutdown/restart ...
curiouser and curiouser.

Michael