Chris Hoogendyk
2011-Feb-25 15:12 UTC
[Dovecot] error - dovecot - child (login) killed with signal 16
We've been happily running Dovecot on Solaris 10 for about 8 months now. Yesterday, Solaris 10's SMF put it in maintenance mode (shut it down), seemingly out of the blue. The only thing going on was that I had updated the SSL certificate. We had previously been using a self-signed cert. Now we have a cert from InCommon. We had a problem with a few users who are still on ancient Eudora and didn't have the proper authority chains. We had a problem with a few Thunderbird users who hadn't set up the fully qualified server name in their configuration. The shutdown coincided to the minute (and it could be just a coincidence) with my boss working with a Windows Thunderbird user to change their configuration.
I can't figure out how this happened or why. I don't see how to track it any further than what I have here, and it doesn't make sense that a user issue or process could cause this. The IP, by the way, is our NAT, so there is no way of telling much from that.
From /var/adm/dovecot.log:
Feb 24 15:52:57 marlin dovecot: [ID 583609 local2.error] dovecot: child 21233 (login) killed with signal 16 (ip=128.119.55.8)
Feb 24 15:52:58 marlin dovecot: [ID 583609 local2.warning] dovecot: Killed with signal 15 (by pid=26750 uid=0 code=kill)
Note that the first line is logged as an error.
The signal 15 in the second line of the dovecot log above comes from SMF, as these entries from the SMF service log for dovecot show:
[ Feb 24 15:52:58 Stopping because process received fatal signal from outside the service. ]
[ Feb 24 15:52:58 Executing stop method ("/etc/mail/svc/method/dovecot.init.d stop") ]
Stopping Dovecot
So, the presumption is that the signal 16 prompted SMF to put the service in maintenance mode, which led to the signal 15.
Any ideas what went wrong here? How to track it down? How to prevent it from happening again?
Doing `svcadm clear dovecot` cleared the maintenance mode and started it up again. No problems since.
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk at bio.umass.edu>
---------------
Erdős 4
Timo Sirainen
2011-Feb-28 15:15 UTC
[Dovecot] error - dovecot - child (login) killed with signal 16
On Fri, 2011-02-25 at 10:12 -0500, Chris Hoogendyk wrote:
> Feb 24 15:52:57 marlin dovecot: [ID 583609 local2.error] dovecot: child 21233 (login) killed with signal 16 (ip=128.119.55.8)

Signal 16 is SIGUSR1. The Dovecot master process sends it to login processes when the maximum number of login processes has been reached, telling them to kill their oldest connections. But I don't know why the process itself would get killed. That's not intentional, and I can't see any bugs in the code related to that either.

> Feb 24 15:52:58 marlin dovecot: [ID 583609 local2.warning] dovecot: Killed with signal 15 (by pid=26750 uid=0 code=kill)
>
> Note that the first line is logged as an error.
>
> These items from the SMF service log for dovecot:
>
> [ Feb 24 15:52:58 Stopping because process received fatal signal from outside the service. ]
> [ Feb 24 15:52:58 Executing stop method ("/etc/mail/svc/method/dovecot.init.d stop") ]
> Stopping Dovecot

You could try to reproduce this by killing one of the login processes with -USR1.
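
As a rough illustration of that test (a sketch, not Dovecot's code): the default action for SIGUSR1 is to terminate the process, so any process that receives it without a handler installed is reported as killed by that signal. The snippet below uses a throwaway `sleep` as a stand-in for a login process, which is purely an assumption for demonstration. On the real server you would send the signal to the PID of one of the login processes instead. The signal is sent by name because its number is platform-specific (16 on Solaris, 10 on Linux x86).

```shell
#!/bin/sh
# Stand-in for a login process: a child with no SIGUSR1 handler installed.
sleep 30 &
pid=$!

# Send SIGUSR1 by name rather than by number, since the number varies by OS.
kill -USR1 "$pid"

# For a child terminated by a signal, wait returns a status greater than 128.
wait "$pid"
status=$?

# kill -l maps such an exit status back to the signal name.
signame=$(kill -l "$status")
echo "child $pid was killed by SIG$signame (wait status $status)"
```

If the real login process dies the same way, that would suggest the signal reached it while no handler was in place. Whether that is what happened here is not something this sketch can show.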