Hi!

Recently I upgraded one of my servers running 9.3-STABLE with a jail containing a 4.11-STABLE system.
The host was source-upgraded first to 10.3-STABLE and then to 11.0-STABLE,
and the jail configuration was migrated to /etc/jail.conf. The jail itself was left intact.

"service jail start" started the jail successfully,
but "service jail restart" fails because the jail stays stuck in the "dying" state for a long time:
"jls" shows no running jails and "jls -d" shows the dying jail.

How do I find out why it is stuck, and how do I forcibly kill it without rebooting the host?

Eugene Grosbein
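The "jls" versus "jls -d" comparison described above can be scripted. What follows is only a minimal sketch: the function name list_dying is made up for illustration, and it assumes that "jls jid name" prints one "JID name" pair per line. It compares JIDs as well as names because a restarted jail may reuse the name of an instance that is still dying.

```sh
#!/bin/sh
# list_dying (hypothetical helper): print "JID name" for every jail that
# appears in "jls -d" (which includes dying jails) but not in plain "jls".
list_dying() {
    ld_alive=$(mktemp) || return 1
    ld_all=$(mktemp) || { rm -f "$ld_alive"; return 1; }

    # Jails that are fully alive.
    jls jid name | sort > "$ld_alive"
    # Alive jails plus dying ones.
    jls -d jid name | sort > "$ld_all"

    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$ld_alive" "$ld_all"

    rm -f "$ld_alive" "$ld_all"
}
```

Calling list_dying on the host prints nothing when no jail is stuck, so it can be used as a quick check after "service jail restart".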

On 10/26/16 09:09, Eugene Grosbein wrote:
> Recently I upgraded one of my servers running 9.3-STABLE with a jail containing a 4.11-STABLE system.
> The host was source-upgraded first to 10.3-STABLE and then to 11.0-STABLE,
> and the jail configuration was migrated to /etc/jail.conf. The jail itself was left intact.
>
> "service jail start" started the jail successfully,
> but "service jail restart" fails because the jail stays stuck in the "dying" state for a long time:
> "jls" shows no running jails and "jls -d" shows the dying jail.
>
> How do I find out why it is stuck, and how do I forcibly kill it without rebooting the host?

I've seen this fairly frequently. I think it may have something to do with
old network connections waiting to be cleaned up: if you run sockstat, it's
all the stuff that gets listed at the end with lots of question marks.
But I could be wrong.

One tip I've found is *not* to specify the JID number in jail.conf, and
just let the system allocate a new one as it sees fit. If you have
scripting that uses the JID to operate on a specific jail, it's easy to
substitute the jail name instead.

Cheers,

Matthew
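For scripts that still need the numeric JID once it is no longer fixed in jail.conf, the current value can be looked up by name at run time. This is a minimal sketch; the helper name jid_of and the jail name "web" used in the usage note are placeholders, not anything from the setups discussed in this thread.

```sh
#!/bin/sh
# jid_of NAME (hypothetical helper): print the JID currently assigned to
# the running jail called NAME. Without -d, jls only looks at jails that
# are alive, so a dying instance with the same name is not matched.
jid_of() {
    jls -j "$1" jid
}
```

A script could then call, for example, jid_of web wherever it previously hard-coded the number. In many cases even that is unnecessary, because tools such as jexec and "jail -r" accept the jail name directly.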

Eugene Grosbein wrote:
> Hi!
>
> Recently I upgraded one of my servers running 9.3-STABLE with a jail containing a 4.11-STABLE system.
> The host was source-upgraded first to 10.3-STABLE and then to 11.0-STABLE,
> and the jail configuration was migrated to /etc/jail.conf. The jail itself was left intact.
>
> "service jail start" started the jail successfully,
> but "service jail restart" fails because the jail stays stuck in the "dying" state for a long time:
> "jls" shows no running jails and "jls -d" shows the dying jail.

Same issue here. During the upgrade to 10 I wrote a proper jail.conf, and,
since the handling is now much more transparent, I also began to start and
stop my jails individually without a reboot.

I found the same issue: jails often do not want to terminate fully, but
stay in the "dying" state, sometimes for a minute or so, but sometimes for
a very long time (indefinitely).

It seems this is not related to remaining processes or open files (there
are none) but to network connections/sockets that are still present.
These connections can probably be displayed with netstat, and netstat -x
probably shows some decreasing counters associated with them. I have not
yet had the opportunity to figure out what exactly they mean, but it seems
long times may be involved (hours? forever?), unless one finds the
right connection and terminates both ends.

There seems to be no other way to deliberately "kill" such connections and
thereby terminate the jail, so the proposal to let it have a new number
might be the only feasible approach. (I don't like it; I got used to the
numbers of my jails.)

On Wed, Oct 26, 2016 at 03:09:31PM +0700, Eugene Grosbein wrote:
> Hi!
>
> Recently I upgraded one of my servers running 9.3-STABLE with a jail containing a 4.11-STABLE system.
> The host was source-upgraded first to 10.3-STABLE and then to 11.0-STABLE,
> and the jail configuration was migrated to /etc/jail.conf. The jail itself was left intact.
>
> "service jail start" started the jail successfully,
> but "service jail restart" fails because the jail stays stuck in the "dying" state for a long time:
> "jls" shows no running jails and "jls -d" shows the dying jail.
>
> How do I find out why it is stuck, and how do I forcibly kill it without rebooting the host?
>

I have the same problem on FreeBSD 11.0.

My jail.conf has nothing special in it: exec.start directly calls
"service cassandra onestart", and stop calls the stop of the very same service.

After running

    jail -f /myconf -r nameofthejail

I can see the jail staying in the dying state for multiple minutes,
even after sockstat -j shows that no TCP connections are left at all.

No processes are left in the jail.

I'm mostly clueless about how to debug this and find out what the dying jail
is waiting on.

It is painful, because it prevents exporting the ZFS pool the jail was sitting on.

Does anyone have ideas?

Best regards,
Bapt
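Until the cause is understood, a script that has to export the pool can at least wait for the dying jail to disappear instead of failing straight away. This is a rough sketch only: the function name wait_jail_gone and the default timeout of 60 seconds are arbitrary choices, and it does nothing to speed up the jail's removal. It is meant to run after "jail -r" and before a new instance with the same name is started, since it matches by name.

```sh
#!/bin/sh
# wait_jail_gone NAME [TIMEOUT] (hypothetical helper): poll "jls -d" once
# per second until no jail called NAME is listed, dying or not.
# Returns 0 once the jail is gone, 1 if TIMEOUT seconds pass first.
wait_jail_gone() {
    wj_name=$1
    wj_timeout=${2:-60}
    wj_waited=0

    # "jls -d name" prints one jail name per line, dying jails included.
    while jls -d name | grep -qxF "$wj_name"; do
        if [ "$wj_waited" -ge "$wj_timeout" ]; then
            return 1
        fi
        sleep 1
        wj_waited=$((wj_waited + 1))
    done
    return 0
}
```

For example, a wrapper could run wait_jail_gone nameofthejail 300 after the "jail -r" command above and attempt the pool export only if it returns 0.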

01.11.2016 0:21, Baptiste Daroussin wrote:
> I can see the jail staying in the dying state for multiple minutes,
> even after sockstat -j shows that no TCP connections are left at all.
>
> No processes are left in the jail.

Same here, though not for multiple minutes but for multiple days:
my dying jail, which has no processes, has been unable to die for 6 days already.
It was restarted with another JID 6 days ago, and the new instance
runs just fine, but the old one is still dying.

This is definitely a regression since 9.3-STABLE, which ran "service jail restart" just fine
even with a fixed JID.