Andy Smith
2016-Apr-17 01:21 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
Package: xen-utils-common
Version: 4.4.1-9+deb8u4
Severity: normal

Dear Maintainer,

I have a server with a large number of domUs set to auto-start. For the first time I have booted it with all of them needing to start from cold, but the xendomains service only got part way through.

syslog showed nothing notable about the domains starting...

Apr 16 14:57:45 snaps xendomains[4631]: Starting Xen domain lima (from /etc/xen/auto/010-lima.conf)...done.

...until...

Apr 16 15:02:36 sierra xendomains[4631]: Starting Xen domain juliet (from /etc/xen/auto/627-juliet.conf)...done.
Apr 16 15:02:37 sierra kernel: [ 341.269174] xen-blkback:ring-ref 8, event-channel 15, protocol 2 (x86_32-abi)
Apr 16 15:02:37 sierra kernel: [ 341.367307] xen-blkback:ring-ref 9, event-channel 16, protocol 2 (x86_32-abi)
Apr 16 15:02:38 sierra kernel: [ 341.437187] vif vif-51-0 v-juliet: Guest Rx ready
Apr 16 15:02:38 sierra kernel: [ 341.437429] IPv6: ADDRCONF(NETDEV_CHANGE): v-juliet: link becomes ready
Apr 16 15:02:40 sierra systemd[1]: xendomains.service start operation timed out. Terminating.
Apr 16 15:02:40 sierra systemd[1]: Failed to start LSB: Start/stop secondary xen domains.
Apr 16 15:02:40 sierra systemd[1]: Unit xendomains.service entered failed state.

That's the 51st domain, around 60% of the way through the list of domains it should have started.

Once I'd realised only some of the domains were started I reran "service xendomains start" and it finished the job.

So, is there a built-in timeout of ~5 minutes here that I need to increase?

I see that the generated /run/systemd/generator.late/xendomains.service file contains:

.
.
[Service]
Type=forking
Restart=no
TimeoutSec=5min
.
.

So that's probably what is being hit, but I cannot work out how to make the generator apply a longer timeout. Any hints would be appreciated.
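A rough back-of-the-envelope check of the log above, offered as a sketch: the first domain (lima) started at 14:57:45 and the 51st (juliet) at 15:02:36, i.e. 291 seconds for 50 gaps, or about 5.8 s per domain. If 51 domains is about 60% of the list, the full list is roughly 85 domains. The helper name and the 85-domain total are assumptions derived from the report, not measurements.

```python
from datetime import datetime


def estimate_start_seconds(first: str, last: str, started: int, total: int) -> float:
    """Extrapolate how long starting `total` domains takes.

    `first` and `last` are the HH:MM:SS syslog stamps of the 1st and the
    `started`-th "Starting Xen domain ...done." lines (same day assumed).
    Only the gaps between starts are counted, so this is roughly a lower bound.
    """
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(last, fmt) - datetime.strptime(first, fmt)).total_seconds()
    per_domain = elapsed / (started - 1)  # average gap between consecutive starts
    return per_domain * (total - 1)


GENERATOR_TIMEOUT = 5 * 60  # TimeoutSec=5min from the generated unit, in seconds

if __name__ == "__main__":
    # 85 is an assumed total: 51 domains being "around 60%" of the list.
    needed = estimate_start_seconds("14:57:45", "15:02:36", started=51, total=85)
    print(f"~{needed / 60:.1f} min needed vs {GENERATOR_TIMEOUT / 60:.0f} min allowed")
```

With the figures from the log this comes out at roughly 8 minutes, comfortably beyond the 5 minutes the generated unit allows, which fits the service being killed part way through the list.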
Cheers,
Andy

-- System Information:
Debian Release: 8.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages xen-utils-common depends on:
ii  lsb-base        4.1+Debian13+nmu1
ii  python          2.7.9-1
ii  ucf             3.0030
ii  udev            215-17+deb8u4
ii  xenstore-utils  4.4.1-9+deb8u4

xen-utils-common recommends no packages.

xen-utils-common suggests no packages.

-- Configuration Files:
/etc/default/xendomains changed [not included]
/etc/xen/scripts/vif-route changed [not included]
/etc/xen/xl.conf changed [not included]

-- no debconf information
Hans van Kranenburg
2019-Feb-02 22:24 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
Hi Andy,

Just to set expectations... Ian is not using systemd at all, and for me, whatever init script stuff there currently is does its thing for my use case at work. I don't use xendomains; I use live migration to drain physical servers so I can reboot / upgrade / whatever them without any need to hurry.

TBH, all the extra time I had for working on Debian/Xen in the last months was eaten by getting things fixed for myself, like live migration bugs.

Yesterday, we spent the day working on the Buster TODO list, and at the beginning of the day we identified "init scripts and systemd" as the main topic of the day. However, when we started to look into that, it quickly became clear that "just" merging the Debian and upstream init scripts is not a trivial operation (it needs discussion with the Red Hat-based users).

When working on actually shipping systemd units we'd really need to have a group of users that want to actively help testing everything. Downgrade, upgrade, try to break it etc...

For buster, there will be a notice in the "known-issues" section of README.Debian about this issue.

If you have an idea about how to change this timeout then please share. Don't wait for it to magically happen. :)

Hans
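One way to lengthen the timeout without changing the generator, offered as a sketch: systemd reads drop-in files from /etc/systemd/system/<unit>.d/*.conf for a unit regardless of where its main file comes from, including units generated under /run/systemd/generator.late, and settings in a drop-in override the same keys in the main file. The function below writes such a drop-in. The file name timeout.conf and the 30min default are assumptions chosen for illustration, and the `root` parameter exists only so it can be tried in a scratch directory instead of the real /etc.

```python
import tempfile
from pathlib import Path


def write_timeout_dropin(root: Path, unit: str = "xendomains.service",
                         timeout: str = "30min") -> Path:
    """Write a drop-in overriding TimeoutSec= for `unit` under `root`.

    Use root=Path("/") on a real system. The file name and the default
    timeout are illustrative choices, not values from the Debian packaging.
    """
    dropin_dir = root / "etc" / "systemd" / "system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "timeout.conf"  # any file name ending in .conf is read
    # Drop-ins are applied after the unit file, so this replaces TimeoutSec=5min.
    path.write_text(f"[Service]\nTimeoutSec={timeout}\n")
    return path


if __name__ == "__main__":
    # Dry run into a scratch directory instead of the real /etc.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_timeout_dropin(Path(scratch)).read_text())
```

On a real host the file would be written with `write_timeout_dropin(Path("/"))`, followed by `systemctl daemon-reload`; `systemctl show -p TimeoutStartUSec xendomains.service` should then report the new value.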
Andy Smith
2019-Feb-02 22:49 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
Hi Hans,

On Sat, Feb 02, 2019 at 11:24:36PM +0100, Hans van Kranenburg wrote:
> When working on actually shipping systemd units we'd really need
> to have a group of users that want to actively help testing
> everything. Downgrade, upgrade, try to break it etc...

I actually ended up going from Debian-packaged 4.4.x to Mark Pryor's Debian packages because I needed to upgrade version and patch some XSAs during embargo. At the time there wasn't much going on with the Debian packaging and I didn't feel confident to do it myself, so I based things off of Mark's work.

I used that as a basis for 4.8.x and now 4.10.x packages. Now that you are helping with Debian packaging I would like to come back to Debian's packages, probably along with an upgrade to buster.

The systemd stuff from those packages of Mark's did solve this problem though. I assume this is upstream content. I think in 4.4 it was generating a systemd service from an init script, whereas now it's a native systemd service. Here's /lib/systemd/system/xendomains.service:

[Unit]
Description=Xendomains - start and stop guests on boot and shutdown
Requires=proc-xen.mount xenstored.service
After=proc-xen.mount xenstored.service xenconsoled.service xen-init-dom0.service
After=network-online.target
After=remote-fs.target
ConditionPathExists=/proc/xen/capabilities
Conflicts=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
ExecStart=-/usr/lib/xen-4.10/bin/xendomains start
ExecStop=/usr/lib/xen-4.10/bin/xendomains stop
ExecReload=/usr/lib/xen-4.10/bin/xendomains restart

[Install]
WantedBy=multi-user.target

Those packages came from:

http://107.185.106.30/xen/debian/stretch-nmu/4ax/

(plus the XSAs published since then)

Would it be helpful if I installed buster and xen-hypervisor-4.11-amd64 and checked how the systemd unit files cope with trying to start 75 or so domains? If so I will try to find some time to try that,

Cheers,
Andy
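The ExecStartPre= line in that native unit makes the service proceed only in the control domain: `grep -q control_d /proc/xen/capabilities` succeeds only when that file mentions control_d, which is the case in dom0 and not in a domU. A Python equivalent of the check, as a sketch; the function name and its `path` parameter are hypothetical, the latter added only so the check can be exercised without Xen.

```python
from pathlib import Path


def is_xen_control_domain(path: str = "/proc/xen/capabilities") -> bool:
    """Mirror `grep -q control_d /proc/xen/capabilities`.

    Returns False when the file is absent; the unit's ConditionPathExists=
    line skips the service in that case before ExecStartPre= ever runs.
    """
    cap = Path(path)
    if not cap.exists():
        return False
    return "control_d" in cap.read_text()


if __name__ == "__main__":
    print(is_xen_control_domain())  # True only when running in a Xen dom0
```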
Hans van Kranenburg
2019-Feb-03 08:52 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
Hi,

On 2/2/19 11:49 PM, Andy Smith wrote:
> 
> On Sat, Feb 02, 2019 at 11:24:36PM +0100, Hans van Kranenburg wrote:
>> When working on actually shipping systemd units we'd really need
>> to have a group of users that want to actively help testing
>> everything. Downgrade, upgrade, try to break it etc...
> 
> I actually ended up going from Debian-packaged 4.4.x to Mark Pryor's
> Debian packages because I needed to upgrade version and patch some
> XSAs during embargo. At the time there wasn't much going on with the
> Debian packaging and I didn't feel confident to do it myself, so I
> based things off of Mark's work.
> 
> I used that as a basis for 4.8.x and now 4.10.x packages. Now that
> you are helping with Debian packaging I would like to come back to
> Debian's packages, probably along with an upgrade to buster.
> 
> The systemd stuff from those packages of Mark's did solve this
> problem though. I assume this is upstream content.

Yes, it is, and this just made me realize that you've already been running, for quite some time, the end result of what would happen if we actually added the systemd stuff to the packaging. That's great, because I guess that already answers most of the "will these things do the right thing out of the box?" uncertainty.

> I think in 4.4 it
> was generating a systemd service from an init script, whereas now
> it's a native systemd service.
> Here's /lib/systemd/system/xendomains.service:
> 
> [Unit]
> Description=Xendomains - start and stop guests on boot and shutdown
> Requires=proc-xen.mount xenstored.service
> After=proc-xen.mount xenstored.service xenconsoled.service xen-init-dom0.service
> After=network-online.target
> After=remote-fs.target
> ConditionPathExists=/proc/xen/capabilities
> Conflicts=libvirtd.service
> 
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
> ExecStart=-/usr/lib/xen-4.10/bin/xendomains start
> ExecStop=/usr/lib/xen-4.10/bin/xendomains stop
> ExecReload=/usr/lib/xen-4.10/bin/xendomains restart
> 
> [Install]
> WantedBy=multi-user.target

Yup, https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/systemd/xendomains.service.in

> Those packages came from:
> 
> http://107.185.106.30/xen/debian/stretch-nmu/4ax/
> 
> (plus the XSAs published since then)
> 
> Would it be helpful if I installed buster and
> xen-hypervisor-4.11-amd64 and checked how the systemd unit files
> cope with trying to start 75 or so domains? If so I will try to find
> some time to try that,

Absolutely.

First option would be to find out who/what decides there should be a 5 minute timeout.

But the other option is to upgrade a box to the buster 4.11 packages and then just put the systemd things from tools/hotplug/Linux/systemd in place and test what happens. This might be doable for Buster after all (...there are also 22 other items still on the TODO).

TBH, I'm not an expert at all in this area, I never figured out yet how all these systemd<->init-script compatibility layers work yet.

Hans
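To see where the five-minute limit comes from, it can help to put the [Service] sections of the two units quoted in this thread side by side: the generated one sets Type=forking with TimeoutSec=5min, the native one sets Type=oneshot and no TimeoutSec at all. A minimal sketch using Python's configparser, assuming the unit text is available as strings. configparser only approximates systemd's parser (systemd accumulates repeated keys such as After=, while here the last one wins), so this is for inspection only.

```python
import configparser


def service_settings(unit_text: str) -> dict:
    """Return the [Service] keys of a unit file as a plain dict."""
    parser = configparser.ConfigParser(
        strict=False,        # tolerate repeated keys such as After=
        interpolation=None,  # systemd %-specifiers are not ini interpolation
        delimiters=("=",),
    )
    parser.optionxform = str  # keep systemd's CamelCase key names
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return {}
    return dict(parser.items("Service"))


# Excerpts of the two units as quoted in this bug report.
GENERATED = """\
[Service]
Type=forking
Restart=no
TimeoutSec=5min
"""

NATIVE = """\
[Unit]
After=proc-xen.mount xenstored.service xenconsoled.service xen-init-dom0.service
After=network-online.target
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=-/usr/lib/xen-4.10/bin/xendomains start
"""

if __name__ == "__main__":
    for name, text in (("generated", GENERATED), ("native", NATIVE)):
        settings = service_settings(text)
        print(name, settings.get("Type"), settings.get("TimeoutSec", "(not set in the unit)"))
```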
Martin Maney
2020-Jan-03 16:42 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
First, an answer that I happen to have handy to Hans's question from Feb 2019: "TBH, I'm not an expert at all in this area, I never figured out yet how all these systemd<->init-script compatibility layers work yet."

Neither am I an expert, and I'd really prefer not needing to become one, but from what I just read in the systemd-sysv-generator man page, the answer would have to be "poorly, in general". I was specifically looking into a timeout on shutdown, and it's problematic at least in part because the generator does not process the $syslog item from the LSB header, so the hang happens in a black hole. The last visible message on the console, other than the truncated one about the "LSB stop job ... xendomains", was one that seemed to be about the end of block device availability, which would account for the shutdown hang in xendomains very handily. The screen was apparently cleared just before that, which I guess is systemd being "helpful". As opposed to many things with which it actually is helpful <sigh>.

Yes, the shutdown hang is a different issue, but I'm going to hope that the real systemd units mentioned in this bug will fix my problem, too.

-- 
As economics is known as The Miserable Science, software engineering should be known as The Doomed Discipline, doomed because it cannot even approach its goal since its goal is self-contradictory.
-- Edsger Dijkstra
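For context on the LSB header mentioned above: an init script declares its dependencies in a comment block between `### BEGIN INIT INFO` and `### END INIT INFO`, as `# Key: value` lines such as `Required-Start:`, which may list other scripts or virtual facilities like `$remote_fs` and `$syslog`; systemd-sysv-generator reads this block to derive unit ordering. A sketch of reading such a header, run against a made-up example header (not Debian's actual xendomains header). Per Martin's reading of the man page, a `$syslog` entry like the one below is one the generator does not act on.

```python
def parse_lsb_header(script: str) -> dict:
    """Return the fields of the first LSB `INIT INFO` block in `script`."""
    fields = {}
    inside = False
    last_key = None
    for line in script.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            inside = True
            continue
        if line.startswith("### END INIT INFO"):
            break
        if not inside or not line.startswith("#"):
            continue
        body = line[1:]
        # Continuation lines (e.g. of Description:) start with "#" plus a tab or two spaces.
        if last_key and (body.startswith("\t") or body.startswith("  ")):
            fields[last_key] += " " + body.strip()
        elif ":" in body:
            key, _, value = body.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
    return fields


def facilities(fields: dict, key: str = "Required-Start") -> list:
    """Virtual facilities ($name entries) listed under `key`."""
    return [dep for dep in fields.get(key, "").split() if dep.startswith("$")]


# Hypothetical header for illustration only.
EXAMPLE = """\
#!/bin/sh
### BEGIN INIT INFO
# Provides:          example-xen-guests
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Hypothetical header for illustration
### END INIT INFO
"""

if __name__ == "__main__":
    print(facilities(parse_lsb_header(EXAMPLE)))
```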
Hans van Kranenburg
2020-Jan-06 20:53 UTC
[Pkg-xen-devel] Bug#821254: systemd[1]: xendomains.service start operation timed out.
Hi,

On 1/3/20 5:42 PM, Martin Maney wrote:
> 
> [...]
> 
> Yes, the shutdown hang is a different issue, but I'm going to hope that
> the real systemd units mentioned in this bug will fix my problem, too.

What you could already do now is test those scripts, just shutting down and starting up the domUs, without actually rebooting the machine. By doing so we can learn whether we could use them as a drop-in replacement or not.

The xendomains init script that we have in Debian is:
https://salsa.debian.org/xen-team/debian-xen/blob/master/debian/xen-utils-common.xendomains.init

The upstream one (which is quite a bit different) is:
https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/xendomains.in

Or rather, it seems that last one gets installed in a location for helper scripts, and it's just called from both the init.d script and the systemd service:
https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/init.d/xendomains.in
https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/systemd/xendomains.service.in

It would be really helpful if you wanted to spend some time on this.

Speaking for myself, I either deal with clusters, using live migration to empty a server before shutting it down, or otherwise I'd rather have my own way to carefully shut down things before typing a reboot command, combined with a molly-guard script to prevent accidental reboots while something is still running. That way there's still an option to debug/salvage a misbehaving domU before shutdown.

Hans
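The last paragraph describes guarding against a reboot while guests are still running. A minimal sketch of the kind of check such a guard might perform, assuming `xl list` prints one header line followed by one line per domain with the name in the first column, dom0 being listed as Domain-0, and that the guard aborts on a non-zero exit status (as scripts in molly-guard's /etc/molly-guard/run.d/ do). The function names are hypothetical and this is not the script Hans actually uses.

```python
import subprocess
import sys


def running_domus(xl_list_output: str) -> list:
    """Guest names from `xl list` output, skipping the header and dom0.

    Every listed domain other than Domain-0 counts, whatever its state.
    """
    names = []
    for line in xl_list_output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if fields and fields[0] != "Domain-0":
            names.append(fields[0])
    return names


def xl_list() -> str:
    """Run `xl list` (needs root in dom0) and return its stdout."""
    return subprocess.run(["xl", "list"], capture_output=True, text=True, check=True).stdout


def guard(xl_list_output: str) -> int:
    """Exit status for a reboot guard: non-zero while any domU is listed."""
    guests = running_domus(xl_list_output)
    if guests:
        print("domUs still running: " + ", ".join(guests), file=sys.stderr)
        return 1
    return 0
```

Used as a guard script, this would end with `sys.exit(guard(xl_list()))`; keeping the parsing separate from the `xl` call makes the logic testable without a Xen host.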