Laszlo Ersek
2022-Jun-09 13:53 UTC
[Libguestfs] [v2v PATCH 4/4] convert_linux: install the QEMU guest agent with a firstboot script
On 06/09/22 15:10, Laszlo Ersek wrote:

> nm-online -x -q ||
> (
>   systemctl -q is-active systemd-networkd &&
>   /usr/lib/systemd/systemd-networkd-wait-online -q --timeout=30
> )
>
> If the final exit status is nonzero, I think that's not a problem for
> the firstboot script.
>
> The whole command seems to make sense also when nm-online, systemctl, or
> both, are missing (a missing command results in exit status 127).

Unfortunately, this does not work. Even though the command succeeds very
quickly (implying network is available), the DNF throws a CURL error
that it cannot access the repo (DNS resolution failure). So basically
"nm-online" lies.

Compare:

https://bugzilla.redhat.com/show_bug.cgi?id=1482476#c4

> My current workaround is to use "/usr/bin/sleep 60" instead of the
> "/usr/bin/nm-online -s -q --timeout=30" command in the service unit
> file.

:/

NB, when I log in at the root prompt of the just-firstbooted-guest (on
tty1), and run "nm-online -x -q", it succeeds alright, and I can even
ssh into the guest... (more precisely, I get an ssh password prompt; I
can't actually log in as root via ssh due to the default sshd config not
permitting that).

We can't rely on "nm-online" if it doesn't do its advertized job!

Laszlo
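The exit-status reasoning in the quoted command (a missing command yields 127, and the fallback in parentheses only runs when the first command fails) can be checked in isolation. The following is only a sketch: `wait_chain` and the `*_missing` names are hypothetical, and `true` / `false` stand in for the real tools so that nothing depends on NetworkManager or systemd being installed.

```shell
#!/bin/sh
# Sketch of the "A || ( B && C )" chain with the three commands passed in as
# arguments, so that missing commands can be simulated. The *_missing names
# are hypothetical placeholders assumed not to exist in PATH.
wait_chain() {
  "$1" -x -q 2>/dev/null ||
  (
    "$2" -q is-active systemd-networkd 2>/dev/null &&
    "$3" -q --timeout=30 2>/dev/null
  )
}

# Both the first command and the fallback are missing: the chain reports 127.
wait_chain nm_online_missing systemctl_missing wait_online_missing
status_all_missing=$?

# The first command succeeds: the fallback in parentheses is never run.
wait_chain true systemctl_missing wait_online_missing
status_first_ok=$?

# The first command fails, but the fallback pair succeeds.
wait_chain false true true
status_fallback_ok=$?

echo "all missing: $status_all_missing"
echo "first ok:    $status_first_ok"
echo "fallback ok: $status_fallback_ok"
```

Note that this only illustrates the exit statuses of the chain; it says nothing about whether a zero status from nm-online actually means the network is usable, which is the problem reported in this message.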
Richard W.M. Jones
2022-Jun-09 14:11 UTC
[Libguestfs] [v2v PATCH 4/4] convert_linux: install the QEMU guest agent with a firstboot script
On Thu, Jun 09, 2022 at 03:53:39PM +0200, Laszlo Ersek wrote:
> On 06/09/22 15:10, Laszlo Ersek wrote:
>
> > nm-online -x -q ||
> > (
> >   systemctl -q is-active systemd-networkd &&
> >   /usr/lib/systemd/systemd-networkd-wait-online -q --timeout=30
> > )
> >
> > If the final exit status is nonzero, I think that's not a problem for
> > the firstboot script.
> >
> > The whole command seems to make sense also when nm-online, systemctl, or
> > both, are missing (a missing command results in exit status 127).
>
> Unfortunately, this does not work. Even though the command succeeds very
> quickly (implying network is available), the DNF throws a CURL error
> that it cannot access the repo (DNS resolution failure). So basically
> "nm-online" lies.

That's annoying. Thinking through this - it's a real guest boot (not
libguestfs appliance). The network is up at some point possibly early
in the boot. Could be some DNS resolver not yet started?
(systemd-resolved, nscd, unbound, ...)

> Compare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1482476#c4
>
> > My current workaround is to use "/usr/bin/sleep 60" instead of the
> > "/usr/bin/nm-online -s -q --timeout=30" command in the service unit
> > file.
>
> :/
>
> NB, when I log in at the root prompt of the just-firstbooted-guest (on
> tty1), and run "nm-online -x -q", it succeeds alright, and I can even
> ssh into the guest... (more precisely, I get an ssh password prompt; I
> can't actually log in as root via ssh due to the default sshd config not
> permitting that).
>
> We can't rely on "nm-online" if it doesn't do its advertized job!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
Laszlo Ersek
2022-Jun-13 16:49 UTC
[Libguestfs] [v2v PATCH 4/4] convert_linux: install the QEMU guest agent with a firstboot script
On 06/09/22 15:53, Laszlo Ersek wrote:

> We can't rely on "nm-online" if it doesn't do its advertized job!

I've checked the nm-online source code, and various other stuff in the
NetworkManager source tree.

nm-online is *completely unusable* for its stated purpose. The source
code made my hair stand on end, and the nm-online manual is
incomprehensible / misleading on top of *that*.

NetworkManager has this weird state machine where NetworkManager as a
whole can be in some state (not running, running but disconnected,
connecting, connected, etc); then, assuming it is "connected", there are
still three kinds of connectivity (local, site, full). (In case
NetworkManager is not connected, there's a fourth, technical,
connectivity, called "none".)

What we need is clearly full (aka global) connectivity. The only way
I've found for checking *that* is to call nmcli in a shell loop with the
appropriate incantation, and to sleep 1 second after each iteration.
This works reliably; I'll post v2 soon.

Really you couldn't invent a less intuitive state machine *AND* document
it as badly. When googling nm-online, I've found bug reports and forum
posts going back for a decade, complaining that nm-online does not do
what it says on the tin.

Thanks
Laszlo
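The nmcli polling loop described in this message might look roughly like the sketch below. This is not the v2 patch, which is not part of this thread; it assumes that "nmcli networking connectivity" is a suitable incantation (it prints one of none, portal, limited, full or unknown). The function name, the CONNECTIVITY_CMD variable and the default of 30 tries are hypothetical, introduced only so the loop can be exercised without NetworkManager.

```shell
#!/bin/sh
# Sketch: poll NetworkManager until it reports "full" connectivity.
# CONNECTIVITY_CMD is a hypothetical override (not from the thread) that
# makes the loop testable on systems without nmcli.
: "${CONNECTIVITY_CMD:=nmcli networking connectivity}"

# Usage: wait_for_full_connectivity [MAX_TRIES]
# Returns 0 as soon as connectivity is "full", 1 if MAX_TRIES polls pass
# without that happening (including the case where nmcli is missing).
wait_for_full_connectivity() {
  max_tries=${1:-30}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # Left unquoted on purpose so the command string splits into words.
    conn=$($CONNECTIVITY_CMD 2>/dev/null)
    if [ "$conn" = full ]; then
      return 0
    fi
    sleep 1
    tries=$((tries + 1))
  done
  return 1
}
```

A firstboot script could call "wait_for_full_connectivity 30" and carry on regardless of the result, in line with the earlier remark that a nonzero exit status is not a problem for the firstboot script.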