m.roth at 5-cent.us
2017-Feb-13 15:35 UTC
[CentOS] CentOS 7, systemd, NetworkMangler, oh, my
My manager tells me a system in the datacenter is down. I go down there, and plug in a monitor-on-a-stick and keyboard. It's up, but no network. I try systemctl restart NetworkManager several times, and ip a shows *no* change.

Finally, I do an ifdown, followed by an ifup, and everything's wonderful.

My manager thinks that the NM daemon thinks everything's fine, and there've been no changes, so it does nothing. He suggests that it might have to be stopped, then started, rather than restarted.

This is completely unacceptable behavior, since it leaves the system with no network connection. Pre-systemd, as we all know, restart *RESTARTED* the damn thing.

Is there some Magic (#insert "pixie-dust-sparkles") incantation, either restarting NetworkManager, or using nmcli, to force it to perform the expected actions?

Btw, if this is supposed to be part of the "hide stuff, desktop Linux users don't need to know this stuff", this is a *much* worse result.

mark (and yes, my manager's truly aggravated about this, also)
On 13 February 2017 at 15:35, <m.roth at 5-cent.us> wrote:
> I try systemctl restart NetworkManager several times, and ip a shows *no*
> change.
>
> Finally, I do an ifdown, followed by an ifup, and everything's wonderful.
<snip>
> Is there some Magic (#insert "pixie-dust-sparkles") incantation, either
> restarting NetworkManager, or using nm-cli, to force it to perform the
> expected actions?

I'd be interested in the journal from the NetworkManager restart, as that's not the way it behaves. It uses the netlink API to get state and not its own internal tracker of state (i.e. doing an ip link down will be reflected in nmcli output). A restart of NetworkManager should not ignore interfaces but rather bring the system to the on-disk configured state, and a quick check shows it doesn't override ExecRestart in the unit file to do a reload or similar instead.

And indeed a quick test in a VM shows nmcli device status correctly changing between connected and unavailable when doing ip link set eth0 down/up.

Do note that on an NM-based system ifup and ifdown are effectively aliases to nmcli conn up and nmcli conn down:

nmcli conn down "connection name" will make it disconnected
nmcli conn up "connection name" will bring it back to connected

There is a slight, interesting difference between using nmcli and ip link set though. With ip link set down <interface> the interface is marked administratively down (as if you've pulled the cable), but nmcli conn down "connection name" will unconfigure the interface and leave it in an UP state, just without an IP address etc.

Anyway, that's just an interesting diversion on behavioural differences.

NM won't change an interface state without some sort of event though (manual or virtual cable pulled etc.), and if you have a case where it *has* done that then you have found a bug that would be great to get reported.

TL;DR: cannot reproduce, need logs to determine what happened without a working crystal ball
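For anyone wanting to compare the two behaviours described above, here is a rough sketch, wrapped as shell functions so nothing runs until called. The function names are made up for illustration and the device or connection name you pass in is your own; nmcli and ip are the only real commands involved.

```shell
# Hypothetical helpers; the names are illustrative, not part of NetworkManager.

# Deactivate and reactivate a NetworkManager connection profile by name.
# Per the description above, this unconfigures the device without marking
# the link administratively down.
nm_bounce() {
    local conn="$1"
    nmcli connection down id "$conn" && nmcli connection up id "$conn"
}

# Mark the link administratively down, then bring it back up.
link_bounce() {
    local dev="$1"
    ip link set dev "$dev" down && ip link set dev "$dev" up
}

# Show the state NetworkManager currently reports for each device.
nm_status() {
    nmcli device status
}
```

Running nm_status before and after each bounce should show whether NetworkManager notices the change; both bounce helpers need root, and either one will drop the connection if you are logged in over that interface.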
peter.winterflood
2017-Feb-13 16:17 UTC
[CentOS] CentOS 7, systemd, NetworkMangler, oh, my
On 13/02/17 15:35, m.roth wrote:
<snip>
> Is there some Magic (#insert "pixie-dust-sparkles") incantation, either
> restarting NetworkManager, or using nm-cli, to force it to perform the
> expected actions?

There's a really good solution to this.

yum remove NetworkManager*
chkconfig network on
service network start

And yes, that's all under Fedora 25 and CentOS 7.

Works like a charm.

Sometimes removing NM leaves resolv.conf pointing to the NetworkManager directory, and it's best to check this, and replace your resolv.conf link with a file with the correct settings.

Sorry if this upsets the people who maintain network mangler, but it's inappropriate on a server.

regards
peter
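The resolv.conf check mentioned above could look something like this. It is only a sketch: the function name is made up, and it reports what the path is without changing anything.

```shell
# Hypothetical helper: report whether a resolv.conf path is a symlink and,
# if so, where it points. Defaults to /etc/resolv.conf.
resolv_conf_kind() {
    local path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        # readlink prints the link target exactly as stored in the symlink
        echo "symlink -> $(readlink "$path")"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

Called with no argument it inspects /etc/resolv.conf; a "symlink" answer is the case where the link would need replacing with a real file.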
On 13 February 2017 at 16:17, peter.winterflood <peter.winterflood at ossi.co.uk> wrote:
> there's a really good solution to this.
>
> yum remove NetworkManager*
>
> chkconfig network on
>
> service network start
<snip>
> sorry if this upsets the people who maintain network mangler, but its
> inappropriate on a server.

This is terribly bad advice I'm afraid ...

https://access.redhat.com/solutions/783533

The legacy network service is a fragile compilation of shell scripts (which is why certain changes, like some bonding or tagging alterations, require a full system restart or very careful unpicking manually with ip) and is effectively deprecated in RHEL at this time: it receives major bug fixes only, with no feature work.

You really should have a read through this as well:

https://www.hogarthuk.com/?q=node/8

On EL6, yes, NM should be removed on anything but a wifi system, but on EL7, unless you fall into a specific edge case as per the network docs:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Networking_Guide/index.html

you really should be using NM for a variety of reasons.

Incidentally Mark, this had nothing to do with systemd ... I wish you would pick your topics a little more appropriately rather than tempting the usual flames.
On 02/13/2017 07:35 AM, m.roth at 5-cent.us wrote:
> Finally, I do an ifdown, followed by an ifup, and everything's wonderful.

What's in /etc/sysconfig/network-scripts/ifcfg-<interface>? Does it say NM_CONTROLLED=no?

> My manager thinks that the NM daemon thinks everything's fine, and
> there've been no changes, so it does nothing. He suggests that it might
> have to be stopped, then started, rather than restarted.

"systemctl restart NetworkManager" completely stops the service and starts it again.

> This is completely unacceptable behavior, since it leave the system with
> no network connection. Pre-systemd, as we all know, restart *RESTARTED*
> the damn thing.

Still does.
On 02/13/2017 11:15 AM, Gordon Messmer wrote:
> On 02/13/2017 07:35 AM, m.roth at 5-cent.us wrote:
>> Finally, I do an ifdown, followed by an ifup, and everything's wonderful.
>
> What's in /etc/sysconfig/network-scripts/ifcfg-<interface>? Does it say
> NM_CONTROLLED=no?

or ONBOOT=no

<snip>
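A quick way to answer both questions at once, as a sketch only: the helper name is made up, and it simply prints the uncommented value of a key in an ifcfg-style file, so a commented-out NM_CONTROLLED line shows up as empty.

```shell
# Hypothetical helper: print the effective (uncommented) value of a key in
# an ifcfg-style file. If the key is set more than once, the last wins.
ifcfg_get() {
    local key="$1" file="$2"
    # Lines starting with '#' do not match, so commented-out settings are
    # ignored; surrounding quotes are stripped from the value.
    grep "^[[:space:]]*${key}=" "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"'"
}

# Example (not run here; substitute your own interface name):
#   ifcfg_get NM_CONTROLLED /etc/sysconfig/network-scripts/ifcfg-em1
#   ifcfg_get ONBOOT /etc/sysconfig/network-scripts/ifcfg-em1
```

An empty result means the key is absent or commented out, in which case whatever default applies is in effect.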
m.roth at 5-cent.us
2017-Feb-13 18:29 UTC
[CentOS] CentOS 7, systemd, NetworkMangler, oh, my
James Hogarth wrote:
> On 13 February 2017 at 15:35, <m.roth at 5-cent.us> wrote:
<snip>
> I'd be interested in the journal from the NetworkManager restart as
> that's not the way it behaves ...
<snip>
> TL;DR: cannot reproduce, need logs to determine what happened without
> a working crystal ball

From journalctl, I see this happening when I do systemctl restart NetworkManager (much edited):

Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7755] manager: (em1): new Ethernet device (/org/freedesktop/NetworkManager/Devi
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7791] ifcfg-rh: add connection in-memory (79d3ed9d-cc41-498c-9169-44320e332f68,
Feb 13 09:47:52 <servername> systemd[1]: Started Hostname Service.
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7797] device (em1): state change: unmanaged -> unavailable (reason 'connection-
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7805] device (em1): state change: unavailable -> disconnected (reason 'connecti
<...>
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7986] device (em1): state change: disconnected -> prepare (reason 'none') [30 4
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7999] policy: set 'em1' (em1) as default for IPv6 routing and DNS
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.8027] device (em1): state change: prepare -> config (reason 'none') [40 50 0]
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.8034] device (em1): state change: config -> ip-config (reason 'none') [50 70 0]
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3594] device (em1): state change: ip-config -> ip-check (reason 'none') [70 80
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3661] device (em1): state change: ip-check -> secondaries (reason 'none') [80 9
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3666] device (em1): state change: secondaries -> activated (reason 'none') [90
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3667] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3670] manager: NetworkManager state is now CONNECTED_SITE
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3670] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2 'connectivity-change': new request (6 scripts)
Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2 'connectivity-change': start running ordered scripts...
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3697] device (em1): Activation: successful, device activated.

Note there is no IP address being obtained. Now, when I run ifdown/ifup:

Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6804] device (em1): Activation: starting connection 'em1' (c432eaa1-023b-4f1f-a
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6809] audit: op="connection-activate" uuid="c432eaa1-023b-4f1f-a7b5-4605ec07195
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6810] device (em1): state change: disconnected -> prepare (reason 'none') [30 4
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6811] manager: NetworkManager state is now CONNECTING
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6816] device (em1): state change: prepare -> config (reason 'none') [40 50 0]
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6858] device (em1): state change: config -> ip-config (reason 'none') [50 70 0]
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6869] dhcp4 (em1): activation: beginning transaction (timeout in 45 seconds)
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6900] dhcp4 (em1): dhclient started with pid 67715
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 6 (xid=0x745ba623)
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPREQUEST on em1 to 255.255.255.255 port 67 (xid=0x745ba623)
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPOFFER from <DHCP server>
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPACK from <DHCP server> (xid=0x745ba623)

And it then gets an IP address.

And looking at /var/log/messages, it *appears* that the restart never invokes the dhclient script, while ifup does.

mark
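The difference between the two excerpts above comes down to whether a "dhcp4 ... beginning transaction" line appears. A small filter along these lines can check that mechanically; the function name is made up, and the pattern is taken from the log text quoted above rather than from any documented NetworkManager log format.

```shell
# Hypothetical helper: read NetworkManager log lines on stdin and report
# whether a DHCPv4 transaction was started.
saw_dhcp4() {
    # In a basic regular expression the parentheses are literal characters.
    if grep -q 'dhcp4 (.*): activation: beginning transaction'; then
        echo "dhcp4 transaction started"
    else
        echo "no dhcp4 transaction logged"
    fi
}

# Example (not run here):
#   journalctl -u NetworkManager --since "10 minutes ago" | saw_dhcp4
```

Running it once after the restart and once after ifdown/ifup should reproduce the contrast seen in the two excerpts.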
m.roth at 5-cent.us
2017-Feb-13 18:31 UTC
[CentOS] CentOS 7, systemd, NetworkMangler, oh, my
peter.winterflood wrote:
> On 13/02/17 15:35, m.roth wrote:
<snip>
> there's a really good solution to this.
>
> yum remove NetworkManager*
>
> chkconfig network on
>
> service network start
<snip>
> sorry if this upsets the people who maintain network mangler, but its
> inappropriate on a server.

That'd be a 100% agreement, good buddy.... We may have done it on some systems, but in general, we appear to be stuck with the damn thing.

And why the *hell* would a server want wifi enabled, or avahi-daemon running by default?

mark
m.roth at 5-cent.us
2017-Feb-13 18:35 UTC
[CentOS] CentOS 7, systemd, NetworkMangler, oh, my
Gordon Messmer wrote:
> On 02/13/2017 07:35 AM, m.roth at 5-cent.us wrote:
>> Finally, I do an ifdown, followed by an ifup, and everything's
>> wonderful.
>
> What's in /etc/sysconfig/network-scripts/ifcfg-<interface>? Does it say
> NM_CONTROLLED=no?

Good catch. No, it doesn't say no... because the line was commented out. I've just uncommented it, and set it to yes.

<snip>

mark