Simon Matter via CentOS wrote:
>
>> We are seeing a problem that occurs ~5% of the time when rebooting
>
> I see such issues on a quite large multi-user system, but when this
> happens, after forced restarts for kernel updates, I usually don't have
> the time to analyze and play doctor on it. My "solution" now is to simply
> reboot the server again in such a case, AKA the systemd way :-)
>
>> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
>> after the D-Bus service starts - from 'journalctl -x' :
>>
>> ...
>> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System Message Bus.
>> -- Subject: Unit dbus.service has finished start-up
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit dbus.service has finished starting up.
>> --
>> -- The start-up result is done.
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match for Disconnected message: Connection timed out
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize D-Bus connection: Connection timed out
>> ...
>>
>> This then has a knock-on effect that causes other services to fail - e.g.
>>
>> -- Unit gdm.service has begun starting up.
>> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating systemd to hand-off: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
>> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to enable subscription: Failed to activate service 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully start up daemon: Connection timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login Service.
>> -- Subject: Unit systemd-logind.service has failed
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit systemd-logind.service has failed.
>> --
>> -- The result is failed.
>>
>> Whatever the issue is, it appears that polkit might be involved - if we
>> restart the polkit service, things appear to return to normal (e.g. gdm
>> starts up etc.)
>>
>> We can't find any similar reports of this happening elsewhere with
>> CentOS 7.7 - but we were wondering if anyone else had come across a
>> problem like this?
>
> I think the root of the problem is that there are missing definitions in
> some of the systemd scripts. They allow things to work in 95% or more of
> the cases, but this happens by chance, not because of perfect process
> handling and system control. Small delays somewhere or uncommon system
> environments then lead to intermittent failures which are difficult to
> diagnose - at least for me.
>
> The good news is that you can just fiddle with the systemd scripts the
> same way we fiddled with init scripts in the past. That way you can use
> trial and error until you find a solution. It doesn't feel like being in
> full control of things, but it's better than not finding a solution at all.

Yeah, we found that by introducing a small delay before the ExecStart in
the dbus.service unit - even a delay of just 0.01 seconds (via
'ExecStartPre=/usr/bin/sleep 0.01') _seems_ to work around the issue ...

However, we would still like to know what the issue is and get a 'real'
fix - I guess we could try creating a bug report with Red Hat ...

Thanks

James Pearson
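For reference, here is a minimal sketch of how the delay workaround described in this thread could be applied as a drop-in override instead of editing the packaged unit file. It assumes the standard systemd drop-in directory for dbus.service; `override.conf` is the file name `systemctl edit dbus.service` creates by default, but any `*.conf` name in that directory would be read. This only reproduces the workaround (it hides the race rather than fixing it):

```ini
# /etc/systemd/system/dbus.service.d/override.conf
# Drop-in reproducing the workaround from this thread: run a short
# sleep before dbus.service's own ExecStart. ExecStartPre= lines in a
# drop-in are added to any the packaged unit already defines.
[Service]
ExecStartPre=/usr/bin/sleep 0.01
```

After creating the file by hand, run `systemctl daemon-reload` so systemd picks it up; it then takes effect the next time dbus.service starts, i.e. on the next boot.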
> Yeah, we found that by introducing a small delay before the ExecStart in
> the dbus.service unit - even a delay of just 0.01 seconds (via
> 'ExecStartPre=/usr/bin/sleep 0.01') _seems_ to work around the issue ...

Nice that you found at least a workaround. I think I remember that dbus
is quite special here because systemd starts it but also depends on it.
At least I remember cases where dbus went crazy for whatever reason: the
result was that systemd became completely unresponsive and unmanageable,
and the whole system went down the drain, slowly but steadily. Ever tried
to shut down a box when systemd doesn't listen to you anymore? The perfect
Windows experience on Linux ;-)

> However, we would still like to know what the issue is and get a 'real'
> fix - I guess we could try creating a bug report with Red Hat ...

By bug report, do you mean a BZ or a support request as a paying RHEL
customer? Unfortunately, I'm not too happy anymore with how BZs are
handled these days. Am I alone with this feeling?

Regards,
Simon
Simon Matter wrote:
>
>> However, we would still like to know what the issue is and get a 'real'
>> fix - I guess we could try creating a bug report with Red Hat ...
>
> By bug report, do you mean a BZ or a support request as a paying RHEL
> customer?

A BZ ...

> Unfortunately, I'm not too happy anymore with how BZs are handled these
> days. Am I alone with this feeling?

I've had mixed results with BZs - it appears that if a bug 'tickles the
fancy' of someone at Red Hat who sees the ticket, then you can get good
results - otherwise, they just sit there until the release goes out of
support and they get dropped :-)

James Pearson