thr3ads.net - CentOS - [CentOS] systemd and 'Stale file handle' errors? [May 2021]

James Pearson

2021-May-13 14:15 UTC

[CentOS] systemd and 'Stale file handle' errors?

I have a CentOS 7 system where I needed to restart chronyd - but the systemctl
restart failed with the error:

 systemd[1]: Starting NTP client/server...
 systemd[43578]: Failed at step NAMESPACE spawning /usr/sbin/chronyd: Stale file handle
 systemd[1]: chronyd.service: control process exited, code=exited status=226

It turns out there are a couple of stale NFS file handles from FUSE mounts
(related to gvfsd) of subdirectories under a home directory NFS-mounted from
our home directory server, but in this case the user's home directory no
longer exists (the user has left).

However, I have no idea why these 'Stale file handles' prevent systemd from
starting a service.

In this case, chronyd has nothing to do with NFS-mounted user home
directories, so why should it care?

I have tried everything I can think of to clear these stale mounts, but with
no luck.

Does anyone know why systemd complains about these unrelated 'Stale file
handles', and is there any way I can tell systemctl to start a service
regardless of these 'errors'?

Rebooting the host will be a last resort (the system is used by many users),
but in the meantime I've started the /usr/sbin/chronyd binary manually, and
it runs fine.

Thanks

James Pearson

Simon Matter

2021-May-14 10:44 UTC

> I have a CentOS 7 system where I needed to restart chronyd - but the
> systemctl restart failed with the error:
>
>  systemd[1]: Starting NTP client/server...
>  systemd[43578]: Failed at step NAMESPACE spawning /usr/sbin/chronyd:
> Stale file handle
>  systemd[1]: chronyd.service: control process exited, code=exited
> status=226
>
> Turns out there are a couple of Stale NFS file handles from fuse mounts
> (related to gvfsd) of sub directories under an NFS mounted home directory
> server - but the home directory for the user in this case, no longer exist
> (user has left)
>
> However, I have no idea why these 'Stale file handles' prevent a
> service being started by systemd ?
>
> In this case, chronyd has nothing to do with NFS mounted user home
> directories - so shouldn't really care ?
>
> I have tried everything I can think of to clear these stale mounts, but
> with no luck
>
> Does anyone know why systemd complains about unconnected 'Stale file
> handles' - and is there any way I can tell systemctl to start a service
> regardless of these 'errors' ?
>
> Rebooting the host will be a last resort (the system is used by many
> users) - but in the meantime, I've manually started the
> /usr/sbin/chronyd binary directly, which runs fine

We've been running large multi-user systems with desktop sessions on
Red Hat-based systems for decades, but it became increasingly painful
after EL6, with the introduction of systemd in EL7. It may have improved
the user experience on developers' laptops, but for our use case things
are worse today...

Regards,
Simon

Jonathan Billings

2021-May-14 12:47 UTC

On Thu, May 13, 2021 at 02:15:15PM +0000, James Pearson wrote:
> I have a CentOS 7 system where I needed to restart chronyd - but the
> systemctl restart failed with the error: 
> 
>  systemd[1]: Starting NTP client/server...
>  systemd[43578]: Failed at step NAMESPACE spawning /usr/sbin/chronyd: Stale file handle
>  systemd[1]: chronyd.service: control process exited, code=exited status=226
> 
> Turns out there are a couple of Stale NFS file handles from fuse
> mounts (related to gvfsd) of sub directories under an NFS mounted
> home directory server - but the home directory for the user in this
> case, no longer exist (user has left) 
> 
> However, I have no idea why these 'Stale file handles' prevent a
> service being started by systemd ? 
> 
> In this case, chronyd has nothing to do with NFS mounted user home
> directories - so shouldn't really care ? 
> 
> I have tried everything I can think of to clear these stale mounts,
> but with no luck 
> 
> Does anyone know why systemd complains about unconnected 'Stale file
> handles' - and is there any way I can tell systemctl to start a
> service regardless of these 'errors' ? 
> 
> Rebooting the host will be a last resort (the system is used by many
> users) - but in the meantime, I've manually started the
> /usr/sbin/chronyd binary directly, which runs fine 

So, the chronyd systemd unit looks like this:

    # /usr/lib/systemd/system/chronyd.service
    [Unit]
    Description=NTP client/server
    Documentation=man:chronyd(8) man:chrony.conf(5)
    After=ntpdate.service sntp.service ntpd.service
    Conflicts=ntpd.service systemd-timesyncd.service
    ConditionCapability=CAP_SYS_TIME

    [Service]
    Type=forking
    PIDFile=/var/run/chrony/chronyd.pid
    EnvironmentFile=-/etc/sysconfig/chronyd
    ExecStart=/usr/sbin/chronyd $OPTIONS
    ExecStartPost=/usr/libexec/chrony-helper update-daemon
    PrivateTmp=yes
    ProtectHome=yes
    ProtectSystem=full

    [Install]
    WantedBy=multi-user.target

So, you'll notice there are "ProtectHome=yes" and "ProtectSystem=full"
settings in the Service section.  These set up a private mount namespace
for the unit, in which /home, /root and /run/user are made inaccessible
and empty (ProtectHome), and /usr, /boot and /etc are read-only
(ProtectSystem).  This is done to reduce the ability of a malicious
NTP server to attack the system through bogus NTP traffic (which is a
real thing that can happen).  Many systemd services limit their
processes this way.

I suspect that is why you're seeing stale file handle errors: the
kernel can't set up the namespace over directories that are now stale
on the system.

You can probably just do a lazy unmount (umount -l) of the stale mounts
to make them go away until you reboot.  You can also disable the
namespaced directories by running 'systemctl edit chronyd.service' and
setting those options to 'off', but you'll be reducing the security of
your system.
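
A minimal sketch of such an override, assuming the stale mounts sit
under /home or /run/user so that ProtectHome is the only setting
involved (an assumption, not something verified on the affected host).
'systemctl edit chronyd.service' opens a drop-in file, by default
/etc/systemd/system/chronyd.service.d/override.conf, whose settings
take precedence over those in the shipped unit:

    # /etc/systemd/system/chronyd.service.d/override.conf
    [Service]
    # Assumption: the stale mounts are under /home or /run/user, so only
    # ProtectHome is relaxed. chronyd's namespace then no longer hides
    # /home and related directories, at some cost in hardening.
    ProtectHome=off
    # Uncomment only if ProtectHome=off alone doesn't let chronyd start:
    #ProtectSystem=off

If you create the file by hand rather than with 'systemctl edit', run
'systemctl daemon-reload' before 'systemctl restart chronyd'.  Deleting
the drop-in and reloading again restores the original hardening once
the stale mounts are gone.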

We've seen some weird stuff in the past related to this feature.  For
example, I couldn't unmount /home because a service with
ProtectHome=read-only was running (cups), and 'fuser' and 'lsof'
didn't show anything using it.  That's because the namespace is built
from kernel mount points rather than open files, so it's all in the
kernel.  Another fun issue I discovered was that some of our
locally-developed services used files in /tmp as a communication
channel, and with PrivateTmp=yes set, they could no longer
communicate.  That forced us to actually do the right thing and use
more appropriate methods.

It is kinda confusing, but I do appreciate that I now have a lot of
ways to lock down services beyond simple UNIX permissions.  systemd is
a rather neat init system.  My complaints with it are usually about the
parts that reach beyond being an init system (I'm looking at you,
systemd-logind and systemd-resolved).

-- 
Jonathan Billings <billings at negate.org>
