mxs kolo
2017-Oct-18 18:00 UTC
[libvirt-users] Can we disable write to /sys/fs/cgroup tree inside container ?
Hi all,

Each lxc container on the node has a tmpfs mounted for the cgroups tree:

    [root-inside-lxc@tst1 ~]# mount | grep cgroups
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
    cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
    cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

That is the default, at least in my case.
The problem is that it is the full cgroups tree: from the hardware node and from all other containers on the node.

    [root-inside-lxc@tst1 ~]# for i in `ls /sys/fs/cgroup/devices/machine.slice/machine-lxc*/devices.list`; do echo $i; cat $i; done
    /sys/fs/cgroup/devices/machine.slice/machine-lxc\x2d10297\x2dtst2.scope/devices.list
    c 1:3 rwm
    c 1:5 rwm
    c 1:7 rwm
    c 1:8 rwm
    c 1:9 rwm
    c 5:0 rwm
    c 5:2 rwm
    c 10:229 rwm
    b 253:6 rw
    c 136:* rwm
    /sys/fs/cgroup/devices/machine.slice/machine-lxc\x2d9951\x2dtst1.scope/devices.list
    c 1:3 rwm
    c 1:5 rwm
    c 1:7 rwm
    c 1:8 rwm
    c 1:9 rwm
    c 5:0 rwm
    c 5:2 rwm
    c 10:229 rwm
    b 253:7 rw
    c 136:* rwm

Hardware node file, viewed from inside the tst1 container:

    [root-inside-lxc@tst1 ~]# cat /sys/fs/cgroup/devices/devices.list
    a *:* rwm

What is the best way to prevent viewing and editing of all cgroups structures except those belonging to the current lxc container (SELinux, AppArmor)?
Why does libvirt mount /sys/fs/cgroup/* inside the container as rw?

We use kernel 3.10.0-693.2.2.el7.x86_64 and XFS, and therefore our containers are privileged. Yes, we know that in such containers root can use SysRq, at least to reboot the hardware node. But the problem with cgroups can be more hidden and cryptic.

p.s.
As a short test shows, the root user can disable the zero device on the node

    [root-lxc@tst1 ~]# echo "c 1:5 rwm" > /sys/fs/cgroup/devices/devices.deny

or all devices in another container

    [root-lxc@tst1 ~]# echo "a *:* rwm" > /sys/fs/cgroup/devices/machine.slice/machine-lxc\x2d10297\x2dtst2.scope/devices.deny

b.r.
Maxim Kozin
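To check the same thing without reading the mount output by eye, a small filter over /proc/mounts can list the cgroup hierarchies that are mounted read-write. This is only a sketch: the function name rw_cgroup_mounts is made up for illustration, and it assumes cgroup v1 mounts (filesystem type "cgroup") as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt or any distribution tool).
# Reads lines in /proc/mounts format on stdin:
#   device mountpoint fstype options dump pass
# and prints the mount point of every cgroup v1 mount whose option
# list contains "rw".
rw_cgroup_mounts() {
    awk '$3 == "cgroup" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "rw") { print $2; break }
    }'
}

# Typical use inside the container:
#   rw_cgroup_mounts < /proc/mounts
```

Note that a mount reported as "ro" would only stop writes through that mount point; it says nothing about whether root in a privileged container could remount it.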
mxs kolo
2017-Oct-18 18:05 UTC
Re: [libvirt-users] Can we disable write to /sys/fs/cgroup tree inside container ?
> Why libvirt mount /sys/fs/cgroup/* inside container as rw ?
>
> We use kernel 3.10.0-693.2.2.el7.x86_64 and XFS and therefore our
> containers are privileged. Yes, we know that in such containers root
> can use SysRq at least for reboot hardware node. But problem with
> cgroups can be more hidden and cryptic.

p.s.2 We still use libvirt-3.0.0, if that's important.
Daniel P. Berrange
2017-Oct-18 18:09 UTC
Re: [libvirt-users] Can we disable write to /sys/fs/cgroup tree inside container ?
On Wed, Oct 18, 2017 at 09:00:17PM +0300, mxs kolo wrote:
> Hi all
>
> Each lxc container on node have mounted tmpfs for cgroups tree:
> [...]
> It's by default, at least in my case.
> Problem is, that it's full cgroups tree - from hardware node and from
> all another containers on node.
> [...]
> What is best way to prevent viewing and editing of all cgroups
> structures except belonging to current lxc container (selinux,
> apparmor ) ?
> Why libvirt mount /sys/fs/cgroup/* inside container as rw ?
>
> We use kernel 3.10.0-693.2.2.el7.x86_64 and XFS and therefore our
> containers are privileged. Yes, we know that in such containers root
> can use SysRq at least for reboot hardware node. But problem with
> cgroups can be more hidden and cryptic.
>
> p.s.
> As show short test, root user can disable device zero on node
> [root-lxc@tst1 ~]# echo "c 1:5 rwm" > /sys/fs/cgroup/devices/devices.deny
> or all devices in another container
> [root-lxc@tst1 ~]# echo "a *:* rwm" > /sys/fs/cgroup/devices/machine.slice/machine-lxc\x2d10297\x2dtst2.scope/devices.deny

There are only two ways to make a container secure:

 - Use user namespaces
 - Apply SELinux policy to the container

If neither of those is used, we don't try to play games to hide stuff like cgroups from root inside a container, as that's just security through obscurity.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
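As a rough way to see whether the first of those two options is in effect, one can look at /proc/self/uid_map from inside the container. The sketch below assumes the standard uid_map format ("inside-start outside-start count"); the function name idmap_is_identity is hypothetical. A single line mapping 0 to 0 for the full 32-bit range means uid 0 inside is uid 0 on the host, which is the privileged case discussed in this thread. It does not check SELinux confinement.

```shell
#!/bin/sh
# Hypothetical helper (illustration only).
# Reads a uid_map (or gid_map) on stdin and succeeds if it consists of
# exactly one line mapping the whole id range onto itself:
#   0 0 4294967295
idmap_is_identity() {
    awk 'NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { ok = 1 }
         END { exit !(ok && NR == 1) }'
}

# Typical use inside the container:
#   if idmap_is_identity < /proc/self/uid_map; then
#       echo "root here is root on the host (no uid remapping)"
#   fi
```

A non-identity map (for example "0 100000 65536") indicates that ids are remapped by a user namespace, so writes to host-owned cgroup files would be subject to ordinary permission checks as an unprivileged host uid.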