Hi there!

I am trying to turn on user namespaces by adding the following lines to the config:

    <idmap>
      <uid start='0' target='0' count='100000'/>
      <gid start='0' target='0' count='100000'/>
    </idmap>

As you can see, root in the container is mapped to root outside. I expected to see no difference after adding these lines, but unfortunately there are some (see details below).

Am I missing something, or is there a problem with the system, libvirt or the kernel?

Full libvirt config:

    <domain type='lxc'>
      <name>test_with_idmap</name>
      <memory>102400</memory>
      <os>
        <type>exe</type>
        <init>/usr/lib/systemd/systemd</init>
      </os>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <idmap>
        <uid start='0' target='0' count='100000'/>
        <gid start='0' target='0' count='100000'/>
      </idmap>
      <devices>
        <console type='pty'/>
        <filesystem type='mount'>
          <source dir='/guest'/>
          <target dir='/'/>
        </filesystem>
      </devices>
    </domain>

Versions:

    root:~> uname -a
    Linux localhost 3.10.19-01077-g4a19d28-dirty #5 SMP PREEMPT Mon Jan 13 12:56:09 CET 2014 armv7l GNU/Linux
    root:~> libvirtd --version
    libvirtd (libvirt) 1.2.1
    root:~> systemd --version
    systemd 204

After adding idmap to the config, systemd can't start many of its services, in particular:

    Failed to mount Debug File System.
    Failed to mount Configuration File System.
    Failed to mount FUSE Control File System.
    Failed to start udev Kernel Device Manager.
    Failed to start Remount Root and Kernel File Systems.
    Failed to start Journal Service.

systemctl status says:

    ExecMount=/bin/mount debugfs /sys/kernel/debug -t debugfs (code=exited, status=32)
    ExecMount=/bin/mount configfs /sys/kernel/config -t configfs (code=exited, status=32)
    ExecMount=/bin/mount fusectl /sys/fs/fuse/connections -t fusectl (code=exited, status=32)
    ExecStart=/usr/lib/systemd/systemd-udevd (code=exited, status=206/OOM_ADJUST)
    ExecStart=/usr/lib/systemd/systemd-remount-fs (code=exited, status=1/FAILURE)
    ExecStart=/usr/lib/systemd/systemd-journald (code=exited, status=218/CAPABILITIES)

Thanks!
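For readers unfamiliar with the idmap element, the following is a minimal Python sketch of how such a mapping translates container IDs to host IDs, assuming the usual reading in which `start` is the first ID inside the container, `target` is the first ID on the host, and `count` is the length of the range. The helper names `parse_idmap` and `to_host_id` are hypothetical and are not part of libvirt; the sketch only illustrates the arithmetic.

```python
import xml.etree.ElementTree as ET

# The idmap fragment from the post above.
IDMAP_XML = """
<idmap>
  <uid start='0' target='0' count='100000'/>
  <gid start='0' target='0' count='100000'/>
</idmap>
"""


def parse_idmap(xml_text):
    """Return {'uid': [(start, target, count), ...], 'gid': [...]}.

    Hypothetical helper for illustration; not a libvirt API.
    """
    root = ET.fromstring(xml_text)
    ranges = {"uid": [], "gid": []}
    for kind in ranges:
        for elem in root.findall(kind):
            ranges[kind].append(
                (
                    int(elem.get("start")),
                    int(elem.get("target")),
                    int(elem.get("count")),
                )
            )
    return ranges


def to_host_id(entries, container_id):
    """Map a container ID to a host ID, or None if the ID is unmapped."""
    for start, target, count in entries:
        if start <= container_id < start + count:
            return target + (container_id - start)
    return None


idmap = parse_idmap(IDMAP_XML)
print(to_host_id(idmap["uid"], 0))       # container root -> 0
print(to_host_id(idmap["uid"], 99999))   # last mapped uid -> 99999
print(to_host_id(idmap["uid"], 100000))  # outside the range -> None
```

With the configuration from the post, every ID in the range maps to itself, which is why no difference in behaviour was expected.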
Daniel P. Berrange
2014-Jan-28 11:46 UTC
Re: [libvirt-users] Libvirt-LXC + systemd + user namespace
On Tue, Jan 28, 2014 at 12:32:41PM +0100, Jan Olszak wrote:
> Hi there!
>
> I am trying to turn on user namespaces by adding the following lines to the
> config:
>
>     <idmap>
>       <uid start='0' target='0' count='100000'/>
>       <gid start='0' target='0' count='100000'/>
>     </idmap>
>
> As you can see, root in the container is mapped to root outside. I
> expected to see no difference after adding these lines, but unfortunately
> there are some (see details below).
>
> Am I missing something, or is there a problem with the system, libvirt or
> the kernel?

I've not had any chance to try LXC + user namespaces + systemd yet, but based on the list of things which fail, it seems like systemd might not be detecting that it is inside a container. It seems almost as if it still has the CAP_MKNOD permission and so is trying to start things it should not, like udev and various filesystems.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Piotr Bartosiewicz
2014-Jan-29 11:35 UTC
Re: [libvirt-users] Libvirt-LXC + systemd + user namespace
On 28.01.2014 12:46, Daniel P. Berrange wrote:
> On Tue, Jan 28, 2014 at 12:32:41PM +0100, Jan Olszak wrote:
>> Hi there!
>>
>> I am trying to turn on user namespaces by adding the following lines to the
>> config:
>>
>>     <idmap>
>>       <uid start='0' target='0' count='100000'/>
>>       <gid start='0' target='0' count='100000'/>
>>     </idmap>
>>
>> As you can see, root in the container is mapped to root outside. I
>> expected to see no difference after adding these lines, but unfortunately
>> there are some (see details below).
>>
>> Am I missing something, or is there a problem with the system, libvirt or
>> the kernel?
>
> I've not had any chance to try LXC + user namespaces + systemd yet, but
> based on the list of things which fail, it seems like systemd might not be
> detecting that it is inside a container. It seems almost as if it still
> has the CAP_MKNOD permission and so is trying to start things it should
> not, like udev and various filesystems.
>
> Daniel

I was able to reduce the problem to a case that uses neither libvirt nor systemd. I created a bash process inside a user namespace with the mapping root_inside <-> root_outside, using a program from https://lwn.net/Articles/532593/ :

    ./userns_child_exec -U -M '0 0 1' -G '0 0 1' bash

This program simply calls clone with the CLONE_NEWUSER flag and sets the proper uid_map and gid_map.

The test commands are as follows:

    mkdir /test
    mount debugfs /test -t debugfs

and strace shows:

    mount("debugfs", "/test", "debugfs", MS_MGC_VAL, NULL) = -1 EPERM (Operation not permitted)

Now the question is: is it a kernel bug or expected behavior, i.e. do we always have limited permissions inside a user namespace, even if uid=0 inside the container is mapped to uid=0 outside?

    # cat /proc/$$/uid_map
    0 0 1
    # cat /proc/$$/gid_map
    0 0 1
    # cat /proc/$$/status | grep Cap
    CapInh: 0000000000000000
    CapPrm: 0000001fffffffff
    CapEff: 0000001fffffffff
    CapBnd: 0000001fffffffff

-- 
Piotr Bartosiewicz
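As an aid to reading the Cap* lines above, here is a small Python sketch that decodes the hexadecimal masks and tests individual capability bits. The function names `parse_cap_lines` and `has_cap` are hypothetical, and only three capability numbers (taken from linux/capability.h) are listed. The sketch only decodes the bits as reported for the process; it says nothing about which user namespace those capabilities are effective against.

```python
# A few capability numbers from <linux/capability.h>; the table is
# deliberately incomplete and only covers what the thread mentions.
CAP_NUMBERS = {
    "CAP_CHOWN": 0,
    "CAP_SYS_ADMIN": 21,
    "CAP_MKNOD": 27,
}

# The Cap* lines exactly as quoted in the message above.
STATUS_SAMPLE = """\
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
"""


def parse_cap_lines(status_text):
    """Extract the Cap* fields of /proc/<pid>/status as integers.

    Hypothetical helper for illustration only.
    """
    caps = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("Cap"):
            caps[name] = int(value.strip(), 16)
    return caps


def has_cap(mask, cap_name):
    """Return True if the bit for cap_name is set in the mask."""
    return bool((mask >> CAP_NUMBERS[cap_name]) & 1)


caps = parse_cap_lines(STATUS_SAMPLE)
effective = caps["CapEff"]
print(bin(effective).count("1"))            # number of bits set: 37
print(has_cap(effective, "CAP_SYS_ADMIN"))  # True
print(has_cap(effective, "CAP_MKNOD"))      # True
```

For the values quoted above, both CAP_SYS_ADMIN and CAP_MKNOD are present in the effective set, so the EPERM from mount is not explained by a missing bit in the mask alone.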