On Sun, Aug 30, 2020 at 3:39 PM Marcus <shadowsor@gmail.com> wrote:
> > On Tue, Jun 30, 2020 at 04:02:05PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jun 30, 2020 at 12:59:03PM +0200, Miguel Duarte de Mora Barroso wrote:
> > > > On Mon, Apr 6, 2020 at 4:03 PM Laine Stump <lstump redhat com> wrote:
> > > > >
> > > > > On 4/6/20 9:54 AM, Daniel P. Berrangé wrote:
> > > > > > > On Mon, Apr 06, 2020 at 03:47:01PM +0200, Miguel Duarte de Mora Barroso wrote:
> > > > > >> Hi all,
> > > > > >>
> > > > > > >> I'm aware that it is possible to plug pre-created macvtap devices
> > > > > > >> into libvirt guests - tracked in RFE [0].
> > > > > > >>
> > > > > > >> My interpretation of the wording in [1] and [2] is that it is also
> > > > > > >> possible to plug pre-created tap devices into libvirt guests - that
> > > > > > >> would be a requirement to allow kubevirt to run with fewer
> > > > > > >> capabilities in the pods that encapsulate the VMs.
> > > > > > >>
> > > > > > >> I took a look at the libvirt code ([3] & [4]), and, from my limited
> > > > > > >> understanding, I got the impression that plugging existing interfaces
> > > > > > >> via `managed='no'` is only possible for macvtap interfaces.
> > > > >
> > > > >
> > > > > No, it works for standard tap devices as well.
> > > > >
> > > > >
> > > > > > The reason the BZs and commit logs talk mostly about macvtap rather
> > > > > > than tap is because 1) that's what kubevirt people had asked for and
> > > > > > 2) it already *mostly* worked for tap devices, so most of the work was
> > > > > > related to macvtap (my memory is already fuzzy, but I think there were
> > > > > > a couple of privileged operations we still tried to do for standard
> > > > > > tap devices even if they were precreated). (Standard disclaimer: I
> > > > > > often misremember, so this memory could be wrong! But definitely
> > > > > > precreated tap devices do work.)
> > > > >
> > > >
> > > > It's been a while since I started this thread, but lately I've
> > > > understood better how tap devices work, and that new insight makes me
> > > > wonder about a couple of things.
> > > >
> > > > Our ultimate goal in kubevirt is to consume a pre-created tap device
> > > > from a kubernetes pod that doesn't have the NET_ADMIN capability.
> > > >
> > > > After looking at the current libvirt code, I don't think that is
> > > > currently supported, since we'll *always* enter the
> > > > `virNetDevTapCreate` function in [1] (I'm interested in the *tap*
> > > > scenario).
> > > >
> > > > The tap device is effectively created in that function - [2] - by
> > > > opening the clone device (/dev/net/tun), and calling
> > > > `ioctl(fd, TUNSETIFF, ...)` on it. AFAIK, both of those operations
> > > > *require* the NET_ADMIN capability. If I'm correct, this means that
> > > > the current libvirt implementation makes our goals impossible to
> > > > achieve.
> > >
> > > AFAIK, that is not correct - CAP_NET_ADMIN isn't required to open
> > > or create a tap device - only to add the tap device to a bridge.
> > >
> > > So if you create the tap device & attach it to a bridge ahead of
> > > time, libvirt should then be able to open it and give it to QEMU
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/tun.c#n586
> >
> >     ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
> >      (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
> >     !ns_capable(net->user_ns, CAP_NET_ADMIN);
> >
> >
> > This is called by the TUNSETIFF code.
> >
> > AFAICT, that means if you fchown(tapfd, uid, gid), to the uid+gid of
> > libvirtd, it should not require CAP_NET_ADMIN.
> >
> > Regards,
> > Daniel
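
To make the quoted condition easier to reason about, here is a plain-Python
paraphrase of it. This is only a sketch: `tun_not_capable` and its parameters
are names chosen for illustration, `None` stands in for an unset (invalid)
owner or group, and the capability check is reduced to a boolean. As far as I
understand, the tap's owner and group are the values set with the
TUNSETOWNER/TUNSETGROUP ioctls (or `ip tuntap add ... user ... group ...`).

```python
from typing import Optional, Set


def tun_not_capable(
    owner: Optional[int],
    group: Optional[int],
    euid: int,
    egroups: Set[int],
    has_net_admin: bool,
) -> bool:
    """Return True when attaching to the existing tap would be refused.

    Mirrors the quoted expression: a mismatch only matters when an owner
    or group has actually been set, and CAP_NET_ADMIN overrides either.
    """
    owner_mismatch = owner is not None and euid != owner
    group_mismatch = group is not None and group not in egroups
    return (owner_mismatch or group_mismatch) and not has_net_admin


# Hypothetical uid/gid 107 for the process that opens the tap.
print(tun_not_capable(107, None, 107, {107}, False))    # owner matches
print(tun_not_capable(107, None, 1000, {1000}, False))  # owner differs
print(tun_not_capable(107, None, 1000, {1000}, True))   # CAP_NET_ADMIN held
```

Read this way, a tap whose owner matches the effective uid of the opening
process passes the check without CAP_NET_ADMIN, and so does a tap with no
owner or group set at all.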
>
> I have no idea if this message will get linked into the thread properly,
> but I came across this and wanted to comment on the mystery without having
> an actual email to reply to or headers.
>
> I recently ran into this issue as well, and found that even *with*
> NET_ADMIN at the container level, trying to launch Qemu directly results in:
>
> qemu-system-x86_64: -netdev tap,id=hostnet0,ifname=tap0: could not configure /dev/net/tun (tap0): Permission denied
>
> So as a note I'd say even Libvirt aside, Qemu is trying to do this as well:
>
> https://github.com/qemu/qemu/blob/0982a56a551556c704dc15752dabf57b4be1c640/net/tap-linux.c#L104
>
> But it's unclear where the EPERM is coming from in the kernel at
> tun_set_iff().
>
> Of note, if I give Qemu a non-existing tap name, it will create it, but if
> I give it an existing tap name, I get EPERM.
>
>
That was quick - turns out this other issue is SELinux related: the denial
comes from security_tun_dev_open, which ultimately calls selinux_tun_dev_open.
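
For reference, attaching to an existing tap boils down to one open() of
/dev/net/tun plus one TUNSETIFF ioctl, which is the call that goes through
tun_set_iff(). Below is a minimal Python sketch of that sequence, not what
Qemu or libvirt literally do (as far as I can tell Qemu may set extra flags
such as IFF_VNET_HDR). It assumes a 64-bit Linux host where struct ifreq is
40 bytes; `build_ifreq` and `attach_tap` are names made up for this example.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h> and <linux/if.h>.
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(name: str, flags: int = IFF_TAP | IFF_NO_PI) -> bytes:
    """Pack a struct ifreq holding the interface name and the flags.

    Assumption: 64-bit layout, i.e. 16 bytes of name followed by a
    24-byte union whose first member here is the short ifr_flags.
    """
    encoded = name.encode("ascii")
    if len(encoded) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    return struct.pack("16sH22x", encoded, flags)


def attach_tap(name: str) -> int:
    """Open the clone device and attach to the tap called `name`.

    If the tap already exists the kernel runs its ownership check and
    the LSM hook; if it does not exist the kernel tries to create it.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Calling `attach_tap("tap0")` against a pre-created tap0 and catching OSError
is a quick way to reproduce the failure without Qemu in the picture; the
errno on the exception shows which kind of denial was hit.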