As promised, here is my small writeup on which setups I feel are important in the long run for server-type guests. This does not cover -net user, which is really for desktop kinds of applications where you do not want to connect into the guest from another IP address.

I can see four separate setups that we may or may not want to support, the main difference being how the forwarding between guests happens:

1. The current setup, with a bridge and tun/tap devices on ports of the bridge. This is what Gerhard's work on access controls is focused on and the only option where the hypervisor actually is in full control of the traffic between guests. CPU utilization should be highest this way, and network management can be a burden, because the controls are done through a Linux-, libvirt- and/or Director-specific interface.

2. Using macvlan as a bridging mechanism, replacing the bridge and tun/tap entirely. This should offer the best performance on inter-guest communication, both in terms of throughput and CPU utilization, but offers no access control for this traffic at all. Performance of guest-external traffic should be slightly better than with bridge/tap.

3. Doing the bridging in the NIC using macvlan in passthrough mode. This lowers the CPU utilization further compared to 2, at the expense of limiting throughput to the performance of the PCIe interconnect to the adapter. Whether or not this is a win is workload dependent. Access controls now happen in the NIC. This is not supported yet, due to the lack of device drivers, but it will be an important scenario in the future according to some people.

4. Using macvlan for actual VEPA on the outbound interface. This is mostly interesting because it makes the network access controls visible in an external switch that is already managed. CPU utilization and guest-external throughput should be identical to 3, but inter-guest latency can only be worse because all frames go through the external switch.

In cases 2 through 4, we have the choice between macvtap and the raw packet interface for connecting macvlan to qemu. Raw sockets are better tested right now, while macvtap has better permission management (i.e. it does not require CAP_NET_ADMIN). Neither one is upstream at the moment, though. The raw driver only requires qemu patches, while macvtap requires both a new kernel driver and a trivial change in qemu.

In all four cases, vhost-net could be used to move the workload from user space into the kernel, which may be an advantage. The decision for or against vhost-net is entirely independent of the other decisions.

	Arnd
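[Editorial note: as a rough illustration of the raw-socket vs. macvtap choice described above, here is a minimal userspace sketch of the two ways a backend can end up holding a frame-level file descriptor for a macvlan port. It is not taken from the qemu or kernel patches under discussion; the interface name, the /dev/tapN device-node convention and the minimal error handling are assumptions for illustration.]

```c
/*
 * Sketch only: two ways a userspace backend could get a frame-level fd
 * for a macvlan port.  Names, device-node convention and error handling
 * are simplified assumptions, not taken from the patches in this thread.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netpacket/packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

/* Variant 1: AF_PACKET raw socket bound to the macvlan interface.
 * Frames are then exchanged with send()/recv(); creating the packet
 * socket requires extra privileges (CAP_NET_RAW), which is the
 * permission issue mentioned above. */
static int open_raw_backend(const char *ifname)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    struct sockaddr_ll sll = {
        .sll_family   = AF_PACKET,
        .sll_protocol = htons(ETH_P_ALL),
        .sll_ifindex  = if_nametoindex(ifname),
    };
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Variant 2: macvtap exposes the same port as a tap-like character
 * device, so an unprivileged process that owns the device node can
 * read()/write() whole ethernet frames, tun/tap style.  The /dev/tapN
 * (N = interface index) naming is an assumption here. */
static int open_macvtap_backend(const char *ifname)
{
    char path[32];
    snprintf(path, sizeof(path), "/dev/tap%u", if_nametoindex(ifname));
    return open(path, O_RDWR);
}
```

Either way, qemu ends up holding one fd per guest NIC; the two variants differ only in how that fd was created and which privileges were needed to create it.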
Arnd Bergmann wrote:
> I can see four separate setups that we may or may not want to
> support, the main difference being how the forwarding between
> guests happens:
>
> 1. The current setup, with a bridge and tun/tap devices on ports
> of the bridge. [...]

Typical bridging.

> 2. Using macvlan as a bridging mechanism, replacing the bridge
> and tun/tap entirely. [...]

Optimization to typical bridge (no traffic control).

> 3. Doing the bridging in the NIC using macvlan in passthrough
> mode. [...]

Optimization to typical bridge (hardware accelerated).

> 4. Using macvlan for actual VEPA on the outbound interface. [...]

VEPA.

While we go over all of these things, one thing is becoming clear to me: we need to get qemu out of the network configuration business. There's too much going on here.

What I'd like to see is the following interfaces supported:

1) given an fd, make socket calls to send packets. Could be used with a raw socket, a multicast or tcp socket.

2) given an fd, use tap-style read/write calls to send packets*

3) given an fd, treat it as a vhost-style interface

* need to make all tun ioctls optional based on passed-in flags

Every backend we have today could be implemented in terms of one of the above three. They really come down to how the fd is created and set up.

I believe we should continue supporting the mechanisms we support today. However, for people that invoke qemu directly from the command line, I believe we should provide a mechanism like the tap helper that can be used to call out to a separate program to create these initial file descriptors. We'll have to think about how we can make this integrate well so that the syntax isn't clumsy.

Regards,

Anthony Liguori
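[Editorial note: a minimal sketch of the "helper creates the fd, qemu only consumes it" split suggested above: a separate, possibly privileged program opens the tap/raw/vhost fd and hands it to qemu over a unix domain socket using SCM_RIGHTS. This only illustrates the idea, not the actual helper protocol; the function name and the one-byte payload are arbitrary.]

```c
/* Sketch: hand an already-created network fd to another process
 * (e.g. qemu) over a connected unix domain socket using SCM_RIGHTS.
 * Illustrative only; not the actual helper interface. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_net_fd(int unix_sock, int net_fd)
{
    char byte = 0;                       /* dummy one-byte payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {                              /* aligned ancillary buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };

    /* Attach the descriptor as SCM_RIGHTS ancillary data. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &net_fd, sizeof(int));

    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}
```

On the receiving side, qemu would recvmsg() the descriptor and then drive it through one of the three interface styles listed above (socket calls, tap-style read/write, or vhost), without ever needing the privileges that were required to create it.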
> Subject: Guest bridge setup variations
>
> [...]
>
> 3. Doing the bridging in the NIC using macvlan in passthrough
> mode. This lowers the CPU utilization further compared to 2,
> at the expense of limiting throughput to the performance of
> the PCIe interconnect to the adapter. Whether or not this
> is a win is workload dependent. Access controls now happen
> in the NIC. This is not supported yet, due to the lack of
> device drivers, but it will be an important scenario in the future
> according to some people.

Can you differentiate this option from typical PCI pass-through mode? It is not clear to me where macvlan sits in a setup where the NIC does the bridging. Typically, in a PCI pass-through configuration, all configuration goes through the physical function device driver (and all data goes directly to the NIC). Are you suggesting using macvlan as a common configuration layer that then configures the underlying NIC? I could see some benefit in such a model, though I am not certain I understand you correctly.

Thanks,
Anna
> > From: Arnd Bergmann
> > Sent: Tuesday, December 08, 2009 8:08 AM
> > To: virtualization at lists.linux-foundation.org
> > Cc: qemu-devel at nongnu.org
> > Subject: Guest bridge setup variations
> >
> > [...]
> >
> > 3. Doing the bridging in the NIC using macvlan in passthrough
> > mode. This lowers the CPU utilization further compared to 2,
> > at the expense of limiting throughput to the performance of
> > the PCIe interconnect to the adapter. Whether or not this
> > is a win is workload dependent.

This is certainly true today for pci-e 1.1 and 2.0 devices, but as NICs move to pci-e 3.0 (while remaining almost exclusively dual-port 10GbE for a long while), EVB internal bandwidth will significantly exceed external bandwidth. So, #3 can become a win for most inter-guest workloads (see the rough numbers below).

> > Access controls now happen
> > in the NIC. This is not supported yet, due to the lack of
> > device drivers, but it will be an important scenario in the future
> > according to some people.

Actually, the x3100 10GbE drivers support this today via a sysfs interface to the host driver, which can choose to control the VEB tables (and therefore MAC addresses, vlan memberships, etc. for all passthru interfaces behind the VEB). Of course, a more generic, vendor-independent interface will be important in the future.
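[Editorial note: for a rough sense of the PCIe-versus-external-bandwidth crossover mentioned in the reply above, here are approximate raw per-direction rates for an x8 slot (a common width for 10GbE adapters; the x8 width is an assumption, and only line-coding overhead is accounted for) next to a dual-port 10GbE external interface.]

```latex
% Approximate raw per-direction bandwidth, assuming an x8 slot
\begin{aligned}
\text{PCIe 1.1 x8:}\quad     & 8 \times 2.5\,\text{GT/s} \times \tfrac{8}{10}    = 16\ \text{Gbit/s} \\
\text{PCIe 2.0 x8:}\quad     & 8 \times 5.0\,\text{GT/s} \times \tfrac{8}{10}    = 32\ \text{Gbit/s} \\
\text{PCIe 3.0 x8:}\quad     & 8 \times 8.0\,\text{GT/s} \times \tfrac{128}{130} \approx 63\ \text{Gbit/s} \\
\text{dual-port 10GbE:}\quad & 2 \times 10\,\text{Gbit/s}                        = 20\ \text{Gbit/s}
\end{aligned}
```

By these rough numbers, an x8 gen-1 link sits below the external line rate, while a gen-3 link leaves roughly three times the external bandwidth available for frames that never leave the adapter, which is the crossover being described.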