Michael S. Tsirkin
2010-Jun-28 10:08 UTC
[PATCHv2] vhost-net: add dhclient work-around from userspace
Userspace virtio server has the following hack so guests rely on it, and we have to replicate it, too: Use port number to detect incoming IPv4 DHCP response packets, and fill in the checksum for these. The issue we are solving is that on linux guests, some apps that use recvmsg with AF_PACKET sockets, don't know how to handle CHECKSUM_PARTIAL; The interface to return the relevant information was added in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, and older userspace does not use it. One important user of recvmsg with AF_PACKET is dhclient, so we add a work-around just for DHCP. Don't bother applying the hack to IPv6 as userspace virtio does not have a work-around for that - let's hope guests will do the right thing wrt IPv6. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- Dave, I'm going to put this patch on the vhost tree, no need for you to bother merging it - you'll get it with a pull request. drivers/vhost/net.c | 44 +++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 43 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index cc19595..03bba6a 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -24,6 +24,10 @@ #include <linux/if_tun.h> #include <linux/if_macvlan.h> +#include <linux/ip.h> +#include <linux/udp.h> +#include <linux/netdevice.h> + #include <net/sock.h> #include "vhost.h" @@ -186,6 +190,44 @@ static void handle_tx(struct vhost_net *net) unuse_mm(net->dev.mm); } +static int peek_head(struct sock *sk) +{ + struct sk_buff *skb; + + lock_sock(sk); + skb = skb_peek(&sk->sk_receive_queue); + if (unlikely(!skb)) { + release_sock(sk); + return 0; + } + /* Userspace virtio server has the following hack so + * guests rely on it, and we have to replicate it, too: */ + /* Use port number to detect incoming IPv4 DHCP response packets, + * and fill in the checksum. */ + + /* The issue we are solving is that on linux guests, some apps + * that use recvmsg with AF_PACKET sockets, don't know how to + * handle CHECKSUM_PARTIAL; + * The interface to return the relevant information was added in + * 8dc4194474159660d7f37c495e3fc3f10d0db8cc, + * and older userspace does not use it. + * One important user of recvmsg with AF_PACKET is dhclient, + * so we add a work-around just for DHCP. */ + if (skb->ip_summed == CHECKSUM_PARTIAL && + skb_headlen(skb) >= skb_transport_offset(skb) + + sizeof(struct udphdr) && + udp_hdr(skb)->dest == htons(68) && + skb_network_header_len(skb) >= sizeof(struct iphdr) && + ip_hdr(skb)->protocol == IPPROTO_UDP && + skb->protocol == htons(ETH_P_IP)) { + skb_checksum_help(skb); + /* Restore ip_summed value: tun passes it to user. */ + skb->ip_summed = CHECKSUM_PARTIAL; + } + release_sock(sk); + return 1; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) @@ -222,7 +264,7 @@ static void handle_rx(struct vhost_net *net) vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ? vq->log : NULL; - for (;;) { + while (peek_head(sock->sk)) { head = vhost_get_vq_desc(&net->dev, vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, -- 1.7.1.12.g42b7f
Michael S. Tsirkin
2010-Jun-28 15:30 UTC
[PATCHv2] vhost-net: add dhclient work-around from userspace
On Mon, Jun 28, 2010 at 01:08:07PM +0300, Michael S. Tsirkin wrote:> Userspace virtio server has the following hack > so guests rely on it, and we have to replicate it, too: > > Use port number to detect incoming IPv4 DHCP response packets, > and fill in the checksum for these. > > The issue we are solving is that on linux guests, some apps > that use recvmsg with AF_PACKET sockets, don't know how to > handle CHECKSUM_PARTIAL; > The interface to return the relevant information was added > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, > and older userspace does not use it. > One important user of recvmsg with AF_PACKET is dhclient, > so we add a work-around just for DHCP. > > Don't bother applying the hack to IPv6 as userspace virtio does not > have a work-around for that - let's hope guests will do the right > thing wrt IPv6. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>Tested-by: Laine Stump <laine at redhat.com>
Sridhar Samudrala
2010-Jun-28 22:19 UTC
[PATCHv2] vhost-net: add dhclient work-around from userspace
On Mon, 2010-06-28 at 13:08 +0300, Michael S. Tsirkin wrote:> Userspace virtio server has the following hack > so guests rely on it, and we have to replicate it, too: > > Use port number to detect incoming IPv4 DHCP response packets, > and fill in the checksum for these. > > The issue we are solving is that on linux guests, some apps > that use recvmsg with AF_PACKET sockets, don't know how to > handle CHECKSUM_PARTIAL; > The interface to return the relevant information was added > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, > and older userspace does not use it. > One important user of recvmsg with AF_PACKET is dhclient, > so we add a work-around just for DHCP. > > Don't bother applying the hack to IPv6 as userspace virtio does not > have a work-around for that - let's hope guests will do the right > thing wrt IPv6. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > > Dave, I'm going to put this patch on the vhost tree, > no need for you to bother merging it - you'll get > it with a pull request. > > > drivers/vhost/net.c | 44 +++++++++++++++++++++++++++++++++++++++++++- > 1 files changed, 43 insertions(+), 1 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index cc19595..03bba6a 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -24,6 +24,10 @@ > #include <linux/if_tun.h> > #include <linux/if_macvlan.h> > > +#include <linux/ip.h> > +#include <linux/udp.h> > +#include <linux/netdevice.h> > + > #include <net/sock.h> > > #include "vhost.h" > @@ -186,6 +190,44 @@ static void handle_tx(struct vhost_net *net) > unuse_mm(net->dev.mm); > } > > +static int peek_head(struct sock *sk)This routine is doing more than just peeking the head of sk's receive queue. May be this should be named similar to what qemu calls 'work_around_broken_dhclient()'> +{ > + struct sk_buff *skb; > + > + lock_sock(sk); > + skb = skb_peek(&sk->sk_receive_queue); > + if (unlikely(!skb)) { > + release_sock(sk); > + return 0; > + } > + /* Userspace virtio server has the following hack so > + * guests rely on it, and we have to replicate it, too: */ > + /* Use port number to detect incoming IPv4 DHCP response packets, > + * and fill in the checksum. */ > + > + /* The issue we are solving is that on linux guests, some apps > + * that use recvmsg with AF_PACKET sockets, don't know how to > + * handle CHECKSUM_PARTIAL; > + * The interface to return the relevant information was added in > + * 8dc4194474159660d7f37c495e3fc3f10d0db8cc, > + * and older userspace does not use it. > + * One important user of recvmsg with AF_PACKET is dhclient, > + * so we add a work-around just for DHCP. */ > + if (skb->ip_summed == CHECKSUM_PARTIAL && > + skb_headlen(skb) >= skb_transport_offset(skb) + > + sizeof(struct udphdr) && > + udp_hdr(skb)->dest == htons(68) && > + skb_network_header_len(skb) >= sizeof(struct iphdr) && > + ip_hdr(skb)->protocol == IPPROTO_UDP && > + skb->protocol == htons(ETH_P_IP)) {Isn't it more logical to check for skb->protocol, followed by ip_hdr and then udp_hdr?> + skb_checksum_help(skb); > + /* Restore ip_summed value: tun passes it to user. */ > + skb->ip_summed = CHECKSUM_PARTIAL; > + } > + release_sock(sk); > + return 1; > +} > + > /* Expects to be always run from workqueue - which acts as > * read-size critical section for our kind of RCU. */ > static void handle_rx(struct vhost_net *net) > @@ -222,7 +264,7 @@ static void handle_rx(struct vhost_net *net) > vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ? > vq->log : NULL; > > - for (;;) { > + while (peek_head(sock->sk)) { > head = vhost_get_vq_desc(&net->dev, vq, vq->iov, > ARRAY_SIZE(vq->iov), > &out, &in,
David Miller
2010-Jun-29 07:36 UTC
[PATCHv2] vhost-net: add dhclient work-around from userspace
From: "Michael S. Tsirkin" <mst at redhat.com> Date: Mon, 28 Jun 2010 13:08:07 +0300> Userspace virtio server has the following hack > so guests rely on it, and we have to replicate it, too: > > Use port number to detect incoming IPv4 DHCP response packets, > and fill in the checksum for these. > > The issue we are solving is that on linux guests, some apps > that use recvmsg with AF_PACKET sockets, don't know how to > handle CHECKSUM_PARTIAL; > The interface to return the relevant information was added > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, > and older userspace does not use it. > One important user of recvmsg with AF_PACKET is dhclient, > so we add a work-around just for DHCP. > > Don't bother applying the hack to IPv6 as userspace virtio does not > have a work-around for that - let's hope guests will do the right > thing wrt IPv6. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>Yikes, this is awful too. Nothing in the kernel should be mucking around with procotol packets like this by default. In particular, what the heck does port 67 mean? Locally I can use it for whatever I want for my own purposes, I don't have to follow the conventions for service ports as specified by the IETF. But I can't have the packet checksum state be left alone for port 67 traffic on a box using virtio because you have this hack there. And yes it's broken on machines using the qemu thing, but at least the hack there is restricted to userspace. I really don't want anything in the kernel that looks like this. These applications are broken, and we've provided a way for them to work properly. What's the point of having fixed applications if all of these hacks grow like fungus over every virtualization transport? It just means that people won't fix the apps, since they don't have to. There is no incentive, and the mechanism we created to properly handle this loses it's value. At best, you can write a netfilter module that mucks up the packet checksum state in these situations. At least in that case, you can make it generic (it mangles iff a packet matches a certain rule, so for your virtio guests you'd make it match for DHCP frames) instead of being some hard-coded DHCP thing by design. And since this is so cleanly seperated and portable you don't even need to push it upstream. It's a temporary workaround for a temporary problem. You can just delete it as soon as the majority of guests have the fixed dhcp. The qemu crap should disappear similarly.
Possibly Parallel Threads
- [PATCHv2] vhost-net: add dhclient work-around from userspace
- [PATCH RFC] vhost-net: add dhclient work-around from userspace
- [PATCH RFC] vhost-net: add dhclient work-around from userspace
- [PATCH v2] Fix AF_PACKET ABI breakage in 4.2
- [PATCH v2] Fix AF_PACKET ABI breakage in 4.2