multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds guest/host cooperation for flow
steering support, based on Krishna Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

The idea is simple: the backend passes the rxhash to the guest, and the
guest tells the backend the hash-to-queue mapping when necessary; the
backend can then choose the queue based on the hash value of the
packet. The table is just a page shared between userspace and the
backend.

Patch 1 enables the ability to pass the rxhash through vnet_hdr to the
guest.

Patches 2 and 3 implement a very simple flow director for tap and
macvtap. The tap part is based on the multiqueue tap patches posted by
me (http://lwn.net/Articles/459270/).

Patch 4 implements a method for a virtio device to find the irq of a
specific virtqueue, in order to do device-specific interrupt
optimization.

Patch 5 is the part of the guest driver that uses accelerated RFS to
program the flow director, with some optimizations on irq affinity and
tx queue selection.

This is just a prototype that demonstrates the idea; there are still
things that need to be discussed:

- An alternative to the shared page is the ctrl vq; the reason a shared
  table is preferable is the latency of the ctrl vq itself.
- Optimization of irq affinity and tx queue selection.

Comments are welcome, thanks!

---

Jason Wang (5):
      virtio_net: passing rxhash through vnet_hdr
      tuntap: simple flow director support
      macvtap: flow director support
      virtio: introduce a method to get the irq of a specific virtqueue
      virtio-net: flow director support


 drivers/lguest/lguest_device.c |    8 ++
 drivers/net/macvlan.c          |    4 +
 drivers/net/macvtap.c          |   42 ++++++++-
 drivers/net/tun.c              |  105 ++++++++++++++++------
 drivers/net/virtio_net.c       |  189 +++++++++++++++++++++++++++++++++++++++-
 drivers/s390/kvm/kvm_virtio.c  |    6 +
 drivers/vhost/net.c            |   10 +-
 drivers/vhost/vhost.h          |    5 +
 drivers/virtio/virtio_mmio.c   |    8 ++
 drivers/virtio/virtio_pci.c    |   12 +++
 include/linux/if_macvlan.h     |    1 +
 include/linux/if_tun.h         |   11 ++
 include/linux/virtio_config.h  |    4 +
 include/linux/virtio_net.h     |   16 +++
 14 files changed, 377 insertions(+), 44 deletions(-)

-- 
Signature
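
To make the steering rule above concrete, the lookup the backend performs
can be sketched in a few lines of C. This is illustrative only, not part of
the series; it mirrors the tun_get_queue() logic added in patch 2, with
simplified names:

    #include <stdint.h>

    #define TAP_HASH_MASK 0xFF

    /* Pick an rx queue for a packet, given the guest-programmed table. */
    static uint16_t pick_queue(const uint16_t *table, uint32_t rxhash,
                               uint16_t numqueues)
    {
            uint16_t q = table[rxhash & TAP_HASH_MASK];

            if (q < numqueues)      /* entry was programmed by the guest */
                    return q;
            /* fall back to plain hash-based spreading */
            return ((uint64_t)rxhash * numqueues) >> 32;
    }

An out-of-range entry acts as "not programmed", so the whole table can be
invalidated simply by filling it with the queue count.
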
Jason Wang
2011-Dec-05 08:58 UTC
[net-next RFC PATCH 1/5] virtio_net: passing rxhash through vnet_hdr
This patch enables the ability to pass the rxhash value to the guest
through vnet_hdr. This is useful when the guest wants to cooperate with
the virtual device to steer a flow to a dedicated guest cpu.

This feature is negotiated through VIRTIO_NET_F_GUEST_RXHASH.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/macvtap.c      |   10 ++++++----
 drivers/net/tun.c          |   44 +++++++++++++++++++++++++-------------------
 drivers/net/virtio_net.c   |   26 ++++++++++++++++++++++----
 drivers/vhost/net.c        |   10 +++++++---
 drivers/vhost/vhost.h      |    5 +++--
 include/linux/if_tun.h     |    1 +
 include/linux/virtio_net.h |   10 +++++++++-
 7 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 7c88d13..504c745 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -760,16 +760,17 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 	int vnet_hdr_len = 0;
 
 	if (q->flags & IFF_VNET_HDR) {
-		struct virtio_net_hdr vnet_hdr;
+		struct virtio_net_hdr_rxhash vnet_hdr;
 		vnet_hdr_len = q->vnet_hdr_sz;
 		if ((len -= vnet_hdr_len) < 0)
 			return -EINVAL;
 
-		ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
+		ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr.hdr.hdr);
 		if (ret)
 			return ret;
 
-		if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
+		vnet_hdr.rxhash = skb->rxhash;
+		if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, q->vnet_hdr_sz))
 			return -EFAULT;
 	}
 
@@ -890,7 +891,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH,
+			     up))
 			return -EFAULT;
 		return 0;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afb11d1..7d22b4b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -869,49 +869,55 @@ static ssize_t tun_put_user(struct tun_file *tfile,
 	}
 
 	if (tfile->flags & TUN_VNET_HDR) {
-		struct virtio_net_hdr gso = { 0 }; /* no info leak */
-		if ((len -= tfile->vnet_hdr_sz) < 0)
+		struct virtio_net_hdr_rxhash hdr;
+		struct virtio_net_hdr *gso = (struct virtio_net_hdr *)&hdr;
+
+		if ((len -= tfile->vnet_hdr_sz) < 0 ||
+		    tfile->vnet_hdr_sz > sizeof(struct virtio_net_hdr_rxhash))
 			return -EINVAL;
 
+		memset(&hdr, 0, sizeof(hdr));
 		if (skb_is_gso(skb)) {
 			struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 			/* This is a hint as to how much should be linear. */
-			gso.hdr_len = skb_headlen(skb);
-			gso.gso_size = sinfo->gso_size;
+			gso->hdr_len = skb_headlen(skb);
+			gso->gso_size = sinfo->gso_size;
 			if (sinfo->gso_type & SKB_GSO_TCPV4)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 			else if (sinfo->gso_type & SKB_GSO_TCPV6)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
 			else if (sinfo->gso_type & SKB_GSO_UDP)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_UDP;
 			else {
 				pr_err("unexpected GSO type: "
 				       "0x%x, gso_size %d, hdr_len %d\n",
-				       sinfo->gso_type, gso.gso_size,
-				       gso.hdr_len);
+				       sinfo->gso_type, gso->gso_size,
+				       gso->hdr_len);
 				print_hex_dump(KERN_ERR, "tun: ",
 					       DUMP_PREFIX_NONE,
 					       16, 1, skb->head,
-					       min((int)gso.hdr_len, 64), true);
+					       min((int)gso->hdr_len, 64),
+					       true);
 				WARN_ON_ONCE(1);
 				return -EINVAL;
 			}
 			if (sinfo->gso_type & SKB_GSO_TCP_ECN)
-				gso.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
+				gso->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
 		} else
-			gso.gso_type = VIRTIO_NET_HDR_GSO_NONE;
+			gso->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
 		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			gso.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			gso.csum_start = skb_checksum_start_offset(skb);
-			gso.csum_offset = skb->csum_offset;
+			gso->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+			gso->csum_start = skb_checksum_start_offset(skb);
+			gso->csum_offset = skb->csum_offset;
 		} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
-			gso.flags = VIRTIO_NET_HDR_F_DATA_VALID;
+			gso->flags = VIRTIO_NET_HDR_F_DATA_VALID;
 		} /* else everything is zero */
 
-		if (unlikely(memcpy_toiovecend(iv, (void *)&gso, total,
-					       sizeof(gso))))
+		hdr.rxhash = skb_get_rxhash(skb);
+		if (unlikely(memcpy_toiovecend(iv, (void *)&hdr, total,
+					       tfile->vnet_hdr_sz)))
 			return -EFAULT;
 		total += tfile->vnet_hdr_sz;
 	}
@@ -1358,7 +1364,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		 * This is needed because we never checked for invalid flags on
 		 * TUNSETIFF. */
 		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-				IFF_VNET_HDR | IFF_MULTI_QUEUE,
+				IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH,
 				(unsigned int __user*)argp);
 	}
 
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 157ee63..0d871f8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -107,12 +107,16 @@ struct virtnet_info {
 
 	/* Host will merge rx buffers for big packets (shake it! shake it!) */
 	bool mergeable_rx_bufs;
+
+	/* Host will pass rxhash to us. */
+	bool has_rxhash;
 };
 
 struct skb_vnet_hdr {
 	union {
 		struct virtio_net_hdr hdr;
 		struct virtio_net_hdr_mrg_rxbuf mhdr;
+		struct virtio_net_hdr_rxhash rhdr;
 	};
 	unsigned int num_sg;
 };
@@ -205,7 +209,10 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
 	hdr = skb_vnet_hdr(skb);
 
 	if (vi->mergeable_rx_bufs) {
-		hdr_len = sizeof hdr->mhdr;
+		if (vi->has_rxhash)
+			hdr_len = sizeof hdr->rhdr;
+		else
+			hdr_len = sizeof hdr->mhdr;
 		offset = hdr_len;
 	} else {
 		hdr_len = sizeof hdr->hdr;
@@ -376,6 +383,9 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
+	if (vi->has_rxhash)
+		skb->rxhash = hdr->rhdr.rxhash;
+
 	netif_receive_skb(skb);
 	return;
 
@@ -645,9 +655,12 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
 		hdr->mhdr.num_buffers = 0;
 
 	/* Encode metadata header at front. */
-	if (vi->mergeable_rx_bufs)
-		sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
-	else
+	if (vi->mergeable_rx_bufs) {
+		if (vi->has_rxhash)
+			sg_set_buf(sg, &hdr->rhdr, sizeof hdr->rhdr);
+		else
+			sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
+	} else
 		sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
 
 	hdr->num_sg = skb_to_sgvec(skb, sg + 1, 0, skb->len) + 1;
@@ -1338,8 +1351,12 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 		vi->mergeable_rx_bufs = true;
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_RXHASH))
+		vi->has_rxhash = true;
+
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
 	err = virtnet_setup_vqs(vi);
+
 	if (err)
 		goto free_netdev;
 
@@ -1436,6 +1453,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, VIRTIO_NET_F_MULTIQUEUE,
+	VIRTIO_NET_F_GUEST_RXHASH,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 882a51f..b2d6548 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -768,9 +768,13 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
 	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
 
-	hdr_len = (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
-			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
-			sizeof(struct virtio_net_hdr);
+	if (features & (1 << VIRTIO_NET_F_MRG_RXBUF))
+		hdr_len = (features & (1 << VIRTIO_NET_F_GUEST_RXHASH)) ?
+				sizeof(struct virtio_net_hdr_rxhash) :
+				sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_len = sizeof(struct virtio_net_hdr);
+
 	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a801e28..4ad2d5f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -115,7 +115,7 @@ struct vhost_virtqueue {
 	/* hdr is used to store the virtio header.
 	 * Since each iovec has >= 1 byte length, we never need more than
	 * header length entries to store the header. */
-	struct iovec hdr[sizeof(struct virtio_net_hdr_mrg_rxbuf)];
+	struct iovec hdr[sizeof(struct virtio_net_hdr_rxhash)];
 	struct iovec *indirect;
 	size_t vhost_hlen;
 	size_t sock_hlen;
@@ -203,7 +203,8 @@ enum {
 			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
 			 (1ULL << VHOST_F_LOG_ALL) |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
+			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+			 (1ULL << VIRTIO_NET_F_GUEST_RXHASH),
 };
 
 static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index d3f24d8..a1f6f3f 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -66,6 +66,7 @@
 #define IFF_VNET_HDR	0x4000
 #define IFF_TUN_EXCL	0x8000
 #define IFF_MULTI_QUEUE 0x0100
+#define IFF_RXHASH	0x0200
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index c92b83f..2291317 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -50,6 +50,7 @@
 #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
 #define VIRTIO_NET_F_MULTIQUEUE	21	/* Device supports multiple TXQ/RXQ */
+#define VIRTIO_NET_F_GUEST_RXHASH 22	/* Guest can receive rxhash */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
 
@@ -63,7 +64,7 @@ struct virtio_net_config {
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
- * specify GSO or CSUM features, you can simply ignore the header. */
+ * specify GSO, CSUM or HASH features, you can simply ignore the header. */
 struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	// Use csum_start, csum_offset
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	// Csum is valid
@@ -87,6 +88,13 @@ struct virtio_net_hdr_mrg_rxbuf {
 	__u16 num_buffers;	/* Number of merged rx buffers */
 };
 
+/* This is the version of the header to use when the GUEST_RXHASH
+ * feature has been negotiated. */
+struct virtio_net_hdr_rxhash {
+	struct virtio_net_hdr_mrg_rxbuf hdr;
+	__u32 rxhash;
+};
+
 /*
  * Control virtqueue data structures
 *
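
For orientation, this is roughly what a userspace consumer of the new
header layout would see. A minimal sketch, assuming tap_fd was opened with
IFF_VNET_HDR | IFF_RXHASH and the header size was set to
sizeof(struct virtio_net_hdr_rxhash) via TUNSETVNETHDRSZ; the function and
buffer names are hypothetical:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <linux/virtio_net.h>

    /* Mirrors the patched kernel layout above. */
    struct vnet_hdr_rxhash {
            struct virtio_net_hdr_mrg_rxbuf hdr;
            uint32_t rxhash;
    };

    static void read_one_packet(int tap_fd)
    {
            char buf[65536];
            struct vnet_hdr_rxhash h;
            ssize_t n = read(tap_fd, buf, sizeof(buf));

            if (n < (ssize_t)sizeof(h))
                    return;
            memcpy(&h, buf, sizeof(h));
            /* The hash can now index a per-flow table, e.g. to pick a vcpu. */
            printf("packet of %zd bytes, rxhash %#x\n",
                   n - (ssize_t)sizeof(h), h.rxhash);
    }
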
Jason Wang
2011-Dec-05 08:58 UTC
[net-next RFC PATCH 2/5] tuntap: simple flow director support
This patch adds a simple flow director to the tun/tap device. It is just
a page that contains the hash-to-queue mapping, which can be changed by
userspace. The backend (tap/macvtap) queries this table to get the
desired queue of a packet when it sends packets to userspace.

The page address is set through a new ioctl, TUNSETFD, and the page is
pinned until device exit or until another page is specified.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/tun.c      |   63 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/if_tun.h |   10 ++++++++
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7d22b4b..2efaf81 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -64,6 +64,7 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
+#include <linux/highmem.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -109,6 +110,7 @@ struct tap_filter {
 };
 
 #define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
+#define TAP_HASH_MASK 0xFF
 
 struct tun_file {
 	struct sock sk;
@@ -128,6 +130,7 @@ struct tun_sock;
 
 struct tun_struct {
 	struct tun_file		*tfiles[MAX_TAP_QUEUES];
+	struct page		*fd_page[1];
 	unsigned int		numqueues;
 	unsigned int		flags;
 	uid_t			owner;
@@ -156,7 +159,7 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile = NULL;
 	int numqueues = tun->numqueues;
-	__u32 rxq;
+	__u32 rxq, rxhash;
 
 	BUG_ON(!rcu_read_lock_held());
 
@@ -168,6 +171,22 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
+	rxhash = skb_get_rxhash(skb);
+	if (rxhash) {
+		if (tun->fd_page[0]) {
+			u16 *table = kmap_atomic(tun->fd_page[0]);
+			rxq = table[rxhash & TAP_HASH_MASK];
+			kunmap_atomic(table);
+			if (rxq < numqueues) {
+				tfile = rcu_dereference(tun->tfiles[rxq]);
+				goto out;
+			}
+		}
+		rxq = ((u64)rxhash * numqueues) >> 32;
+		tfile = rcu_dereference(tun->tfiles[rxq]);
+		goto out;
+	}
+
 	if (likely(skb_rx_queue_recorded(skb))) {
 		rxq = skb_get_rx_queue(skb);
 
@@ -178,14 +197,6 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
-	/* Check if we can use flow to select a queue */
-	rxq = skb_get_rxhash(skb);
-	if (rxq) {
-		u32 idx = ((u64)rxq * numqueues) >> 32;
-		tfile = rcu_dereference(tun->tfiles[idx]);
-		goto out;
-	}
-
 	tfile = rcu_dereference(tun->tfiles[0]);
 out:
 	return tfile;
@@ -1020,6 +1031,14 @@ out:
 	return ret;
 }
 
+static void tun_destructor(struct net_device *dev)
+{
+	struct tun_struct *tun = netdev_priv(dev);
+	if (tun->fd_page[0])
+		put_page(tun->fd_page[0]);
+	free_netdev(dev);
+}
+
 static void tun_setup(struct net_device *dev)
 {
 	struct tun_struct *tun = netdev_priv(dev);
@@ -1028,7 +1047,7 @@ static void tun_setup(struct net_device *dev)
 	tun->group = -1;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = tun_destructor;
 }
 
 /* Trivial set of netlink ops to allow deleting tun or tap
@@ -1230,6 +1249,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun = netdev_priv(dev);
 		tun->dev = dev;
 		tun->flags = flags;
+		tun->fd_page[0] = NULL;
 
 		security_tun_dev_post_create(&tfile->sk);
 
@@ -1353,6 +1373,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	struct net_device *dev = NULL;
 	void __user* argp = (void __user*)arg;
 	struct ifreq ifr;
+	struct tun_fd tfd;
 	int ret;
 
 	if (cmd == TUNSETIFF || cmd == TUNATTACHQUEUE || _IOC_TYPE(cmd) == 0x89)
@@ -1364,7 +1385,8 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		 * This is needed because we never checked for invalid flags on
 		 * TUNSETIFF. */
 		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-				IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH,
+				IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH |
+				IFF_FD,
 				(unsigned int __user*)argp);
 	}
 
@@ -1476,6 +1498,25 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		ret = set_offload(tun, arg);
 		break;
 
+	case TUNSETFD:
+		if (copy_from_user(&tfd, argp, sizeof(tfd)))
+			ret = -EFAULT;
+		else {
+			if (tun->fd_page[0]) {
+				put_page(tun->fd_page[0]);
+				tun->fd_page[0] = NULL;
+			}
+
+			/* put_page() in tun_destructor() */
+			if (get_user_pages_fast(tfd.addr, 1, 0,
+						&tun->fd_page[0]) != 1)
+				ret = -EFAULT;
+			else
+				ret = 0;
+		}
+
+		break;
+
 	case SIOCGIFHWADDR:
 		/* Get hw address */
 		memcpy(ifr.ifr_hwaddr.sa_data, tun->dev->dev_addr, ETH_ALEN);
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index a1f6f3f..726731d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -36,6 +36,8 @@
 #define TUN_VNET_HDR	0x0200
 #define TUN_TAP_MQ	0x0400
 
+struct tun_fd;
+
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int)
 #define TUNSETDEBUG   _IOW('T', 201, int)
@@ -56,6 +58,7 @@
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
 #define TUNATTACHQUEUE  _IOW('T', 217, int)
 #define TUNDETACHQUEUE  _IOW('T', 218, int)
+#define TUNSETFD        _IOW('T', 219, struct tun_fd)
 
 /* TUNSETIFF ifr flags */
@@ -67,6 +70,7 @@
 #define IFF_TUN_EXCL	0x8000
 #define IFF_MULTI_QUEUE 0x0100
 #define IFF_RXHASH	0x0200
+#define IFF_FD		0x0400
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
@@ -97,6 +101,12 @@ struct tun_filter {
 	__u8   addr[0][ETH_ALEN];
 };
 
+/* Programmable flow director */
+struct tun_fd {
+	unsigned long addr;
+	size_t size;
+};
+
 #ifdef __KERNEL__
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
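
Usage from the host side would look roughly like the following. A minimal
sketch, assuming a patched <linux/if_tun.h> that defines TUNSETFD and
struct tun_fd, and an already-attached multiqueue tap fd; tap_fd and
program_flow_director are hypothetical names:

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/if_tun.h>

    #define TAP_HASH_ENTRIES 256    /* TAP_HASH_MASK 0xFF => 256 entries */

    static int program_flow_director(int tap_fd, uint16_t num_queues)
    {
            struct tun_fd tfd;
            uint16_t *table;

            /* One page-aligned buffer; the kernel pins it after TUNSETFD. */
            if (posix_memalign((void **)&table, getpagesize(), getpagesize()))
                    return -1;

            /* An out-of-range queue index means "not programmed", so the
             * backend falls back to plain rxhash-based selection. */
            for (int i = 0; i < TAP_HASH_ENTRIES; i++)
                    table[i] = num_queues;

            /* Example: steer flows whose hash masks to 5 onto queue 2. */
            table[5] = 2;

            tfd.addr = (unsigned long)table;
            tfd.size = getpagesize();
            return ioctl(tap_fd, TUNSETFD, &tfd);
    }

Since get_user_pages_fast() pins the page, any page-aligned buffer works;
the kernel holds the reference until the device is destroyed or the table
is replaced.
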
Jason Wang
2011-Dec-05 08:59 UTC
[net-next RFC PATCH 3/5] macvtap: flow director support

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/macvlan.c      |    4 ++++
 drivers/net/macvtap.c      |   36 ++++++++++++++++++++++++++++++++++--
 include/linux/if_macvlan.h |    1 +
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7413497..b0cb7ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -706,6 +706,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	vlan->port     = port;
 	vlan->receive  = receive;
 	vlan->forward  = forward;
+	vlan->fd_page[0] = NULL;
 
 	vlan->mode     = MACVLAN_MODE_VEPA;
 	if (data && data[IFLA_MACVLAN_MODE])
@@ -749,6 +750,9 @@ void macvlan_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fd_page[0])
+		put_page(vlan->fd_page[0]);
+
 	list_del(&vlan->list);
 	unregister_netdevice_queue(dev, head);
 }
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 504c745..a34eb84 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -14,6 +14,7 @@
 #include <linux/wait.h>
 #include <linux/cdev.h>
 #include <linux/fs.h>
+#include <linux/highmem.h>
 
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
@@ -62,6 +63,8 @@ static DEFINE_IDR(minor_idr);
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
+#define TAP_HASH_MASK 0xFF
+
 static const struct proto_ops macvtap_socket_ops;
 
 /*
@@ -189,6 +192,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	/* Check if we can use flow to select a queue */
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
+		if (vlan->fd_page[0]) {
+			u16 *table = kmap_atomic(vlan->fd_page[0]);
+			rxq = table[rxq & TAP_HASH_MASK];
+			kunmap_atomic(table);
+		}
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
 		if (tap)
 			goto out;
@@ -851,6 +859,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 {
 	struct macvtap_queue *q = file->private_data;
 	struct macvlan_dev *vlan;
+	struct tun_fd tfd;
 	void __user *argp = (void __user *)arg;
 	struct ifreq __user *ifr = argp;
 	unsigned int __user *up = argp;
@@ -891,8 +900,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH,
-			     up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH |
+			     IFF_FD, up))
 			return -EFAULT;
 		return 0;
 
@@ -918,6 +927,29 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		q->vnet_hdr_sz = s;
 		return 0;
 
+	case TUNSETFD:
+		rcu_read_lock_bh();
+		vlan = rcu_dereference(q->vlan);
+		if (!vlan)
+			ret = -ENOLINK;
+		else {
+			if (copy_from_user(&tfd, argp, sizeof(tfd)))
+				ret = -EFAULT;
+			if (vlan->fd_page[0]) {
+				put_page(vlan->fd_page[0]);
+				vlan->fd_page[0] = NULL;
+			}
+
+			/* put_page() in macvlan_dellink() */
+			if (get_user_pages_fast(tfd.addr, 1, 0,
+						&vlan->fd_page[0]) != 1)
+				ret = -EFAULT;
+			else
+				ret = 0;
+		}
+		rcu_read_unlock_bh();
+		return ret;
+
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..69a87a1 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -65,6 +65,7 @@ struct macvlan_dev {
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
 	int			numvtaps;
 	int			minor;
+	struct page		*fd_page[1];
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
Jason Wang
2011-Dec-05 08:59 UTC
[net-next RFC PATCH 4/5] virtio: introduce a method to get the irq of a specific virtqueue
Device-specific irq configuration may be needed in order to do some
optimization, so a new configuration method is needed to get the irq of
a virtqueue.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/lguest/lguest_device.c |    8 ++++++++
 drivers/s390/kvm/kvm_virtio.c  |    6 ++++++
 drivers/virtio/virtio_mmio.c   |    8 ++++++++
 drivers/virtio/virtio_pci.c    |   12 ++++++++++++
 include/linux/virtio_config.h  |    4 ++++
 5 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 595d731..6483bff 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -386,6 +386,13 @@ static const char *lg_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct lguest_vq_info *lvq = vq->priv;
+
+	return lvq->config.irq;
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
 	.get_features = lg_get_features,
@@ -398,6 +405,7 @@ static struct virtio_config_ops lguest_config_ops = {
 	.find_vqs = lg_find_vqs,
 	.del_vqs = lg_del_vqs,
 	.bus_name = lg_bus_name,
+	.get_vq_irq = lg_get_vq_irq,
 };
 
 /*
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 8af868b..a8d5ca1 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	return 0x2603;
+}
+
 /*
  * The config ops structure as defined by virtio config
 */
@@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
 	.find_vqs = kvm_find_vqs,
 	.del_vqs = kvm_del_vqs,
 	.bus_name = kvm_bus_name,
+	.get_vq_irq = kvm_get_vq_irq,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 2f57380..309d471 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -368,6 +368,13 @@ static const char *vm_bus_name(struct virtio_device *vdev)
 	return vm_dev->pdev->name;
 }
 
+static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	return platform_get_irq(vm_dev->pdev, 0);
+}
+
 static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get		= vm_get,
 	.set		= vm_set,
@@ -379,6 +386,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get_features	= vm_get_features,
 	.finalize_features = vm_finalize_features,
 	.bus_name	= vm_bus_name,
+	.get_vq_irq	= vm_get_vq_irq,
 };
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 229ea56..4f99164 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -583,6 +583,17 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->intx_enabled)
+		return vp_dev->pci_dev->irq;
+	else
+		return vp_dev->msix_entries[info->msix_vector].vector;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -594,6 +605,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
+	.get_vq_irq	= vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 63f98d0..7b783a6 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -104,6 +104,9 @@
  *	vdev: the virtio_device
  *	This returns a pointer to the bus name a la pci_name from which
  *	the caller can then copy.
+ * @get_vq_irq: get the irq number of the specific virtqueue.
+ *	vdev: the virtio_device
+ *	vq: the virtqueue
 */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -122,6 +125,7 @@ struct virtio_config_ops {
 	u32 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
+	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
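
A driver would consume the new op roughly as follows (a sketch, not from
the series; the function name is hypothetical, but patch 5 does
effectively this in virtnet_init_rq_affinity()):

    #include <linux/interrupt.h>
    #include <linux/cpumask.h>
    #include <linux/virtio.h>
    #include <linux/virtio_config.h>

    /* Pin the interrupt of one virtqueue to a given cpu.  vdev and vq
     * come from the driver's probe path. */
    static void example_pin_vq_irq(struct virtio_device *vdev,
                                   struct virtqueue *vq, int cpu)
    {
            int irq = vdev->config->get_vq_irq(vdev, vq);

            irq_set_affinity_hint(irq, cpumask_of(cpu));
    }
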
Jason Wang
2011-Dec-05 08:59 UTC
[net-next RFC PATCH 5/5] virtio-net: flow director support
In order to let the packets of a flow be passed to the desired guest
cpu, we can cooperate with the device by programming the flow director,
which is just a hash-to-queue table. This kind of cooperation is done
through the accelerated RFS support: a device-specific flow steering
method, virtnet_fd(), is used to modify the flow director based on the
RFS mapping. The desired queue is calculated through a reverse mapping
of the irq affinity table.

In order to parallelize the ingress path, the irq affinity of each rx
queue is also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to
balance the TX load and reduce lock contention on the egress path, so
smp_processor_id() is used for tx queue selection.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c   |  165 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_net.h |    6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/cpu_rmap.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF
 
 struct virtnet_send_stats {
 	struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {
 
 	/* Active rx statistics */
 	struct virtnet_recv_stats __percpu *stats;
+
+	/* FIXME: per vector instead of per queue ?? */
+	cpumask_var_t affinity_mask;
 };
 
 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {
 
 	/* Host will pass rxhash to us. */
 	bool has_rxhash;
+
+	/* A page of flow director */
+	struct page *fd_page;
+
+	cpumask_var_t affinity_mask;
 };
 
 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
 
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);
 	netif_receive_skb(skb);
 	return;
 
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of accelerated RFS, which lets the scheduler
+	 * balance the load, it makes sense to choose the tx queue also
+	 * based on the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop            = virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+	.ndo_select_queue     = virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer    = virtnet_fd,
+#endif
 };
 
 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)
 	return ret;
 }
 
+static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
+{
+#ifdef CONFIG_RFS_ACCEL
+	struct virtio_device *vdev = vi->vdev;
+	int i, rc;
+
+	vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->num_queue_pairs);
+	if (!vi->dev->rx_cpu_rmap)
+		return -ENOMEM;
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		rc = irq_cpu_rmap_add(vi->dev->rx_cpu_rmap,
+				      vdev->config->get_vq_irq(vdev,
+							       vi->rq[i]->vq));
+		if (rc) {
+			free_irq_cpu_rmap(vi->dev->rx_cpu_rmap);
+			vi->dev->rx_cpu_rmap = NULL;
+			return rc;
+		}
+	}
+#endif
+	return 0;
+}
+
+static int virtnet_init_rq_affinity(struct virtnet_info *vi)
+{
+	struct virtio_device *vdev = vi->vdev;
+	int i;
+
+	/* FIXME: TX/RX share a vector */
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		if (!alloc_cpumask_var(&vi->rq[i]->affinity_mask, GFP_KERNEL))
+			goto err_out;
+		cpumask_set_cpu(i, vi->rq[i]->affinity_mask);
+		irq_set_affinity_hint(vdev->config->get_vq_irq(vdev,
+							       vi->rq[i]->vq),
+				      vi->rq[i]->affinity_mask);
+	}
+
+	return 0;
+err_out:
+	while (i) {
+		i--;
+		irq_set_affinity_hint(vdev->config->get_vq_irq(vdev,
+							       vi->rq[i]->vq),
+				      NULL);
+		free_cpumask_var(vi->rq[i]->affinity_mask);
+	}
+	return -ENOMEM;
+}
+
+static void virtnet_free_rq_affinity(struct virtnet_info *vi)
+{
+	struct virtio_device *vdev = vi->vdev;
+	int i;
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		irq_set_affinity_hint(vdev->config->get_vq_irq(vdev,
+							       vi->rq[i]->vq),
+				      NULL);
+		free_cpumask_var(vi->rq[i]->affinity_mask);
+	}
+}
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int i, err;
 	struct net_device *dev;
 	struct virtnet_info *vi;
 	u16 num_queues, num_queue_pairs;
+	struct page *page = NULL;
+	u16 *table = NULL;
 
 	/* Find if host supports multiqueue virtio_net device */
 	err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
@@ -1298,7 +1426,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	/* Set up network device as normal. */
 	dev->priv_flags |= IFF_UNICAST_FLT;
 	dev->netdev_ops = &virtnet_netdev;
-	dev->features = NETIF_F_HIGHDMA;
+	dev->features = NETIF_F_HIGHDMA | NETIF_F_NTUPLE;
 
 	SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops);
 	SET_NETDEV_DEV(dev, &vdev->dev);
@@ -1342,6 +1470,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	vdev->priv = vi;
 
 	vi->num_queue_pairs = num_queue_pairs;
+
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -1382,6 +1511,31 @@ static int virtnet_probe(struct virtio_device *vdev)
 		}
 	}
 
+	/* Config flow director */
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		page = alloc_page(GFP_KERNEL);
+		if (!page)
+			return -ENOMEM;
+		table = (u16 *)kmap_atomic(page);
+		for (i = 0; i < (PAGE_SIZE / 16); i++) {
+			/* invalidate all entries */
+			table[i] = num_queue_pairs;
+		}
+
+		vi->fd_page = page;
+		kunmap_atomic(table);
+		virtnet_set_fd(dev, page_to_pfn(page));
+
+		err = virtnet_init_rx_cpu_rmap(vi);
+		if (err)
+			goto free_recv_bufs;
+
+		err = virtnet_init_rq_affinity(vi);
+		if (err)
+			goto free_recv_bufs;
+
+	}
+
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
@@ -1437,6 +1591,13 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
 	/* Free memory for send and receive queues */
 	free_rq_sq(vi);
 
+	/* Free the page of flow director */
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		if (vi->fd_page)
+			put_page(vi->fd_page);
+
+		virtnet_free_rq_affinity(vi);
+	}
 	free_netdev(vi->dev);
 }
 
@@ -1453,7 +1614,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, VIRTIO_NET_F_MULTIQUEUE,
-	VIRTIO_NET_F_GUEST_RXHASH,
+	VIRTIO_NET_F_GUEST_RXHASH, VIRTIO_NET_F_HOST_FD,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 2291317..abcea52 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -51,6 +51,7 @@
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
 #define VIRTIO_NET_F_MULTIQUEUE	21	/* Device supports multiple TXQ/RXQ */
 #define VIRTIO_NET_F_GUEST_RXHASH 22	/* Guest can receive rxhash */
+#define VIRTIO_NET_F_HOST_FD	23	/* Host has a flow director */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
 
@@ -63,6 +64,11 @@ struct virtio_net_config {
 	__u16 num_queues;
 } __attribute__((packed));
 
+struct virtio_net_config_fd {
+	struct virtio_net_config cfg;
+	u32 addr;
+} __packed;
+
 /* This is the first element of the scatter-gather list.  If you don't
 * specify GSO, CSUM or HASH features, you can simply ignore the header. */
 struct virtio_net_hdr {
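
Note that accelerated RFS only engages once ordinary RFS is configured in
the guest. A minimal sketch of that configuration, using the standard RFS
knobs under /proc and /sys (the device name eth0 and the table sizes are
assumptions):

    #include <stdio.h>

    static int write_val(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");

            if (!f)
                    return -1;
            fprintf(f, "%s\n", val);
            return fclose(f);
    }

    int main(void)
    {
            /* Global socket-flow table shared by all devices. */
            write_val("/proc/sys/net/core/rps_sock_flow_entries", "32768");
            /* Per-rx-queue flow count; repeat for each rx-<n> of eth0. */
            write_val("/sys/class/net/eth0/queues/rx-0/rps_flow_cnt", "2048");
            return 0;
    }
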
Stefan Hajnoczi
2011-Dec-05 10:38 UTC
[net-next RFC PATCH 2/5] tuntap: simple flow director support
On Mon, Dec 5, 2011 at 8:58 AM, Jason Wang <jasowang at redhat.com> wrote:
> This patch adds a simple flow director to the tun/tap device. It is just
> a page that contains the hash-to-queue mapping, which can be changed by
> userspace. The backend (tap/macvtap) queries this table to get the
> desired queue of a packet when it sends packets to userspace.
>
> The page address is set through a new ioctl, TUNSETFD, and the page is
> pinned until device exit or until another page is specified.

Please use "flow" or "fdir" instead of "fd" in the ioctl and code.
"fd" reminds of file descriptor.  The ixgbe driver uses "fdir".

Stefan
Stefan Hajnoczi
2011-Dec-05 10:55 UTC
[net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang <jasowang at redhat.com> wrote:
> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtio_device *vdev = vi->vdev;
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
> +		vdev->config->set(vdev,
> +				  offsetof(struct virtio_net_config_fd, addr),
> +				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory.
Mapping a page breaks the virtio abstraction.

Stefan
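
For comparison, a ctrl-vq based interface (the alternative the cover
letter mentions) could follow the style of the existing VIRTIO_NET_CTRL_*
commands. Purely a sketch; the class value, command value, and struct are
invented here:

    #include <linux/types.h>

    #define VIRTIO_NET_CTRL_FLOW		5	/* hypothetical class */
    #define VIRTIO_NET_CTRL_FLOW_SET	0	/* hypothetical command */

    struct virtio_net_ctrl_flow {
            __u32 rxhash;	/* hash (already masked) to steer */
            __u16 queue;	/* destination rx queue */
    };

The trade-off noted in the cover letter is latency: a ctrl-vq command per
flow update is slower than a guest-side table write that the host reads on
demand.
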
On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang <jasowang at redhat.com> wrote:
> multiple queue virtio-net: flow steering through host/guest cooperation
>
> Hello all:
>
> This is a rough series that adds guest/host cooperation for flow
> steering support, based on Krishna Kumar's multiple queue virtio-net
> driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.