Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 0/6] switch to use tx skb array in tun
Hi all: This series tries to switch to use skb array in tun. This is used to eliminate the spinlock contention between producer and consumer. The conversion was straightforward: just introdce a tx skb array and use it instead of sk_receive_queue. A minor issue is to keep the tx_queue_len behaviour, since tun used to use it for the length of sk_receive_queue. This is done through: - add the ability to resize multiple rings at once to avoid handling partial resize failure for mutiple rings. - add the support for zero length ring. - introduce a notifier which was triggered when tx_queue_len was changed for a netdev. - resize all queues during the tx_queue_len changing. Tests shows about 15% improvement on guest rx pps: Before: ~1300000pps After : ~1500000pps Changes from V3: - fix kbuild warnings - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN Changes from V2: - add multiple rings resizing support for ptr_ring/skb_array - add zero length ring support - introdce a NETDEV_CHANGE_TX_QUEUE_LEN - drop new flags Changes from V1: - switch to use skb array instead of a customized circular buffer - add non-blocking support - rename .peek to .peek_len - drop lockless peeking since test show very minor improvement Jason Wang (5): ptr_ring: support zero length ring skb_array: minor tweak skb_array: add wrappers for resizing net: introduce NETDEV_CHANGE_TX_QUEUE_LEN tun: switch to use skb array for tx Michael S. Tsirkin (1): ptr_ring: support resizing multiple queues drivers/net/tun.c | 138 ++++++++++++++++++++++++++++++++++++--- drivers/vhost/net.c | 16 ++++- include/linux/net.h | 1 + include/linux/netdevice.h | 1 + include/linux/ptr_ring.h | 77 ++++++++++++++++++---- include/linux/skb_array.h | 13 +++- net/core/net-sysfs.c | 15 ++++- net/core/rtnetlink.c | 16 +++-- tools/virtio/ringtest/ptr_ring.c | 5 ++ 9 files changed, 255 insertions(+), 27 deletions(-) -- 2.7.4
Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 1/6] ptr_ring: support zero length ring
Sometimes, we need zero length ring. But current code will crash since
we don't do any check before accessing the ring. This patch fixes this.
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 include/linux/ptr_ring.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 562a65e..d78b8b8 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -102,7 +102,7 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
  */
 static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
-	if (r->queue[r->producer])
+	if (unlikely(!r->size) || r->queue[r->producer])
 		return -ENOSPC;
 
 	r->queue[r->producer++] = ptr;
@@ -164,7 +164,9 @@ static inline int ptr_ring_produce_bh(struct ptr_ring *r,
void *ptr)
  */
 static inline void *__ptr_ring_peek(struct ptr_ring *r)
 {
-	return r->queue[r->consumer];
+	if (likely(r->size))
+		return r->queue[r->consumer];
+	return NULL;
 }
 
 /* Note: callers invoking this in a loop must use a compiler barrier,
-- 
2.7.4
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 include/linux/skb_array.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 678bfbf..2dd0d1e 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -151,12 +151,12 @@ static inline int skb_array_init(struct skb_array *a, int
size, gfp_t gfp)
 	return ptr_ring_init(&a->ring, size, gfp);
 }
 
-void __skb_array_destroy_skb(void *ptr)
+static void __skb_array_destroy_skb(void *ptr)
 {
 	kfree_skb(ptr);
 }
 
-int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
+static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
 {
 	return ptr_ring_resize(&a->ring, size, gfp, __skb_array_destroy_skb);
 }
-- 
2.7.4
Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 3/6] ptr_ring: support resizing multiple queues
From: "Michael S. Tsirkin" <mst at redhat.com>
Sometimes, we need support resizing multiple queues at once. This is
because it was not easy to recover to recover from a partial failure
of multiple queues resizing.
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 include/linux/ptr_ring.h         | 71 +++++++++++++++++++++++++++++++++++-----
 tools/virtio/ringtest/ptr_ring.c |  5 +++
 2 files changed, 67 insertions(+), 9 deletions(-)
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index d78b8b8..2052011 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -349,20 +349,14 @@ static inline int ptr_ring_init(struct ptr_ring *r, int
size, gfp_t gfp)
 	return 0;
 }
 
-static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
-				  void (*destroy)(void *))
+static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
+					   int size, gfp_t gfp,
+					   void (*destroy)(void *))
 {
-	unsigned long flags;
 	int producer = 0;
-	void **queue = __ptr_ring_init_queue_alloc(size, gfp);
 	void **old;
 	void *ptr;
 
-	if (!queue)
-		return -ENOMEM;
-
-	spin_lock_irqsave(&(r)->producer_lock, flags);
-
 	while ((ptr = ptr_ring_consume(r)))
 		if (producer < size)
 			queue[producer++] = ptr;
@@ -375,6 +369,23 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int
size, gfp_t gfp,
 	old = r->queue;
 	r->queue = queue;
 
+	return old;
+}
+
+static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
+				  void (*destroy)(void *))
+{
+	unsigned long flags;
+	void **queue = __ptr_ring_init_queue_alloc(size, gfp);
+	void **old;
+
+	if (!queue)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&(r)->producer_lock, flags);
+
+	old = __ptr_ring_swap_queue(r, queue, size, gfp, destroy);
+
 	spin_unlock_irqrestore(&(r)->producer_lock, flags);
 
 	kfree(old);
@@ -382,6 +393,48 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int
size, gfp_t gfp,
 	return 0;
 }
 
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+					   int size,
+					   gfp_t gfp, void (*destroy)(void *))
+{
+	unsigned long flags;
+	void ***queues;
+	int i;
+
+	queues = kmalloc(nrings * sizeof *queues, gfp);
+	if (!queues)
+		goto noqueues;
+
+	for (i = 0; i < nrings; ++i) {
+		queues[i] = __ptr_ring_init_queue_alloc(size, gfp);
+		if (!queues[i])
+			goto nomem;
+	}
+
+	for (i = 0; i < nrings; ++i) {
+		spin_lock_irqsave(&(rings[i])->producer_lock, flags);
+		queues[i] = __ptr_ring_swap_queue(rings[i], queues[i],
+						  size, gfp, destroy);
+		spin_unlock_irqrestore(&(rings[i])->producer_lock, flags);
+	}
+
+	for (i = 0; i < nrings; ++i)
+		kfree(queues[i]);
+
+	kfree(queues);
+
+	return 0;
+
+nomem:
+	while (--i >= 0)
+		kfree(queues[i]);
+
+	kfree(queues);
+
+noqueues:
+	return -ENOMEM;
+}
+
 static inline void ptr_ring_cleanup(struct ptr_ring *r, void (*destroy)(void
*))
 {
 	void *ptr;
diff --git a/tools/virtio/ringtest/ptr_ring.c b/tools/virtio/ringtest/ptr_ring.c
index 74abd74..68e4f9f 100644
--- a/tools/virtio/ringtest/ptr_ring.c
+++ b/tools/virtio/ringtest/ptr_ring.c
@@ -17,6 +17,11 @@
 typedef pthread_spinlock_t  spinlock_t;
 
 typedef int gfp_t;
+static void *kmalloc(unsigned size, gfp_t gfp)
+{
+	return memalign(64, size);
+}
+
 static void *kzalloc(unsigned size, gfp_t gfp)
 {
 	void *p = memalign(64, size);
-- 
2.7.4
Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 4/6] skb_array: add wrappers for resizing
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 include/linux/skb_array.h | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 2dd0d1e..f4dfade 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -161,6 +161,15 @@ static inline int skb_array_resize(struct skb_array *a, int
size, gfp_t gfp)
 	return ptr_ring_resize(&a->ring, size, gfp, __skb_array_destroy_skb);
 }
 
+static inline int skb_array_resize_multiple(struct skb_array **rings,
+					    int nrings, int size, gfp_t gfp)
+{
+	BUILD_BUG_ON(offsetof(struct skb_array, ring));
+	return ptr_ring_resize_multiple((struct ptr_ring **)rings,
+					nrings, size, gfp,
+					__skb_array_destroy_skb);
+}
+
 static inline void skb_array_cleanup(struct skb_array *a)
 {
 	ptr_ring_cleanup(&a->ring, __skb_array_destroy_skb);
-- 
2.7.4
Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this
will be triggered when tx_queue_len. It could be used by net device
who want to do some processing at that time. An example is tun who may
want to resize tx array when tx_queue_len is changed.
Cc: John Fastabend <john.r.fastabend at intel.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 include/linux/netdevice.h |  1 +
 net/core/net-sysfs.c      | 15 ++++++++++++++-
 net/core/rtnetlink.c      | 16 ++++++++++++----
 3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84d9d2..7dc2ec7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2237,6 +2237,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_PRECHANGEUPPER	0x001A
 #define NETDEV_CHANGELOWERSTATE	0x001B
 #define NETDEV_UDP_TUNNEL_PUSH_INFO	0x001C
+#define NETDEV_CHANGE_TX_QUEUE_LEN	0x001E
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..6e4f347 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -322,7 +322,20 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
 
 static int change_tx_queue_len(struct net_device *dev, unsigned long new_len)
 {
-	dev->tx_queue_len = new_len;
+	int res, orig_len = dev->tx_queue_len;
+
+	if (new_len != orig_len) {
+		dev->tx_queue_len = new_len;
+		res = call_netdevice_notifiers(NETDEV_CHANGE_TX_QUEUE_LEN, dev);
+		res = notifier_to_errno(res);
+		if (res) {
+			netdev_err(dev,
+				   "refused to change device tx_queue_len\n");
+			dev->tx_queue_len = orig_len;
+			return -EFAULT;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eb49ca2..b16b779 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1927,11 +1927,19 @@ static int do_setlink(const struct sk_buff *skb,
 
 	if (tb[IFLA_TXQLEN]) {
 		unsigned long value = nla_get_u32(tb[IFLA_TXQLEN]);
-
-		if (dev->tx_queue_len ^ value)
+		unsigned long orig_len = dev->tx_queue_len;
+
+		if (dev->tx_queue_len ^ value) {
+			dev->tx_queue_len = value;
+			err = call_netdevice_notifiers(
+			      NETDEV_CHANGE_TX_QUEUE_LEN, dev);
+			err = notifier_to_errno(err);
+			if (err) {
+				dev->tx_queue_len = orig_len;
+				goto errout;
+			}
 			status |= DO_SETLINK_NOTIFY;
-
-		dev->tx_queue_len = value;
+		}
 	}
 
 	if (tb[IFLA_OPERSTATE])
-- 
2.7.4
Jason Wang
2016-Jun-30  06:45 UTC
[PATCH net-next V4 6/6] tun: switch to use skb array for tx
We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.
This patch tries to address this by:
- switch from sk_receive_queue to a skb_array, and resize it when
  tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
  skb length.
- implement a tun version of peek_len for vhost_net to use and convert
  vhost_net to use peek_len if possible.
Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:
Before: ~1300000pps
After : ~1500000pps
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/tun.c   | 138 +++++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/net.c |  16 +++++-
 include/linux/net.h |   1 +
 3 files changed, 146 insertions(+), 9 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4884802..7475215 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -71,6 +71,7 @@
 #include <net/sock.h>
 #include <linux/seq_file.h>
 #include <linux/uio.h>
+#include <linux/skb_array.h>
 
 #include <asm/uaccess.h>
 
@@ -167,6 +168,7 @@ struct tun_file {
 	};
 	struct list_head next;
 	struct tun_struct *detached;
+	struct skb_array tx_array;
 };
 
 struct tun_flow_entry {
@@ -515,7 +517,11 @@ static struct tun_struct *tun_enable_queue(struct tun_file
*tfile)
 
 static void tun_queue_purge(struct tun_file *tfile)
 {
-	skb_queue_purge(&tfile->sk.sk_receive_queue);
+	struct sk_buff *skb;
+
+	while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
+		kfree_skb(skb);
+
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
@@ -560,6 +566,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
+		if (tun)
+			skb_array_cleanup(&tfile->tx_array);
 		sock_put(&tfile->sk);
 	}
 }
@@ -613,6 +621,7 @@ static void tun_detach_all(struct net_device *dev)
 static int tun_attach(struct tun_struct *tun, struct file *file, bool
skip_filter)
 {
 	struct tun_file *tfile = file->private_data;
+	struct net_device *dev = tun->dev;
 	int err;
 
 	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
@@ -642,6 +651,13 @@ static int tun_attach(struct tun_struct *tun, struct file
*file, bool skip_filte
 		if (!err)
 			goto out;
 	}
+
+	if (!tfile->detached &&
+	    skb_array_init(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL))
{
+		err = -ENOMEM;
+		goto out;
+	}
+
 	tfile->queue_index = tun->numqueues;
 	tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
 	rcu_assign_pointer(tfile->tun, tun);
@@ -891,8 +907,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct
net_device *dev)
 
 	nf_reset(skb);
 
-	/* Enqueue packet */
-	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
+	if (skb_array_produce(&tfile->tx_array, skb))
+		goto drop;
 
 	/* Notify and wake up reader process */
 	if (tfile->flags & TUN_FASYNC)
@@ -1107,7 +1123,7 @@ static unsigned int tun_chr_poll(struct file *file,
poll_table *wait)
 
 	poll_wait(file, sk_sleep(sk), wait);
 
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_array_empty(&tfile->tx_array))
 		mask |= POLLIN | POLLRDNORM;
 
 	if (sock_writeable(sk) ||
@@ -1426,22 +1442,61 @@ done:
 	return total;
 }
 
+static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
+				     int *err)
+{
+	DECLARE_WAITQUEUE(wait, current);
+	struct sk_buff *skb = NULL;
+
+	skb = skb_array_consume(&tfile->tx_array);
+	if (skb)
+		goto out;
+	if (noblock) {
+		*err = -EAGAIN;
+		goto out;
+	}
+
+	add_wait_queue(&tfile->wq.wait, &wait);
+	current->state = TASK_INTERRUPTIBLE;
+
+	while (1) {
+		skb = skb_array_consume(&tfile->tx_array);
+		if (skb)
+			break;
+		if (signal_pending(current)) {
+			*err = -ERESTARTSYS;
+			break;
+		}
+		if (tfile->socket.sk->sk_shutdown & RCV_SHUTDOWN) {
+			*err = -EFAULT;
+			break;
+		}
+
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	remove_wait_queue(&tfile->wq.wait, &wait);
+
+out:
+	return skb;
+}
+
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 			   struct iov_iter *to,
 			   int noblock)
 {
 	struct sk_buff *skb;
 	ssize_t ret;
-	int peeked, err, off = 0;
+	int err;
 
 	tun_debug(KERN_INFO, tun, "tun_do_read\n");
 
 	if (!iov_iter_count(to))
 		return 0;
 
-	/* Read frames from queue */
-	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-				  &peeked, &off, &err);
+	/* Read frames from ring */
+	skb = tun_ring_recv(tfile, noblock, &err);
 	if (!skb)
 		return err;
 
@@ -1574,8 +1629,25 @@ out:
 	return ret;
 }
 
+static int tun_peek_len(struct socket *sock)
+{
+	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
+	struct tun_struct *tun;
+	int ret = 0;
+
+	tun = __tun_get(tfile);
+	if (!tun)
+		return 0;
+
+	ret = skb_array_peek_len(&tfile->tx_array);
+	tun_put(tun);
+
+	return ret;
+}
+
 /* Ops structure to mimic raw sockets with tun */
 static const struct proto_ops tun_socket_ops = {
+	.peek_len = tun_peek_len,
 	.sendmsg = tun_sendmsg,
 	.recvmsg = tun_recvmsg,
 };
@@ -2397,6 +2469,53 @@ static const struct ethtool_ops tun_ethtool_ops = {
 	.get_ts_info	= ethtool_op_get_ts_info,
 };
 
+static int tun_queue_resize(struct tun_struct *tun)
+{
+	struct net_device *dev = tun->dev;
+	struct tun_file *tfile;
+	struct skb_array **arrays;
+	int n = tun->numqueues + tun->numdisabled;
+	int ret, i;
+
+	arrays = kmalloc(sizeof *arrays * n, GFP_KERNEL);
+	if (!arrays)
+		return -ENOMEM;
+
+	for (i = 0; i < tun->numqueues; i++) {
+		tfile = rtnl_dereference(tun->tfiles[i]);
+		arrays[i] = &tfile->tx_array;
+	}
+	list_for_each_entry(tfile, &tun->disabled, next)
+		arrays[i++] = &tfile->tx_array;
+
+	ret = skb_array_resize_multiple(arrays, n,
+					dev->tx_queue_len, GFP_KERNEL);
+
+	kfree(arrays);
+	return ret;
+}
+
+static int tun_device_event(struct notifier_block *unused,
+			    unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct tun_struct *tun = netdev_priv(dev);
+
+	switch (event) {
+	case NETDEV_CHANGE_TX_QUEUE_LEN:
+		if (tun_queue_resize(tun))
+			return NOTIFY_BAD;
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block tun_notifier_block __read_mostly = {
+	.notifier_call	= tun_device_event,
+};
 
 static int __init tun_init(void)
 {
@@ -2416,6 +2535,8 @@ static int __init tun_init(void)
 		pr_err("Can't register misc device %d\n", TUN_MINOR);
 		goto err_misc;
 	}
+
+	register_netdevice_notifier(&tun_notifier_block);
 	return  0;
 err_misc:
 	rtnl_link_unregister(&tun_link_ops);
@@ -2427,6 +2548,7 @@ static void tun_cleanup(void)
 {
 	misc_deregister(&tun_miscdev);
 	rtnl_link_unregister(&tun_link_ops);
+	unregister_netdevice_notifier(&tun_notifier_block);
 }
 
 /* Get an underlying socket object from tun file.  Returns error unless file is
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1d3e45f..e032ca3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -481,10 +481,14 @@ out:
 
 static int peek_head_len(struct sock *sk)
 {
+	struct socket *sock = sk->sk_socket;
 	struct sk_buff *head;
 	int len = 0;
 	unsigned long flags;
 
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
 	if (likely(head)) {
@@ -497,6 +501,16 @@ static int peek_head_len(struct sock *sk)
 	return len;
 }
 
+static int sk_has_rx_data(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sk->sk_receive_queue);
+}
+
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -513,7 +527,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net,
struct sock *sk)
 		endtime = busy_clock() + vq->busyloop_timeout;
 
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       skb_queue_empty(&sk->sk_receive_queue) &&
+		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
 			cpu_relax_lowlatency();
 
diff --git a/include/linux/net.h b/include/linux/net.h
index 9aa49a0..b6b3843 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -185,6 +185,7 @@ struct proto_ops {
 	ssize_t 	(*splice_read)(struct socket *sock,  loff_t *ppos,
 				       struct pipe_inode_info *pipe, size_t len, unsigned int flags);
 	int		(*set_peek_off)(struct sock *sk, int val);
+	int		(*peek_len)(struct socket *sock);
 };
 
 #define DECLARE_SOCKADDR(type, dst, src)	\
-- 
2.7.4
Michael S. Tsirkin
2016-Jun-30  15:45 UTC
[PATCH net-next V4 0/6] switch to use tx skb array in tun
On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:> Hi all: > > This series tries to switch to use skb array in tun. This is used to > eliminate the spinlock contention between producer and consumer. The > conversion was straightforward: just introdce a tx skb array and use > it instead of sk_receive_queue. > > A minor issue is to keep the tx_queue_len behaviour, since tun used to > use it for the length of sk_receive_queue. This is done through: > > - add the ability to resize multiple rings at once to avoid handling > partial resize failure for mutiple rings. > - add the support for zero length ring. > - introduce a notifier which was triggered when tx_queue_len was > changed for a netdev. > - resize all queues during the tx_queue_len changing. > > Tests shows about 15% improvement on guest rx pps: > > Before: ~1300000pps > After : ~1500000ppsAcked-by: Michael S. Tsirkin <mst at redhat.com> Acked-from-altitude: 34697 feet.> Changes from V3: > - fix kbuild warnings > - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN > > Changes from V2: > - add multiple rings resizing support for ptr_ring/skb_array > - add zero length ring support > - introdce a NETDEV_CHANGE_TX_QUEUE_LEN > - drop new flags > > Changes from V1: > - switch to use skb array instead of a customized circular buffer > - add non-blocking support > - rename .peek to .peek_len > - drop lockless peeking since test show very minor improvement > > Jason Wang (5): > ptr_ring: support zero length ring > skb_array: minor tweak > skb_array: add wrappers for resizing > net: introduce NETDEV_CHANGE_TX_QUEUE_LEN > tun: switch to use skb array for tx > > Michael S. Tsirkin (1): > ptr_ring: support resizing multiple queues > > drivers/net/tun.c | 138 ++++++++++++++++++++++++++++++++++++--- > drivers/vhost/net.c | 16 ++++- > include/linux/net.h | 1 + > include/linux/netdevice.h | 1 + > include/linux/ptr_ring.h | 77 ++++++++++++++++++---- > include/linux/skb_array.h | 13 +++- > net/core/net-sysfs.c | 15 ++++- > net/core/rtnetlink.c | 16 +++-- > tools/virtio/ringtest/ptr_ring.c | 5 ++ > 9 files changed, 255 insertions(+), 27 deletions(-) > > -- > 2.7.4
John Fastabend
2016-Jun-30  16:33 UTC
[PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
On 16-06-29 11:45 PM, Jason Wang wrote:> This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this > will be triggered when tx_queue_len. It could be used by net device > who want to do some processing at that time. An example is tun who may > want to resize tx array when tx_queue_len is changed. > > Cc: John Fastabend <john.r.fastabend at intel.com> > Signed-off-by: Jason Wang <jasowang at redhat.com> > ---Thanks for adding the setlink case. Acked-by: John Fastabend <john.r.fastabend at intel.com>
Jason Wang
2016-Jul-01  06:04 UTC
[PATCH net-next V4 0/6] switch to use tx skb array in tun
On 2016?06?30? 23:45, Michael S. Tsirkin wrote:> On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote: >> >Hi all: >> > >> >This series tries to switch to use skb array in tun. This is used to >> >eliminate the spinlock contention between producer and consumer. The >> >conversion was straightforward: just introdce a tx skb array and use >> >it instead of sk_receive_queue. >> > >> >A minor issue is to keep the tx_queue_len behaviour, since tun used to >> >use it for the length of sk_receive_queue. This is done through: >> > >> >- add the ability to resize multiple rings at once to avoid handling >> > partial resize failure for mutiple rings. >> >- add the support for zero length ring. >> >- introduce a notifier which was triggered when tx_queue_len was >> > changed for a netdev. >> >- resize all queues during the tx_queue_len changing. >> > >> >Tests shows about 15% improvement on guest rx pps: >> > >> >Before: ~1300000pps >> >After : ~1500000pps > Acked-by: Michael S. Tsirkin<mst at redhat.com> > > Acked-from-altitude: 34697 feet.Wow, thanks a lot!
David Miller
2016-Jul-01  09:40 UTC
[PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Jason Wang <jasowang at redhat.com> Date: Thu, 30 Jun 2016 14:45:30 +0800> This series tries to switch to use skb array in tun. This is used to > eliminate the spinlock contention between producer and consumer. The > conversion was straightforward: just introdce a tx skb array and use > it instead of sk_receive_queue.Series applied, thanks Jason.
Maybe Matching Threads
- [PATCH net-next V4 0/6] switch to use tx skb array in tun
- [PATCH net-next V4 0/6] switch to use tx skb array in tun
- [PATCH net-next V3 0/6] switch to use tx skb array in tun
- [PATCH net-next V3 0/6] switch to use tx skb array in tun
- [PATCH net-next V3 6/6] tun: switch to use skb array for tx