Reposting with actual patches included. The following changes since commit 96b2e73c5471542cb9c622c4360716684f8797ed: Revert "net/mlx4_en: Use affinity hint" (2014-06-02 00:18:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next for you to fetch changes up to 2ae76693b8bcabf370b981cd00c36cd41d33fabc: vhost: replace rcu with mutex (2014-06-02 23:47:59 +0300) ---------------------------------------------------------------- Michael S. Tsirkin (2): vhost-net: extend device allocation to vmalloc vhost: replace rcu with mutex drivers/vhost/net.c | 23 ++++++++++++++++++----- drivers/vhost/vhost.c | 10 +++++++++- 2 files changed, 27 insertions(+), 6 deletions(-)
Michael S. Tsirkin
2014-Jun-02 21:30 UTC
[PULL 1/2] vhost-net: extend device allocation to vmalloc
Michael Mueller provided a patch to reduce the size of vhost-net structure as some allocations could fail under memory pressure/fragmentation. We are still left with high order allocations though. This patch is handling the problem at the core level, allowing vhost structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. People are still looking at cleaner ways to handle the problem at the API level, probably passing in multiple iovecs. This hack seems consistent with approaches taken since then by drivers/vhost/scsi.c and net/core/dev.c Based on patch by Romain Francoise. Cc: Michael Mueller <mimu at linux.vnet.ibm.com> Signed-off-by: Romain Francoise <romain at orebokech.com> Acked-by: Michael S. Tsirkin <mst at redhat.com> --- drivers/vhost/net.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index be414d2b..e489161 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -17,6 +17,7 @@ #include <linux/workqueue.h> #include <linux/file.h> #include <linux/slab.h> +#include <linux/vmalloc.h> #include <linux/net.h> #include <linux/if_packet.h> @@ -699,18 +700,30 @@ static void handle_rx_net(struct vhost_work *work) handle_rx(net); } +static void vhost_net_free(void *addr) +{ + if (is_vmalloc_addr(addr)) + vfree(addr); + else + kfree(addr); +} + static int vhost_net_open(struct inode *inode, struct file *f) { - struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL); + struct vhost_net *n; struct vhost_dev *dev; struct vhost_virtqueue **vqs; int i; - if (!n) - return -ENOMEM; + n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); + if (!n) { + n = vmalloc(sizeof *n); + if (!n) + return -ENOMEM; + } vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL); if (!vqs) { - kfree(n); + vhost_net_free(n); return -ENOMEM; } @@ -827,7 +840,7 @@ static int vhost_net_release(struct inode *inode, struct file *f) * since jobs can re-queue themselves. */ vhost_net_flush(n); kfree(n->dev.vqs); - kfree(n); + vhost_net_free(n); return 0; } -- MST
All memory accesses are done under some VQ mutex. So lock/unlock all VQs is a faster equivalent of synchronize_rcu() for memory access changes. Some guests cause a lot of these changes, so it's helpful to make them faster. Reported-by: "Gonglei (Arei)" <arei.gonglei at huawei.com> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- drivers/vhost/vhost.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 78987e4..1c05e60 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) { struct vhost_memory mem, *newmem, *oldmem; unsigned long size = offsetof(struct vhost_memory, regions); + int i; if (copy_from_user(&mem, m, size)) return -EFAULT; @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) oldmem = rcu_dereference_protected(d->memory, lockdep_is_held(&d->mutex)); rcu_assign_pointer(d->memory, newmem); - synchronize_rcu(); + + /* All memory accesses are done under some VQ mutex. + * So below is a faster equivalent of synchronize_rcu() + */ + for (i = 0; i < d->nvqs; ++i) { + mutex_lock(&d->vqs[i]->mutex); + mutex_unlock(&d->vqs[i]->mutex); + } kfree(oldmem); return 0; } -- MST
On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote:> All memory accesses are done under some VQ mutex. > So lock/unlock all VQs is a faster equivalent of synchronize_rcu() > for memory access changes. > Some guests cause a lot of these changes, so it's helpful > to make them faster. > > Reported-by: "Gonglei (Arei)" <arei.gonglei at huawei.com> > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > drivers/vhost/vhost.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 78987e4..1c05e60 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) > { > struct vhost_memory mem, *newmem, *oldmem; > unsigned long size = offsetof(struct vhost_memory, regions); > + int i; > > if (copy_from_user(&mem, m, size)) > return -EFAULT; > @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) > oldmem = rcu_dereference_protected(d->memory, > lockdep_is_held(&d->mutex)); > rcu_assign_pointer(d->memory, newmem); > - synchronize_rcu(); > + > + /* All memory accesses are done under some VQ mutex. > + * So below is a faster equivalent of synchronize_rcu() > + */ > + for (i = 0; i < d->nvqs; ++i) { > + mutex_lock(&d->vqs[i]->mutex); > + mutex_unlock(&d->vqs[i]->mutex); > + } > kfree(oldmem); > return 0; > }This looks dubious What about using kfree_rcu() instead ? translate_desc() still uses rcu_read_lock(), its not clear if the mutex is really held.