David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 0/3] xen: evtchn and gntdev device fixes and perf improvements
This series includes a small collection of fixes and performance
improvements for the evtchn and gntdev devices.

Patch 1 fixes a bug in the evtchn device that may cause deadlocks when
unbinding events or closing the device. You may wish to consider it
for stable.

Patch 2 is a (very) minor performance improvement to
m2p_remove_override() when used by the gntdev device. The TLB flush
that it removes did not have any significant performance cost (since
it was a local CPU, single address flush only).

Patch 3 improves the scalability of the evtchn device when it is used
by multiple processes (e.g., multiple qemus). As you can see from the
graph[1] it is a significant improvement but still less than ideal. I
suspect that this may be due to the per-domain event lock inside Xen
rather than anything kernel-side.

The graphed data was collected from dom0 with 8 VCPUs on a host with 8
CPUs.

David

[1] http://xenbits.xen.org/people/dvrabel/evtchn-device-scalability.png
David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
From: David Vrabel <david.vrabel@citrix.com>

Unbinding an event channel (either with the ioctl or when the evtchn
device is closed) may deadlock because disable_irq() is called with
port_user_lock held, and this lock is also taken by the interrupt
handler.

Using get_port_user() to check a port's user is safe without the spin
lock (it is protected by u->bind_mutex in the ioctl), so just remove
the unnecessary locking.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/evtchn.c |   21 ++-------------------
 1 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 8feecf0..b6165e0 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -379,18 +379,12 @@ static long evtchn_ioctl(struct file *file,
 		if (unbind.port >= NR_EVENT_CHANNELS)
 			break;
 
-		spin_lock_irq(&port_user_lock);
-
 		rc = -ENOTCONN;
-		if (get_port_user(unbind.port) != u) {
-			spin_unlock_irq(&port_user_lock);
+		if (get_port_user(unbind.port) != u)
 			break;
-		}
 
 		disable_irq(irq_from_evtchn(unbind.port));
 
-		spin_unlock_irq(&port_user_lock);
-
 		evtchn_unbind_from_user(u, unbind.port);
 
 		rc = 0;
@@ -490,26 +484,15 @@ static int evtchn_release(struct inode *inode, struct file *filp)
 	int i;
 	struct per_user_data *u = filp->private_data;
 
-	spin_lock_irq(&port_user_lock);
-
-	free_page((unsigned long)u->ring);
-
 	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
 		if (get_port_user(i) != u)
 			continue;
 
 		disable_irq(irq_from_evtchn(i));
-	}
-
-	spin_unlock_irq(&port_user_lock);
-
-	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
-		if (get_port_user(i) != u)
-			continue;
-
 		evtchn_unbind_from_user(get_port_user(i), i);
 	}
 
+	free_page((unsigned long)u->ring);
 	kfree(u->name);
 	kfree(u);
-- 
1.7.2.5
David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 2/3] xen/p2m: avoid unnecessary TLB flush in m2p_remove_override()
From: David Vrabel <david.vrabel@citrix.com>

In m2p_remove_override(), when removing the grant map from the kernel
mapping and replacing it with a mapping to the original page, the
grant unmap will already have flushed the TLB, so it is not necessary
to flush again after updating the mapping.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/p2m.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..74672ee 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -1003,7 +1003,6 @@ int m2p_remove_override(struct page *page,
 
 		set_pte_at(&init_mm, address, ptep,
 			   pfn_pte(pfn, PAGE_KERNEL));
-		__flush_tlb_single(address);
 		kmap_op->host_addr = 0;
 	}
 }
-- 
1.7.2.5
David Vrabel
2013-Jul-19 14:52 UTC
[PATCH 3/3] xen/evtchn: improve scalability by using per-user locks
From: David Vrabel <david.vrabel@citrix.com>

The global array of port users and the port_user_lock limit the
scalability of the evtchn device.

Instead of the global array lookup, use a per-user (per-fd) tree of
event channels bound by that user and protect the tree with a
per-user lock.

This is also a prerequisite for extending the number of supported
event channels, by removing the fixed size, per-event channel array.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/evtchn.c |  192 +++++++++++++++++++++++++++++---------------------
 1 files changed, 112 insertions(+), 80 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index b6165e0..f328f12 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -57,6 +57,7 @@ struct per_user_data {
 	struct mutex bind_mutex; /* serialize bind/unbind operations */
+	struct rb_root evtchns;
 
 	/* Notification ring, accessed via /dev/xen/evtchn. */
 #define EVTCHN_RING_SIZE     (PAGE_SIZE / sizeof(evtchn_port_t))
@@ -64,6 +65,7 @@ struct per_user_data {
 	evtchn_port_t *ring;
 	unsigned int ring_cons, ring_prod, ring_overflow;
 	struct mutex ring_cons_mutex; /* protect against concurrent readers */
+	spinlock_t ring_prod_lock; /* protect against concurrent interrupts */
 
 	/* Processes wait on this queue when ring is empty. */
 	wait_queue_head_t evtchn_wait;
@@ -71,54 +73,79 @@ struct per_user_data {
 	const char *name;
 };
 
-/*
- * Who's bound to each port?  This is logically an array of struct
- * per_user_data *, but we encode the current enabled-state in bit 0.
- */
-static unsigned long *port_user;
-static DEFINE_SPINLOCK(port_user_lock); /* protects port_user[] and ring_prod */
+struct user_evtchn {
+	struct rb_node node;
+	struct per_user_data *user;
+	unsigned port;
+	bool enabled;
+};
 
-static inline struct per_user_data *get_port_user(unsigned port)
+static int add_evtchn(struct per_user_data *u, struct user_evtchn *evtchn)
 {
-	return (struct per_user_data *)(port_user[port] & ~1);
-}
+	struct rb_node **new = &(u->evtchns.rb_node), *parent = NULL;
 
-static inline void set_port_user(unsigned port, struct per_user_data *u)
-{
-	port_user[port] = (unsigned long)u;
+	while (*new) {
+		struct user_evtchn *this;
+
+		this = container_of(*new, struct user_evtchn, node);
+
+		parent = *new;
+		if (this->port < evtchn->port)
+			new = &((*new)->rb_left);
+		else if (this->port > evtchn->port)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&evtchn->node, parent, new);
+	rb_insert_color(&evtchn->node, &u->evtchns);
+
+	return 0;
 }
 
-static inline bool get_port_enabled(unsigned port)
+static void del_evtchn(struct per_user_data *u, struct user_evtchn *evtchn)
 {
-	return port_user[port] & 1;
+	rb_erase(&evtchn->node, &u->evtchns);
+	kfree(evtchn);
 }
 
-static inline void set_port_enabled(unsigned port, bool enabled)
+static struct user_evtchn *find_evtchn(struct per_user_data *u, unsigned port)
 {
-	if (enabled)
-		port_user[port] |= 1;
-	else
-		port_user[port] &= ~1;
+	struct rb_node *node = u->evtchns.rb_node;
+
+	while (node) {
+		struct user_evtchn *evtchn;
+
+		evtchn = container_of(node, struct user_evtchn, node);
+
+		if (evtchn->port < port)
+			node = node->rb_left;
+		else if (evtchn->port > port)
+			node = node->rb_right;
+		else
+			return evtchn;
+	}
+	return NULL;
 }
 
 static irqreturn_t evtchn_interrupt(int irq, void *data)
 {
-	unsigned int port = (unsigned long)data;
-	struct per_user_data *u;
-
-	spin_lock(&port_user_lock);
-
-	u = get_port_user(port);
+	struct user_evtchn *evtchn = data;
+	struct per_user_data *u = evtchn->user;
 
-	WARN(!get_port_enabled(port),
+	WARN(!evtchn->enabled,
 	     "Interrupt for port %d, but apparently not enabled; per-user %p\n",
-	     port, u);
+	     evtchn->port, u);
 
 	disable_irq_nosync(irq);
-	set_port_enabled(port, false);
+	evtchn->enabled = false;
+
+	spin_lock(&u->ring_prod_lock);
 
 	if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
-		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
+		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = evtchn->port;
 		wmb(); /* Ensure ring contents visible */
 		if (u->ring_cons == u->ring_prod++) {
 			wake_up_interruptible(&u->evtchn_wait);
@@ -128,7 +155,7 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
 	} else
 		u->ring_overflow = 1;
 
-	spin_unlock(&port_user_lock);
+	spin_unlock(&u->ring_prod_lock);
 
 	return IRQ_HANDLED;
 }
@@ -229,20 +256,20 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
 	if (copy_from_user(kbuf, buf, count) != 0)
 		goto out;
 
-	spin_lock_irq(&port_user_lock);
+	mutex_lock(&u->bind_mutex);
 
 	for (i = 0; i < (count/sizeof(evtchn_port_t)); i++) {
 		unsigned port = kbuf[i];
+		struct user_evtchn *evtchn;
 
-		if (port < NR_EVENT_CHANNELS &&
-		    get_port_user(port) == u &&
-		    !get_port_enabled(port)) {
-			set_port_enabled(port, true);
+		evtchn = find_evtchn(u, port);
+		if (evtchn && !evtchn->enabled) {
+			evtchn->enabled = true;
 			enable_irq(irq_from_evtchn(port));
 		}
 	}
 
-	spin_unlock_irq(&port_user_lock);
+	mutex_unlock(&u->bind_mutex);
 
 	rc = count;
 
@@ -253,6 +280,8 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
 
 static int evtchn_bind_to_user(struct per_user_data *u, int port)
 {
+	struct user_evtchn *evtchn;
+	struct evtchn_close close;
 	int rc = 0;
 
 	/*
@@ -263,35 +292,47 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port)
 	 * interrupt handler yet, and our caller has already
 	 * serialized bind operations.)
 	 */
-	BUG_ON(get_port_user(port) != NULL);
-	set_port_user(port, u);
-	set_port_enabled(port, true); /* start enabled */
+
+	evtchn = kzalloc(sizeof(*evtchn), GFP_KERNEL);
+	if (!evtchn)
+		return -ENOMEM;
+
+	evtchn->user = u;
+	evtchn->port = port;
+	evtchn->enabled = true; /* start enabled */
+
+	rc = add_evtchn(u, evtchn);
+	if (rc < 0)
+		goto err;
 
 	rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
-				       u->name, (void *)(unsigned long)port);
-	if (rc >= 0)
-		rc = evtchn_make_refcounted(port);
-	else {
-		/* bind failed, should close the port now */
-		struct evtchn_close close;
-		close.port = port;
-		if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
-			BUG();
-		set_port_user(port, NULL);
-	}
+				       u->name, evtchn);
+	if (rc < 0)
+		goto err;
 
+	rc = evtchn_make_refcounted(port);
+	return rc;
+
+err:
+	/* bind failed, should close the port now */
+	close.port = port;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+		BUG();
+	del_evtchn(u, evtchn);
+	kfree(evtchn);
 	return rc;
 }
 
-static void evtchn_unbind_from_user(struct per_user_data *u, int port)
+static void evtchn_unbind_from_user(struct per_user_data *u,
+				    struct user_evtchn *evtchn)
 {
-	int irq = irq_from_evtchn(port);
+	int irq = irq_from_evtchn(evtchn->port);
 
 	BUG_ON(irq < 0);
 
-	unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+	unbind_from_irqhandler(irq, evtchn);
 
-	set_port_user(port, NULL);
+	del_evtchn(u, evtchn);
 }
 
 static long evtchn_ioctl(struct file *file,
@@ -370,6 +411,7 @@ static long evtchn_ioctl(struct file *file,
 
 	case IOCTL_EVTCHN_UNBIND: {
 		struct ioctl_evtchn_unbind unbind;
+		struct user_evtchn *evtchn;
 
 		rc = -EFAULT;
 		if (copy_from_user(&unbind, uarg, sizeof(unbind)))
@@ -380,29 +422,27 @@ static long evtchn_ioctl(struct file *file,
 			break;
 
 		rc = -ENOTCONN;
-		if (get_port_user(unbind.port) != u)
+		evtchn = find_evtchn(u, unbind.port);
+		if (!evtchn)
 			break;
 
 		disable_irq(irq_from_evtchn(unbind.port));
-
-		evtchn_unbind_from_user(u, unbind.port);
-
+		evtchn_unbind_from_user(u, evtchn);
 		rc = 0;
 		break;
 	}
 
 	case IOCTL_EVTCHN_NOTIFY: {
 		struct ioctl_evtchn_notify notify;
+		struct user_evtchn *evtchn;
 
 		rc = -EFAULT;
 		if (copy_from_user(&notify, uarg, sizeof(notify)))
 			break;
 
-		if (notify.port >= NR_EVENT_CHANNELS) {
-			rc = -EINVAL;
-		} else if (get_port_user(notify.port) != u) {
-			rc = -ENOTCONN;
-		} else {
+		rc = -ENOTCONN;
+		evtchn = find_evtchn(u, notify.port);
+		if (evtchn) {
 			notify_remote_via_evtchn(notify.port);
 			rc = 0;
 		}
@@ -412,9 +452,9 @@ static long evtchn_ioctl(struct file *file,
 	case IOCTL_EVTCHN_RESET: {
 		/* Initialise the ring to empty. Clear errors. */
 		mutex_lock(&u->ring_cons_mutex);
-		spin_lock_irq(&port_user_lock);
+		spin_lock_irq(&u->ring_prod_lock);
 		u->ring_cons = u->ring_prod = u->ring_overflow = 0;
-		spin_unlock_irq(&port_user_lock);
+		spin_unlock_irq(&u->ring_prod_lock);
 		mutex_unlock(&u->ring_cons_mutex);
 		rc = 0;
 		break;
@@ -473,6 +513,7 @@ static int evtchn_open(struct inode *inode, struct file *filp)
 
 	mutex_init(&u->bind_mutex);
 	mutex_init(&u->ring_cons_mutex);
+	spin_lock_init(&u->ring_prod_lock);
 
 	filp->private_data = u;
 
@@ -481,15 +522,15 @@ static int evtchn_open(struct inode *inode, struct file *filp)
 
 static int evtchn_release(struct inode *inode, struct file *filp)
 {
-	int i;
 	struct per_user_data *u = filp->private_data;
+	struct rb_node *node;
 
-	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
-		if (get_port_user(i) != u)
-			continue;
+	while ((node = u->evtchns.rb_node)) {
+		struct user_evtchn *evtchn;
 
-		disable_irq(irq_from_evtchn(i));
-		evtchn_unbind_from_user(get_port_user(i), i);
+		evtchn = rb_entry(node, struct user_evtchn, node);
+		disable_irq(irq_from_evtchn(evtchn->port));
+		evtchn_unbind_from_user(u, evtchn);
 	}
 
 	free_page((unsigned long)u->ring);
@@ -523,12 +564,6 @@ static int __init evtchn_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
-	port_user = kcalloc(NR_EVENT_CHANNELS, sizeof(*port_user), GFP_KERNEL);
-	if (port_user == NULL)
-		return -ENOMEM;
-
-	spin_lock_init(&port_user_lock);
-
 	/* Create '/dev/xen/evtchn'. */
 	err = misc_register(&evtchn_miscdev);
 	if (err != 0) {
@@ -543,9 +578,6 @@ static int __init evtchn_init(void)
 
 static void __exit evtchn_cleanup(void)
 {
-	kfree(port_user);
-	port_user = NULL;
-
 	misc_deregister(&evtchn_miscdev);
 }
-- 
1.7.2.5
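[Archive note: the add_evtchn()/find_evtchn() pair above follows the
standard caller-supplied insert/search idiom for <linux/rbtree.h>
(see Documentation/rbtree.txt). Stripped of the driver context, a
self-contained sketch of the same pattern, with illustrative
example_* names and the same comparison direction as the patch:

	#include <linux/rbtree.h>

	struct example_node {
		struct rb_node node;
		unsigned key;
	};

	/* Walk down from the root, remembering the parent and the
	 * link slot to fill, then link and recolour the new node. */
	static int example_insert(struct rb_root *root,
				  struct example_node *new)
	{
		struct rb_node **link = &root->rb_node, *parent = NULL;

		while (*link) {
			struct example_node *this =
				container_of(*link, struct example_node, node);

			parent = *link;
			if (this->key < new->key)
				link = &(*link)->rb_left;
			else if (this->key > new->key)
				link = &(*link)->rb_right;
			else
				return -EEXIST;	/* duplicate key */
		}

		rb_link_node(&new->node, parent, link);
		rb_insert_color(&new->node, root);
		return 0;
	}

	/* O(log n) descent using the same key ordering as the insert. */
	static struct example_node *example_find(struct rb_root *root,
						 unsigned key)
	{
		struct rb_node *node = root->rb_node;

		while (node) {
			struct example_node *this =
				container_of(node, struct example_node, node);

			if (this->key < key)
				node = node->rb_left;
			else if (this->key > key)
				node = node->rb_right;
			else
				return this;
		}
		return NULL;
	}

This replaces the O(1) global array lookup with an O(log n) per-user
lookup, but n is the number of ports bound by one fd, and the lock
protecting the tree is no longer shared between processes.]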
David Vrabel
2013-Jul-19 14:57 UTC
Re: [PATCH 0/3] xen: evtchn and gntdev device fixes and perf improvements
On 19/07/13 15:51, David Vrabel wrote:
> Patch 3 improves the scalability of the evtchn device when it is used
> by multiple processes (e.g., multiple qemus). As you can see from the
> graph[1] it is a significant improvement but still less than ideal. I
> suspect that this may be due to the per-domain event lock inside Xen
> rather than anything kernel-side.
>
> The graphed data was collected from dom0 with 8 VCPUs on a host with 8
> CPUs.

Attached is the (slightly cheesy) test program I used to generate the
results. This also triggered the deadlock fixed by patch 1.

It spawns N threads, each of which opens /dev/xen/evtchn and sets up
an event channel with both ends in the same domain. The event channels
are manually distributed between the VCPUs.

David
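[Archive note: the attachment itself is not preserved here. A minimal
sketch of such a stressor, assuming the standard /dev/xen/evtchn ioctl
interface (ioctl_evtchn_* structs from Xen's xen/sys/evtchn.h; the
kernel's uapi <xen/evtchn.h> defines the same ioctls). The manual
distribution of event channels between VCPUs that the original
program did is omitted:

	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <xen/sys/evtchn.h>	/* header path varies by packaging */

	#define DOMID_SELF	0x7ff0U
	#define ITERATIONS	100000

	static void *worker(void *arg)
	{
		struct ioctl_evtchn_bind_unbound_port unbound = {
			.remote_domain = DOMID_SELF,
		};
		struct ioctl_evtchn_bind_interdomain interdomain;
		struct ioctl_evtchn_notify notify;
		unsigned int pending;
		int a, b, remote_port, local_port, i;

		/* Two fds in the same domain, one per end of the channel. */
		a = open("/dev/xen/evtchn", O_RDWR);
		b = open("/dev/xen/evtchn", O_RDWR);
		if (a < 0 || b < 0) {
			perror("open");
			return NULL;
		}

		/* Allocate an unbound port on fd a, then connect fd b
		 * to it, both ends in this domain. */
		remote_port = ioctl(a, IOCTL_EVTCHN_BIND_UNBOUND_PORT,
				    &unbound);
		interdomain.remote_domain = DOMID_SELF;
		interdomain.remote_port = remote_port;
		local_port = ioctl(b, IOCTL_EVTCHN_BIND_INTERDOMAIN,
				   &interdomain);
		if (remote_port < 0 || local_port < 0) {
			perror("ioctl");
			return NULL;
		}

		notify.port = local_port;
		for (i = 0; i < ITERATIONS; i++) {
			/* Kick the far end, wait for delivery, re-enable. */
			ioctl(b, IOCTL_EVTCHN_NOTIFY, &notify);
			if (read(a, &pending, sizeof(pending)) !=
			    sizeof(pending))
				break;
			write(a, &pending, sizeof(pending));
		}

		close(b);
		close(a);
		return NULL;
	}

	int main(int argc, char **argv)
	{
		int n = argc > 1 ? atoi(argv[1]) : 8;
		pthread_t *threads = calloc(n, sizeof(*threads));
		int i;

		for (i = 0; i < n; i++)
			pthread_create(&threads[i], NULL, worker, NULL);
		for (i = 0; i < n; i++)
			pthread_join(threads[i], NULL);

		free(threads);
		return 0;
	}

With the pre-patch driver, all threads' interrupt handlers and ioctls
contend on the single port_user_lock; with patch 3 each fd contends
only on its own ring_prod_lock and bind_mutex.]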
Stefano Stabellini
2013-Jul-21 14:12 UTC
Re: [PATCH 2/3] xen/p2m: avoid unnecessary TLB flush in m2p_remove_override()
On Fri, 19 Jul 2013, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In m2p_remove_override(), when removing the grant map from the kernel
> mapping and replacing it with a mapping to the original page, the
> grant unmap will already have flushed the TLB, so it is not necessary
> to flush again after updating the mapping.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

> arch/x86/xen/p2m.c |    1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 95fb2aa..74672ee 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -1003,7 +1003,6 @@ int m2p_remove_override(struct page *page,
>
>  		set_pte_at(&init_mm, address, ptep,
>  			   pfn_pte(pfn, PAGE_KERNEL));
> -		__flush_tlb_single(address);
>  		kmap_op->host_addr = 0;
>  	}
>  }
> --
> 1.7.2.5
Konrad Rzeszutek Wilk
2013-Jul-29 14:07 UTC
Re: [PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
On Fri, Jul 19, 2013 at 03:51:58PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Unbinding an event channel (either with the ioctl or when the evtchn
> device is closed) may deadlock because disable_irq() is called with
> port_user_lock held, and this lock is also taken by the interrupt
> handler.

So what you are saying is that if the ioctl IOCTL_EVTCHN_UNBIND is
called (and takes the spinlock) and evtchn_interrupt triggers, it
would deadlock?

But isn't this the IRQ variant of the spinlock, which disables
interrupts? Could you perhaps write this out a bit with CPU1 and CPU2
in separate columns? I think I must be missing something.

Thanks!

> Using get_port_user() to check a port's user is safe without the spin
> lock (it is protected by u->bind_mutex in the ioctl), so just remove
> the unnecessary locking.

What about the interrupt handler? It does not use the mutex, so how
will it protect the result of get_port_user() from being stale?
David Vrabel
2013-Jul-29 15:35 UTC
Re: [PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
On 29/07/13 15:07, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 19, 2013 at 03:51:58PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Unbinding an event channel (either with the ioctl or when the evtchn
>> device is closed) may deadlock because disable_irq() is called with
>> port_user_lock held, and this lock is also taken by the interrupt
>> handler.
>
> So what you are saying is that if the ioctl IOCTL_EVTCHN_UNBIND is
> called (and takes the spinlock) and evtchn_interrupt triggers, it
> would deadlock?
>
> But isn't this the IRQ variant of the spinlock, which disables
> interrupts? Could you perhaps write this out a bit with CPU1 and CPU2
> in separate columns? I think I must be missing something.

Disabling local interrupts doesn't help. Remember that disable_irq()
waits for the interrupt handler on all CPUs to stop running. If the
irq occurs on another VCPU, its handler tries to take port_user_lock
and can't, because the unbind ioctl is holding it.

>> Using get_port_user() to check a port's user is safe without the
>> spin lock (it is protected by u->bind_mutex in the ioctl), so just
>> remove the unnecessary locking.
>
> What about the interrupt handler? It does not use the mutex, so how
> will it protect the result of get_port_user() from being stale?

The unbind disables the irq before making the port user stale, so when
you clear it you are guaranteed that the interrupt handler that might
use that port cannot be running.

David
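[Archive note: spelled out per CPU, as requested above — an
illustrative rendering of the scenario David describes, with the
pre-patch locking:

	CPU 1 (unbind ioctl)                CPU 2 (event delivery)
	--------------------                ----------------------
	spin_lock_irq(&port_user_lock)
	                                    evtchn_interrupt() entered
	                                    spin_lock(&port_user_lock)
	                                      ...spins: CPU 1 holds it...
	disable_irq(irq_from_evtchn(port))
	  waits for in-flight handlers
	  on all CPUs to complete
	                                    handler cannot complete until
	                                    CPU 1 drops port_user_lock;
	                                    CPU 1 cannot drop it until
	                                    disable_irq() returns.
	                                    => deadlock

spin_lock_irq() only disables interrupts on CPU 1, so it does not
prevent the handler from running, and blocking, on CPU 2.]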