David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 0/3] xen: evtchn and gntdev device fixes and perf improvements
This series includes a small collection of fixes and performance
improvements for the evtchn and gntdev devices.

Patch 1 fixes a bug in the evtchn device that may cause deadlocks when
unbinding events or closing the device. You may wish to consider it
for stable.

Patch 2 is a (very) minor performance improvement to
m2p_remove_override() when used by the gntdev device. The TLB flush
that it removes did not have any significant performance cost (since
it was a local CPU, single address flush only).

Patch 3 improves the scalability of the evtchn device when it is used
by multiple processes (e.g., multiple qemus). As you can see from the
graph[1] it is a significant improvement but still less than ideal. I
suspect that this may be due to the per-domain event lock inside Xen
rather than anything kernel-side.

The graphed data was collected from dom0 with 8 VCPUs on a host with 8
CPUs.

David

[1] http://xenbits.xen.org/people/dvrabel/evtchn-device-scalability.png
David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
From: David Vrabel <david.vrabel@citrix.com>

Unbinding an event channel (either with the ioctl or when the evtchn
device is closed) may deadlock because disable_irq() is called with
port_user_lock held, and this lock is also taken by the interrupt
handler.

Using get_port_user() to check a port's user is safe without the spin
lock (it is protected by u->bind_mutex in the ioctl), so just remove
the unnecessary locking.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/evtchn.c |   21 ++-------------------
 1 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 8feecf0..b6165e0 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -379,18 +379,12 @@ static long evtchn_ioctl(struct file *file,
 		if (unbind.port >= NR_EVENT_CHANNELS)
 			break;
 
-		spin_lock_irq(&port_user_lock);
-
 		rc = -ENOTCONN;
-		if (get_port_user(unbind.port) != u) {
-			spin_unlock_irq(&port_user_lock);
+		if (get_port_user(unbind.port) != u)
 			break;
-		}
 
 		disable_irq(irq_from_evtchn(unbind.port));
 
-		spin_unlock_irq(&port_user_lock);
-
 		evtchn_unbind_from_user(u, unbind.port);
 
 		rc = 0;
@@ -490,26 +484,15 @@ static int evtchn_release(struct inode *inode, struct file *filp)
 	int i;
 	struct per_user_data *u = filp->private_data;
 
-	spin_lock_irq(&port_user_lock);
-
-	free_page((unsigned long)u->ring);
-
 	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
 		if (get_port_user(i) != u)
 			continue;
 
 		disable_irq(irq_from_evtchn(i));
-	}
-
-	spin_unlock_irq(&port_user_lock);
-
-	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
-		if (get_port_user(i) != u)
-			continue;
-
 		evtchn_unbind_from_user(get_port_user(i), i);
 	}
 
+	free_page((unsigned long)u->ring);
 	kfree(u->name);
 	kfree(u);
-- 
1.7.2.5
David Vrabel
2013-Jul-19 14:51 UTC
[PATCH 2/3] xen/p2m: avoid unnecessary TLB flush in m2p_remove_override()
From: David Vrabel <david.vrabel@citrix.com>

In m2p_remove_override(), when removing the grant map from the kernel
mapping and replacing it with a mapping to the original page, the
grant unmap will already have flushed the TLB, so it is not necessary
to flush again after updating the mapping.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/p2m.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..74672ee 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -1003,7 +1003,6 @@ int m2p_remove_override(struct page *page,
 
 		set_pte_at(&init_mm, address, ptep,
 			   pfn_pte(pfn, PAGE_KERNEL));
-		__flush_tlb_single(address);
 		kmap_op->host_addr = 0;
 	}
 }
-- 
1.7.2.5
David Vrabel
2013-Jul-19 14:52 UTC
[PATCH 3/3] xen/evtchn: improve scalability by using per-user locks
From: David Vrabel <david.vrabel@citrix.com>

The global array of port users and the port_user_lock limit the
scalability of the evtchn device.

Instead of the global array lookup, use a per-user (per-fd) tree of
event channels bound by that user and protect the tree with a
per-user lock.

This is also a prerequisite for extending the number of supported
event channels, by removing the fixed size, per-event channel array.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/evtchn.c |  192 +++++++++++++++++++++++++++++---------------------
 1 files changed, 112 insertions(+), 80 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index b6165e0..f328f12 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -57,6 +57,7 @@ struct per_user_data {
 	struct mutex bind_mutex; /* serialize bind/unbind operations */
+	struct rb_root evtchns;
 
 	/* Notification ring, accessed via /dev/xen/evtchn. */
 #define EVTCHN_RING_SIZE     (PAGE_SIZE / sizeof(evtchn_port_t))
@@ -64,6 +65,7 @@ struct per_user_data {
 	evtchn_port_t *ring;
 	unsigned int ring_cons, ring_prod, ring_overflow;
 	struct mutex ring_cons_mutex; /* protect against concurrent readers */
+	spinlock_t ring_prod_lock; /* protect against concurrent interrupts */
 
 	/* Processes wait on this queue when ring is empty. */
 	wait_queue_head_t evtchn_wait;
@@ -71,54 +73,79 @@ struct per_user_data {
 	const char *name;
 };
 
-/*
- * Who's bound to each port?  This is logically an array of struct
- * per_user_data *, but we encode the current enabled-state in bit 0.
- */
-static unsigned long *port_user;
-static DEFINE_SPINLOCK(port_user_lock); /* protects port_user[] and ring_prod */
+struct user_evtchn {
+	struct rb_node node;
+	struct per_user_data *user;
+	unsigned port;
+	bool enabled;
+};
 
-static inline struct per_user_data *get_port_user(unsigned port)
+static int add_evtchn(struct per_user_data *u, struct user_evtchn *evtchn)
 {
-	return (struct per_user_data *)(port_user[port] & ~1);
-}
+	struct rb_node **new = &(u->evtchns.rb_node), *parent = NULL;
 
-static inline void set_port_user(unsigned port, struct per_user_data *u)
-{
-	port_user[port] = (unsigned long)u;
+	while (*new) {
+		struct user_evtchn *this;
+
+		this = container_of(*new, struct user_evtchn, node);
+
+		parent = *new;
+		if (this->port < evtchn->port)
+			new = &((*new)->rb_left);
+		else if (this->port > evtchn->port)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&evtchn->node, parent, new);
+	rb_insert_color(&evtchn->node, &u->evtchns);
+
+	return 0;
 }
 
-static inline bool get_port_enabled(unsigned port)
+static void del_evtchn(struct per_user_data *u, struct user_evtchn *evtchn)
 {
-	return port_user[port] & 1;
+	rb_erase(&evtchn->node, &u->evtchns);
+	kfree(evtchn);
 }
 
-static inline void set_port_enabled(unsigned port, bool enabled)
+static struct user_evtchn *find_evtchn(struct per_user_data *u, unsigned port)
 {
-	if (enabled)
-		port_user[port] |= 1;
-	else
-		port_user[port] &= ~1;
+	struct rb_node *node = u->evtchns.rb_node;
+
+	while (node) {
+		struct user_evtchn *evtchn;
+
+		evtchn = container_of(node, struct user_evtchn, node);
+
+		if (evtchn->port < port)
+			node = node->rb_left;
+		else if (evtchn->port > port)
+			node = node->rb_right;
+		else
+			return evtchn;
+	}
+	return NULL;
 }
 
 static irqreturn_t evtchn_interrupt(int irq, void *data)
 {
-	unsigned int port = (unsigned long)data;
-	struct per_user_data *u;
-
-	spin_lock(&port_user_lock);
-
-	u = get_port_user(port);
+	struct user_evtchn *evtchn = data;
+	struct per_user_data *u = evtchn->user;
 
-	WARN(!get_port_enabled(port),
+	WARN(!evtchn->enabled,
 	     "Interrupt for port %d, but apparently not enabled; per-user %p\n",
-	     port, u);
+	     evtchn->port, u);
 
 	disable_irq_nosync(irq);
-	set_port_enabled(port, false);
+	evtchn->enabled = false;
+
+	spin_lock(&u->ring_prod_lock);
 
 	if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
-		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
+		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = evtchn->port;
 		wmb(); /* Ensure ring contents visible */
 		if (u->ring_cons == u->ring_prod++) {
 			wake_up_interruptible(&u->evtchn_wait);
@@ -128,7 +155,7 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
 	} else
 		u->ring_overflow = 1;
 
-	spin_unlock(&port_user_lock);
+	spin_unlock(&u->ring_prod_lock);
 
 	return IRQ_HANDLED;
 }
@@ -229,20 +256,20 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
 	if (copy_from_user(kbuf, buf, count) != 0)
 		goto out;
 
-	spin_lock_irq(&port_user_lock);
+	mutex_lock(&u->bind_mutex);
 
 	for (i = 0; i < (count/sizeof(evtchn_port_t)); i++) {
 		unsigned port = kbuf[i];
+		struct user_evtchn *evtchn;
 
-		if (port < NR_EVENT_CHANNELS &&
-		    get_port_user(port) == u &&
-		    !get_port_enabled(port)) {
-			set_port_enabled(port, true);
+		evtchn = find_evtchn(u, port);
+		if (evtchn && !evtchn->enabled) {
+			evtchn->enabled = true;
 			enable_irq(irq_from_evtchn(port));
 		}
 	}
 
-	spin_unlock_irq(&port_user_lock);
+	mutex_unlock(&u->bind_mutex);
 
 	rc = count;
 
@@ -253,6 +280,8 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
 
 static int evtchn_bind_to_user(struct per_user_data *u, int port)
 {
+	struct user_evtchn *evtchn;
+	struct evtchn_close close;
 	int rc = 0;
 
 	/*
@@ -263,35 +292,47 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port)
 	 * interrupt handler yet, and our caller has already
 	 * serialized bind operations.)
 	 */
-	BUG_ON(get_port_user(port) != NULL);
-	set_port_user(port, u);
-	set_port_enabled(port, true); /* start enabled */
+
+	evtchn = kzalloc(sizeof(*evtchn), GFP_KERNEL);
+	if (!evtchn)
+		return -ENOMEM;
+
+	evtchn->user = u;
+	evtchn->port = port;
+	evtchn->enabled = true; /* start enabled */
+
+	rc = add_evtchn(u, evtchn);
+	if (rc < 0)
+		goto err;
 
 	rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
-				       u->name, (void *)(unsigned long)port);
-	if (rc >= 0)
-		rc = evtchn_make_refcounted(port);
-	else {
-		/* bind failed, should close the port now */
-		struct evtchn_close close;
-		close.port = port;
-		if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
-			BUG();
-		set_port_user(port, NULL);
-	}
+				       u->name, evtchn);
+	if (rc < 0)
+		goto err;
 
+	rc = evtchn_make_refcounted(port);
+	return rc;
+
+err:
+	/* bind failed, should close the port now */
+	close.port = port;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+		BUG();
+	del_evtchn(u, evtchn);
+	kfree(evtchn);
 	return rc;
 }
 
-static void evtchn_unbind_from_user(struct per_user_data *u, int port)
+static void evtchn_unbind_from_user(struct per_user_data *u,
+				    struct user_evtchn *evtchn)
 {
-	int irq = irq_from_evtchn(port);
+	int irq = irq_from_evtchn(evtchn->port);
 
 	BUG_ON(irq < 0);
 
-	unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+	unbind_from_irqhandler(irq, evtchn);
 
-	set_port_user(port, NULL);
+	del_evtchn(u, evtchn);
 }
 
 static long evtchn_ioctl(struct file *file,
@@ -370,6 +411,7 @@ static long evtchn_ioctl(struct file *file,
 
 	case IOCTL_EVTCHN_UNBIND: {
 		struct ioctl_evtchn_unbind unbind;
+		struct user_evtchn *evtchn;
 
 		rc = -EFAULT;
 		if (copy_from_user(&unbind, uarg, sizeof(unbind)))
@@ -380,29 +422,27 @@ static long evtchn_ioctl(struct file *file,
 			break;
 
 		rc = -ENOTCONN;
-		if (get_port_user(unbind.port) != u)
+		evtchn = find_evtchn(u, unbind.port);
+		if (!evtchn)
 			break;
 
 		disable_irq(irq_from_evtchn(unbind.port));
-
-		evtchn_unbind_from_user(u, unbind.port);
-
+		evtchn_unbind_from_user(u, evtchn);
 		rc = 0;
 		break;
 	}
 
 	case IOCTL_EVTCHN_NOTIFY: {
 		struct ioctl_evtchn_notify notify;
+		struct user_evtchn *evtchn;
 
 		rc = -EFAULT;
 		if (copy_from_user(&notify, uarg, sizeof(notify)))
 			break;
 
-		if (notify.port >= NR_EVENT_CHANNELS) {
-			rc = -EINVAL;
-		} else if (get_port_user(notify.port) != u) {
-			rc = -ENOTCONN;
-		} else {
+		rc = -ENOTCONN;
+		evtchn = find_evtchn(u, notify.port);
+		if (evtchn) {
 			notify_remote_via_evtchn(notify.port);
 			rc = 0;
 		}
@@ -412,9 +452,9 @@ static long evtchn_ioctl(struct file *file,
 	case IOCTL_EVTCHN_RESET: {
 		/* Initialise the ring to empty. Clear errors. */
 		mutex_lock(&u->ring_cons_mutex);
-		spin_lock_irq(&port_user_lock);
+		spin_lock_irq(&u->ring_prod_lock);
 		u->ring_cons = u->ring_prod = u->ring_overflow = 0;
-		spin_unlock_irq(&port_user_lock);
+		spin_unlock_irq(&u->ring_prod_lock);
 		mutex_unlock(&u->ring_cons_mutex);
 		rc = 0;
 		break;
@@ -473,6 +513,7 @@ static int evtchn_open(struct inode *inode, struct file *filp)
 
 	mutex_init(&u->bind_mutex);
 	mutex_init(&u->ring_cons_mutex);
+	spin_lock_init(&u->ring_prod_lock);
 
 	filp->private_data = u;
 
@@ -481,15 +522,15 @@ static int evtchn_open(struct inode *inode, struct file *filp)
 
 static int evtchn_release(struct inode *inode, struct file *filp)
 {
-	int i;
 	struct per_user_data *u = filp->private_data;
+	struct rb_node *node;
 
-	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
-		if (get_port_user(i) != u)
-			continue;
+	while ((node = u->evtchns.rb_node)) {
+		struct user_evtchn *evtchn;
 
-		disable_irq(irq_from_evtchn(i));
-		evtchn_unbind_from_user(get_port_user(i), i);
+		evtchn = rb_entry(node, struct user_evtchn, node);
+		disable_irq(irq_from_evtchn(evtchn->port));
+		evtchn_unbind_from_user(u, evtchn);
 	}
 
 	free_page((unsigned long)u->ring);
@@ -523,12 +564,6 @@ static int __init evtchn_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
-	port_user = kcalloc(NR_EVENT_CHANNELS, sizeof(*port_user), GFP_KERNEL);
-	if (port_user == NULL)
-		return -ENOMEM;
-
-	spin_lock_init(&port_user_lock);
-
 	/* Create '/dev/xen/evtchn'. */
 	err = misc_register(&evtchn_miscdev);
 	if (err != 0) {
@@ -543,9 +578,6 @@ static int __init evtchn_init(void)
 
 static void __exit evtchn_cleanup(void)
 {
-	kfree(port_user);
-	port_user = NULL;
-
 	misc_deregister(&evtchn_miscdev);
 }
-- 
1.7.2.5
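[Archive note: the add_evtchn()/find_evtchn() pair above follows the
standard caller-supplied insert/search idiom for <linux/rbtree.h>
(see Documentation/rbtree.txt). Stripped of the driver context, a
self-contained sketch of the same pattern, with illustrative
example_* names and the same comparison direction as the patch:

	#include <linux/rbtree.h>

	struct example_node {
		struct rb_node node;
		unsigned key;
	};

	/* Walk down from the root, remembering the parent and the
	 * link slot to fill, then link and recolour the new node. */
	static int example_insert(struct rb_root *root,
				  struct example_node *new)
	{
		struct rb_node **link = &root->rb_node, *parent = NULL;

		while (*link) {
			struct example_node *this =
				container_of(*link, struct example_node, node);

			parent = *link;
			if (this->key < new->key)
				link = &(*link)->rb_left;
			else if (this->key > new->key)
				link = &(*link)->rb_right;
			else
				return -EEXIST;	/* duplicate key */
		}

		rb_link_node(&new->node, parent, link);
		rb_insert_color(&new->node, root);
		return 0;
	}

	/* O(log n) descent using the same key ordering as the insert. */
	static struct example_node *example_find(struct rb_root *root,
						 unsigned key)
	{
		struct rb_node *node = root->rb_node;

		while (node) {
			struct example_node *this =
				container_of(node, struct example_node, node);

			if (this->key < key)
				node = node->rb_left;
			else if (this->key > key)
				node = node->rb_right;
			else
				return this;
		}
		return NULL;
	}

This replaces the O(1) global array lookup with an O(log n) per-user
lookup, but n is the number of ports bound by one fd, and the lock
protecting the tree is no longer shared between processes.]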
David Vrabel
2013-Jul-19 14:57 UTC
Re: [PATCH 0/3] xen: evtchn and gntdev device fixes and perf improvements
On 19/07/13 15:51, David Vrabel wrote:
> Patch 3 improves the scalability of the evtchn device when it is used
> by multiple processes (e.g., multiple qemus). As you can see from the
> graph[1] it is a significant improvement but still less than ideal. I
> suspect that this may be due to the per-domain event lock inside Xen
> rather than anything kernel-side.
>
> The graphed data was collected from dom0 with 8 VCPUs on a host with 8
> CPUs.

Attached is the (slightly cheesy) test program I used to generate the
results. This also triggered the deadlock fixed by patch 1.

It spawns N threads, each of which opens /dev/xen/evtchn and sets up
an event channel with both ends in the same domain. The event channels
are manually distributed between the VCPUs.

David
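[Archive note: the attachment itself is not preserved here. A minimal
sketch of such a stressor, assuming the standard /dev/xen/evtchn ioctl
interface (ioctl_evtchn_* structs from Xen's xen/sys/evtchn.h; the
kernel's uapi <xen/evtchn.h> defines the same ioctls). The manual
distribution of event channels between VCPUs that the original
program did is omitted:

	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <xen/sys/evtchn.h>	/* header path varies by packaging */

	#define DOMID_SELF	0x7ff0U
	#define ITERATIONS	100000

	static void *worker(void *arg)
	{
		struct ioctl_evtchn_bind_unbound_port unbound = {
			.remote_domain = DOMID_SELF,
		};
		struct ioctl_evtchn_bind_interdomain interdomain;
		struct ioctl_evtchn_notify notify;
		unsigned int pending;
		int a, b, remote_port, local_port, i;

		/* Two fds in the same domain, one per end of the channel. */
		a = open("/dev/xen/evtchn", O_RDWR);
		b = open("/dev/xen/evtchn", O_RDWR);
		if (a < 0 || b < 0) {
			perror("open");
			return NULL;
		}

		/* Allocate an unbound port on fd a, then connect fd b
		 * to it, both ends in this domain. */
		remote_port = ioctl(a, IOCTL_EVTCHN_BIND_UNBOUND_PORT,
				    &unbound);
		interdomain.remote_domain = DOMID_SELF;
		interdomain.remote_port = remote_port;
		local_port = ioctl(b, IOCTL_EVTCHN_BIND_INTERDOMAIN,
				   &interdomain);
		if (remote_port < 0 || local_port < 0) {
			perror("ioctl");
			return NULL;
		}

		notify.port = local_port;
		for (i = 0; i < ITERATIONS; i++) {
			/* Kick the far end, wait for delivery, re-enable. */
			ioctl(b, IOCTL_EVTCHN_NOTIFY, &notify);
			if (read(a, &pending, sizeof(pending)) !=
			    sizeof(pending))
				break;
			write(a, &pending, sizeof(pending));
		}

		close(b);
		close(a);
		return NULL;
	}

	int main(int argc, char **argv)
	{
		int n = argc > 1 ? atoi(argv[1]) : 8;
		pthread_t *threads = calloc(n, sizeof(*threads));
		int i;

		for (i = 0; i < n; i++)
			pthread_create(&threads[i], NULL, worker, NULL);
		for (i = 0; i < n; i++)
			pthread_join(threads[i], NULL);

		free(threads);
		return 0;
	}

With the pre-patch driver, all threads' interrupt handlers and ioctls
contend on the single port_user_lock; with patch 3 each fd contends
only on its own ring_prod_lock and bind_mutex.]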
Stefano Stabellini
2013-Jul-21 14:12 UTC
Re: [PATCH 2/3] xen/p2m: avoid unnecessary TLB flush in m2p_remove_override()
On Fri, 19 Jul 2013, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In m2p_remove_override(), when removing the grant map from the kernel
> mapping and replacing it with a mapping to the original page, the
> grant unmap will already have flushed the TLB, so it is not necessary
> to flush again after updating the mapping.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

> arch/x86/xen/p2m.c |    1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 95fb2aa..74672ee 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -1003,7 +1003,6 @@ int m2p_remove_override(struct page *page,
>
>  		set_pte_at(&init_mm, address, ptep,
>  			   pfn_pte(pfn, PAGE_KERNEL));
> -		__flush_tlb_single(address);
>  		kmap_op->host_addr = 0;
>  	}
>  }
> --
> 1.7.2.5
Konrad Rzeszutek Wilk
2013-Jul-29 14:07 UTC
Re: [PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
On Fri, Jul 19, 2013 at 03:51:58PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Unbinding an event channel (either with the ioctl or when the evtchn
> device is closed) may deadlock because disable_irq() is called with
> port_user_lock held, and this lock is also taken by the interrupt
> handler.

So what you are saying is that if the ioctl IOCTL_EVTCHN_UNBIND is
called (and takes the spinlock) and evtchn_interrupt triggers, it
would deadlock?

But isn't this the IRQ variant of the spinlock, which disables
interrupts? Could you perhaps write this out a bit with CPU1 and CPU2
in separate columns? I think I must be missing something.

Thanks!

> Using get_port_user() to check a port's user is safe without the spin
> lock (it is protected by u->bind_mutex in the ioctl), so just remove
> the unnecessary locking.

What about the interrupt handler? It does not use the mutex, so how
will it protect the result of get_port_user() from being stale?
David Vrabel
2013-Jul-29 15:35 UTC
Re: [PATCH 1/3] xen/evtchn: avoid a deadlock when unbinding an event channel
On 29/07/13 15:07, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 19, 2013 at 03:51:58PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Unbinding an event channel (either with the ioctl or when the evtchn
>> device is closed) may deadlock because disable_irq() is called with
>> port_user_lock held, and this lock is also taken by the interrupt
>> handler.
>
> So what you are saying is that if the ioctl IOCTL_EVTCHN_UNBIND is
> called (and takes the spinlock) and evtchn_interrupt triggers, it
> would deadlock?
>
> But isn't this the IRQ variant of the spinlock, which disables
> interrupts? Could you perhaps write this out a bit with CPU1 and CPU2
> in separate columns? I think I must be missing something.

Disabling local interrupts doesn't help. Remember that disable_irq()
waits for the interrupt handler on all CPUs to stop running. If the
irq occurs on another VCPU, its handler tries to take port_user_lock
and can't, because the unbind ioctl is holding it.

>> Using get_port_user() to check a port's user is safe without the
>> spin lock (it is protected by u->bind_mutex in the ioctl), so just
>> remove the unnecessary locking.
>
> What about the interrupt handler? It does not use the mutex, so how
> will it protect the result of get_port_user() from being stale?

The unbind disables the irq before making the port user stale, so when
you clear it you are guaranteed that the interrupt handler that might
use that port cannot be running.

David
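[Archive note: spelled out per CPU, as requested above — an
illustrative rendering of the scenario David describes, with the
pre-patch locking:

	CPU 1 (unbind ioctl)                CPU 2 (event delivery)
	--------------------                ----------------------
	spin_lock_irq(&port_user_lock)
	                                    evtchn_interrupt() entered
	                                    spin_lock(&port_user_lock)
	                                      ...spins: CPU 1 holds it...
	disable_irq(irq_from_evtchn(port))
	  waits for in-flight handlers
	  on all CPUs to complete
	                                    handler cannot complete until
	                                    CPU 1 drops port_user_lock;
	                                    CPU 1 cannot drop it until
	                                    disable_irq() returns.
	                                    => deadlock

spin_lock_irq() only disables interrupts on CPU 1, so it does not
prevent the handler from running, and blocking, on CPU 2.]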