Daniel De Graaf
2011-Sep-16 21:14 UTC
[Xen-devel] [PATCH 0/2] Add reference counting to grant notify ioctls
The current notify ioctls assume that an event channel will not be closed prior to the page being unmapped. If the mappings are associated with an open file descriptor and the application crashes, the notification behavior depends on the close ordering of the file descriptors. To avoid this, event channels now have a reference count that is used by the grant notify ioctls to postpone the close operation until the notification is fired. [PATCH 1/2] xen/event: Add reference counting to event channel [PATCH 2/2] xen/gnt{dev,alloc}: reserve event channels for notify _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Sep-16 21:14 UTC
[Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ include/xen/events.h | 6 ++++++ 2 files changed, 44 insertions(+), 0 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index da70f5c..c9343b9 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -89,6 +89,7 @@ struct irq_info { struct list_head list; enum xen_irq_type type; /* type */ + unsigned short refcount; unsigned irq; unsigned short evtchn; /* event channel */ unsigned short cpu; /* cpu bound */ @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + info->refcount = 1; irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + BUG_ON(info->refcount > 1); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); spin_lock(&irq_mapping_update_lock); + info->refcount--; + if (info->refcount > 0) + goto out_unlock; + if (VALID_EVTCHN(evtchn)) { close.port = evtchn; if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) xen_free_irq(irq); + out_unlock: spin_unlock(&irq_mapping_update_lock); } @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int get_evtchn_reservation(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + spin_lock(&irq_mapping_update_lock); + info->refcount++; + spin_unlock(&irq_mapping_update_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(get_evtchn_reservation); + +void put_evtchn_reservation(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(put_evtchn_reservation); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/include/xen/events.h b/include/xen/events.h index d287997..23bd5fd 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int get_evtchn_reservation(unsigned int evtchn); +void put_evtchn_reservation(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Sep-16 21:14 UTC
[Xen-devel] [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 14 +++++++++++++- drivers/xen/gntdev.c | 11 +++++++++++ 2 files changed, 24 insertions(+), 1 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index f6832f4..cff862b 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) tmp[gref->notify.pgoff] = 0; kunmap(gref->page); } - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(gref->notify.event); + put_evtchn_reservation(gref->notify.event); + } gref->notify.flags = 0; @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, goto unlock_out; } + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (get_evtchn_reservation(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + put_evtchn_reservation(gref->notify.event); + gref->notify.flags = op.action; gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index f914b26..8de393c 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(map->notify.event); + put_evtchn_reservation(map->notify.event); } if (map->pages) { @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) goto unlock_out; } + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (get_evtchn_reservation(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + put_evtchn_reservation(map->notify.event); + map->notify.flags = op.action; map->notify.addr = op.index - (map->index << PAGE_SHIFT); map->notify.event = op.event_channel_port; -- 1.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Sep-17 04:50 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On 09/16/2011 02:14 PM, Daniel De Graaf wrote:> Event channels exposed to userspace by the evtchn module may be used by > other modules in an asynchronous manner, which requires that reference > counting be used to prevent the event channel from being closed before > the signals are delivered.Could you use the refcounting at the irq level? I was quite pleased to have removed the event channel refcounting (and the use of naked event channels). Oh, is it that userspace allocates an event channel with /dev/evtchn, then passes that event channel to the gntalloc/gntdev drivers so they can use it to pass events between the two. That''s a bit unfortunate; it might have been better to expose those event channels as file descriptors so you could use fd refcounting to manage the lifetimes. What''s the downside of sending the event after the event channel has closed? That said:> > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ > include/xen/events.h | 6 ++++++ > 2 files changed, 44 insertions(+), 0 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index da70f5c..c9343b9 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -89,6 +89,7 @@ struct irq_info > { > struct list_head list; > enum xen_irq_type type; /* type */ > + unsigned short refcount;Is short large enough? Is this something that untrusted userspace could end up wrapping? If short is sufficient, you should pack it next to the other short fields to avoid a gap.> unsigned irq; > unsigned short evtchn; /* event channel */ > unsigned short cpu; /* cpu bound */ > @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) > panic("Unable to allocate metadata for IRQ%d\n", irq); > > info->type = IRQT_UNBOUND; > + info->refcount = 1; > > irq_set_handler_data(irq, info); > > @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) > > irq_set_handler_data(irq, NULL); > > + BUG_ON(info->refcount > 1); > + > kfree(info); > > /* Legacy IRQ descriptors are managed by the arch. */ > @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) > { > struct evtchn_close close; > int evtchn = evtchn_from_irq(irq); > + struct irq_info *info = irq_get_handler_data(irq); > > spin_lock(&irq_mapping_update_lock); > > + info->refcount--; > + if (info->refcount > 0) > + goto out_unlock; > + > if (VALID_EVTCHN(evtchn)) { > close.port = evtchn; > if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) > @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) > > xen_free_irq(irq); > > + out_unlock: > spin_unlock(&irq_mapping_update_lock); > } > > @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) > } > EXPORT_SYMBOL_GPL(unbind_from_irqhandler); > > +int get_evtchn_reservation(unsigned int evtchn)"reservation"? I think just evtchn_get/put would be more consistent with kernel naming conventions.> +{ > + int irq = evtchn_to_irq[evtchn]; > + struct irq_info *info; > + > + if (irq == -1) > + return -ENOENT; > + > + info = irq_get_handler_data(irq); > + > + if (!info) > + return -ENOENT; > + > + spin_lock(&irq_mapping_update_lock); > + info->refcount++; > + spin_unlock(&irq_mapping_update_lock);What is this spinlock protecting against? The non-atomicity of ++, or something larger scale? If its just an atomicity thing, should it be an atomic_t?> + > + return 0; > +} > +EXPORT_SYMBOL_GPL(get_evtchn_reservation); > + > +void put_evtchn_reservation(unsigned int evtchn) > +{ > + int irq = evtchn_to_irq[evtchn]; > + unbind_from_irq(irq);Hm.> +} > +EXPORT_SYMBOL_GPL(put_evtchn_reservation); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) > { > int irq = per_cpu(ipi_to_irq, cpu)[vector]; > diff --git a/include/xen/events.h b/include/xen/events.h > index d287997..23bd5fd 100644 > --- a/include/xen/events.h > +++ b/include/xen/events.h > @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, > */ > void unbind_from_irqhandler(unsigned int irq, void *dev_id); > > +/* > + * Allow extra references to event channels exposed to userspace by evtchn > + */ > +int get_evtchn_reservation(unsigned int evtchn); > +void put_evtchn_reservation(unsigned int evtchn); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); > int resend_irq_on_evtchn(unsigned int irq); > void rebind_evtchn_irq(int evtchn, int irq);_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Sep-17 06:11 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On 17/09/2011 05:50, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:> On 09/16/2011 02:14 PM, Daniel De Graaf wrote: >> Event channels exposed to userspace by the evtchn module may be used by >> other modules in an asynchronous manner, which requires that reference >> counting be used to prevent the event channel from being closed before >> the signals are delivered. > > Could you use the refcounting at the irq level? I was quite pleased to > have removed the event channel refcounting (and the use of naked event > channels). > > Oh, is it that userspace allocates an event channel with /dev/evtchn, > then passes that event channel to the gntalloc/gntdev drivers so they > can use it to pass events between the two. That''s a bit unfortunate; it > might have been better to expose those event channels as file > descriptors so you could use fd refcounting to manage the lifetimes.The ''event channels'' returned by /dev/evtchn are really 32-bit opaque tokens, and can be whatever you want, as long as all kernel drivers are updated to treat them consistently. -- Keir> What''s the downside of sending the event after the event channel has closed? > > That said: > >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ >> include/xen/events.h | 6 ++++++ >> 2 files changed, 44 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >> index da70f5c..c9343b9 100644 >> --- a/drivers/xen/events.c >> +++ b/drivers/xen/events.c >> @@ -89,6 +89,7 @@ struct irq_info >> { >> struct list_head list; >> enum xen_irq_type type; /* type */ >> + unsigned short refcount; > > Is short large enough? Is this something that untrusted userspace could > end up wrapping? If short is sufficient, you should pack it next to the > other short fields to avoid a gap. > >> unsigned irq; >> unsigned short evtchn; /* event channel */ >> unsigned short cpu; /* cpu bound */ >> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) >> panic("Unable to allocate metadata for IRQ%d\n", irq); >> >> info->type = IRQT_UNBOUND; >> + info->refcount = 1; >> >> irq_set_handler_data(irq, info); >> >> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) >> >> irq_set_handler_data(irq, NULL); >> >> + BUG_ON(info->refcount > 1); >> + >> kfree(info); >> >> /* Legacy IRQ descriptors are managed by the arch. */ >> @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) >> { >> struct evtchn_close close; >> int evtchn = evtchn_from_irq(irq); >> + struct irq_info *info = irq_get_handler_data(irq); >> >> spin_lock(&irq_mapping_update_lock); >> >> + info->refcount--; >> + if (info->refcount > 0) >> + goto out_unlock; >> + >> if (VALID_EVTCHN(evtchn)) { >> close.port = evtchn; >> if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) >> @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) >> >> xen_free_irq(irq); >> >> + out_unlock: >> spin_unlock(&irq_mapping_update_lock); >> } >> >> @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void >> *dev_id) >> } >> EXPORT_SYMBOL_GPL(unbind_from_irqhandler); >> >> +int get_evtchn_reservation(unsigned int evtchn) > "reservation"? I think just evtchn_get/put would be more consistent > with kernel naming conventions. > >> +{ >> + int irq = evtchn_to_irq[evtchn]; >> + struct irq_info *info; >> + >> + if (irq == -1) >> + return -ENOENT; >> + >> + info = irq_get_handler_data(irq); >> + >> + if (!info) >> + return -ENOENT; >> + >> + spin_lock(&irq_mapping_update_lock); >> + info->refcount++; >> + spin_unlock(&irq_mapping_update_lock); > > What is this spinlock protecting against? The non-atomicity of ++, or > something larger scale? If its just an atomicity thing, should it be an > atomic_t? > >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(get_evtchn_reservation); >> + >> +void put_evtchn_reservation(unsigned int evtchn) >> +{ >> + int irq = evtchn_to_irq[evtchn]; >> + unbind_from_irq(irq); > > Hm. > >> +} >> +EXPORT_SYMBOL_GPL(put_evtchn_reservation); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) >> { >> int irq = per_cpu(ipi_to_irq, cpu)[vector]; >> diff --git a/include/xen/events.h b/include/xen/events.h >> index d287997..23bd5fd 100644 >> --- a/include/xen/events.h >> +++ b/include/xen/events.h >> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int >> remote_domain, >> */ >> void unbind_from_irqhandler(unsigned int irq, void *dev_id); >> >> +/* >> + * Allow extra references to event channels exposed to userspace by evtchn >> + */ >> +int get_evtchn_reservation(unsigned int evtchn); >> +void put_evtchn_reservation(unsigned int evtchn); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); >> int resend_irq_on_evtchn(unsigned int irq); >> void rebind_evtchn_irq(int evtchn, int irq); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Sep-19 15:50 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On 09/17/2011 12:50 AM, Jeremy Fitzhardinge wrote:> On 09/16/2011 02:14 PM, Daniel De Graaf wrote: >> Event channels exposed to userspace by the evtchn module may be used by >> other modules in an asynchronous manner, which requires that reference >> counting be used to prevent the event channel from being closed before >> the signals are delivered. > > Could you use the refcounting at the irq level? I was quite pleased to > have removed the event channel refcounting (and the use of naked event > channels).This looked more complex: unbind_from_irq has cleanup that happens after EVTCHNOP_close but before the irq-refcount-protected close operation. For the IRQ-level refcounting to be useful here, the EVTCHNOP_close would have to be postponed. The evtchn_to_irq mapping is also maintained here.> Oh, is it that userspace allocates an event channel with /dev/evtchn, > then passes that event channel to the gntalloc/gntdev drivers so they > can use it to pass events between the two. That''s a bit unfortunate; it > might have been better to expose those event channels as file > descriptors so you could use fd refcounting to manage the lifetimes.This would also make event channels simpler to use from userspace, avoiding the extra read() to determine what event channel has fired (and avoiding almost all userspace knowledge of the local event channel number). However, this would require practically rewriting the /dev/xen/evtchn userspace API. With the current event channel API, you would have to pass both the event channel number and the evtchn file descriptor in order to get an fd reference which seems redundant and might cause confusion since keeping the fd reference would also prevent cleanup of other event channels associated with the shared file descriptor. It would also require changing the notify ioctl parameters since there is currently no need to pass an evtchn fd.> What''s the downside of sending the event after the event channel has closed?The event won''t get sent at all, since the hypervisor sees that the local end of the event channel is closed. If the local event channel number happens to get reused, the event could also get sent to the wrong event channel, although this is generally harmless.> That said: > >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ >> include/xen/events.h | 6 ++++++ >> 2 files changed, 44 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >> index da70f5c..c9343b9 100644 >> --- a/drivers/xen/events.c >> +++ b/drivers/xen/events.c >> @@ -89,6 +89,7 @@ struct irq_info >> { >> struct list_head list; >> enum xen_irq_type type; /* type */ >> + unsigned short refcount; > > Is short large enough? Is this something that untrusted userspace could > end up wrapping? If short is sufficient, you should pack it next to the > other short fields to avoid a gap.This would be quite difficult: since only gntdev/gntalloc use these counts for now, you would have to map or allocate 64K pages; the gntalloc driver has a default limit of 1024 pages and gntdev is limited by the hypervisor max_nr_grant_frames. Anyway, changing to atomic_t should>> unsigned irq; >> unsigned short evtchn; /* event channel */ >> unsigned short cpu; /* cpu bound */ >> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) >> panic("Unable to allocate metadata for IRQ%d\n", irq); >> >> info->type = IRQT_UNBOUND; >> + info->refcount = 1; >> >> irq_set_handler_data(irq, info); >> >> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) >> >> irq_set_handler_data(irq, NULL); >> >> + BUG_ON(info->refcount > 1); >> + >> kfree(info); >> >> /* Legacy IRQ descriptors are managed by the arch. */ >> @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) >> { >> struct evtchn_close close; >> int evtchn = evtchn_from_irq(irq); >> + struct irq_info *info = irq_get_handler_data(irq); >> >> spin_lock(&irq_mapping_update_lock); >> >> + info->refcount--; >> + if (info->refcount > 0) >> + goto out_unlock; >> + >> if (VALID_EVTCHN(evtchn)) { >> close.port = evtchn; >> if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) >> @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) >> >> xen_free_irq(irq); >> >> + out_unlock: >> spin_unlock(&irq_mapping_update_lock); >> } >> >> @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) >> } >> EXPORT_SYMBOL_GPL(unbind_from_irqhandler); >> >> +int get_evtchn_reservation(unsigned int evtchn) > "reservation"? I think just evtchn_get/put would be more consistent > with kernel naming conventions.Yes.>> +{ >> + int irq = evtchn_to_irq[evtchn]; >> + struct irq_info *info; >> + >> + if (irq == -1) >> + return -ENOENT; >> + >> + info = irq_get_handler_data(irq); >> + >> + if (!info) >> + return -ENOENT; >> + >> + spin_lock(&irq_mapping_update_lock); >> + info->refcount++; >> + spin_unlock(&irq_mapping_update_lock); > > What is this spinlock protecting against? The non-atomicity of ++, or > something larger scale? If its just an atomicity thing, should it be an > atomic_t?Just atomicity.>> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(get_evtchn_reservation); >> + >> +void put_evtchn_reservation(unsigned int evtchn) >> +{ >> + int irq = evtchn_to_irq[evtchn]; >> + unbind_from_irq(irq); > > Hm.Similar name change here.>> +} >> +EXPORT_SYMBOL_GPL(put_evtchn_reservation); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) >> { >> int irq = per_cpu(ipi_to_irq, cpu)[vector]; >> diff --git a/include/xen/events.h b/include/xen/events.h >> index d287997..23bd5fd 100644 >> --- a/include/xen/events.h >> +++ b/include/xen/events.h >> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, >> */ >> void unbind_from_irqhandler(unsigned int irq, void *dev_id); >> >> +/* >> + * Allow extra references to event channels exposed to userspace by evtchn >> + */ >> +int get_evtchn_reservation(unsigned int evtchn); >> +void put_evtchn_reservation(unsigned int evtchn); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); >> int resend_irq_on_evtchn(unsigned int irq); >> void rebind_evtchn_irq(int evtchn, int irq); >-- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Sep-19 16:22 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On 09/19/2011 08:50 AM, Daniel De Graaf wrote:> On 09/17/2011 12:50 AM, Jeremy Fitzhardinge wrote: >> On 09/16/2011 02:14 PM, Daniel De Graaf wrote: >>> Event channels exposed to userspace by the evtchn module may be used by >>> other modules in an asynchronous manner, which requires that reference >>> counting be used to prevent the event channel from being closed before >>> the signals are delivered. >> Could you use the refcounting at the irq level? I was quite pleased to >> have removed the event channel refcounting (and the use of naked event >> channels). > This looked more complex: unbind_from_irq has cleanup that happens after > EVTCHNOP_close but before the irq-refcount-protected close operation. > For the IRQ-level refcounting to be useful here, the EVTCHNOP_close would > have to be postponed. The evtchn_to_irq mapping is also maintained here. > >> Oh, is it that userspace allocates an event channel with /dev/evtchn, >> then passes that event channel to the gntalloc/gntdev drivers so they >> can use it to pass events between the two. That''s a bit unfortunate; it >> might have been better to expose those event channels as file >> descriptors so you could use fd refcounting to manage the lifetimes. > This would also make event channels simpler to use from userspace, avoiding > the extra read() to determine what event channel has fired (and avoiding > almost all userspace knowledge of the local event channel number). However, > this would require practically rewriting the /dev/xen/evtchn userspace API.If you had an app which cared about many events, then having an fd for each would be pretty cumbersome. But perhaps it would be worth considering extending the current API a bit to add a general "set of events fd", where a given event can be the member of multiple sets.> With the current event channel API, you would have to pass both the event > channel number and the evtchn file descriptor in order to get an fd reference > which seems redundant and might cause confusion since keeping the fd > reference would also prevent cleanup of other event channels associated with > the shared file descriptor. It would also require changing the notify ioctl > parameters since there is currently no need to pass an evtchn fd.Yes, though there''s no reason even now one couldn''t be careful to bind your events to a given fd where they share similar lifetimes/uses (even to the extent of allocating single-event fds). On the other hand, is there really a need to make a connection between the gnt and evtchn devices? Why couldn''t the gnt driver just do its own form of event fd and management independent of /dev/evtchn? I know it seems a bit redundant, but coupling the two adds its own complexities.>> What''s the downside of sending the event after the event channel has closed? > The event won''t get sent at all, since the hypervisor sees that the local > end of the event channel is closed. If the local event channel number > happens to get reused, the event could also get sent to the wrong event > channel, although this is generally harmless.I think /dev/evtchn port numbers are never reused. And it sounds like it is purely a usermode problem if they close the event channel prematurely. J>> That said: >> >>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >>> --- >>> drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ >>> include/xen/events.h | 6 ++++++ >>> 2 files changed, 44 insertions(+), 0 deletions(-) >>> >>> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >>> index da70f5c..c9343b9 100644 >>> --- a/drivers/xen/events.c >>> +++ b/drivers/xen/events.c >>> @@ -89,6 +89,7 @@ struct irq_info >>> { >>> struct list_head list; >>> enum xen_irq_type type; /* type */ >>> + unsigned short refcount; >> Is short large enough? Is this something that untrusted userspace could >> end up wrapping? If short is sufficient, you should pack it next to the >> other short fields to avoid a gap. > This would be quite difficult: since only gntdev/gntalloc use these counts > for now, you would have to map or allocate 64K pages; the gntalloc driver > has a default limit of 1024 pages and gntdev is limited by the hypervisor > max_nr_grant_frames. > > Anyway, changing to atomic_t should > >>> unsigned irq; >>> unsigned short evtchn; /* event channel */ >>> unsigned short cpu; /* cpu bound */ >>> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) >>> panic("Unable to allocate metadata for IRQ%d\n", irq); >>> >>> info->type = IRQT_UNBOUND; >>> + info->refcount = 1; >>> >>> irq_set_handler_data(irq, info); >>> >>> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) >>> >>> irq_set_handler_data(irq, NULL); >>> >>> + BUG_ON(info->refcount > 1); >>> + >>> kfree(info); >>> >>> /* Legacy IRQ descriptors are managed by the arch. */ >>> @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) >>> { >>> struct evtchn_close close; >>> int evtchn = evtchn_from_irq(irq); >>> + struct irq_info *info = irq_get_handler_data(irq); >>> >>> spin_lock(&irq_mapping_update_lock); >>> >>> + info->refcount--; >>> + if (info->refcount > 0) >>> + goto out_unlock; >>> + >>> if (VALID_EVTCHN(evtchn)) { >>> close.port = evtchn; >>> if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) >>> @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) >>> >>> xen_free_irq(irq); >>> >>> + out_unlock: >>> spin_unlock(&irq_mapping_update_lock); >>> } >>> >>> @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) >>> } >>> EXPORT_SYMBOL_GPL(unbind_from_irqhandler); >>> >>> +int get_evtchn_reservation(unsigned int evtchn) >> "reservation"? I think just evtchn_get/put would be more consistent >> with kernel naming conventions. > Yes. > >>> +{ >>> + int irq = evtchn_to_irq[evtchn]; >>> + struct irq_info *info; >>> + >>> + if (irq == -1) >>> + return -ENOENT; >>> + >>> + info = irq_get_handler_data(irq); >>> + >>> + if (!info) >>> + return -ENOENT; >>> + >>> + spin_lock(&irq_mapping_update_lock); >>> + info->refcount++; >>> + spin_unlock(&irq_mapping_update_lock); >> What is this spinlock protecting against? The non-atomicity of ++, or >> something larger scale? If its just an atomicity thing, should it be an >> atomic_t? > Just atomicity. > >>> + >>> + return 0; >>> +} >>> +EXPORT_SYMBOL_GPL(get_evtchn_reservation); >>> + >>> +void put_evtchn_reservation(unsigned int evtchn) >>> +{ >>> + int irq = evtchn_to_irq[evtchn]; >>> + unbind_from_irq(irq); >> Hm. > Similar name change here. > >>> +} >>> +EXPORT_SYMBOL_GPL(put_evtchn_reservation); >>> + >>> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) >>> { >>> int irq = per_cpu(ipi_to_irq, cpu)[vector]; >>> diff --git a/include/xen/events.h b/include/xen/events.h >>> index d287997..23bd5fd 100644 >>> --- a/include/xen/events.h >>> +++ b/include/xen/events.h >>> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, >>> */ >>> void unbind_from_irqhandler(unsigned int irq, void *dev_id); >>> >>> +/* >>> + * Allow extra references to event channels exposed to userspace by evtchn >>> + */ >>> +int get_evtchn_reservation(unsigned int evtchn); >>> +void put_evtchn_reservation(unsigned int evtchn); >>> + >>> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); >>> int resend_irq_on_evtchn(unsigned int irq); >>> void rebind_evtchn_irq(int evtchn, int irq); >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Sep-19 17:53 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On 09/19/2011 12:22 PM, Jeremy Fitzhardinge wrote:> On 09/19/2011 08:50 AM, Daniel De Graaf wrote: >> On 09/17/2011 12:50 AM, Jeremy Fitzhardinge wrote: >>> On 09/16/2011 02:14 PM, Daniel De Graaf wrote: >>>> Event channels exposed to userspace by the evtchn module may be used by >>>> other modules in an asynchronous manner, which requires that reference >>>> counting be used to prevent the event channel from being closed before >>>> the signals are delivered. >>> Could you use the refcounting at the irq level? I was quite pleased to >>> have removed the event channel refcounting (and the use of naked event >>> channels). >> This looked more complex: unbind_from_irq has cleanup that happens after >> EVTCHNOP_close but before the irq-refcount-protected close operation. >> For the IRQ-level refcounting to be useful here, the EVTCHNOP_close would >> have to be postponed. The evtchn_to_irq mapping is also maintained here. >> >>> Oh, is it that userspace allocates an event channel with /dev/evtchn, >>> then passes that event channel to the gntalloc/gntdev drivers so they >>> can use it to pass events between the two. That''s a bit unfortunate; it >>> might have been better to expose those event channels as file >>> descriptors so you could use fd refcounting to manage the lifetimes. >> This would also make event channels simpler to use from userspace, avoiding >> the extra read() to determine what event channel has fired (and avoiding >> almost all userspace knowledge of the local event channel number). However, >> this would require practically rewriting the /dev/xen/evtchn userspace API. > > If you had an app which cared about many events, then having an fd for > each would be pretty cumbersome. But perhaps it would be worth > considering extending the current API a bit to add a general "set of > events fd", where a given event can be the member of multiple sets.This seems to overlap a lot with epoll, which can already do overlapping file descriptor readiness checks.>> With the current event channel API, you would have to pass both the event >> channel number and the evtchn file descriptor in order to get an fd reference >> which seems redundant and might cause confusion since keeping the fd >> reference would also prevent cleanup of other event channels associated with >> the shared file descriptor. It would also require changing the notify ioctl >> parameters since there is currently no need to pass an evtchn fd. > > Yes, though there''s no reason even now one couldn''t be careful to bind > your events to a given fd where they share similar lifetimes/uses (even > to the extent of allocating single-event fds). > > On the other hand, is there really a need to make a connection between > the gnt and evtchn devices? Why couldn''t the gnt driver just do its own > form of event fd and management independent of /dev/evtchn? I know it > seems a bit redundant, but coupling the two adds its own complexities.I don''t think there is a real need to make a connection, and even with this patch there isn''t one: only tracking on the event channel, not on the evtchn device or its FDs. Unless you''re talking about having /dev/xen/gnt* able to manage event channels independent of /dev/xen/evtchn? To be useful, this would either need to be able to both send and receive events, or to share local/remote ports with evtchn.>>> What''s the downside of sending the event after the event channel has closed? >> The event won''t get sent at all, since the hypervisor sees that the local >> end of the event channel is closed. If the local event channel number >> happens to get reused, the event could also get sent to the wrong event >> channel, although this is generally harmless. > > I think /dev/evtchn port numbers are never reused. And it sounds like > it is purely a usermode problem if they close the event channel prematurely.The trigger in this case is the kernel closing file descriptors when a process crashes. In particular: 1. open /dev/xen/evtchn (FD 3) and bind_interdomain an event channel 2. open /dev/xen/gntdev (FD 4) map a granted page 3. use unmap_notify on fd-4 to setup a notification on the event channel allocated in step 1 4. segfault, SIGKILL, or some other exit without proper cleanup At this point, the kernel closes file descriptors in numerical order, which will cause the event channel to be closed before the gntdev device is able to send its unmap notification event. In libvchan, I make sure to swap steps 1 and 2 to avoid triggering this bug, but that is clearly a workaround and not a proper solution. I have seen /dev/xen/evtchn ports be reused; the comment in evtchn.c is misleading or incorrect. The port is allocated by the hypervisor in xen/common/event_channel.c:get_free_port.> J > >>> That said: >>> >>>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >>>> --- >>>> drivers/xen/events.c | 38 ++++++++++++++++++++++++++++++++++++++ >>>> include/xen/events.h | 6 ++++++ >>>> 2 files changed, 44 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >>>> index da70f5c..c9343b9 100644 >>>> --- a/drivers/xen/events.c >>>> +++ b/drivers/xen/events.c >>>> @@ -89,6 +89,7 @@ struct irq_info >>>> { >>>> struct list_head list; >>>> enum xen_irq_type type; /* type */ >>>> + unsigned short refcount; >>> Is short large enough? Is this something that untrusted userspace could >>> end up wrapping? If short is sufficient, you should pack it next to the >>> other short fields to avoid a gap. >> This would be quite difficult: since only gntdev/gntalloc use these counts >> for now, you would have to map or allocate 64K pages; the gntalloc driver >> has a default limit of 1024 pages and gntdev is limited by the hypervisor >> max_nr_grant_frames. >> >> Anyway, changing to atomic_t should >> >>>> unsigned irq; >>>> unsigned short evtchn; /* event channel */ >>>> unsigned short cpu; /* cpu bound */ >>>> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) >>>> panic("Unable to allocate metadata for IRQ%d\n", irq); >>>> >>>> info->type = IRQT_UNBOUND; >>>> + info->refcount = 1; >>>> >>>> irq_set_handler_data(irq, info); >>>> >>>> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) >>>> >>>> irq_set_handler_data(irq, NULL); >>>> >>>> + BUG_ON(info->refcount > 1); >>>> + >>>> kfree(info); >>>> >>>> /* Legacy IRQ descriptors are managed by the arch. */ >>>> @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq) >>>> { >>>> struct evtchn_close close; >>>> int evtchn = evtchn_from_irq(irq); >>>> + struct irq_info *info = irq_get_handler_data(irq); >>>> >>>> spin_lock(&irq_mapping_update_lock); >>>> >>>> + info->refcount--; >>>> + if (info->refcount > 0) >>>> + goto out_unlock; >>>> + >>>> if (VALID_EVTCHN(evtchn)) { >>>> close.port = evtchn; >>>> if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) >>>> @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq) >>>> >>>> xen_free_irq(irq); >>>> >>>> + out_unlock: >>>> spin_unlock(&irq_mapping_update_lock); >>>> } >>>> >>>> @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) >>>> } >>>> EXPORT_SYMBOL_GPL(unbind_from_irqhandler); >>>> >>>> +int get_evtchn_reservation(unsigned int evtchn) >>> "reservation"? I think just evtchn_get/put would be more consistent >>> with kernel naming conventions. >> Yes. >> >>>> +{ >>>> + int irq = evtchn_to_irq[evtchn]; >>>> + struct irq_info *info; >>>> + >>>> + if (irq == -1) >>>> + return -ENOENT; >>>> + >>>> + info = irq_get_handler_data(irq); >>>> + >>>> + if (!info) >>>> + return -ENOENT; >>>> + >>>> + spin_lock(&irq_mapping_update_lock); >>>> + info->refcount++; >>>> + spin_unlock(&irq_mapping_update_lock); >>> What is this spinlock protecting against? The non-atomicity of ++, or >>> something larger scale? If its just an atomicity thing, should it be an >>> atomic_t? >> Just atomicity. >> >>>> + >>>> + return 0; >>>> +} >>>> +EXPORT_SYMBOL_GPL(get_evtchn_reservation); >>>> + >>>> +void put_evtchn_reservation(unsigned int evtchn) >>>> +{ >>>> + int irq = evtchn_to_irq[evtchn]; >>>> + unbind_from_irq(irq); >>> Hm. >> Similar name change here. >> >>>> +} >>>> +EXPORT_SYMBOL_GPL(put_evtchn_reservation); >>>> + >>>> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) >>>> { >>>> int irq = per_cpu(ipi_to_irq, cpu)[vector]; >>>> diff --git a/include/xen/events.h b/include/xen/events.h >>>> index d287997..23bd5fd 100644 >>>> --- a/include/xen/events.h >>>> +++ b/include/xen/events.h >>>> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, >>>> */ >>>> void unbind_from_irqhandler(unsigned int irq, void *dev_id); >>>> >>>> +/* >>>> + * Allow extra references to event channels exposed to userspace by evtchn >>>> + */ >>>> +int get_evtchn_reservation(unsigned int evtchn); >>>> +void put_evtchn_reservation(unsigned int evtchn); >>>> + >>>> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); >>>> int resend_irq_on_evtchn(unsigned int irq); >>>> void rebind_evtchn_irq(int evtchn, int irq); >> >-- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-18 21:04 UTC
[Xen-devel] [PATCH v2 0/2] Add reference counting to grant notify ioctls
> The current notify ioctls assume that an event channel will not be > closed prior to the page being unmapped. If the mappings are associated > with an open file descriptor and the application crashes, the > notification behavior depends on the close ordering of the file > descriptors. To avoid this, event channels now have a reference count > that is used by the grant notify ioctls to postpone the close operation > until the notification is fired.Changes since v1: Rename evtchn_get/put to match kernel naming conventions Use atomic_t for refcount [PATCH 1/2] xen/event: Add reference counting to event channel [PATCH 2/2] xen/gnt{dev,alloc}: reserve event channels for notify _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-18 21:04 UTC
[Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/events.c | 34 ++++++++++++++++++++++++++++++++++ include/xen/events.h | 6 ++++++ 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 7523719..36d3390 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -88,6 +88,7 @@ enum xen_irq_type { struct irq_info { struct list_head list; + atomic_t refcount; enum xen_irq_type type; /* type */ unsigned irq; unsigned short evtchn; /* event channel */ @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + atomic_set(&info->refcount, 1); irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + BUG_ON(atomic_read(&info->refcount) > 1); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -912,6 +916,10 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); + + if (!atomic_dec_and_test(&info->refcount)) + return; mutex_lock(&irq_mapping_update_lock); @@ -1038,6 +1046,32 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int evtchn_get(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + atomic_inc(&info->refcount); + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_get); + +void evtchn_put(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(evtchn_put); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/include/xen/events.h b/include/xen/events.h index d287997..a459cca 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int evtchn_get(unsigned int evtchn); +void evtchn_put(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-18 21:04 UTC
[Xen-devel] [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 14 +++++++++++++- drivers/xen/gntdev.c | 11 +++++++++++ 2 files changed, 24 insertions(+), 1 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index f6832f4..a739fb1 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) tmp[gref->notify.pgoff] = 0; kunmap(gref->page); } - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(gref->notify.event); + evtchn_put(gref->notify.event); + } gref->notify.flags = 0; @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, goto unlock_out; } + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(gref->notify.event); + gref->notify.flags = op.action; gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index f914b26..cfcc890 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(map->notify.event); + evtchn_put(map->notify.event); } if (map->pages) { @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) goto unlock_out; } + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(map->notify.event); + map->notify.flags = op.action; map->notify.addr = op.index - (map->index << PAGE_SHIFT); map->notify.event = op.event_channel_port; -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-19 09:20 UTC
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel
On Tue, 2011-10-18 at 22:04 +0100, Daniel De Graaf wrote:> Event channels exposed to userspace by the evtchn module may be used by > other modules in an asynchronous manner, which requires that reference > counting be used to prevent the event channel from being closed before > the signals are delivered. > > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/events.c | 34 ++++++++++++++++++++++++++++++++++ > include/xen/events.h | 6 ++++++ > 2 files changed, 40 insertions(+), 0 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index 7523719..36d3390 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -88,6 +88,7 @@ enum xen_irq_type { > struct irq_info > { > struct list_head list; > + atomic_t refcount; > enum xen_irq_type type; /* type */ > unsigned irq; > unsigned short evtchn; /* event channel */ > @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) > panic("Unable to allocate metadata for IRQ%d\n", irq); > > info->type = IRQT_UNBOUND; > + atomic_set(&info->refcount, 1); > > irq_set_handler_data(irq, info); > > @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) > > irq_set_handler_data(irq, NULL); > > + BUG_ON(atomic_read(&info->refcount) > 1);Should this be > 0 ? If we get here with a reference count still held isn''t that indicative of an issue? Is this just because of the various error paths which call xen_free_irq to clean up without also dropping the initial reference from xen_irq_init? I think either these paths should just drop the reference or the initial refcount should be 0 and the initial reference should be taken only after the caller of xen_irq_init has actually succeeded.> + > kfree(info); > > /* Legacy IRQ descriptors are managed by the arch. */ > @@ -912,6 +916,10 @@ static void unbind_from_irq(unsigned int irq) > { > struct evtchn_close close; > int evtchn = evtchn_from_irq(irq); > + struct irq_info *info = irq_get_handler_data(irq); > + > + if (!atomic_dec_and_test(&info->refcount)) > + return; > > mutex_lock(&irq_mapping_update_lock); > > @@ -1038,6 +1046,32 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) > } > EXPORT_SYMBOL_GPL(unbind_from_irqhandler); > > +int evtchn_get(unsigned int evtchn) > +{ > + int irq = evtchn_to_irq[evtchn]; > + struct irq_info *info; > + > + if (irq == -1) > + return -ENOENT; > + > + info = irq_get_handler_data(irq); > + > + if (!info) > + return -ENOENT; > + > + atomic_inc(&info->refcount); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(evtchn_get); > + > +void evtchn_put(unsigned int evtchn) > +{ > + int irq = evtchn_to_irq[evtchn]; > + unbind_from_irq(irq); > +} > +EXPORT_SYMBOL_GPL(evtchn_put); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) > { > int irq = per_cpu(ipi_to_irq, cpu)[vector]; > diff --git a/include/xen/events.h b/include/xen/events.h > index d287997..a459cca 100644 > --- a/include/xen/events.h > +++ b/include/xen/events.h > @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, > */ > void unbind_from_irqhandler(unsigned int irq, void *dev_id); > > +/* > + * Allow extra references to event channels exposed to userspace by evtchn > + */ > +int evtchn_get(unsigned int evtchn); > +void evtchn_put(unsigned int evtchn); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); > int resend_irq_on_evtchn(unsigned int irq); > void rebind_evtchn_irq(int evtchn, int irq);_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-19 09:24 UTC
Re: [Xen-devel] [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On Tue, 2011-10-18 at 22:04 +0100, Daniel De Graaf wrote:> When using the unmap notify ioctl, the event channel used for > notification needs to be reserved to avoid it being deallocated prior to > sending the notification. > > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/gntalloc.c | 14 +++++++++++++- > drivers/xen/gntdev.c | 11 +++++++++++ > 2 files changed, 24 insertions(+), 1 deletions(-) > > diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c > index f6832f4..a739fb1 100644 > --- a/drivers/xen/gntalloc.c > +++ b/drivers/xen/gntalloc.c > @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) > tmp[gref->notify.pgoff] = 0; > kunmap(gref->page); > } > - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > notify_remote_via_evtchn(gref->notify.event); > + evtchn_put(gref->notify.event); > + } > > gref->notify.flags = 0; > > @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, > goto unlock_out; > } > > + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > + if (evtchn_get(op.event_channel_port)) { > + rc = -EINVAL; > + goto unlock_out; > + } > + } > + > + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > + evtchn_put(gref->notify.event); > +If the gref gets torn down here won''t we notify and drop the reference on the wrong evtchn, leading to a double free? If we defer the drop until after gref->notify.event has been updated then this goes away. Ian.> gref->notify.flags = op.action; > gref->notify.pgoff = pgoff; > gref->notify.event = op.event_channel_port; > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index f914b26..cfcc890 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) > > if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > notify_remote_via_evtchn(map->notify.event); > + evtchn_put(map->notify.event); > } > > if (map->pages) { > @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) > goto unlock_out; > } > > + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > + if (evtchn_get(op.event_channel_port)) { > + rc = -EINVAL; > + goto unlock_out; > + } > + } > + > + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > + evtchn_put(map->notify.event); > + > map->notify.flags = op.action; > map->notify.addr = op.index - (map->index << PAGE_SHIFT); > map->notify.event = op.event_channel_port;_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-19 14:23 UTC
Re: [Xen-devel] [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On 10/19/2011 05:24 AM, Ian Campbell wrote:> On Tue, 2011-10-18 at 22:04 +0100, Daniel De Graaf wrote: >> When using the unmap notify ioctl, the event channel used for >> notification needs to be reserved to avoid it being deallocated prior to >> sending the notification. >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/gntalloc.c | 14 +++++++++++++- >> drivers/xen/gntdev.c | 11 +++++++++++ >> 2 files changed, 24 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c >> index f6832f4..a739fb1 100644 >> --- a/drivers/xen/gntalloc.c >> +++ b/drivers/xen/gntalloc.c >> @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) >> tmp[gref->notify.pgoff] = 0; >> kunmap(gref->page); >> } >> - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { >> notify_remote_via_evtchn(gref->notify.event); >> + evtchn_put(gref->notify.event); >> + } >> >> gref->notify.flags = 0; >> >> @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, >> goto unlock_out; >> } >> >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >> + if (evtchn_get(op.event_channel_port)) { >> + rc = -EINVAL; >> + goto unlock_out; >> + } >> + } >> + >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >> + evtchn_put(gref->notify.event); >> + > > If the gref gets torn down here won''t we notify and drop the reference > on the wrong evtchn, leading to a double free? If we defer the drop > until after gref->notify.event has been updated then this goes away. > > Ian.This evtchn_put will only be called in the case where the unmap_notify is being changed and already had an event channel reference. This reference must be dropped prior to changing gref->notify.event or we will leak the old event channel.> >> gref->notify.flags = op.action; >> gref->notify.pgoff = pgoff; >> gref->notify.event = op.event_channel_port; >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >> index f914b26..cfcc890 100644 >> --- a/drivers/xen/gntdev.c >> +++ b/drivers/xen/gntdev.c >> @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) >> >> if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { >> notify_remote_via_evtchn(map->notify.event); >> + evtchn_put(map->notify.event); >> } >> >> if (map->pages) { >> @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) >> goto unlock_out; >> } >> >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >> + if (evtchn_get(op.event_channel_port)) { >> + rc = -EINVAL; >> + goto unlock_out; >> + } >> + } >> + >> + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >> + evtchn_put(map->notify.event); >> + >> map->notify.flags = op.action; >> map->notify.addr = op.index - (map->index << PAGE_SHIFT); >> map->notify.event = op.event_channel_port; > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-19 14:45 UTC
Re: [Xen-devel] [PATCH 2/2] xen/gnt{dev,alloc}: reserve event channels for notify
On Wed, 2011-10-19 at 15:23 +0100, Daniel De Graaf wrote:> On 10/19/2011 05:24 AM, Ian Campbell wrote: > > On Tue, 2011-10-18 at 22:04 +0100, Daniel De Graaf wrote: > >> When using the unmap notify ioctl, the event channel used for > >> notification needs to be reserved to avoid it being deallocated prior to > >> sending the notification. > >> > >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > >> --- > >> drivers/xen/gntalloc.c | 14 +++++++++++++- > >> drivers/xen/gntdev.c | 11 +++++++++++ > >> 2 files changed, 24 insertions(+), 1 deletions(-) > >> > >> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c > >> index f6832f4..a739fb1 100644 > >> --- a/drivers/xen/gntalloc.c > >> +++ b/drivers/xen/gntalloc.c > >> @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) > >> tmp[gref->notify.pgoff] = 0; > >> kunmap(gref->page); > >> } > >> - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > >> notify_remote_via_evtchn(gref->notify.event); > >> + evtchn_put(gref->notify.event); > >> + } > >> > >> gref->notify.flags = 0; > >> > >> @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, > >> goto unlock_out; > >> } > >> > >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > >> + if (evtchn_get(op.event_channel_port)) { > >> + rc = -EINVAL; > >> + goto unlock_out; > >> + } > >> + } > >> + > >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > >> + evtchn_put(gref->notify.event); > >> + > > > > If the gref gets torn down here won''t we notify and drop the reference > > on the wrong evtchn, leading to a double free? If we defer the drop > > until after gref->notify.event has been updated then this goes away. > > > > Ian. > > This evtchn_put will only be called in the case where the unmap_notify > is being changed and already had an event channel reference. This reference > must be dropped prior to changing gref->notify.event or we will leak > the old event channel.More importantly I see now that there is a lock protecting all this stuff so everything is ok. Ian.> > > > >> gref->notify.flags = op.action; > >> gref->notify.pgoff = pgoff; > >> gref->notify.event = op.event_channel_port; > >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > >> index f914b26..cfcc890 100644 > >> --- a/drivers/xen/gntdev.c > >> +++ b/drivers/xen/gntdev.c > >> @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) > >> > >> if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > >> notify_remote_via_evtchn(map->notify.event); > >> + evtchn_put(map->notify.event); > >> } > >> > >> if (map->pages) { > >> @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) > >> goto unlock_out; > >> } > >> > >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > >> + if (evtchn_get(op.event_channel_port)) { > >> + rc = -EINVAL; > >> + goto unlock_out; > >> + } > >> + } > >> + > >> + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > >> + evtchn_put(map->notify.event); > >> + > >> map->notify.flags = op.action; > >> map->notify.addr = op.index - (map->index << PAGE_SHIFT); > >> map->notify.event = op.event_channel_port; > > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-24 20:22 UTC
[Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
On Tue, Oct 18, 2011 at 05:04:10PM -0400, Daniel De Graaf wrote:> Event channels exposed to userspace by the evtchn module may be used by > other modules in an asynchronous manner, which requires that reference > counting be used to prevent the event channel from being closed before > the signals are delivered.You should probably also remove the comment in "xen_bind_pirq_gsi_to_irq" which talks about refcount (as the comment would now apply to this code and might confuse people reading the code). There are two scenarios I am concerned about: 1). Xen pciback allocates/setups an physical IRQ on behalf of a guest. Lets concentrate on MSI as that is more interesting. The PV guests sends XEN_PCI_OP_enable_msi, dom0 calls pci_enable_msi(), MSI libs end up calling xen_initdom_setup_msi_irqs, which calls xen_bind_pirq_msi_to_irq and irq->refcnt==2. Guest dies without calling XEN_PCI_OP_disable_msi, so we end up in xen_pcibk_reset_device which calls pci_disable_msi().. which calls xen_free_irq(). And all of that sets refcnt==1.. OK, and if we do call xen_pcibk_reset_device() again it is smart enough _not_ to call pci_disable_msi() twice. So I guess that case is actually OK, but if there was a driver that decided to call pci_disable_msi (or pci_disable_irq) we could hit the BUG_ON(). Perhaps that should be altered to WARN_ON. 2). Grantdev holding the refcnt forever. That is probably the easiest as it would be a bug in the code. Hmm, I think I''ve talked myself out of actually finding any cases where this would be problematic from a design perspective. The only issue I can see is exposing bugs in the users of event channel API - which there might be. So definitly needs some heavy duty testing.> > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/events.c | 34 ++++++++++++++++++++++++++++++++++ > include/xen/events.h | 6 ++++++ > 2 files changed, 40 insertions(+), 0 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index 7523719..36d3390 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -88,6 +88,7 @@ enum xen_irq_type { > struct irq_info > { > struct list_head list; > + atomic_t refcount;refcnt> enum xen_irq_type type; /* type */ > unsigned irq; > unsigned short evtchn; /* event channel */ > @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) > panic("Unable to allocate metadata for IRQ%d\n", irq); > > info->type = IRQT_UNBOUND; > + atomic_set(&info->refcount, 1); > > irq_set_handler_data(irq, info); > > @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) > > irq_set_handler_data(irq, NULL); > > + BUG_ON(atomic_read(&info->refcount) > 1); > + > kfree(info); > > /* Legacy IRQ descriptors are managed by the arch. */ > @@ -912,6 +916,10 @@ static void unbind_from_irq(unsigned int irq) > { > struct evtchn_close close; > int evtchn = evtchn_from_irq(irq); > + struct irq_info *info = irq_get_handler_data(irq); > + > + if (!atomic_dec_and_test(&info->refcount)) > + return; > > mutex_lock(&irq_mapping_update_lock); > > @@ -1038,6 +1046,32 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) > } > EXPORT_SYMBOL_GPL(unbind_from_irqhandler); > > +int evtchn_get(unsigned int evtchn) > +{ > + int irq = evtchn_to_irq[evtchn]; > + struct irq_info *info; > + > + if (irq == -1) > + return -ENOENT; > + > + info = irq_get_handler_data(irq); > + > + if (!info) > + return -ENOENT; > + > + atomic_inc(&info->refcount); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(evtchn_get); > + > +void evtchn_put(unsigned int evtchn)The decleration for ''evtchn'' is ''unsigned short'' so that can be used instead of ''unsigned int''.> +{ > + int irq = evtchn_to_irq[evtchn];Not checking if the irq is valid? Or if the evtchn is valid?> + unbind_from_irq(irq); > +} > +EXPORT_SYMBOL_GPL(evtchn_put); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) > { > int irq = per_cpu(ipi_to_irq, cpu)[vector]; > diff --git a/include/xen/events.h b/include/xen/events.h > index d287997..a459cca 100644 > --- a/include/xen/events.h > +++ b/include/xen/events.h > @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, > */ > void unbind_from_irqhandler(unsigned int irq, void *dev_id); > > +/* > + * Allow extra references to event channels exposed to userspace by evtchn > + */ > +int evtchn_get(unsigned int evtchn); > +void evtchn_put(unsigned int evtchn); > + > void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); > int resend_irq_on_evtchn(unsigned int irq); > void rebind_evtchn_irq(int evtchn, int irq); > -- > 1.7.6.4_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-24 20:57 UTC
[Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On Tue, Oct 18, 2011 at 05:04:11PM -0400, Daniel De Graaf wrote:> When using the unmap notify ioctl, the event channel used for > notification needs to be reserved to avoid it being deallocated prior to > sending the notification. > > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/gntalloc.c | 14 +++++++++++++- > drivers/xen/gntdev.c | 11 +++++++++++ > 2 files changed, 24 insertions(+), 1 deletions(-) > > diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c > index f6832f4..a739fb1 100644 > --- a/drivers/xen/gntalloc.c > +++ b/drivers/xen/gntalloc.c > @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) > tmp[gref->notify.pgoff] = 0; > kunmap(gref->page); > } > - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > notify_remote_via_evtchn(gref->notify.event); > + evtchn_put(gref->notify.event);So.. I could have some fun by doing this in the userspace: for (j = 0; j< 2;j++) { for (i = 0; i < 65534; i++) { struct ioctl_gntalloc_unmap_notify uarg = { .index = arg.index + offsetof(struct shr_page, notifies[0]), .action =UNMAP_NOTIFY_SEND_EVENT, .event_channel_port = i, }; rv = ioctl(a_fd, IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); } } And cause the event channel refcnt to be set to zero and free it. And then causing the box to die - as the event channels for the physical IRQ might have gotten free-ed. Hm.. Perhaps the gntalloc and gntdev should keep track of which event channels are OK to refcnt? Something like a whitelist? Granted at that point the refcounting could as well be done by the API that sets up the event channels from the userspace. So the evtchn_ioctl is pretty smart. It uses "get_port_user" to get the list of events that belong to this user (and have been handed out). I think you need to use that in the gntalloc to double-check that the event channel is not one of the kernel type.> + } > > gref->notify.flags = 0; > > @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, > goto unlock_out; > } > > + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > + if (evtchn_get(op.event_channel_port)) { > + rc = -EINVAL; > + goto unlock_out; > + } > + } > + > + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > + evtchn_put(gref->notify.event); > + > gref->notify.flags = op.action; > gref->notify.pgoff = pgoff; > gref->notify.event = op.event_channel_port; > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index f914b26..cfcc890 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) > > if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > notify_remote_via_evtchn(map->notify.event); > + evtchn_put(map->notify.event); > } > > if (map->pages) { > @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) > goto unlock_out; > } > > + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > + if (evtchn_get(op.event_channel_port)) { > + rc = -EINVAL; > + goto unlock_out; > + } > + } > + > + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT)So notify.flags has not been set yet? That looks to be done later? Or is this in case of the user doing uargs.action = UNMAP_NOTIFY_SEND_EVENT; ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); uargs.action = UNAMP_NOTIFY_CLEAR_BYTE; ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); and we want to preserve the "old" flags before swapping over to the new?> + evtchn_put(map->notify.event); > + > map->notify.flags = op.action; > map->notify.addr = op.index - (map->index << PAGE_SHIFT); > map->notify.event = op.event_channel_port; > -- > 1.7.6.4_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-24 22:01 UTC
[Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On 10/24/2011 04:57 PM, Konrad Rzeszutek Wilk wrote:> On Tue, Oct 18, 2011 at 05:04:11PM -0400, Daniel De Graaf wrote: >> When using the unmap notify ioctl, the event channel used for >> notification needs to be reserved to avoid it being deallocated prior to >> sending the notification. >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/gntalloc.c | 14 +++++++++++++- >> drivers/xen/gntdev.c | 11 +++++++++++ >> 2 files changed, 24 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c >> index f6832f4..a739fb1 100644 >> --- a/drivers/xen/gntalloc.c >> +++ b/drivers/xen/gntalloc.c >> @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) >> tmp[gref->notify.pgoff] = 0; >> kunmap(gref->page); >> } >> - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { >> notify_remote_via_evtchn(gref->notify.event); >> + evtchn_put(gref->notify.event); > > So.. I could have some fun by doing this in the userspace: > for (j = 0; j< 2;j++) { > for (i = 0; i < 65534; i++) { > struct ioctl_gntalloc_unmap_notify uarg = { > .index = arg.index + offsetof(struct shr_page, notifies[0]), > .action =UNMAP_NOTIFY_SEND_EVENT, > .event_channel_port = i, > }; > rv = ioctl(a_fd, IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); > } > } > > And cause the event channel refcnt to be set to zero and free it. And then > causing the box to die - as the event channels for the physical IRQ might have > gotten free-ed. >Not really. For a given valid event channel E, this will increase the refcnt by one when i == E, and then decrease refcnt the next time evtchn_get succeeds (for some other value of i).> Hm.. Perhaps the gntalloc and gntdev should keep track of which event channels > are OK to refcnt? Something like a whitelist? Granted at that point the refcounting > could as well be done by the API that sets up the event channels from the userspace.Hmm. Perhaps have a magic value for refcount (-1?) that indicates evtchn_get is not available. That would become the default value of refcnt, and evtchn.c would then use evtchn_make_refcounted() to change the refcount to 1 and allow _get/_put to work.> So the evtchn_ioctl is pretty smart. It uses "get_port_user" to get the list > of events that belong to this user (and have been handed out). I think you > need to use that in the gntalloc to double-check that the event channel is not > one of the kernel type. > >> + } >> >> gref->notify.flags = 0; >> >> @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, >> goto unlock_out; >> } >> >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >> + if (evtchn_get(op.event_channel_port)) { >> + rc = -EINVAL; >> + goto unlock_out; >> + } >> + } >> + >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >> + evtchn_put(gref->notify.event); >> + >> gref->notify.flags = op.action; >> gref->notify.pgoff = pgoff; >> gref->notify.event = op.event_channel_port; >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >> index f914b26..cfcc890 100644 >> --- a/drivers/xen/gntdev.c >> +++ b/drivers/xen/gntdev.c >> @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) >> >> if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { >> notify_remote_via_evtchn(map->notify.event); >> + evtchn_put(map->notify.event); >> } >> >> if (map->pages) { >> @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) >> goto unlock_out; >> } >> >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >> + if (evtchn_get(op.event_channel_port)) { >> + rc = -EINVAL; >> + goto unlock_out; >> + } >> + } >> + >> + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > > So notify.flags has not been set yet? That looks to be done later?Yep. It''s the previous value (zero if we haven''t called the ioctl yet).> Or is this in case of the user doing > > uargs.action = UNMAP_NOTIFY_SEND_EVENT; > ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); > uargs.action = UNAMP_NOTIFY_CLEAR_BYTE; > ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); > > and we want to preserve the "old" flags before swapping over to the > new?No. We acquire the new event channel before releasing the old one so that if we happen to be the only one holding a reference to this event channel, a change in the byte-clear portion of the notify does not cause us to drop the event channel.>> + evtchn_put(map->notify.event); >> + >> map->notify.flags = op.action; >> map->notify.addr = op.index - (map->index << PAGE_SHIFT); >> map->notify.event = op.event_channel_port; >> -- >> 1.7.6.4 >-- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-24 22:20 UTC
[Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
On 10/24/2011 04:22 PM, Konrad Rzeszutek Wilk wrote:> On Tue, Oct 18, 2011 at 05:04:10PM -0400, Daniel De Graaf wrote: >> Event channels exposed to userspace by the evtchn module may be used by >> other modules in an asynchronous manner, which requires that reference >> counting be used to prevent the event channel from being closed before >> the signals are delivered. > > You should probably also remove the comment in "xen_bind_pirq_gsi_to_irq" > which talks about refcount (as the comment would now apply to this code and > might confuse people reading the code).OK.> There are two scenarios I am concerned about: > > 1). Xen pciback allocates/setups an physical IRQ on behalf of a guest. Lets > concentrate on MSI as that is more interesting. The PV guests sends > XEN_PCI_OP_enable_msi, dom0 calls pci_enable_msi(), MSI libs end up calling > xen_initdom_setup_msi_irqs, which calls xen_bind_pirq_msi_to_irq and > irq->refcnt==2. > > Guest dies without calling XEN_PCI_OP_disable_msi, so we end up in > xen_pcibk_reset_device which calls pci_disable_msi().. which calls xen_free_irq(). > And all of that sets refcnt==1.. OK, and if we do call xen_pcibk_reset_device() > again it is smart enough _not_ to call pci_disable_msi() twice. > > So I guess that case is actually OK, but if there was a driver that decided to > call pci_disable_msi (or pci_disable_irq) we could hit the BUG_ON(). Perhaps > that should be altered to WARN_ON.Agreed, WARN_ON is better as nothing is likely to explode if the refcnt is off.> 2). Grantdev holding the refcnt forever. That is probably the easiest as it would > be a bug in the code.Right; same as any other reference leak in external (kernel) code.> Hmm, I think I''ve talked myself out of actually finding any cases where this would > be problematic from a design perspective. The only issue I can see is exposing bugs > in the users of event channel API - which there might be. So definitly needs some > heavy duty testing. >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/events.c | 34 ++++++++++++++++++++++++++++++++++ >> include/xen/events.h | 6 ++++++ >> 2 files changed, 40 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >> index 7523719..36d3390 100644 >> --- a/drivers/xen/events.c >> +++ b/drivers/xen/events.c >> @@ -88,6 +88,7 @@ enum xen_irq_type { >> struct irq_info >> { >> struct list_head list; >> + atomic_t refcount; > > refcnt >Ah yes, vowels are much too valuable to use more than one here...>> enum xen_irq_type type; /* type */ >> unsigned irq; >> unsigned short evtchn; /* event channel */ >> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq) >> panic("Unable to allocate metadata for IRQ%d\n", irq); >> >> info->type = IRQT_UNBOUND; >> + atomic_set(&info->refcount, 1); >> >> irq_set_handler_data(irq, info); >> >> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) >> >> irq_set_handler_data(irq, NULL); >> >> + BUG_ON(atomic_read(&info->refcount) > 1); >> + >> kfree(info); >> >> /* Legacy IRQ descriptors are managed by the arch. */ >> @@ -912,6 +916,10 @@ static void unbind_from_irq(unsigned int irq) >> { >> struct evtchn_close close; >> int evtchn = evtchn_from_irq(irq); >> + struct irq_info *info = irq_get_handler_data(irq); >> + >> + if (!atomic_dec_and_test(&info->refcount)) >> + return; >> >> mutex_lock(&irq_mapping_update_lock); >> >> @@ -1038,6 +1046,32 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) >> } >> EXPORT_SYMBOL_GPL(unbind_from_irqhandler); >> >> +int evtchn_get(unsigned int evtchn) >> +{ >> + int irq = evtchn_to_irq[evtchn]; >> + struct irq_info *info; >> + >> + if (irq == -1) >> + return -ENOENT; >> + >> + info = irq_get_handler_data(irq); >> + >> + if (!info) >> + return -ENOENT; >> + >> + atomic_inc(&info->refcount); >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(evtchn_get); >> + >> +void evtchn_put(unsigned int evtchn) > > The decleration for ''evtchn'' is ''unsigned short'' so that can be > used instead of ''unsigned int''. > >> +{ >> + int irq = evtchn_to_irq[evtchn]; > > Not checking if the irq is valid? Or if the evtchn is valid?All callers currently check this (by the _get function), but it''s probably a good idea to double-check with a WARN_ON just in case it''s not valid.>> + unbind_from_irq(irq); >> +} >> +EXPORT_SYMBOL_GPL(evtchn_put); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) >> { >> int irq = per_cpu(ipi_to_irq, cpu)[vector]; >> diff --git a/include/xen/events.h b/include/xen/events.h >> index d287997..a459cca 100644 >> --- a/include/xen/events.h >> +++ b/include/xen/events.h >> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, >> */ >> void unbind_from_irqhandler(unsigned int irq, void *dev_id); >> >> +/* >> + * Allow extra references to event channels exposed to userspace by evtchn >> + */ >> +int evtchn_get(unsigned int evtchn); >> +void evtchn_put(unsigned int evtchn); >> + >> void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); >> int resend_irq_on_evtchn(unsigned int irq); >> void rebind_evtchn_irq(int evtchn, int irq); >> -- >> 1.7.6.4 >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-25 08:17 UTC
Re: [Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
On Mon, 2011-10-24 at 21:22 +0100, Konrad Rzeszutek Wilk wrote:> > > +void evtchn_put(unsigned int evtchn) > > The decleration for ''evtchn'' is ''unsigned short'' so that can be > used instead of ''unsigned int''.I think I nearly made the same comment but then I looked at drivers/xen/events.c and found that it uses "unsigned", "unsigned short", "unsigned int" and "int" fairly interchangeably. The externally visible API (i.e. include/xen/events.h) tends to stick to just "unsigned int" or "int". The problem with the short types is that they don''t leave room for an error indication, which is why on the Xen tools (libxc) side we have his piece of minor ugliness: typedef int evtchn_port_or_error_t; The actual hypercall interface uses "typedef uint32_t evtchn_port_t" but AIUI real evtchn values are guaranteed to fit in 31 bits. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 19:02 UTC
[Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
> > And cause the event channel refcnt to be set to zero and free it. And then > > causing the box to die - as the event channels for the physical IRQ might have > > gotten free-ed. > > > > Not really. For a given valid event channel E, this will increase the refcnt by one > when i == E, and then decrease refcnt the next time evtchn_get succeeds (for some > other value of i).Oh right. Hmm.. I am having this feeling that it still makes sense to seperate the events that are allocated by grantdev/grantalloc from the ones that are done for in-kernel uses (such as IRQ, MSI, IPI, etc). Basically not trusting the userland with its arguments as much as possible. And yes, I do understand that you need to be a root user to use /dev/gnt*, but I started thinking about QEMU. And Fedora has this concept of making QEMU run in its own SELinux container (and own user) - or perhaps I am confusing this with containers.. Anyhow it runs in one of those quasi-root-but-not-root. My thinking is that it could be possible do with QEMU running under Xen too, but then we have to make sure that all /dev/gnt* ioctls are secure <hand-waving what secure means>. It probably involves more than just what we discussed.> > > Hm.. Perhaps the gntalloc and gntdev should keep track of which event channels > > are OK to refcnt? Something like a whitelist? Granted at that point the refcounting > > could as well be done by the API that sets up the event channels from the userspace. > > Hmm. Perhaps have a magic value for refcount (-1?) that indicates evtchn_get is not > available. That would become the default value of refcnt, and evtchn.c would then > use evtchn_make_refcounted() to change the refcount to 1 and allow _get/_put to work.How would that work when the IRQ subsystem (so everything is setup in the kernel) gets an event? Would the refcount be for that -1.. oh. You would only set the refcnt when the _get/_put calls are made and not when in-kernel calls to setup IRQs are done?> > > So the evtchn_ioctl is pretty smart. It uses "get_port_user" to get the list > > of events that belong to this user (and have been handed out). I think you > > need to use that in the gntalloc to double-check that the event channel is not > > one of the kernel type. > > > >> + } > >> > >> gref->notify.flags = 0; > >> > >> @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, > >> goto unlock_out; > >> } > >> > >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > >> + if (evtchn_get(op.event_channel_port)) { > >> + rc = -EINVAL; > >> + goto unlock_out; > >> + } > >> + } > >> + > >> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > >> + evtchn_put(gref->notify.event); > >> + > >> gref->notify.flags = op.action; > >> gref->notify.pgoff = pgoff; > >> gref->notify.event = op.event_channel_port; > >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > >> index f914b26..cfcc890 100644 > >> --- a/drivers/xen/gntdev.c > >> +++ b/drivers/xen/gntdev.c > >> @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) > >> > >> if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { > >> notify_remote_via_evtchn(map->notify.event); > >> + evtchn_put(map->notify.event); > >> } > >> > >> if (map->pages) { > >> @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) > >> goto unlock_out; > >> } > >> > >> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { > >> + if (evtchn_get(op.event_channel_port)) { > >> + rc = -EINVAL; > >> + goto unlock_out; > >> + } > >> + } > >> + > >> + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) > > > > So notify.flags has not been set yet? That looks to be done later? > > Yep. It''s the previous value (zero if we haven''t called the ioctl yet).OK, can you add a tiny comment so that in a year time the person reading this will have a warm fuzzy feeling..> > > Or is this in case of the user doing > > > > uargs.action = UNMAP_NOTIFY_SEND_EVENT; > > ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); > > uargs.action = UNAMP_NOTIFY_CLEAR_BYTE; > > ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); > > > > and we want to preserve the "old" flags before swapping over to the > > new? > > No. We acquire the new event channel before releasing the old one so that > if we happen to be the only one holding a reference to this event channel, > a change in the byte-clear portion of the notify does not cause us to drop > the event channel.Ok.> > >> + evtchn_put(map->notify.event); > >> + > >> map->notify.flags = op.action; > >> map->notify.addr = op.index - (map->index << PAGE_SHIFT); > >> map->notify.event = op.event_channel_port; > >> -- > >> 1.7.6.4 > > > > > -- > Daniel De Graaf > National Security Agency_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 19:05 UTC
[Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
> >> + atomic_t refcount; > > > > refcnt > > > > Ah yes, vowels are much too valuable to use more than one here...<laughs> Well, optimization!! _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 19:09 UTC
Re: [Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
On Tue, Oct 25, 2011 at 09:17:15AM +0100, Ian Campbell wrote:> On Mon, 2011-10-24 at 21:22 +0100, Konrad Rzeszutek Wilk wrote: > > > > > +void evtchn_put(unsigned int evtchn) > > > > The decleration for ''evtchn'' is ''unsigned short'' so that can be > > used instead of ''unsigned int''. > > I think I nearly made the same comment but then I looked at > drivers/xen/events.c and found that it uses "unsigned", "unsigned > short", "unsigned int" and "int" fairly interchangeably. The externally > visible API (i.e. include/xen/events.h) tends to stick to just "unsigned > int" or "int". > > The problem with the short types is that they don''t leave room for an > error indication, which is why on the Xen tools (libxc) side we have his > piece of minor ugliness: > typedef int evtchn_port_or_error_t; > > The actual hypercall interface uses "typedef uint32_t evtchn_port_t" but > AIUI real evtchn values are guaranteed to fit in 31 bits.Oh fun! So we actually have a bug with ''unsigned short'' cutting it down from 2^31 to 2^16! _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-25 19:41 UTC
[Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On 10/25/2011 03:02 PM, Konrad Rzeszutek Wilk wrote:>>> And cause the event channel refcnt to be set to zero and free it. And then >>> causing the box to die - as the event channels for the physical IRQ might have >>> gotten free-ed. >>> >> >> Not really. For a given valid event channel E, this will increase the refcnt by one >> when i == E, and then decrease refcnt the next time evtchn_get succeeds (for some >> other value of i). > > Oh right. Hmm.. I am having this feeling that it still makes sense to seperate the > events that are allocated by grantdev/grantalloc from the ones that are done > for in-kernel uses (such as IRQ, MSI, IPI, etc). Basically not trusting the userland > with its arguments as much as possible. > > And yes, I do understand that you need to be a root user to use /dev/gnt*, but > I started thinking about QEMU. And Fedora has this concept of making QEMU run in its > own SELinux container (and own user) - or perhaps I am confusing this with containers.. > Anyhow it runs in one of those quasi-root-but-not-root. My thinking is that it could > be possible do with QEMU running under Xen too, but then we have to make sure > that all /dev/gnt* ioctls are secure <hand-waving what secure means>. > > It probably involves more than just what we discussed.That same SELinux category-based isolation mechanism is also a good solution for xen qemu-dm processes, although moving qemu to a stubdom provides better isolation since SELinux currently cannot talk to XSM to determine what domains a particular qemu-dm process should be able to manipulate. Only allowing event channels allocated by userspace to be used in gnt* notify is a good idea, since there''s no reason for userspace to need to manipulate an event channel set up by the kernel.>> >>> Hm.. Perhaps the gntalloc and gntdev should keep track of which event channels >>> are OK to refcnt? Something like a whitelist? Granted at that point the refcounting >>> could as well be done by the API that sets up the event channels from the userspace. >> >> Hmm. Perhaps have a magic value for refcount (-1?) that indicates evtchn_get is not >> available. That would become the default value of refcnt, and evtchn.c would then >> use evtchn_make_refcounted() to change the refcount to 1 and allow _get/_put to work. > > How would that work when the IRQ subsystem (so everything is setup in the kernel) > gets an event? Would the refcount be for that -1.. oh. You would only set > the refcnt when the _get/_put calls are made and not when in-kernel calls to setup > IRQs are done? >Right. The reference count would be a dual-purpose field indicating if the event channel is kernel-internal (value -1) or userspace-visible (reference count > 0). New event channels would start out at -1, and evtchn.c would change them to 1.>> >>> So the evtchn_ioctl is pretty smart. It uses "get_port_user" to get the list >>> of events that belong to this user (and have been handed out). I think you >>> need to use that in the gntalloc to double-check that the event channel is not >>> one of the kernel type. >>> >>>> + } >>>> >>>> gref->notify.flags = 0; >>>> >>>> @@ -396,6 +398,16 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, >>>> goto unlock_out; >>>> } >>>> >>>> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >>>> + if (evtchn_get(op.event_channel_port)) { >>>> + rc = -EINVAL; >>>> + goto unlock_out; >>>> + } >>>> + } >>>> + >>>> + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >>>> + evtchn_put(gref->notify.event); >>>> + >>>> gref->notify.flags = op.action; >>>> gref->notify.pgoff = pgoff; >>>> gref->notify.event = op.event_channel_port; >>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >>>> index f914b26..cfcc890 100644 >>>> --- a/drivers/xen/gntdev.c >>>> +++ b/drivers/xen/gntdev.c >>>> @@ -190,6 +190,7 @@ static void gntdev_put_map(struct grant_map *map) >>>> >>>> if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { >>>> notify_remote_via_evtchn(map->notify.event); >>>> + evtchn_put(map->notify.event); >>>> } >>>> >>>> if (map->pages) { >>>> @@ -596,6 +597,16 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) >>>> goto unlock_out; >>>> } >>>> >>>> + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { >>>> + if (evtchn_get(op.event_channel_port)) { >>>> + rc = -EINVAL; >>>> + goto unlock_out; >>>> + } >>>> + } >>>> + >>>> + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) >>> >>> So notify.flags has not been set yet? That looks to be done later? >> >> Yep. It''s the previous value (zero if we haven''t called the ioctl yet). > > OK, can you add a tiny comment so that in a year time the person reading this > will have a warm fuzzy feeling..OK>> >>> Or is this in case of the user doing >>> >>> uargs.action = UNMAP_NOTIFY_SEND_EVENT; >>> ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); >>> uargs.action = UNAMP_NOTIFY_CLEAR_BYTE; >>> ioctl(.., IOCTL_GNTALLOC_SET_UNMAP_NOTIFY, &uarg); >>> >>> and we want to preserve the "old" flags before swapping over to the >>> new? >> >> No. We acquire the new event channel before releasing the old one so that >> if we happen to be the only one holding a reference to this event channel, >> a change in the byte-clear portion of the notify does not cause us to drop >> the event channel. > > Ok. >> >>>> + evtchn_put(map->notify.event); >>>> + >>>> map->notify.flags = op.action; >>>> map->notify.addr = op.index - (map->index << PAGE_SHIFT); >>>> map->notify.event = op.event_channel_port; >>>> -- >>>> 1.7.6.4_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 19:51 UTC
[Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
> That same SELinux category-based isolation mechanism is also a good solution for xen > qemu-dm processes, although moving qemu to a stubdom provides better isolation since > SELinux currently cannot talk to XSM to determine what domains a particular qemu-dm > process should be able to manipulate.<nods>> > Only allowing event channels allocated by userspace to be used in gnt* notify is > a good idea, since there''s no reason for userspace to need to manipulate an event > channel set up by the kernel. >.. snip..> > > > How would that work when the IRQ subsystem (so everything is setup in the kernel) > > gets an event? Would the refcount be for that -1.. oh. You would only set > > the refcnt when the _get/_put calls are made and not when in-kernel calls to setup > > IRQs are done? > > > > Right. The reference count would be a dual-purpose field indicating if the event > channel is kernel-internal (value -1) or userspace-visible (reference count > 0). > New event channels would start out at -1, and evtchn.c would change them to 1.The tricky bit is going to be with the xen_free_irq which might have to deal with kernel events and grantdev events... oh wait, the event_put is going to decrement it and then call xen_free_irq, so that will come out to be the right number. Looking forward to the patches! Thanks for doing this work. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-25 20:27 UTC
Re: [Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On Tue, 2011-10-25 at 20:41 +0100, Daniel De Graaf wrote:> > >> Hmm. Perhaps have a magic value for refcount (-1?) that indicates > evtchn_get is not > >> available. That would become the default value of refcnt, and > evtchn.c would then > >> use evtchn_make_refcounted() to change the refcount to 1 and allow > _get/_put to work. > > > > How would that work when the IRQ subsystem (so everything is setup > in the kernel) > > gets an event? Would the refcount be for that -1.. oh. You would > only set > > the refcnt when the _get/_put calls are made and not when in-kernel > calls to setup> IRQs are done? > > > > Right. The reference count would be a dual-purpose field indicating if > the event channel is kernel-internal (value -1) or userspace-visible > (reference count > 0). New event channels would start out at -1, and > evtchn.c would change them to 1.Is there any way that the reference count could be made part of the datastructures associated with the /dev/xen/evtchn driver instead of the core evtchn.c stuff? That wouldreduce the chance of current or futures users getting something wrong. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-25 20:42 UTC
Re: [Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On 10/25/2011 04:27 PM, Ian Campbell wrote:> On Tue, 2011-10-25 at 20:41 +0100, Daniel De Graaf wrote: >> >>>> Hmm. Perhaps have a magic value for refcount (-1?) that indicates >> evtchn_get is not >>>> available. That would become the default value of refcnt, and >> evtchn.c would then >>>> use evtchn_make_refcounted() to change the refcount to 1 and allow >> _get/_put to work. >>> >>> How would that work when the IRQ subsystem (so everything is setup >> in the kernel) >>> gets an event? Would the refcount be for that -1.. oh. You would >> only set >>> the refcnt when the _get/_put calls are made and not when in-kernel >> calls to setup> IRQs are done? >>> >> >> Right. The reference count would be a dual-purpose field indicating if >> the event channel is kernel-internal (value -1) or userspace-visible >> (reference count > 0). New event channels would start out at -1, and >> evtchn.c would change them to 1. > > Is there any way that the reference count could be made part of the > datastructures associated with the /dev/xen/evtchn driver instead of the > core evtchn.c stuff? That wouldreduce the chance of current or futures > users getting something wrong. > > Ian.This would require that the gntdev and gntalloc modules have a dependency on the evtchn module, with the evtchn_{get,put} functions moved into that module. The only other way would be to add a refcount maintenance function pointer to the IRQ data structure, which seems like it would lead to more problems than a simple reference count. -- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-25 20:44 UTC
Re: [Xen-devel] Re: [PATCH 1/2] xen/event: Add reference counting to event channel
On Tue, 2011-10-25 at 20:09 +0100, Konrad Rzeszutek Wilk wrote:> On Tue, Oct 25, 2011 at 09:17:15AM +0100, Ian Campbell wrote: > > On Mon, 2011-10-24 at 21:22 +0100, Konrad Rzeszutek Wilk wrote: > > > > > > > +void evtchn_put(unsigned int evtchn) > > > > > > The decleration for ''evtchn'' is ''unsigned short'' so that can be > > > used instead of ''unsigned int''. > > > > I think I nearly made the same comment but then I looked at > > drivers/xen/events.c and found that it uses "unsigned", "unsigned > > short", "unsigned int" and "int" fairly interchangeably. The externally > > visible API (i.e. include/xen/events.h) tends to stick to just "unsigned > > int" or "int". > > > > The problem with the short types is that they don''t leave room for an > > error indication, which is why on the Xen tools (libxc) side we have his > > piece of minor ugliness: > > typedef int evtchn_port_or_error_t; > > > > The actual hypercall interface uses "typedef uint32_t evtchn_port_t" but > > AIUI real evtchn values are guaranteed to fit in 31 bits. > > > Oh fun! So we actually have a bug with ''unsigned short'' cutting it down > from 2^31 to 2^16!The actual maximums are 1024 (on x86_32) and 4096 (on x86_64) so "guaranteed to fit 31 bits" (which I took from the xenctrl.h comment) is more a case of definitely big enough. The use of 31 is more of an allusion to errors being negative, I think. Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-25 20:50 UTC
Re: [Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On Tue, 2011-10-25 at 21:42 +0100, Daniel De Graaf wrote:> On 10/25/2011 04:27 PM, Ian Campbell wrote: > > On Tue, 2011-10-25 at 20:41 +0100, Daniel De Graaf wrote: > >> > >>>> Hmm. Perhaps have a magic value for refcount (-1?) that indicates > >> evtchn_get is not > >>>> available. That would become the default value of refcnt, and > >> evtchn.c would then > >>>> use evtchn_make_refcounted() to change the refcount to 1 and allow > >> _get/_put to work. > >>> > >>> How would that work when the IRQ subsystem (so everything is setup > >> in the kernel) > >>> gets an event? Would the refcount be for that -1.. oh. You would > >> only set > >>> the refcnt when the _get/_put calls are made and not when in-kernel > >> calls to setup> IRQs are done? > >>> > >> > >> Right. The reference count would be a dual-purpose field indicating if > >> the event channel is kernel-internal (value -1) or userspace-visible > >> (reference count > 0). New event channels would start out at -1, and > >> evtchn.c would change them to 1. > > > > Is there any way that the reference count could be made part of the > > datastructures associated with the /dev/xen/evtchn driver instead of the > > core evtchn.c stuff? That wouldreduce the chance of current or futures > > users getting something wrong. > > > > Ian. > > This would require that the gntdev and gntalloc modules have a dependency > on the evtchn module, with the evtchn_{get,put} functions moved into that > module.Don''t they effectively have that already, since the only evtchns you can use with them have come from that module?> The only other way would be to add a refcount maintenance > function pointer to the IRQ data structure, which seems like it would lead > to more problems than a simple reference count. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-25 21:07 UTC
Re: [Xen-devel] Re: [PATCH 2/2] xen/gnt{dev, alloc}: reserve event channels for notify
On 10/25/2011 04:50 PM, Ian Campbell wrote:> On Tue, 2011-10-25 at 21:42 +0100, Daniel De Graaf wrote: >> On 10/25/2011 04:27 PM, Ian Campbell wrote: >>> On Tue, 2011-10-25 at 20:41 +0100, Daniel De Graaf wrote: >>>> >>>>>> Hmm. Perhaps have a magic value for refcount (-1?) that indicates >>>> evtchn_get is not >>>>>> available. That would become the default value of refcnt, and >>>> evtchn.c would then >>>>>> use evtchn_make_refcounted() to change the refcount to 1 and allow >>>> _get/_put to work. >>>>> >>>>> How would that work when the IRQ subsystem (so everything is setup >>>> in the kernel) >>>>> gets an event? Would the refcount be for that -1.. oh. You would >>>> only set >>>>> the refcnt when the _get/_put calls are made and not when in-kernel >>>> calls to setup> IRQs are done? >>>>> >>>> >>>> Right. The reference count would be a dual-purpose field indicating if >>>> the event channel is kernel-internal (value -1) or userspace-visible >>>> (reference count > 0). New event channels would start out at -1, and >>>> evtchn.c would change them to 1. >>> >>> Is there any way that the reference count could be made part of the >>> datastructures associated with the /dev/xen/evtchn driver instead of the >>> core evtchn.c stuff? That wouldreduce the chance of current or futures >>> users getting something wrong. >>> >>> Ian. >> >> This would require that the gntdev and gntalloc modules have a dependency >> on the evtchn module, with the evtchn_{get,put} functions moved into that >> module. > > Don''t they effectively have that already, since the only evtchns you can > use with them have come from that module? >Yes, but the primary purpose of the gntdev/gntalloc modules does not include using event channels (just mapping memory). In practice, however, this is true as shared-memory rings need a notification mechanism if you want to avoid polling.>> The only other way would be to add a refcount maintenance >> function pointer to the IRQ data structure, which seems like it would lead >> to more problems than a simple reference count. >>_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 15:47 UTC
[Xen-devel] [PATCH v3 0/3] Add reference counting to grant notify ioctls
> The current notify ioctls assume that an event channel will not be > closed prior to the page being unmapped. If the mappings are associated > with an open file descriptor and the application crashes, the > notification behavior depends on the close ordering of the file > descriptors. To avoid this, event channels now have a reference count > that is used by the grant notify ioctls to postpone the close operation > until the notification is fired.Changes since v2: Avoid possible sleep under spinlock Decrease refcount mismatch errors from BUG to WARN Use reference count to identify userspace-visible event channels Changes since v1: Rename evtchn_get/put to match kernel naming conventions Use atomic_t for refcount [PATCH 1/3] xen/event: Add reference counting to event channels [PATCH 2/3] xen/gntalloc: Change gref_lock to a mutex [PATCH 3/3] xen/gnt{dev,alloc}: reserve event channels for notify _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 15:47 UTC
[Xen-devel] [PATCH 1/3] xen/event: Add reference counting to event channels
Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. The reference count on new event channels defaults to -1 which indicates the event channel is not referenced outside the kernel; evtchn_get fails if called on such an event channel. The event channels made visible to userspace by evtchn have a normal reference count. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/events.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++- drivers/xen/evtchn.c | 2 +- include/xen/events.h | 7 +++++ 3 files changed, 69 insertions(+), 2 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 073c11d..6054580 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -87,6 +87,7 @@ enum xen_irq_type { */ struct irq_info { struct list_head list; + atomic_t refcnt; enum xen_irq_type type; /* type */ unsigned irq; unsigned short evtchn; /* event channel */ @@ -406,6 +407,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + atomic_set(&info->refcnt, -1); irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + WARN_ON(atomic_read(&info->refcnt) > 0); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -637,7 +641,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, if (irq != -1) { printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n", irq, gsi); - goto out; /* XXX need refcount? */ + goto out; } irq = xen_allocate_irq_gsi(gsi); @@ -939,6 +943,10 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); + + if (atomic_read(&info->refcnt) > 0 && !atomic_dec_and_test(&info->refcnt)) + return; mutex_lock(&irq_mapping_update_lock); @@ -1065,6 +1073,58 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int evtchn_make_refcounted(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + WARN_ON(atomic_read(&info->refcnt) != -1); + + atomic_set(&info->refcnt, 1); + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_make_refcounted); + +int evtchn_get(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + if (atomic_read(&info->refcnt) < 0) + return -EINVAL; + + atomic_inc(&info->refcnt); + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_get); + +void evtchn_put(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + if (WARN_ON(irq == -1)) + return; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(evtchn_put); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index dbc13e9..b1f60a0 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -268,7 +268,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port) rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED, u->name, (void *)(unsigned long)port); if (rc >= 0) - rc = 0; + rc = evtchn_make_refcounted(port); return rc; } diff --git a/include/xen/events.h b/include/xen/events.h index d287997..0f77370 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,13 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int evtchn_make_refcounted(unsigned int evtchn); +int evtchn_get(unsigned int evtchn); +void evtchn_put(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 15:47 UTC
[Xen-devel] [PATCH 2/3] xen/gntalloc: Change gref_lock to a mutex
The event channel release function cannot be called under a spinlock because it can attempt to acquire a mutex due to the event channel reference acquired when setting up unmap notifications. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 41 +++++++++++++++++++++-------------------- 1 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index f6832f4..439352d 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -74,7 +74,7 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be allocated by " "the gntalloc device"); static LIST_HEAD(gref_list); -static DEFINE_SPINLOCK(gref_lock); +static DEFINE_MUTEX(gref_mutex); static int gref_size; struct notify_info { @@ -143,15 +143,15 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op, } /* Add to gref lists. */ - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); list_splice_tail(&queue_gref, &gref_list); list_splice_tail(&queue_file, &priv->list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; undo: - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref_size -= (op->count - i); list_for_each_entry(gref, &queue_file, next_file) { @@ -167,7 +167,7 @@ undo: */ if (unlikely(!list_empty(&queue_gref))) list_splice_tail(&queue_gref, &gref_list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rc; } @@ -251,7 +251,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) pr_debug("%s: priv %p\n", __func__, priv); - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); while (!list_empty(&priv->list)) { gref = list_entry(priv->list.next, struct gntalloc_gref, next_file); @@ -261,7 +261,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) __del_gref(gref); } kfree(priv); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; } @@ -286,21 +286,21 @@ static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, goto out; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); /* Clean up pages that were at zero (local) users but were still mapped * by remote domains. Since those pages count towards the limit that we * are about to enforce, removing them here is a good idea. */ do_cleanup(); if (gref_size + op.count > limit) { - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); rc = -ENOSPC; goto out_free; } gref_size += op.count; op.index = priv->index; priv->index += op.count * PAGE_SIZE; - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); rc = add_grefs(&op, gref_ids, priv); if (rc < 0) @@ -343,7 +343,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, goto dealloc_grant_out; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, op.index, op.count); if (gref) { /* Remove from the file list only, and decrease reference count. @@ -363,7 +363,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, do_cleanup(); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); dealloc_grant_out: return rc; } @@ -383,7 +383,7 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, index = op.index & ~(PAGE_SIZE - 1); pgoff = op.index & (PAGE_SIZE - 1); - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, index, 1); if (!gref) { @@ -400,8 +400,9 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; rc = 0; + unlock_out: - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rc; } @@ -433,9 +434,9 @@ static void gntalloc_vma_open(struct vm_area_struct *vma) if (!gref) return; - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref->users++; - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); } static void gntalloc_vma_close(struct vm_area_struct *vma) @@ -444,11 +445,11 @@ static void gntalloc_vma_close(struct vm_area_struct *vma) if (!gref) return; - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref->users--; if (gref->users == 0) __del_gref(gref); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); } static struct vm_operations_struct gntalloc_vmops = { @@ -471,7 +472,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) return -EINVAL; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count); if (gref == NULL) { rv = -ENOENT; @@ -499,7 +500,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) rv = 0; out_unlock: - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rv; } -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 15:47 UTC
[Xen-devel] [PATCH 3/3] xen/gnt{dev, alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 21 ++++++++++++++++++++- drivers/xen/gntdev.c | 32 ++++++++++++++++++++++++++++++-- 2 files changed, 50 insertions(+), 3 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index 439352d..c95181f 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) tmp[gref->notify.pgoff] = 0; kunmap(gref->page); } - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(gref->notify.event); + evtchn_put(gref->notify.event); + } gref->notify.flags = 0; @@ -396,6 +398,23 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, goto unlock_out; } + /* We need to grab a reference to the event channel we are going to use + * to send the notify before releasing the reference we may already have + * (if someone has called this ioctl twice). This is required so that + * it is possible to change the clear_byte part of the notification + * without disturbing the event channel part, which may now be the last + * reference to that event channel. + */ + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(gref->notify.event); + gref->notify.flags = op.action; gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 5227506..f0bb322 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -193,8 +193,10 @@ static void gntdev_put_map(struct grant_map *map) atomic_sub(map->count, &pages_mapped); - if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(map->notify.event); + evtchn_put(map->notify.event); + } if (map->pages) { if (!use_ptemod) @@ -600,6 +602,8 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) struct ioctl_gntdev_unmap_notify op; struct grant_map *map; int rc; + int old_flags; + unsigned int old_event; if (copy_from_user(&op, u, sizeof(op))) return -EFAULT; @@ -625,10 +629,34 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) goto unlock_out; } + /* We need to grab a reference to the event channel we are going to use + * to send the notify before releasing the reference we may already have + * (if someone has called this ioctl twice). This is required so that + * it is possible to change the clear_byte part of the notification + * without disturbing the event channel part, which may now be the last + * reference to that event channel. + */ + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + old_flags = map->notify.flags; + old_event = map->notify.event; + map->notify.flags = op.action; map->notify.addr = op.index - (map->index << PAGE_SHIFT); map->notify.event = op.event_channel_port; - rc = 0; + spin_unlock(&priv->lock); + + /* Cannot execute evtchn_put under spinlock */ + if (old_flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(old_event); + + return 0; + unlock_out: spin_unlock(&priv->lock); return rc; -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-26 16:51 UTC
[Xen-devel] Re: [PATCH 1/3] xen/event: Add reference counting to event channels
On Wed, 2011-10-26 at 16:47 +0100, Daniel De Graaf wrote:> @@ -939,6 +943,10 @@ static void unbind_from_irq(unsigned int irq) > { > struct evtchn_close close; > int evtchn = evtchn_from_irq(irq); > + struct irq_info *info = irq_get_handler_data(irq); > + > + if (atomic_read(&info->refcnt) > 0 && !atomic_dec_and_test(&info->refcnt)) > + return;This isn''t all that atomic any more... evtchn_make_refcounted() doesn''t seem to have any locking which would save you... Perhaps you could always manipulate this flag under the mapping lock (which perhaps is normally taken around about the sort of place you''d want to do this anyway) and make it non-atomic? Or maybe you could build something with cmpxchg? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-26 16:57 UTC
Re: [Xen-devel] Re: [PATCH 1/3] xen/event: Add reference counting to event channels
On Wed, 2011-10-26 at 17:51 +0100, Ian Campbell wrote:> On Wed, 2011-10-26 at 16:47 +0100, Daniel De Graaf wrote: > > @@ -939,6 +943,10 @@ static void unbind_from_irq(unsigned int irq) > > { > > struct evtchn_close close; > > int evtchn = evtchn_from_irq(irq); > > + struct irq_info *info = irq_get_handler_data(irq); > > + > > + if (atomic_read(&info->refcnt) > 0 && !atomic_dec_and_test(&info->refcnt)) > > + return; > > This isn''t all that atomic any more... > > evtchn_make_refcounted() doesn''t seem to have any locking which would > save you... > > Perhaps you could always manipulate this flag under the mapping lock > (which perhaps is normally taken around about the sort of place you''d > want to do this anyway) and make it non-atomic? > > Or maybe you could build something with cmpxchg?Or atomic_inc_unless_negative seems to be a primitive but there''s no atomic_dev_unless_negative so unref is still tricky. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 17:28 UTC
[Xen-devel] Re: [PATCH 1/3] xen/event: Add reference counting to event channels
On 10/26/2011 12:51 PM, Ian Campbell wrote:> On Wed, 2011-10-26 at 16:47 +0100, Daniel De Graaf wrote: >> @@ -939,6 +943,10 @@ static void unbind_from_irq(unsigned int irq) >> { >> struct evtchn_close close; >> int evtchn = evtchn_from_irq(irq); >> + struct irq_info *info = irq_get_handler_data(irq); >> + >> + if (atomic_read(&info->refcnt) > 0 && !atomic_dec_and_test(&info->refcnt)) >> + return; > > This isn''t all that atomic any more... > > evtchn_make_refcounted() doesn''t seem to have any locking which would > save you... > > Perhaps you could always manipulate this flag under the mapping lock > (which perhaps is normally taken around about the sort of place you''d > want to do this anyway) and make it non-atomic? > > Or maybe you could build something with cmpxchg? > > Ian. >It''s atomic for the cases where it needs to be. There are two cases in which unbind_from_irq can be called: 1. Negative refcnt (to be exact, == -1). This is an internal reference, and unbind_from_irq is only called once as guaranteed by the caller. 2. Positive refcnt. Once refcnt has been changed to positive, it cannot be changed back, so "atomic_read(&info->refcnt) > 0" will always be true, for any number of parallel callers, assuming all callers had a reference to begin with. In this case, only the atomic_dec_and_test needs atomicity to resolve the race when parallel callers are running evtchn_put. The conversion from -1 to positive refcnt is done during the creation of the event channel, before multiple references to the event channel are allowed. There is also no way to convert an event channel from positive to negative (without destroying it completely) so a race from that conversion is also not possible. I considered using atomic_dec_if_positive here, but as the existing condition has no races this is unnecessary. Should this explanation be copied into a comment to avoid future confusion? -- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-26 18:42 UTC
[Xen-devel] [PATCH 1/3 v3.2] xen/event: Add reference counting to event channels
The previous version had a race where running evtchn_get in parallel with an event channel''s destruction could attempt to manipulate an event channel that is in the process of being freed. Using the irq_mapping_update_lock mutex instead of an atomic refcnt prevents this. ------------------------------------------------------>8 Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. The reference count on new event channels defaults to -1 which indicates the event channel is not referenced outside the kernel; evtchn_get fails if called on such an event channel. The event channels made visible to userspace by evtchn have a normal reference count. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/events.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++- drivers/xen/evtchn.c | 2 +- include/xen/events.h | 7 +++++ 3 files changed, 81 insertions(+), 2 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 073c11d..0237629 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -87,6 +87,7 @@ enum xen_irq_type { */ struct irq_info { struct list_head list; + int refcnt; enum xen_irq_type type; /* type */ unsigned irq; unsigned short evtchn; /* event channel */ @@ -406,6 +407,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + info->refcnt = -1; irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + WARN_ON(info->refcnt > 0); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -637,7 +641,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, if (irq != -1) { printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n", irq, gsi); - goto out; /* XXX need refcount? */ + goto out; } irq = xen_allocate_irq_gsi(gsi); @@ -939,9 +943,16 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); mutex_lock(&irq_mapping_update_lock); + if (info->refcnt > 0) { + info->refcnt--; + if (info->refcnt != 0) + goto done; + } + if (VALID_EVTCHN(evtchn)) { close.port = evtchn; if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) @@ -970,6 +981,7 @@ static void unbind_from_irq(unsigned int irq) xen_free_irq(irq); + done: mutex_unlock(&irq_mapping_update_lock); } @@ -1065,6 +1077,66 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int evtchn_make_refcounted(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + WARN_ON(info->refcnt != -1); + + info->refcnt = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_make_refcounted); + +int evtchn_get(unsigned int evtchn) +{ + int irq; + struct irq_info *info; + int err = -ENOENT; + + mutex_lock(&irq_mapping_update_lock); + + irq = evtchn_to_irq[evtchn]; + if (irq == -1) + goto done; + + info = irq_get_handler_data(irq); + + if (!info) + goto done; + + err = -EINVAL; + if (info->refcnt <= 0) + goto done; + + info->refcnt++; + err = 0; + done: + mutex_unlock(&irq_mapping_update_lock); + + return err; +} +EXPORT_SYMBOL_GPL(evtchn_get); + +void evtchn_put(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + if (WARN_ON(irq == -1)) + return; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(evtchn_put); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index dbc13e9..b1f60a0 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -268,7 +268,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port) rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED, u->name, (void *)(unsigned long)port); if (rc >= 0) - rc = 0; + rc = evtchn_make_refcounted(port); return rc; } diff --git a/include/xen/events.h b/include/xen/events.h index d287997..0f77370 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,13 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int evtchn_make_refcounted(unsigned int evtchn); +int evtchn_get(unsigned int evtchn); +void evtchn_put(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Oct-27 09:03 UTC
[Xen-devel] Re: [PATCH 1/3] xen/event: Add reference counting to event channels
On Wed, 2011-10-26 at 18:28 +0100, Daniel De Graaf wrote:> It''s atomic for the cases where it needs to be. There are two cases in which > unbind_from_irq can be called:[...snip...]> Should this explanation be copied into a comment to avoid future confusion?I would have said yes but it looks like you found a different problem and reposted a non-atomic_t/locked version, right? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-27 21:58 UTC
[Xen-devel] [PATCH v4 0/3] Add reference counting to grant notify ioctls
> The current notify ioctls assume that an event channel will not be > closed prior to the page being unmapped. If the mappings are associated > with an open file descriptor and the application crashes, the > notification behavior depends on the close ordering of the file > descriptors. To avoid this, event channels now have a reference count > that is used by the grant notify ioctls to postpone the close operation > until the notification is fired.Changes since v3: Avoid race between evtchn_get and evtchn_put (reference count changed to an integer protected by mutex) Changes since v2: Avoid possible sleep under spinlock Decrease refcount mismatch errors from BUG to WARN Use reference count to identify userspace-visible event channels Changes since v1: Rename evtchn_get/put to match kernel naming conventions Use atomic_t for refcount [PATCH 1/3] xen/event: Add reference counting to event channels [PATCH 2/3] xen/gntalloc: Change gref_lock to a mutex [PATCH 3/3] xen/gnt{dev,alloc}: reserve event channels for notify _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-27 21:58 UTC
[Xen-devel] [PATCH 1/3] xen/event: Add reference counting to event channels
Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. The reference count on new event channels defaults to -1 which indicates the event channel is not referenced outside the kernel; evtchn_get fails if called on such an event channel. The event channels made visible to userspace by evtchn have a normal reference count. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/events.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++- drivers/xen/evtchn.c | 2 +- include/xen/events.h | 7 +++++ 3 files changed, 81 insertions(+), 2 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 073c11d..0237629 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -87,6 +87,7 @@ enum xen_irq_type { */ struct irq_info { struct list_head list; + int refcnt; enum xen_irq_type type; /* type */ unsigned irq; unsigned short evtchn; /* event channel */ @@ -406,6 +407,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + info->refcnt = -1; irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + WARN_ON(info->refcnt > 0); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -637,7 +641,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, if (irq != -1) { printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n", irq, gsi); - goto out; /* XXX need refcount? */ + goto out; } irq = xen_allocate_irq_gsi(gsi); @@ -939,9 +943,16 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); mutex_lock(&irq_mapping_update_lock); + if (info->refcnt > 0) { + info->refcnt--; + if (info->refcnt != 0) + goto done; + } + if (VALID_EVTCHN(evtchn)) { close.port = evtchn; if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) @@ -970,6 +981,7 @@ static void unbind_from_irq(unsigned int irq) xen_free_irq(irq); + done: mutex_unlock(&irq_mapping_update_lock); } @@ -1065,6 +1077,66 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int evtchn_make_refcounted(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + WARN_ON(info->refcnt != -1); + + info->refcnt = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_make_refcounted); + +int evtchn_get(unsigned int evtchn) +{ + int irq; + struct irq_info *info; + int err = -ENOENT; + + mutex_lock(&irq_mapping_update_lock); + + irq = evtchn_to_irq[evtchn]; + if (irq == -1) + goto done; + + info = irq_get_handler_data(irq); + + if (!info) + goto done; + + err = -EINVAL; + if (info->refcnt <= 0) + goto done; + + info->refcnt++; + err = 0; + done: + mutex_unlock(&irq_mapping_update_lock); + + return err; +} +EXPORT_SYMBOL_GPL(evtchn_get); + +void evtchn_put(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + if (WARN_ON(irq == -1)) + return; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(evtchn_put); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index dbc13e9..b1f60a0 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -268,7 +268,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port) rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED, u->name, (void *)(unsigned long)port); if (rc >= 0) - rc = 0; + rc = evtchn_make_refcounted(port); return rc; } diff --git a/include/xen/events.h b/include/xen/events.h index d287997..0f77370 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,13 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int evtchn_make_refcounted(unsigned int evtchn); +int evtchn_get(unsigned int evtchn); +void evtchn_put(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-27 21:58 UTC
[Xen-devel] [PATCH 2/3] xen/gntalloc: Change gref_lock to a mutex
The event channel release function cannot be called under a spinlock because it can attempt to acquire a mutex due to the event channel reference acquired when setting up unmap notifications. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 41 +++++++++++++++++++++-------------------- 1 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index f6832f4..439352d 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -74,7 +74,7 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be allocated by " "the gntalloc device"); static LIST_HEAD(gref_list); -static DEFINE_SPINLOCK(gref_lock); +static DEFINE_MUTEX(gref_mutex); static int gref_size; struct notify_info { @@ -143,15 +143,15 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op, } /* Add to gref lists. */ - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); list_splice_tail(&queue_gref, &gref_list); list_splice_tail(&queue_file, &priv->list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; undo: - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref_size -= (op->count - i); list_for_each_entry(gref, &queue_file, next_file) { @@ -167,7 +167,7 @@ undo: */ if (unlikely(!list_empty(&queue_gref))) list_splice_tail(&queue_gref, &gref_list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rc; } @@ -251,7 +251,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) pr_debug("%s: priv %p\n", __func__, priv); - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); while (!list_empty(&priv->list)) { gref = list_entry(priv->list.next, struct gntalloc_gref, next_file); @@ -261,7 +261,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) __del_gref(gref); } kfree(priv); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; } @@ -286,21 +286,21 @@ static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, goto out; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); /* Clean up pages that were at zero (local) users but were still mapped * by remote domains. Since those pages count towards the limit that we * are about to enforce, removing them here is a good idea. */ do_cleanup(); if (gref_size + op.count > limit) { - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); rc = -ENOSPC; goto out_free; } gref_size += op.count; op.index = priv->index; priv->index += op.count * PAGE_SIZE; - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); rc = add_grefs(&op, gref_ids, priv); if (rc < 0) @@ -343,7 +343,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, goto dealloc_grant_out; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, op.index, op.count); if (gref) { /* Remove from the file list only, and decrease reference count. @@ -363,7 +363,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, do_cleanup(); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); dealloc_grant_out: return rc; } @@ -383,7 +383,7 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, index = op.index & ~(PAGE_SIZE - 1); pgoff = op.index & (PAGE_SIZE - 1); - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, index, 1); if (!gref) { @@ -400,8 +400,9 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; rc = 0; + unlock_out: - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rc; } @@ -433,9 +434,9 @@ static void gntalloc_vma_open(struct vm_area_struct *vma) if (!gref) return; - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref->users++; - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); } static void gntalloc_vma_close(struct vm_area_struct *vma) @@ -444,11 +445,11 @@ static void gntalloc_vma_close(struct vm_area_struct *vma) if (!gref) return; - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref->users--; if (gref->users == 0) __del_gref(gref); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); } static struct vm_operations_struct gntalloc_vmops = { @@ -471,7 +472,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) return -EINVAL; } - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count); if (gref == NULL) { rv = -ENOENT; @@ -499,7 +500,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) rv = 0; out_unlock: - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rv; } -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2011-Oct-27 21:58 UTC
[Xen-devel] [PATCH 3/3] xen/gnt{dev, alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntalloc.c | 21 ++++++++++++++++++++- drivers/xen/gntdev.c | 31 ++++++++++++++++++++++++++++++- 2 files changed, 50 insertions(+), 2 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index 439352d..c95181f 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref) tmp[gref->notify.pgoff] = 0; kunmap(gref->page); } - if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(gref->notify.event); + evtchn_put(gref->notify.event); + } gref->notify.flags = 0; @@ -396,6 +398,23 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv, goto unlock_out; } + /* We need to grab a reference to the event channel we are going to use + * to send the notify before releasing the reference we may already have + * (if someone has called this ioctl twice). This is required so that + * it is possible to change the clear_byte part of the notification + * without disturbing the event channel part, which may now be the last + * reference to that event channel. + */ + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) { + rc = -EINVAL; + goto unlock_out; + } + } + + if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(gref->notify.event); + gref->notify.flags = op.action; gref->notify.pgoff = pgoff; gref->notify.event = op.event_channel_port; diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 5227506..f87ce52 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -193,8 +193,10 @@ static void gntdev_put_map(struct grant_map *map) atomic_sub(map->count, &pages_mapped); - if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(map->notify.event); + evtchn_put(map->notify.event); + } if (map->pages) { if (!use_ptemod) @@ -600,6 +602,8 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) struct ioctl_gntdev_unmap_notify op; struct grant_map *map; int rc; + int out_flags; + unsigned int out_event; if (copy_from_user(&op, u, sizeof(op))) return -EFAULT; @@ -607,6 +611,21 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) if (op.action & ~(UNMAP_NOTIFY_CLEAR_BYTE|UNMAP_NOTIFY_SEND_EVENT)) return -EINVAL; + /* We need to grab a reference to the event channel we are going to use + * to send the notify before releasing the reference we may already have + * (if someone has called this ioctl twice). This is required so that + * it is possible to change the clear_byte part of the notification + * without disturbing the event channel part, which may now be the last + * reference to that event channel. + */ + if (op.action & UNMAP_NOTIFY_SEND_EVENT) { + if (evtchn_get(op.event_channel_port)) + return -EINVAL; + } + + out_flags = op.action; + out_event = op.event_channel_port; + spin_lock(&priv->lock); list_for_each_entry(map, &priv->maps, next) { @@ -625,12 +644,22 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u) goto unlock_out; } + out_flags = map->notify.flags; + out_event = map->notify.event; + map->notify.flags = op.action; map->notify.addr = op.index - (map->index << PAGE_SHIFT); map->notify.event = op.event_channel_port; + rc = 0; + unlock_out: spin_unlock(&priv->lock); + + /* Drop the reference to the event channel we did not save in the map */ + if (out_flags & UNMAP_NOTIFY_SEND_EVENT) + evtchn_put(out_event); + return rc; } -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-11 20:33 UTC
[Xen-devel] Re: [PATCH v4 0/3] Add reference counting to grant notify ioctls
On Thu, Oct 27, 2011 at 05:58:46PM -0400, Daniel De Graaf wrote:> > The current notify ioctls assume that an event channel will not be > > closed prior to the page being unmapped. If the mappings are associated > > with an open file descriptor and the application crashes, the > > notification behavior depends on the close ordering of the file > > descriptors. To avoid this, event channels now have a reference count > > that is used by the grant notify ioctls to postpone the close operation > > until the notification is fired.OK, it all looks sensible to me. Sticking on my 3.3 queue. Thx!> > Changes since v3: > Avoid race between evtchn_get and evtchn_put > (reference count changed to an integer protected by mutex) > > Changes since v2: > Avoid possible sleep under spinlock > Decrease refcount mismatch errors from BUG to WARN > Use reference count to identify userspace-visible event channels > > Changes since v1: > Rename evtchn_get/put to match kernel naming conventions > Use atomic_t for refcount > > [PATCH 1/3] xen/event: Add reference counting to event channels > [PATCH 2/3] xen/gntalloc: Change gref_lock to a mutex > [PATCH 3/3] xen/gnt{dev,alloc}: reserve event channels for notify_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel