This fixes a bug triggered by moving events between queues (either by changing their VCPU or by changing their priority).

Testing has been done with a process continually moving all event channels between VCPUs. This would previously fail in under an hour, but with this fix the system stayed up for over 10 days.

It has also been through a reduced set of XenServer's automated tests and no issues were found. I would have preferred to run through the full set of tests but it didn't look like I'd get a slot before the first 4.4 release candidate.

Changes in v6:
- Limit loop to acquire old_q->lock to 3 iterations.

Changes in v5:
- Only set READY bits for new heads.
- Rework old tail bug fix to cover all cases.

Changes in v4:
- const struct domain *
- Clear BUSY with existing cmpxchg() where possible.
- Fix BUSY bit debug output.

Changes in v3:
- Use a new BUSY bit to block guests from clearing UNMASKED, this is lower overhead than the previous solution (which required a hypercall).
- Fix another problem with moving events between queues.
- Add evtchn->last_vcpu_id and evtchn->last_priority instead of evtchn->q. This keeps the structure at 32 bytes long.

Changes in v2:
- Add MAINTAINERS patch
- Remove some unnecessary temporary pending state clears
- Add fix for DoS

David
David Vrabel
2013-Dec-06 17:38 UTC
[PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
From: David Vrabel <david.vrabel@citrix.com>

An event may still be the tail of a queue even if the queue is now empty (an 'old tail' event). There is logic to handle the case when this old tail event needs to be added to the now empty queue (by checking for q->tail == port).

However, this does not cover all cases.

1. An old tail may be re-added simultaneously with another event. LINKED is set on the old tail, and the other CPU may misinterpret this as the old tail still being valid and set LINK instead of HEAD. All events on this queue will then be lost.

2. If the old tail event on queue A is moved to a different queue B (by changing its VCPU or priority), the event may then be linked onto queue B. When another event is linked onto queue A it will check the old tail, see that it is linked (but on queue B) and overwrite the LINK field, corrupting both queues.

When an event is linked, save the vcpu id and priority of the queue it is being linked onto. Use this when linking an event to check if it is an unlinked old tail event. If it is an old tail event, the old queue is empty and old_q->tail is invalidated to ensure adding another event to old_q will update HEAD. The tail is invalidated by setting it to 0 since the event 0 is never linked.

The old_q->lock is held while setting LINKED to avoid the race with the test of LINKED in evtchn_fifo_set_link().

Since an event channel may move queues after old_q->lock is acquired, we must check that we have the correct lock and retry if not. Since changes of VCPU or priority are expected to be rare events that are serialized in the guest, we try at most 3 times before dropping the event. This prevents a malicious guest from repeatedly adjusting priority to prevent another domain from acquiring old_q->lock.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_fifo.c | 80 ++++++++++++++++++++++++++++++++++++++++-------
 xen/include/xen/sched.h | 2 +
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 6048784..b29297f 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -34,6 +34,36 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
     return d->evtchn_fifo->event_array[p] + w;
 }
 
+static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
+                                                struct evtchn *evtchn,
+                                                unsigned long *flags)
+{
+    struct vcpu *v;
+    struct evtchn_fifo_queue *q, *old_q;
+    unsigned int try;
+
+    for ( try = 0; try < 3; try++ )
+    {
+        v = d->vcpu[evtchn->last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        spin_lock_irqsave(&old_q->lock, *flags);
+
+        v = d->vcpu[evtchn->last_vcpu_id];
+        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        if ( old_q == q )
+            return old_q;
+
+        spin_unlock_irqrestore(&old_q->lock, *flags);
+    }
+
+    gdprintk(XENLOG_WARNING,
+             "domain %d, port %d lost event (too many queue changes)\n",
+             d->domain_id, evtchn->port);
+    return NULL;
+}
+
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -103,7 +133,6 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
     struct domain *d = v->domain;
     unsigned int port;
     event_word_t *word;
-    struct evtchn_fifo_queue *q;
     unsigned long flags;
     bool_t was_pending;
 
@@ -120,25 +149,52 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         return;
     }
 
-    /*
-     * No locking around getting the queue. This may race with
-     * changing the priority but we are allowed to signal the event
-     * once on the old priority.
-     */
-    q = &v->evtchn_fifo->queue[evtchn->priority];
-
     was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word);
 
     /*
      * Link the event if it unmasked and not already linked.
      */
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
-         && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+         && !test_bit(EVTCHN_FIFO_LINKED, word) )
     {
+        struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
         bool_t linked = 0;
 
-        spin_lock_irqsave(&q->lock, flags);
+        /*
+         * No locking around getting the queue. This may race with
+         * changing the priority but we are allowed to signal the
+         * event once on the old priority.
+         */
+        q = &v->evtchn_fifo->queue[evtchn->priority];
+
+        old_q = lock_old_queue(d, evtchn, &flags);
+        if ( !old_q )
+            goto done;
+
+        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+        {
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            goto done;
+        }
+
+        /*
+         * If this event was a tail, the old queue is now empty and
+         * its tail must be invalidated to prevent adding an event to
+         * the old queue from corrupting the new queue.
+         */
+        if ( old_q->tail == port )
+            old_q->tail = 0;
+
+        /* Moved to a different queue? */
+        if ( old_q != q )
+        {
+            evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
+            evtchn->last_priority = evtchn->priority;
+
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            spin_lock_irqsave(&q->lock, flags);
+        }
 
         /*
          * Atomically link the tail to port iff the tail is linked.
@@ -150,7 +206,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
          * If the queue is empty (i.e., we haven't linked to the new
          * event), head must be updated.
          */
-        if ( port != q->tail )
+        if ( q->tail )
         {
             tail_word = evtchn_fifo_word_from_port(d, q->tail);
             linked = evtchn_fifo_set_link(d, tail_word, port);
@@ -166,7 +222,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
                                   &v->evtchn_fifo->control_block->ready) )
             vcpu_mark_events_pending(v);
     }
-
+ done:
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index cbdf377..5ab92dd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -98,6 +98,8 @@ struct evtchn
     } u;
     u8 priority;
     u8 pending:1;
+    u16 last_vcpu_id;
+    u8 last_priority;
 #ifdef FLASK_ENABLE
     void *ssid;
 #endif
-- 
1.7.2.5
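Case 2 from the commit message (an old tail moved from queue A to queue B) can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the hypervisor code: the `toy_*` names, the array-based state and the absence of any locking or atomics are all invented for illustration. The parts taken from the patch are that tail value 0 means "no valid tail" and that the old queue's tail is invalidated when the event being linked was that queue's tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_PORTS  8   /* hypothetical sizes, for illustration only */
#define TOY_NR_QUEUES 2

struct toy_event {
    bool linked;          /* models the LINKED bit */
    uint32_t link;        /* models the LINK field (next port, 0 = none) */
    unsigned int last_q;  /* models (last_vcpu_id, last_priority) */
};

struct toy_queue {
    uint32_t head;        /* consumer side, 0 = empty */
    uint32_t tail;        /* producer side, 0 = no valid tail */
};

struct toy_state {
    struct toy_event ev[TOY_NR_PORTS];
    struct toy_queue q[TOY_NR_QUEUES];
};

/* Link 'port' onto queue 'new_q'. Returns false if it was already linked. */
bool toy_link(struct toy_state *s, uint32_t port, unsigned int new_q)
{
    struct toy_event *e;
    struct toy_queue *old_q, *q;

    assert(port != 0 && port < TOY_NR_PORTS && new_q < TOY_NR_QUEUES);

    e = &s->ev[port];
    if (e->linked)
        return false;

    old_q = &s->q[e->last_q];
    q = &s->q[new_q];
    e->linked = true;

    /* An unlinked event that is still old_q's tail is an old tail:
     * old_q is empty, so invalidate its tail. */
    if (old_q->tail == port)
        old_q->tail = 0;

    e->last_q = new_q;

    /* Chain onto a valid, still-linked tail; otherwise start a new head. */
    if (q->tail != 0 && s->ev[q->tail].linked)
        s->ev[q->tail].link = port;
    else
        q->head = port;
    q->tail = port;
    return true;
}

/* Model of the guest consuming the head. The tail is deliberately left
 * untouched, which is what creates an old tail. */
uint32_t toy_consume(struct toy_state *s, unsigned int qi)
{
    struct toy_queue *q = &s->q[qi];
    uint32_t port = q->head;

    if (port == 0)
        return 0;
    q->head = s->ev[port].link;
    s->ev[port].link = 0;
    s->ev[port].linked = false;
    return port;
}
```

In this model, linking port 1 onto queue 0, consuming it, and then linking it onto queue 1 leaves queue 0 with tail 0. A later link of port 2 onto queue 0 therefore updates that queue's head instead of writing into port 1's link field, which by then belongs to queue 1.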
Jan Beulich
2013-Dec-09 09:32 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -98,6 +98,8 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    u16 last_vcpu_id;
> +    u8 last_priority;

Is it really correct for these two new fields to remain uninitialized until evtchn_fifo_set_pending() would get run the first time (and hence thinking there was a move this first time through)?

Which also gets me to ask whether it's really correct to only set the priority to EVTCHN_FIFO_PRIORITY_DEFAULT in setup_ports(), but not on any subsequently allocated/bound ones?

Jan
David Vrabel
2013-Dec-09 11:49 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
On 09/12/13 09:32, Jan Beulich wrote:
>>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -98,6 +98,8 @@ struct evtchn
>>      } u;
>>      u8 priority;
>>      u8 pending:1;
>> +    u16 last_vcpu_id;
>> +    u8 last_priority;
>
> Is it really correct for these two new fields to remain uninitialized
> until evtchn_fifo_set_pending() would get run the first time (and
> hence thinking there was a move this first time through)?

They're initialized to zero and I think this is fine. The code as-is is simpler than having to special case events that have never been on a queue.

> Which also gets me to ask whether it's really correct to only set
> the priority to EVTCHN_FIFO_PRIORITY_DEFAULT in setup_ports(),
> but not on any subsequently allocated/bound ones?

This patch fixes this but would you prefer a new evtchn_port_op hook for the init?

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -141,6 +141,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
             return NULL;
         }
         chn[i].port = port + i;
+        chn[i].priority = EVTCHN_FIFO_PRIORITY_DEFAULT;
     }
     return chn;
 }
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index b29297f..394879e 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -444,7 +444,6 @@ static void setup_ports(struct domain *d)
      * For each port that is already bound:
      *
      * - save its pending state.
-     * - set default priority.
      */
     for ( port = 1; port < d->max_evtchns; port++ )
     {
@@ -457,8 +456,6 @@ static void setup_ports(struct domain *d)
 
         if ( test_bit(port, &shared_info(d, evtchn_pending)) )
             evtchn->pending = 1;
-
-        evtchn_fifo_set_priority(d, evtchn, EVTCHN_FIFO_PRIORITY_DEFAULT);
     }
 }

David
Jan Beulich
2013-Dec-09 12:21 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
>>> On 09.12.13 at 12:49, David Vrabel <david.vrabel@citrix.com> wrote:
> On 09/12/13 09:32, Jan Beulich wrote:
>>>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -98,6 +98,8 @@ struct evtchn
>>>      } u;
>>>      u8 priority;
>>>      u8 pending:1;
>>> +    u16 last_vcpu_id;
>>> +    u8 last_priority;
>>
>> Is it really correct for these two new fields to remain uninitialized
>> until evtchn_fifo_set_pending() would get run the first time (and
>> hence thinking there was a move this first time through)?
>
> They're initialized to zero and I think this is fine. The code as-is is
> simpler than having to special case events that have never been on a queue.

I'm not asking to add a special case, I'm only asking to initialize all fields correctly. Just like you ought to set up ->priority, you likely ought to set up the two new fields.

>> Which also gets me to ask whether it's really correct to only set
>> the priority to EVTCHN_FIFO_PRIORITY_DEFAULT in setup_ports(),
>> but not on any subsequently allocated/bound ones?
>
> This patch fixes this but would you prefer a new evtchn_port_op hook for
> the init?

Not really, and yes please (or perhaps it would be cheaper to have struct evtchn_port_ops specify the intended default priority, and common code simply copy that field).

> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -141,6 +141,7 @@ static struct evtchn *alloc_evtchn_bucket(struct
> domain *d, unsigned int port)
>              return NULL;
>          }
>          chn[i].port = port + i;
> +        chn[i].priority = EVTCHN_FIFO_PRIORITY_DEFAULT;

"Not really" because this takes care of only the case where the bucket gets allocated, but doesn't seem to take care of event channel slots getting re-used.

Jan

> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -444,7 +444,6 @@ static void setup_ports(struct domain *d)
>       * For each port that is already bound:
>       *
>       * - save its pending state.
> -     * - set default priority.
>       */
>      for ( port = 1; port < d->max_evtchns; port++ )
>      {
> @@ -457,8 +456,6 @@ static void setup_ports(struct domain *d)
>
>          if ( test_bit(port, &shared_info(d, evtchn_pending)) )
>              evtchn->pending = 1;
> -
> -        evtchn_fifo_set_priority(d, evtchn, EVTCHN_FIFO_PRIORITY_DEFAULT);
>      }
>  }
>
> David
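The alternative floated in this reply (the port ops specify a default priority and common code copies it whenever a slot is bound, including a re-used slot) might look roughly like the following. This is a hypothetical sketch only: none of the `toy_*` types or functions exist in Xen, the numeric default is a placeholder, and only the priority is reset here, so it does not address the concern that an unbound event may still be an old tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRIORITY_DEFAULT 7   /* placeholder value for illustration */

/* Hypothetical stand-in for the per-ABI port operations table. */
struct toy_port_ops {
    uint8_t default_priority;
};

/* Hypothetical stand-in for the per-port tracking structure. */
struct toy_evtchn {
    uint8_t priority;
    uint8_t in_use;
};

/* Common bind path: always start from the ABI's default priority, whether
 * the slot is freshly allocated or being re-used. */
void toy_bind_port(struct toy_evtchn *chn, const struct toy_port_ops *ops)
{
    assert(chn != NULL && ops != NULL);
    chn->priority = ops->default_priority;
    chn->in_use = 1;
}

/* Freeing a slot leaves its old priority behind, as in a re-used slot. */
void toy_free_port(struct toy_evtchn *chn)
{
    assert(chn != NULL);
    chn->in_use = 0;
}
```

With this shape, a slot whose priority was changed by a previous user comes back at the default on the next bind without a separate set-priority call from the guest.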
David Vrabel
2013-Dec-09 12:56 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
On 09/12/13 12:21, Jan Beulich wrote:
>>>> On 09.12.13 at 12:49, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 09/12/13 09:32, Jan Beulich wrote:
>>>>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> --- a/xen/include/xen/sched.h
>>>> +++ b/xen/include/xen/sched.h
>>>> @@ -98,6 +98,8 @@ struct evtchn
>>>>      } u;
>>>>      u8 priority;
>>>>      u8 pending:1;
>>>> +    u16 last_vcpu_id;
>>>> +    u8 last_priority;
>>>
>>> Is it really correct for these two new fields to remain uninitialized
>>> until evtchn_fifo_set_pending() would get run the first time (and
>>> hence thinking there was a move this first time through)?
>>
>> They're initialized to zero and I think this is fine. The code as-is is
>> simpler than having to special case events that have never been on a queue.
>
> I'm not asking to add a special case, I'm only asking to initialize all
> fields correctly. Just like you ought to set up ->priority, you
> likely ought to set up the two new fields.

It's not clear how you think they're not initialized. They're initialized to zero when the evtchn is allocated and then they must only be set in evtchn_fifo_set_pending() when they move to a new queue.

Do you think they should be initialized when an event is (re)bound? Because this would be broken as an unbound event might be an old tail.

>>> Which also gets me to ask whether it's really correct to only set
>>> the priority to EVTCHN_FIFO_PRIORITY_DEFAULT in setup_ports(),
>>> but not on any subsequently allocated/bound ones?
>>
>> This patch fixes this but would you prefer a new evtchn_port_op hook for
>> the init?
>
> Not really, and yes please (or perhaps it would be cheaper to have
> struct evtchn_port_ops specify the intended default priority, and
> common code simply copy that field).

Ok.

David
Jan Beulich
2013-Dec-09 13:10 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
>>> On 09.12.13 at 13:56, David Vrabel <david.vrabel@citrix.com> wrote:
> On 09/12/13 12:21, Jan Beulich wrote:
>>>>> On 09.12.13 at 12:49, David Vrabel <david.vrabel@citrix.com> wrote:
>>> On 09/12/13 09:32, Jan Beulich wrote:
>>>>>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>> --- a/xen/include/xen/sched.h
>>>>> +++ b/xen/include/xen/sched.h
>>>>> @@ -98,6 +98,8 @@ struct evtchn
>>>>>      } u;
>>>>>      u8 priority;
>>>>>      u8 pending:1;
>>>>> +    u16 last_vcpu_id;
>>>>> +    u8 last_priority;
>>>>
>>>> Is it really correct for these two new fields to remain uninitialized
>>>> until evtchn_fifo_set_pending() would get run the first time (and
>>>> hence thinking there was a move this first time through)?
>>>
>>> They're initialized to zero and I think this is fine. The code as-is is
>>> simpler than having to special case events that have never been on a queue.
>>
>> I'm not asking to add a special case, I'm only asking to initialize all
>> fields correctly. Just like you ought to set up ->priority, you
>> likely ought to set up the two new fields.
>
> It's not clear how you think they're not initialized. They're
> initialized to zero when the evtchn is allocated and then they must only
> be set in evtchn_fifo_set_pending() when they move to a new queue.

My primary concern is with them being zero (and hence out of sync with the real values that things start out with), there may be subtle corruption later on. Secondary is that - as said - this would at least trigger one unnecessary move in evtchn_fifo_set_pending().

> Do you think they should be initialized when an event is (re)bound?
> Because this would be broken as an unbound event might be an old tail.

But if you don't do this, then you _require_ a set-priority operation, yet that one's necessarily non-atomic with the bind. Newly created event channels should start out at the default priority irrespective of what the underlying tracking structure in the hypervisor was used for before. If that causes an issue with other state, then this needs addressing (and should not serve as an excuse to leave things in an unpredictable - from the guest's perspective - state).

Jan
David Vrabel
2013-Dec-09 14:43 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
On 09/12/13 13:10, Jan Beulich wrote:
>>>> On 09.12.13 at 13:56, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 09/12/13 12:21, Jan Beulich wrote:
>>>>>> On 09.12.13 at 12:49, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> On 09/12/13 09:32, Jan Beulich wrote:
>>>>>>>> On 06.12.13 at 18:38, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>> --- a/xen/include/xen/sched.h
>>>>>> +++ b/xen/include/xen/sched.h
>>>>>> @@ -98,6 +98,8 @@ struct evtchn
>>>>>>      } u;
>>>>>>      u8 priority;
>>>>>>      u8 pending:1;
>>>>>> +    u16 last_vcpu_id;
>>>>>> +    u8 last_priority;
>>>>>
>>>>> Is it really correct for these two new fields to remain uninitialized
>>>>> until evtchn_fifo_set_pending() would get run the first time (and
>>>>> hence thinking there was a move this first time through)?
>>>>
>>>> They're initialized to zero and I think this is fine. The code as-is is
>>>> simpler than having to special case events that have never been on a queue.
>>>
>>> I'm not asking to add a special case, I'm only asking to initialize all
>>> fields correctly. Just like you ought to set up ->priority, you
>>> likely ought to set up the two new fields.
>>
>> It's not clear how you think they're not initialized. They're
>> initialized to zero when the evtchn is allocated and then they must only
>> be set in evtchn_fifo_set_pending() when they move to a new queue.
>
> My primary concern is with them being zero (and hence out of sync
> with the real values that things start out with), there may be subtle
> corruption later on. Secondary is that - as said - this would at least
> trigger one unnecessary move in evtchn_fifo_set_pending().

(0, 0) is still a valid queue and it is always safe to do:

    if ( old_q->tail == port )
        old_q->tail = 0

so I'm not seeing any risk of subtle corruption anywhere.

An unnecessary move once per port is hardly expensive so not something I would introduce complexity in the common case to avoid.

>> Do you think they should be initialized when an event is (re)bound?
>> Because this would be broken as an unbound event might be an old tail.
>
> But if you don't do this, then you _require_ a set-priority operation,
> yet that one's necessarily non-atomic with the bind. Newly created
> event channels should start out at the default priority irrespective
> of what the underlying tracking structure in the hypervisor was
> used for before.

Xen can only move an event between queues if that event isn't on a queue. It is also not notified when an event is removed from a queue.

The guest can ensure a predictable state by only unbinding events that are not currently on a queue. e.g.,

    /* prevent it becoming LINKED. */
    set_bit(word, MASKED)
    /* wait for interrupt handlers to drain event from its queue. */
    while (test_bit(word, LINKED))
        ;
    /* Unlinked and masked, safe to unbind. If this port is bound again
       it will become pending on the correct new queue. */
    unbind()

There doesn't need to be anything added to Xen to support this.

The guest may need to defer the wait and unbind to a work queue or similar.

David
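The guest-side sequence outlined in this message (mask, wait for LINKED to clear, then unbind) could be sketched with C11 atomics as below. This is an illustration under assumptions rather than guest kernel code: the `TOY_*` bit positions and the function name are invented here, the event word is modelled as a plain atomic 32-bit value, and the spin is bounded so that a caller can fall back to deferring the work, which is one reading of the "work queue or similar" remark.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; consult the real ABI header before reuse. */
#define TOY_LINKED_BIT (UINT32_C(1) << 29)
#define TOY_MASKED_BIT (UINT32_C(1) << 30)

/*
 * Mask the event so it cannot become LINKED again, then poll until any
 * existing LINKED state has been drained by the event handlers.
 *
 * Returns true when the word is masked and unlinked (safe to unbind now),
 * false if LINKED was still set after max_spins re-checks, in which case
 * the caller would retry later instead of unbinding.
 */
bool toy_quiesce_before_unbind(_Atomic uint32_t *word, unsigned long max_spins)
{
    unsigned long i;

    assert(word != 0);

    /* Prevent it becoming LINKED. */
    atomic_fetch_or(word, TOY_MASKED_BIT);

    /* Wait for handlers to drain the event from its queue. */
    for (i = 0; ; i++) {
        if (!(atomic_load(word) & TOY_LINKED_BIT))
            return true;
        if (i >= max_spins)
            return false;
    }
}
```

A caller would issue the actual unbind only after this returns true; on false the event stays masked, so it is safe to try again later.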
Jan Beulich
2013-Dec-09 15:29 UTC
Re: [PATCH] evtchn/fifo: don't corrupt queues if an old tail is linked
>>> On 09.12.13 at 15:43, David Vrabel <david.vrabel@citrix.com> wrote:
> On 09/12/13 13:10, Jan Beulich wrote:
>>>>> On 09.12.13 at 13:56, David Vrabel <david.vrabel@citrix.com> wrote:
>>> Do you think they should be initialized when an event is (re)bound?
>>> Because this would be broken as an unbound event might be an old tail.
>>
>> But if you don't do this, then you _require_ a set-priority operation,
>> yet that one's necessarily non-atomic with the bind. Newly created
>> event channels should start out at the default priority irrespective
>> of what the underlying tracking structure in the hypervisor was
>> used for before.
>
> Xen can only move an event between queues if that event isn't on a
> queue. It is also not notified when an event is removed from a queue.
>
> The guest can ensure a predictable state by only unbinding events that
> are not currently on a queue. e.g.,
>
>     /* prevent it becoming LINKED. */
>     set_bit(word, MASKED)
>     /* wait for interrupt handlers to drain event from its queue. */
>     while (test_bit(word, LINKED))
>         ;
>     /* Unlinked and masked, safe to unbind. If this port is bound again
>        it will become pending on the correct new queue. */
>     unbind()
>
> There doesn't need to be anything added to Xen to support this.
>
> The guest may need to defer the wait and unbind to a work queue or similar.

I still don't see how the event, after having got de-allocated and re-bound, would end up on the default priority queue. Yet unlike for an active event channel, where the guest has to be prepared for it to fire once on the old priority when altering its priority, a freshly bound event channel shouldn't fire on e.g. the lowest or highest priority, as that _may_ confuse the guest.

Jan