This series address two design flaws in the FIFO-based event channel ABI. 1. Fix a potential DoS caused by an unbounded loop when setting LINK. 2. Fix queue corruption that may occurs when events are moved between queues. An updated design document is available from: http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf - Add the BUSY bit to indicate that the guest must not clear MASKED. v9 of the Linux patches have been posted already. Changes in v4: - const struct domain * - Clear BUSY with existing cmpxchg() where possible. - Fix BUSY bit debug output. Changes in v3: - Use a new BUSY bit to block guests from clearing UNMASKED, this is lower overhead than the previous solution (which required a hypercall). - Fix another problem with moving events between queues. - Add evtchn->last_vpcu_id and evtchn->last_priority instead of evtchn->q. This keeps the structure at 32 bytes long. Changes in v2: - Add MAINTAINERS patch - Remove some unnecessary temporary pending state clears - Add fix for DoS
David Vrabel
2013-Nov-12 11:38 UTC
[PATCH 1/2] evtchn/fifo: don''t spin indefinitely when setting LINK
From: David Vrabel <david.vrabel@citrix.com> A malicious or buggy guest can cause another domain to spin indefinitely by repeatedly writing to an event word when the other guest is trying to link a new event. The cmpxchg() in evtchn_fifo_set_link() will repeatedly fail and the loop may never terminate. Fixing this requires a change to the ABI which is documented in draft H of the design. http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf Since a well-behaved guest only makes a limited set of state changes, the loop can terminate early if the guest makes an invalid state transition. The guest may: - clear LINKED and LINK. - clear PENDING - set MASKED - clear MASKED It is valid for the guest to mask and unmask an event at any time so specify that it is not valid for a guest to clear MASKED if Xen is trying to update LINK. Indicate this to the guest with an additional BUSY bit in the event word. The guest must not clear MASKED if BUSY is set and it should spin until BUSY is cleared. The remaining valid writes (clear LINKED, clear PENDING, set MASKED, clear MASKED by Xen) will limit the number of failures of the cmpxchg() to at most 4. A clear of LINKED will also terminate the loop early. Therefore, the loop can then be limited to at most 4 iterations. If the buggy or malicious guest does cause the loop to exit with LINKED set and LINK unset then that buggy guest will lose events. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Anthony Liguori <aliguori@amazon.com> --- xen/common/event_fifo.c | 75 +++++++++++++++++++++++++++++------ xen/include/public/event_channel.h | 1 + 2 files changed, 63 insertions(+), 13 deletions(-) diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c index 64dbfff..9106c55 100644 --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -34,19 +34,67 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d, return d->evtchn_fifo->event_array[p] + w; } -static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link) +static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link) { - event_word_t n, o, w; + event_word_t new, old; - w = *word; + if ( !(*w & (1 << EVTCHN_FIFO_LINKED)) ) + return 0; - do { - if ( !(w & (1 << EVTCHN_FIFO_LINKED)) ) - return 0; - o = w; - n = (w & ~EVTCHN_FIFO_LINK_MASK) | link; - } while ( (w = cmpxchg(word, o, n)) != o ); + old = *w; + new = (old & ~((1 << EVTCHN_FIFO_BUSY) | EVTCHN_FIFO_LINK_MASK)) | link; + *w = cmpxchg(word, old, new); + if ( *w == old ) + return 1; + return -EAGAIN; +} + +/* + * Atomically set the LINK field iff it is still LINKED. + * + * The guest is only permitted to make the following changes to a + * LINKED event. + * + * - set MASKED + * - clear MASKED + * - clear PENDING + * - clear LINKED (and LINK) + * + * We block unmasking by the guest by marking the tail word as BUSY, + * therefore, the cmpxchg() may fail at most 4 times. + */ +static bool_t evtchn_fifo_set_link(const struct domain *d, event_word_t *word, + uint32_t link) +{ + event_word_t w; + unsigned int try; + int ret; + + w = read_atomic(word); + + ret = try_set_link(word, &w, link); + if ( ret >= 0 ) + return ret; + + /* Lock the word to prevent guest unmasking. */ + set_bit(EVTCHN_FIFO_BUSY, word); + + w = read_atomic(word); + + for ( try = 0; try < 4; try++ ) + { + ret = try_set_link(word, &w, link); + if ( ret >= 0 ) + { + if ( ret == 0 ) + clear_bit(EVTCHN_FIFO_BUSY, word); + return ret; + } + } + gdprintk(XENLOG_WARNING, "domain %d, port %d not linked\n", + d->domain_id, link); + clear_bit(EVTCHN_FIFO_BUSY, word); return 1; } @@ -105,7 +153,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) if ( port != q->tail ) { tail_word = evtchn_fifo_word_from_port(d, q->tail); - linked = evtchn_fifo_set_link(tail_word, port); + linked = evtchn_fifo_set_link(d, tail_word, port); } if ( !linked ) write_atomic(q->head, port); @@ -202,11 +250,12 @@ static void evtchn_fifo_print_state(struct domain *d, word = evtchn_fifo_word_from_port(d, evtchn->port); if ( !word ) - printk("? "); + printk("? "); else if ( test_bit(EVTCHN_FIFO_LINKED, word) ) - printk("%-4u", *word & EVTCHN_FIFO_LINK_MASK); + printk("%c %-4u", test_bit(EVTCHN_FIFO_BUSY, word) ? ''B'' : '' '', + *word & EVTCHN_FIFO_LINK_MASK); else - printk("- "); + printk("%c - ", test_bit(EVTCHN_FIFO_BUSY, word) ? ''B'' : '' ''); } static const struct evtchn_port_ops evtchn_port_ops_fifo diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 4a53484..b78ee32 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -343,6 +343,7 @@ typedef uint32_t event_word_t; #define EVTCHN_FIFO_PENDING 31 #define EVTCHN_FIFO_MASKED 30 #define EVTCHN_FIFO_LINKED 29 +#define EVTCHN_FIFO_BUSY 28 #define EVTCHN_FIFO_LINK_BITS 17 #define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1) -- 1.7.2.5
David Vrabel
2013-Nov-12 11:38 UTC
[PATCH 2/2] evtchn/fifo: don''t corrupt queues if an old tail moves queues
From: David Vrabel <david.vrabel@citrix.com> An event may still be the tail of a queue even if the queue is now empty (an ''old tail'' event). There is logic to handle the case when this old tail event needs to be added to the now empty queue (by checking for q->tail == port). However, if the old tail event on queue A is moved to a different queue B (by changing its VCPU or priority), the event may then be linked onto queue B. When another event is linked onto queue A it will check the old tail, see that it is linked (but on queue B) and overwrite the LINK field, corrupting both queues. When an event is linked, save the vcpu id and priority of thee queue it is being linked onto. Use this when linking an event to check if it is an unlinked old tail event. i.e., a) it has moved queues; b) it is currently unlinked; and c) it''s the tail of the old queue. If it is an unlinked, old tail event, the old queue is empty and old_q->tail is invalidated to ensure adding another event to old_q will update HEAD. The tail is invalidated by setting it to 0 since the event 0 is never linked. The old_q->lock is held while setting LINKED to avoid a race with the test of LINKED in evtchn_fifo_set_link(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_fifo.c | 43 ++++++++++++++++++++++++++++++++++++++++++- xen/include/xen/sched.h | 2 ++ 2 files changed, 44 insertions(+), 1 deletions(-) diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c index 9106c55..8e126d6 100644 --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -98,6 +98,47 @@ static bool_t evtchn_fifo_set_link(const struct domain *d, event_word_t *word, return 1; } +static bool_t test_and_set_linked(const struct domain *d, + struct evtchn *evtchn, + struct evtchn_fifo_queue *q, + event_word_t *word) +{ + struct vcpu *old_v; + struct evtchn_fifo_queue *old_q; + bool_t was_linked; + unsigned long flags; + + old_v = d->vcpu[evtchn->last_vcpu_id]; + old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority]; + + evtchn->last_vcpu_id = evtchn->notify_vcpu_id; + evtchn->last_priority = evtchn->priority; + + if ( q == old_q ) + return test_and_set_bit(EVTCHN_FIFO_LINKED, word); + + /* + * This event is now on a different queue. + * + * If the event is still linked in the old queue it won''t be moved + * yet. + * + * If this event is unlinked /and/ it''s the old queue''s tail, the + * old queue is empty and its tail must be invalidated to prevent + * adding an event to the old queue from corrupting the new queue. + */ + spin_lock_irqsave(&old_q->lock, flags); + + was_linked = test_and_set_bit(EVTCHN_FIFO_LINKED, word); + + if ( !was_linked && old_q->tail == evtchn->port ) + old_q->tail = 0; + + spin_unlock_irqrestore(&old_q->lock, flags); + + return was_linked; +} + static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) { struct domain *d = v->domain; @@ -133,7 +174,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) * Link the event if it unmasked and not already linked. */ if ( !test_bit(EVTCHN_FIFO_MASKED, word) - && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + && !test_and_set_linked(d, evtchn, q, word) ) { event_word_t *tail_word; bool_t linked = 0; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 2397537..3714c37 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -98,6 +98,8 @@ struct evtchn } u; u8 priority; u8 pending:1; + u16 last_vcpu_id; + u8 last_priority; #ifdef FLASK_ENABLE void *ssid; #endif -- 1.7.2.5
David Vrabel
2013-Nov-15 13:15 UTC
Re: [PATCH 2/2] evtchn/fifo: don''t corrupt queues if an old tail moves queues
On 12/11/13 11:38, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > An event may still be the tail of a queue even if the queue is now > empty (an ''old tail'' event). There is logic to handle the case when > this old tail event needs to be added to the now empty queue (by > checking for q->tail == port). > > However, if the old tail event on queue A is moved to a different > queue B (by changing its VCPU or priority), the event may then be > linked onto queue B. When another event is linked onto queue A it > will check the old tail, see that it is linked (but on queue B) and > overwrite the LINK field, corrupting both queues.This fix is not quite right so don''t apply.> --- a/xen/common/event_fifo.c > +++ b/xen/common/event_fifo.c > @@ -98,6 +98,47 @@ static bool_t evtchn_fifo_set_link(const struct domain *d, event_word_t *word, > return 1; > } > > +static bool_t test_and_set_linked(const struct domain *d, > + struct evtchn *evtchn, > + struct evtchn_fifo_queue *q, > + event_word_t *word) > +{ > + struct vcpu *old_v; > + struct evtchn_fifo_queue *old_q; > + bool_t was_linked; > + unsigned long flags; > + > + old_v = d->vcpu[evtchn->last_vcpu_id]; > + old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority]; > + > + evtchn->last_vcpu_id = evtchn->notify_vcpu_id; > + evtchn->last_priority = evtchn->priority;We set the last queue here even if we''re not moving it yet. David
David Vrabel
2013-Nov-20 15:19 UTC
Re: [PATCH 1/2] evtchn/fifo: don''t spin indefinitely when setting LINK
On 12/11/13 11:38, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > A malicious or buggy guest can cause another domain to spin > indefinitely by repeatedly writing to an event word when the other > guest is trying to link a new event. The cmpxchg() in > evtchn_fifo_set_link() will repeatedly fail and the loop may never > terminate.This was fixed by introducing the BUSY bit (see below), but I wonder if another solution would be better. The MASKED bits could all be moved into a separate bit array, guest writes to set/clear the MASKED bit would never conflict with Xen updating the main event word. The cmpxchg() loop is then trivially bounded as the only valid writes by the guest are clear PENDING and clear LINKED & LINK. The masked array could be either: 1. In the same page as the event array. This would waste a big chunk of the event array page though, Doubling the memory requirements. 2. Have a separate set of pages. EVTCHNOP_expand_array would be extended to supply the GFN for the array page where necessary. For 2^17 events, 4 additional pages would be required for this masked array. Thoughts? David> Fixing this requires a change to the ABI which is documented in draft > H of the design. > > http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf > > Since a well-behaved guest only makes a limited set of state changes, > the loop can terminate early if the guest makes an invalid state > transition. > > The guest may: > > - clear LINKED and LINK. > - clear PENDING > - set MASKED > - clear MASKED > > It is valid for the guest to mask and unmask an event at any time so > specify that it is not valid for a guest to clear MASKED if Xen is > trying to update LINK. Indicate this to the guest with an additional > BUSY bit in the event word. The guest must not clear MASKED if BUSY > is set and it should spin until BUSY is cleared. > > The remaining valid writes (clear LINKED, clear PENDING, set MASKED, > clear MASKED by Xen) will limit the number of failures of the > cmpxchg() to at most 4. A clear of LINKED will also terminate the > loop early. Therefore, the loop can then be limited to at most 4 > iterations. > > If the buggy or malicious guest does cause the loop to exit with > LINKED set and LINK unset then that buggy guest will lose events.
Jan Beulich
2013-Nov-22 12:08 UTC
Re: [PATCH 1/2] evtchn/fifo: don''t spin indefinitely when setting LINK
>>> On 20.11.13 at 16:19, David Vrabel <david.vrabel@citrix.com> wrote: > On 12/11/13 11:38, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> A malicious or buggy guest can cause another domain to spin >> indefinitely by repeatedly writing to an event word when the other >> guest is trying to link a new event. The cmpxchg() in >> evtchn_fifo_set_link() will repeatedly fail and the loop may never >> terminate. > > This was fixed by introducing the BUSY bit (see below), but I wonder if > another solution would be better. > > The MASKED bits could all be moved into a separate bit array, guest > writes to set/clear the MASKED bit would never conflict with Xen > updating the main event word. The cmpxchg() loop is then trivially > bounded as the only valid writes by the guest are clear PENDING and > clear LINKED & LINK. > > The masked array could be either: > > 1. In the same page as the event array. This would waste a big chunk of > the event array page though, Doubling the memory requirements. > > 2. Have a separate set of pages. EVTCHNOP_expand_array would be > extended to supply the GFN for the array page where necessary. For 2^17 > events, 4 additional pages would be required for this masked array.Further splitting up the necessary pages seems to make things more complex than necessary. I''d stay with the various control bits all grouped together. Jan