This is a complete implementation of the hypervisor parts of the FIFO-based event channel ABI described in this design document:

http://xenbits.xen.org/people/dvrabel/event-channels-E.pdf

Changes in draft E are:

- Control blocks are no longer required to be in the same page as the vcpu_info.
- Added padding to struct evtchn_init_control.

Remaining work:

* Extend libxl and xl to allow the event channel limit to be set.  Without this, DomUs are limited to the default number of event channels (4096).  I use a trivial C program to set the limit.

* Decide whether EVTCHNOP_set_limit should be a DOMCTL instead.

* Change the allocation of struct evtchns to use less memory when fewer event channels are in use.  I plan to have d->evtchn point to a page containing N struct evtchns and M group pointers, with the remaining event channels being indexed by group/bucket as in patch 6.

Patches 1-4 do some preparatory work for supporting alternate ABIs.

Patches 5-6 expand the number of evtchn objects a domain may have by changing how they are allocated.

Patch 7 adds the ABI.

Patch 8 adds the EVTCHNOP_set_priority implementation.  This will return -ENOSYS for ABIs that do not support priority.

Patches 9-10 add the EVTCHNOP_set_limit implementation and a wrapper function to libxc.  This also works with the 2-level ABI.

Patch 11 adds the FIFO-based ABI implementation.

Changes in v3:

- Updates for Draft E of the design.
- Store priority in struct evtchn.
- Implement set_priority with generic code + hook.
- Implement set_limit and add a libxc function.
- Add ABI-specific output to the 'e' debug key.

Changes in v2:

- Updates for Draft D of the design.
- 130,000+ event channels are now supported.
- event_port.c -> event_2l.c and only contains 2l functions.
- Addresses various review comments:
  - int -> unsigned in lots of places
  - use write_atomic() to set HEAD
  - removed MAX_EVTCHNS
  - evtchn_ops are const.
- Pack struct evtchn better to reduce memory needed.
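The trivial C program mentioned above is not included in the series.  A minimal sketch of what it might look like, using the xc_evtchn_set_limit() wrapper that patch 10 adds to libxc (the program name and argument handling are illustrative, not taken from the series):

/*
 * set-evtchn-limit: set a domain's event channel port limit.
 * Usage: set-evtchn-limit <domid> <max-port>
 */
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

int main(int argc, char *argv[])
{
    xc_interface *xch;
    domid_t domid;
    uint32_t max_port;
    int rc;

    if ( argc != 3 )
    {
        fprintf(stderr, "usage: %s <domid> <max-port>\n", argv[0]);
        return 1;
    }

    domid = atoi(argv[1]);
    max_port = strtoul(argv[2], NULL, 0);

    xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
    {
        perror("xc_interface_open");
        return 1;
    }

    /* EVTCHNOP_set_limit hypercall via the libxc wrapper from patch 10. */
    rc = xc_evtchn_set_limit(xch, domid, max_port);
    if ( rc )
        perror("xc_evtchn_set_limit");

    xc_interface_close(xch);
    return rc ? 1 : 0;
}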
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 01/11] debug: remove some event channel info from the 'i' and 'q' debug keys
From: David Vrabel <david.vrabel@citrix.com> The ''i'' key would always use VCPU0''s selector word when printing the event channel state. Remove the incorrect output as a subsequent change will add the (correct) information to the ''e'' key instead. When dumping domain information, printing the state of the VIRQ_DEBUG port is redundant -- this information is available via the ''e'' key. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/arch/x86/irq.c | 5 +---- xen/common/keyhandler.c | 11 ++--------- 2 files changed, 3 insertions(+), 13 deletions(-) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index c61cc46..7f547ff 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2262,14 +2262,11 @@ static void dump_irqs(unsigned char key) d = action->guest[i]; pirq = domain_irq_to_pirq(d, irq); info = pirq_info(d, pirq); - printk("%u:%3d(%c%c%c%c)", + printk("%u:%3d(%c%c%c)", d->domain_id, pirq, (test_bit(info->evtchn, &shared_info(d, evtchn_pending)) ? ''P'' : ''-''), - (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ? - ''S'' : ''-''), (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? ''M'' : ''-''), (info->masked ? ''M'' : ''-'')); diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index b9ad1b5..8e4b3f8 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -310,16 +310,9 @@ static void dump_domains(unsigned char key) { for_each_vcpu ( d, v ) { - printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n", + printk("Notifying guest %d:%d (virq %d, port %d)\n", d->domain_id, v->vcpu_id, - VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG], - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_pending)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_mask)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG] / - BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel))); + VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG]); send_guest_vcpu_virq(v, VIRQ_DEBUG); } } -- 1.7.2.5
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 02/11] evtchn: refactor low-level event channel port ops
From: David Vrabel <david.vrabel@citrix.com> Use functions for the low-level event channel port operations (set/clear pending, unmask, is_pending and is_masked). Group these functions into a struct evtchn_port_op so they can be replaced by alternate implementations (for different ABIs) on a per-domain basis. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/arch/x86/irq.c | 11 ++--- xen/common/Makefile | 1 + xen/common/event_2l.c | 99 ++++++++++++++++++++++++++++++++++++++++++++ xen/common/event_channel.c | 88 +++++++++++++++------------------------ xen/common/schedule.c | 3 +- xen/include/xen/event.h | 45 ++++++++++++++++++++ xen/include/xen/sched.h | 4 ++ 7 files changed, 190 insertions(+), 61 deletions(-) create mode 100644 xen/common/event_2l.c diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 7f547ff..53fe9e3 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1474,7 +1474,7 @@ int pirq_guest_unmask(struct domain *d) { pirq = pirqs[i]->pirq; if ( pirqs[i]->masked && - !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) ) + !evtchn_port_is_masked(d, evtchn_from_port(d, pirqs[i]->evtchn)) ) pirq_guest_eoi(pirqs[i]); } } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); @@ -2222,6 +2222,7 @@ static void dump_irqs(unsigned char key) int i, irq, pirq; struct irq_desc *desc; irq_guest_action_t *action; + struct evtchn *evtchn; struct domain *d; const struct pirq *info; unsigned long flags; @@ -2262,13 +2263,11 @@ static void dump_irqs(unsigned char key) d = action->guest[i]; pirq = domain_irq_to_pirq(d, irq); info = pirq_info(d, pirq); + evtchn = evtchn_from_port(d, info->evtchn); printk("%u:%3d(%c%c%c)", d->domain_id, pirq, - (test_bit(info->evtchn, - &shared_info(d, evtchn_pending)) ? - ''P'' : ''-''), - (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? - ''M'' : ''-''), + (evtchn_port_is_pending(d, evtchn) ? ''P'' : ''-''), + (evtchn_port_is_masked(d, evtchn) ? ''M'' : ''-''), (info->masked ? ''M'' : ''-'')); if ( i != action->nr_guests ) printk(","); diff --git a/xen/common/Makefile b/xen/common/Makefile index 5486140..0a3a367 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -5,6 +5,7 @@ obj-y += cpupool.o obj-$(HAS_DEVICE_TREE) += device_tree.o obj-y += domctl.o obj-y += domain.o +obj-y += event_2l.o obj-y += event_channel.o obj-y += grant_table.o obj-y += irq.o diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c new file mode 100644 index 0000000..18c0c6e --- /dev/null +++ b/xen/common/event_2l.c @@ -0,0 +1,99 @@ +/* + * Event channel port operations. + * + * Copyright (c) 2003-2006, K A Fraser. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> + +static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned port = evtchn->port; + + /* + * The following bit operations must happen in strict order. + * NB. On x86, the atomic bit operations also act as memory barriers. + * There is therefore sufficiently strict ordering for this architecture -- + * others may require explicit memory barriers. 
+ */ + + if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) + return; + + if ( !test_bit (port, &shared_info(d, evtchn_mask)) && + !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } + + evtchn_check_pollers(d, port); +} + +static void evtchn_2l_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + clear_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + unsigned port = evtchn->port; + + /* + * These operations must happen in strict order. Based on + * evtchn_2l_set_pending() above. + */ + if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && + test_bit (port, &shared_info(d, evtchn_pending)) && + !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } +} + +static bool_t evtchn_2l_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static bool_t evtchn_2l_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); +} + +static const struct evtchn_port_ops evtchn_port_ops_2l +{ + .set_pending = evtchn_2l_set_pending, + .clear_pending = evtchn_2l_clear_pending, + .unmask = evtchn_2l_unmask, + .is_pending = evtchn_2l_is_pending, + .is_masked = evtchn_2l_is_masked, +}; + +void evtchn_2l_init(struct domain *d) +{ + d->evtchn_port_ops = &evtchn_port_ops_2l; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 64c976b..eac0d99 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -150,6 +150,7 @@ static int get_free_port(struct domain *d) xfree(chn); return -ENOMEM; } + chn[i].port = port + i; } bucket_from_port(d, port) = chn; @@ -530,7 +531,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - clear_bit(port1, &shared_info(d1, evtchn_pending)); + evtchn_port_clear_pending(d1, chn1); /* Reset binding to vcpu0 when the channel is freed. */ chn1->state = ECS_FREE; @@ -615,43 +616,7 @@ out: static void evtchn_set_pending(struct vcpu *v, int port) { - struct domain *d = v->domain; - int vcpuid; - - /* - * The following bit operations must happen in strict order. - * NB. On x86, the atomic bit operations also act as memory barriers. - * There is therefore sufficiently strict ordering for this architecture -- - * others may require explicit memory barriers. - */ - - if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) - return; - - if ( !test_bit (port, &shared_info(d, evtchn_mask)) && - !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } - - /* Check if some VCPU might be polling for this event. */ - if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) - return; - - /* Wake any interested (or potentially interested) pollers. 
*/ - for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); - vcpuid < d->max_vcpus; - vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) - { - v = d->vcpu[vcpuid]; - if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && - test_and_clear_bit(vcpuid, d->poll_mask) ) - { - v->poll_evtchn = 0; - vcpu_unblock(v); - } - } + evtchn_port_set_pending(v, evtchn_from_port(v->domain, port)); } int guest_enabled_event(struct vcpu *v, uint32_t virq) @@ -920,26 +885,15 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) int evtchn_unmask(unsigned int port) { struct domain *d = current->domain; - struct vcpu *v; + struct evtchn *evtchn; ASSERT(spin_is_locked(&d->event_lock)); if ( unlikely(!port_is_valid(d, port)) ) return -EINVAL; - v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id]; - - /* - * These operations must happen in strict order. Based on - * include/xen/event.h:evtchn_set_pending(). - */ - if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && - test_bit (port, &shared_info(d, evtchn_pending)) && - !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } + evtchn = evtchn_from_port(d, port); + evtchn_port_unmask(d, evtchn); return 0; } @@ -1170,9 +1124,35 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) spin_unlock(&ld->event_lock); } +void evtchn_check_pollers(struct domain *d, unsigned port) +{ + struct vcpu *v; + unsigned vcpuid; + + /* Check if some VCPU might be polling for this event. */ + if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) + return; + + /* Wake any interested (or potentially interested) pollers. */ + for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); + vcpuid < d->max_vcpus; + vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) + { + v = d->vcpu[vcpuid]; + if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && + test_and_clear_bit(vcpuid, d->poll_mask) ) + { + v->poll_evtchn = 0; + vcpu_unblock(v); + } + } +} int evtchn_init(struct domain *d) { + /* Default to N-level ABI. 
*/ + evtchn_2l_init(d); + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) return -EINVAL; @@ -1270,8 +1250,8 @@ static void domain_dump_evtchn_info(struct domain *d) printk(" %4u [%d/%d]: s=%d n=%d x=%d", port, - !!test_bit(port, &shared_info(d, evtchn_pending)), - !!test_bit(port, &shared_info(d, evtchn_mask)), + !!evtchn_port_is_pending(d, chn), + !!evtchn_port_is_masked(d, chn), chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a8398bd..7e6884d 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -34,6 +34,7 @@ #include <xen/multicall.h> #include <xen/cpu.h> #include <xen/preempt.h> +#include <xen/event.h> #include <public/sched.h> #include <xsm/xsm.h> @@ -751,7 +752,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = 0; - if ( test_bit(port, &shared_info(d, evtchn_pending)) ) + if ( evtchn_port_is_pending(d, evtchn_from_port(d, port)) ) goto out; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 6f60162..7522f4e 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -102,4 +102,49 @@ void notify_via_xen_event_channel(struct domain *ld, int lport); smp_mb(); /* set blocked status /then/ caller does his work */ \ } while ( 0 ) +void evtchn_check_pollers(struct domain *d, unsigned port); + +void evtchn_2l_init(struct domain *d); + +/* + * Low-level event channel port ops. + */ +struct evtchn_port_ops { + void (*set_pending)(struct vcpu *v, struct evtchn *evtchn); + void (*clear_pending)(struct domain *d, struct evtchn *evtchn); + void (*unmask)(struct domain *d, struct evtchn *evtchn); + bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); + bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); +}; + +static inline void evtchn_port_set_pending(struct vcpu *v, + struct evtchn *evtchn) +{ + v->domain->evtchn_port_ops->set_pending(v, evtchn); +} + +static inline void evtchn_port_clear_pending(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->clear_pending(d, evtchn); +} + +static inline void evtchn_port_unmask(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->unmask(d, evtchn); +} + +static inline bool_t evtchn_port_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_pending(d, evtchn); +} + +static inline bool_t evtchn_port_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_masked(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 0013a8d..fb9cf11 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -66,6 +66,7 @@ struct evtchn u8 state; /* ECS_* */ u8 xen_consumer; /* Consumer in Xen, if any? (0 = send to guest) */ u16 notify_vcpu_id; /* VCPU for local delivery notification */ + u32 port; union { struct { domid_t remote_domid; @@ -238,6 +239,8 @@ struct mem_event_per_domain struct mem_event_domain access; }; +struct evtchn_port_ops; + struct domain { domid_t domain_id; @@ -271,6 +274,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; spinlock_t event_lock; + const struct evtchn_port_ops *evtchn_port_ops; struct grant_table *grant_table; -- 1.7.2.5
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 03/11] evtchn: print ABI specific state with the 'e' debug key
From: David Vrabel <david.vrabel@citrix.com> In the output of the ''e'' debug key, print some ABI specific state in addition to the (p)ending and (m)asked bits. For the 2-level ABI, print the state of that event''s selector bit. e.g., (XEN) port [p/m/s] (XEN) 1 [0/0/1]: s=3 n=0 x=0 d=0 p=74 (XEN) 2 [0/0/1]: s=3 n=0 x=0 d=0 p=75 Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_2l.c | 10 ++++++++++ xen/common/event_channel.c | 8 +++++--- xen/include/xen/event.h | 7 +++++++ 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index 18c0c6e..b7b152c 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -74,6 +74,15 @@ static bool_t evtchn_2l_is_masked(struct domain *d, return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); } +static void evtchn_2l_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + + printk("%d", !!test_bit(evtchn->port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel))); +} + static const struct evtchn_port_ops evtchn_port_ops_2l { .set_pending = evtchn_2l_set_pending, @@ -81,6 +90,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l .unmask = evtchn_2l_unmask, .is_pending = evtchn_2l_is_pending, .is_masked = evtchn_2l_is_masked, + .print_state = evtchn_2l_print_state, }; void evtchn_2l_init(struct domain *d) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index eac0d99..5569fe9 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1233,7 +1233,7 @@ static void domain_dump_evtchn_info(struct domain *d) d->poll_mask, d->max_vcpus); printk("Event channel information for domain %d:\n" "Polling vCPUs: {%s}\n" - " port [p/m]\n", d->domain_id, keyhandler_scratch); + " port [p/m/s]\n", d->domain_id, keyhandler_scratch); spin_lock(&d->event_lock); @@ -1248,10 +1248,12 @@ static void domain_dump_evtchn_info(struct domain *d) if ( chn->state == ECS_FREE ) continue; - printk(" %4u [%d/%d]: s=%d n=%d x=%d", + printk(" %4u [%d/%d/", port, !!evtchn_port_is_pending(d, chn), - !!evtchn_port_is_masked(d, chn), + !!evtchn_port_is_masked(d, chn)); + evtchn_port_print_state(d, chn); + printk("]: s=%d n=%d x=%d", chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 7522f4e..90410e0 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -115,6 +115,7 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; static inline void evtchn_port_set_pending(struct vcpu *v, @@ -147,4 +148,10 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline void evtchn_port_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + d->evtchn_port_ops->print_state(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ -- 1.7.2.5
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 04/11] evtchn: use a per-domain variable for the max number of event channels
From: David Vrabel <david.vrabel@citrix.com> Instead of the MAX_EVTCHNS(d) macro, use d->max_evtchns instead. This avoids having to repeatedly check the ABI type. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_2l.c | 1 + xen/common/event_channel.c | 4 ++-- xen/common/schedule.c | 2 +- xen/include/xen/event.h | 2 +- xen/include/xen/sched.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index b7b152c..ecdcdaf 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -96,6 +96,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l void evtchn_2l_init(struct domain *d) { d->evtchn_port_ops = &evtchn_port_ops_2l; + d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); } /* diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 5569fe9..0c92cb0 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -134,7 +134,7 @@ static int get_free_port(struct domain *d) if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; - if ( port == MAX_EVTCHNS(d) ) + if ( port == d->max_evtchns ) return -ENOSPC; chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); @@ -1237,7 +1237,7 @@ static void domain_dump_evtchn_info(struct domain *d) spin_lock(&d->event_lock); - for ( port = 1; port < MAX_EVTCHNS(d); ++port ) + for ( port = 1; port < d->max_evtchns; ++port ) { const struct evtchn *chn; char *ssid; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 7e6884d..a5a0010 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -748,7 +748,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = -EINVAL; - if ( port >= MAX_EVTCHNS(d) ) + if ( port >= d->max_evtchns ) goto out; rc = 0; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 90410e0..302a904 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -73,7 +73,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport); #define bucket_from_port(d,p) \ ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) #define port_is_valid(d,p) \ - (((p) >= 0) && ((p) < MAX_EVTCHNS(d)) && \ + (((p) >= 0) && ((p) < (d)->max_evtchns) && \ (bucket_from_port(d,p) != NULL)) #define evtchn_from_port(d,p) \ (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index fb9cf11..532dd46 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,7 +50,6 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG) #endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) #define EVTCHNS_PER_BUCKET 128 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) @@ -273,6 +272,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
From: Wei Liu <wei.liu2@citrix.com> As we move to extended evtchn ABI we need bigger d->evtchn, as a result this will bloat struct domain. So move this array out of struct domain and allocate a dedicated array for it. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/event_channel.c | 13 +++++++++++++ xen/include/xen/sched.h | 2 +- 2 files changed, 14 insertions(+), 1 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 0c92cb0..273d449 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1153,15 +1153,26 @@ int evtchn_init(struct domain *d) /* Default to N-level ABI. */ evtchn_2l_init(d); + BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE); + d->evtchn = xzalloc_array(struct evtchn *, NR_EVTCHN_BUCKETS); + if ( d->evtchn == NULL ) + return -ENOMEM; + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) + { + xfree(d->evtchn); return -EINVAL; + } evtchn_from_port(d, 0)->state = ECS_RESERVED; #if MAX_VIRT_CPUS > BITS_PER_LONG d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS)); if ( !d->poll_mask ) + { + xfree(d->evtchn); return -ENOMEM; + } bitmap_zero(d->poll_mask, MAX_VIRT_CPUS); #endif @@ -1195,6 +1206,8 @@ void evtchn_destroy(struct domain *d) spin_unlock(&d->event_lock); clear_global_virq_handlers(d); + + xfree(d->evtchn); } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 532dd46..9227685 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -271,7 +271,7 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + struct evtchn **evtchn; unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 06/11] evtchn: alter internal object handling scheme
From: Wei Liu <wei.liu2@citrix.com> Originally, evtchn objects are stored in buckets. Now we add another layer called group. struct domain holds an array to evtchn groups, then each group holds pointers to a bucket. With this change, each domain can have more struct evtchn in a space-efficient way. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Allocate the array of group pointers, shrinking the size of struct domain. Compile time calculate grouping and bucket parameters to achive optimum packing into PAGE_SIZE memory allocations. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_channel.c | 29 ++++++++++++++++++++++++----- xen/include/xen/event.h | 24 ++++++++++++++++-------- xen/include/xen/sched.h | 20 +++++++++++++++++--- 3 files changed, 57 insertions(+), 16 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 273d449..28c641b 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -124,6 +124,7 @@ static int virq_is_global(uint32_t virq) static int get_free_port(struct domain *d) { struct evtchn *chn; + struct evtchn **grp; int port; int i, j; @@ -137,6 +138,15 @@ static int get_free_port(struct domain *d) if ( port == d->max_evtchns ) return -ENOSPC; + if ( unlikely(group_from_port(d, port) == NULL ) ) + { + grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP); + if ( unlikely(grp == NULL) ) + return -ENOMEM; + else + group_from_port(d, port) = grp; + } + chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); if ( unlikely(chn == NULL) ) return -ENOMEM; @@ -1153,8 +1163,7 @@ int evtchn_init(struct domain *d) /* Default to N-level ABI. */ evtchn_2l_init(d); - BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE); - d->evtchn = xzalloc_array(struct evtchn *, NR_EVTCHN_BUCKETS); + d->evtchn = xzalloc_array(struct evtchn **, NR_EVTCHN_GROUPS); if ( d->evtchn == NULL ) return -ENOMEM; @@ -1182,7 +1191,7 @@ int evtchn_init(struct domain *d) void evtchn_destroy(struct domain *d) { - int i; + unsigned i, j; /* After this barrier no new event-channel allocations can occur. */ BUG_ON(!d->is_dying); @@ -1197,9 +1206,19 @@ void evtchn_destroy(struct domain *d) /* Free all event-channel buckets. 
*/ spin_lock(&d->event_lock); - for ( i = 0; i < NR_EVTCHN_BUCKETS; i++ ) + for ( i = 0; i < NR_EVTCHN_GROUPS; i++ ) { - xsm_free_security_evtchn(d->evtchn[i]); + if ( d->evtchn[i] == NULL ) + continue; + + for ( j = 0; j < BUCKETS_PER_GROUP; j++ ) + { + if ( d->evtchn[i][j] == NULL ) + continue; + xsm_free_security_evtchn(d->evtchn[i][j]); + xfree(d->evtchn[i][j]); + d->evtchn[i][j] = NULL; + } xfree(d->evtchn[i]); d->evtchn[i] = NULL; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 302a904..091b53c 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -69,14 +69,22 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq); /* Notify remote end of a Xen-attached event channel.*/ void notify_via_xen_event_channel(struct domain *ld, int lport); -/* Internal event channel object accessors */ -#define bucket_from_port(d,p) \ - ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) -#define port_is_valid(d,p) \ - (((p) >= 0) && ((p) < (d)->max_evtchns) && \ - (bucket_from_port(d,p) != NULL)) -#define evtchn_from_port(d,p) \ - (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) +/* + * Internal event channel object storage: + * Objects are organized in two level scheme: group and bucket + * A group consists of several buckets, a bucket is an array of struct evtchn + */ +#define group_from_port(d, p) \ + ((d)->evtchn[(p) / EVTCHNS_PER_GROUP]) +/* User should make sure group is not NULL */ +#define bucket_from_port(d, p) \ + ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET]) +#define port_is_valid(d, p) \ + (((p) >= 0) && ((p) < (d)->max_evtchns) && \ + (group_from_port(d, p) != NULL) && \ + (bucket_from_port(d, p) != NULL)) +#define evtchn_from_port(d, p) \ + (&(bucket_from_port(d, p))[(p) & (EVTCHNS_PER_BUCKET-1)]) /* Wait on a Xen-attached event channel. */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 9227685..b348232 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,8 +50,22 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG) #endif -#define EVTCHNS_PER_BUCKET 128 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) + +#define BUCKETS_PER_GROUP (PAGE_SIZE/sizeof(struct evtchn *)) +/* Round size of struct evtchn up to power of 2 size */ +#define __RDU2(x) ( (x) | ( (x) >> 1)) +#define __RDU4(x) ( __RDU2(x) | ( __RDU2(x) >> 2)) +#define __RDU8(x) ( __RDU4(x) | ( __RDU4(x) >> 4)) +#define __RDU16(x) ( __RDU8(x) | ( __RDU8(x) >> 8)) +#define __RDU32(x) (__RDU16(x) | (__RDU16(x) >>16)) +#define next_power_of_2(x) (__RDU32((x)-1) + 1) + +/* Maximum number of event channels for any ABI. */ +#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS + +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) +#define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_GROUPS DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP) struct evtchn { @@ -271,7 +285,7 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn **evtchn; + struct evtchn ***evtchn; unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
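To see how the compile-time parameters of this patch work out, assume a 4 KiB PAGE_SIZE and that sizeof(struct evtchn) rounds up to 32 bytes on a 64-bit build (illustrative assumptions only; the real values depend on the configuration):

/* Illustrative numbers only -- not taken from the patch. */
EVTCHNS_PER_BUCKET = 4096 / 32    = 128    /* struct evtchns per bucket page  */
BUCKETS_PER_GROUP  = 4096 / 8     = 512    /* bucket pointers per group page  */
EVTCHNS_PER_GROUP  = 512 * 128    = 65536  /* ports covered by a single group */
NR_EVTCHN_GROUPS   = 4096 / 65536 = 1      /* with MAX_NR_EVTCHNS == 4096;
                                              rises to 2 once patch 11 raises
                                              MAX_NR_EVTCHNS to 2^17          */

With these numbers, looking up port 70000 (only valid under the FIFO ABI) decomposes as group 70000 / 65536 = 1, bucket (70000 % 65536) / 128 = 34, and index 70000 & 127 = 112, so a domain only pays for the group page(s) plus one 4 KiB bucket page per 128 allocated ports.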
From: David Vrabel <david.vrabel@citrix.com> Add the event channel hypercall sub-ops and the definitions for the shared data structures for the FIFO-based event channel ABI. The design document for this new ABI is available here: http://xenbits.xen.org/people/dvrabel/event-channels-E.pdf In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU. Key advantages of this new ABI include: - Support for over 100,000 events (2^17). - 16 different event priorities. - Improved fairness in event latency through the use of FIFOs. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/include/public/event_channel.h | 77 ++++++++++++++++++++++++++++++++++++ xen/include/public/xen.h | 2 +- 2 files changed, 78 insertions(+), 1 deletions(-) diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 472efdb..031366a 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -71,6 +71,10 @@ #define EVTCHNOP_bind_vcpu 8 #define EVTCHNOP_unmask 9 #define EVTCHNOP_reset 10 +#define EVTCHNOP_init_control 11 +#define EVTCHNOP_expand_array 12 +#define EVTCHNOP_set_priority 13 +#define EVTCHNOP_set_limit 14 /* ` } */ typedef uint32_t evtchn_port_t; @@ -258,6 +262,53 @@ struct evtchn_reset { typedef struct evtchn_reset evtchn_reset_t; /* + * EVTCHNOP_init_control: initialize the control block for the FIFO ABI. + * + * Note: any events that are currently pending will not be resent and + * will be lost. Guests should call this before binding any event to + * avoid losing any events. + */ +struct evtchn_init_control { + /* IN parameters. */ + uint64_t control_mfn; + uint32_t offset; + uint32_t vcpu; + /* OUT parameters. */ + uint8_t link_bits; + uint8_t __pad[7]; +}; +typedef struct evtchn_init_control evtchn_init_control_t; + +/* + * EVTCHNOP_expand_array: add an additional page to the event array. + */ +struct evtchn_expand_array { + /* IN parameters. */ + uint64_t array_mfn; +}; +typedef struct evtchn_expand_array evtchn_expand_array_t; + +/* + * EVTCHNOP_set_priority: set the priority for an event channel. + */ +struct evtchn_set_priority { + /* IN parameters. */ + uint32_t port; + uint32_t priority; +}; +typedef struct evtchn_set_priority evtchn_set_priority_t; + +/* + * EVTCHNOP_set_limit: set the maximum event channel port that may be bound. + */ +struct evtchn_set_limit { + /* IN parameters. */ + uint32_t domid; + uint32_t max_port; +}; +typedef struct evtchn_set_limit evtchn_set_limit_t; + +/* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) * ` @@ -281,6 +332,32 @@ struct evtchn_op { typedef struct evtchn_op evtchn_op_t; DEFINE_XEN_GUEST_HANDLE(evtchn_op_t); +/* + * FIFO ABI + */ + +/* Events may have priorities from 0 (highest) to 15 (lowest). 
*/ +#define EVTCHN_FIFO_PRIORITY_MIN 15 +#define EVTCHN_FIFO_PRIORITY_DEFAULT 7 + +#define EVTCHN_FIFO_MAX_QUEUES (EVTCHN_FIFO_PRIORITY_MIN + 1) + +typedef uint32_t event_word_t; + +#define EVTCHN_FIFO_PENDING 31 +#define EVTCHN_FIFO_MASKED 30 +#define EVTCHN_FIFO_LINKED 29 + +#define EVTCHN_FIFO_LINK_BITS 17 +#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1) + +struct evtchn_fifo_control_block { + uint32_t ready; + uint32_t _rsvd1; + uint32_t head[EVTCHN_FIFO_MAX_QUEUES]; +}; +typedef struct evtchn_fifo_control_block evtchn_fifo_control_block_t; + #endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */ /* diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index b50bd05..ab248d4 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -553,7 +553,7 @@ typedef struct multicall_entry multicall_entry_t; DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); /* - * Event channel endpoints per domain: + * Event channel endpoints per domain (when using the 2-level ABI): * 1024 if a long is 32 bits; 4096 if a long is 64 bits. */ #define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) -- 1.7.2.5
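A guest consuming this ABI treats each event word as a 32-bit value whose top three bits are the PENDING, MASKED and LINKED flags and whose low 17 bits are the LINK field.  A minimal sketch of guest-side accessors follows; the helper names are hypothetical, and a real guest must also read the word atomically and respect the ordering rules in the design document:

#include <stdint.h>
#include <stdbool.h>

typedef uint32_t event_word_t;

/* Constants as defined in the public header above. */
#define EVTCHN_FIFO_PENDING    31
#define EVTCHN_FIFO_MASKED     30
#define EVTCHN_FIFO_LINKED     29
#define EVTCHN_FIFO_LINK_BITS  17
#define EVTCHN_FIFO_LINK_MASK  ((1 << EVTCHN_FIFO_LINK_BITS) - 1)

static bool fifo_word_is_pending(event_word_t w)
{
    return w & (1u << EVTCHN_FIFO_PENDING);
}

static bool fifo_word_is_masked(event_word_t w)
{
    return w & (1u << EVTCHN_FIFO_MASKED);
}

static bool fifo_word_is_linked(event_word_t w)
{
    return w & (1u << EVTCHN_FIFO_LINKED);
}

/* LINK holds the port number of the next event in the queue;
 * 0 marks the end of the queue. */
static uint32_t fifo_word_link(event_word_t w)
{
    return w & EVTCHN_FIFO_LINK_MASK;
}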
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 08/11] evtchn: implement EVTCHNOP_set_priority and add the set_priority hook
From: David Vrabel <david.vrabel@citrix.com> Implement EVTCHNOP_set_priority. A new set_priority hook added to struct evtchn_port_ops will do the ABI specific validation and setup. If an ABI does not provide a set_priority hook (as is the case of the 2-level ABI), the sub-op will return -ENOSYS. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_channel.c | 29 +++++++++++++++++++++++++++++ xen/include/xen/event.h | 11 +++++++++++ 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 28c641b..d583d54 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -934,6 +934,27 @@ out: return rc; } +static long evtchn_set_priority(const struct evtchn_set_priority *set_priority) +{ + struct domain *d = current->domain; + unsigned port = set_priority->port; + long ret; + + spin_lock(&d->event_lock); + + if ( !port_is_valid(d, port) ) + { + spin_unlock(&d->event_lock); + return -EINVAL; + } + + ret = evtchn_port_set_priority(d, evtchn_from_port(d, port), + set_priority->priority); + + spin_unlock(&d->event_lock); + + return ret; +} long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { @@ -1043,6 +1064,14 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_set_priority: { + struct evtchn_set_priority set_priority; + if ( copy_from_guest(&set_priority, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_set_priority(&set_priority); + break; + } + default: rc = -ENOSYS; break; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 091b53c..954ed89 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -123,6 +123,8 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + int (*set_priority)(struct domain *d, struct evtchn *evtchn, + unsigned priority); void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; @@ -156,6 +158,15 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline int evtchn_port_set_priority(struct domain *d, + struct evtchn *evtchn, + unsigned priority) +{ + if ( !d->evtchn_port_ops->set_priority ) + return -ENOSYS; + return d->evtchn_port_ops->set_priority(d, evtchn, priority); +} + static inline void evtchn_port_print_state(struct domain *d, const struct evtchn *evtchn) { -- 1.7.2.5
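From the guest's side the new sub-op is invoked like any other event channel operation.  A hedged sketch, assuming HYPERVISOR_event_channel_op() is the guest's usual hypercall wrapper (not something this patch provides):

#include <xen/event_channel.h>   /* the public header; exact path varies by guest */

/* Returns -ENOSYS when the domain's current ABI (e.g. the 2-level one)
 * provides no set_priority hook, per the generic code above. */
static int set_event_priority(evtchn_port_t port, unsigned int priority)
{
    struct evtchn_set_priority op = {
        .port     = port,
        .priority = priority,
    };

    return HYPERVISOR_event_channel_op(EVTCHNOP_set_priority, &op);
}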
From: David Vrabel <david.vrabel@citrix.com> Add max_evtchn_port to struct domain as the maximum port number that a domain may bind. Allow a suitably privileged domain to set this limit so it may restrict the other domain''s usage of Xen resources for event channels (primarily global mapping entries for the shared data structures). The default values for this limit is unlimited for control domains or NR_EVENT_CHANNELS - 1 for other domains. These defaults ensure that no domain will regress in the number of event channels they may use. If a domain is expected to use the FIFO-based event channel ABI, a toolstack may wish to set the limit to 1023 or lower to ensure a minimum number of global mapping entries is used. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- xen/common/event_channel.c | 47 +++++++++++++++++++++++++++++++++++- xen/include/public/event_channel.h | 3 ++ xen/include/xen/sched.h | 1 + xen/include/xsm/dummy.h | 6 ++++ xen/include/xsm/xsm.h | 6 ++++ 5 files changed, 62 insertions(+), 1 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index d583d54..45ff115 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -132,10 +132,14 @@ static int get_free_port(struct domain *d) return -EINVAL; for ( port = 0; port_is_valid(d, port); port++ ) + { + if (port > d->max_evtchn_port ) + return -ENOSPC; if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; + } - if ( port == d->max_evtchns ) + if ( port == d->max_evtchns || port > d->max_evtchn_port ) return -ENOSPC; if ( unlikely(group_from_port(d, port) == NULL ) ) @@ -956,6 +960,34 @@ static long evtchn_set_priority(const struct evtchn_set_priority *set_priority) return ret; } +static long evtchn_set_limit(const struct evtchn_set_limit *set_limit) +{ + struct domain *d; + unsigned max_port = set_limit->max_port; + long ret; + + if ( max_port > EVTCHN_MAX_PORT_UNLIMITED ) + return -EINVAL; + + d = rcu_lock_domain_by_id(set_limit->domid); + if ( !d ) + return -ESRCH; + + ret = xsm_evtchn_set_limit(XSM_DM_PRIV, d); + if ( ret ) + goto out; + + spin_lock(&d->event_lock); + + d->max_evtchn_port = max_port; + + spin_unlock(&d->event_lock); + +out: + rcu_unlock_domain(d); + return ret; +} + long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { long rc; @@ -1072,6 +1104,14 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_set_limit: { + struct evtchn_set_limit set_limit; + if ( copy_from_guest(&set_limit, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_set_limit(&set_limit); + break; + } + default: rc = -ENOSYS; break; @@ -1189,6 +1229,11 @@ void evtchn_check_pollers(struct domain *d, unsigned port) int evtchn_init(struct domain *d) { + if ( is_control_domain(d) ) + d->max_evtchn_port = EVTCHN_MAX_PORT_UNLIMITED; + else + d->max_evtchn_port = EVTCHN_MAX_PORT_DEFAULT; + /* Default to N-level ABI. 
*/ evtchn_2l_init(d); diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 031366a..30a233c 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -308,6 +308,9 @@ struct evtchn_set_limit { }; typedef struct evtchn_set_limit evtchn_set_limit_t; +#define EVTCHN_MAX_PORT_UNLIMITED ((1u << 31) - 1) +#define EVTCHN_MAX_PORT_DEFAULT (NR_EVENT_CHANNELS - 1) + /* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index b348232..ba3714d 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -287,6 +287,7 @@ struct domain /* Event channel information. */ struct evtchn ***evtchn; unsigned max_evtchns; + unsigned max_evtchn_port; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index cc0a5a8..3f1f3c1 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -296,6 +296,12 @@ static XSM_INLINE int xsm_evtchn_reset(XSM_DEFAULT_ARG struct domain *d1, struct return xsm_default_action(action, d1, d2); } +static XSM_INLINE int xsm_evtchn_set_limit(XSM_DEFAULT_ARG struct domain *d) +{ + XSM_ASSERT_ACTION(XSM_DM_PRIV); + return xsm_default_action(action, current->domain, d); +} + static XSM_INLINE int xsm_alloc_security_evtchn(struct evtchn *chn) { return 0; diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index 1939453..ae79ca9 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -70,6 +70,7 @@ struct xsm_operations { int (*evtchn_send) (struct domain *d, struct evtchn *chn); int (*evtchn_status) (struct domain *d, struct evtchn *chn); int (*evtchn_reset) (struct domain *d1, struct domain *d2); + int (*evtchn_set_limit) (struct domain *d); int (*grant_mapref) (struct domain *d1, struct domain *d2, uint32_t flags); int (*grant_unmapref) (struct domain *d1, struct domain *d2); @@ -255,6 +256,11 @@ static inline int xsm_evtchn_reset (xsm_default_t def, struct domain *d1, struct return xsm_ops->evtchn_reset(d1, d2); } +static inline int xsm_evtchn_set_limit (xsm_default_t def, struct domain *d) +{ + return xsm_ops->evtchn_set_limit(d); +} + static inline int xsm_grant_mapref (xsm_default_t def, struct domain *d1, struct domain *d2, uint32_t flags) { -- 1.7.2.5
From: David Vrabel <david.vrabel@citrix.com> Add xc_evtchn_set_limit(), a wrapper around the EVTCHNOP_set_limit hypercall. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> --- tools/libxc/xc_evtchn.c | 6 ++++++ tools/libxc/xenctrl.h | 13 +++++++++++++ 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/tools/libxc/xc_evtchn.c b/tools/libxc/xc_evtchn.c index 2e0679e..fd9d120 100644 --- a/tools/libxc/xc_evtchn.c +++ b/tools/libxc/xc_evtchn.c @@ -78,6 +78,12 @@ int xc_evtchn_status(xc_interface *xch, xc_evtchn_status_t *status) sizeof(*status), 1); } +int xc_evtchn_set_limit(xc_interface *xch, domid_t domid, uint32_t max_port) +{ + struct evtchn_set_limit arg = { .domid = domid, max_port = max_port }; + return do_evtchn_op(xch, EVTCHNOP_set_limit, &arg, sizeof(arg), 0); +} + int xc_evtchn_fd(xc_evtchn *xce) { return xce->ops->u.evtchn.fd(xce, xce->ops_handle); diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index f2cebaf..f5b1aa3 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -964,6 +964,19 @@ int xc_evtchn_reset(xc_interface *xch, typedef struct evtchn_status xc_evtchn_status_t; int xc_evtchn_status(xc_interface *xch, xc_evtchn_status_t *status); +/** + * Set the maximum port number that a domain may use. + * + * The toolstack may use this to limit the amount of Xen resources + * (xenheap, global mapping pages, etc.) the domain may use for event + * channels. + * + * @parm xch a handle to an open hypervisor interface + * @parm domid the ID of the domain + * @parm max_port the new maximum port number + */ +int xc_evtchn_set_limit(xc_interface *xch, domid_t domid, uint32_t max_port); + /* * Return a handle to the event channel driver, or NULL on failure, in * which case errno will be set appropriately. -- 1.7.2.5
David Vrabel
2013-Sep-13 16:56 UTC
[PATCH 11/11] evtchn: add FIFO-based event channel hypercalls and port ops
From: David Vrabel <david.vrabel@citrix.com> Add the implementation for the FIFO-based event channel ABI. The new hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and the required evtchn_ops (set_pending, unmask, etc.). Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/Makefile | 1 + xen/common/event_channel.c | 20 ++ xen/common/event_fifo.c | 455 ++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/event_fifo.h | 54 +++++ xen/include/xen/sched.h | 7 +- 5 files changed, 536 insertions(+), 1 deletions(-) create mode 100644 xen/common/event_fifo.c create mode 100644 xen/include/xen/event_fifo.h diff --git a/xen/common/Makefile b/xen/common/Makefile index 0a3a367..533b603 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -7,6 +7,7 @@ obj-y += domctl.o obj-y += domain.o obj-y += event_2l.o obj-y += event_channel.o +obj-y += event_fifo.o obj-y += grant_table.o obj-y += irq.o obj-y += kernel.o diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 45ff115..9e0832f 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -26,6 +26,7 @@ #include <xen/compat.h> #include <xen/guest_access.h> #include <xen/keyhandler.h> +#include <xen/event_fifo.h> #include <asm/current.h> #include <public/xen.h> @@ -1096,6 +1097,24 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_init_control: { + struct evtchn_init_control init_control; + if ( copy_from_guest(&init_control, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_init_control(&init_control); + if ( !rc && __copy_to_guest(arg, &init_control, 1) ) + rc = -EFAULT; + break; + } + + case EVTCHNOP_expand_array: { + struct evtchn_expand_array expand_array; + if ( copy_from_guest(&expand_array, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_expand_array(&expand_array); + break; + } + case EVTCHNOP_set_priority: { struct evtchn_set_priority set_priority; if ( copy_from_guest(&set_priority, arg, 1) != 0 ) @@ -1300,6 +1319,7 @@ void evtchn_destroy(struct domain *d) clear_global_virq_handlers(d); + evtchn_fifo_destroy(d); xfree(d->evtchn); } diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c new file mode 100644 index 0000000..c674178 --- /dev/null +++ b/xen/common/event_fifo.c @@ -0,0 +1,455 @@ +/* + * FIFO event channel management. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. 
+ */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> +#include <xen/event_fifo.h> +#include <xen/paging.h> +#include <xen/mm.h> + +#include <public/event_channel.h> + +static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d, + unsigned port) +{ + unsigned p, w; + + if ( unlikely(port >= d->evtchn_fifo->num_evtchns) ) + return NULL; + + p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return d->evtchn_fifo->event_array[p].virt + w; +} + +static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link) +{ + event_word_t n, o, w; + + w = *word; + + do { + if ( !(w & (1 << EVTCHN_FIFO_LINKED)) ) + return 0; + o = w; + n = (w & ~EVTCHN_FIFO_LINK_MASK) | link; + } while ( (w = cmpxchg(word, o, n)) != o ); + + return 1; +} + +static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned port; + event_word_t *word; + struct evtchn_fifo_queue *q; + unsigned long flags; + bool_t was_pending; + + port = evtchn->port; + word = evtchn_fifo_word_from_port(d, port); + if ( unlikely(!word) ) + return; + + /* + * No locking around getting the queue. This may race with + * changing the priority but we are allowed to signal the event + * once on the old priority. + */ + q = &v->evtchn_fifo->queue[evtchn->priority]; + + was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word); + + /* + * Link the event if it unmasked and not already linked. + */ + if ( !test_bit(EVTCHN_FIFO_MASKED, word) + && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + { + event_word_t *tail_word; + bool_t linked = 0; + + spin_lock_irqsave(&q->lock, flags); + + /* + * Atomically link the tail to port iff the tail is linked. + * If the tail is unlinked the queue is empty. + * + * If port is the same as tail, the queue is empty but q->tail + * will appear linked as we just set LINKED above. + * + * If the queue is empty (i.e., we haven''t linked to the new + * event), head must be updated. + */ + if ( port != q->tail ) + { + tail_word = evtchn_fifo_word_from_port(d, q->tail); + linked = evtchn_fifo_set_link(tail_word, port); + } + if ( !linked ) + write_atomic(q->head, port); + q->tail = port; + + spin_unlock_irqrestore(&q->lock, flags); + + if ( !test_and_set_bit(q->priority, + &v->evtchn_fifo->control_block->ready) ) + vcpu_mark_events_pending(v); + } + + if ( !was_pending ) + evtchn_check_pollers(d, port); +} + +static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + /* + * Just clear the P bit. + * + * No need to unlink as the guest will unlink and ignore + * non-pending events. + */ + clear_bit(EVTCHN_FIFO_PENDING, word); +} + +static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + clear_bit(EVTCHN_FIFO_MASKED, word); + + /* Relink if pending. 
*/ + if ( test_bit(EVTCHN_FIFO_PENDING, word) ) + evtchn_fifo_set_pending(v, evtchn); +} + +static bool_t evtchn_fifo_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 0; + + return test_bit(EVTCHN_FIFO_PENDING, word); +} + +static bool_t evtchn_fifo_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 1; + + return test_bit(EVTCHN_FIFO_MASKED, word); +} + +static int evtchn_fifo_set_priority(struct domain *d, struct evtchn *evtchn, + unsigned priority) +{ + if ( priority > EVTCHN_FIFO_PRIORITY_MIN ) + return -EINVAL; + + /* + * Only need to switch to the new queue for future events. If the + * event is already pending or in the process of being linked it + * will be on the old queue -- this is fine. + */ + evtchn->priority = priority; + + return 0; +} + +static void evtchn_fifo_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( !word ) + printk("? "); + else if ( test_bit(EVTCHN_FIFO_LINKED, word) ) + printk("%-4u", *word & EVTCHN_FIFO_LINK_MASK); + else + printk("- "); +} + +static const struct evtchn_port_ops evtchn_port_ops_fifo +{ + .set_pending = evtchn_fifo_set_pending, + .clear_pending = evtchn_fifo_clear_pending, + .unmask = evtchn_fifo_unmask, + .is_pending = evtchn_fifo_is_pending, + .is_masked = evtchn_fifo_is_masked, + .set_priority = evtchn_fifo_set_priority, + .print_state = evtchn_fifo_print_state, +}; + +static int map_guest_page(struct domain *d, uint64_t gfn, + struct page_info **page, void **virt) +{ + struct page_info *p; + + p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); + if ( !p ) + return -EINVAL; + + if ( !get_page_type(p, PGT_writable_page) ) + { + put_page(p); + return -EINVAL; + } + + *virt = map_domain_page_global(gfn); + if ( !*virt ) + { + put_page_and_type(p); + return -ENOMEM; + } + *page = p; + return 0; +} + +static void unmap_guest_page(struct page_info *page, void *virt) +{ + if ( page == NULL ) + return; + + unmap_domain_page_global(virt); + put_page_and_type(page); +} + +static void cleanup_control_block(struct vcpu *v) +{ + if ( v->evtchn_fifo ) + { + unmap_guest_page(v->evtchn_fifo->cb_page, v->evtchn_fifo->control_block); + xfree(v->evtchn_fifo); + v->evtchn_fifo = NULL; + } +} + +static void init_queue(struct vcpu *v, struct evtchn_fifo_queue *q, unsigned i) +{ + spin_lock_init(&q->lock); + q->priority = i; + q->head = &v->evtchn_fifo->control_block->head[i]; +} + +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset) +{ + struct domain *d = v->domain; + struct evtchn_fifo_vcpu *efv; + struct page_info *page; + void *virt; + unsigned i; + int rc; + + if ( v->evtchn_fifo ) + return -EINVAL; + + efv = xzalloc(struct evtchn_fifo_vcpu); + if ( efv == NULL ) + return -ENOMEM; + + rc = map_guest_page(d, gfn, &page, &virt); + if ( rc < 0 ) + { + xfree(efv); + return rc; + } + + v->evtchn_fifo = efv; + + v->evtchn_fifo->cb_page = page; + v->evtchn_fifo->control_block = virt + offset; + + for ( i = 0; i <= EVTCHN_FIFO_PRIORITY_MIN; i++ ) + init_queue(v, &v->evtchn_fifo->queue[i], i); + + return 0; +} + +/* + * Setup an event array with no pages. 
+ */ +static int setup_event_array(struct domain *d) +{ + if ( d->evtchn_fifo ) + return 0; + + d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain); + if ( d->evtchn_fifo == NULL ) + return -ENOMEM; + + d->evtchn_fifo->num_evtchns = 0; + + return 0; +} + +static void cleanup_event_array(struct domain *d) +{ + unsigned i; + + if ( d->evtchn_fifo == NULL ) + return; + + for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ ) + { + unmap_guest_page(d->evtchn_fifo->event_array[i].page, + d->evtchn_fifo->event_array[i].virt); + } + xfree(d->evtchn_fifo); +} + +static void set_priority_all(struct domain *d, unsigned priority) +{ + unsigned port; + + for ( port = 1; port < d->max_evtchns; port++ ) + { + if ( !port_is_valid(d, port) ) + break; + + evtchn_port_set_priority(d, evtchn_from_port(d, port), priority); + } +} + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control) +{ + struct domain *d = current->domain; + uint32_t vcpu_id; + uint64_t gfn; + uint32_t offset; + struct vcpu *v; + int rc; + + init_control->link_bits = EVTCHN_FIFO_LINK_BITS; + + vcpu_id = init_control->vcpu; + gfn = init_control->control_mfn; + offset = init_control->offset; + + if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) ) + return -ENOENT; + v = d->vcpu[vcpu_id]; + + /* Must not cross page boundary. */ + if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) ) + return -EINVAL; + + /* Must be 8-bytes aligned. */ + if ( offset & (8 - 1) ) + return -EINVAL; + + spin_lock(&d->event_lock); + + rc = setup_control_block(v, gfn, offset); + + /* + * If this is the first control block, setup an empty event array + * and switch to the fifo port ops. + * + * Any ports currently bound will have their priority set to the + * default. + */ + if ( d->evtchn_fifo == NULL ) + { + rc = setup_event_array(d); + if ( rc < 0 ) + cleanup_control_block(v); + else + { + d->evtchn_port_ops = &evtchn_port_ops_fifo; + d->max_evtchns = 1 << EVTCHN_FIFO_LINK_BITS; + set_priority_all(d, EVTCHN_FIFO_PRIORITY_DEFAULT); + } + } + + spin_unlock(&d->event_lock); + + return rc; +} + +static int add_page_to_event_array(struct domain *d, unsigned long gfn) +{ + struct page_info *page = NULL; + void *virt; + unsigned slot; + int rc; + + slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + if ( slot >= EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES ) + return -ENOSPC; + + rc = map_guest_page(d, gfn, &page, &virt); + if ( rc < 0 ) + return rc; + + d->evtchn_fifo->event_array[slot].page = page; + d->evtchn_fifo->event_array[slot].virt = virt; + + d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return 0; +} + +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array) +{ + struct domain *d = current->domain; + int rc; + + if ( !d->evtchn_fifo ) + return -ENOSYS; + + spin_lock(&d->event_lock); + rc = add_page_to_event_array(d, expand_array->array_mfn); + spin_unlock(&d->event_lock); + + return rc; +} + +void evtchn_fifo_destroy(struct domain *d) +{ + struct vcpu *v; + + for_each_vcpu( d, v ) + cleanup_control_block(v); + cleanup_event_array(d); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/event_fifo.h b/xen/include/xen/event_fifo.h new file mode 100644 index 0000000..702d692 --- /dev/null +++ b/xen/include/xen/event_fifo.h @@ -0,0 +1,54 @@ +/* + * FIFO-based event channel ABI. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. 
+ * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ +#ifndef __XEN_EVENT_FIFO_H__ +#define __XEN_EVENT_FIFO_H__ + +struct evtchn_fifo_queue { + uint32_t *head; /* points into control block */ + uint32_t tail; + spinlock_t lock; + uint8_t priority; +}; + +struct evtchn_fifo_vcpu { + struct page_info *cb_page; + struct evtchn_fifo_control_block *control_block; + struct evtchn_fifo_queue queue[EVTCHN_FIFO_MAX_QUEUES]; +}; + +#define EVTCHN_FIFO_EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t)) +#define EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES \ + ((1 << EVTCHN_FIFO_LINK_BITS) / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE) + + +struct evtchn_fifo_array_page { + struct page_info *page; + event_word_t *virt; +}; + +struct evtchn_fifo_domain { + struct evtchn_fifo_array_page event_array[EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES]; + unsigned num_evtchns; +}; + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control); +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array); +void evtchn_fifo_destroy(struct domain *domain); + +#endif /* __XEN_EVENT_FIFO_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ba3714d..5ec57af 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -61,7 +61,8 @@ extern struct domain *dom0; #define next_power_of_2(x) (__RDU32((x)-1) + 1) /* Maximum number of event channels for any ABI. */ -#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS +#define MAX_NR_EVTCHNS (max_t(unsigned, NR_EVENT_CHANNELS, \ + 1 << EVTCHN_FIFO_LINK_BITS)) #define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) #define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) @@ -95,6 +96,7 @@ struct evtchn } pirq; /* state == ECS_PIRQ */ u16 virq; /* state == ECS_VIRQ */ } u; + u8 priority; #ifdef FLASK_ENABLE void *ssid; #endif @@ -209,6 +211,8 @@ struct vcpu /* Guest-specified relocation of vcpu_info. */ unsigned long vcpu_info_mfn; + struct evtchn_fifo_vcpu *evtchn_fifo; + struct arch_vcpu arch; }; @@ -290,6 +294,7 @@ struct domain unsigned max_evtchn_port; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; + struct evtchn_fifo_domain *evtchn_fifo; struct grant_table *grant_table; -- 1.7.2.5
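For context on how a guest consumes this ABI, the fragment below is only a sketch of the guest-side EVTCHNOP_init_control call, assuming the field names used by evtchn_fifo_init_control() above (vcpu, control_mfn, offset) and a Linux-style HYPERVISOR_event_channel_op() wrapper; it is illustrative only and not part of the series.

    /* Illustrative sketch only (not part of the series): ask Xen to place
     * VCPU <vcpu>'s control block at <offset> bytes into guest frame <gfn>.
     * The offset must be 8-byte aligned and the block must not cross the
     * page boundary, as enforced by evtchn_fifo_init_control(). */
    static int fifo_init_control(unsigned int vcpu, uint64_t gfn, uint32_t offset)
    {
        struct evtchn_init_control init_control = {
            .vcpu        = vcpu,
            .control_mfn = gfn,   /* guest frame holding the control block */
            .offset      = offset,
        };

        return HYPERVISOR_event_channel_op(EVTCHNOP_init_control, &init_control);
    }

On success the hypervisor fills in init_control.link_bits, and the guest can then grow its event array with EVTCHNOP_expand_array as ports beyond the first page are bound.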
Daniel De Graaf
2013-Sep-13 18:32 UTC
Re: [PATCH 09/11] evtchn: implement EVTCHNOP_set_limit
On 09/13/2013 12:56 PM, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com> > > Add max_evtchn_port to struct domain as the maximum port number that a > domain may bind. Allow a suitably privileged domain to set this limit > so it may restrict the other domain's usage of Xen resources for event > channels (primarily global mapping entries for the shared data > structures). > > The default value for this limit is unlimited for control domains and > NR_EVENT_CHANNELS - 1 for other domains. These defaults ensure that no > domain will regress in the number of event channels they may use. > > If a domain is expected to use the FIFO-based event channel ABI, a > toolstack may wish to set the limit to 1023 or lower to ensure a > minimum number of global mapping entries is used. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>

This allows device model domains to adjust the limit of their target domain arbitrarily; I think it makes more sense for this to be XSM_PRIV so that only dom0 can adjust it, similar to max_mem. Control domains will already need an explicit XSM policy that can address this as a fine-grained permission. The new XSM hook also needs to be added to xsm/dummy.c.

Adding a FLASK hook and permission similar to the one for evtchn_reset would also be nice, but that may fit better in a later patch that adds this operation to the example security policy.

With the check changed to XSM_PRIV and a set_to_dummy_if_null line added to xsm/dummy.c, Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

-- Daniel De Graaf National Security Agency
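For reference, a minimal sketch of the two pieces Daniel asks for, assuming the hook keeps the xsm_evtchn_set_limit name used in the patch and follows the existing conventions of xsm/dummy.h and xsm/dummy.c; this illustrates the requested change and is not the actual respin.

    /* xsm/dummy.h: default action for the new hook, tightened to XSM_PRIV
     * so that only dom0 may adjust another domain's limit. */
    static XSM_INLINE int xsm_evtchn_set_limit(XSM_DEFAULT_ARG struct domain *d)
    {
        XSM_ASSERT_ACTION(XSM_PRIV);
        return xsm_default_action(action, current->domain, d);
    }

    /* xsm/dummy.c: fall back to the dummy implementation when no security
     * module provides the hook. */
    set_to_dummy_if_null(ops, evtchn_set_limit);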
Stefano Stabellini
2013-Sep-15 13:06 UTC
Re: [PATCH 02/11] evtchn: refactor low-level event channel port ops
On Fri, 13 Sep 2013, David Vrabel wrote:> diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c > new file mode 100644 > index 0000000..18c0c6e > --- /dev/null > +++ b/xen/common/event_2l.c > @@ -0,0 +1,99 @@ > +/* > + * Event channel port operations. > + * > + * Copyright (c) 2003-2006, K A Fraser. > + * > + * This source code is licensed under the GNU General Public License, > + * Version 2 or later. See the file COPYING for more details. > + */Does it have to be GPLv2 only? Can it be dual-licensed BSD to make it easier for other operating systems to support the new interface?
Stefano Stabellini
2013-Sep-15 13:11 UTC
Re: [PATCH 02/11] evtchn: refactor low-level event channel port ops
On Sun, 15 Sep 2013, Stefano Stabellini wrote:> On Fri, 13 Sep 2013, David Vrabel wrote: > > diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c > > new file mode 100644 > > index 0000000..18c0c6e > > --- /dev/null > > +++ b/xen/common/event_2l.c > > @@ -0,0 +1,99 @@ > > +/* > > + * Event channel port operations. > > + * > > + * Copyright (c) 2003-2006, K A Fraser. > > + * > > + * This source code is licensed under the GNU General Public License, > > + * Version 2 or later. See the file COPYING for more details. > > + */ > > Does it have to be GPLv2 only? Can it be dual-licensed BSD to make it > easier for other operating systems to support the new interface? >I meant to quote the new Linux files: drivers/xen/events/events_internal.h drivers/xen/events/n-level.c drivers/xen/events/fifo.c
Ian Campbell
2013-Sep-15 13:20 UTC
Re: [PATCH 02/11] evtchn: refactor low-level event channel port ops
On Sun, 2013-09-15 at 14:06 +0100, Stefano Stabellini wrote:
> On Fri, 13 Sep 2013, David Vrabel wrote: > > diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c > > new file mode 100644 > > index 0000000..18c0c6e > > --- /dev/null > > +++ b/xen/common/event_2l.c > > @@ -0,0 +1,99 @@ > > +/* > > + * Event channel port operations. > > + * > > + * Copyright (c) 2003-2006, K A Fraser. > > + * > > + * This source code is licensed under the GNU General Public License, > > + * Version 2 or later. See the file COPYING for more details. > > + */ > > Does it have to be GPLv2 only? Can it be dual-licensed BSD to make it > easier for other operating systems to support the new interface?

This is the Xen side, not the guest side, so I'm not sure how helpful that would be in practice. More useful would be BSD-licensed mini-os support for this stuff, or just doing it in one of the BSDs, but I don't think either of those should be a requirement for this series.

Ian
Jan Beulich
2013-Sep-16 06:59 UTC
Re: [PATCH 07/11] evtchn: add FIFO-based event channel ABI
>>> On 13.09.13 at 18:56, David Vrabel <david.vrabel@citrix.com> wrote: > --- a/xen/include/public/xen.h > +++ b/xen/include/public/xen.h > @@ -553,7 +553,7 @@ typedef struct multicall_entry multicall_entry_t; > DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); > > /* > - * Event channel endpoints per domain: > + * Event channel endpoints per domain (when using the 2-level ABI): > * 1024 if a long is 32 bits; 4096 if a long is 64 bits. > */ > #define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64)This should probably become NR_L2_EVENT_CHANNELS, with the original symbol aliased to the new one only for pre-4.4 interface consumers. Jan
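A sketch of the renaming Jan suggests, assuming the usual __XEN_INTERFACE_VERSION__ gating already used in the public headers and taking 4.4 as the first interface version without the old name; the exact version check and placement are illustrative, not part of the posted series.

    /*
     * Event channel endpoints per domain when using the 2-level ABI:
     * 1024 if a long is 32 bits; 4096 if a long is 64 bits.
     */
    #define NR_L2_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64)

    #if __XEN_INTERFACE_VERSION__ < 0x00040400
    /* Legacy name, kept only for pre-4.4 interface consumers. */
    #define NR_EVENT_CHANNELS NR_L2_EVENT_CHANNELS
    #endif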
>>> On 13.09.13 at 18:56, David Vrabel <david.vrabel@citrix.com> wrote: > +static long evtchn_set_limit(const struct evtchn_set_limit *set_limit) > +{ > + struct domain *d; > + unsigned max_port = set_limit->max_port; > + long ret; > + > + if ( max_port > EVTCHN_MAX_PORT_UNLIMITED ) > + return -EINVAL; > + > + d = rcu_lock_domain_by_id(set_limit->domid); > + if ( !d ) > + return -ESRCH; > + > + ret = xsm_evtchn_set_limit(XSM_DM_PRIV, d); > + if ( ret ) > + goto out; > + > + spin_lock(&d->event_lock); > + > + d->max_evtchn_port = max_port;So you allow this to be set even if the L2 ABI is in use. Does this make sense? Is this consistent?> @@ -1189,6 +1229,11 @@ void evtchn_check_pollers(struct domain *d, unsigned port) > > int evtchn_init(struct domain *d) > { > + if ( is_control_domain(d) ) > + d->max_evtchn_port = EVTCHN_MAX_PORT_UNLIMITED; > + else > + d->max_evtchn_port = EVTCHN_MAX_PORT_DEFAULT; > + > /* Default to N-level ABI. */ > evtchn_2l_init(d);Similarly here - you set limits that are not consistent with the default L2 ABI.> --- a/xen/include/public/event_channel.h > +++ b/xen/include/public/event_channel.h > @@ -308,6 +308,9 @@ struct evtchn_set_limit { > }; > typedef struct evtchn_set_limit evtchn_set_limit_t; > > +#define EVTCHN_MAX_PORT_UNLIMITED ((1u << 31) - 1) > +#define EVTCHN_MAX_PORT_DEFAULT (NR_EVENT_CHANNELS - 1)Does the former really need to be part of the ABI? And does it really need to be 2^31-1 (rather than 2^32-1)? Independent of that I think that - as hinted at in the overview mail - this ought to be a domctl. Jan
On 16/09/13 08:07, Jan Beulich wrote:
>>>> On 13.09.13 at 18:56, David Vrabel <david.vrabel@citrix.com> wrote: >> +static long evtchn_set_limit(const struct evtchn_set_limit *set_limit) >> +{ >> + struct domain *d; >> + unsigned max_port = set_limit->max_port; >> + long ret; >> + >> + if ( max_port > EVTCHN_MAX_PORT_UNLIMITED ) >> + return -EINVAL; >> + >> + d = rcu_lock_domain_by_id(set_limit->domid); >> + if ( !d ) >> + return -ESRCH; >> + >> + ret = xsm_evtchn_set_limit(XSM_DM_PRIV, d); >> + if ( ret ) >> + goto out; >> + >> + spin_lock(&d->event_lock); >> + >> + d->max_evtchn_port = max_port; > > So you allow this to be set even if the L2 ABI is in use. Does this > make sense? Is this consistent?

I think it would be confusing if guests could subvert the limit by using a different ABI, even if it doesn't really make much difference from a resource usage point of view.

>> @@ -1189,6 +1229,11 @@ void evtchn_check_pollers(struct domain *d, unsigned port) >> >> int evtchn_init(struct domain *d) >> { >> + if ( is_control_domain(d) ) >> + d->max_evtchn_port = EVTCHN_MAX_PORT_UNLIMITED; >> + else >> + d->max_evtchn_port = EVTCHN_MAX_PORT_DEFAULT; >> + >> /* Default to N-level ABI. */ >> evtchn_2l_init(d); > > Similarly here - you set limits that are not consistent with the default > L2 ABI.

I'm not sure why you think they are inconsistent, the limits set here are such that there is no regression in the number of usable event channels. A guest is still limited by the maximum supported by any ABI. i.e., the limit is min(d->max_evtchn_port, d->max_evtchns-1).

However, I'm going to change it so the hypervisor always sets the limit to unlimited. The toolstack should be responsible for setting any limits (and picking a sensible default).

>> --- a/xen/include/public/event_channel.h >> +++ b/xen/include/public/event_channel.h >> @@ -308,6 +308,9 @@ struct evtchn_set_limit { >> }; >> typedef struct evtchn_set_limit evtchn_set_limit_t; >> >> +#define EVTCHN_MAX_PORT_UNLIMITED ((1u << 31) - 1) >> +#define EVTCHN_MAX_PORT_DEFAULT (NR_EVENT_CHANNELS - 1) > > Does the former really need to be part of the ABI? And does it > really need to be 2^31-1 (rather than 2^32-1)?

The hypervisor uses int for port in places (e.g., get_free_port() where it returns a port number or a negative error code). I will remove UNLIMITED from the ABI and set_limit will map any limit > UNLIMITED to UNLIMITED.

At some point someone should go and change all the uses of port number to unsigned, but I think this work is independent of this series.

David
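The rule described above can be read as the following illustrative helper, which is not code from the series; the real enforcement lives in get_free_port() and is quoted later in the thread.

    /* Illustrative only: a port is usable iff it is below the ABI's maximum
     * and does not exceed the administratively set limit. */
    static bool_t port_is_allowed(const struct domain *d, unsigned int port)
    {
        return (port < d->max_evtchns) &&      /* ABI limit (2-level or FIFO) */
               (port <= d->max_evtchn_port);   /* administrative limit */
    }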
David Vrabel
2013-Sep-16 10:08 UTC
Re: [PATCH 02/11] evtchn: refactor low-level event channel port ops
On 15/09/13 14:11, Stefano Stabellini wrote:
> On Sun, 15 Sep 2013, Stefano Stabellini wrote: >> On Fri, 13 Sep 2013, David Vrabel wrote: >>> diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c >>> new file mode 100644 >>> index 0000000..18c0c6e >>> --- /dev/null >>> +++ b/xen/common/event_2l.c >>> @@ -0,0 +1,99 @@ >>> +/* >>> + * Event channel port operations. >>> + * >>> + * Copyright (c) 2003-2006, K A Fraser. >>> + * >>> + * This source code is licensed under the GNU General Public License, >>> + * Version 2 or later. See the file COPYING for more details. >>> + */ >> >> Does it have to be GPLv2 only? Can it be dual-licensed BSD to make it >> easier for other operating systems to support the new interface? >> > > I meant to quote the new Linux files: > > drivers/xen/events/events_internal.h > drivers/xen/events/n-level.c

These will have to stay as GPLv2 as they are derived from the original events.c which does not have any license stated (so we must assume GPLv2).

> drivers/xen/events/fifo.c

I will make this one dual GPLv2 or later and 2-clause BSD. Although, I do think that any implementer would be better off basing their work off the documentation rather than a Linux-centric implementation.

David
>>> On 16.09.13 at 12:00, David Vrabel <david.vrabel@citrix.com> wrote: > On 16/09/13 08:07, Jan Beulich wrote: >>>>> On 13.09.13 at 18:56, David Vrabel <david.vrabel@citrix.com> wrote: >>> +static long evtchn_set_limit(const struct evtchn_set_limit *set_limit) >>> +{ >>> + struct domain *d; >>> + unsigned max_port = set_limit->max_port; >>> + long ret; >>> + >>> + if ( max_port > EVTCHN_MAX_PORT_UNLIMITED ) >>> + return -EINVAL; >>> + >>> + d = rcu_lock_domain_by_id(set_limit->domid); >>> + if ( !d ) >>> + return -ESRCH; >>> + >>> + ret = xsm_evtchn_set_limit(XSM_DM_PRIV, d); >>> + if ( ret ) >>> + goto out; >>> + >>> + spin_lock(&d->event_lock); >>> + >>> + d->max_evtchn_port = max_port; >> >> So you allow this to be set even if the L2 ABI is in use. Does this >> make sense? Is this consistent? > > I think it would be confusing if guests could subvert the limit by using > a different ABI, even if it doesn't really make much difference from a > resource usage point of view.

Somehow I'm getting the impression we're not understanding one another. My questions were:
- What's the point of permitting use of this function for a guest using the 2-level ABI?
- If you consider it valid to be used by a 2-level guest, will the result be consistent (will the guest see the limit enforced, and is there no implicit assumption somewhere that 2-level guests can always use the so far statically limited number of event channels)?

>>> @@ -1189,6 +1229,11 @@ void evtchn_check_pollers(struct domain *d, unsigned port) >>> >>> int evtchn_init(struct domain *d) >>> { >>> + if ( is_control_domain(d) ) >>> + d->max_evtchn_port = EVTCHN_MAX_PORT_UNLIMITED; >>> + else >>> + d->max_evtchn_port = EVTCHN_MAX_PORT_DEFAULT; >>> + >>> /* Default to N-level ABI. */ >>> evtchn_2l_init(d); >> >> Similarly here - you set limits that are not consistent with the default >> L2 ABI. > > I'm not sure why you think they are inconsistent, the limits set here > are such that there is no regression in the number of usable event > channels. A guest is still limited by the maximum supported by any ABI. > i.e., the limit is min(d->max_evtchn_port, d->max_evtchns-1).

I asked because the limit for a 2-level Dom0 is now wrong. But you ought to read this in the context of the questions above, i.e. if all's consistent (and the max() you point out is indeed consistently enforced), then there is no issue.

>>> --- a/xen/include/public/event_channel.h >>> +++ b/xen/include/public/event_channel.h >>> @@ -308,6 +308,9 @@ struct evtchn_set_limit { >>> }; >>> typedef struct evtchn_set_limit evtchn_set_limit_t; >>> >>> +#define EVTCHN_MAX_PORT_UNLIMITED ((1u << 31) - 1) >>> +#define EVTCHN_MAX_PORT_DEFAULT (NR_EVENT_CHANNELS - 1) >> >> Does the former really need to be part of the ABI? And does it >> really need to be 2^31-1 (rather than 2^32-1)? > > The hypervisor uses int for port in places (e.g., get_free_port() where > it returns a port number or a negative error code).

Ah, right, yes.

> I will remove UNLIMITED from the ABI and set_limit will map any limit > > UNLIMITED to UNLIMITED. > > At some point someone should go and change all the uses of port number > to unsigned, but I think this work is independent of this series.

Indeed.

Jan
On 16/09/13 11:33, Jan Beulich wrote:
>>>> On 16.09.13 at 12:00, David Vrabel <david.vrabel@citrix.com> wrote: >> On 16/09/13 08:07, Jan Beulich wrote: >>>>>> On 13.09.13 at 18:56, David Vrabel <david.vrabel@citrix.com> wrote: >>>> +static long evtchn_set_limit(const struct evtchn_set_limit *set_limit) >>>> +{ >>>> + struct domain *d; >>>> + unsigned max_port = set_limit->max_port; >>>> + long ret; >>>> + >>>> + if ( max_port > EVTCHN_MAX_PORT_UNLIMITED ) >>>> + return -EINVAL; >>>> + >>>> + d = rcu_lock_domain_by_id(set_limit->domid); >>>> + if ( !d ) >>>> + return -ESRCH; >>>> + >>>> + ret = xsm_evtchn_set_limit(XSM_DM_PRIV, d); >>>> + if ( ret ) >>>> + goto out; >>>> + >>>> + spin_lock(&d->event_lock); >>>> + >>>> + d->max_evtchn_port = max_port; >>> >>> So you allow this to be set even if the L2 ABI is in use. Does this >>> make sense? Is this consistent? >> >> I think it would be confusing if guests could subvert the limit by using >> a different ABI, even if it doesn't really make much difference from a >> resource usage point of view. > > Somehow I'm getting the impression we're not understanding one > another. My questions were: > - What's the point of permitting use of this function for a guest > using the 2-level ABI?

So any administrator-set limit is consistently applied regardless of which ABI a guest uses.

> - If you consider it valid to be used by a 2-level guest, will the > result be consistent (will the guest see the limit enforced, and is > there no implicit assumption somewhere that 2-level guests can > always use the so far statically limited number of event channels)?

The limit is enforced (see the checks in get_free_port()) for all ABIs:

    for ( port = 0; port_is_valid(d, port); port++ )
    {
        if ( port > d->max_evtchn_port )
            return -ENOSPC;
        if ( evtchn_from_port(d, port)->state == ECS_FREE )
            return port;
    }

    if ( port == d->max_evtchns || port > d->max_evtchn_port )
        return -ENOSPC;

>>>> @@ -1189,6 +1229,11 @@ void evtchn_check_pollers(struct domain *d, unsigned port) >>>> >>>> int evtchn_init(struct domain *d) >>>> { >>>> + if ( is_control_domain(d) ) >>>> + d->max_evtchn_port = EVTCHN_MAX_PORT_UNLIMITED; >>>> + else >>>> + d->max_evtchn_port = EVTCHN_MAX_PORT_DEFAULT; >>>> + >>>> /* Default to N-level ABI. */ >>>> evtchn_2l_init(d); >>> >>> Similarly here - you set limits that are not consistent with the default >>> L2 ABI. >> >> I'm not sure why you think they are inconsistent, the limits set here >> are such that there is no regression in the number of usable event >> channels. A guest is still limited by the maximum supported by any ABI. >> i.e., the limit is min(d->max_evtchn_port, d->max_evtchns-1). > > I asked because the limit for a 2-level Dom0 is now wrong. But you > ought to read this in the context of the questions above, i.e. if all's > consistent (and the max() you point out is indeed consistently > enforced), then there is no issue.

The maximum supported by the ABI and the administratively set limit are independent constraints. get_free_port() makes sure neither constraint is exceeded.

David