http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

The v5 version of the Linux patch series will be posted shortly and can be found in the orochi-v5 branch of:

git://xenbits.xen.org/people/dvrabel/linux.git

Patches 1-4 do some preparatory work for supporting alternate ABIs.

Patch 5 expands the number of evtchn objects a domain may have by changing how they are allocated.

Patch 6 adds the public ABI.

Patch 7 adds the EVTCHNOP_set_priority implementation. This will return -ENOSYS for ABIs that do not support priority.

Patch 8* adds the FIFO-based ABI implementation.

Patches 9*-10 add the DOMCTL_set_max_evtchn implementation and add a function to libxc. This will also work with the 2-level ABI.

Patch 11* adds the max_event_channels configuration option to xl and the libxl bits needed for this.

* Reviewed-by pending.

Changes in v5:
- xl config option renamed to 'max_event_channels'.
- Default set by libxl is 1023.
- Fix flask (I think).
- Use domain_page_map_to_mfn() when unmapping guest pages.
- Coding style (unsigned -> unsigned int throughout, x == NULL -> !x).

Changes in v4:
- Updates for Draft F of the design.
- DOMCTL_set_max_evtchn replaces EVTCHNOP_set_limit.
- Hypervisor defaults to unlimited event channels for DomU.
- Optimized memory allocation for struct evtchn's when fewer than 128 are required (see patch 5).
- Added event_channels option to the xl domain configuration file and plumbed this through libxl_build_info. Defaults to 127.

Changes in v3:
- Updates for Draft E of the design.
- Store priority in struct evtchn.
- Implement set_priority with generic code + hook.
- Implement set_limit and add libxc function.
- Add ABI specific output to 'e' debug key.

Changes in v2:
- Updates for Draft D of the design.
- 130,000+ event channels are now supported.
- event_port.c -> event_2l.c and only contains 2l functions.
- Addresses various review comments:
  - int -> unsigned in lots of places.
  - use write_atomic() to set HEAD.
  - removed MAX_EVTCHNS.
  - evtchn_ops are const.
- Pack struct evtchns better to reduce memory needed.
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 01/11] debug: remove some event channel info from the 'i' and 'q' debug keys
From: David Vrabel <david.vrabel@citrix.com> The 'i' key would always use VCPU0's selector word when printing the event channel state. Remove the incorrect output as a subsequent change will add the (correct) information to the 'e' key instead. When dumping domain information, printing the state of the VIRQ_DEBUG port is redundant -- this information is available via the 'e' key. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/irq.c | 5 +---- xen/common/keyhandler.c | 11 ++--------- 2 files changed, 3 insertions(+), 13 deletions(-) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index c61cc46..7f547ff 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2262,14 +2262,11 @@ static void dump_irqs(unsigned char key) d = action->guest[i]; pirq = domain_irq_to_pirq(d, irq); info = pirq_info(d, pirq); - printk("%u:%3d(%c%c%c%c)", + printk("%u:%3d(%c%c%c)", d->domain_id, pirq, (test_bit(info->evtchn, &shared_info(d, evtchn_pending)) ? 'P' : '-'), - (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ? - 'S' : '-'), (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? 'M' : '-'), (info->masked ? 
'M' : '-')); diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index b9ad1b5..8e4b3f8 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -310,16 +310,9 @@ static void dump_domains(unsigned char key) { for_each_vcpu ( d, v ) { - printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n", + printk("Notifying guest %d:%d (virq %d, port %d)\n", d->domain_id, v->vcpu_id, - VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG], - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_pending)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_mask)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG] / - BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel))); + VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG]); send_guest_vcpu_virq(v, VIRQ_DEBUG); } } -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 02/11] evtchn: refactor low-level event channel port ops
From: David Vrabel <david.vrabel@citrix.com> Use functions for the low-level event channel port operations (set/clear pending, unmask, is_pending and is_masked). Group these functions into a struct evtchn_port_ops so they can be replaced by alternate implementations (for different ABIs) on a per-domain basis. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/irq.c | 11 ++--- xen/common/Makefile | 1 + xen/common/event_2l.c | 99 ++++++++++++++++++++++++++++++++++++++++++++ xen/common/event_channel.c | 87 +++++++++++++++------------------------ xen/common/schedule.c | 3 +- xen/include/xen/event.h | 45 ++++++++++++++++++++ xen/include/xen/sched.h | 4 ++ 7 files changed, 189 insertions(+), 61 deletions(-) create mode 100644 xen/common/event_2l.c diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 7f547ff..53fe9e3 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1474,7 +1474,7 @@ int pirq_guest_unmask(struct domain *d) { pirq = pirqs[i]->pirq; if ( pirqs[i]->masked && - !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) ) + !evtchn_port_is_masked(d, evtchn_from_port(d, pirqs[i]->evtchn)) ) pirq_guest_eoi(pirqs[i]); } } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); @@ -2222,6 +2222,7 @@ static void dump_irqs(unsigned char key) int i, irq, pirq; struct irq_desc *desc; irq_guest_action_t *action; + struct evtchn *evtchn; struct domain *d; const struct pirq *info; unsigned long flags; @@ -2262,13 +2263,11 @@ static void dump_irqs(unsigned char key) d = action->guest[i]; pirq = domain_irq_to_pirq(d, irq); info = pirq_info(d, pirq); + evtchn = evtchn_from_port(d, info->evtchn); printk("%u:%3d(%c%c%c)", d->domain_id, pirq, - (test_bit(info->evtchn, - &shared_info(d, evtchn_pending)) ? - 'P' : '-'), - (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? - 'M' : '-'), + (evtchn_port_is_pending(d, evtchn) ? 'P' : '-'), + (evtchn_port_is_masked(d, evtchn) ? 
'M' : '-'), (info->masked ? 'M' : '-')); if ( i != action->nr_guests ) printk(","); diff --git a/xen/common/Makefile b/xen/common/Makefile index 6da4651..69ab94f 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -5,6 +5,7 @@ obj-y += cpupool.o obj-$(HAS_DEVICE_TREE) += device_tree.o obj-y += domctl.o obj-y += domain.o +obj-y += event_2l.o obj-y += event_channel.o obj-y += grant_table.o obj-y += irq.o diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c new file mode 100644 index 0000000..ed6de38 --- /dev/null +++ b/xen/common/event_2l.c @@ -0,0 +1,99 @@ +/* + * Event channel port operations. + * + * Copyright (c) 2003-2006, K A Fraser. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> + +static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned int port = evtchn->port; + + /* + * The following bit operations must happen in strict order. + * NB. On x86, the atomic bit operations also act as memory barriers. + * There is therefore sufficiently strict ordering for this architecture -- + * others may require explicit memory barriers. 
+ */ + + if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) + return; + + if ( !test_bit (port, &shared_info(d, evtchn_mask)) && + !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } + + evtchn_check_pollers(d, port); +} + +static void evtchn_2l_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + clear_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + unsigned int port = evtchn->port; + + /* + * These operations must happen in strict order. Based on + * evtchn_2l_set_pending() above. + */ + if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && + test_bit (port, &shared_info(d, evtchn_pending)) && + !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } +} + +static bool_t evtchn_2l_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static bool_t evtchn_2l_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); +} + +static const struct evtchn_port_ops evtchn_port_ops_2l = +{ + .set_pending = evtchn_2l_set_pending, + .clear_pending = evtchn_2l_clear_pending, + .unmask = evtchn_2l_unmask, + .is_pending = evtchn_2l_is_pending, + .is_masked = evtchn_2l_is_masked, +}; + +void evtchn_2l_init(struct domain *d) +{ + d->evtchn_port_ops = &evtchn_port_ops_2l; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 64c976b..7290a21 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -150,6 +150,7 @@ static int get_free_port(struct 
domain *d) xfree(chn); return -ENOMEM; } + chn[i].port = port + i; } bucket_from_port(d, port) = chn; @@ -530,7 +531,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - clear_bit(port1, &shared_info(d1, evtchn_pending)); + evtchn_port_clear_pending(d1, chn1); /* Reset binding to vcpu0 when the channel is freed. */ chn1->state = ECS_FREE; @@ -615,43 +616,7 @@ out: static void evtchn_set_pending(struct vcpu *v, int port) { - struct domain *d = v->domain; - int vcpuid; - - /* - * The following bit operations must happen in strict order. - * NB. On x86, the atomic bit operations also act as memory barriers. - * There is therefore sufficiently strict ordering for this architecture -- - * others may require explicit memory barriers. - */ - - if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) - return; - - if ( !test_bit (port, &shared_info(d, evtchn_mask)) && - !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } - - /* Check if some VCPU might be polling for this event. */ - if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) - return; - - /* Wake any interested (or potentially interested) pollers. 
*/ - for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); - vcpuid < d->max_vcpus; - vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) - { - v = d->vcpu[vcpuid]; - if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && - test_and_clear_bit(vcpuid, d->poll_mask) ) - { - v->poll_evtchn = 0; - vcpu_unblock(v); - } - } + evtchn_port_set_pending(v, evtchn_from_port(v->domain, port)); } int guest_enabled_event(struct vcpu *v, uint32_t virq) @@ -920,26 +885,15 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) int evtchn_unmask(unsigned int port) { struct domain *d = current->domain; - struct vcpu *v; + struct evtchn *evtchn; ASSERT(spin_is_locked(&d->event_lock)); if ( unlikely(!port_is_valid(d, port)) ) return -EINVAL; - v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id]; - - /* - * These operations must happen in strict order. Based on - * include/xen/event.h:evtchn_set_pending(). - */ - if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && - test_bit (port, &shared_info(d, evtchn_pending)) && - !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } + evtchn = evtchn_from_port(d, port); + evtchn_port_unmask(d, evtchn); return 0; } @@ -1170,9 +1124,34 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) spin_unlock(&ld->event_lock); } +void evtchn_check_pollers(struct domain *d, unsigned int port) +{ + struct vcpu *v; + unsigned int vcpuid; + + /* Check if some VCPU might be polling for this event. */ + if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) + return; + + /* Wake any interested (or potentially interested) pollers. 
*/ + for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); + vcpuid < d->max_vcpus; + vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) + { + v = d->vcpu[vcpuid]; + if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && + test_and_clear_bit(vcpuid, d->poll_mask) ) + { + v->poll_evtchn = 0; + vcpu_unblock(v); + } + } +} int evtchn_init(struct domain *d) { + evtchn_2l_init(d); + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) return -EINVAL; @@ -1270,8 +1249,8 @@ static void domain_dump_evtchn_info(struct domain *d) printk(" %4u [%d/%d]: s=%d n=%d x=%d", port, - !!test_bit(port, &shared_info(d, evtchn_pending)), - !!test_bit(port, &shared_info(d, evtchn_mask)), + !!evtchn_port_is_pending(d, chn), + !!evtchn_port_is_masked(d, chn), chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a8398bd..7e6884d 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -34,6 +34,7 @@ #include <xen/multicall.h> #include <xen/cpu.h> #include <xen/preempt.h> +#include <xen/event.h> #include <public/sched.h> #include <xsm/xsm.h> @@ -751,7 +752,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = 0; - if ( test_bit(port, &shared_info(d, evtchn_pending)) ) + if ( evtchn_port_is_pending(d, evtchn_from_port(d, port)) ) goto out; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 6f60162..30c59c9 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -102,4 +102,49 @@ void notify_via_xen_event_channel(struct domain *ld, int lport); smp_mb(); /* set blocked status /then/ caller does his work */ \ } while ( 0 ) +void evtchn_check_pollers(struct domain *d, unsigned int port); + +void evtchn_2l_init(struct domain *d); + +/* + * Low-level event channel port ops. 
+ */ +struct evtchn_port_ops { + void (*set_pending)(struct vcpu *v, struct evtchn *evtchn); + void (*clear_pending)(struct domain *d, struct evtchn *evtchn); + void (*unmask)(struct domain *d, struct evtchn *evtchn); + bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); + bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); +}; + +static inline void evtchn_port_set_pending(struct vcpu *v, + struct evtchn *evtchn) +{ + v->domain->evtchn_port_ops->set_pending(v, evtchn); +} + +static inline void evtchn_port_clear_pending(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->clear_pending(d, evtchn); +} + +static inline void evtchn_port_unmask(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->unmask(d, evtchn); +} + +static inline bool_t evtchn_port_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_pending(d, evtchn); +} + +static inline bool_t evtchn_port_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_masked(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 0013a8d..fb9cf11 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -66,6 +66,7 @@ struct evtchn u8 state; /* ECS_* */ u8 xen_consumer; /* Consumer in Xen, if any? (0 = send to guest) */ u16 notify_vcpu_id; /* VCPU for local delivery notification */ + u32 port; union { struct { domid_t remote_domid; @@ -238,6 +239,8 @@ struct mem_event_per_domain struct mem_event_domain access; }; +struct evtchn_port_ops; + struct domain { domid_t domain_id; @@ -271,6 +274,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; spinlock_t event_lock; + const struct evtchn_port_ops *evtchn_port_ops; struct grant_table *grant_table; -- 1.7.2.5
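The per-domain indirection this patch introduces can be illustrated with a stripped-down model. This is only a sketch: the `struct domain` and `struct evtchn` below are simplified stand-ins rather than the real Xen definitions (the real `set_pending` hook takes a `struct vcpu *`, and there are more ops), and the `toy_*` names are invented for illustration.

```c
#include <assert.h>

/* Simplified stand-ins for the Xen structures -- not the real definitions. */
struct domain;

struct evtchn {
    unsigned int port;
    int pending;                 /* toy per-channel state */
};

/* Hook table mirroring the shape added to xen/include/xen/event.h. */
struct evtchn_port_ops {
    void (*set_pending)(struct domain *d, struct evtchn *evtchn);
    int  (*is_pending)(struct domain *d, const struct evtchn *evtchn);
};

struct domain {
    const struct evtchn_port_ops *evtchn_port_ops;
};

/* A trivial implementation standing in for the 2-level ABI. */
static void toy_set_pending(struct domain *d, struct evtchn *e)
{
    (void)d;
    e->pending = 1;
}

static int toy_is_pending(struct domain *d, const struct evtchn *e)
{
    (void)d;
    return e->pending;
}

static const struct evtchn_port_ops toy_ops = {
    .set_pending = toy_set_pending,
    .is_pending  = toy_is_pending,
};

/* Generic code only ever dispatches through the per-domain table, so an
 * alternate ABI (e.g. the FIFO one later in this series) can be installed
 * at domain creation without touching any caller. */
static void evtchn_port_set_pending(struct domain *d, struct evtchn *e)
{
    d->evtchn_port_ops->set_pending(d, e);
}

static int evtchn_port_is_pending(struct domain *d, const struct evtchn *e)
{
    return d->evtchn_port_ops->is_pending(d, e);
}
```

Swapping in a different ABI is then just a matter of pointing `d->evtchn_port_ops` at a different const ops table, which is exactly what `evtchn_2l_init()` does in the patch.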
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 03/11] evtchn: print ABI specific state with the 'e' debug key
From: David Vrabel <david.vrabel@citrix.com> In the output of the 'e' debug key, print some ABI specific state in addition to the (p)ending and (m)asked bits. For the 2-level ABI, print the state of that event's selector bit. e.g., (XEN) port [p/m/s] (XEN) 1 [0/0/1]: s=3 n=0 x=0 d=0 p=74 (XEN) 2 [0/0/1]: s=3 n=0 x=0 d=0 p=75 Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/event_2l.c | 10 ++++++++++ xen/common/event_channel.c | 8 +++++--- xen/include/xen/event.h | 7 +++++++ 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index ed6de38..5c27eab 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -74,6 +74,15 @@ static bool_t evtchn_2l_is_masked(struct domain *d, return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); } +static void evtchn_2l_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + + printk("%d", !!test_bit(evtchn->port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel))); +} + static const struct evtchn_port_ops evtchn_port_ops_2l = { .set_pending = evtchn_2l_set_pending, @@ -81,6 +90,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l .unmask = evtchn_2l_unmask, .is_pending = evtchn_2l_is_pending, .is_masked = evtchn_2l_is_masked, + .print_state = evtchn_2l_print_state, }; void evtchn_2l_init(struct domain *d) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 7290a21..f73c7a9 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1232,7 +1232,7 @@ static void domain_dump_evtchn_info(struct domain *d) d->poll_mask, d->max_vcpus); printk("Event channel information for domain %d:\n" "Polling vCPUs: {%s}\n" - " port [p/m]\n", d->domain_id, keyhandler_scratch); + " port [p/m/s]\n", d->domain_id, keyhandler_scratch); spin_lock(&d->event_lock); @@ -1247,10 +1247,12 @@ static void 
domain_dump_evtchn_info(struct domain *d) if ( chn->state == ECS_FREE ) continue; - printk(" %4u [%d/%d]: s=%d n=%d x=%d", + printk(" %4u [%d/%d/", port, !!evtchn_port_is_pending(d, chn), - !!evtchn_port_is_masked(d, chn), + !!evtchn_port_is_masked(d, chn)); + evtchn_port_print_state(d, chn); + printk("]: s=%d n=%d x=%d", chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 30c59c9..2445562 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -115,6 +115,7 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; static inline void evtchn_port_set_pending(struct vcpu *v, @@ -147,4 +148,10 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline void evtchn_port_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + d->evtchn_port_ops->print_state(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 04/11] evtchn: use a per-domain variable for the max number of event channels
From: David Vrabel <david.vrabel@citrix.com> Instead of the MAX_EVTCHNS(d) macro, use d->max_evtchns. This avoids having to repeatedly check the ABI type. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/event_2l.c | 1 + xen/common/event_channel.c | 4 ++-- xen/common/schedule.c | 2 +- xen/include/xen/event.h | 2 +- xen/include/xen/sched.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index 5c27eab..add37e4 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -96,6 +96,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l void evtchn_2l_init(struct domain *d) { d->evtchn_port_ops = &evtchn_port_ops_2l; + d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); } /* diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index f73c7a9..539a198 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -134,7 +134,7 @@ static int get_free_port(struct domain *d) if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; - if ( port == MAX_EVTCHNS(d) ) + if ( port == d->max_evtchns ) return -ENOSPC; chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); @@ -1236,7 +1236,7 @@ static void domain_dump_evtchn_info(struct domain *d) spin_lock(&d->event_lock); - for ( port = 1; port < MAX_EVTCHNS(d); ++port ) + for ( port = 1; port < d->max_evtchns; ++port ) { const struct evtchn *chn; char *ssid; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 7e6884d..a5a0010 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -748,7 +748,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = -EINVAL; - if ( port >= MAX_EVTCHNS(d) ) + if ( port >= d->max_evtchns ) goto out; rc = 0; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 2445562..6933f02 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ 
-73,7 +73,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport); #define bucket_from_port(d,p) \ ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) #define port_is_valid(d,p) \ - (((p) >= 0) && ((p) < MAX_EVTCHNS(d)) && \ + (((p) >= 0) && ((p) < (d)->max_evtchns) && \ (bucket_from_port(d,p) != NULL)) #define evtchn_from_port(d,p) \ (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index fb9cf11..cb9604a 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,7 +50,6 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG) #endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) #define EVTCHNS_PER_BUCKET 128 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) @@ -273,6 +272,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + unsigned int max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 05/11] evtchn: allow many more evtchn objects to be allocated per domain
From: David Vrabel <david.vrabel@citrix.com> Expand the number of event channels that can be supported internally by altering how struct evtchn's are allocated. The objects are indexed using a two level scheme of groups and buckets (instead of only buckets). Each group is a page of bucket pointers. Each bucket is a page-sized array of struct evtchn's. The optimal number of evtchns per bucket is calculated at compile time. If XSM is not enabled, struct evtchn is 16 bytes and each bucket contains 256, requiring only 1 group of 512 pointers for 2^17 (131,072) event channels. With XSM enabled, struct evtchn is 24 bytes, each bucket contains 128 and 2 groups are required. For the common case of a domain with only a few event channels, instead of requiring an additional allocation for the group page, the first bucket is indexed directly. As a consequence of this, struct domain shrinks by at least 232 bytes as 32 bucket pointers are replaced with 1 bucket pointer and (at most) 2 group pointers. [ Based on a patch from Wei Liu with improvements from Malcolm Crossley. 
] Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/event_channel.c | 82 ++++++++++++++++++++++++++++++++++---------- xen/include/xen/event.h | 40 ++++++++++++++++----- xen/include/xen/sched.h | 21 ++++++++++-- 3 files changed, 113 insertions(+), 30 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 539a198..87bca94 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -121,11 +121,47 @@ static int virq_is_global(uint32_t virq) } +static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port) +{ + struct evtchn *chn; + unsigned int i; + + chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); + if ( !chn ) + return NULL; + + for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + { + if ( xsm_alloc_security_evtchn(&chn[i]) ) + { + while ( i-- ) + xsm_free_security_evtchn(&chn[i]); + xfree(chn); + return NULL; + } + chn[i].port = port + i; + } + return chn; +} + +static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket) +{ + unsigned int i; + + if ( !bucket ) + return; + + for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + xsm_free_security_evtchn(bucket + i); + + xfree(bucket); +} + static int get_free_port(struct domain *d) { struct evtchn *chn; + struct evtchn **grp; int port; - int i, j; if ( d->is_dying ) return -EINVAL; @@ -137,22 +173,17 @@ static int get_free_port(struct domain *d) if ( port == d->max_evtchns ) return -ENOSPC; - chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); - if ( unlikely(chn == NULL) ) - return -ENOMEM; - - for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + if ( !group_from_port(d, port) ) { - if ( xsm_alloc_security_evtchn(&chn[i]) ) - { - for ( j = 0; j < i; j++ ) - xsm_free_security_evtchn(&chn[j]); - xfree(chn); + grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP); + if ( !grp ) return -ENOMEM; - } - chn[i].port = port + i; + group_from_port(d, port) = grp; } + chn = alloc_evtchn_bucket(d, 
port); + if ( !chn ) + return -ENOMEM; bucket_from_port(d, port) = chn; return port; @@ -1152,15 +1183,25 @@ int evtchn_init(struct domain *d) { evtchn_2l_init(d); + d->evtchn = alloc_evtchn_bucket(d, 0); + if ( !d->evtchn ) + return -ENOMEM; + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) + { + free_evtchn_bucket(d, d->evtchn); return -EINVAL; + } evtchn_from_port(d, 0)->state = ECS_RESERVED; #if MAX_VIRT_CPUS > BITS_PER_LONG d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS)); if ( !d->poll_mask ) + { + free_evtchn_bucket(d, d->evtchn); return -ENOMEM; + } bitmap_zero(d->poll_mask, MAX_VIRT_CPUS); #endif @@ -1170,7 +1211,7 @@ int evtchn_init(struct domain *d) void evtchn_destroy(struct domain *d) { - int i; + unsigned int i, j; /* After this barrier no new event-channel allocations can occur. */ BUG_ON(!d->is_dying); @@ -1185,12 +1226,17 @@ void evtchn_destroy(struct domain *d) /* Free all event-channel buckets. */ spin_lock(&d->event_lock); - for ( i = 0; i < NR_EVTCHN_BUCKETS; i++ ) + for ( i = 0; i < NR_EVTCHN_GROUPS; i++ ) { - xsm_free_security_evtchn(d->evtchn[i]); - xfree(d->evtchn[i]); - d->evtchn[i] = NULL; + if ( !d->evtchn_group[i] ) + continue; + for ( j = 0; j < BUCKETS_PER_GROUP; j++ ) + free_evtchn_bucket(d, d->evtchn_group[i][j]); + xfree(d->evtchn_group[i]); + d->evtchn_group[i] = NULL; } + free_evtchn_bucket(d, d->evtchn); + d->evtchn = NULL; spin_unlock(&d->event_lock); clear_global_virq_handlers(d); diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 6933f02..cba09e7 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -69,15 +69,37 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq); /* Notify remote end of a Xen-attached event channel.*/ void notify_via_xen_event_channel(struct domain *ld, int lport); -/* Internal event channel object accessors */ -#define bucket_from_port(d,p) \ - ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) -#define port_is_valid(d,p) \ - (((p) >= 0) && 
((p) < (d)->max_evtchns) && \ - (bucket_from_port(d,p) != NULL)) -#define evtchn_from_port(d,p) \ - (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) +/* + * Internal event channel object storage. + * + * The objects (struct evtchn) are indexed using a two level scheme of + * groups and buckets. Each group is a page of bucket pointers. Each + * bucket is a page-sized array of struct evtchn's. + * + * The first bucket is directly accessed via d->evtchn. + */ +#define group_from_port(d, p) \ + ((d)->evtchn_group[(p) / EVTCHNS_PER_GROUP]) +#define bucket_from_port(d, p) \ + ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET]) +static inline bool_t port_is_valid(struct domain *d, unsigned int p) +{ + if ( p >= d->max_evtchns ) + return 0; + if ( !d->evtchn ) + return 0; + if ( p < EVTCHNS_PER_BUCKET ) + return 1; + return group_from_port(d, p) != NULL && bucket_from_port(d, p) != NULL; +} + +static inline struct evtchn *evtchn_from_port(struct domain *d, unsigned int p) +{ + if ( p < EVTCHNS_PER_BUCKET ) + return &d->evtchn[p]; + return bucket_from_port(d, p) + (p % EVTCHNS_PER_BUCKET); +} /* Wait on a Xen-attached event channel. */ #define wait_on_xen_event_channel(port, condition) \ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cb9604a..59f6161 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,8 +50,22 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 
32 : BITS_PER_XEN_ULONG) #endif -#define EVTCHNS_PER_BUCKET 128 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) + +#define BUCKETS_PER_GROUP (PAGE_SIZE/sizeof(struct evtchn *)) +/* Round size of struct evtchn up to power of 2 size */ +#define __RDU2(x) ( (x) | ( (x) >> 1)) +#define __RDU4(x) ( __RDU2(x) | ( __RDU2(x) >> 2)) +#define __RDU8(x) ( __RDU4(x) | ( __RDU4(x) >> 4)) +#define __RDU16(x) ( __RDU8(x) | ( __RDU8(x) >> 8)) +#define __RDU32(x) (__RDU16(x) | (__RDU16(x) >>16)) +#define next_power_of_2(x) (__RDU32((x)-1) + 1) + +/* Maximum number of event channels for any ABI. */ +#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS + +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) +#define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_GROUPS DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP) struct evtchn { @@ -271,7 +285,8 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + struct evtchn *evtchn; /* first bucket only */ + struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ unsigned int max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
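The group/bucket arithmetic used by `group_from_port()`, `bucket_from_port()` and `evtchn_from_port()` can be sketched standalone. The constants below are illustrative assumptions, not values lifted from a build: a 4 KiB page, 8-byte pointers, and the 16-byte (non-XSM) struct evtchn described in the commit message.

```c
/* Illustrative layout constants (assumed: 4 KiB pages, 8-byte pointers,
 * 16-byte struct evtchn as in the non-XSM case described above). */
#define TOY_PAGE_SIZE          4096u
#define TOY_EVTCHN_SIZE        16u    /* already a power of 2 */
#define TOY_EVTCHNS_PER_BUCKET (TOY_PAGE_SIZE / TOY_EVTCHN_SIZE)   /* 256 */
#define TOY_BUCKETS_PER_GROUP  (TOY_PAGE_SIZE / 8u)                /* 512 */
#define TOY_EVTCHNS_PER_GROUP  (TOY_BUCKETS_PER_GROUP * TOY_EVTCHNS_PER_BUCKET)

/* Split a port number into (group, bucket-within-group, slot-within-bucket),
 * the same decomposition the patch's macros and inlines perform. */
static void port_decompose(unsigned int port, unsigned int *group,
                           unsigned int *bucket, unsigned int *slot)
{
    *group  = port / TOY_EVTCHNS_PER_GROUP;
    *bucket = (port % TOY_EVTCHNS_PER_GROUP) / TOY_EVTCHNS_PER_BUCKET;
    *slot   = port % TOY_EVTCHNS_PER_BUCKET;
}
```

With these values one group covers 512 * 256 = 131,072 ports, which is why the commit message notes that a single group suffices for 2^17 event channels when XSM is disabled.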
From: David Vrabel <david.vrabel@citrix.com> Add the event channel hypercall sub-ops and the definitions for the shared data structures for the FIFO-based event channel ABI. The design document for this new ABI is available here: http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU. Key advantages of this new ABI include: - Support for over 100,000 events (2^17). - 16 different event priorities. - Improved fairness in event latency through the use of FIFOs. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/include/public/event_channel.h | 75 ++++++++++++++++++++++++++++++++++++ xen/include/public/xen.h | 6 ++- xen/include/xen/sched.h | 2 +- 3 files changed, 80 insertions(+), 3 deletions(-) diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 472efdb..4a53484 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -71,6 +71,9 @@ #define EVTCHNOP_bind_vcpu 8 #define EVTCHNOP_unmask 9 #define EVTCHNOP_reset 10 +#define EVTCHNOP_init_control 11 +#define EVTCHNOP_expand_array 12 +#define EVTCHNOP_set_priority 13 /* ` } */ typedef uint32_t evtchn_port_t; @@ -258,6 +261,43 @@ struct evtchn_reset { typedef struct evtchn_reset evtchn_reset_t; /* + * EVTCHNOP_init_control: initialize the control block for the FIFO ABI. + * + * Note: any events that are currently pending will not be resent and + * will be lost. Guests should call this before binding any event to + * avoid losing any events. + */ +struct evtchn_init_control { + /* IN parameters. */ + uint64_t control_gfn; + uint32_t offset; + uint32_t vcpu; + /* OUT parameters. 
*/ + uint8_t link_bits; + uint8_t _pad[7]; +}; +typedef struct evtchn_init_control evtchn_init_control_t; + +/* + * EVTCHNOP_expand_array: add an additional page to the event array. + */ +struct evtchn_expand_array { + /* IN parameters. */ + uint64_t array_gfn; +}; +typedef struct evtchn_expand_array evtchn_expand_array_t; + +/* + * EVTCHNOP_set_priority: set the priority for an event channel. + */ +struct evtchn_set_priority { + /* IN parameters. */ + uint32_t port; + uint32_t priority; +}; +typedef struct evtchn_set_priority evtchn_set_priority_t; + +/* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) * ` @@ -281,6 +321,41 @@ struct evtchn_op { typedef struct evtchn_op evtchn_op_t; DEFINE_XEN_GUEST_HANDLE(evtchn_op_t); +/* + * 2-level ABI + */ + +#define EVTCHN_2L_NR_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) + +/* + * FIFO ABI + */ + +/* Events may have priorities from 0 (highest) to 15 (lowest). */ +#define EVTCHN_FIFO_PRIORITY_MAX 0 +#define EVTCHN_FIFO_PRIORITY_DEFAULT 7 +#define EVTCHN_FIFO_PRIORITY_MIN 15 + +#define EVTCHN_FIFO_MAX_QUEUES (EVTCHN_FIFO_PRIORITY_MIN + 1) + +typedef uint32_t event_word_t; + +#define EVTCHN_FIFO_PENDING 31 +#define EVTCHN_FIFO_MASKED 30 +#define EVTCHN_FIFO_LINKED 29 + +#define EVTCHN_FIFO_LINK_BITS 17 +#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1) + +#define EVTCHN_FIFO_NR_CHANNELS (1 << EVTCHN_FIFO_LINK_BITS) + +struct evtchn_fifo_control_block { + uint32_t ready; + uint32_t _rsvd; + uint32_t head[EVTCHN_FIFO_MAX_QUEUES]; +}; +typedef struct evtchn_fifo_control_block evtchn_fifo_control_block_t; + #endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */ /* diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index b50bd05..8c5697e 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -552,11 +552,13 @@ struct multicall_entry { typedef struct multicall_entry multicall_entry_t; DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); +#if 
__XEN_INTERFACE_VERSION__ < 0x00040400 /* - * Event channel endpoints per domain: + * Event channel endpoints per domain (when using the 2-level ABI): * 1024 if a long is 32 bits; 4096 if a long is 64 bits. */ -#define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) +#define NR_EVENT_CHANNELS EVTCHN_2L_NR_CHANNELS +#endif struct vcpu_time_info { /* diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 59f6161..098857c 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -61,7 +61,7 @@ extern struct domain *dom0; #define next_power_of_2(x) (__RDU32((x)-1) + 1) /* Maximum number of event channels for any ABI. */ -#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS +#define MAX_NR_EVTCHNS EVTCHN_2L_NR_CHANNELS #define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) #define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 07/11] evtchn: implement EVTCHNOP_set_priority and add the set_priority hook
From: David Vrabel <david.vrabel@citrix.com> Implement EVTCHNOP_set_priority. A new set_priority hook added to struct evtchn_port_ops will do the ABI specific validation and setup. If an ABI does not provide a set_priority hook (as is the case of the 2-level ABI), the sub-op will return -ENOSYS. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/event_channel.c | 29 +++++++++++++++++++++++++++++ xen/include/xen/event.h | 11 +++++++++++ 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 87bca94..340bf32 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -955,6 +955,27 @@ out: return rc; } +static long evtchn_set_priority(const struct evtchn_set_priority *set_priority) +{ + struct domain *d = current->domain; + unsigned int port = set_priority->port; + long ret; + + spin_lock(&d->event_lock); + + if ( !port_is_valid(d, port) ) + { + spin_unlock(&d->event_lock); + return -EINVAL; + } + + ret = evtchn_port_set_priority(d, evtchn_from_port(d, port), + set_priority->priority); + + spin_unlock(&d->event_lock); + + return ret; +} long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { @@ -1064,6 +1085,14 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_set_priority: { + struct evtchn_set_priority set_priority; + if ( copy_from_guest(&set_priority, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_set_priority(&set_priority); + break; + } + default: rc = -ENOSYS; break; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index cba09e7..70fc271 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -137,6 +137,8 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + 
int (*set_priority)(struct domain *d, struct evtchn *evtchn, + unsigned int priority); void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; @@ -170,6 +172,15 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline int evtchn_port_set_priority(struct domain *d, + struct evtchn *evtchn, + unsigned int priority) +{ + if ( !d->evtchn_port_ops->set_priority ) + return -ENOSYS; + return d->evtchn_port_ops->set_priority(d, evtchn, priority); +} + static inline void evtchn_port_print_state(struct domain *d, const struct evtchn *evtchn) { -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 08/11] evtchn: add FIFO-based event channel hypercalls and port ops
From: David Vrabel <david.vrabel@citrix.com> Add the implementation for the FIFO-based event channel ABI. The new hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and the required evtchn_ops (set_pending, unmask, etc.). Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/Makefile | 1 + xen/common/event_channel.c | 21 ++ xen/common/event_fifo.c | 448 ++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/event_fifo.h | 47 +++++ xen/include/xen/sched.h | 6 +- 5 files changed, 522 insertions(+), 1 deletions(-) create mode 100644 xen/common/event_fifo.c create mode 100644 xen/include/xen/event_fifo.h diff --git a/xen/common/Makefile b/xen/common/Makefile index 69ab94f..3f74016 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -7,6 +7,7 @@ obj-y += domctl.o obj-y += domain.o obj-y += event_2l.o obj-y += event_channel.o +obj-y += event_fifo.o obj-y += grant_table.o obj-y += irq.o obj-y += kernel.o diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 340bf32..0c0bbe4 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -26,6 +26,7 @@ #include <xen/compat.h> #include <xen/guest_access.h> #include <xen/keyhandler.h> +#include <xen/event_fifo.h> #include <asm/current.h> #include <public/xen.h> @@ -1085,6 +1086,24 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_init_control: { + struct evtchn_init_control init_control; + if ( copy_from_guest(&init_control, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_init_control(&init_control); + if ( !rc && __copy_to_guest(arg, &init_control, 1) ) + rc = -EFAULT; + break; + } + + case EVTCHNOP_expand_array: { + struct evtchn_expand_array expand_array; + if ( copy_from_guest(&expand_array, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_expand_array(&expand_array); + break; + } + case EVTCHNOP_set_priority: { struct evtchn_set_priority set_priority; if ( copy_from_guest(&set_priority, arg, 
1) != 0 ) @@ -1269,6 +1288,8 @@ void evtchn_destroy(struct domain *d) spin_unlock(&d->event_lock); clear_global_virq_handlers(d); + + evtchn_fifo_destroy(d); } diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c new file mode 100644 index 0000000..f1d8622 --- /dev/null +++ b/xen/common/event_fifo.c @@ -0,0 +1,448 @@ +/* + * FIFO event channel management. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> +#include <xen/event_fifo.h> +#include <xen/paging.h> +#include <xen/mm.h> + +#include <public/event_channel.h> + +static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d, + unsigned int port) +{ + unsigned int p, w; + + if ( unlikely(port >= d->evtchn_fifo->num_evtchns) ) + return NULL; + + p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return d->evtchn_fifo->event_array[p] + w; +} + +static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link) +{ + event_word_t n, o, w; + + w = *word; + + do { + if ( !(w & (1 << EVTCHN_FIFO_LINKED)) ) + return 0; + o = w; + n = (w & ~EVTCHN_FIFO_LINK_MASK) | link; + } while ( (w = cmpxchg(word, o, n)) != o ); + + return 1; +} + +static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned int port; + event_word_t *word; + struct evtchn_fifo_queue *q; + unsigned long flags; + bool_t was_pending; + + port = evtchn->port; + word = evtchn_fifo_word_from_port(d, port); + if ( unlikely(!word) ) + return; + + /* + * No locking around getting the queue. This may race with + * changing the priority but we are allowed to signal the event + * once on the old priority. 
+ */ + q = &v->evtchn_fifo->queue[evtchn->priority]; + + was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word); + + /* + * Link the event if it is unmasked and not already linked. + */ + if ( !test_bit(EVTCHN_FIFO_MASKED, word) + && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + { + event_word_t *tail_word; + bool_t linked = 0; + + spin_lock_irqsave(&q->lock, flags); + + /* + * Atomically link the tail to port iff the tail is linked. + * If the tail is unlinked the queue is empty. + * + * If port is the same as tail, the queue is empty but q->tail + * will appear linked as we just set LINKED above. + * + * If the queue is empty (i.e., we haven't linked to the new + * event), head must be updated. + */ + if ( port != q->tail ) + { + tail_word = evtchn_fifo_word_from_port(d, q->tail); + linked = evtchn_fifo_set_link(tail_word, port); + } + if ( !linked ) + write_atomic(q->head, port); + q->tail = port; + + spin_unlock_irqrestore(&q->lock, flags); + + if ( !test_and_set_bit(q->priority, + &v->evtchn_fifo->control_block->ready) ) + vcpu_mark_events_pending(v); + } + + if ( !was_pending ) + evtchn_check_pollers(d, port); +} + +static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + /* + * Just clear the P bit. + * + * No need to unlink as the guest will unlink and ignore + * non-pending events. + */ + clear_bit(EVTCHN_FIFO_PENDING, word); +} + +static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + clear_bit(EVTCHN_FIFO_MASKED, word); + + /* Relink if pending. 
*/ + if ( test_bit(EVTCHN_FIFO_PENDING, word) ) + evtchn_fifo_set_pending(v, evtchn); +} + +static bool_t evtchn_fifo_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 0; + + return test_bit(EVTCHN_FIFO_PENDING, word); +} + +static bool_t evtchn_fifo_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 1; + + return test_bit(EVTCHN_FIFO_MASKED, word); +} + +static int evtchn_fifo_set_priority(struct domain *d, struct evtchn *evtchn, + unsigned int priority) +{ + if ( priority > EVTCHN_FIFO_PRIORITY_MIN ) + return -EINVAL; + + /* + * Only need to switch to the new queue for future events. If the + * event is already pending or in the process of being linked it + * will be on the old queue -- this is fine. + */ + evtchn->priority = priority; + + return 0; +} + +static void evtchn_fifo_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( !word ) + printk("? 
"); + else if ( test_bit(EVTCHN_FIFO_LINKED, word) ) + printk("%-4u", *word & EVTCHN_FIFO_LINK_MASK); + else + printk("- "); +} + +static const struct evtchn_port_ops evtchn_port_ops_fifo +{ + .set_pending = evtchn_fifo_set_pending, + .clear_pending = evtchn_fifo_clear_pending, + .unmask = evtchn_fifo_unmask, + .is_pending = evtchn_fifo_is_pending, + .is_masked = evtchn_fifo_is_masked, + .set_priority = evtchn_fifo_set_priority, + .print_state = evtchn_fifo_print_state, +}; + +static int map_guest_page(struct domain *d, uint64_t gfn, void **virt) +{ + struct page_info *p; + + p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); + if ( !p ) + return -EINVAL; + + if ( !get_page_type(p, PGT_writable_page) ) + { + put_page(p); + return -EINVAL; + } + + *virt = map_domain_page_global(gfn); + if ( !*virt ) + { + put_page_and_type(p); + return -ENOMEM; + } + return 0; +} + +static void unmap_guest_page(void *virt) +{ + if ( !virt ) + return; + + virt = (void *)((unsigned long)virt & PAGE_MASK); + + unmap_domain_page_global(virt); + put_page_and_type(mfn_to_page(domain_page_map_to_mfn(virt))); +} + +static void cleanup_control_block(struct vcpu *v) +{ + if ( v->evtchn_fifo ) + { + unmap_guest_page(v->evtchn_fifo->control_block); + xfree(v->evtchn_fifo); + v->evtchn_fifo = NULL; + } +} + +static void init_queue(struct vcpu *v, struct evtchn_fifo_queue *q, + unsigned int i) +{ + spin_lock_init(&q->lock); + q->priority = i; + q->head = &v->evtchn_fifo->control_block->head[i]; +} + +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset) +{ + struct domain *d = v->domain; + struct evtchn_fifo_vcpu *efv; + void *virt; + unsigned int i; + int rc; + + if ( v->evtchn_fifo ) + return -EINVAL; + + efv = xzalloc(struct evtchn_fifo_vcpu); + if ( !efv ) + return -ENOMEM; + + rc = map_guest_page(d, gfn, &virt); + if ( rc < 0 ) + { + xfree(efv); + return rc; + } + + v->evtchn_fifo = efv; + + v->evtchn_fifo->control_block = virt + offset; + + for ( i = 0; i <= 
EVTCHN_FIFO_PRIORITY_MIN; i++ ) + init_queue(v, &v->evtchn_fifo->queue[i], i); + + return 0; +} + +/* + * Setup an event array with no pages. + */ +static int setup_event_array(struct domain *d) +{ + if ( d->evtchn_fifo ) + return 0; + + d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain); + if ( !d->evtchn_fifo ) + return -ENOMEM; + + d->evtchn_fifo->num_evtchns = 0; + + return 0; +} + +static void cleanup_event_array(struct domain *d) +{ + unsigned int i; + + if ( !d->evtchn_fifo ) + return; + + for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ ) + unmap_guest_page(d->evtchn_fifo->event_array[i]); + xfree(d->evtchn_fifo); +} + +static void set_priority_all(struct domain *d, unsigned int priority) +{ + unsigned int port; + + for ( port = 1; port < d->max_evtchns; port++ ) + { + if ( !port_is_valid(d, port) ) + break; + + evtchn_port_set_priority(d, evtchn_from_port(d, port), priority); + } +} + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control) +{ + struct domain *d = current->domain; + uint32_t vcpu_id; + uint64_t gfn; + uint32_t offset; + struct vcpu *v; + int rc; + + init_control->link_bits = EVTCHN_FIFO_LINK_BITS; + + vcpu_id = init_control->vcpu; + gfn = init_control->control_gfn; + offset = init_control->offset; + + if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] ) + return -ENOENT; + v = d->vcpu[vcpu_id]; + + /* Must not cross page boundary. */ + if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) ) + return -EINVAL; + + /* Must be 8-bytes aligned. */ + if ( offset & (8 - 1) ) + return -EINVAL; + + spin_lock(&d->event_lock); + + rc = setup_control_block(v, gfn, offset); + + /* + * If this is the first control block, setup an empty event array + * and switch to the fifo port ops. + * + * Any ports currently bound will have their priority set to the + * default. 
+ */ + if ( rc == 0 && !d->evtchn_fifo ) + { + rc = setup_event_array(d); + if ( rc < 0 ) + cleanup_control_block(v); + else + { + d->evtchn_port_ops = &evtchn_port_ops_fifo; + d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS; + set_priority_all(d, EVTCHN_FIFO_PRIORITY_DEFAULT); + } + } + + spin_unlock(&d->event_lock); + + return rc; +} + +static int add_page_to_event_array(struct domain *d, unsigned long gfn) +{ + void *virt; + unsigned int slot; + int rc; + + slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + if ( slot >= EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES ) + return -ENOSPC; + + rc = map_guest_page(d, gfn, &virt); + if ( rc < 0 ) + return rc; + + d->evtchn_fifo->event_array[slot] = virt; + d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return 0; +} + +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array) +{ + struct domain *d = current->domain; + int rc; + + if ( !d->evtchn_fifo ) + return -ENOSYS; + + spin_lock(&d->event_lock); + rc = add_page_to_event_array(d, expand_array->array_gfn); + spin_unlock(&d->event_lock); + + return rc; +} + +void evtchn_fifo_destroy(struct domain *d) +{ + struct vcpu *v; + + for_each_vcpu( d, v ) + cleanup_control_block(v); + cleanup_event_array(d); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/event_fifo.h b/xen/include/xen/event_fifo.h new file mode 100644 index 0000000..4115f6f --- /dev/null +++ b/xen/include/xen/event_fifo.h @@ -0,0 +1,47 @@ +/* + * FIFO-based event channel ABI. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. 
+ */ +#ifndef __XEN_EVENT_FIFO_H__ +#define __XEN_EVENT_FIFO_H__ + +struct evtchn_fifo_queue { + uint32_t *head; /* points into control block */ + uint32_t tail; + uint8_t priority; + spinlock_t lock; +}; + +struct evtchn_fifo_vcpu { + struct evtchn_fifo_control_block *control_block; + struct evtchn_fifo_queue queue[EVTCHN_FIFO_MAX_QUEUES]; +}; + +#define EVTCHN_FIFO_EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t)) +#define EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES \ + (EVTCHN_FIFO_NR_CHANNELS / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE) + +struct evtchn_fifo_domain { + event_word_t *event_array[EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES]; + unsigned int num_evtchns; +}; + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control); +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array); +void evtchn_fifo_destroy(struct domain *domain); + +#endif /* __XEN_EVENT_FIFO_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 098857c..ab7be82 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -61,7 +61,7 @@ extern struct domain *dom0; #define next_power_of_2(x) (__RDU32((x)-1) + 1) /* Maximum number of event channels for any ABI. */ -#define MAX_NR_EVTCHNS EVTCHN_2L_NR_CHANNELS +#define MAX_NR_EVTCHNS MAX(EVTCHN_2L_NR_CHANNELS, EVTCHN_FIFO_NR_CHANNELS) #define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) #define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) @@ -95,6 +95,7 @@ struct evtchn } pirq; /* state == ECS_PIRQ */ u16 virq; /* state == ECS_VIRQ */ } u; + u8 priority; #ifdef FLASK_ENABLE void *ssid; #endif @@ -209,6 +210,8 @@ struct vcpu /* Guest-specified relocation of vcpu_info. 
*/ unsigned long vcpu_info_mfn; + struct evtchn_fifo_vcpu *evtchn_fifo; + struct arch_vcpu arch; }; @@ -290,6 +293,7 @@ struct domain unsigned int max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; + struct evtchn_fifo_domain *evtchn_fifo; struct grant_table *grant_table; -- 1.7.2.5
David Vrabel
2013-Oct-02 16:35 UTC
[PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
From: David Vrabel <david.vrabel@citrix.com> Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to set the maximum event channel port a domain may use. This may be used to limit the amount of Xen resources (global mapping space and xenheap) that a domain may use for event channels. A domain that does not have a limit set may use all the event channels supported by the event channel ABI in use. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- tools/flask/policy/policy/mls | 2 +- tools/flask/policy/policy/modules/xen/xen.if | 2 +- tools/flask/policy/policy/modules/xen/xen.te | 2 +- xen/common/domctl.c | 8 ++++++++ xen/common/event_channel.c | 7 ++++++- xen/include/public/domctl.h | 13 +++++++++++++ xen/include/xen/sched.h | 1 + xen/xsm/flask/hooks.c | 3 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 9 files changed, 36 insertions(+), 4 deletions(-) diff --git a/tools/flask/policy/policy/mls b/tools/flask/policy/policy/mls index 9290a76..fb603cd 100644 --- a/tools/flask/policy/policy/mls +++ b/tools/flask/policy/policy/mls @@ -74,7 +74,7 @@ mlsconstrain domain { getaffinity getdomaininfo getvcpuinfo getvcpucontext getad ((l1 dom l2) or (t1 == mls_priv)); # all the domain "write" ops -mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext } +mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext set_max_evtchn } ((l1 eq l2) or (t1 == mls_priv)); # This is incomplete - similar constraints must be written for all classes diff --git a/tools/flask/policy/policy/modules/xen/xen.if 
b/tools/flask/policy/policy/modules/xen/xen.if index 97af0a8..63e40f0 100644 --- a/tools/flask/policy/policy/modules/xen/xen.if +++ b/tools/flask/policy/policy/modules/xen/xen.if @@ -48,7 +48,7 @@ define(`create_domain_common'', ` allow $1 $2:domain { create max_vcpus setdomainmaxmem setaddrsize getdomaininfo hypercall setvcpucontext setextvcpucontext getscheduler getvcpuinfo getvcpuextstate getaddrsize - getaffinity setaffinity }; + getaffinity setaffinity set_max_evtchn }; allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim }; allow $1 $2:security check_context; allow $1 $2:shadow enable; diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index c89ce28..5f9de5c 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -73,7 +73,7 @@ allow dom0_t dom0_t:domain { getdomaininfo getvcpuinfo getvcpucontext setdomainmaxmem setdomainhandle setdebugging hypercall settime setaddrsize getaddrsize trigger getextvcpucontext setextvcpucontext getvcpuextstate setvcpuextstate - getpodtarget setpodtarget set_misc_info set_virq_handler + getpodtarget setpodtarget set_misc_info set_virq_handler set_max_evtchn }; allow dom0_t dom0_t:domain2 { set_cpuid gettsc settsc setscheduler diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 9760d50..870eef1 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -863,6 +863,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) } break; + case XEN_DOMCTL_set_max_evtchn: + { + d->max_evtchn_port = min_t(unsigned int, + op->u.set_max_evtchn.max_port, + INT_MAX); + } + break; + default: ret = arch_do_domctl(op, d, u_domctl); break; diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 0c0bbe4..34efd24 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -168,10 +168,14 @@ static int get_free_port(struct domain *d) return -EINVAL; for ( port = 0; 
port_is_valid(d, port); port++ ) + { + if ( port > d->max_evtchn_port ) + return -ENOSPC; if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; + } - if ( port == d->max_evtchns ) + if ( port == d->max_evtchns || port > d->max_evtchn_port ) return -ENOSPC; if ( !group_from_port(d, port) ) @@ -1230,6 +1234,7 @@ void evtchn_check_pollers(struct domain *d, unsigned int port) int evtchn_init(struct domain *d) { evtchn_2l_init(d); + d->max_evtchn_port = INT_MAX; d->evtchn = alloc_evtchn_bucket(d, 0); if ( !d->evtchn ) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..d4e479f 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -852,6 +852,17 @@ struct xen_domctl_set_broken_page_p2m { typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); +/* + * XEN_DOMCTL_set_max_evtchn: sets the maximum event channel port + * number the guest may use. Use this to limit the amount of resources + * (global mapping space, xenheap) a guest may use for event channels. 
+ */ +struct xen_domctl_set_max_evtchn { + uint32_t max_port; +}; +typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t); + struct xen_domctl { uint32_t cmd; #define XEN_DOMCTL_createdomain 1 @@ -920,6 +931,7 @@ struct xen_domctl { #define XEN_DOMCTL_set_broken_page_p2m 67 #define XEN_DOMCTL_setnodeaffinity 68 #define XEN_DOMCTL_getnodeaffinity 69 +#define XEN_DOMCTL_set_max_evtchn 70 #define XEN_DOMCTL_gdbsx_guestmemio 1000 #define XEN_DOMCTL_gdbsx_pausevcpu 1001 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 @@ -975,6 +987,7 @@ struct xen_domctl { struct xen_domctl_set_access_required access_required; struct xen_domctl_audit_p2m audit_p2m; struct xen_domctl_set_virq_handler set_virq_handler; + struct xen_domctl_set_max_evtchn set_max_evtchn; struct xen_domctl_gdbsx_memio gdbsx_guest_memio; struct xen_domctl_set_broken_page_p2m set_broken_page_p2m; struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ab7be82..0da0096 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -291,6 +291,7 @@ struct domain struct evtchn *evtchn; /* first bucket only */ struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ unsigned int max_evtchns; + unsigned int max_evtchn_port; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; struct evtchn_fifo_domain *evtchn_fifo; diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index fa0589a..548df47 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) case XEN_DOMCTL_audit_p2m: return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); + case XEN_DOMCTL_set_max_evtchn: + return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_MAX_EVTCHN); + default: printk("flask_domctl: Unknown op %d\n", cmd); return -EPERM; diff --git a/xen/xsm/flask/policy/access_vectors 
b/xen/xsm/flask/policy/access_vectors index 5dfe13b..1fbe241 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -194,6 +194,8 @@ class domain2 setscheduler # XENMEM_claim_pages setclaim +# XEN_DOMCTL_set_max_evtchn + set_max_evtchn } # Similar to class domain, but primarily contains domctls related to HVM domains -- 1.7.2.5
From: David Vrabel <david.vrabel@citrix.com> Add xc_domain_set_max_evtchn(), a wrapper around the DOMCTL_set_max_evtchn hypercall. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> --- tools/libxc/xc_domain.c | 11 +++++++++++ tools/libxc/xenctrl.h | 12 ++++++++++++ 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 81316d3..2cea6e3 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1766,6 +1766,17 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq) return do_domctl(xch, &domctl); } +int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid, + uint32_t max_port) +{ + DECLARE_DOMCTL; + + domctl.cmd = XEN_DOMCTL_set_max_evtchn; + domctl.domain = domid; + domctl.u.set_max_evtchn.max_port = max_port; + return do_domctl(xch, &domctl); +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 58d51f3..8cf3f3b 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -847,6 +847,18 @@ int xc_domain_set_access_required(xc_interface *xch, */ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq); +/** + * Set the maximum event channel port a domain may bind. + * + * This does not affect ports that are already bound. + * + * @param xch a handle to an open hypervisor interface + * @param domid the domain id + * @param max_port maximum port number + */ +int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid, + uint32_t max_port); + /* * CPUPOOL MANAGEMENT FUNCTIONS */ -- 1.7.2.5
David Vrabel
2013-Oct-02 16:36 UTC
[PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
From: David Vrabel <david.vrabel@citrix.com> Add the 'event_channels' option to the xl configuration file to limit the number of event channels that a domain may use. Plumb this option through to libxl via a new libxl_build_info field and call xc_domain_set_max_evtchn() in the post build stage of domain creation. A new LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS #define indicates that this new field is available. The default value of 1023 limits the domain to using the minimum number of global mapping pages and at most 5 xenheap pages. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> --- docs/man/xl.cfg.pod.5 | 11 +++++++++++ tools/libxl/libxl.h | 5 +++++ tools/libxl/libxl_create.c | 3 +++ tools/libxl/libxl_dom.c | 7 +++++++ tools/libxl/libxl_types.idl | 1 + tools/libxl/xl_cmdimpl.c | 3 +++ 6 files changed, 30 insertions(+), 0 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index d57cd4d..6312eb0 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -572,6 +572,17 @@ Allow a guest to access specific physical IRQs. It is recommended to use this option only for trusted VMs under administrator control. +=item B<max_event_channels=N> + +Limit the guest to using at most N event channels (PV interrupts). +Guests use hypervisor resources for each event channel they use. + +The default of 1023 should be sufficient for typical guests. The +maximum value depends on what the guest supports. Guests supporting the +FIFO-based event channel ABI support up to 131,071 event channels. +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit +x86). 
+ =back =head2 Paravirtualised (PV) Guest Specific Options diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 4cab294..30712c2 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -90,6 +90,11 @@ #define LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE 1 /* + * The libxl_domain_build_info has the event_channels field. + */ +#define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1 + +/* * libxl ABI compatibility * * The only guarantee which libxl makes regarding ABI compatibility diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 7567238..790eb32 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -208,6 +208,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_defbool_setdefault(&b_info->disable_migrate, false); + if (!b_info->event_channels) + b_info->event_channels = 1023; + switch (b_info->type) { case LIBXL_DOMAIN_TYPE_HVM: if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 6e2252a..356f920 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -268,6 +268,13 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid, if (rc) return rc; + rc = xc_domain_set_max_evtchn(ctx->xch, domid, info->event_channels); + if (rc) { + LOG(ERROR, "Failed to set event channel limit to %d (%d)", + info->event_channels, rc); + return ERROR_FAIL; + } + libxl_cpuid_apply_policy(ctx, domid); if (info->cpuid != NULL) libxl_cpuid_set(ctx, domid, info->cpuid); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 049dbb5..7bf517d 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -308,6 +308,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("irqs", Array(uint32, "num_irqs")), ("iomem", Array(libxl_iomem_range, "num_iomem")), ("claim_mode", libxl_defbool), + ("event_channels", uint32), ("u", KeyedUnion(None, libxl_domain_type, "type", [("hvm", Struct(None, [("firmware", string), ("bios", 
libxl_bios_type), diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 642b130..c54b44e 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -813,6 +813,9 @@ static void parse_config_data(const char *config_source, if (!xlu_cfg_get_long (config, "videoram", &l, 0)) b_info->video_memkb = l * 1024; + if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0)) + b_info->event_channels = l; + switch(b_info->type) { case LIBXL_DOMAIN_TYPE_HVM: if (!xlu_cfg_get_string (config, "kernel", &buf, 0)) -- 1.7.2.5
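As a usage illustration (not part of the patch, and with hypothetical values throughout), the option added above would appear in an xl domain configuration file roughly like this:

```
# Hypothetical PV guest config fragment; only max_event_channels is
# the option added by this patch series.
name = "guest0"
kernel = "/boot/vmlinuz-guest"
memory = 512
vcpus = 2
# Raise the cap from the default 1023; a guest using the FIFO-based
# event channel ABI may use up to 131,071.
max_event_channels = 4096
```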
David Vrabel
2013-Oct-02 17:06 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
On 02/10/13 17:35, David Vrabel wrote:
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_audit_p2m:
>          return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M);
>
> +    case XEN_DOMCTL_set_max_evtchn:
> +        return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_MAX_EVTCHN);

Sorry, I forgot to try a build with XSM and FLASK enabled. This should have been SECCLASS_DOMAIN2 and DOMAIN2__SET_MAX_EVTCHN.

> +
>      default:
>          printk("flask_domctl: Unknown op %d\n", cmd);
>          return -EPERM;
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 5dfe13b..1fbe241 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -194,6 +194,8 @@ class domain2
>      setscheduler
> # XENMEM_claim_pages
>      setclaim
> +# XEN_DOMCTL_set_max_evtchn
> +    set_max_evtchn
> }
>
> # Similar to class domain, but primarily contains domctls related to HVM domains

David
Ian Campbell
2013-Oct-03 08:29 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
On Wed, 2013-10-02 at 17:36 +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com> > > Add the 'event_channels' option to the xl configuration file to limit > the number of event channels that domain may use. > > Plumb this option through to libxl via a new libxl_build_info field > and call xc_domain_set_max_evtchn() in the post build stage of domain > creation. > > A new LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS #define indicates that this > new field is available. > > The default value of 1023 limits the domain to using the minimum > amount of global mapping pages and at most 5 xenheap pages. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I'm happy for whoever commits the rest of the series to take this one too.

> Cc: Ian Jackson <ian.jackson@eu.citrix.com> > --- > docs/man/xl.cfg.pod.5 | 11 +++++++++++ > tools/libxl/libxl.h | 5 +++++ > tools/libxl/libxl_create.c | 3 +++ > tools/libxl/libxl_dom.c | 7 +++++++ > tools/libxl/libxl_types.idl | 1 + > tools/libxl/xl_cmdimpl.c | 3 +++ > 6 files changed, 30 insertions(+), 0 deletions(-) > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > index d57cd4d..6312eb0 100644 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > @@ -572,6 +572,17 @@ Allow a guest to access specific physical IRQs. > It is recommended to use this option only for trusted VMs under > administrator control. > > +=item B<max_event_channels=N> > + > +Limit the guest to using at most N event channels (PV interrupts). > +Guests use hypervisor resources for each event channel they use. > + > +The default of 1023 should be sufficient for typical guests. The > +maximum value depends what the guest supports. Guests supporting the > +FIFO-based event channel ABI support up to 131,071 event channels. > +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit > +x86). 
> + > =back > > =head2 Paravirtualised (PV) Guest Specific Options > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 4cab294..30712c2 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -90,6 +90,11 @@ > #define LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE 1 > > /* > + * The libxl_domain_build_info has the event_channels field. > + */ > +#define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1 > + > +/* > * libxl ABI compatibility > * > * The only guarantee which libxl makes regarding ABI compatibility > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 7567238..790eb32 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -208,6 +208,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > > libxl_defbool_setdefault(&b_info->disable_migrate, false); > > + if (!b_info->event_channels) > + b_info->event_channels = 1023; > + > switch (b_info->type) { > case LIBXL_DOMAIN_TYPE_HVM: > if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT) > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 6e2252a..356f920 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -268,6 +268,13 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid, > if (rc) > return rc; > > + rc = xc_domain_set_max_evtchn(ctx->xch, domid, info->event_channels); > + if (rc) { > + LOG(ERROR, "Failed to set event channel limit to %d (%d)", > + info->event_channels, rc); > + return ERROR_FAIL; > + } > + > libxl_cpuid_apply_policy(ctx, domid); > if (info->cpuid != NULL) > libxl_cpuid_set(ctx, domid, info->cpuid); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index 049dbb5..7bf517d 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -308,6 +308,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("irqs", Array(uint32, "num_irqs")), > ("iomem", Array(libxl_iomem_range, "num_iomem")), > ("claim_mode", libxl_defbool), > + 
("event_channels", uint32), > ("u", KeyedUnion(None, libxl_domain_type, "type", > [("hvm", Struct(None, [("firmware", string), > ("bios", libxl_bios_type), > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 642b130..c54b44e 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -813,6 +813,9 @@ static void parse_config_data(const char *config_source, > if (!xlu_cfg_get_long (config, "videoram", &l, 0)) > b_info->video_memkb = l * 1024; > > + if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0)) > + b_info->event_channels = l; > + > switch(b_info->type) { > case LIBXL_DOMAIN_TYPE_HVM: > if (!xlu_cfg_get_string (config, "kernel", &buf, 0))
David Vrabel
2013-Oct-04 11:56 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
On 02/10/13 18:06, David Vrabel wrote:
> On 02/10/13 17:35, David Vrabel wrote: >> >> --- a/xen/xsm/flask/hooks.c >> +++ b/xen/xsm/flask/hooks.c >> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) >> case XEN_DOMCTL_audit_p2m: >> return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); >> >> + case XEN_DOMCTL_set_max_evtchn: >> + return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_MAX_EVTCHN); > > Sorry, I forgot to try a build with XSM and FLASK enabled. This should > have been SECCLASS_DOMAIN2 and DOMAIN2__SET_MAX_EVTCHN.

And here's a fixed version of the patch.

Daniel, can you review the XSM parts of this, please?

8<-----------------------------------
xen: Add DOMCTL to limit the number of event channels a domain may use

Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to set the maximum event channel port a domain may use. This may be used to limit the amount of Xen resources (global mapping space and xenheap) that a domain may use for event channels.

A domain that does not have a limit set may use all the event channels supported by the event channel ABI in use. 
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- tools/flask/policy/policy/mls | 2 +- tools/flask/policy/policy/modules/xen/xen.if | 2 +- tools/flask/policy/policy/modules/xen/xen.te | 2 +- xen/common/domctl.c | 8 ++++++++ xen/common/event_channel.c | 7 ++++++- xen/include/public/domctl.h | 13 +++++++++++++ xen/include/xen/sched.h | 1 + xen/xsm/flask/hooks.c | 3 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 9 files changed, 36 insertions(+), 4 deletions(-) diff --git a/tools/flask/policy/policy/mls b/tools/flask/policy/policy/mls index 9290a76..fb603cd 100644 --- a/tools/flask/policy/policy/mls +++ b/tools/flask/policy/policy/mls @@ -74,7 +74,7 @@ mlsconstrain domain { getaffinity getdomaininfo getvcpuinfo getvcpucontext getad ((l1 dom l2) or (t1 == mls_priv)); # all the domain "write" ops -mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext } +mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext set_max_evtchn } ((l1 eq l2) or (t1 == mls_priv)); # This is incomplete - similar constraints must be written for all classes diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if index 97af0a8..63e40f0 100644 --- a/tools/flask/policy/policy/modules/xen/xen.if +++ b/tools/flask/policy/policy/modules/xen/xen.if @@ -48,7 +48,7 @@ define(`create_domain_common'', ` allow $1 $2:domain { create max_vcpus setdomainmaxmem setaddrsize getdomaininfo hypercall setvcpucontext setextvcpucontext getscheduler getvcpuinfo getvcpuextstate getaddrsize - getaffinity setaffinity }; + getaffinity 
setaffinity set_max_evtchn }; allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim }; allow $1 $2:security check_context; allow $1 $2:shadow enable; diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index c89ce28..5f9de5c 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -73,7 +73,7 @@ allow dom0_t dom0_t:domain { getdomaininfo getvcpuinfo getvcpucontext setdomainmaxmem setdomainhandle setdebugging hypercall settime setaddrsize getaddrsize trigger getextvcpucontext setextvcpucontext getvcpuextstate setvcpuextstate - getpodtarget setpodtarget set_misc_info set_virq_handler + getpodtarget setpodtarget set_misc_info set_virq_handler set_max_evtchn }; allow dom0_t dom0_t:domain2 { set_cpuid gettsc settsc setscheduler diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 9760d50..870eef1 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -863,6 +863,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) } break; + case XEN_DOMCTL_set_max_evtchn: + { + d->max_evtchn_port = min_t(unsigned int, + op->u.set_max_evtchn.max_port, + INT_MAX); + } + break; + default: ret = arch_do_domctl(op, d, u_domctl); break; diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 0c0bbe4..34efd24 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -168,10 +168,14 @@ static int get_free_port(struct domain *d) return -EINVAL; for ( port = 0; port_is_valid(d, port); port++ ) + { + if ( port > d->max_evtchn_port ) + return -ENOSPC; if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; + } - if ( port == d->max_evtchns ) + if ( port == d->max_evtchns || port > d->max_evtchn_port ) return -ENOSPC; if ( !group_from_port(d, port) ) @@ -1230,6 +1234,7 @@ void evtchn_check_pollers(struct domain *d, unsigned int port) int evtchn_init(struct domain *d) { evtchn_2l_init(d); + d->max_evtchn_port = 
INT_MAX; d->evtchn = alloc_evtchn_bucket(d, 0); if ( !d->evtchn ) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..d4e479f 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -852,6 +852,17 @@ struct xen_domctl_set_broken_page_p2m { typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); +/* + * XEN_DOMCTL_set_max_evtchn: sets the maximum event channel port + * number the guest may use. Use this limit the amount of resources + * (global mapping space, xenheap) a guest may use for event channels. + */ +struct xen_domctl_set_max_evtchn { + uint32_t max_port; +}; +typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t); + struct xen_domctl { uint32_t cmd; #define XEN_DOMCTL_createdomain 1 @@ -920,6 +931,7 @@ struct xen_domctl { #define XEN_DOMCTL_set_broken_page_p2m 67 #define XEN_DOMCTL_setnodeaffinity 68 #define XEN_DOMCTL_getnodeaffinity 69 +#define XEN_DOMCTL_set_max_evtchn 70 #define XEN_DOMCTL_gdbsx_guestmemio 1000 #define XEN_DOMCTL_gdbsx_pausevcpu 1001 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 @@ -975,6 +987,7 @@ struct xen_domctl { struct xen_domctl_set_access_required access_required; struct xen_domctl_audit_p2m audit_p2m; struct xen_domctl_set_virq_handler set_virq_handler; + struct xen_domctl_set_max_evtchn set_max_evtchn; struct xen_domctl_gdbsx_memio gdbsx_guest_memio; struct xen_domctl_set_broken_page_p2m set_broken_page_p2m; struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ab7be82..0da0096 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -291,6 +291,7 @@ struct domain struct evtchn *evtchn; /* first bucket only */ struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ unsigned int max_evtchns; + unsigned int max_evtchn_port; 
spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; struct evtchn_fifo_domain *evtchn_fifo; diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index fa0589a..b1e2593 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) case XEN_DOMCTL_audit_p2m: return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); + case XEN_DOMCTL_set_max_evtchn: + return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_MAX_EVTCHN); + default: printk("flask_domctl: Unknown op %d\n", cmd); return -EPERM; diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors index 5dfe13b..1fbe241 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -194,6 +194,8 @@ class domain2 setscheduler # XENMEM_claim_pages setclaim +# XEN_DOMCTL_set_max_evtchn + set_max_evtchn } # Similar to class domain, but primarily contains domctls related to HVM domains -- 1.7.2.5
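The allocation check in the get_free_port() hunk above can be modelled in isolation: a port is only handed out if it is both within the ABI limit (max_evtchns) and not above the configured cap (max_evtchn_port). A standalone toy model, not the real Xen code ('used[]' stands in for the per-port evtchn state):

```c
#include <assert.h>
#include <errno.h>

/* Toy model of get_free_port(): returns the first free port, or
 * -ENOSPC when either the configured cap or the ABI limit is hit. */
static int get_free_port_model(const int *used, int max_evtchns,
                               int max_evtchn_port)
{
    int port;

    for ( port = 0; port < max_evtchns; port++ )
    {
        if ( port > max_evtchn_port )
            return -ENOSPC;          /* configured cap reached */
        if ( !used[port] )
            return port;             /* first free port wins */
    }
    return -ENOSPC;                  /* ABI limit reached */
}
```

This mirrors why the hypervisor patch checks both d->max_evtchns (what the ABI can address) and d->max_evtchn_port (what the toolstack configured) before allocating.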
Jan Beulich
2013-Oct-04 16:00 UTC
Re: [PATCH 08/11] evtchn: add FIFO-based event channel hypercalls and port ops
>>> On 02.10.13 at 18:35, David Vrabel <david.vrabel@citrix.com> wrote:
> +static void unmap_guest_page(void *virt)
> +{
> +    if ( !virt )
> +        return;
> +
> +    virt = (void *)((unsigned long)virt & PAGE_MASK);
> +
> +    unmap_domain_page_global(virt);
> +    put_page_and_type(mfn_to_page(domain_page_map_to_mfn(virt)));

Did you test this? It should not have worked - you ought to call domain_page_map_to_mfn() before unmap_domain_page_global().

Jan
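Jan's point is an ordering bug: once unmap_domain_page_global() tears down the mapping, the virtual address can no longer be translated back to an MFN. A minimal sketch of the corrected ordering, with toy one-slot stand-ins for the Xen helpers (these are not the real implementations):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~(((uintptr_t)1 << PAGE_SHIFT) - 1))

/* Toy one-slot "global mapping" standing in for Xen's domain_page
 * machinery.  The assert inside the lookup is the point: translating
 * a VA that is no longer mapped is invalid. */
static void *mapped_virt;   /* currently mapped VA, NULL once unmapped */
static uint64_t mapped_mfn;

static uint64_t domain_page_map_to_mfn(const void *virt)
{
    assert(virt == mapped_virt);    /* must still be mapped */
    return mapped_mfn;
}

static void unmap_domain_page_global(void *virt)
{
    assert(virt == mapped_virt);
    mapped_virt = NULL;             /* VA is meaningless after this */
}

static void put_page_and_type(uint64_t mfn)
{
    (void)mfn;                      /* refcounting elided in this sketch */
}

/* Corrected ordering: translate to an MFN while the mapping still
 * exists, then unmap, then drop the page reference. */
static void unmap_guest_page(void *virt)
{
    uint64_t mfn;

    if ( !virt )
        return;

    virt = (void *)((uintptr_t)virt & PAGE_MASK);
    mfn = domain_page_map_to_mfn(virt);  /* before the unmap */
    unmap_domain_page_global(virt);
    put_page_and_type(mfn);
}
```

Swapping the last two calls back to the original order would trip the assert in the lookup, which is exactly the failure mode being reviewed.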
Jan Beulich
2013-Oct-04 16:02 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
>>> On 04.10.13 at 13:56, David Vrabel <david.vrabel@citrix.com> wrote: > On 02/10/13 18:06, David Vrabel wrote: >> On 02/10/13 17:35, David Vrabel wrote: >>> >>> --- a/xen/xsm/flask/hooks.c >>> +++ b/xen/xsm/flask/hooks.c >>> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) >>> case XEN_DOMCTL_audit_p2m: >>> return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); >>> >>> + case XEN_DOMCTL_set_max_evtchn: >>> + return current_has_perm(d, SECCLASS_DOMAIN, > DOMAIN__SET_MAX_EVTCHN); >> >> Sorry, I forgot to try a build with XSM and FLASK enabled. This should >> have been SECCLASS_DOMAIN2 and DOMAIN2__SET_MAX_EVTCHN. > > And here's a fixed version of the patch. > > Daniel, can you review the XSM parts of this, please? > > 8<----------------------------------- > xen: Add DOMCTL to limit the number of event channels a domain may use > > Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to > set the maximum event channel port a domain may use. This may be used to > limit the amount of Xen resources (global mapping space and xenheap) that > a domain may use for event channels. > > A domain that does not have a limit set may use all the event channels > supported by the event channel ABI in use. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com>

Just to clarify once more: This is only for the non-XSM parts; I'm relying on Daniel to do the review on that front. 
Jan> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > tools/flask/policy/policy/mls | 2 +- > tools/flask/policy/policy/modules/xen/xen.if | 2 +- > tools/flask/policy/policy/modules/xen/xen.te | 2 +- > xen/common/domctl.c | 8 ++++++++ > xen/common/event_channel.c | 7 ++++++- > xen/include/public/domctl.h | 13 +++++++++++++ > xen/include/xen/sched.h | 1 + > xen/xsm/flask/hooks.c | 3 +++ > xen/xsm/flask/policy/access_vectors | 2 ++ > 9 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/tools/flask/policy/policy/mls b/tools/flask/policy/policy/mls > index 9290a76..fb603cd 100644 > --- a/tools/flask/policy/policy/mls > +++ b/tools/flask/policy/policy/mls > @@ -74,7 +74,7 @@ mlsconstrain domain { getaffinity getdomaininfo getvcpuinfo > getvcpucontext getad > ((l1 dom l2) or (t1 == mls_priv)); > > # all the domain "write" ops > -mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus > destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging > hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext } > +mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus > destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging > hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext > set_max_evtchn } > ((l1 eq l2) or (t1 == mls_priv)); > > # This is incomplete - similar constraints must be written for all classes > diff --git a/tools/flask/policy/policy/modules/xen/xen.if > b/tools/flask/policy/policy/modules/xen/xen.if > index 97af0a8..63e40f0 100644 > --- a/tools/flask/policy/policy/modules/xen/xen.if > +++ b/tools/flask/policy/policy/modules/xen/xen.if > @@ -48,7 +48,7 @@ define(`create_domain_common'', ` > allow $1 $2:domain { create max_vcpus setdomainmaxmem setaddrsize > getdomaininfo hypercall setvcpucontext setextvcpucontext > getscheduler getvcpuinfo getvcpuextstate getaddrsize > - getaffinity setaffinity }; > + getaffinity setaffinity 
set_max_evtchn }; > allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim }; > allow $1 $2:security check_context; > allow $1 $2:shadow enable; > diff --git a/tools/flask/policy/policy/modules/xen/xen.te > b/tools/flask/policy/policy/modules/xen/xen.te > index c89ce28..5f9de5c 100644 > --- a/tools/flask/policy/policy/modules/xen/xen.te > +++ b/tools/flask/policy/policy/modules/xen/xen.te > @@ -73,7 +73,7 @@ allow dom0_t dom0_t:domain { > getdomaininfo getvcpuinfo getvcpucontext setdomainmaxmem setdomainhandle > setdebugging hypercall settime setaddrsize getaddrsize trigger > getextvcpucontext setextvcpucontext getvcpuextstate setvcpuextstate > - getpodtarget setpodtarget set_misc_info set_virq_handler > + getpodtarget setpodtarget set_misc_info set_virq_handler set_max_evtchn > }; > allow dom0_t dom0_t:domain2 { > set_cpuid gettsc settsc setscheduler > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > index 9760d50..870eef1 100644 > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -863,6 +863,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) > u_domctl) > } > break; > > + case XEN_DOMCTL_set_max_evtchn: > + { > + d->max_evtchn_port = min_t(unsigned int, > + op->u.set_max_evtchn.max_port, > + INT_MAX); > + } > + break; > + > default: > ret = arch_do_domctl(op, d, u_domctl); > break; > diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c > index 0c0bbe4..34efd24 100644 > --- a/xen/common/event_channel.c > +++ b/xen/common/event_channel.c > @@ -168,10 +168,14 @@ static int get_free_port(struct domain *d) > return -EINVAL; > > for ( port = 0; port_is_valid(d, port); port++ ) > + { > + if ( port > d->max_evtchn_port ) > + return -ENOSPC; > if ( evtchn_from_port(d, port)->state == ECS_FREE ) > return port; > + } > > - if ( port == d->max_evtchns ) > + if ( port == d->max_evtchns || port > d->max_evtchn_port ) > return -ENOSPC; > > if ( !group_from_port(d, port) ) > @@ -1230,6 +1234,7 @@ void evtchn_check_pollers(struct 
domain *d, unsigned > int port) > int evtchn_init(struct domain *d) > { > evtchn_2l_init(d); > + d->max_evtchn_port = INT_MAX; > > d->evtchn = alloc_evtchn_bucket(d, 0); > if ( !d->evtchn ) > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > index 4c5b2bb..d4e479f 100644 > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -852,6 +852,17 @@ struct xen_domctl_set_broken_page_p2m { > typedef struct xen_domctl_set_broken_page_p2m > xen_domctl_set_broken_page_p2m_t; > DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); > > +/* > + * XEN_DOMCTL_set_max_evtchn: sets the maximum event channel port > + * number the guest may use. Use this limit the amount of resources > + * (global mapping space, xenheap) a guest may use for event channels. > + */ > +struct xen_domctl_set_max_evtchn { > + uint32_t max_port; > +}; > +typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t; > +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t); > + > struct xen_domctl { > uint32_t cmd; > #define XEN_DOMCTL_createdomain 1 > @@ -920,6 +931,7 @@ struct xen_domctl { > #define XEN_DOMCTL_set_broken_page_p2m 67 > #define XEN_DOMCTL_setnodeaffinity 68 > #define XEN_DOMCTL_getnodeaffinity 69 > +#define XEN_DOMCTL_set_max_evtchn 70 > #define XEN_DOMCTL_gdbsx_guestmemio 1000 > #define XEN_DOMCTL_gdbsx_pausevcpu 1001 > #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 > @@ -975,6 +987,7 @@ struct xen_domctl { > struct xen_domctl_set_access_required access_required; > struct xen_domctl_audit_p2m audit_p2m; > struct xen_domctl_set_virq_handler set_virq_handler; > + struct xen_domctl_set_max_evtchn set_max_evtchn; > struct xen_domctl_gdbsx_memio gdbsx_guest_memio; > struct xen_domctl_set_broken_page_p2m set_broken_page_p2m; > struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu; > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index ab7be82..0da0096 100644 > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > 
@@ -291,6 +291,7 @@ struct domain > struct evtchn *evtchn; /* first bucket only > */ > struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets > */ > unsigned int max_evtchns; > + unsigned int max_evtchn_port; > spinlock_t event_lock; > const struct evtchn_port_ops *evtchn_port_ops; > struct evtchn_fifo_domain *evtchn_fifo; > diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c > index fa0589a..b1e2593 100644 > --- a/xen/xsm/flask/hooks.c > +++ b/xen/xsm/flask/hooks.c > @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) > case XEN_DOMCTL_audit_p2m: > return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); > > + case XEN_DOMCTL_set_max_evtchn: > + return current_has_perm(d, SECCLASS_DOMAIN2, > DOMAIN2__SET_MAX_EVTCHN); > + > default: > printk("flask_domctl: Unknown op %d\n", cmd); > return -EPERM; > diff --git a/xen/xsm/flask/policy/access_vectors > b/xen/xsm/flask/policy/access_vectors > index 5dfe13b..1fbe241 100644 > --- a/xen/xsm/flask/policy/access_vectors > +++ b/xen/xsm/flask/policy/access_vectors > @@ -194,6 +194,8 @@ class domain2 > setscheduler > # XENMEM_claim_pages > setclaim > +# XEN_DOMCTL_set_max_evtchn > + set_max_evtchn > } > > # Similar to class domain, but primarily contains domctls related to HVM > domains > -- > 1.7.2.5
Daniel De Graaf
2013-Oct-07 16:00 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
On 10/04/2013 07:56 AM, David Vrabel wrote:
> On 02/10/13 18:06, David Vrabel wrote: >> On 02/10/13 17:35, David Vrabel wrote: >>> >>> --- a/xen/xsm/flask/hooks.c >>> +++ b/xen/xsm/flask/hooks.c >>> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) >>> case XEN_DOMCTL_audit_p2m: >>> return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); >>> >>> + case XEN_DOMCTL_set_max_evtchn: >>> + return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_MAX_EVTCHN); >> >> Sorry, I forgot to try a build with XSM and FLASK enabled. This should >> have been SECCLASS_DOMAIN2 and DOMAIN2__SET_MAX_EVTCHN. > > And here's a fixed version of the patch. > > Daniel, can you review the XSM parts of this, please? > > 8<----------------------------------- > xen: Add DOMCTL to limit the number of event channels a domain may use > > Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to > set the maximum event channel port a domain may use. This may be used to > limit the amount of Xen resources (global mapping space and xenheap) that > a domain may use for event channels. > > A domain that does not have a limit set may use all the event channels > supported by the event channel ABI in use. 
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>With the policy changes tweaked so that it compiles (see below): Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>> --- > tools/flask/policy/policy/mls | 2 +- > tools/flask/policy/policy/modules/xen/xen.if | 2 +- > tools/flask/policy/policy/modules/xen/xen.te | 2 +- > xen/common/domctl.c | 8 ++++++++ > xen/common/event_channel.c | 7 ++++++- > xen/include/public/domctl.h | 13 +++++++++++++ > xen/include/xen/sched.h | 1 + > xen/xsm/flask/hooks.c | 3 +++ > xen/xsm/flask/policy/access_vectors | 2 ++ > 9 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/tools/flask/policy/policy/mls b/tools/flask/policy/policy/mls > index 9290a76..fb603cd 100644 > --- a/tools/flask/policy/policy/mls > +++ b/tools/flask/policy/policy/mls > @@ -74,7 +74,7 @@ mlsconstrain domain { getaffinity getdomaininfo getvcpuinfo getvcpucontext getad > ((l1 dom l2) or (t1 == mls_priv)); > > # all the domain "write" ops > -mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext } > +mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext set_max_evtchn } > ((l1 eq l2) or (t1 == mls_priv)); > > # This is incomplete - similar constraints must be written for all classes > diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if > index 97af0a8..63e40f0 100644 > --- a/tools/flask/policy/policy/modules/xen/xen.if > +++ b/tools/flask/policy/policy/modules/xen/xen.if > @@ -48,7 +48,7 @@ define(`create_domain_common'', ` > allow $1 $2:domain { create max_vcpus 
setdomainmaxmem setaddrsize > getdomaininfo hypercall setvcpucontext setextvcpucontext > getscheduler getvcpuinfo getvcpuextstate getaddrsize > - getaffinity setaffinity }; > + getaffinity setaffinity set_max_evtchn }; > allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim }; > allow $1 $2:security check_context; > allow $1 $2:shadow enable; > diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te > index c89ce28..5f9de5c 100644 > --- a/tools/flask/policy/policy/modules/xen/xen.te > +++ b/tools/flask/policy/policy/modules/xen/xen.te > @@ -73,7 +73,7 @@ allow dom0_t dom0_t:domain { > getdomaininfo getvcpuinfo getvcpucontext setdomainmaxmem setdomainhandle > setdebugging hypercall settime setaddrsize getaddrsize trigger > getextvcpucontext setextvcpucontext getvcpuextstate setvcpuextstate > - getpodtarget setpodtarget set_misc_info set_virq_handler > + getpodtarget setpodtarget set_misc_info set_virq_handler set_max_evtchn > }; > allow dom0_t dom0_t:domain2 { > set_cpuid gettsc settsc setscheduler

With the set_max_evtchn permission moved to domain2, these files also need to be changed (just moving the addition down to domain2). The modification to mls can be dropped: the existing domain2 controls are not present in this file, there is already a comment noting that the constraints are incomplete, and the example XSM policy does not use MLS. You should be able to test the compilation using "make -C tools/flask/policy". 
[...]

> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index fa0589a..b1e2593 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_audit_p2m:
>          return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M);
>
> +    case XEN_DOMCTL_set_max_evtchn:
> +        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_MAX_EVTCHN);
> +
>      default:
>          printk("flask_domctl: Unknown op %d\n", cmd);
>          return -EPERM;
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 5dfe13b..1fbe241 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -194,6 +194,8 @@ class domain2
>      setscheduler
> # XENMEM_claim_pages
>      setclaim
> +# XEN_DOMCTL_set_max_evtchn
> +    set_max_evtchn
> }
>
> # Similar to class domain, but primarily contains domctls related to HVM domains

-- 
Daniel De Graaf
National Security Agency
Ian Jackson
2013-Oct-14 14:31 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
David Vrabel writes ("[PATCH 11/11] libxl,xl: add event_channels option to xl configuration file"):
> +=item B<max_event_channels=N>
> +
> +Limit the guest to using at most N event channels (PV interrupts).
> +Guests use hypervisor resources for each event channel they use.
> +
> +The default of 1023 should be sufficient for typical guests. The
> +maximum value depends what the guest supports. Guests supporting the
> +FIFO-based event channel ABI support up to 131,071 event channels.
> +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> +x86).

It's not clear to me what happens if you specify a larger value for max_event_channels than the guest and/or hypervisor support.

Ian.
David Vrabel
2013-Oct-14 16:43 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
On 14/10/13 15:31, Ian Jackson wrote:
> David Vrabel writes ("[PATCH 11/11] libxl,xl: add event_channels option to xl configuration file"):
>> +=item B<max_event_channels=N>
>> +
>> +Limit the guest to using at most N event channels (PV interrupts).
>> +Guests use hypervisor resources for each event channel they use.
>> +
>> +The default of 1023 should be sufficient for typical guests. The
>> +maximum value depends what the guest supports. Guests supporting the
>> +FIFO-based event channel ABI support up to 131,071 event channels.
>> +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>> +x86).
>
> It's not clear to me what happens if you specify a larger value for
> max_event_channels than the guest and/or hypervisor support.

The guest is limited to the minimum of:

- what it supports
- what the hypervisor supports
- what max_event_channels is set to

The docs should be clarified to say this.

David
Ian Jackson
2013-Oct-14 16:58 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
David Vrabel writes ("Re: [PATCH 11/11] libxl,xl: add event_channels option to xl configuration file"):
> On 14/10/13 15:31, Ian Jackson wrote:
>> It's not clear to me what happens if you specify a larger value for
>> max_event_channels than the guest and/or hypervisor support.
>
> The guest is limited to the minimum of:
>
> - what it supports
> - what the hypervisor supports
> - what max_event_channels is set to

Good. (I was worried that this might be treated as a configuration error, which would be annoying.)

> The docs should be clarified to say this.

Yes, please :-).

Ian.
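The rule described in this exchange (the effective limit is the minimum of the guest's ABI limit, the hypervisor's ABI limit, and the configured max_event_channels) can be sketched with a hypothetical helper; this is illustrative, not a function in Xen or libxl:

```c
#include <stdint.h>

/* Effective event channel limit for a domain: the minimum of what the
 * guest's ABI supports, what the hypervisor's ABI supports, and the
 * toolstack-configured max_event_channels.  Illustrative only. */
static uint32_t effective_evtchn_limit(uint32_t guest_abi_max,
                                       uint32_t hypervisor_max,
                                       uint32_t configured_max)
{
    uint32_t limit = guest_abi_max;

    if ( hypervisor_max < limit )
        limit = hypervisor_max;
    if ( configured_max < limit )
        limit = configured_max;
    return limit;
}
```

So a 2-level 32-bit x86 guest stays at 1023 ports even if max_event_channels is set to 4096, and a FIFO-capable guest with an over-large configured value is still capped by the ABI maximum of 131,071.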