This is a complete implementation of the hypervisor and xl toolstack
parts of the FIFO-based event channel ABI described in this design
document:

http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

Changes in draft F are:

- The READY field in the control block is now 32 bits (so guests only
  need to support atomic bit ops on 32-bit words).  This is only a
  documentation change as the implementation already used a uint32_t.

- DOMCTL_set_max_evtchn replaces EVTCHNOP_set_limit.

- DomUs default to an unlimited number of event channels, requiring
  the toolstack to set a limit.  The toolstack defaults to limiting
  guests to 127 event channels if the event_channels option is
  omitted.  This means the minimum amount of both Xen heap and global
  mapping space is used regardless of which ABI is used.  If this is
  considered too restrictive a limit, 1023 would be another sensible
  default (it limits the guest to a single event array page but 5
  xenheap pages for the struct evtchns).

An updated version of the Linux patch series is not quite ready yet.
There is one remaining issue, but fixing it will not require any
changes to the hypervisor ABI or implementation.  The remaining issue
requires preallocating space in the evtchn-to-irq map, as this map
cannot be expanded in pirq_startup() (since this function cannot
return a failure).  The latest Linux changes can be found in the
orochi-v4w branch of:

git://xenbits.xen.org/people/dvrabel/linux.git

Patches 1-4 do some preparatory work for supporting alternate ABIs.

Patch 5 expands the number of evtchn objects a domain may have by
changing how they are allocated.

Patch 6 adds the public ABI.

Patch 7 adds the EVTCHNOP_set_priority implementation.  This will
return -ENOSYS for ABIs that do not support priority.

Patch 8 adds the FIFO-based ABI implementation.

Patches 9-10 add the DOMCTL_set_max_evtchn implementation and add a
function to libxc.  This will also work with the 2-level ABI.
Patch 11 adds the event_channels configuration option to xl and the
libxl bits needed for this.

Changes in v4:

- Updates for draft F of the design.
- DOMCTL_set_max_evtchn replaces EVTCHNOP_set_limit.
- The hypervisor defaults to unlimited event channels for DomUs.
- Optimized memory allocation for struct evtchn's when fewer than 128
  are required (see patch 5).
- Added the event_channels option to the xl domain configuration file
  and plumbed this through libxl_build_info.  Defaults to 127.

Changes in v3:

- Updates for draft E of the design.
- Store priority in struct evtchn.
- Implement set_priority with generic code + hook.
- Implement set_limit and add a libxc function.
- Add ABI-specific output to the 'e' debug key.

Changes in v2:

- Updates for draft D of the design.
- 130,000+ event channels are now supported.
- event_port.c -> event_2l.c, which now only contains 2-level
  functions.
- Addressed various review comments:
  - int -> unsigned in lots of places.
  - Use write_atomic() to set HEAD.
  - Removed MAX_EVTCHNS.
  - evtchn_ops are const.
- Pack struct evtchn better to reduce memory needed.
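For reference, the toolstack side described above amounts to a single knob in the xl domain configuration file. A sketch of how such a guest config might look (the option name comes from the cover letter; the surrounding settings are illustrative only):

```
# Illustrative xl domain configuration fragment.
name   = "guest1"
memory = 512
vcpus  = 2

# Cap this guest's event channels.  If omitted, the toolstack applies
# its default limit of 127; 1023 would allow a single FIFO event
# array page (per the discussion above).
event_channels = 1023
```

The limit is applied via the new DOMCTL_set_max_evtchn domctl (patches 9-10), so it works with both the 2-level and FIFO ABIs.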
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 01/11] debug: remove some event channel info from the 'i' and 'q' debug keys
From: David Vrabel <david.vrabel@citrix.com>

The 'i' key would always use VCPU0's selector word when printing the
event channel state.  Remove the incorrect output as a subsequent
change will add the (correct) information to the 'e' key instead.

When dumping domain information, printing the state of the VIRQ_DEBUG
port is redundant -- this information is available via the 'e' key.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/irq.c      |    5 +----
 xen/common/keyhandler.c |   11 ++---------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index c61cc46..7f547ff 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2262,14 +2262,11 @@ static void dump_irqs(unsigned char key)
             d = action->guest[i];
             pirq = domain_irq_to_pirq(d, irq);
             info = pirq_info(d, pirq);
-            printk("%u:%3d(%c%c%c%c)",
+            printk("%u:%3d(%c%c%c)",
                    d->domain_id, pirq,
                    (test_bit(info->evtchn,
                              &shared_info(d, evtchn_pending)) ?
                     'P' : '-'),
-                   (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d),
-                             &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ?
-                    'S' : '-'),
                    (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ?
                     'M' : '-'),
                    (info->masked ?
                     'M' : '-'));

diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index b9ad1b5..8e4b3f8 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -310,16 +310,9 @@ static void dump_domains(unsigned char key)
     {
         for_each_vcpu ( d, v )
         {
-            printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n",
+            printk("Notifying guest %d:%d (virq %d, port %d)\n",
                    d->domain_id, v->vcpu_id,
-                   VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG],
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG],
-                            &shared_info(d, evtchn_pending)),
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG],
-                            &shared_info(d, evtchn_mask)),
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG] /
-                            BITS_PER_EVTCHN_WORD(d),
-                            &vcpu_info(v, evtchn_pending_sel)));
+                   VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG]);
             send_guest_vcpu_virq(v, VIRQ_DEBUG);
         }
     }
-- 
1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 02/11] evtchn: refactor low-level event channel port ops
From: David Vrabel <david.vrabel@citrix.com> Use functions for the low-level event channel port operations (set/clear pending, unmask, is_pending and is_masked). Group these functions into a struct evtchn_port_op so they can be replaced by alternate implementations (for different ABIs) on a per-domain basis. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/arch/x86/irq.c | 11 ++--- xen/common/Makefile | 1 + xen/common/event_2l.c | 99 ++++++++++++++++++++++++++++++++++++++++++++ xen/common/event_channel.c | 87 +++++++++++++++------------------------ xen/common/schedule.c | 3 +- xen/include/xen/event.h | 45 ++++++++++++++++++++ xen/include/xen/sched.h | 4 ++ 7 files changed, 189 insertions(+), 61 deletions(-) create mode 100644 xen/common/event_2l.c diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 7f547ff..53fe9e3 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1474,7 +1474,7 @@ int pirq_guest_unmask(struct domain *d) { pirq = pirqs[i]->pirq; if ( pirqs[i]->masked && - !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) ) + !evtchn_port_is_masked(d, evtchn_from_port(d, pirqs[i]->evtchn)) ) pirq_guest_eoi(pirqs[i]); } } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); @@ -2222,6 +2222,7 @@ static void dump_irqs(unsigned char key) int i, irq, pirq; struct irq_desc *desc; irq_guest_action_t *action; + struct evtchn *evtchn; struct domain *d; const struct pirq *info; unsigned long flags; @@ -2262,13 +2263,11 @@ static void dump_irqs(unsigned char key) d = action->guest[i]; pirq = domain_irq_to_pirq(d, irq); info = pirq_info(d, pirq); + evtchn = evtchn_from_port(d, info->evtchn); printk("%u:%3d(%c%c%c)", d->domain_id, pirq, - (test_bit(info->evtchn, - &shared_info(d, evtchn_pending)) ? - ''P'' : ''-''), - (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? - ''M'' : ''-''), + (evtchn_port_is_pending(d, evtchn) ? ''P'' : ''-''), + (evtchn_port_is_masked(d, evtchn) ? ''M'' : ''-''), (info->masked ? 
''M'' : ''-'')); if ( i != action->nr_guests ) printk(","); diff --git a/xen/common/Makefile b/xen/common/Makefile index 5486140..0a3a367 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -5,6 +5,7 @@ obj-y += cpupool.o obj-$(HAS_DEVICE_TREE) += device_tree.o obj-y += domctl.o obj-y += domain.o +obj-y += event_2l.o obj-y += event_channel.o obj-y += grant_table.o obj-y += irq.o diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c new file mode 100644 index 0000000..18c0c6e --- /dev/null +++ b/xen/common/event_2l.c @@ -0,0 +1,99 @@ +/* + * Event channel port operations. + * + * Copyright (c) 2003-2006, K A Fraser. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> + +static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned port = evtchn->port; + + /* + * The following bit operations must happen in strict order. + * NB. On x86, the atomic bit operations also act as memory barriers. + * There is therefore sufficiently strict ordering for this architecture -- + * others may require explicit memory barriers. 
+ */ + + if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) + return; + + if ( !test_bit (port, &shared_info(d, evtchn_mask)) && + !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } + + evtchn_check_pollers(d, port); +} + +static void evtchn_2l_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + clear_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + unsigned port = evtchn->port; + + /* + * These operations must happen in strict order. Based on + * evtchn_2l_set_pending() above. + */ + if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && + test_bit (port, &shared_info(d, evtchn_pending)) && + !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } +} + +static bool_t evtchn_2l_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_pending)); +} + +static bool_t evtchn_2l_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); +} + +static const struct evtchn_port_ops evtchn_port_ops_2l +{ + .set_pending = evtchn_2l_set_pending, + .clear_pending = evtchn_2l_clear_pending, + .unmask = evtchn_2l_unmask, + .is_pending = evtchn_2l_is_pending, + .is_masked = evtchn_2l_is_masked, +}; + +void evtchn_2l_init(struct domain *d) +{ + d->evtchn_port_ops = &evtchn_port_ops_2l; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 64c976b..618ced0 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -150,6 +150,7 @@ static int get_free_port(struct domain 
*d) xfree(chn); return -ENOMEM; } + chn[i].port = port + i; } bucket_from_port(d, port) = chn; @@ -530,7 +531,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - clear_bit(port1, &shared_info(d1, evtchn_pending)); + evtchn_port_clear_pending(d1, chn1); /* Reset binding to vcpu0 when the channel is freed. */ chn1->state = ECS_FREE; @@ -615,43 +616,7 @@ out: static void evtchn_set_pending(struct vcpu *v, int port) { - struct domain *d = v->domain; - int vcpuid; - - /* - * The following bit operations must happen in strict order. - * NB. On x86, the atomic bit operations also act as memory barriers. - * There is therefore sufficiently strict ordering for this architecture -- - * others may require explicit memory barriers. - */ - - if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) ) - return; - - if ( !test_bit (port, &shared_info(d, evtchn_mask)) && - !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } - - /* Check if some VCPU might be polling for this event. */ - if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) - return; - - /* Wake any interested (or potentially interested) pollers. 
*/ - for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); - vcpuid < d->max_vcpus; - vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) - { - v = d->vcpu[vcpuid]; - if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && - test_and_clear_bit(vcpuid, d->poll_mask) ) - { - v->poll_evtchn = 0; - vcpu_unblock(v); - } - } + evtchn_port_set_pending(v, evtchn_from_port(v->domain, port)); } int guest_enabled_event(struct vcpu *v, uint32_t virq) @@ -920,26 +885,15 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) int evtchn_unmask(unsigned int port) { struct domain *d = current->domain; - struct vcpu *v; + struct evtchn *evtchn; ASSERT(spin_is_locked(&d->event_lock)); if ( unlikely(!port_is_valid(d, port)) ) return -EINVAL; - v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id]; - - /* - * These operations must happen in strict order. Based on - * include/xen/event.h:evtchn_set_pending(). - */ - if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && - test_bit (port, &shared_info(d, evtchn_pending)) && - !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), - &vcpu_info(v, evtchn_pending_sel)) ) - { - vcpu_mark_events_pending(v); - } + evtchn = evtchn_from_port(d, port); + evtchn_port_unmask(d, evtchn); return 0; } @@ -1170,9 +1124,34 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) spin_unlock(&ld->event_lock); } +void evtchn_check_pollers(struct domain *d, unsigned port) +{ + struct vcpu *v; + unsigned vcpuid; + + /* Check if some VCPU might be polling for this event. */ + if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) + return; + + /* Wake any interested (or potentially interested) pollers. 
*/ + for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); + vcpuid < d->max_vcpus; + vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) + { + v = d->vcpu[vcpuid]; + if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && + test_and_clear_bit(vcpuid, d->poll_mask) ) + { + v->poll_evtchn = 0; + vcpu_unblock(v); + } + } +} int evtchn_init(struct domain *d) { + evtchn_2l_init(d); + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) return -EINVAL; @@ -1270,8 +1249,8 @@ static void domain_dump_evtchn_info(struct domain *d) printk(" %4u [%d/%d]: s=%d n=%d x=%d", port, - !!test_bit(port, &shared_info(d, evtchn_pending)), - !!test_bit(port, &shared_info(d, evtchn_mask)), + !!evtchn_port_is_pending(d, chn), + !!evtchn_port_is_masked(d, chn), chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a8398bd..7e6884d 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -34,6 +34,7 @@ #include <xen/multicall.h> #include <xen/cpu.h> #include <xen/preempt.h> +#include <xen/event.h> #include <public/sched.h> #include <xsm/xsm.h> @@ -751,7 +752,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = 0; - if ( test_bit(port, &shared_info(d, evtchn_pending)) ) + if ( evtchn_port_is_pending(d, evtchn_from_port(d, port)) ) goto out; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 6f60162..7522f4e 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -102,4 +102,49 @@ void notify_via_xen_event_channel(struct domain *ld, int lport); smp_mb(); /* set blocked status /then/ caller does his work */ \ } while ( 0 ) +void evtchn_check_pollers(struct domain *d, unsigned port); + +void evtchn_2l_init(struct domain *d); + +/* + * Low-level event channel port ops. 
+ */ +struct evtchn_port_ops { + void (*set_pending)(struct vcpu *v, struct evtchn *evtchn); + void (*clear_pending)(struct domain *d, struct evtchn *evtchn); + void (*unmask)(struct domain *d, struct evtchn *evtchn); + bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); + bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); +}; + +static inline void evtchn_port_set_pending(struct vcpu *v, + struct evtchn *evtchn) +{ + v->domain->evtchn_port_ops->set_pending(v, evtchn); +} + +static inline void evtchn_port_clear_pending(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->clear_pending(d, evtchn); +} + +static inline void evtchn_port_unmask(struct domain *d, + struct evtchn *evtchn) +{ + d->evtchn_port_ops->unmask(d, evtchn); +} + +static inline bool_t evtchn_port_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_pending(d, evtchn); +} + +static inline bool_t evtchn_port_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + return d->evtchn_port_ops->is_masked(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 0013a8d..fb9cf11 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -66,6 +66,7 @@ struct evtchn u8 state; /* ECS_* */ u8 xen_consumer; /* Consumer in Xen, if any? (0 = send to guest) */ u16 notify_vcpu_id; /* VCPU for local delivery notification */ + u32 port; union { struct { domid_t remote_domid; @@ -238,6 +239,8 @@ struct mem_event_per_domain struct mem_event_domain access; }; +struct evtchn_port_ops; + struct domain { domid_t domain_id; @@ -271,6 +274,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; spinlock_t event_lock; + const struct evtchn_port_ops *evtchn_port_ops; struct grant_table *grant_table; -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 03/11] evtchn: print ABI specific state with the 'e' debug key
From: David Vrabel <david.vrabel@citrix.com> In the output of the ''e'' debug key, print some ABI specific state in addition to the (p)ending and (m)asked bits. For the 2-level ABI, print the state of that event''s selector bit. e.g., (XEN) port [p/m/s] (XEN) 1 [0/0/1]: s=3 n=0 x=0 d=0 p=74 (XEN) 2 [0/0/1]: s=3 n=0 x=0 d=0 p=75 Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_2l.c | 10 ++++++++++ xen/common/event_channel.c | 8 +++++--- xen/include/xen/event.h | 7 +++++++ 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index 18c0c6e..b7b152c 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -74,6 +74,15 @@ static bool_t evtchn_2l_is_masked(struct domain *d, return test_bit(evtchn->port, &shared_info(d, evtchn_mask)); } +static void evtchn_2l_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + + printk("%d", !!test_bit(evtchn->port / BITS_PER_EVTCHN_WORD(d), + &vcpu_info(v, evtchn_pending_sel))); +} + static const struct evtchn_port_ops evtchn_port_ops_2l { .set_pending = evtchn_2l_set_pending, @@ -81,6 +90,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l .unmask = evtchn_2l_unmask, .is_pending = evtchn_2l_is_pending, .is_masked = evtchn_2l_is_masked, + .print_state = evtchn_2l_print_state, }; void evtchn_2l_init(struct domain *d) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 618ced0..51d59b8 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1232,7 +1232,7 @@ static void domain_dump_evtchn_info(struct domain *d) d->poll_mask, d->max_vcpus); printk("Event channel information for domain %d:\n" "Polling vCPUs: {%s}\n" - " port [p/m]\n", d->domain_id, keyhandler_scratch); + " port [p/m/s]\n", d->domain_id, keyhandler_scratch); spin_lock(&d->event_lock); @@ -1247,10 +1247,12 @@ static void domain_dump_evtchn_info(struct domain *d) if ( 
chn->state == ECS_FREE ) continue; - printk(" %4u [%d/%d]: s=%d n=%d x=%d", + printk(" %4u [%d/%d/", port, !!evtchn_port_is_pending(d, chn), - !!evtchn_port_is_masked(d, chn), + !!evtchn_port_is_masked(d, chn)); + evtchn_port_print_state(d, chn); + printk("]: s=%d n=%d x=%d", chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 7522f4e..90410e0 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -115,6 +115,7 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; static inline void evtchn_port_set_pending(struct vcpu *v, @@ -147,4 +148,10 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline void evtchn_port_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + d->evtchn_port_ops->print_state(d, evtchn); +} + #endif /* __XEN_EVENT_H__ */ -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 04/11] evtchn: use a per-domain variable for the max number of event channels
From: David Vrabel <david.vrabel@citrix.com> Instead of the MAX_EVTCHNS(d) macro, use d->max_evtchns instead. This avoids having to repeatedly check the ABI type. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_2l.c | 1 + xen/common/event_channel.c | 4 ++-- xen/common/schedule.c | 2 +- xen/include/xen/event.h | 2 +- xen/include/xen/sched.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c index b7b152c..ecdcdaf 100644 --- a/xen/common/event_2l.c +++ b/xen/common/event_2l.c @@ -96,6 +96,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l void evtchn_2l_init(struct domain *d) { d->evtchn_port_ops = &evtchn_port_ops_2l; + d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); } /* diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 51d59b8..5b5df88 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -134,7 +134,7 @@ static int get_free_port(struct domain *d) if ( evtchn_from_port(d, port)->state == ECS_FREE ) return port; - if ( port == MAX_EVTCHNS(d) ) + if ( port == d->max_evtchns ) return -ENOSPC; chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); @@ -1236,7 +1236,7 @@ static void domain_dump_evtchn_info(struct domain *d) spin_lock(&d->event_lock); - for ( port = 1; port < MAX_EVTCHNS(d); ++port ) + for ( port = 1; port < d->max_evtchns; ++port ) { const struct evtchn *chn; char *ssid; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 7e6884d..a5a0010 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -748,7 +748,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = -EINVAL; - if ( port >= MAX_EVTCHNS(d) ) + if ( port >= d->max_evtchns ) goto out; rc = 0; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 90410e0..302a904 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -73,7 +73,7 @@ void 
notify_via_xen_event_channel(struct domain *ld, int lport); #define bucket_from_port(d,p) \ ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) #define port_is_valid(d,p) \ - (((p) >= 0) && ((p) < MAX_EVTCHNS(d)) && \ + (((p) >= 0) && ((p) < (d)->max_evtchns) && \ (bucket_from_port(d,p) != NULL)) #define evtchn_from_port(d,p) \ (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index fb9cf11..532dd46 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,7 +50,6 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG) #endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) #define EVTCHNS_PER_BUCKET 128 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) @@ -273,6 +272,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 05/11] evtchn: allow many more evtchn objects to be allocated per domain
From: David Vrabel <david.vrabel@citrix.com>

Expand the number of event channels that can be supported internally
by altering how struct evtchn's are allocated.

The objects are indexed using a two level scheme of groups and
buckets (instead of only buckets).  Each group is a page of bucket
pointers.  Each bucket is a page-sized array of struct evtchn's.

The optimal number of evtchns per bucket is calculated at compile
time.  If XSM is not enabled, struct evtchn is 16 bytes and each
bucket contains 256, requiring only 1 group of 512 pointers for 2^17
(131,072) event channels.  With XSM enabled, struct evtchn is 24
bytes, each bucket contains 128 and 2 groups are required.

For the common case of a domain with only a few event channels,
instead of requiring an additional allocation for the group page, the
first bucket is indexed directly.

As a consequence of this, struct domain shrinks by at least 232 bytes
as 32 bucket pointers are replaced with 1 bucket pointer and (at
most) 2 group pointers.

[ Based on a patch from Wei Liu with improvements from Malcolm
  Crossley.
] Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_channel.c | 82 ++++++++++++++++++++++++++++++++++---------- xen/include/xen/event.h | 40 ++++++++++++++++----- xen/include/xen/sched.h | 21 ++++++++++-- 3 files changed, 113 insertions(+), 30 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 5b5df88..fe9bba2 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -121,11 +121,47 @@ static int virq_is_global(uint32_t virq) } +static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned port) +{ + struct evtchn *chn; + unsigned i; + + chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); + if ( !chn ) + return NULL; + + for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + { + if ( xsm_alloc_security_evtchn(&chn[i]) ) + { + while ( i-- ) + xsm_free_security_evtchn(&chn[i]); + xfree(chn); + return NULL; + } + chn[i].port = port + i; + } + return chn; +} + +static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket) +{ + unsigned i; + + if ( !bucket ) + return; + + for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + xsm_free_security_evtchn(bucket + i); + + xfree(bucket); +} + static int get_free_port(struct domain *d) { struct evtchn *chn; + struct evtchn **grp; int port; - int i, j; if ( d->is_dying ) return -EINVAL; @@ -137,22 +173,17 @@ static int get_free_port(struct domain *d) if ( port == d->max_evtchns ) return -ENOSPC; - chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET); - if ( unlikely(chn == NULL) ) - return -ENOMEM; - - for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ ) + if ( !group_from_port(d, port) ) { - if ( xsm_alloc_security_evtchn(&chn[i]) ) - { - for ( j = 0; j < i; j++ ) - xsm_free_security_evtchn(&chn[j]); - xfree(chn); + grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP); + if ( !grp ) return -ENOMEM; - } - chn[i].port = port + i; + group_from_port(d, port) = grp; } + chn = alloc_evtchn_bucket(d, port); + if ( !chn ) + return -ENOMEM; bucket_from_port(d, 
port) = chn; return port; @@ -1152,15 +1183,25 @@ int evtchn_init(struct domain *d) { evtchn_2l_init(d); + d->evtchn = alloc_evtchn_bucket(d, 0); + if ( !d->evtchn ) + return -ENOMEM; + spin_lock_init(&d->event_lock); if ( get_free_port(d) != 0 ) + { + free_evtchn_bucket(d, d->evtchn); return -EINVAL; + } evtchn_from_port(d, 0)->state = ECS_RESERVED; #if MAX_VIRT_CPUS > BITS_PER_LONG d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS)); if ( !d->poll_mask ) + { + free_evtchn_bucket(d, d->evtchn); return -ENOMEM; + } bitmap_zero(d->poll_mask, MAX_VIRT_CPUS); #endif @@ -1170,7 +1211,7 @@ int evtchn_init(struct domain *d) void evtchn_destroy(struct domain *d) { - int i; + unsigned i, j; /* After this barrier no new event-channel allocations can occur. */ BUG_ON(!d->is_dying); @@ -1185,12 +1226,17 @@ void evtchn_destroy(struct domain *d) /* Free all event-channel buckets. */ spin_lock(&d->event_lock); - for ( i = 0; i < NR_EVTCHN_BUCKETS; i++ ) + for ( i = 0; i < NR_EVTCHN_GROUPS; i++ ) { - xsm_free_security_evtchn(d->evtchn[i]); - xfree(d->evtchn[i]); - d->evtchn[i] = NULL; + if ( !d->evtchn_group[i] ) + continue; + for ( j = 0; j < BUCKETS_PER_GROUP; j++ ) + free_evtchn_bucket(d, d->evtchn_group[i][j]); + xfree(d->evtchn_group[i]); + d->evtchn_group[i] = NULL; } + free_evtchn_bucket(d, d->evtchn); + d->evtchn = NULL; spin_unlock(&d->event_lock); clear_global_virq_handlers(d); diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 302a904..9d1a8c4 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -69,15 +69,37 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq); /* Notify remote end of a Xen-attached event channel.*/ void notify_via_xen_event_channel(struct domain *ld, int lport); -/* Internal event channel object accessors */ -#define bucket_from_port(d,p) \ - ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) -#define port_is_valid(d,p) \ - (((p) >= 0) && ((p) < (d)->max_evtchns) && \ - (bucket_from_port(d,p) != NULL)) 
-#define evtchn_from_port(d,p) \ - (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)]) +/* + * Internal event channel object storage. + * + * The objects (struct evtchn) are indexed using a two level scheme of + * groups and buckets. Each group is a page of bucket pointers. Each + * bucket is a page-sized array of struct evtchn''s. + * + * The first bucket is directly accessed via d->evtchn. + */ +#define group_from_port(d, p) \ + ((d)->evtchn_group[(p) / EVTCHNS_PER_GROUP]) +#define bucket_from_port(d, p) \ + ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET]) +static inline bool_t port_is_valid(struct domain *d, unsigned p) +{ + if ( p >= d->max_evtchns ) + return 0; + if ( !d->evtchn ) + return 0; + if ( p < EVTCHNS_PER_BUCKET ) + return 1; + return group_from_port(d, p) != NULL && bucket_from_port(d, p) != NULL; +} + +static inline struct evtchn *evtchn_from_port(struct domain *d, unsigned p) +{ + if ( p < EVTCHNS_PER_BUCKET ) + return &d->evtchn[p]; + return bucket_from_port(d, p) + (p % EVTCHNS_PER_BUCKET); +} /* Wait on a Xen-attached event channel. */ #define wait_on_xen_event_channel(port, condition) \ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 532dd46..48ab812 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,8 +50,22 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 
32 : BITS_PER_XEN_ULONG) #endif -#define EVTCHNS_PER_BUCKET 128 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) + +#define BUCKETS_PER_GROUP (PAGE_SIZE/sizeof(struct evtchn *)) +/* Round size of struct evtchn up to power of 2 size */ +#define __RDU2(x) ( (x) | ( (x) >> 1)) +#define __RDU4(x) ( __RDU2(x) | ( __RDU2(x) >> 2)) +#define __RDU8(x) ( __RDU4(x) | ( __RDU4(x) >> 4)) +#define __RDU16(x) ( __RDU8(x) | ( __RDU8(x) >> 8)) +#define __RDU32(x) (__RDU16(x) | (__RDU16(x) >>16)) +#define next_power_of_2(x) (__RDU32((x)-1) + 1) + +/* Maximum number of event channels for any ABI. */ +#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS + +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) +#define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_GROUPS DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP) struct evtchn { @@ -271,7 +285,8 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + struct evtchn *evtchn; /* first bucket only */ + struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; -- 1.7.2.5
From: David Vrabel <david.vrabel@citrix.com> Add the event channel hypercall sub-ops and the definitions for the shared data structures for the FIFO-based event channel ABI. The design document for this new ABI is available here: http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU. Key advantages of this new ABI include: - Support for over 100,000 events (2^17). - 16 different event priorities. - Improved fairness in event latency through the use of FIFOs. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/include/public/event_channel.h | 75 ++++++++++++++++++++++++++++++++++++ xen/include/public/xen.h | 6 ++- xen/include/xen/sched.h | 2 +- 3 files changed, 80 insertions(+), 3 deletions(-) diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 472efdb..4a53484 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -71,6 +71,9 @@ #define EVTCHNOP_bind_vcpu 8 #define EVTCHNOP_unmask 9 #define EVTCHNOP_reset 10 +#define EVTCHNOP_init_control 11 +#define EVTCHNOP_expand_array 12 +#define EVTCHNOP_set_priority 13 /* ` } */ typedef uint32_t evtchn_port_t; @@ -258,6 +261,43 @@ struct evtchn_reset { typedef struct evtchn_reset evtchn_reset_t; /* + * EVTCHNOP_init_control: initialize the control block for the FIFO ABI. + * + * Note: any events that are currently pending will not be resent and + * will be lost. Guests should call this before binding any event to + * avoid losing any events. + */ +struct evtchn_init_control { + /* IN parameters. */ + uint64_t control_gfn; + uint32_t offset; + uint32_t vcpu; + /* OUT parameters. 
*/ + uint8_t link_bits; + uint8_t _pad[7]; +}; +typedef struct evtchn_init_control evtchn_init_control_t; + +/* + * EVTCHNOP_expand_array: add an additional page to the event array. + */ +struct evtchn_expand_array { + /* IN parameters. */ + uint64_t array_gfn; +}; +typedef struct evtchn_expand_array evtchn_expand_array_t; + +/* + * EVTCHNOP_set_priority: set the priority for an event channel. + */ +struct evtchn_set_priority { + /* IN parameters. */ + uint32_t port; + uint32_t priority; +}; +typedef struct evtchn_set_priority evtchn_set_priority_t; + +/* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) * ` @@ -281,6 +321,41 @@ struct evtchn_op { typedef struct evtchn_op evtchn_op_t; DEFINE_XEN_GUEST_HANDLE(evtchn_op_t); +/* + * 2-level ABI + */ + +#define EVTCHN_2L_NR_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) + +/* + * FIFO ABI + */ + +/* Events may have priorities from 0 (highest) to 15 (lowest). */ +#define EVTCHN_FIFO_PRIORITY_MAX 0 +#define EVTCHN_FIFO_PRIORITY_DEFAULT 7 +#define EVTCHN_FIFO_PRIORITY_MIN 15 + +#define EVTCHN_FIFO_MAX_QUEUES (EVTCHN_FIFO_PRIORITY_MIN + 1) + +typedef uint32_t event_word_t; + +#define EVTCHN_FIFO_PENDING 31 +#define EVTCHN_FIFO_MASKED 30 +#define EVTCHN_FIFO_LINKED 29 + +#define EVTCHN_FIFO_LINK_BITS 17 +#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1) + +#define EVTCHN_FIFO_NR_CHANNELS (1 << EVTCHN_FIFO_LINK_BITS) + +struct evtchn_fifo_control_block { + uint32_t ready; + uint32_t _rsvd; + uint32_t head[EVTCHN_FIFO_MAX_QUEUES]; +}; +typedef struct evtchn_fifo_control_block evtchn_fifo_control_block_t; + #endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */ /* diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index b50bd05..8c5697e 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -552,11 +552,13 @@ struct multicall_entry { typedef struct multicall_entry multicall_entry_t; DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); +#if 
__XEN_INTERFACE_VERSION__ < 0x00040400 /* - * Event channel endpoints per domain: + * Event channel endpoints per domain (when using the 2-level ABI): * 1024 if a long is 32 bits; 4096 if a long is 64 bits. */ -#define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) +#define NR_EVENT_CHANNELS EVTCHN_2L_NR_CHANNELS +#endif struct vcpu_time_info { /* diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 48ab812..4107c41 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -61,7 +61,7 @@ extern struct domain *dom0; #define next_power_of_2(x) (__RDU32((x)-1) + 1) /* Maximum number of event channels for any ABI. */ -#define MAX_NR_EVTCHNS NR_EVENT_CHANNELS +#define MAX_NR_EVTCHNS EVTCHN_2L_NR_CHANNELS #define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) #define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 07/11] evtchn: implement EVTCHNOP_set_priority and add the set_priority hook
From: David Vrabel <david.vrabel@citrix.com> Implement EVTCHNOP_set_priority. A new set_priority hook added to struct evtchn_port_ops will do the ABI specific validation and setup. If an ABI does not provide a set_priority hook (as is the case of the 2-level ABI), the sub-op will return -ENOSYS. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_channel.c | 29 +++++++++++++++++++++++++++++ xen/include/xen/event.h | 11 +++++++++++ 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index fe9bba2..9b5f710 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -955,6 +955,27 @@ out: return rc; } +static long evtchn_set_priority(const struct evtchn_set_priority *set_priority) +{ + struct domain *d = current->domain; + unsigned port = set_priority->port; + long ret; + + spin_lock(&d->event_lock); + + if ( !port_is_valid(d, port) ) + { + spin_unlock(&d->event_lock); + return -EINVAL; + } + + ret = evtchn_port_set_priority(d, evtchn_from_port(d, port), + set_priority->priority); + + spin_unlock(&d->event_lock); + + return ret; +} long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { @@ -1064,6 +1085,14 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_set_priority: { + struct evtchn_set_priority set_priority; + if ( copy_from_guest(&set_priority, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_set_priority(&set_priority); + break; + } + default: rc = -ENOSYS; break; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 9d1a8c4..16e7ed0 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -137,6 +137,8 @@ struct evtchn_port_ops { void (*unmask)(struct domain *d, struct evtchn *evtchn); bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn); bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn); + int (*set_priority)(struct domain *d, struct 
evtchn *evtchn, + unsigned priority); void (*print_state)(struct domain *d, const struct evtchn *evtchn); }; @@ -170,6 +172,15 @@ static inline bool_t evtchn_port_is_masked(struct domain *d, return d->evtchn_port_ops->is_masked(d, evtchn); } +static inline int evtchn_port_set_priority(struct domain *d, + struct evtchn *evtchn, + unsigned priority) +{ + if ( !d->evtchn_port_ops->set_priority ) + return -ENOSYS; + return d->evtchn_port_ops->set_priority(d, evtchn, priority); +} + static inline void evtchn_port_print_state(struct domain *d, const struct evtchn *evtchn) { -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 08/11] evtchn: add FIFO-based event channel hypercalls and port ops
From: David Vrabel <david.vrabel@citrix.com> Add the implementation for the FIFO-based event channel ABI. The new hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and the required evtchn_ops (set_pending, unmask, etc.). Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/Makefile | 1 + xen/common/event_channel.c | 21 ++ xen/common/event_fifo.c | 455 ++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/event_fifo.h | 53 +++++ xen/include/xen/sched.h | 6 +- 5 files changed, 535 insertions(+), 1 deletions(-) create mode 100644 xen/common/event_fifo.c create mode 100644 xen/include/xen/event_fifo.h diff --git a/xen/common/Makefile b/xen/common/Makefile index 0a3a367..533b603 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -7,6 +7,7 @@ obj-y += domctl.o obj-y += domain.o obj-y += event_2l.o obj-y += event_channel.o +obj-y += event_fifo.o obj-y += grant_table.o obj-y += irq.o obj-y += kernel.o diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 9b5f710..2c90a66 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -26,6 +26,7 @@ #include <xen/compat.h> #include <xen/guest_access.h> #include <xen/keyhandler.h> +#include <xen/event_fifo.h> #include <asm/current.h> #include <public/xen.h> @@ -1085,6 +1086,24 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_init_control: { + struct evtchn_init_control init_control; + if ( copy_from_guest(&init_control, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_init_control(&init_control); + if ( !rc && __copy_to_guest(arg, &init_control, 1) ) + rc = -EFAULT; + break; + } + + case EVTCHNOP_expand_array: { + struct evtchn_expand_array expand_array; + if ( copy_from_guest(&expand_array, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_fifo_expand_array(&expand_array); + break; + } + case EVTCHNOP_set_priority: { struct evtchn_set_priority set_priority; if ( copy_from_guest(&set_priority, arg, 
1) != 0 ) @@ -1269,6 +1288,8 @@ void evtchn_destroy(struct domain *d) spin_unlock(&d->event_lock); clear_global_virq_handlers(d); + + evtchn_fifo_destroy(d); } diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c new file mode 100644 index 0000000..23e273f --- /dev/null +++ b/xen/common/event_fifo.c @@ -0,0 +1,455 @@ +/* + * FIFO event channel management. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/sched.h> +#include <xen/event.h> +#include <xen/event_fifo.h> +#include <xen/paging.h> +#include <xen/mm.h> + +#include <public/event_channel.h> + +static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d, + unsigned port) +{ + unsigned p, w; + + if ( unlikely(port >= d->evtchn_fifo->num_evtchns) ) + return NULL; + + p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return d->evtchn_fifo->event_array[p].virt + w; +} + +static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link) +{ + event_word_t n, o, w; + + w = *word; + + do { + if ( !(w & (1 << EVTCHN_FIFO_LINKED)) ) + return 0; + o = w; + n = (w & ~EVTCHN_FIFO_LINK_MASK) | link; + } while ( (w = cmpxchg(word, o, n)) != o ); + + return 1; +} + +static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) +{ + struct domain *d = v->domain; + unsigned port; + event_word_t *word; + struct evtchn_fifo_queue *q; + unsigned long flags; + bool_t was_pending; + + port = evtchn->port; + word = evtchn_fifo_word_from_port(d, port); + if ( unlikely(!word) ) + return; + + /* + * No locking around getting the queue. This may race with + * changing the priority but we are allowed to signal the event + * once on the old priority. 
+ */ + q = &v->evtchn_fifo->queue[evtchn->priority]; + + was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word); + + /* + * Link the event if it unmasked and not already linked. + */ + if ( !test_bit(EVTCHN_FIFO_MASKED, word) + && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + { + event_word_t *tail_word; + bool_t linked = 0; + + spin_lock_irqsave(&q->lock, flags); + + /* + * Atomically link the tail to port iff the tail is linked. + * If the tail is unlinked the queue is empty. + * + * If port is the same as tail, the queue is empty but q->tail + * will appear linked as we just set LINKED above. + * + * If the queue is empty (i.e., we haven''t linked to the new + * event), head must be updated. + */ + if ( port != q->tail ) + { + tail_word = evtchn_fifo_word_from_port(d, q->tail); + linked = evtchn_fifo_set_link(tail_word, port); + } + if ( !linked ) + write_atomic(q->head, port); + q->tail = port; + + spin_unlock_irqrestore(&q->lock, flags); + + if ( !test_and_set_bit(q->priority, + &v->evtchn_fifo->control_block->ready) ) + vcpu_mark_events_pending(v); + } + + if ( !was_pending ) + evtchn_check_pollers(d, port); +} + +static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + /* + * Just clear the P bit. + * + * No need to unlink as the guest will unlink and ignore + * non-pending events. + */ + clear_bit(EVTCHN_FIFO_PENDING, word); +} + +static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn) +{ + struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id]; + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return; + + clear_bit(EVTCHN_FIFO_MASKED, word); + + /* Relink if pending. 
*/ + if ( test_bit(EVTCHN_FIFO_PENDING, word) ) + evtchn_fifo_set_pending(v, evtchn); +} + +static bool_t evtchn_fifo_is_pending(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 0; + + return test_bit(EVTCHN_FIFO_PENDING, word); +} + +static bool_t evtchn_fifo_is_masked(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( unlikely(!word) ) + return 1; + + return test_bit(EVTCHN_FIFO_MASKED, word); +} + +static int evtchn_fifo_set_priority(struct domain *d, struct evtchn *evtchn, + unsigned priority) +{ + if ( priority > EVTCHN_FIFO_PRIORITY_MIN ) + return -EINVAL; + + /* + * Only need to switch to the new queue for future events. If the + * event is already pending or in the process of being linked it + * will be on the old queue -- this is fine. + */ + evtchn->priority = priority; + + return 0; +} + +static void evtchn_fifo_print_state(struct domain *d, + const struct evtchn *evtchn) +{ + event_word_t *word; + + word = evtchn_fifo_word_from_port(d, evtchn->port); + if ( !word ) + printk("? 
"); + else if ( test_bit(EVTCHN_FIFO_LINKED, word) ) + printk("%-4u", *word & EVTCHN_FIFO_LINK_MASK); + else + printk("- "); +} + +static const struct evtchn_port_ops evtchn_port_ops_fifo = +{ + .set_pending = evtchn_fifo_set_pending, + .clear_pending = evtchn_fifo_clear_pending, + .unmask = evtchn_fifo_unmask, + .is_pending = evtchn_fifo_is_pending, + .is_masked = evtchn_fifo_is_masked, + .set_priority = evtchn_fifo_set_priority, + .print_state = evtchn_fifo_print_state, +}; + +static int map_guest_page(struct domain *d, uint64_t gfn, + struct page_info **page, void **virt) +{ + struct page_info *p; + + p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); + if ( !p ) + return -EINVAL; + + if ( !get_page_type(p, PGT_writable_page) ) + { + put_page(p); + return -EINVAL; + } + + *virt = __map_domain_page_global(p); + if ( !*virt ) + { + put_page_and_type(p); + return -ENOMEM; + } + *page = p; + return 0; +} + +static void unmap_guest_page(struct page_info *page, void *virt) +{ + if ( page == NULL ) + return; + + unmap_domain_page_global(virt); + put_page_and_type(page); +} + +static void cleanup_control_block(struct vcpu *v) +{ + if ( v->evtchn_fifo ) + { + unmap_guest_page(v->evtchn_fifo->cb_page, v->evtchn_fifo->control_block); + xfree(v->evtchn_fifo); + v->evtchn_fifo = NULL; + } +} + +static void init_queue(struct vcpu *v, struct evtchn_fifo_queue *q, unsigned i) +{ + spin_lock_init(&q->lock); + q->priority = i; + q->head = &v->evtchn_fifo->control_block->head[i]; +} + +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset) +{ + struct domain *d = v->domain; + struct evtchn_fifo_vcpu *efv; + struct page_info *page; + void *virt; + unsigned i; + int rc; + + if ( v->evtchn_fifo ) + return -EINVAL; + + efv = xzalloc(struct evtchn_fifo_vcpu); + if ( efv == NULL ) + return -ENOMEM; + + rc = map_guest_page(d, gfn, &page, &virt); + if ( rc < 0 ) + { + xfree(efv); + return rc; + } + + v->evtchn_fifo = efv; + + v->evtchn_fifo->cb_page = page; + 
v->evtchn_fifo->control_block = virt + offset; + + for ( i = 0; i <= EVTCHN_FIFO_PRIORITY_MIN; i++ ) + init_queue(v, &v->evtchn_fifo->queue[i], i); + + return 0; +} + +/* + * Setup an event array with no pages. + */ +static int setup_event_array(struct domain *d) +{ + if ( d->evtchn_fifo ) + return 0; + + d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain); + if ( d->evtchn_fifo == NULL ) + return -ENOMEM; + + d->evtchn_fifo->num_evtchns = 0; + + return 0; +} + +static void cleanup_event_array(struct domain *d) +{ + unsigned i; + + if ( d->evtchn_fifo == NULL ) + return; + + for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ ) + { + unmap_guest_page(d->evtchn_fifo->event_array[i].page, + d->evtchn_fifo->event_array[i].virt); + } + xfree(d->evtchn_fifo); +} + +static void set_priority_all(struct domain *d, unsigned priority) +{ + unsigned port; + + for ( port = 1; port < d->max_evtchns; port++ ) + { + if ( !port_is_valid(d, port) ) + break; + + evtchn_port_set_priority(d, evtchn_from_port(d, port), priority); + } +} + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control) +{ + struct domain *d = current->domain; + uint32_t vcpu_id; + uint64_t gfn; + uint32_t offset; + struct vcpu *v; + int rc; + + init_control->link_bits = EVTCHN_FIFO_LINK_BITS; + + vcpu_id = init_control->vcpu; + gfn = init_control->control_gfn; + offset = init_control->offset; + + if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) ) + return -ENOENT; + v = d->vcpu[vcpu_id]; + + /* Must not cross page boundary. */ + if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) ) + return -EINVAL; + + /* Must be 8-bytes aligned. */ + if ( offset & (8 - 1) ) + return -EINVAL; + + spin_lock(&d->event_lock); + + rc = setup_control_block(v, gfn, offset); + + /* + * If this is the first control block, setup an empty event array + * and switch to the fifo port ops. + * + * Any ports currently bound will have their priority set to the + * default. 
+ */ + if ( d->evtchn_fifo == NULL ) + { + rc = setup_event_array(d); + if ( rc < 0 ) + cleanup_control_block(v); + else + { + d->evtchn_port_ops = &evtchn_port_ops_fifo; + d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS; + set_priority_all(d, EVTCHN_FIFO_PRIORITY_DEFAULT); + } + } + + spin_unlock(&d->event_lock); + + return rc; +} + +static int add_page_to_event_array(struct domain *d, unsigned long gfn) +{ + struct page_info *page = NULL; + void *virt; + unsigned slot; + int rc; + + slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + if ( slot >= EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES ) + return -ENOSPC; + + rc = map_guest_page(d, gfn, &page, &virt); + if ( rc < 0 ) + return rc; + + d->evtchn_fifo->event_array[slot].page = page; + d->evtchn_fifo->event_array[slot].virt = virt; + + d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; + + return 0; +} + +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array) +{ + struct domain *d = current->domain; + int rc; + + if ( !d->evtchn_fifo ) + return -ENOSYS; + + spin_lock(&d->event_lock); + rc = add_page_to_event_array(d, expand_array->array_gfn); + spin_unlock(&d->event_lock); + + return rc; +} + +void evtchn_fifo_destroy(struct domain *d) +{ + struct vcpu *v; + + for_each_vcpu( d, v ) + cleanup_control_block(v); + cleanup_event_array(d); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/event_fifo.h b/xen/include/xen/event_fifo.h new file mode 100644 index 0000000..ff118ad --- /dev/null +++ b/xen/include/xen/event_fifo.h @@ -0,0 +1,53 @@ +/* + * FIFO-based event channel ABI. + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. 
+ */ +#ifndef __XEN_EVENT_FIFO_H__ +#define __XEN_EVENT_FIFO_H__ + +struct evtchn_fifo_queue { + uint32_t *head; /* points into control block */ + uint32_t tail; + spinlock_t lock; + uint8_t priority; +}; + +struct evtchn_fifo_vcpu { + struct page_info *cb_page; + struct evtchn_fifo_control_block *control_block; + struct evtchn_fifo_queue queue[EVTCHN_FIFO_MAX_QUEUES]; +}; + +#define EVTCHN_FIFO_EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t)) +#define EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES \ + (EVTCHN_FIFO_NR_CHANNELS / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE) + +struct evtchn_fifo_array_page { + struct page_info *page; + event_word_t *virt; +}; + +struct evtchn_fifo_domain { + struct evtchn_fifo_array_page event_array[EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES]; + unsigned num_evtchns; +}; + +int evtchn_fifo_init_control(struct evtchn_init_control *init_control); +int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array); +void evtchn_fifo_destroy(struct domain *domain); + +#endif /* __XEN_EVENT_FIFO_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 4107c41..ae653e6 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -61,7 +61,7 @@ extern struct domain *dom0; #define next_power_of_2(x) (__RDU32((x)-1) + 1) /* Maximum number of event channels for any ABI. */ -#define MAX_NR_EVTCHNS EVTCHN_2L_NR_CHANNELS +#define MAX_NR_EVTCHNS MAX(EVTCHN_2L_NR_CHANNELS, EVTCHN_FIFO_NR_CHANNELS) #define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn))) #define EVTCHNS_PER_GROUP (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET) @@ -95,6 +95,7 @@ struct evtchn } pirq; /* state == ECS_PIRQ */ u16 virq; /* state == ECS_VIRQ */ } u; + u8 priority; #ifdef FLASK_ENABLE void *ssid; #endif @@ -209,6 +210,8 @@ struct vcpu /* Guest-specified relocation of vcpu_info. 
*/ unsigned long vcpu_info_mfn; + struct evtchn_fifo_vcpu *evtchn_fifo; + struct arch_vcpu arch; }; @@ -290,6 +293,7 @@ struct domain unsigned max_evtchns; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; + struct evtchn_fifo_domain *evtchn_fifo; struct grant_table *grant_table; -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
From: David Vrabel <dvrabel@cantab.net> Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to set the maximum event channel port a domain may use. This may be used to limit the amount of Xen resources (global mapping space and xenheap) that a domain may use for event channels. A domain that does not have a limit set may use all the event channels supported by the event channel ABI in use. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- tools/flask/policy/policy/mls | 2 +- tools/flask/policy/policy/modules/xen/xen.if | 2 +- tools/flask/policy/policy/modules/xen/xen.te | 2 +- xen/common/domctl.c | 8 ++++++++ xen/common/event_channel.c | 7 ++++++- xen/include/public/domctl.h | 15 ++++++++++++++- xen/include/xen/sched.h | 1 + xen/xsm/flask/hooks.c | 3 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 9 files changed, 37 insertions(+), 5 deletions(-) diff --git a/tools/flask/policy/policy/mls b/tools/flask/policy/policy/mls index 9290a76..fb603cd 100644 --- a/tools/flask/policy/policy/mls +++ b/tools/flask/policy/policy/mls @@ -74,7 +74,7 @@ mlsconstrain domain { getaffinity getdomaininfo getvcpuinfo getvcpucontext getad ((l1 dom l2) or (t1 == mls_priv)); # all the domain "write" ops -mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext } +mlsconstrain domain { setvcpucontext pause unpause resume create max_vcpus destroy setaffinity scheduler setdomainmaxmem setdomainhandle setdebugging hypercall settime set_target shutdown setaddrsize trigger setextvcpucontext set_max_evtchn } ((l1 eq l2) or (t1 == mls_priv)); # This is incomplete - similar constraints must be written for all classes diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if index 97af0a8..63e40f0 100644 --- 
a/tools/flask/policy/policy/modules/xen/xen.if +++ b/tools/flask/policy/policy/modules/xen/xen.if @@ -48,7 +48,7 @@ define(`create_domain_common'', ` allow $1 $2:domain { create max_vcpus setdomainmaxmem setaddrsize getdomaininfo hypercall setvcpucontext setextvcpucontext getscheduler getvcpuinfo getvcpuextstate getaddrsize - getaffinity setaffinity }; + getaffinity setaffinity set_max_evtchn }; allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim }; allow $1 $2:security check_context; allow $1 $2:shadow enable; diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index c89ce28..5f9de5c 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -73,7 +73,7 @@ allow dom0_t dom0_t:domain { getdomaininfo getvcpuinfo getvcpucontext setdomainmaxmem setdomainhandle setdebugging hypercall settime setaddrsize getaddrsize trigger getextvcpucontext setextvcpucontext getvcpuextstate setvcpuextstate - getpodtarget setpodtarget set_misc_info set_virq_handler + getpodtarget setpodtarget set_misc_info set_virq_handler set_max_evtchn }; allow dom0_t dom0_t:domain2 { set_cpuid gettsc settsc setscheduler diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 9760d50..bffe8d8 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -863,6 +863,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) } break; + case XEN_DOMCTL_set_max_evtchn: + { + d->max_evtchn_port = min_t(unsigned, + op->u.set_max_evtchn.max_port, + INT_MAX); + } + break; + default: ret = arch_do_domctl(op, d, u_domctl); break; diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 2c90a66..8e430e2 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -168,10 +168,14 @@ static int get_free_port(struct domain *d) return -EINVAL; for ( port = 0; port_is_valid(d, port); port++ ) + { + if ( port > d->max_evtchn_port ) + return -ENOSPC; if ( 
evtchn_from_port(d, port)->state == ECS_FREE ) return port; + } - if ( port == d->max_evtchns ) + if ( port == d->max_evtchns || port > d->max_evtchn_port ) return -ENOSPC; if ( !group_from_port(d, port) ) @@ -1230,6 +1234,7 @@ void evtchn_check_pollers(struct domain *d, unsigned port) int evtchn_init(struct domain *d) { evtchn_2l_init(d); + d->max_evtchn_port = INT_MAX; d->evtchn = alloc_evtchn_bucket(d, 0); if ( !d->evtchn ) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..ed9155a 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -36,7 +36,7 @@ #include "grant_table.h" #include "hvm/save.h" -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009 +#define XEN_DOMCTL_INTERFACE_VERSION 0x0000000a /* * NB. xen_domctl.domain is an IN/OUT parameter for this operation. @@ -852,6 +852,17 @@ struct xen_domctl_set_broken_page_p2m { typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); +/* + * XEN_DOMCTL_set_max_evtchn: sets the maximum event channel port + * number the guest may use. Use this limit the amount of resources + * (global mapping space, xenheap) a guest may use for event channels. 
+ */ +struct xen_domctl_set_max_evtchn { + uint32_t max_port; +}; +typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t); + struct xen_domctl { uint32_t cmd; #define XEN_DOMCTL_createdomain 1 @@ -920,6 +931,7 @@ struct xen_domctl { #define XEN_DOMCTL_set_broken_page_p2m 67 #define XEN_DOMCTL_setnodeaffinity 68 #define XEN_DOMCTL_getnodeaffinity 69 +#define XEN_DOMCTL_set_max_evtchn 70 #define XEN_DOMCTL_gdbsx_guestmemio 1000 #define XEN_DOMCTL_gdbsx_pausevcpu 1001 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 @@ -975,6 +987,7 @@ struct xen_domctl { struct xen_domctl_set_access_required access_required; struct xen_domctl_audit_p2m audit_p2m; struct xen_domctl_set_virq_handler set_virq_handler; + struct xen_domctl_set_max_evtchn set_max_evtchn; struct xen_domctl_gdbsx_memio gdbsx_guest_memio; struct xen_domctl_set_broken_page_p2m set_broken_page_p2m; struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ae653e6..bca381c 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -291,6 +291,7 @@ struct domain struct evtchn *evtchn; /* first bucket only */ struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ unsigned max_evtchns; + unsigned max_evtchn_port; spinlock_t event_lock; const struct evtchn_port_ops *evtchn_port_ops; struct evtchn_fifo_domain *evtchn_fifo; diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index fa0589a..548df47 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd) case XEN_DOMCTL_audit_p2m: return current_has_perm(d, SECCLASS_HVM, HVM__AUDIT_P2M); + case XEN_DOMCTL_set_max_evtchn: + return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_MAX_EVTCHN); + default: printk("flask_domctl: Unknown op %d\n", cmd); return -EPERM; diff --git a/xen/xsm/flask/policy/access_vectors 
b/xen/xsm/flask/policy/access_vectors index 5dfe13b..03a8e64 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -157,6 +157,8 @@ class domain set_misc_info # XEN_DOMCTL_set_virq_handler set_virq_handler +# XEN_DOMCTL_set_max_evtchn + set_max_evtchn } # This is a continuation of class domain, since only 32 permissions can be -- 1.7.2.5
From: David Vrabel <david.vrabel@citrix.com> Add xc_domain_set_max_evtchn(), a wrapper around the DOMCTL_set_max_evtchn hypercall. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> --- tools/libxc/xc_domain.c | 11 +++++++++++ tools/libxc/xenctrl.h | 12 ++++++++++++ 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 81316d3..2cea6e3 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1766,6 +1766,17 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq) return do_domctl(xch, &domctl); } +int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid, + uint32_t max_port) +{ + DECLARE_DOMCTL; + + domctl.cmd = XEN_DOMCTL_set_max_evtchn; + domctl.domain = domid; + domctl.u.set_max_evtchn.max_port = max_port; + return do_domctl(xch, &domctl); +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 58d51f3..8cf3f3b 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -847,6 +847,18 @@ int xc_domain_set_access_required(xc_interface *xch, */ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq); +/** + * Set the maximum event channel port a domain may bind. + * + * This does not affect ports that are already bound. + * + * @param xch a handle to an open hypervisor interface + * @param domid the domain id + * @param max_port maximum port number + */ +int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid, + uint32_t max_port); + /* * CPUPOOL MANAGEMENT FUNCTIONS */ -- 1.7.2.5
David Vrabel
2013-Sep-27 10:55 UTC
[PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
From: David Vrabel <david.vrabel@citrix.com> Add the 'event_channels' option to the xl configuration file to limit the number of event channels that a domain may use. Plumb this option through to libxl via a new libxl_build_info field and call xc_domain_set_max_evtchn() in the post build stage of domain creation. A new LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS #define indicates that this new field is available. The default value of 127 limits the domain to using the minimum amount of Xen resources (xenheap and global mapping pages) regardless of the event channel ABI that may be used by a guest. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> --- docs/man/xl.cfg.pod.5 | 12 ++++++++++++ tools/libxl/libxl.h | 5 +++++ tools/libxl/libxl_create.c | 3 +++ tools/libxl/libxl_dom.c | 4 ++++ tools/libxl/libxl_types.idl | 1 + tools/libxl/xl_cmdimpl.c | 3 +++ 6 files changed, 28 insertions(+), 0 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 769767b..3c7dd28 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -572,6 +572,18 @@ Allow a guest to access specific physical IRQs. It is recommended to use this option only for trusted VMs under administrator control. +=item B<event_channels=N> + +Limit the guest to using at most N event channels (PV interrupts). +Guests use hypervisor resources for each event channel they use. + +The default of 127 should be sufficient for typical guests and means +the guest uses the lowest amount of hypervisor resources. The maximum +value depends on what the guest supports. Guests supporting the +FIFO-based event channel ABI support up to 131,071 event channels. +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit +x86). 
+ =back =head2 Paravirtualised (PV) Guest Specific Options diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 4cab294..30712c2 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -90,6 +90,11 @@ #define LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE 1 /* + * The libxl_domain_build_info has the event_channels field. + */ +#define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1 + +/* * libxl ABI compatibility * * The only guarantee which libxl makes regarding ABI compatibility diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 7567238..806e25c 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -208,6 +208,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_defbool_setdefault(&b_info->disable_migrate, false); + if (!b_info->event_channels) + b_info->event_channels = 127; + switch (b_info->type) { case LIBXL_DOMAIN_TYPE_HVM: if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 6e2252a..c905c74 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -268,6 +268,10 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid, if (rc) return rc; + rc = xc_domain_set_max_evtchn(ctx->xch, domid, info->event_channels); + if (rc) + return rc; + libxl_cpuid_apply_policy(ctx, domid); if (info->cpuid != NULL) libxl_cpuid_set(ctx, domid, info->cpuid); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 049dbb5..7bf517d 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -308,6 +308,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("irqs", Array(uint32, "num_irqs")), ("iomem", Array(libxl_iomem_range, "num_iomem")), ("claim_mode", libxl_defbool), + ("event_channels", uint32), ("u", KeyedUnion(None, libxl_domain_type, "type", [("hvm", Struct(None, [("firmware", string), ("bios", libxl_bios_type), diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 3d7eaad..7931bb9 
100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -813,6 +813,9 @@ static void parse_config_data(const char *config_source, if (!xlu_cfg_get_long (config, "videoram", &l, 0)) b_info->video_memkb = l * 1024; + if (!xlu_cfg_get_long(config, "event_channels", &l, 0)) + b_info->event_channels = l; + switch(b_info->type) { case LIBXL_DOMAIN_TYPE_HVM: if (!xlu_cfg_get_string (config, "kernel", &buf, 0)) -- 1.7.2.5
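[For readers wanting a concrete usage sketch (not part of the patch itself), the new option slots into an xl domain configuration alongside the existing settings. Everything here other than the event_channels key is illustrative:]

```
# Illustrative xl guest configuration fragment: raise the per-domain
# event channel limit above the 127 default added by this patch.
name = "big-pv-guest"     # hypothetical guest name
vcpus = 32
event_channels = 1023     # option introduced by this patch
```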
Jan Beulich
2013-Sep-27 12:34 UTC
Re: [PATCH 08/11] evtchn: add FIFO-based event channel hypercalls and port ops
>>> On 27.09.13 at 12:55, David Vrabel <david.vrabel@citrix.com> wrote:
> +static int map_guest_page(struct domain *d, uint64_t gfn,
> +                          struct page_info **page, void **virt)
> +{
> +    struct page_info *p;
> +
> +    p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !p )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(p, PGT_writable_page) )
> +    {
> +        put_page(p);
> +        return -EINVAL;
> +    }
> +
> +    *virt = map_domain_page_global(gfn);

Since this one returns page aligned addresses, ...

> +    if ( !*virt )
> +    {
> +        put_page_and_type(p);
> +        return -ENOMEM;
> +    }
> +    *page = p;
> +    return 0;
> +}
> +
> +static void unmap_guest_page(struct page_info *page, void *virt)
> +{
> +    if ( page == NULL )
> +        return;
> +
> +    unmap_domain_page_global(virt);

... this one expects page aligned addresses, but the way you use it
below doesn't guarantee that (see b0581b92 ("x86: make
map_domain_page_global() a simple wrapper around vmap()")).

> +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset)
> +{
> +    struct domain *d = v->domain;
> +    struct evtchn_fifo_vcpu *efv;
> +    struct page_info *page;
> +    void *virt;
> +    unsigned i;
> +    int rc;
> +
> +    if ( v->evtchn_fifo )
> +        return -EINVAL;
> +
> +    efv = xzalloc(struct evtchn_fifo_vcpu);
> +    if ( efv == NULL )
> +        return -ENOMEM;
> +
> +    rc = map_guest_page(d, gfn, &page, &virt);
> +    if ( rc < 0 )
> +    {
> +        xfree(efv);
> +        return rc;
> +    }
> +
> +    v->evtchn_fifo = efv;
> +
> +    v->evtchn_fifo->cb_page = page;

There's no real need to store this if you only need it for freeing
later.  x86 at least has domain_page_map_to_mfn() to recover the MFN
(and thus page) from the mapping, and if ARM doesn't they should
implement it.

> +struct evtchn_fifo_queue {
> +    uint32_t *head; /* points into control block */
> +    uint32_t tail;
> +    spinlock_t lock;
> +    uint8_t priority;
> +};

spinlock_t being at least 32 bits in size you win nothing with this
ordering, whereas if "tail" and "priority" would be adjacent the
structure size might decrease for certain cases (namely when
alignof(spinlock_t) > 4).

Jan
Jan Beulich
2013-Sep-27 12:40 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
>>> On 27.09.13 at 12:55, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -36,7 +36,7 @@
>  #include "grant_table.h"
>  #include "hvm/save.h"
>  
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x0000000a

The mere addition of a new sub-hypercall does not make an interface
version bump necessary.  With that fixed
Reviewed-by: Jan Beulich <jbeulich@suse.com>
for the non-XSM parts of the patch.

Jan
>>> On 27.09.13 at 12:55, David Vrabel <david.vrabel@citrix.com> wrote:
> Patch 1-4 do some preparatory work for supporting alternate ABIs.
>
> Patch 5 expands the number of evtchn objects a domain may have by
> changing how they are allocated.
>
> Patch 6 adds the public ABI.
>
> Patch 7 adds the EVTCHNOP_set_priority implementation.  This will
> return -ENOSYS for ABIs that do not support priority.

Up to here
Reviewed-by: Jan Beulich <jbeulich@suse.com>
despite there still being a few minor coding style issues.  One thing -
as it is quite wide spread throughout the series - you may want to
adjust is the use of "unsigned" when generally we use "unsigned int".

Jan
Daniel De Graaf
2013-Sep-27 14:29 UTC
Re: [PATCH 09/11] xen: Add DOMCTL to limit the number of event channels a domain may use
On 09/27/2013 06:55 AM, David Vrabel wrote:
> From: David Vrabel <dvrabel@cantab.net>
>
> Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to
> set the maximum event channel port a domain may use.  This may be used to
> limit the amount of Xen resources (global mapping space and xenheap) that
> a domain may use for event channels.
>
> A domain that does not have a limit set may use all the event channels
> supported by the event channel ABI in use.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
[...]
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 5dfe13b..03a8e64 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -157,6 +157,8 @@ class domain
>      set_misc_info
>  # XEN_DOMCTL_set_virq_handler
>      set_virq_handler
> +# XEN_DOMCTL_set_max_evtchn
> +    set_max_evtchn
>  }
>  
>  # This is a continuation of class domain, since only 32 permissions can be
>  # defined per class
>  class domain2

The new domctl access vector must be added to the "domain2" class, not
"domain" which is full (already has 32 items).  While the hypervisor
compilation does not currently report this as an error, attempting to
compile the policy (tools/flask/policy) will report it.

-- 
Daniel De Graaf
National Security Agency
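[A sketch of the relocation Daniel asks for; the existing domain2 permissions are elided, and the surrounding placement within access_vectors is illustrative:]

```
# Sketch: in xen/xsm/flask/policy/access_vectors, add the new
# permission to class domain2 (which still has free slots) instead
# of the full class domain.
class domain2
{
    # ... existing domain2 permissions ...
# XEN_DOMCTL_set_max_evtchn
    set_max_evtchn
}
```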
Konrad Rzeszutek Wilk
2013-Sep-30 18:41 UTC
Re: [PATCHv4 0/11] Xen: FIFO-based event channel ABI
On Fri, Sep 27, 2013 at 11:55:48AM +0100, David Vrabel wrote:
> This is a complete implementation of the hypervisor and xl toolstack
> parts of the FIFO-based event channel ABI described in this design
> document:
>
> http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf
>
> Changes in draft F are:
>
> - READY field in the control block is now 32-bits (so guests only need
>   to support atomic bit ops on 32-bit words).  This is only a
>   documentation change as the implementation already used a uint32_t.
>
> - DOMCTL_set_max_evtchn replaces EVTCHNOP_set_limit.
>
> - DomUs default to unlimited number of event channels requiring
>   the toolstack to set a limit.
>
> The toolstack defaults to limiting guests to 127 event channels if the
> event_channels option is omitted.  This means the minimum amount of
> both Xen heap and global mapping space is used regardless of which ABI
> is used.  If this is considered too restrictive a limit, 1023 would be
> another sensible default (limits the guest to a single event array
> page but 5 xenheap pages for the struct evtchns).

I would say 1023 (so the same value as the existing event mechanism)
would be a sensible default.

[...]
>>> On 30.09.13 at 20:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Sep 27, 2013 at 11:55:48AM +0100, David Vrabel wrote:
>> [...]
>> The toolstack defaults to limiting guests to 127 event channels if the
>> event_channels option is omitted.  This means the minimum amount of
>> both Xen heap and global mapping space is used regardless of which ABI
>> is used.  If this is considered too restrictive a limit, 1023 would be
>> another sensible default (limits the guest to a single event array
>> page but 5 xenheap pages for the struct evtchns).
>
> I would say 1023 (so the same value as the existing event mechanism)
> would be a sensible default.

That's the existing 32-bit default; 64-bit has 4095 (yet that surely
would be needlessly high as the new default).

Jan
Ian Campbell
2013-Oct-01 12:30 UTC
Re: [PATCH 10/11] libxc: add xc_domain_set_max_evtchn()
On Fri, 2013-09-27 at 11:55 +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add xc_domain_set_max_evtchn(), a wrapper around the
> DOMCTL_set_max_evtchn hypercall.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I'm happy for this to be committed by whoever takes the hypervisor
side...

> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
[...]
Ian Campbell
2013-Oct-01 12:36 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
On Fri, 2013-09-27 at 11:55 +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add the 'event_channels' option to the xl configuration file to limit
> the number of event channels that domain may use.
>
> Plumb this option through to libxl via a new libxl_build_info field
> and call xc_domain_set_max_evtchn() in the post build stage of domain
> creation.
>
> A new LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS #define indicates that this
> new field is available.
>
> The default value of 127 limits the domain to uses the minimum amount

"to use"

> of Xen resources (xenheap and global mapping pages) regardless of
> event channel ABI that may be used by a guest.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
[...]
> +=item B<event_channels=N>

Since this is actually a limit perhaps apply s/event_channels/max_&/
throughout?

> +
> +Limit the guest to using a most N event channels (PV interrupts).
> +Guests use hypervisor resources for each event channel they use.
> +
> +The default of 127 should be sufficient for typical guests and means
> +the guest uses the lowest amout of hypervisor resources.  The maximum

"amount"

> +value depends what the guest supports.  Guests supporting the
> +FIFO-based event channel ABI support up to 131,071 event channels.
> +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> +x86).

Does this setting really restrict the classical ABI to 127? TBH I'm fine
if it does, just curious.

[...]
> +    rc = xc_domain_set_max_evtchn(ctx->xch, domid, info->event_channels);
> +    if (rc)
> +        return rc;

Is there anything sensible we can log here?  Would be nice to give some
clue as to why the domain creation failed...

[...]
David Vrabel
2013-Oct-01 12:53 UTC
Re: [PATCH 11/11] libxl, xl: add event_channels option to xl configuration file
On 01/10/13 13:36, Ian Campbell wrote:
> On Fri, 2013-09-27 at 11:55 +0100, David Vrabel wrote:
>
>> +value depends what the guest supports.  Guests supporting the
>> +FIFO-based event channel ABI support up to 131,071 event channels.
>> +Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>> +x86).
>
> Does this setting really restrict the classical ABI to 127? TBH I'm fine
> if it does, just curious.

Yes, the limit applies regardless of the ABI that a guest uses.  With
the 2-level ABI it doesn't save much Xen resources though (1 xen heap
page per 128 events).

David
Konrad Rzeszutek Wilk
2013-Oct-01 14:22 UTC
Re: [PATCHv4 0/11] Xen: FIFO-based event channel ABI
On Tue, Oct 01, 2013 at 11:25:59AM +0100, Jan Beulich wrote:
> >>> On 30.09.13 at 20:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Fri, Sep 27, 2013 at 11:55:48AM +0100, David Vrabel wrote:
> >> [...]
> >> The toolstack defaults to limiting guests to 127 event channels if the
> >> event_channels option is omitted.  This means the minimum amount of
> >> both Xen heap and global mapping space is used regardless of which ABI
> >> is used.  If this is considered too restrictive a limit, 1023 would be
> >> another sensible default (limits the guest to a single event array
> >> page but 5 xenheap pages for the struct evtchns).
> >
> > I would say 1023 (so the same value as the existing event mechanism)
> > would be a sensible default.
>
> That's the existing 32-bit default; 64-bit has 4095 (yet that surely
> would be needlessly high as the new default).

127 is too little I think.  For example for every VCPU there are 6 events
being consumed (VIRQ_TIMER, VIRQ_DEBUG, CALLFUNCSINGLE, CALLFUNC, RESCHED
and IRQWORK).  If you launch a 32 VCPU guest you are already at 224.

Then there is the blk event channel, the tx/rx of the vif.  With the
possibility of per-cpu tx/rx of vifs you would have 2*VCPU, so now we
are at 288.  If you want even more LUNS (say 16), you are at 304.

1023 being the universal value looks OK to me.
>>> On 01.10.13 at 16:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Oct 01, 2013 at 11:25:59AM +0100, Jan Beulich wrote:
>> [...]
>> That's the existing 32-bit default; 64-bit has 4095 (yet that surely
>> would be needlessly high as the new default).
>
> 127 is too little I think.

I agree; I'm fine with defaulting to 1023 (I only wanted to point out
that other than you claimed this is lower than the 2-level default on
64-bit guests).  Perhaps the tools could even be intelligent enough to
make the default depend on the vCPU count of the guest.

> For example for every VCPU there are 6 events
> being consumed (VIRQ_TIMER, VIRQ_DEBUG, CALLFUNCSINGLE, CALLFUNC, RESCHED
> and IRQWORK).  If you launch a 32 VCPU guest you are already at 224.

Because you waste them - all the IPI flavors could collectively do with
just one event channel per vCPU (as our more recent kernels do).  You of
course need a separate one for the timer, and you forgot the spin lock
polling one.  Whether the VIRQ_DEBUG one is always necessary I'm not
sure - I would think the kernel should by default avoid registering it.

Jan