Changes from RFC V2: * Adjust registration interface * Get rid of xmalloc and friends in registration routine * Avoid redirection with function pointers * Share routines between 2 and 3 level event channels Changes from RFC V1: * Use function pointers to get rid of switch statements * Do not manipulate VCPU state * No more gcc-ism code in public headers * Consolidate some boilerplate using macros
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 90a6537..39f85d2 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -92,7 +92,7 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ struct waitqueue_vcpu; -struct vcpu +struct vcpu { int vcpu_id; @@ -453,7 +453,7 @@ struct domain *domain_create( /* * rcu_lock_domain_by_id() is more efficient than get_domain_by_id(). * This is the preferred function if the returned domain reference - * is short lived, but it cannot be used if the domain reference needs + * is short lived, but it cannot be used if the domain reference needs * to be kept beyond the current scope (e.g., across a softirq). * The returned domain reference must be discarded using rcu_unlock_domain(). */ @@ -574,7 +574,7 @@ void sync_local_execstate(void); * sync_vcpu_execstate() will switch and commit @prev's state. */ void context_switch( - struct vcpu *prev, + struct vcpu *prev, struct vcpu *next); /* -- 1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/event.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 71c3e92..65ac81a 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -1,8 +1,8 @@ /****************************************************************************** * event.h - * + * * A nice interface for passing asynchronous events to guest OSes. - * + * * Copyright (c) 2002-2006, K A Fraser */ -- 1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 5593066..fe44eb5 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -1,8 +1,8 @@ /****************************************************************************** * xen.h - * + * * Guest OS interface to Xen. - * + * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the @@ -137,11 +137,11 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); #define __HYPERVISOR_dom0_op __HYPERVISOR_platform_op #endif -/* +/* * VIRTUAL INTERRUPTS - * + * * Virtual interrupts that a guest OS may receive from Xen. - * + * * In the side comments, 'V.' denotes a per-VCPU VIRQ while 'G.' denotes a * global VIRQ. The former can be bound once per VCPU and cannot be re-bound. * The latter can be allocated only once per guest: they must initially be @@ -190,7 +190,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); * (x) encodes the PFD as follows: * x == 0 => PFD == DOMID_SELF * x != 0 => PFD == x - 1 - * + * * Sub-commands: ptr[1:0] specifies the appropriate MMU_* command. * ------------- * ptr[1:0] == MMU_NORMAL_PT_UPDATE: @@ -236,13 +236,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); * To deallocate the pages, the operations are the reverse of the steps * mentioned above. The argument is MMUEXT_UNPIN_TABLE for all levels and the * pagetable MUST not be in use (meaning that the cr3 is not set to it). - * + * * ptr[1:0] == MMU_MACHPHYS_UPDATE: * Updates an entry in the machine->pseudo-physical mapping table. * ptr[:2] -- Machine address within the frame whose mapping to modify. * The frame must belong to the FD, if one is specified. * val -- Value to write into the mapping entry. 
- * + * * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD: * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed * with those in @val. @@ -588,7 +588,7 @@ typedef struct vcpu_time_info vcpu_time_info_t; struct vcpu_info { /* * 'evtchn_upcall_pending' is written non-zero by Xen to indicate - * a pending notification for a particular VCPU. It is then cleared + * a pending notification for a particular VCPU. It is then cleared * by the guest OS /before/ checking for pending work, thus avoiding * a set-and-check race. Note that the mask is only accessed by Xen * on the CPU that is currently hosting the VCPU. This means that the @@ -646,7 +646,7 @@ struct shared_info { * 3. Virtual interrupts ('events'). A domain can bind an event-channel * port to a virtual interrupt source, such as the virtual-timer * device or the emergency console. - * + * * Event channels are addressed by a "port index". Each channel is * associated with two bits of information: * 1. PENDING -- notifies the domain that there is a pending notification @@ -657,7 +657,7 @@ struct shared_info { * becomes pending while the channel is masked then the 'edge' is lost * (i.e., when the channel is unmasked, the guest must manually handle * pending notifications as no upcall will be scheduled by Xen). - * + * * To expedite scanning of pending notifications, any 0->1 pending * transition on an unmasked channel causes a corresponding bit in a * per-vcpu selector word to be set. Each bit in the selector covers a -- 1.7.10.4
Wei Liu
2013-Jan-31 14:42 UTC
[PATCH 04/16] Move event channel macros / struct definition to proper place
These definitions were misplaced in sched.h; move them to their proper places in xen.h and event.h. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 2 ++ xen/include/xen/event.h | 43 +++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/sched.h | 45 --------------------------------------------- 3 files changed, 45 insertions(+), 45 deletions(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index fe44eb5..6132682 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); * 1024 if a long is 32 bits; 4096 if a long is 64 bits. */ #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64) +#define EVTCHNS_PER_BUCKET 128 +#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) struct vcpu_time_info { /* diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 65ac81a..1c13bd0 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -15,6 +15,49 @@ #include <asm/bitops.h> #include <asm/event.h> +#ifndef CONFIG_COMPAT +#define BITS_PER_EVTCHN_WORD(d) BITS_PER_LONG +#else +#define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG) +#endif +#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) + +struct evtchn +{ +#define ECS_FREE 0 /* Channel is available for use. */ +#define ECS_RESERVED 1 /* Channel is reserved. */ +#define ECS_UNBOUND 2 /* Channel is waiting to bind to a remote domain. */ +#define ECS_INTERDOMAIN 3 /* Channel is bound to another domain. */ +#define ECS_PIRQ 4 /* Channel is bound to a physical IRQ line. */ +#define ECS_VIRQ 5 /* Channel is bound to a virtual IRQ line. */ +#define ECS_IPI 6 /* Channel is bound to a virtual IPI line. */ + u8 state; /* ECS_* */ + u8 xen_consumer; /* Consumer in Xen, if any? 
(0 = send to guest) */ + u16 notify_vcpu_id; /* VCPU for local delivery notification */ + union { + struct { + domid_t remote_domid; + } unbound; /* state == ECS_UNBOUND */ + struct { + u16 remote_port; + struct domain *remote_dom; + } interdomain; /* state == ECS_INTERDOMAIN */ + struct { + u16 irq; + u16 next_port; + u16 prev_port; + } pirq; /* state == ECS_PIRQ */ + u16 virq; /* state == ECS_VIRQ */ + } u; +#ifdef FLASK_ENABLE + void *ssid; +#endif +}; + +int evtchn_init(struct domain *d); /* from domain_create */ +void evtchn_destroy(struct domain *d); /* from domain_kill */ +void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ + /* * send_guest_vcpu_virq: Notify guest via a per-VCPU VIRQ. * @v: VCPU to which virtual IRQ should be sent diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 39f85d2..64a0ba4 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -45,51 +45,6 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); /* A global pointer to the initial domain (DOM0). */ extern struct domain *dom0; -#ifndef CONFIG_COMPAT -#define BITS_PER_EVTCHN_WORD(d) BITS_PER_LONG -#else -#define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG) -#endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) -#define EVTCHNS_PER_BUCKET 128 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) - -struct evtchn -{ -#define ECS_FREE 0 /* Channel is available for use. */ -#define ECS_RESERVED 1 /* Channel is reserved. */ -#define ECS_UNBOUND 2 /* Channel is waiting to bind to a remote domain. */ -#define ECS_INTERDOMAIN 3 /* Channel is bound to another domain. */ -#define ECS_PIRQ 4 /* Channel is bound to a physical IRQ line. */ -#define ECS_VIRQ 5 /* Channel is bound to a virtual IRQ line. */ -#define ECS_IPI 6 /* Channel is bound to a virtual IPI line. */ - u8 state; /* ECS_* */ - u8 xen_consumer; /* Consumer in Xen, if any? 
(0 = send to guest) */ - u16 notify_vcpu_id; /* VCPU for local delivery notification */ - union { - struct { - domid_t remote_domid; - } unbound; /* state == ECS_UNBOUND */ - struct { - u16 remote_port; - struct domain *remote_dom; - } interdomain; /* state == ECS_INTERDOMAIN */ - struct { - u16 irq; - u16 next_port; - u16 prev_port; - } pirq; /* state == ECS_PIRQ */ - u16 virq; /* state == ECS_VIRQ */ - } u; -#ifdef FLASK_ENABLE - void *ssid; -#endif -}; - -int evtchn_init(struct domain *d); /* from domain_create */ -void evtchn_destroy(struct domain *d); /* from domain_kill */ -void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ - struct waitqueue_vcpu; struct vcpu -- 1.7.10.4
This field is manipulated by the hypervisor only, so if anything goes wrong it is a bug. The default event channel level is 2, which uses a two-level lookup structure: a selector in struct vcpu and a shared bitmap in shared info. The upcoming 3-level event channel uses a three-level lookup structure: a top-level selector and a second-level selector for every vcpu, plus a shared bitmap. When a domain is constructed, it starts with the 2-level event channel, which is guaranteed to be supported by the hypervisor. If a domain wants to use an N (N>=3) level event channel, it must explicitly issue a hypercall to set up the N-level event channel. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 1 + xen/include/xen/event.h | 16 +++++++++++++++- xen/include/xen/sched.h | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 9231eb0..b96d5b1 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1173,6 +1173,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) int evtchn_init(struct domain *d) { spin_lock_init(&d->event_lock); + d->evtchn_level = EVTCHN_DEFAULT_LEVEL; if ( get_free_port(d) != 0 ) return -EINVAL; evtchn_from_port(d, 0)->state = ECS_RESERVED; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 1c13bd0..c17b891 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -20,7 +20,21 @@ #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 
32 : BITS_PER_LONG) #endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) +#define EVTCHN_2_LEVEL 2 +#define EVTCHN_3_LEVEL 3 +#define EVTCHN_DEFAULT_LEVEL EVTCHN_2_LEVEL +#define MAX_EVTCHNS_L2(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) +#define MAX_EVTCHNS_L3(d) (MAX_EVTCHNS_L2(d) * BITS_PER_EVTCHN_WORD(d)) +#define MAX_EVTCHNS(d) ({ int __v = 0; \ + switch ( d->evtchn_level ) { \ + case EVTCHN_2_LEVEL: \ + __v = MAX_EVTCHNS_L2(d); break; \ + case EVTCHN_3_LEVEL: \ + __v = MAX_EVTCHNS_L3(d); break; \ + default: \ + BUG(); \ + }; \ + __v;}) struct evtchn { diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 64a0ba4..21f7b68 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -217,6 +217,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; spinlock_t event_lock; + unsigned int evtchn_level; struct grant_table *grant_table; -- 1.7.10.4
As we move to N-level evtchn we need a bigger d->evtchn, which would bloat struct domain. So move this array out of struct domain and allocate a dedicated page for it. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 17 +++++++++++++++-- xen/include/xen/sched.h | 2 +- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index b96d5b1..43ee854 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1172,16 +1172,27 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) int evtchn_init(struct domain *d) { + BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE); + d->evtchn = alloc_xenheap_page(); + + if ( d->evtchn == NULL ) + return -ENOMEM; + clear_page(d->evtchn); + spin_lock_init(&d->event_lock); d->evtchn_level = EVTCHN_DEFAULT_LEVEL; - if ( get_free_port(d) != 0 ) + if ( get_free_port(d) != 0 ) { + free_xenheap_page(d->evtchn); return -EINVAL; + } evtchn_from_port(d, 0)->state = ECS_RESERVED; #if MAX_VIRT_CPUS > BITS_PER_LONG d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS)); - if ( !d->poll_mask ) + if ( !d->poll_mask ) { + free_xenheap_page(d->evtchn); return -ENOMEM; + } bitmap_zero(d->poll_mask, MAX_VIRT_CPUS); #endif @@ -1215,6 +1226,8 @@ void evtchn_destroy(struct domain *d) spin_unlock(&d->event_lock); clear_global_virq_handlers(d); + + free_xenheap_page(d->evtchn); } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 21f7b68..2f18fe5 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -215,7 +215,7 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + struct evtchn **evtchn; spinlock_t event_lock; unsigned int evtchn_level; -- 1.7.10.4
For a 64-bit build with 3-level event channels and the original value of EVTCHNS_PER_BUCKET (128), the space needed to accommodate d->evtchn would be 4 pages (PAGE_SIZE = 4096). Given that not every domain needs 3-level event channels, this wastes memory. Also, since we've restricted d->evtchn to one page, Xen cannot build if we move to 3-level event channels. Setting EVTCHNS_PER_BUCKET to 512 makes the array occupy exactly one page. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 6132682..4a354e1 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -557,7 +557,7 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); * 1024 if a long is 32 bits; 4096 if a long is 64 bits. */ #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64) -#define EVTCHNS_PER_BUCKET 128 +#define EVTCHNS_PER_BUCKET 512 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) struct vcpu_time_info { -- 1.7.10.4
Wei Liu
2013-Jan-31 14:42 UTC
[PATCH 08/16] Add evtchn_is_{pending, masked} and evtchn_clear_pending
Some code paths access the arrays in shared info directly. This only works with the 2-level event channel. Add functions to abstract away implementation details. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/arch/x86/irq.c | 7 +++---- xen/common/event_channel.c | 22 +++++++++++++++++++--- xen/common/keyhandler.c | 6 ++---- xen/common/schedule.c | 2 +- xen/include/xen/event.h | 6 ++++++ 5 files changed, 31 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 068c5a0..216271b 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1452,7 +1452,7 @@ int pirq_guest_unmask(struct domain *d) { pirq = pirqs[i]->pirq; if ( pirqs[i]->masked && - !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) ) + !evtchn_is_masked(d, pirqs[i]->evtchn) ) pirq_guest_eoi(pirqs[i]); } } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); @@ -2093,13 +2093,12 @@ static void dump_irqs(unsigned char key) info = pirq_info(d, pirq); printk("%u:%3d(%c%c%c%c)", d->domain_id, pirq, - (test_bit(info->evtchn, - &shared_info(d, evtchn_pending)) ? + (evtchn_is_pending(d, info->evtchn) ? 'P' : '-'), (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d), &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ? 'S' : '-'), - (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? + (evtchn_is_masked(d, info->evtchn) ? 'M' : '-'), (info->masked ? 
'M' : '-')); if ( i != action->nr_guests ) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 43ee854..37fecee 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -95,6 +95,7 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) static void evtchn_set_pending(struct vcpu *v, int port); +static void evtchn_clear_pending(struct domain *d, int port); static int virq_is_global(uint32_t virq) { @@ -156,6 +157,16 @@ static int get_free_port(struct domain *d) return port; } +int evtchn_is_pending(struct domain *d, int port) +{ + return test_bit(port, &shared_info(d, evtchn_pending)); +} + +int evtchn_is_masked(struct domain *d, int port) +{ + return test_bit(port, &shared_info(d, evtchn_mask)); +} + static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) { @@ -529,7 +540,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - clear_bit(port1, &shared_info(d1, evtchn_pending)); + evtchn_clear_pending(d1, port1); /* Reset binding to vcpu0 when the channel is freed. 
*/ chn1->state = ECS_FREE; @@ -653,6 +664,11 @@ static void evtchn_set_pending(struct vcpu *v, int port) } } +static void evtchn_clear_pending(struct domain *d, int port) +{ + clear_bit(port, &shared_info(d, evtchn_pending)); +} + int guest_enabled_event(struct vcpu *v, uint32_t virq) { return ((v != NULL) && (v->virq_to_evtchn[virq] != 0)); @@ -1283,8 +1299,8 @@ static void domain_dump_evtchn_info(struct domain *d) printk(" %4u [%d/%d]: s=%d n=%d x=%d", port, - !!test_bit(port, &shared_info(d, evtchn_pending)), - !!test_bit(port, &shared_info(d, evtchn_mask)), + !!evtchn_is_pending(d, port), + !!evtchn_is_masked(d, port), chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 2c5c230..16bc452 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -301,10 +301,8 @@ static void dump_domains(unsigned char key) printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n", d->domain_id, v->vcpu_id, VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG], - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_pending)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_mask)), + evtchn_is_pending(d, v->virq_to_evtchn[VIRQ_DEBUG]), + evtchn_is_masked(d, v->virq_to_evtchn[VIRQ_DEBUG]), test_bit(v->virq_to_evtchn[VIRQ_DEBUG] / BITS_PER_EVTCHN_WORD(d), &vcpu_info(v, evtchn_pending_sel))); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index e6a90d8..1bf010e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -693,7 +693,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = 0; - if ( test_bit(port, &shared_info(d, evtchn_pending)) ) + if ( evtchn_is_pending(d, port) ) goto out; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index c17b891..2d2c585 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -111,6 +111,12 @@ int evtchn_unmask(unsigned int port); /* Move all PIRQs 
after a vCPU was moved to another pCPU. */ void evtchn_move_pirqs(struct vcpu *v); +/* Tell whether a given event-channel port is pending */ +int evtchn_is_pending(struct domain *d, int port); + +/* Tell whether a given event-channel port is masked */ +int evtchn_is_masked(struct domain *d, int port); + /* Allocate/free a Xen-attached event channel port. */ typedef void (*xen_event_channel_notification_t)( struct vcpu *v, unsigned int port); -- 1.7.10.4
For N-level event channels, the shared bitmaps in the hypervisor are by design not guaranteed to be contiguous. These macros are used to calculate the page number / offset within a page of a given event channel. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/asm-arm/types.h | 7 +++++-- xen/include/asm-x86/config.h | 4 +++- xen/include/xen/event.h | 13 +++++++++++++ 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/xen/include/asm-arm/types.h b/xen/include/asm-arm/types.h index 48864f9..65562b8 100644 --- a/xen/include/asm-arm/types.h +++ b/xen/include/asm-arm/types.h @@ -41,10 +41,13 @@ typedef char bool_t; #define test_and_clear_bool(b) xchg(&(b), 0) #endif /* __ASSEMBLY__ */ +#define BYTE_BITORDER 3 +#define BITS_PER_BYTE (1 << BYTE_BITORDER) -#define BITS_PER_LONG 32 -#define BYTES_PER_LONG 4 +#define BITS_PER_LONG (1 << LONG_BITORDER) #define LONG_BYTEORDER 2 +#define LONG_BITORDER (LONG_BYTEORDER + BYTE_BITORDER) +#define BYTES_PER_LONG (1 << LONG_BYTEORDER) #endif /* __ARM_TYPES_H__ */ /* diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h index da82e73..b921586 100644 --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -8,11 +8,13 @@ #define __X86_CONFIG_H__ #define LONG_BYTEORDER 3 +#define BYTE_BITORDER 3 +#define LONG_BITORDER (BYTE_BITORDER + LONG_BYTEORDER) #define CONFIG_PAGING_LEVELS 4 #define BYTES_PER_LONG (1 << LONG_BYTEORDER) #define BITS_PER_LONG (BYTES_PER_LONG << 3) -#define BITS_PER_BYTE 8 +#define BITS_PER_BYTE (1 << BYTE_BITORDER) #define CONFIG_X86 1 #define CONFIG_X86_HT 1 diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 2d2c585..cacd89d 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -36,6 +36,19 @@ }; \ __v;}) +/* N.B. 
EVTCHNS_PER_PAGE is always a power of 2, use shifts to optimize */ +#define EVTCHNS_SHIFT (PAGE_SHIFT+BYTE_BITORDER) +#define EVTCHNS_PER_PAGE (_AC(1,L) << EVTCHNS_SHIFT) +#define EVTCHN_MASK (~(EVTCHNS_PER_PAGE-1)) +#define EVTCHN_PAGE_NO(chn) ((chn) >> EVTCHNS_SHIFT) +#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ~EVTCHN_MASK) + +#ifndef CONFIG_COMPAT +#define EVTCHN_WORD_BITORDER(d) LONG_BITORDER +#else +#define EVTCHN_WORD_BITORDER(d) (has_32bit_shinfo(d) ? 5 : LONG_BITORDER) +#endif + struct evtchn { #define ECS_FREE 0 /* Channel is available for use. */ -- 1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 4a354e1..2e2ec7f 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -554,11 +554,19 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); /* * Event channel endpoints per domain: + * 2-level: * 1024 if a long is 32 bits; 4096 if a long is 64 bits. + * 3-level: + * 32k if a long is 32 bits; 256k if a long is 64 bits. */ -#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64) +#define NR_EVENT_CHANNELS_L2 (sizeof(unsigned long) * sizeof(unsigned long) * 64) +#define NR_EVENT_CHANNELS_L3 (NR_EVENT_CHANNELS_L2 * 64) +#if !defined(__XEN__) && !defined(__XEN_TOOLS__) +#define NR_EVENT_CHANNELS NR_EVENT_CHANNELS_L2 /* for compatibility */ +#endif + #define EVTCHNS_PER_BUCKET 512 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS_L2 / EVTCHNS_PER_BUCKET) struct vcpu_time_info { /* -- 1.7.10.4
Wei Liu
2013-Jan-31 14:42 UTC
[PATCH 11/16] Define N-level event channel registration interface
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/event_channel.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 07ff321..f26d6d5 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -71,6 +71,7 @@ #define EVTCHNOP_bind_vcpu 8 #define EVTCHNOP_unmask 9 #define EVTCHNOP_reset 10 +#define EVTCHNOP_register_nlevel 11 /* ` } */ typedef uint32_t evtchn_port_t; @@ -258,6 +259,38 @@ struct evtchn_reset { typedef struct evtchn_reset evtchn_reset_t; /* + * EVTCHNOP_register_nlevel: Register N-level event channel + * NOTES: + * 1. Currently only 3-level is supported. + * 2. Should fall back to 2-level if this call fails. + */ +/* 64 bit guests need 8 pages for evtchn_pending and evtchn_mask for + * 256k event channels while 32 bit ones only need 1 page for 32k + * event channels. */ +#define EVTCHN_MAX_L3_PAGES 8 +struct evtchn_register_3level { + /* IN parameters. */ + uint32_t nr_pages; /* for evtchn_{pending,mask} */ + uint32_t nr_vcpus; /* for l2sel_{mfns,offsets} */ + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_pending; + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_mask; + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_mfns; + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_offsets; +}; +typedef struct evtchn_register_3level evtchn_register_3level_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_register_3level_t); + +struct evtchn_register_nlevel { + /* IN parameters. */ + uint32_t level; + union { + evtchn_register_3level_t l3; + } u; +}; +typedef struct evtchn_register_nlevel evtchn_register_nlevel_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_register_nlevel_t); + +/* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) * ` -- 1.7.10.4
Wei Liu
2013-Jan-31 14:43 UTC
[PATCH 12/16] Add control structures for 3-level event channel
References to the shared pending / mask bitmaps are embedded in struct domain, and a pointer to the second-level selector is embedded in struct vcpu. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 2f18fe5..1d8c1b5 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -24,6 +24,7 @@ #include <public/sysctl.h> #include <public/vcpu.h> #include <public/mem_event.h> +#include <public/event_channel.h> #ifdef CONFIG_COMPAT #include <compat/vcpu.h> @@ -57,6 +58,9 @@ struct vcpu struct domain *domain; + /* For 3-level event channels */ + unsigned long *evtchn_pending_sel_l2; + struct vcpu *next_in_list; s_time_t periodic_period; @@ -218,6 +222,8 @@ struct domain struct evtchn **evtchn; spinlock_t event_lock; unsigned int evtchn_level; + unsigned long *evtchn_pending[EVTCHN_MAX_L3_PAGES]; + unsigned long *evtchn_mask[EVTCHN_MAX_L3_PAGES]; struct grant_table *grant_table; -- 1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 2e2ec7f..8fecd07 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -566,7 +566,7 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); #endif #define EVTCHNS_PER_BUCKET 512 -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS_L2 / EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS_L3 / EVTCHNS_PER_BUCKET) struct vcpu_time_info { /* -- 1.7.10.4
Use pointers in struct domain to reference the evtchn_pending and evtchn_mask bitmaps. When building a domain, the default operation set is the 2-level one. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/arch/arm/domain.c | 1 + xen/arch/x86/domain.c | 1 + xen/common/event_channel.c | 65 ++++++++++++++++++++++++++++++++++++-------- xen/include/xen/event.h | 3 ++ 4 files changed, 59 insertions(+), 11 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 59d8d73..bc477f6 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -417,6 +417,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) goto fail; clear_page(d->shared_info); + evtchn_set_default_bitmap(d); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a58cc1a..a669dc0 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -580,6 +580,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) goto fail; clear_page(d->shared_info); + evtchn_set_default_bitmap(d); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 37fecee..1ce97b0 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -51,6 +51,9 @@ #define consumer_is_xen(e) (!!(e)->xen_consumer) +static void evtchn_set_pending(struct vcpu *v, int port); +static void evtchn_clear_pending(struct domain *d, int port); + /* * The function alloc_unbound_xen_event_channel() allows an arbitrary * notifier function to be specified. However, very few unique functions @@ -94,9 +97,6 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) /* Get the notification function for a given Xen-bound event channel. 
*/ #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) -static void evtchn_set_pending(struct vcpu *v, int port); -static void evtchn_clear_pending(struct domain *d, int port); - static int virq_is_global(uint32_t virq) { int rc; @@ -159,15 +159,18 @@ static int get_free_port(struct domain *d) int evtchn_is_pending(struct domain *d, int port) { - return test_bit(port, &shared_info(d, evtchn_pending)); + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + return test_bit(offset, d->evtchn_pending[page_no]); } int evtchn_is_masked(struct domain *d, int port) { - return test_bit(port, &shared_info(d, evtchn_mask)); + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + return test_bit(offset, d->evtchn_mask[page_no]); } - static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) { struct evtchn *chn; @@ -623,7 +626,7 @@ out: return ret; } -static void evtchn_set_pending(struct vcpu *v, int port) +static void evtchn_set_pending_l2(struct vcpu *v, int port) { struct domain *d = v->domain; int vcpuid; @@ -664,9 +667,25 @@ static void evtchn_set_pending(struct vcpu *v, int port) } } +static void evtchn_set_pending(struct vcpu *v, int port) +{ + struct domain *d = v->domain; + + switch ( d->evtchn_level ) + { + case EVTCHN_2_LEVEL: + evtchn_set_pending_l2(v, port); + break; + default: + BUG(); + } +} + static void evtchn_clear_pending(struct domain *d, int port) { - clear_bit(port, &shared_info(d, evtchn_pending)); + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + clear_bit(offset, d->evtchn_pending[page_no]); } int guest_enabled_event(struct vcpu *v, uint32_t virq) @@ -932,10 +951,12 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) } -int evtchn_unmask(unsigned int port) +static int evtchn_unmask_l2(unsigned int port) { struct domain *d = current->domain; struct vcpu *v; + unsigned int 
page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); ASSERT(spin_is_locked(&d->event_lock)); @@ -948,8 +969,8 @@ int evtchn_unmask(unsigned int port) * These operations must happen in strict order. Based on * include/xen/event.h:evtchn_set_pending(). */ - if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && - test_bit (port, &shared_info(d, evtchn_pending)) && + if ( test_and_clear_bit(offset, d->evtchn_mask[page_no]) && + test_bit (offset, d->evtchn_pending[page_no]) && !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d), &vcpu_info(v, evtchn_pending_sel)) ) { @@ -959,6 +980,23 @@ int evtchn_unmask(unsigned int port) return 0; } +int evtchn_unmask(unsigned int port) +{ + struct domain *d = current->domain; + int rc = 0; + + switch ( d->evtchn_level ) + { + case EVTCHN_2_LEVEL: + rc = evtchn_unmask_l2(port); + break; + default: + BUG(); + } + + return rc; +} + static long evtchn_reset(evtchn_reset_t *r) { @@ -1185,6 +1223,11 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) spin_unlock(&ld->event_lock); } +void evtchn_set_default_bitmap(struct domain *d) +{ + d->evtchn_pending[0] = (unsigned long *)shared_info(d, evtchn_pending); + d->evtchn_mask[0] = (unsigned long *)shared_info(d, evtchn_mask); +} int evtchn_init(struct domain *d) { diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index cacd89d..34a82d0 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -145,6 +145,9 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq); /* Notify remote end of a Xen-attached event channel.*/ void notify_via_xen_event_channel(struct domain *ld, int lport); +/* This is called after the domain's shared info page is set up */ +void evtchn_set_default_bitmap(struct domain *d); + /* Internal event channel object accessors */ #define bucket_from_port(d,p) \ ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET]) -- 1.7.10.4
Wei Liu
2013-Jan-31 14:43 UTC
[PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
NOTE: the registration call always fails because other parts of the code
are not yet complete.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |  278 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 1ce97b0..c448c60 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -26,6 +26,7 @@
 #include <xen/compat.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
+#include <xen/paging.h>
 #include <asm/current.h>
 
 #include <public/xen.h>
@@ -1024,6 +1025,258 @@ out:
 }
 
+static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
+                            xen_pfn_t *mask, int nr_pages)
+{
+    int rc;
+    void *mapping;
+    struct page_info *pginfo;
+    unsigned long gfn;
+    int pending_count = 0, mask_count = 0;
+
+#define __MAP(src, dst, cnt)                                  \
+    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )              \
+    {                                                         \
+        rc = -EINVAL;                                         \
+        gfn = (src)[(cnt)];                                   \
+        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);  \
+        if ( !pginfo )                                        \
+            goto err;                                         \
+        if ( !get_page_type(pginfo, PGT_writable_page) )      \
+        {                                                     \
+            put_page(pginfo);                                 \
+            goto err;                                         \
+        }                                                     \
+        mapping = __map_domain_page_global(pginfo);           \
+        if ( !mapping )                                       \
+        {                                                     \
+            put_page_and_type(pginfo);                        \
+            rc = -ENOMEM;                                     \
+            goto err;                                         \
+        }                                                     \
+        (dst)[(cnt)] = mapping;                               \
+    }
+
+    __MAP(pending, d->evtchn_pending, pending_count)
+    __MAP(mask, d->evtchn_mask, mask_count)
+#undef __MAP
+
+    rc = 0;
+
+ err:
+    return rc;
+}
+
+static void __unmap_l3_arrays(struct domain *d)
+{
+    int i;
+    unsigned long mfn;
+
+    for ( i = 0; i < EVTCHN_MAX_L3_PAGES; i++ )
+    {
+        if ( d->evtchn_pending[i] != 0 )
+        {
+            mfn = domain_page_map_to_mfn(d->evtchn_pending[i]);
+            unmap_domain_page_global(d->evtchn_pending[i]);
+            put_page_and_type(mfn_to_page(mfn));
+            d->evtchn_pending[i] = 0;
+        }
+        if ( d->evtchn_mask[i] != 0 )
+        {
+            mfn = domain_page_map_to_mfn(d->evtchn_mask[i]);
+            unmap_domain_page_global(d->evtchn_mask[i]);
+            put_page_and_type(mfn_to_page(mfn));
+            d->evtchn_mask[i] = 0;
+        }
+    }
+}
+
+static long __map_l2_selector(struct vcpu *v, unsigned long gfn,
+                              unsigned long off)
+{
+    void *mapping;
+    int rc;
+    struct page_info *page;
+    struct domain *d = v->domain;
+
+    rc = -EINVAL;   /* common errno for following operations */
+
+    /* Sanity check: L2 selector has maximum size of sizeof(unsigned
+     * long) * 8, this size is equal to the size of shared bitmap
+     * array of 2-level event channel. */
+    if ( off + sizeof(unsigned long) * 8 >= PAGE_SIZE )
+        goto out;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        goto out;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        goto out;
+    }
+
+    /* Use global mapping here, because we need to map selector for
+     * other vcpu (v != current). However this mapping is only used by
+     * v when guest is running. */
+    mapping = __map_domain_page_global(page);
+
+    if ( mapping == NULL )
+    {
+        put_page_and_type(page);
+        rc = -ENOMEM;
+        goto out;
+    }
+
+    v->evtchn_pending_sel_l2 = mapping + off;
+    rc = 0;
+
+ out:
+    return rc;
+}
+
+static void __unmap_l2_selector(struct vcpu *v)
+{
+    unsigned long mfn;
+
+    if ( v->evtchn_pending_sel_l2 )
+    {
+        mfn = domain_page_map_to_mfn(v->evtchn_pending_sel_l2);
+        unmap_domain_page_global(v->evtchn_pending_sel_l2);
+        put_page_and_type(mfn_to_page(mfn));
+        v->evtchn_pending_sel_l2 = NULL;
+    }
+}
+
+static void __evtchn_unmap_all_3level(struct domain *d)
+{
+    struct vcpu *v;
+    for_each_vcpu ( d, v )
+        __unmap_l2_selector(v);
+    __unmap_l3_arrays(d);
+}
+
+static void __evtchn_setup_bitmap_l3(struct domain *d)
+{
+    struct vcpu *v;
+
+    /* Easy way to set up the 3-level bitmap: just move the existing
+     * selector to the next level, then copy the pending array and
+     * mask array. */
+    for_each_vcpu ( d, v )
+    {
+        memcpy(&v->evtchn_pending_sel_l2[0],
+               &vcpu_info(v, evtchn_pending_sel),
+               sizeof(vcpu_info(v, evtchn_pending_sel)));
+        memset(&vcpu_info(v, evtchn_pending_sel), 0,
+               sizeof(vcpu_info(v, evtchn_pending_sel)));
+        set_bit(0, &vcpu_info(v, evtchn_pending_sel));
+    }
+
+    memcpy(d->evtchn_pending[0], &shared_info(d, evtchn_pending),
+           sizeof(shared_info(d, evtchn_pending)));
+    memcpy(d->evtchn_mask[0], &shared_info(d, evtchn_mask),
+           sizeof(shared_info(d, evtchn_mask)));
+}
+
+static long evtchn_register_3level(evtchn_register_3level_t *arg)
+{
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    int rc = 0;
+    xen_pfn_t evtchn_pending[EVTCHN_MAX_L3_PAGES];
+    xen_pfn_t evtchn_mask[EVTCHN_MAX_L3_PAGES];
+    xen_pfn_t l2sel_mfn = 0;
+    xen_pfn_t l2sel_offset = 0;
+
+    if ( d->evtchn_level == EVTCHN_3_LEVEL )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if ( arg->nr_vcpus > d->max_vcpus ||
+         arg->nr_pages > EVTCHN_MAX_L3_PAGES )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    memset(evtchn_pending, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
+    memset(evtchn_mask, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
+
+#define __COPY_ARRAY(_d, _s, _nr)                  \
+    do {                                           \
+        if ( copy_from_guest((_d), (_s), (_nr)) )  \
+        {                                          \
+            rc = -EFAULT;                          \
+            goto out;                              \
+        }                                          \
+    } while (0)
+    __COPY_ARRAY(evtchn_pending, arg->evtchn_pending, arg->nr_pages);
+    __COPY_ARRAY(evtchn_mask, arg->evtchn_mask, arg->nr_pages);
+#undef __COPY_ARRAY
+
+    rc = __map_l3_arrays(d, evtchn_pending, evtchn_mask, arg->nr_pages);
+    if ( rc )
+        goto out;
+
+    for_each_vcpu ( d, v )
+    {
+        int vcpu_id = v->vcpu_id;
+
+        if ( unlikely(copy_from_guest_offset(&l2sel_mfn, arg->l2sel_mfns,
+                                             vcpu_id, 1)) )
+        {
+            rc = -EFAULT;
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+        if ( unlikely(copy_from_guest_offset(&l2sel_offset, arg->l2sel_offsets,
+                                             vcpu_id, 1)) )
+        {
+            rc = -EFAULT;
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+        if ( (rc = __map_l2_selector(v, l2sel_mfn, l2sel_offset)) )
+        {
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+    }
+
+    __evtchn_setup_bitmap_l3(d);
+
+    d->evtchn_level = EVTCHN_3_LEVEL;
+
+    rc = 0;
+
+ out:
+    return rc;
+}
+
+static long evtchn_register_nlevel(struct evtchn_register_nlevel *reg)
+{
+    struct domain *d = current->domain;
+    int rc;
+
+    spin_lock(&d->event_lock);
+
+    switch ( reg->level )
+    {
+    case EVTCHN_3_LEVEL:
+        rc = evtchn_register_3level(&reg->u.l3);
+        break;
+    default:
+        rc = -EINVAL;
+    }
+
+    spin_unlock(&d->event_lock);
+
+    return rc;
+}
+
 long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
@@ -1132,6 +1385,18 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case EVTCHNOP_register_nlevel: {
+        struct evtchn_register_nlevel reg;
+        if ( copy_from_guest(&reg, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_register_nlevel(&reg);
+
+        /* XXX always fails this call because it is not yet completed */
+        rc = -EINVAL;
+
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
@@ -1258,6 +1523,17 @@ int evtchn_init(struct domain *d)
     return 0;
 }
 
+static void evtchn_unmap_nlevel(struct domain *d)
+{
+    switch ( d->evtchn_level )
+    {
+    case EVTCHN_3_LEVEL:
+        __evtchn_unmap_all_3level(d);
+        break;
+    default:
+        break;
+    }
+}
 
 void evtchn_destroy(struct domain *d)
 {
@@ -1286,6 +1562,8 @@ void evtchn_destroy(struct domain *d)
 
     clear_global_virq_handlers(d);
 
+    evtchn_unmap_nlevel(d);
+
     free_xenheap_page(d->evtchn);
 }
-- 
1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |  110 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 20 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c448c60..a0bd00f 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -627,10 +627,33 @@ out:
     return ret;
 }
 
+static void __check_vcpu_polling(struct vcpu *v, int port)
+{
+    int vcpuid;
+    struct domain *d = v->domain;
+
+    /* Check if some VCPU might be polling for this event. */
+    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
+        return;
+
+    /* Wake any interested (or potentially interested) pollers. */
+    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
+          vcpuid < d->max_vcpus;
+          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
+    {
+        v = d->vcpu[vcpuid];
+        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
+             test_and_clear_bit(vcpuid, d->poll_mask) )
+        {
+            v->poll_evtchn = 0;
+            vcpu_unblock(v);
+        }
+    }
+}
+
 static void evtchn_set_pending_l2(struct vcpu *v, int port)
 {
     struct domain *d = v->domain;
-    int vcpuid;
 
     /*
      * The following bit operations must happen in strict order.
@@ -649,23 +672,35 @@ static void evtchn_set_pending_l2(struct vcpu *v, int port)
         vcpu_mark_events_pending(v);
     }
 
-    /* Check if some VCPU might be polling for this event. */
-    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
-        return;
+    __check_vcpu_polling(v, port);
+}
 
-    /* Wake any interested (or potentially interested) pollers. */
-    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
-          vcpuid < d->max_vcpus;
-          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
+static void evtchn_set_pending_l3(struct vcpu *v, int port)
+{
+    struct domain *d = v->domain;
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1);
+    unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d);
+
+    /*
+     * The following bit operations must happen in strict order.
+     * NB. On x86, the atomic bit operations also act as memory barriers.
+     * There is therefore sufficiently strict ordering for this architecture --
+     * others may require explicit memory barriers.
+     */
+
+    if ( test_and_set_bit(offset, d->evtchn_pending[page_no]) )
+        return;
+
+    if ( !test_bit(offset, d->evtchn_mask[page_no]) &&
+         !test_and_set_bit(l2bit, v->evtchn_pending_sel_l2) &&
+         !test_and_set_bit(l1bit, &vcpu_info(v, evtchn_pending_sel)) )
     {
-        v = d->vcpu[vcpuid];
-        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
-             test_and_clear_bit(vcpuid, d->poll_mask) )
-        {
-            v->poll_evtchn = 0;
-            vcpu_unblock(v);
-        }
+        vcpu_mark_events_pending(v);
     }
+
+    __check_vcpu_polling(v, port);
 }
 
 static void evtchn_set_pending(struct vcpu *v, int port)
@@ -677,6 +712,9 @@ static void evtchn_set_pending(struct vcpu *v, int port)
     case EVTCHN_2_LEVEL:
         evtchn_set_pending_l2(v, port);
         break;
+    case 3:
+        evtchn_set_pending_l3(v, port);
+        break;
     default:
         BUG();
     }
@@ -981,6 +1019,37 @@ static int evtchn_unmask_l2(unsigned int port)
     return 0;
 }
 
+static int evtchn_unmask_l3(unsigned int port)
+{
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1);
+    unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( unlikely(!port_is_valid(d, port)) )
+        return -EINVAL;
+
+    v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id];
+
+    /*
+     * These operations must happen in strict order. Based on
+     * include/xen/event.h:evtchn_set_pending().
+     */
+    if ( test_and_clear_bit(offset, d->evtchn_mask[page_no]) &&
+         test_bit          (offset, d->evtchn_pending[page_no]) &&
+         !test_and_set_bit (l2bit, v->evtchn_pending_sel_l2) &&
+         !test_and_set_bit (l1bit, &vcpu_info(v, evtchn_pending_sel)) )
+    {
+        vcpu_mark_events_pending(v);
+    }
+
+    return 0;
+}
+
 int evtchn_unmask(unsigned int port)
 {
     struct domain *d = current->domain;
@@ -991,6 +1060,9 @@ int evtchn_unmask(unsigned int port)
     case EVTCHN_2_LEVEL:
         rc = evtchn_unmask_l2(port);
         break;
+    case 3:
+        rc = evtchn_unmask_l3(port);
+        break;
     default:
         BUG();
     }
@@ -1390,10 +1462,6 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&reg, arg, 1) != 0 )
             return -EFAULT;
         rc = evtchn_register_nlevel(&reg);
-
-        /* XXX always fails this call because it is not yet completed */
-        rc = -EINVAL;
-
         break;
     }
 
@@ -1602,8 +1670,10 @@ static void domain_dump_evtchn_info(struct domain *d)
     bitmap_scnlistprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
                          d->poll_mask, d->max_vcpus);
     printk("Event channel information for domain %d:\n"
+           "Using %d-level event channel\n"
            "Polling vCPUs: {%s}\n"
-           "    port [p/m]\n", d->domain_id, keyhandler_scratch);
+           "    port [p/m]\n",
+           d->domain_id, d->evtchn_level, keyhandler_scratch);
 
     spin_lock(&d->event_lock);
 
-- 
1.7.10.4
Jan Beulich
2013-Feb-04 09:00 UTC
Re: [PATCH 04/16] Move event channel macros / struct definition to proper place
>>> On 31.01.13 at 15:42, Wei Liu <wei.liu2@citrix.com> wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
>   * 1024 if a long is 32 bits; 4096 if a long is 64 bits.
>   */
> #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> +#define EVTCHNS_PER_BUCKET 128
> +#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)

These aren't part of the hypercall ABI, and hence don't belong here.
What is preventing you from putting them alongside the other
stuff you move to xen/include/xen/event.h?

Jan
Jan Beulich
2013-Feb-04 09:23 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
>>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> +                            xen_pfn_t *mask, int nr_pages)
> +{
> +    int rc;
> +    void *mapping;
> +    struct page_info *pginfo;
> +    unsigned long gfn;
> +    int pending_count = 0, mask_count = 0;
> +
> +#define __MAP(src, dst, cnt)                                  \
> +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )              \
> +    {                                                         \
> +        rc = -EINVAL;                                         \
> +        gfn = (src)[(cnt)];                                   \
> +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);  \
> +        if ( !pginfo )                                        \
> +            goto err;                                         \
> +        if ( !get_page_type(pginfo, PGT_writable_page) )      \
> +        {                                                     \
> +            put_page(pginfo);                                 \
> +            goto err;                                         \
> +        }                                                     \
> +        mapping = __map_domain_page_global(pginfo);           \
> +        if ( !mapping )                                       \
> +        {                                                     \
> +            put_page_and_type(pginfo);                        \
> +            rc = -ENOMEM;                                     \
> +            goto err;                                         \
> +        }                                                     \
> +        (dst)[(cnt)] = mapping;                               \
> +    }
> +
> +    __MAP(pending, d->evtchn_pending, pending_count)
> +    __MAP(mask, d->evtchn_mask, mask_count)
> +#undef __MAP
> +
> +    rc = 0;
> +
> + err:
> +    return rc;
> +}

So this alone already is up to 16 pages per guest, and hence a
theoretical maximum of 512k pages, i.e. 2G mapped space. The
global page mapping area, however, is only 1Gb in size on x86-64
(didn't check ARM at all)...

Which is why I said that you need to at least explain why bumping
that address range isn't necessary (i.e. if we think that we really
don't want to support the maximum number of guests allowed in
theory, and that their amount is really always going to be low
enough to also not run into resource conflicts with other users of
the interface).

> +static long evtchn_register_3level(evtchn_register_3level_t *arg)
> +{
> +    struct domain *d = current->domain;
> +    struct vcpu *v;
> +    int rc = 0;
> +    xen_pfn_t evtchn_pending[EVTCHN_MAX_L3_PAGES];
> +    xen_pfn_t evtchn_mask[EVTCHN_MAX_L3_PAGES];
> +    xen_pfn_t l2sel_mfn = 0;
> +    xen_pfn_t l2sel_offset = 0;
> +
> +    if ( d->evtchn_level == EVTCHN_3_LEVEL )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if ( arg->nr_vcpus > d->max_vcpus ||
> +         arg->nr_pages > EVTCHN_MAX_L3_PAGES )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    memset(evtchn_pending, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
> +    memset(evtchn_mask, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
> +
> +#define __COPY_ARRAY(_d, _s, _nr)                  \
> +    do {                                           \
> +        if ( copy_from_guest((_d), (_s), (_nr)) )  \
> +        {                                          \
> +            rc = -EFAULT;                          \
> +            goto out;                              \
> +        }                                          \
> +    } while (0)
> +    __COPY_ARRAY(evtchn_pending, arg->evtchn_pending, arg->nr_pages);
> +    __COPY_ARRAY(evtchn_mask, arg->evtchn_mask, arg->nr_pages);
> +#undef __COPY_ARRAY

I don't think this really benefits from using the __COPY_ARRAY()
macro.

Jan
Wei Liu
2013-Feb-04 10:25 UTC
Re: [PATCH 04/16] Move event channel macros / struct definition to proper place
On Mon, 2013-02-04 at 09:00 +0000, Jan Beulich wrote:
> >>> On 31.01.13 at 15:42, Wei Liu <wei.liu2@citrix.com> wrote:
> > --- a/xen/include/public/xen.h
> > +++ b/xen/include/public/xen.h
> > @@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
> >   * 1024 if a long is 32 bits; 4096 if a long is 64 bits.
> >   */
> > #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> > +#define EVTCHNS_PER_BUCKET 128
> > +#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
>
> These aren't part of the hypercall ABI, and hence don't belong here.
> What is preventing you from putting them alongside the other
> stuff you move to xen/include/xen/event.h?
>

That would cause circular inclusion and break the build:

a) sched.h: struct domain references NR_EVTCHN_BUCKETS
b) event.h: references sched.h

Now a second thought comes to me: a clean fix would be to rework the
allocation of evtchn in struct domain first, then move those macros /
definitions to the proper place.

Wei.
Ian Campbell
2013-Feb-04 11:20 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
> >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> > +                            xen_pfn_t *mask, int nr_pages)
> > +{
> > +    int rc;
> > +    void *mapping;
> > +    struct page_info *pginfo;
> > +    unsigned long gfn;
> > +    int pending_count = 0, mask_count = 0;
> > +
> > +#define __MAP(src, dst, cnt)                                  \
> > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )              \
> > +    {                                                         \
> > +        rc = -EINVAL;                                         \
> > +        gfn = (src)[(cnt)];                                   \
> > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);  \
> > +        if ( !pginfo )                                        \
> > +            goto err;                                         \
> > +        if ( !get_page_type(pginfo, PGT_writable_page) )      \
> > +        {                                                     \
> > +            put_page(pginfo);                                 \
> > +            goto err;                                         \
> > +        }                                                     \
> > +        mapping = __map_domain_page_global(pginfo);           \
> > +        if ( !mapping )                                       \
> > +        {                                                     \
> > +            put_page_and_type(pginfo);                        \
> > +            rc = -ENOMEM;                                     \
> > +            goto err;                                         \
> > +        }                                                     \
> > +        (dst)[(cnt)] = mapping;                               \
> > +    }
> > +
> > +    __MAP(pending, d->evtchn_pending, pending_count)
> > +    __MAP(mask, d->evtchn_mask, mask_count)
> > +#undef __MAP
> > +
> > +    rc = 0;
> > +
> > + err:
> > +    return rc;
> > +}
>
> So this alone already is up to 16 pages per guest, and hence a
> theoretical maximum of 512k pages, i.e. 2G mapped space.

That's given a theoretical 32k guests? Ouch. It also ignores the need
for other global mappings.

On the flip side only a minority of domains are likely to be using the
extended scheme, and I expect even those which are would not be using
all 16 pages, so maybe we can fault them in on demand as we bind/unbind
evtchns.

Where does 16 come from? How many pages do we end up with at each level
in the new scheme?

Some levels of the trie are per-VCPU, did you account for that already
in the 2GB?

> The
> global page mapping area, however, is only 1Gb in size on x86-64
> (didn't check ARM at all)...

There isn't currently a global page mapping area on 32-bit ARM (I
suppose we have avoided them somehow...) but obviously 2G would be a
problem in a 4GB address space.

On ARM we currently have 2G for domheap mappings which I suppose we
would split if we needed a global page map.

These need to be global so we can deliver evtchns to VCPUs which aren't
running, right? I suppose mapping on demand (other than for a running
VCPU) would be prohibitively expensive.

Could we make this space per-VCPU (or per-domain) by saying that a
domain maps its own evtchn pages plus the required pages from other
domains with which an evtchn is bound? Might be tricky to arrange
though, especially with the per-VCPU pages and affinity changes?

Ian.
Jan Beulich
2013-Feb-04 11:29 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
>>> On 04.02.13 at 12:20, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
>> >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
>> > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
>> > +                            xen_pfn_t *mask, int nr_pages)
>> > +{
>> > +    int rc;
>> > +    void *mapping;
>> > +    struct page_info *pginfo;
>> > +    unsigned long gfn;
>> > +    int pending_count = 0, mask_count = 0;
>> > +
>> > +#define __MAP(src, dst, cnt)                                  \
>> > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )              \
>> > +    {                                                         \
>> > +        rc = -EINVAL;                                         \
>> > +        gfn = (src)[(cnt)];                                   \
>> > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);  \
>> > +        if ( !pginfo )                                        \
>> > +            goto err;                                         \
>> > +        if ( !get_page_type(pginfo, PGT_writable_page) )      \
>> > +        {                                                     \
>> > +            put_page(pginfo);                                 \
>> > +            goto err;                                         \
>> > +        }                                                     \
>> > +        mapping = __map_domain_page_global(pginfo);           \
>> > +        if ( !mapping )                                       \
>> > +        {                                                     \
>> > +            put_page_and_type(pginfo);                        \
>> > +            rc = -ENOMEM;                                     \
>> > +            goto err;                                         \
>> > +        }                                                     \
>> > +        (dst)[(cnt)] = mapping;                               \
>> > +    }
>> > +
>> > +    __MAP(pending, d->evtchn_pending, pending_count)
>> > +    __MAP(mask, d->evtchn_mask, mask_count)
>> > +#undef __MAP
>> > +
>> > +    rc = 0;
>> > +
>> > + err:
>> > +    return rc;
>> > +}
>>
>> So this alone already is up to 16 pages per guest, and hence a
>> theoretical maximum of 512k pages, i.e. 2G mapped space.
>
> That's given a theoretical 32k guests? Ouch. It also ignores the need
> for other global mappings.
>
> On the flip side only a minority of domains are likely to be using the
> extended scheme, and I expect even those which are would not be using
> all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> evtchns.
>
> Where does 16 come from? How many pages do we end up with at each level
> in the new scheme?

Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
got two of them (pending and mask bits).

> Some levels of the trie are per-VCPU, did you account for that already
> in the 2GB?

No, I didn't, as it would only increase the number, and make
the math less clear.

>> The
>> global page mapping area, however, is only 1Gb in size on x86-64
>> (didn't check ARM at all)...
>
> There isn't currently a global page mapping area on 32-bit ARM (I
> suppose we have avoided them somehow...) but obviously 2G would be a
> problem in a 4GB address space.
>
> On ARM we currently have 2G for domheap mappings which I suppose we
> would split if we needed a global page map.
>
> These need to be global so we can deliver evtchns to VCPUs which aren't
> running, right? I suppose mapping on demand (other than for a running
> VCPU) would be prohibitively expensive.

Likely, especially for high rate ones.

> Could we make this space per-VCPU (or per-domain) by saying that a
> domain maps its own evtchn pages plus the required pages from other
> domains with which an evtchn is bound? Might be tricky to arrange
> though, especially with the per-VCPU pages and affinity changes?

Even without that trickiness it wouldn't work I'm afraid: In various
cases we need to be able to raise the events out of context (timer,
IRQs from passed through devices).

Jan
Wei Liu
2013-Feb-04 11:37 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 11:20 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
> > >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> > > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> > > +                            xen_pfn_t *mask, int nr_pages)
> > > +{
> > > +    int rc;
> > > +    void *mapping;
> > > +    struct page_info *pginfo;
> > > +    unsigned long gfn;
> > > +    int pending_count = 0, mask_count = 0;
> > > +
> > > +#define __MAP(src, dst, cnt)                                  \
> > > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )              \
> > > +    {                                                         \
> > > +        rc = -EINVAL;                                         \
> > > +        gfn = (src)[(cnt)];                                   \
> > > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);  \
> > > +        if ( !pginfo )                                        \
> > > +            goto err;                                         \
> > > +        if ( !get_page_type(pginfo, PGT_writable_page) )      \
> > > +        {                                                     \
> > > +            put_page(pginfo);                                 \
> > > +            goto err;                                         \
> > > +        }                                                     \
> > > +        mapping = __map_domain_page_global(pginfo);           \
> > > +        if ( !mapping )                                       \
> > > +        {                                                     \
> > > +            put_page_and_type(pginfo);                        \
> > > +            rc = -ENOMEM;                                     \
> > > +            goto err;                                         \
> > > +        }                                                     \
> > > +        (dst)[(cnt)] = mapping;                               \
> > > +    }
> > > +
> > > +    __MAP(pending, d->evtchn_pending, pending_count)
> > > +    __MAP(mask, d->evtchn_mask, mask_count)
> > > +#undef __MAP
> > > +
> > > +    rc = 0;
> > > +
> > > + err:
> > > +    return rc;
> > > +}
> >
> > So this alone already is up to 16 pages per guest, and hence a
> > theoretical maximum of 512k pages, i.e. 2G mapped space.
>
> That's given a theoretical 32k guests? Ouch. It also ignores the need
> for other global mappings.
>
> On the flip side only a minority of domains are likely to be using the
> extended scheme, and I expect even those which are would not be using
> all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> evtchns.
>

This is doable. However I'm afraid checking for mapping validity in the
hot path could bring in a performance penalty.

> Where does 16 come from? How many pages do we end up with at each level
> in the new scheme?
>

For 64-bit guests, 8 pages each for evtchn_pending / evtchn_mask. And
there are also other global mappings for per-vcpu L2 selectors - there
is no API for a vcpu to manipulate another vcpu's mapping.

So the worst case would be that there could be lots of global mappings
if a domain with hundreds of cpus utilizes 3-level event channels.

> Some levels of the trie are per-VCPU, did you account for that already
> in the 2GB?
>
> > The
> > global page mapping area, however, is only 1Gb in size on x86-64
> > (didn't check ARM at all)...
>
> There isn't currently a global page mapping area on 32-bit ARM (I
> suppose we have avoided them somehow...) but obviously 2G would be a
> problem in a 4GB address space.
>
> On ARM we currently have 2G for domheap mappings which I suppose we
> would split if we needed a global page map.
>
> These need to be global so we can deliver evtchns to VCPUs which aren't
> running, right? I suppose mapping on demand (other than for a running
> VCPU) would be prohibitively expensive.
>

Those are the leaf mappings which are supposed to be global.

> Could we make this space per-VCPU (or per-domain) by saying that a
> domain maps its own evtchn pages plus the required pages from other
> domains with which an evtchn is bound? Might be tricky to arrange
> though, especially with the per-VCPU pages and affinity changes?
>

Really tricky... Also a potential performance penalty.

Wei.
Wei Liu
2013-Feb-04 13:45 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote:
> >>
> >> So this alone already is up to 16 pages per guest, and hence a
> >> theoretical maximum of 512k pages, i.e. 2G mapped space.
> >
> > That's given a theoretical 32k guests? Ouch. It also ignores the need
> > for other global mappings.
> >
> > On the flip side only a minority of domains are likely to be using the
> > extended scheme, and I expect even those which are would not be using
> > all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> > evtchns.
> >
> > Where does 16 come from? How many pages do we end up with at each level
> > in the new scheme?
>
> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
> got two of them (pending and mask bits).
>
> > Some levels of the trie are per-VCPU, did you account for that already
> > in the 2GB?
>
> No, I didn't, as it would only increase the number, and make
> the math less clear.
>
> >> The
> >> global page mapping area, however, is only 1Gb in size on x86-64
> >> (didn't check ARM at all)...
> >
> > There isn't currently a global page mapping area on 32-bit ARM (I
> > suppose we have avoided them somehow...) but obviously 2G would be a
> > problem in a 4GB address space.
> >
> > On ARM we currently have 2G for domheap mappings which I suppose we
> > would split if we needed a global page map.
> >
> > These need to be global so we can deliver evtchns to VCPUs which aren't
> > running, right? I suppose mapping on demand (other than for a running
> > VCPU) would be prohibitively expensive.
>
> Likely, especially for high rate ones.
>
> > Could we make this space per-VCPU (or per-domain) by saying that a
> > domain maps its own evtchn pages plus the required pages from other
> > domains with which an evtchn is bound? Might be tricky to arrange
> > though, especially with the per-VCPU pages and affinity changes?
>
> Even without that trickiness it wouldn't work I'm afraid: In various
> cases we need to be able to raise the events out of context (timer,
> IRQs from passed through devices).
>
> Jan

So I come up with the following comment on the 3-level registration
interface (not specific to the __map_l3_array() function).

/*
 * Note to 3-level event channel users:
 * Only enable 3-level event channels for Dom0 or driver domains, because
 * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
 * area in Xen.
 */

Wei.
Ian Campbell
2013-Feb-04 13:47 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:

> /*
>  * Note to 3-level event channel users:
>  * Only enable 3-level event channels for Dom0 or driver domains, because
>  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
>  * area in Xen.
>  */

Can this be enforced by the system administrator?

Ian.
Wei Liu
2013-Feb-04 13:51 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
>
> > /*
> >  * Note to 3-level event channel users:
> >  * Only enable 3-level event channels for Dom0 or driver domains, because
> >  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
> >  * area in Xen.
> >  */
>
> Can this be enforced by the system administrator?
>

Knowing a domain is Dom0 is easy, but is it possible to know a domain is
a driver domain?

Wei.

> Ian.
>
Ian Campbell
2013-Feb-04 13:54 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> >
> > > /*
> > >  * Note to 3-level event channel users:
> > >  * Only enable 3-level event channels for Dom0 or driver domains, because
> > >  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
> > >  * area in Xen.
> > >  */
> >
> > Can this be enforced by the system administrator?
> >
>
> Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> a driver domain?

The admin knows; at the very least they need to have a manual override
(or maybe this should even default to off for non-dom0).

Ian.
Wei Liu
2013-Feb-04 13:59 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > >
> > > > /*
> > > >  * Note to 3-level event channel users:
> > > >  * Only enable 3-level event channels for Dom0 or driver domains, because
> > > >  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
> > > >  * area in Xen.
> > > >  */
> > >
> > > Can this be enforced by the system administrator?
> > >
> >
> > Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> > a driver domain?
>
> The admin knows; at the very least they need to have a manual override
> (or maybe this should even default to off for non-dom0).
>

Do you mean maintaining a white list in Xen or adding options in the
guest kernel? I already have that in my kernel patch series - only
enable 3-level event channels for Dom0. And I used to propose a kernel
option for overriding this, but Konrad didn't like it.

Wei.

> Ian.
>
Jan Beulich
2013-Feb-04 14:06 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
>>> On 04.02.13 at 14:45, Wei Liu <wei.liu2@citrix.com> wrote: > On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote: >> >> >> >> So this alone already is up to 16 pages per guest, and hence a >> >> theoretical maximum of 512k pages, i.e. 2G mapped space. >> > >> > That''s given a theoretical 32k guests? Ouch. It also ignores the need >> > for other global mappings. >> > >> > on the flip side only a minority of domains are likely to be using the >> > extended scheme, and I expect even those which are would not be using >> > all 16 pages, so maybe we can fault them in on demand as we bind/unbind >> > evtchns. >> > >> > Where does 16 come from? How many pages to we end up with at each level >> > in the new scheme? >> >> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we''ve >> got two of them (pending and mask bits). >> >> > Some levels of the trie are per-VCPU, did you account for that already >> > in the 2GB? >> >> No, I didn''t, as it would only increase the number, and make >> the math less clear. >> >> >> The >> >> global page mapping area, however, is only 1Gb in size on x86-64 >> >> (didn''t check ARM at all)... >> > >> > There isn''t currently a global page mapping area on 32-bit ARM (I >> > suppose we have avoided them somehow...) but obviously 2G would be a >> > problem in a 4GB address space. >> > >> > On ARM we currently have 2G for domheap mappings which I suppose we >> > would split if we needed a global page map >> > >> > These need to be global so we can deliver evtchns to VCPUs which aren''t >> > running, right? I suppose mapping on demand (other than for a running >> > VCPU) would be prohibitively expensive. >> >> Likely, especially for high rate ones. >> >> > Could we make this space per-VCPU (or per-domain) by saying that a >> > domain maps its own evtchn pages plus the required pages from other >> > domains with which an evtchn is bound? Might be tricky to arrange >> > though, especially with the per-VCPU pages and affinity changes? 
>>
>> Even without that trickiness it wouldn't work, I'm afraid: in various
>> cases we need to be able to raise the events out of context (timer,
>> IRQs from passed-through devices).
>>
>> Jan
>
> So I came up with the following comment on the 3-level registration
> interface (not specific to the __map_l3_array() function):
>
> /*
>  * Note to 3-level event channel users:
>  * Only enable 3-level event channels for Dom0 or driver domains, because
>  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
>  * area in Xen.
>  */

So you intended to fail the request for other guests? That's fine
with me in principle, but how do you tell a driver domain from an
"ordinary" one?

Jan
Ian Campbell
2013-Feb-04 14:22 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 13:59 +0000, Wei Liu wrote:
> On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> > On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > > >
> > > > > /*
> > > > >  * Note to 3-level event channel users:
> > > > >  * Only enable 3-level event channels for Dom0 or driver domains,
> > > > >  * because 3-level event channels consume (16 + nr_vcpus) pages of
> > > > >  * global mapping area in Xen.
> > > > >  */
> > > >
> > > > Can this be enforced by the system administrator?
> > >
> > > Knowing a domain is Dom0 is easy, but is it possible to know a domain
> > > is a driver domain?
> >
> > The admin knows; at the very least they need to have a manual override
> > (or maybe this should even default off for non-dom0).
>
> Do you mean maintaining a white list in Xen or adding options in the
> guest kernel?

I mean that it should be a property of the domain (i.e. a flag in struct
domain or whatever) whether they can use 3 levels, and this should be
settable by the host administrator when they build the guest.

> I already have that in my kernel patch series - only enable
> 3-level event channels for Dom0.

Imagine I am a malicious user of your cloud service: I could potentially
create dozens of guests using kernels which forcibly try to use 3-level
evtchns and suck up loads of host RAM.

Ian.
Wei Liu
2013-Feb-04 14:24 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 14:22 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:59 +0000, Wei Liu wrote:
> > On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> > > On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > > > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > > > >
> > > > > > /*
> > > > > >  * Note to 3-level event channel users:
> > > > > >  * Only enable 3-level event channels for Dom0 or driver
> > > > > >  * domains, because 3-level event channels consume
> > > > > >  * (16 + nr_vcpus) pages of global mapping area in Xen.
> > > > > >  */
> > > > >
> > > > > Can this be enforced by the system administrator?
> > > >
> > > > Knowing a domain is Dom0 is easy, but is it possible to know a
> > > > domain is a driver domain?
> > >
> > > The admin knows; at the very least they need to have a manual override
> > > (or maybe this should even default off for non-dom0).
> >
> > Do you mean maintaining a white list in Xen or adding options in the
> > guest kernel?
>
> I mean that it should be a property of the domain (i.e. a flag in struct
> domain or whatever) whether they can use 3 levels, and this should be
> settable by the host administrator when they build the guest.

I'm looking at this now, since I realized right after I sent my email
that we cannot trust users at all...

> > I already have that in my kernel patch series - only enable
> > 3-level event channels for Dom0.
>
> Imagine I am a malicious user of your cloud service: I could potentially
> create dozens of guests using kernels which forcibly try to use 3-level
> evtchns and suck up loads of host RAM.

Right.

Wei.

> Ian.
Wei Liu
2013-Feb-04 14:36 UTC
Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-02-04 at 14:06 +0000, Jan Beulich wrote:
> >>> On 04.02.13 at 14:45, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote:
> >> >>
> >> >> So this alone already is up to 16 pages per guest, and hence a
> >> >> theoretical maximum of 512k pages, i.e. 2G mapped space.
> >> >
> >> > That's given a theoretical 32k guests? Ouch. It also ignores the
> >> > need for other global mappings.
> >> >
> >> > On the flip side, only a minority of domains are likely to be using
> >> > the extended scheme, and I expect even those which are would not be
> >> > using all 16 pages, so maybe we can fault them in on demand as we
> >> > bind/unbind evtchns.
> >> >
> >> > Where does 16 come from? How many pages do we end up with at each
> >> > level in the new scheme?
> >>
> >> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
> >> got two of them (pending and mask bits).
> >>
> >> > Some levels of the trie are per-VCPU; did you account for that
> >> > already in the 2GB?
> >>
> >> No, I didn't, as it would only increase the number and make
> >> the math less clear.
> >>
> >> >> The
> >> >> global page mapping area, however, is only 1Gb in size on x86-64
> >> >> (didn't check ARM at all)...
> >> >
> >> > There isn't currently a global page mapping area on 32-bit ARM (I
> >> > suppose we have avoided them somehow...) but obviously 2G would be
> >> > a problem in a 4GB address space.
> >> >
> >> > On ARM we currently have 2G for domheap mappings, which I suppose
> >> > we would split if we needed a global page map.
> >> >
> >> > These need to be global so we can deliver evtchns to VCPUs which
> >> > aren't running, right? I suppose mapping on demand (other than for
> >> > a running VCPU) would be prohibitively expensive.
> >>
> >> Likely, especially for high-rate ones.
> >>
> >> > Could we make this space per-VCPU (or per-domain) by saying that a
> >> > domain maps its own evtchn pages plus the required pages from other
> >> > domains with which an evtchn is bound? Might be tricky to arrange,
> >> > though, especially with the per-VCPU pages and affinity changes?
> >>
> >> Even without that trickiness it wouldn't work, I'm afraid: in various
> >> cases we need to be able to raise the events out of context (timer,
> >> IRQs from passed-through devices).
> >>
> >> Jan
> >
> > So I came up with the following comment on the 3-level registration
> > interface (not specific to the __map_l3_array() function):
> >
> > /*
> >  * Note to 3-level event channel users:
> >  * Only enable 3-level event channels for Dom0 or driver domains,
> >  * because 3-level event channels consume (16 + nr_vcpus) pages of
> >  * global mapping area in Xen.
> >  */
>
> So you intended to fail the request for other guests? That's fine
> with me in principle, but how do you tell a driver domain from an
> "ordinary" one?

I can't at the moment. I'm investigating adding a flag in the domain
creation process.

Wei.

> Jan