This is the complete Linux guest-side implementation of the FIFO-based event channel ABI described in this design document: http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

Refer also to the corresponding Xen series.

Note that __startup_pirq() was never ideal: a failure when binding a PIRQ to an event channel cannot be propagated to the calling driver, since the irq_startup callback cannot return an error. This series adds another (very unlikely) failure point to __startup_pirq(). I have a half-baked patch series to refactor when PIRQs are set up and torn down, but it won't be ready for a while and I do not think it is a prerequisite for applying the FIFO-based ABI series.

Patch 1 is an obvious refactoring of common code.

Patches 2-6 prepare for supporting multiple ABIs.

Patch 7 adds the low-level evtchn_ops hooks.

Patches 8-9 add an additional hook for ABI-specific per-port setup (used for expanding the event array as more events are bound).

Patch 10 allows many more event channels to be supported by altering how the event channel to irq map is allocated. Note that other factors limit the number of supported IRQs (nr_irqs is 8192 + 64 * NR_CPUS).

Patch 11 is some trivial refactoring.

Patch 12 removes a limit on the number of event channels supported by the xen-evtchn driver.

Patch 13 adds the hypervisor ABI.

Patches 14-15 set the priority of VIRQ_TIMER events to the highest.

Patch 16 adds the FIFO-based ABI implementation.

Changes in v6:
- nlevel -> 2l to match naming in Xen.

Changes in v5 (v4 not posted):
- Set priority of VIRQ_TIMER.
- Remove NR_EVENT_CHANNELS limit from xen-evtchn driver.
- Licence drivers/xen/events/fifo.c under a dual MIT/GPLv2.

Changes in v3:
- Support suspend/resume by reinitializing the control blocks on resume.
- Only init control block if one does not exist (should fix CPU hotplug).
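For orientation before the patches: the design document above represents each event channel as a single 32-bit word in a guest-mapped event array, combining the pending/masked/linked state with a link to the next event on one of the per-VCPU priority queues. Below is a minimal sketch of that layout, with bit positions as published in the public Xen FIFO ABI headers; it is illustrative only and not part of this series.

	#include <stdint.h>

	typedef uint32_t event_word_t;

	/* Control bits in the top of each event word. */
	#define EVTCHN_FIFO_PENDING	31	/* event is pending */
	#define EVTCHN_FIFO_MASKED	30	/* event is masked */
	#define EVTCHN_FIFO_LINKED	29	/* event is on a queue */
	#define EVTCHN_FIFO_BUSY	28	/* link field is being changed */

	/* The low 17 bits link to the next port in the queue, giving a
	 * maximum of 2^17 = 131072 event channels. */
	#define EVTCHN_FIFO_LINK_BITS	17
	#define EVTCHN_FIFO_LINK_MASK	((1 << EVTCHN_FIFO_LINK_BITS) - 1)

	static inline int fifo_event_pending(event_word_t w)
	{
		return (w >> EVTCHN_FIFO_PENDING) & 1;
	}

	static inline unsigned fifo_event_link(event_word_t w)
	{
		return w & EVTCHN_FIFO_LINK_MASK;
	}

The 17-bit link field is what lifts the limit well past the 2-level ABI's NR_EVENT_CHANNELS (1024 on 32-bit guests, 4096 on 64-bit).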
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 01/16] xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn()
From: David Vrabel <david.vrabel@citrix.com>

These two functions do the same thing with different parameters; put
the common bits in retrigger_evtchn().

This changes the return value of resend_irq_on_evtchn() but the only
caller (in arch/ia64/xen/irq_xen.c) ignored the return value so this
is fine.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c |   27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 4035e83..ddcdbb5 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1558,13 +1558,13 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
 	return rebind_irq_to_cpu(data->irq, tcpu);
 }
 
-int resend_irq_on_evtchn(unsigned int irq)
+static int retrigger_evtchn(int evtchn)
 {
-	int masked, evtchn = evtchn_from_irq(irq);
+	int masked;
 	struct shared_info *s = HYPERVISOR_shared_info;
 
 	if (!VALID_EVTCHN(evtchn))
-		return 1;
+		return 0;
 
 	masked = sync_test_and_set_bit(evtchn, BM(s->evtchn_mask));
 	sync_set_bit(evtchn, BM(s->evtchn_pending));
@@ -1574,6 +1574,11 @@ int resend_irq_on_evtchn(unsigned int irq)
 	return 1;
 }
 
+int resend_irq_on_evtchn(unsigned int irq)
+{
+	return retrigger_evtchn(evtchn_from_irq(irq));
+}
+
 static void enable_dynirq(struct irq_data *data)
 {
 	int evtchn = evtchn_from_irq(data->irq);
@@ -1608,21 +1613,7 @@ static void mask_ack_dynirq(struct irq_data *data)
 
 static int retrigger_dynirq(struct irq_data *data)
 {
-	int evtchn = evtchn_from_irq(data->irq);
-	struct shared_info *sh = HYPERVISOR_shared_info;
-	int ret = 0;
-
-	if (VALID_EVTCHN(evtchn)) {
-		int masked;
-
-		masked = sync_test_and_set_bit(evtchn, BM(sh->evtchn_mask));
-		sync_set_bit(evtchn, BM(sh->evtchn_pending));
-		if (!masked)
-			unmask_evtchn(evtchn);
-		ret = 1;
-	}
-
-	return ret;
+	return retrigger_evtchn(evtchn_from_irq(data->irq));
 }
 
 static void restore_pirqs(void)
--
1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 02/16] xen/events: remove unnecessary init_evtchn_cpu_bindings()
From: David Vrabel <david.vrabel@citrix.com>

Because the guest-side binding of an event to a VCPU (i.e., setting
the local per-cpu masks) is always explicitly done after an event
channel is bound to a port, there is no need to initialize all
possible events as bound to VCPU 0 at start of day or after a resume.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/events.c |   22 ----------------------
 1 files changed, 0 insertions(+), 22 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index ddcdbb5..1e2c74b 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -334,24 +334,6 @@ static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
 	info_for_irq(irq)->cpu = cpu;
 }
 
-static void init_evtchn_cpu_bindings(void)
-{
-	int i;
-#ifdef CONFIG_SMP
-	struct irq_info *info;
-
-	/* By default all event channels notify CPU#0. */
-	list_for_each_entry(info, &xen_irq_list_head, list) {
-		struct irq_desc *desc = irq_to_desc(info->irq);
-		cpumask_copy(desc->irq_data.affinity, cpumask_of(0));
-	}
-#endif
-
-	for_each_possible_cpu(i)
-		memset(per_cpu(cpu_evtchn_mask, i),
-		       (i == 0) ? ~0 : 0, NR_EVENT_CHANNELS/8);
-}
-
 static inline void clear_evtchn(int port)
 {
 	struct shared_info *s = HYPERVISOR_shared_info;
@@ -1778,8 +1760,6 @@ void xen_irq_resume(void)
 	unsigned int cpu, evtchn;
 	struct irq_info *info;
 
-	init_evtchn_cpu_bindings();
-
 	/* New event-channel space is not 'live' yet. */
 	for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++)
 		mask_evtchn(evtchn);
@@ -1890,8 +1870,6 @@ void __init xen_init_IRQ(void)
 	for (i = 0; i < NR_EVENT_CHANNELS; i++)
 		evtchn_to_irq[i] = -1;
 
-	init_evtchn_cpu_bindings();
-
 	/* No event channels are 'live' right now. */
 	for (i = 0; i < NR_EVENT_CHANNELS; i++)
 		mask_evtchn(i);
--
1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 03/16] xen/events: introduce test_and_set_mask()
From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 1e2c74b..359e983 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -352,6 +352,12 @@ static inline int test_evtchn(int port)
 	return sync_test_bit(port, BM(&s->evtchn_pending[0]));
 }
 
+static inline int test_and_set_mask(int port)
+{
+	struct shared_info *s = HYPERVISOR_shared_info;
+	return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0]));
+}
+
 
 /**
  * notify_remote_via_irq - send event to remote end of event channel via irq
@@ -1493,7 +1499,6 @@ void rebind_evtchn_irq(int evtchn, int irq)
 /* Rebind an evtchn so that it gets delivered to a specific cpu */
 static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
 {
-	struct shared_info *s = HYPERVISOR_shared_info;
 	struct evtchn_bind_vcpu bind_vcpu;
 	int evtchn = evtchn_from_irq(irq);
 	int masked;
@@ -1516,7 +1521,7 @@ static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
 	 * Mask the event while changing the VCPU binding to prevent
 	 * it being delivered on an unexpected VCPU.
 	 */
-	masked = sync_test_and_set_bit(evtchn, BM(s->evtchn_mask));
+	masked = test_and_set_mask(evtchn);
 
 	/*
 	 * If this fails, it usually just indicates that we're dealing with a
@@ -1548,7 +1553,7 @@ static int retrigger_evtchn(int evtchn)
 	if (!VALID_EVTCHN(evtchn))
 		return 0;
 
-	masked = sync_test_and_set_bit(evtchn, BM(s->evtchn_mask));
+	masked = test_and_set_mask(evtchn);
 	sync_set_bit(evtchn, BM(s->evtchn_pending));
 	if (!masked)
 		unmask_evtchn(evtchn);
--
1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 04/16] xen/events: replace raw bit ops with functions
From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 359e983..fec5da4 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1548,13 +1548,12 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
 
 static int retrigger_evtchn(int evtchn)
 {
 	int masked;
-	struct shared_info *s = HYPERVISOR_shared_info;
 
 	if (!VALID_EVTCHN(evtchn))
 		return 0;
 
 	masked = test_and_set_mask(evtchn);
-	sync_set_bit(evtchn, BM(s->evtchn_pending));
+	set_evtchn(evtchn);
 	if (!masked)
 		unmask_evtchn(evtchn);
--
1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 05/16] xen/events: move drivers/xen/events.c into drivers/xen/events/
From: David Vrabel <david.vrabel@citrix.com>

events.c will be split into multiple files so move it into its own
directory.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/Makefile              |    3 ++-
 drivers/xen/events/Makefile       |    1 +
 drivers/xen/{ => events}/events.c |    0
 3 files changed, 3 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/events/Makefile
 rename drivers/xen/{ => events}/events.c (100%)

diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 14fe79d..d75c811 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -2,7 +2,8 @@ ifeq ($(filter y, $(CONFIG_ARM) $(CONFIG_ARM64)),)
 obj-$(CONFIG_HOTPLUG_CPU)	+= cpu_hotplug.o
 endif
 obj-$(CONFIG_X86)		+= fallback.o
-obj-y	+= grant-table.o features.o events.o balloon.o manage.o
+obj-y	+= grant-table.o features.o balloon.o manage.o
+obj-y	+= events/
 obj-y	+= xenbus/
 
 nostackp := $(call cc-option, -fno-stack-protector)
diff --git a/drivers/xen/events/Makefile b/drivers/xen/events/Makefile
new file mode 100644
index 0000000..86fd7c1
--- /dev/null
+++ b/drivers/xen/events/Makefile
@@ -0,0 +1 @@
+obj-y += events.o
diff --git a/drivers/xen/events.c b/drivers/xen/events/events.c
similarity index 100%
rename from drivers/xen/events.c
rename to drivers/xen/events/events.c
--
1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 06/16] xen/events: move 2-level specific code into its own file
From: David Vrabel <david.vrabel@citrix.com> In preparation for alternative event channel ABIs, move all the functions accessing the shared data structures into their own file. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/events/Makefile | 1 + drivers/xen/events/events.c | 372 +--------------------------------- drivers/xen/events/events_2l.c | 322 +++++++++++++++++++++++++++++ drivers/xen/events/events_internal.h | 74 +++++++ 4 files changed, 408 insertions(+), 361 deletions(-) create mode 100644 drivers/xen/events/events_2l.c create mode 100644 drivers/xen/events/events_internal.h diff --git a/drivers/xen/events/Makefile b/drivers/xen/events/Makefile index 86fd7c1..3bed6e0 100644 --- a/drivers/xen/events/Makefile +++ b/drivers/xen/events/Makefile @@ -1 +1,2 @@ obj-y += events.o +obj-y += events_2l.o diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index fec5da4..2700c12 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -59,6 +59,8 @@ #include <xen/interface/vcpu.h> #include <asm/hw_irq.h> +#include "events_internal.h" + /* * This lock protects updates to the following mapping and reference-count * arrays. The lock does not need to be acquired to read the mapping tables. @@ -73,72 +75,12 @@ static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1}; /* IRQ <-> IPI mapping */ static DEFINE_PER_CPU(int [XEN_NR_IPIS], ipi_to_irq) = {[0 ... XEN_NR_IPIS-1] = -1}; -/* Interrupt types. */ -enum xen_irq_type { - IRQT_UNBOUND = 0, - IRQT_PIRQ, - IRQT_VIRQ, - IRQT_IPI, - IRQT_EVTCHN -}; - -/* - * Packed IRQ information: - * type - enum xen_irq_type - * event channel - irq->event channel mapping - * cpu - cpu this event channel is bound to - * index - type-specific information: - * PIRQ - physical IRQ, GSI, flags, and owner domain - * VIRQ - virq number - * IPI - IPI vector - * EVTCHN - - */ -struct irq_info { - struct list_head list; - int refcnt; - enum xen_irq_type type; /* type */ - unsigned irq; - unsigned short evtchn; /* event channel */ - unsigned short cpu; /* cpu bound */ - - union { - unsigned short virq; - enum ipi_vector ipi; - struct { - unsigned short pirq; - unsigned short gsi; - unsigned char flags; - uint16_t domid; - } pirq; - } u; -}; -#define PIRQ_NEEDS_EOI (1 << 0) -#define PIRQ_SHAREABLE (1 << 1) - -static int *evtchn_to_irq; +int *evtchn_to_irq; #ifdef CONFIG_X86 static unsigned long *pirq_eoi_map; #endif static bool (*pirq_needs_eoi)(unsigned irq); -/* - * Note sizeof(xen_ulong_t) can be more than sizeof(unsigned long). Be - * careful to only use bitops which allow for this (e.g - * test_bit/find_first_bit and friends but not __ffs) and to pass - * BITS_PER_EVTCHN_WORD as the bitmask length. - */ -#define BITS_PER_EVTCHN_WORD (sizeof(xen_ulong_t)*8) -/* - * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t - * array. Primarily to avoid long lines (hence the terse name). - */ -#define BM(x) (unsigned long *)(x) -/* Find the first set bit in a evtchn mask */ -#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_EVTCHN_WORD) - -static DEFINE_PER_CPU(xen_ulong_t [NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD], - cpu_evtchn_mask); - /* Xen will never allocate port zero for any purpose. 
*/ #define VALID_EVTCHN(chn) ((chn) != 0) @@ -149,7 +91,7 @@ static void enable_dynirq(struct irq_data *data); static void disable_dynirq(struct irq_data *data); /* Get info for IRQ */ -static struct irq_info *info_for_irq(unsigned irq) +struct irq_info *info_for_irq(unsigned irq) { return irq_get_handler_data(irq); } @@ -279,12 +221,12 @@ static enum xen_irq_type type_from_irq(unsigned irq) return info_for_irq(irq)->type; } -static unsigned cpu_from_irq(unsigned irq) +unsigned cpu_from_irq(unsigned irq) { return info_for_irq(irq)->cpu; } -static unsigned int cpu_from_evtchn(unsigned int evtchn) +unsigned int cpu_from_evtchn(unsigned int evtchn) { int irq = evtchn_to_irq[evtchn]; unsigned ret = 0; @@ -310,55 +252,21 @@ static bool pirq_needs_eoi_flag(unsigned irq) return info->u.pirq.flags & PIRQ_NEEDS_EOI; } -static inline xen_ulong_t active_evtchns(unsigned int cpu, - struct shared_info *sh, - unsigned int idx) -{ - return sh->evtchn_pending[idx] & - per_cpu(cpu_evtchn_mask, cpu)[idx] & - ~sh->evtchn_mask[idx]; -} - static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu) { int irq = evtchn_to_irq[chn]; + struct irq_info *info = info_for_irq(irq); BUG_ON(irq == -1); #ifdef CONFIG_SMP cpumask_copy(irq_to_desc(irq)->irq_data.affinity, cpumask_of(cpu)); #endif - clear_bit(chn, BM(per_cpu(cpu_evtchn_mask, cpu_from_irq(irq)))); - set_bit(chn, BM(per_cpu(cpu_evtchn_mask, cpu))); - - info_for_irq(irq)->cpu = cpu; -} + xen_evtchn_port_bind_to_cpu(info, cpu); -static inline void clear_evtchn(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - sync_clear_bit(port, BM(&s->evtchn_pending[0])); -} - -static inline void set_evtchn(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - sync_set_bit(port, BM(&s->evtchn_pending[0])); -} - -static inline int test_evtchn(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - return sync_test_bit(port, BM(&s->evtchn_pending[0])); -} - -static inline int test_and_set_mask(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0])); + info->cpu = cpu; } - /** * notify_remote_via_irq - send event to remote end of event channel via irq * @irq: irq of event channel to send event to @@ -376,63 +284,6 @@ void notify_remote_via_irq(int irq) } EXPORT_SYMBOL_GPL(notify_remote_via_irq); -static void mask_evtchn(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - sync_set_bit(port, BM(&s->evtchn_mask[0])); -} - -static void unmask_evtchn(int port) -{ - struct shared_info *s = HYPERVISOR_shared_info; - unsigned int cpu = get_cpu(); - int do_hypercall = 0, evtchn_pending = 0; - - BUG_ON(!irqs_disabled()); - - if (unlikely((cpu != cpu_from_evtchn(port)))) - do_hypercall = 1; - else { - /* - * Need to clear the mask before checking pending to - * avoid a race with an event becoming pending. - * - * EVTCHNOP_unmask will only trigger an upcall if the - * mask bit was set, so if a hypercall is needed - * remask the event. - */ - sync_clear_bit(port, BM(&s->evtchn_mask[0])); - evtchn_pending = sync_test_bit(port, BM(&s->evtchn_pending[0])); - - if (unlikely(evtchn_pending && xen_hvm_domain())) { - sync_set_bit(port, BM(&s->evtchn_mask[0])); - do_hypercall = 1; - } - } - - /* Slow path (hypercall) if this is a non-local port or if this is - * an hvm domain and an event is pending (hvm domains don''t have - * their own implementation of irq_enable). 
*/ - if (do_hypercall) { - struct evtchn_unmask unmask = { .port = port }; - (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask); - } else { - struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); - - /* - * The following is basically the equivalent of - * ''hw_resend_irq''. Just like a real IO-APIC we ''lose - * the interrupt edge'' if the channel is masked. - */ - if (evtchn_pending && - !sync_test_and_set_bit(port / BITS_PER_EVTCHN_WORD, - BM(&vcpu_info->evtchn_pending_sel))) - vcpu_info->evtchn_upcall_pending = 1; - } - - put_cpu(); -} - static void xen_irq_init(unsigned irq) { struct irq_info *info; @@ -1216,222 +1067,21 @@ void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) notify_remote_via_irq(irq); } -irqreturn_t xen_debug_interrupt(int irq, void *dev_id) -{ - struct shared_info *sh = HYPERVISOR_shared_info; - int cpu = smp_processor_id(); - xen_ulong_t *cpu_evtchn = per_cpu(cpu_evtchn_mask, cpu); - int i; - unsigned long flags; - static DEFINE_SPINLOCK(debug_lock); - struct vcpu_info *v; - - spin_lock_irqsave(&debug_lock, flags); - - printk("\nvcpu %d\n ", cpu); - - for_each_online_cpu(i) { - int pending; - v = per_cpu(xen_vcpu, i); - pending = (get_irq_regs() && i == cpu) - ? xen_irqs_disabled(get_irq_regs()) - : v->evtchn_upcall_mask; - printk("%d: masked=%d pending=%d event_sel %0*"PRI_xen_ulong"\n ", i, - pending, v->evtchn_upcall_pending, - (int)(sizeof(v->evtchn_pending_sel)*2), - v->evtchn_pending_sel); - } - v = per_cpu(xen_vcpu, cpu); - - printk("\npending:\n "); - for (i = ARRAY_SIZE(sh->evtchn_pending)-1; i >= 0; i--) - printk("%0*"PRI_xen_ulong"%s", - (int)sizeof(sh->evtchn_pending[0])*2, - sh->evtchn_pending[i], - i % 8 == 0 ? "\n " : " "); - printk("\nglobal mask:\n "); - for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) - printk("%0*"PRI_xen_ulong"%s", - (int)(sizeof(sh->evtchn_mask[0])*2), - sh->evtchn_mask[i], - i % 8 == 0 ? "\n " : " "); - - printk("\nglobally unmasked:\n "); - for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) - printk("%0*"PRI_xen_ulong"%s", - (int)(sizeof(sh->evtchn_mask[0])*2), - sh->evtchn_pending[i] & ~sh->evtchn_mask[i], - i % 8 == 0 ? "\n " : " "); - - printk("\nlocal cpu%d mask:\n ", cpu); - for (i = (NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--) - printk("%0*"PRI_xen_ulong"%s", (int)(sizeof(cpu_evtchn[0])*2), - cpu_evtchn[i], - i % 8 == 0 ? "\n " : " "); - - printk("\nlocally unmasked:\n "); - for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) { - xen_ulong_t pending = sh->evtchn_pending[i] - & ~sh->evtchn_mask[i] - & cpu_evtchn[i]; - printk("%0*"PRI_xen_ulong"%s", - (int)(sizeof(sh->evtchn_mask[0])*2), - pending, i % 8 == 0 ? "\n " : " "); - } - - printk("\npending list:\n"); - for (i = 0; i < NR_EVENT_CHANNELS; i++) { - if (sync_test_bit(i, BM(sh->evtchn_pending))) { - int word_idx = i / BITS_PER_EVTCHN_WORD; - printk(" %d: event %d -> irq %d%s%s%s\n", - cpu_from_evtchn(i), i, - evtchn_to_irq[i], - sync_test_bit(word_idx, BM(&v->evtchn_pending_sel)) - ? "" : " l2-clear", - !sync_test_bit(i, BM(sh->evtchn_mask)) - ? "" : " globally-masked", - sync_test_bit(i, BM(cpu_evtchn)) - ? 
"" : " locally-masked"); - } - } - - spin_unlock_irqrestore(&debug_lock, flags); - - return IRQ_HANDLED; -} - static DEFINE_PER_CPU(unsigned, xed_nesting_count); -static DEFINE_PER_CPU(unsigned int, current_word_idx); -static DEFINE_PER_CPU(unsigned int, current_bit_idx); -/* - * Mask out the i least significant bits of w - */ -#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i)) - -/* - * Search the CPUs pending events bitmasks. For each one found, map - * the event number to an irq, and feed it into do_IRQ() for - * handling. - * - * Xen uses a two-level bitmap to speed searching. The first level is - * a bitset of words which contain pending event bits. The second - * level is a bitset of pending events themselves. - */ static void __xen_evtchn_do_upcall(void) { - int start_word_idx, start_bit_idx; - int word_idx, bit_idx; - int i, irq; - int cpu = get_cpu(); - struct shared_info *s = HYPERVISOR_shared_info; struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); + int cpu = get_cpu(); unsigned count; do { - xen_ulong_t pending_words; - xen_ulong_t pending_bits; - struct irq_desc *desc; - vcpu_info->evtchn_upcall_pending = 0; if (__this_cpu_inc_return(xed_nesting_count) - 1) goto out; - /* - * Master flag must be cleared /before/ clearing - * selector flag. xchg_xen_ulong must contain an - * appropriate barrier. - */ - if ((irq = per_cpu(virq_to_irq, cpu)[VIRQ_TIMER]) != -1) { - int evtchn = evtchn_from_irq(irq); - word_idx = evtchn / BITS_PER_LONG; - pending_bits = evtchn % BITS_PER_LONG; - if (active_evtchns(cpu, s, word_idx) & (1ULL << pending_bits)) { - desc = irq_to_desc(irq); - if (desc) - generic_handle_irq_desc(irq, desc); - } - } - - pending_words = xchg_xen_ulong(&vcpu_info->evtchn_pending_sel, 0); - - start_word_idx = __this_cpu_read(current_word_idx); - start_bit_idx = __this_cpu_read(current_bit_idx); - - word_idx = start_word_idx; - - for (i = 0; pending_words != 0; i++) { - xen_ulong_t words; - - words = MASK_LSBS(pending_words, word_idx); - - /* - * If we masked out all events, wrap to beginning. - */ - if (words == 0) { - word_idx = 0; - bit_idx = 0; - continue; - } - word_idx = EVTCHN_FIRST_BIT(words); - - pending_bits = active_evtchns(cpu, s, word_idx); - bit_idx = 0; /* usually scan entire word from start */ - /* - * We scan the starting word in two parts. - * - * 1st time: start in the middle, scanning the - * upper bits. - * - * 2nd time: scan the whole word (not just the - * parts skipped in the first pass) -- if an - * event in the previously scanned bits is - * pending again it would just be scanned on - * the next loop anyway. - */ - if (word_idx == start_word_idx) { - if (i == 0) - bit_idx = start_bit_idx; - } - - do { - xen_ulong_t bits; - int port; - - bits = MASK_LSBS(pending_bits, bit_idx); - - /* If we masked out all events, move on. */ - if (bits == 0) - break; - - bit_idx = EVTCHN_FIRST_BIT(bits); - - /* Process port. */ - port = (word_idx * BITS_PER_EVTCHN_WORD) + bit_idx; - irq = evtchn_to_irq[port]; - - if (irq != -1) { - desc = irq_to_desc(irq); - if (desc) - generic_handle_irq_desc(irq, desc); - } - - bit_idx = (bit_idx + 1) % BITS_PER_EVTCHN_WORD; - - /* Next caller starts at last processed + 1 */ - __this_cpu_write(current_word_idx, - bit_idx ? word_idx : - (word_idx+1) % BITS_PER_EVTCHN_WORD); - __this_cpu_write(current_bit_idx, bit_idx); - } while (bit_idx != 0); - - /* Scan start_l1i twice; all others once. 
*/ - if ((word_idx != start_word_idx) || (i != 0)) - pending_words &= ~(1UL << word_idx); - - word_idx = (word_idx + 1) % BITS_PER_EVTCHN_WORD; - } + xen_evtchn_handle_events(cpu); BUG_ON(!irqs_disabled()); diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c new file mode 100644 index 0000000..c4f0b55 --- /dev/null +++ b/drivers/xen/events/events_2l.c @@ -0,0 +1,322 @@ +/* + * Xen event channels (2-level ABI) + * + * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007 + */ + +#include <linux/linkage.h> +#include <linux/interrupt.h> +#include <linux/irq.h> +#include <linux/module.h> + +#include <asm/sync_bitops.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/hypervisor.h> + +#include <xen/xen.h> +#include <xen/xen-ops.h> +#include <xen/events.h> +#include <xen/interface/xen.h> +#include <xen/interface/event_channel.h> + +#include "events_internal.h" + +/* + * Note sizeof(xen_ulong_t) can be more than sizeof(unsigned long). Be + * careful to only use bitops which allow for this (e.g + * test_bit/find_first_bit and friends but not __ffs) and to pass + * BITS_PER_EVTCHN_WORD as the bitmask length. + */ +#define BITS_PER_EVTCHN_WORD (sizeof(xen_ulong_t)*8) +/* + * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t + * array. Primarily to avoid long lines (hence the terse name). + */ +#define BM(x) (unsigned long *)(x) +/* Find the first set bit in a evtchn mask */ +#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_EVTCHN_WORD) + +static DEFINE_PER_CPU(xen_ulong_t [NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD], + cpu_evtchn_mask); + +void xen_evtchn_port_bind_to_cpu(struct irq_info *info, int cpu) +{ + clear_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, info->cpu))); + set_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, cpu))); +} + +void clear_evtchn(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + sync_clear_bit(port, BM(&s->evtchn_pending[0])); +} + +void set_evtchn(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + sync_set_bit(port, BM(&s->evtchn_pending[0])); +} + +int test_evtchn(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + return sync_test_bit(port, BM(&s->evtchn_pending[0])); +} + +int test_and_set_mask(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0])); +} + +void mask_evtchn(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + sync_set_bit(port, BM(&s->evtchn_mask[0])); +} + +void unmask_evtchn(int port) +{ + struct shared_info *s = HYPERVISOR_shared_info; + unsigned int cpu = get_cpu(); + int do_hypercall = 0, evtchn_pending = 0; + + BUG_ON(!irqs_disabled()); + + if (unlikely((cpu != cpu_from_evtchn(port)))) + do_hypercall = 1; + else { + sync_clear_bit(port, BM(&s->evtchn_mask[0])); + evtchn_pending = sync_test_bit(port, BM(&s->evtchn_pending[0])); + + if (unlikely(evtchn_pending && xen_hvm_domain())) + do_hypercall = 1; + } + + /* Slow path (hypercall) if this is a non-local port or if this is + * an hvm domain and an event is pending (hvm domains don''t have + * their own implementation of irq_enable). */ + if (do_hypercall) { + struct evtchn_unmask unmask = { .port = port }; + (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask); + } else { + struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); + + /* + * The following is basically the equivalent of + * ''hw_resend_irq''. Just like a real IO-APIC we ''lose + * the interrupt edge'' if the channel is masked. 
+ */ + if (evtchn_pending && + !sync_test_and_set_bit(port / BITS_PER_EVTCHN_WORD, + BM(&vcpu_info->evtchn_pending_sel))) + vcpu_info->evtchn_upcall_pending = 1; + } + + put_cpu(); +} + +static DEFINE_PER_CPU(unsigned int, current_word_idx); +static DEFINE_PER_CPU(unsigned int, current_bit_idx); + +/* + * Mask out the i least significant bits of w + */ +#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i)) + +static inline xen_ulong_t active_evtchns(unsigned int cpu, + struct shared_info *sh, + unsigned int idx) +{ + return sh->evtchn_pending[idx] & + per_cpu(cpu_evtchn_mask, cpu)[idx] & + ~sh->evtchn_mask[idx]; +} + +/* + * Search the CPU''s pending events bitmasks. For each one found, map + * the event number to an irq, and feed it into do_IRQ() for handling. + * + * Xen uses a two-level bitmap to speed searching. The first level is + * a bitset of words which contain pending event bits. The second + * level is a bitset of pending events themselves. + */ +void xen_evtchn_handle_events(int cpu) +{ + xen_ulong_t pending_words; + int start_word_idx, start_bit_idx; + int word_idx, bit_idx; + int i; + struct shared_info *s = HYPERVISOR_shared_info; + struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); + + /* + * Master flag must be cleared /before/ clearing + * selector flag. xchg_xen_ulong must contain an + * appropriate barrier. + */ + pending_words = xchg_xen_ulong(&vcpu_info->evtchn_pending_sel, 0); + + start_word_idx = __this_cpu_read(current_word_idx); + start_bit_idx = __this_cpu_read(current_bit_idx); + + word_idx = start_word_idx; + + for (i = 0; pending_words != 0; i++) { + xen_ulong_t pending_bits; + xen_ulong_t words; + + words = MASK_LSBS(pending_words, word_idx); + + /* + * If we masked out all events, wrap to beginning. + */ + if (words == 0) { + word_idx = 0; + bit_idx = 0; + continue; + } + word_idx = EVTCHN_FIRST_BIT(words); + + pending_bits = active_evtchns(cpu, s, word_idx); + bit_idx = 0; /* usually scan entire word from start */ + /* + * We scan the starting word in two parts. + * + * 1st time: start in the middle, scanning the + * upper bits. + * + * 2nd time: scan the whole word (not just the + * parts skipped in the first pass) -- if an + * event in the previously scanned bits is + * pending again it would just be scanned on + * the next loop anyway. + */ + if (word_idx == start_word_idx) { + if (i == 0) + bit_idx = start_bit_idx; + } + + do { + xen_ulong_t bits; + int port, irq; + struct irq_desc *desc; + + bits = MASK_LSBS(pending_bits, bit_idx); + + /* If we masked out all events, move on. */ + if (bits == 0) + break; + + bit_idx = EVTCHN_FIRST_BIT(bits); + + /* Process port. */ + port = (word_idx * BITS_PER_EVTCHN_WORD) + bit_idx; + irq = evtchn_to_irq[port]; + + if (irq != -1) { + desc = irq_to_desc(irq); + if (desc) + generic_handle_irq_desc(irq, desc); + } + + bit_idx = (bit_idx + 1) % BITS_PER_EVTCHN_WORD; + + /* Next caller starts at last processed + 1 */ + __this_cpu_write(current_word_idx, + bit_idx ? word_idx : + (word_idx+1) % BITS_PER_EVTCHN_WORD); + __this_cpu_write(current_bit_idx, bit_idx); + } while (bit_idx != 0); + + /* Scan start_l1i twice; all others once. 
*/ + if ((word_idx != start_word_idx) || (i != 0)) + pending_words &= ~(1UL << word_idx); + + word_idx = (word_idx + 1) % BITS_PER_EVTCHN_WORD; + } +} + +irqreturn_t xen_debug_interrupt(int irq, void *dev_id) +{ + struct shared_info *sh = HYPERVISOR_shared_info; + int cpu = smp_processor_id(); + xen_ulong_t *cpu_evtchn = per_cpu(cpu_evtchn_mask, cpu); + int i; + unsigned long flags; + static DEFINE_SPINLOCK(debug_lock); + struct vcpu_info *v; + + spin_lock_irqsave(&debug_lock, flags); + + printk("\nvcpu %d\n ", cpu); + + for_each_online_cpu(i) { + int pending; + v = per_cpu(xen_vcpu, i); + pending = (get_irq_regs() && i == cpu) + ? xen_irqs_disabled(get_irq_regs()) + : v->evtchn_upcall_mask; + printk("%d: masked=%d pending=%d event_sel %0*"PRI_xen_ulong"\n ", i, + pending, v->evtchn_upcall_pending, + (int)(sizeof(v->evtchn_pending_sel)*2), + v->evtchn_pending_sel); + } + v = per_cpu(xen_vcpu, cpu); + + printk("\npending:\n "); + for (i = ARRAY_SIZE(sh->evtchn_pending)-1; i >= 0; i--) + printk("%0*"PRI_xen_ulong"%s", + (int)sizeof(sh->evtchn_pending[0])*2, + sh->evtchn_pending[i], + i % 8 == 0 ? "\n " : " "); + printk("\nglobal mask:\n "); + for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) + printk("%0*"PRI_xen_ulong"%s", + (int)(sizeof(sh->evtchn_mask[0])*2), + sh->evtchn_mask[i], + i % 8 == 0 ? "\n " : " "); + + printk("\nglobally unmasked:\n "); + for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) + printk("%0*"PRI_xen_ulong"%s", + (int)(sizeof(sh->evtchn_mask[0])*2), + sh->evtchn_pending[i] & ~sh->evtchn_mask[i], + i % 8 == 0 ? "\n " : " "); + + printk("\nlocal cpu%d mask:\n ", cpu); + for (i = (NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--) + printk("%0*"PRI_xen_ulong"%s", (int)(sizeof(cpu_evtchn[0])*2), + cpu_evtchn[i], + i % 8 == 0 ? "\n " : " "); + + printk("\nlocally unmasked:\n "); + for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) { + xen_ulong_t pending = sh->evtchn_pending[i] + & ~sh->evtchn_mask[i] + & cpu_evtchn[i]; + printk("%0*"PRI_xen_ulong"%s", + (int)(sizeof(sh->evtchn_mask[0])*2), + pending, i % 8 == 0 ? "\n " : " "); + } + + printk("\npending list:\n"); + for (i = 0; i < NR_EVENT_CHANNELS; i++) { + if (sync_test_bit(i, BM(sh->evtchn_pending))) { + int word_idx = i / BITS_PER_EVTCHN_WORD; + printk(" %d: event %d -> irq %d%s%s%s\n", + cpu_from_evtchn(i), i, + evtchn_to_irq[i], + sync_test_bit(word_idx, BM(&v->evtchn_pending_sel)) + ? "" : " l2-clear", + !sync_test_bit(i, BM(sh->evtchn_mask)) + ? "" : " globally-masked", + sync_test_bit(i, BM(cpu_evtchn)) + ? "" : " locally-masked"); + } + } + + spin_unlock_irqrestore(&debug_lock, flags); + + return IRQ_HANDLED; +} diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h new file mode 100644 index 0000000..79ac70b --- /dev/null +++ b/drivers/xen/events/events_internal.h @@ -0,0 +1,74 @@ +/* + * Xen Event Channels (internal header) + * + * Copyright (C) 2013 Citrix Systems R&D Ltd. + * + * This source code is licensed under the GNU General Public License, + * Version 2 or later. See the file COPYING for more details. + */ +#ifndef __EVENTS_INTERNAL_H__ +#define __EVENTS_INTERNAL_H__ + +/* Interrupt types. 
*/ +enum xen_irq_type { + IRQT_UNBOUND = 0, + IRQT_PIRQ, + IRQT_VIRQ, + IRQT_IPI, + IRQT_EVTCHN +}; + +/* + * Packed IRQ information: + * type - enum xen_irq_type + * event channel - irq->event channel mapping + * cpu - cpu this event channel is bound to + * index - type-specific information: + * PIRQ - vector, with MSB being "needs EIO", or physical IRQ of the HVM + * guest, or GSI (real passthrough IRQ) of the device. + * VIRQ - virq number + * IPI - IPI vector + * EVTCHN - + */ +struct irq_info { + struct list_head list; + int refcnt; + enum xen_irq_type type; /* type */ + unsigned irq; + unsigned short evtchn; /* event channel */ + unsigned short cpu; /* cpu bound */ + + union { + unsigned short virq; + enum ipi_vector ipi; + struct { + unsigned short pirq; + unsigned short gsi; + unsigned char vector; + unsigned char flags; + uint16_t domid; + } pirq; + } u; +}; + +#define PIRQ_NEEDS_EOI (1 << 0) +#define PIRQ_SHAREABLE (1 << 1) + +extern int *evtchn_to_irq; + +struct irq_info *info_for_irq(unsigned irq); +unsigned cpu_from_irq(unsigned irq); +unsigned cpu_from_evtchn(unsigned int evtchn); + +void xen_evtchn_port_bind_to_cpu(struct irq_info *info, int cpu); + +void clear_evtchn(int port); +void set_evtchn(int port); +int test_evtchn(int port); +int test_and_set_mask(int port); +void mask_evtchn(int port); +void unmask_evtchn(int port); + +void xen_evtchn_handle_events(int cpu); + +#endif /* #ifndef __EVENTS_INTERNAL_H__ */ -- 1.7.2.5
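It may help to see the 2-level scheme that events_2l.c now owns in miniature. The following is a standalone sketch (plain C, not kernel code) of the search that xen_evtchn_handle_events() performs: a set bit in the per-VCPU selector word narrows the scan to one word of the shared pending array.

	#include <stdint.h>
	#include <stdio.h>

	#define BITS_PER_WORD 64

	static unsigned first_bit(uint64_t w)
	{
		unsigned i;
		for (i = 0; i < BITS_PER_WORD; i++)
			if (w & (1ULL << i))
				return i;
		return BITS_PER_WORD;
	}

	int main(void)
	{
		uint64_t pending[4] = { 0, 0, 1ULL << 5, 0 };
		uint64_t sel = 1ULL << 2;	/* word 2 has pending events */

		while (sel) {
			unsigned word = first_bit(sel);
			unsigned bit = first_bit(pending[word]);

			/* port = word * 64 + bit = 133 */
			printf("port %u pending\n", word * BITS_PER_WORD + bit);

			pending[word] &= ~(1ULL << bit);
			if (!pending[word])
				sel &= ~(1ULL << word);
		}
		return 0;
	}

The real implementation additionally remembers where the last scan stopped (current_word_idx/current_bit_idx) so that low-numbered ports cannot starve higher ones.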
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 07/16] xen/events: add struct evtchn_ops for the low-level port operations
From: David Vrabel <david.vrabel@citrix.com> evtchn_ops contains the low-level operations that access the shared data structures. This allows alternate ABIs to be supported. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/xen/events/events.c | 4 ++ drivers/xen/events/events_2l.c | 32 +++++++++++++---- drivers/xen/events/events_internal.h | 63 +++++++++++++++++++++++++++++---- 3 files changed, 83 insertions(+), 16 deletions(-) diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index 2700c12..bc35f5d 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -61,6 +61,8 @@ #include "events_internal.h" +const struct evtchn_ops *evtchn_ops; + /* * This lock protects updates to the following mapping and reference-count * arrays. The lock does not need to be acquired to read the mapping tables. @@ -1518,6 +1520,8 @@ void __init xen_init_IRQ(void) { int i; + xen_evtchn_2l_init(); + evtchn_to_irq = kcalloc(NR_EVENT_CHANNELS, sizeof(*evtchn_to_irq), GFP_KERNEL); BUG_ON(!evtchn_to_irq); diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c index c4f0b55..5f27c72 100644 --- a/drivers/xen/events/events_2l.c +++ b/drivers/xen/events/events_2l.c @@ -39,43 +39,43 @@ static DEFINE_PER_CPU(xen_ulong_t [NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD], cpu_evtchn_mask); -void xen_evtchn_port_bind_to_cpu(struct irq_info *info, int cpu) +static void evtchn_2l_bind_to_cpu(struct irq_info *info, unsigned cpu) { clear_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, info->cpu))); set_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, cpu))); } -void clear_evtchn(int port) +static void evtchn_2l_clear_pending(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; sync_clear_bit(port, BM(&s->evtchn_pending[0])); } -void set_evtchn(int port) +static void evtchn_2l_set_pending(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; sync_set_bit(port, BM(&s->evtchn_pending[0])); } -int test_evtchn(int port) +static bool evtchn_2l_is_pending(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; return sync_test_bit(port, BM(&s->evtchn_pending[0])); } -int test_and_set_mask(int port) +static bool evtchn_2l_test_and_set_mask(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0])); } -void mask_evtchn(int port) +static void evtchn_2l_mask(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; sync_set_bit(port, BM(&s->evtchn_mask[0])); } -void unmask_evtchn(int port) +static void evtchn_2l_unmask(unsigned port) { struct shared_info *s = HYPERVISOR_shared_info; unsigned int cpu = get_cpu(); @@ -141,7 +141,7 @@ static inline xen_ulong_t active_evtchns(unsigned int cpu, * a bitset of words which contain pending event bits. The second * level is a bitset of pending events themselves. 
*/ -void xen_evtchn_handle_events(int cpu) +static void evtchn_2l_handle_events(unsigned cpu) { xen_ulong_t pending_words; int start_word_idx, start_bit_idx; @@ -320,3 +320,19 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id) return IRQ_HANDLED; } + +static const struct evtchn_ops evtchn_ops_2l = { + .bind_to_cpu = evtchn_2l_bind_to_cpu, + .clear_pending = evtchn_2l_clear_pending, + .set_pending = evtchn_2l_set_pending, + .is_pending = evtchn_2l_is_pending, + .test_and_set_mask = evtchn_2l_test_and_set_mask, + .mask = evtchn_2l_mask, + .unmask = evtchn_2l_unmask, + .handle_events = evtchn_2l_handle_events, +}; + +void __init xen_evtchn_2l_init(void) +{ + evtchn_ops = &evtchn_ops_2l; +} diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h index 79ac70b..ba8142f 100644 --- a/drivers/xen/events/events_internal.h +++ b/drivers/xen/events/events_internal.h @@ -54,21 +54,68 @@ struct irq_info { #define PIRQ_NEEDS_EOI (1 << 0) #define PIRQ_SHAREABLE (1 << 1) +struct evtchn_ops { + void (*bind_to_cpu)(struct irq_info *info, unsigned cpu); + + void (*clear_pending)(unsigned port); + void (*set_pending)(unsigned port); + bool (*is_pending)(unsigned port); + bool (*test_and_set_mask)(unsigned port); + void (*mask)(unsigned port); + void (*unmask)(unsigned port); + + void (*handle_events)(unsigned cpu); +}; + +extern const struct evtchn_ops *evtchn_ops; + extern int *evtchn_to_irq; struct irq_info *info_for_irq(unsigned irq); unsigned cpu_from_irq(unsigned irq); unsigned cpu_from_evtchn(unsigned int evtchn); -void xen_evtchn_port_bind_to_cpu(struct irq_info *info, int cpu); +static inline void xen_evtchn_port_bind_to_cpu(struct irq_info *info, + unsigned cpu) +{ + evtchn_ops->bind_to_cpu(info, cpu); +} + +static inline void clear_evtchn(unsigned port) +{ + evtchn_ops->clear_pending(port); +} + +static inline void set_evtchn(unsigned port) +{ + evtchn_ops->set_pending(port); +} + +static inline bool test_evtchn(unsigned port) +{ + return evtchn_ops->is_pending(port); +} + +static inline bool test_and_set_mask(unsigned port) +{ + return evtchn_ops->test_and_set_mask(port); +} + +static inline void mask_evtchn(unsigned port) +{ + return evtchn_ops->mask(port); +} + +static inline void unmask_evtchn(unsigned port) +{ + return evtchn_ops->unmask(port); +} -void clear_evtchn(int port); -void set_evtchn(int port); -int test_evtchn(int port); -int test_and_set_mask(int port); -void mask_evtchn(int port); -void unmask_evtchn(int port); +static inline void xen_evtchn_handle_events(unsigned cpu) +{ + return evtchn_ops->handle_events(cpu); +} -void xen_evtchn_handle_events(int cpu); +void xen_evtchn_2l_init(void); #endif /* #ifndef __EVENTS_INTERNAL_H__ */ -- 1.7.2.5
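To make the dispatch pattern concrete: a new ABI implements the hooks against its own shared structures and installs its ops table at init time, exactly as xen_evtchn_2l_init() does above. Here is a minimal sketch with hypothetical names (a toy flat-bitmap ABI, not the FIFO implementation, which arrives in patch 16):

	/* Toy example only: all "foo" names are hypothetical. */
	static DECLARE_BITMAP(foo_pending, NR_EVENT_CHANNELS);
	static DECLARE_BITMAP(foo_mask, NR_EVENT_CHANNELS);

	static void evtchn_foo_clear_pending(unsigned port)
	{
		clear_bit(port, foo_pending);
	}

	static void evtchn_foo_set_pending(unsigned port)
	{
		set_bit(port, foo_pending);
	}

	static bool evtchn_foo_is_pending(unsigned port)
	{
		return test_bit(port, foo_pending);
	}

	static bool evtchn_foo_test_and_set_mask(unsigned port)
	{
		return test_and_set_bit(port, foo_mask);
	}

	static void evtchn_foo_mask(unsigned port)
	{
		set_bit(port, foo_mask);
	}

	static const struct evtchn_ops evtchn_ops_foo = {
		.clear_pending     = evtchn_foo_clear_pending,
		.set_pending       = evtchn_foo_set_pending,
		.is_pending        = evtchn_foo_is_pending,
		.test_and_set_mask = evtchn_foo_test_and_set_mask,
		.mask              = evtchn_foo_mask,
		/* .bind_to_cpu, .unmask and .handle_events omitted here */
	};

	void __init xen_evtchn_foo_init(void)
	{
		evtchn_ops = &evtchn_ops_foo;
	}

Because evtchn_ops is chosen once at boot, the indirect calls add a single pointer dereference on the hot paths and no locking.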
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 08/16] xen/events: allow setup of irq_info to fail
From: David Vrabel <david.vrabel@citrix.com> The FIFO-based event ABI requires additional setup of newly bound events (it may need to expand the event array) and this setup may fail. xen_irq_info_common_init() is a useful place to put this setup so allow this call to fail. This call and the other similar calls are renamed to be *_setup() to reflect that they may now fail. This failure can only occur with new event channels not on rebind. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/xen/events/events.c | 156 +++++++++++++++++++++++++------------------ 1 files changed, 91 insertions(+), 65 deletions(-) diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index bc35f5d..9628df8 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -99,7 +99,7 @@ struct irq_info *info_for_irq(unsigned irq) } /* Constructors for packed IRQ information. */ -static void xen_irq_info_common_init(struct irq_info *info, +static int xen_irq_info_common_setup(struct irq_info *info, unsigned irq, enum xen_irq_type type, unsigned short evtchn, @@ -116,45 +116,47 @@ static void xen_irq_info_common_init(struct irq_info *info, evtchn_to_irq[evtchn] = irq; irq_clear_status_flags(irq, IRQ_NOREQUEST|IRQ_NOAUTOEN); + + return 0; } -static void xen_irq_info_evtchn_init(unsigned irq, +static int xen_irq_info_evtchn_setup(unsigned irq, unsigned short evtchn) { struct irq_info *info = info_for_irq(irq); - xen_irq_info_common_init(info, irq, IRQT_EVTCHN, evtchn, 0); + return xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0); } -static void xen_irq_info_ipi_init(unsigned cpu, +static int xen_irq_info_ipi_setup(unsigned cpu, unsigned irq, unsigned short evtchn, enum ipi_vector ipi) { struct irq_info *info = info_for_irq(irq); - xen_irq_info_common_init(info, irq, IRQT_IPI, evtchn, 0); - info->u.ipi = ipi; per_cpu(ipi_to_irq, cpu)[ipi] = irq; + + return xen_irq_info_common_setup(info, irq, IRQT_IPI, evtchn, 0); } -static void xen_irq_info_virq_init(unsigned cpu, +static int xen_irq_info_virq_setup(unsigned cpu, unsigned irq, unsigned short evtchn, unsigned short virq) { struct irq_info *info = info_for_irq(irq); - xen_irq_info_common_init(info, irq, IRQT_VIRQ, evtchn, 0); - info->u.virq = virq; per_cpu(virq_to_irq, cpu)[virq] = irq; + + return xen_irq_info_common_setup(info, irq, IRQT_VIRQ, evtchn, 0); } -static void xen_irq_info_pirq_init(unsigned irq, +static int xen_irq_info_pirq_setup(unsigned irq, unsigned short evtchn, unsigned short pirq, unsigned short gsi, @@ -163,12 +165,12 @@ static void xen_irq_info_pirq_init(unsigned irq, { struct irq_info *info = info_for_irq(irq); - xen_irq_info_common_init(info, irq, IRQT_PIRQ, evtchn, 0); - info->u.pirq.pirq = pirq; info->u.pirq.gsi = gsi; info->u.pirq.domid = domid; info->u.pirq.flags = flags; + + return xen_irq_info_common_setup(info, irq, IRQT_PIRQ, evtchn, 0); } /* @@ -516,6 +518,47 @@ int xen_irq_from_gsi(unsigned gsi) } EXPORT_SYMBOL_GPL(xen_irq_from_gsi); +static void __unbind_from_irq(unsigned int irq) +{ + struct evtchn_close close; + int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); + + if (info->refcnt > 0) { + info->refcnt--; + if (info->refcnt != 0) + return; + } + + if (VALID_EVTCHN(evtchn)) { + close.port = evtchn; + if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) + BUG(); + + switch (type_from_irq(irq)) { + case IRQT_VIRQ: + per_cpu(virq_to_irq, cpu_from_evtchn(evtchn)) + 
[virq_from_irq(irq)] = -1; + break; + case IRQT_IPI: + per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn)) + [ipi_from_irq(irq)] = -1; + break; + default: + break; + } + + /* Closed ports are implicitly re-bound to VCPU0. */ + bind_evtchn_to_cpu(evtchn, 0); + + evtchn_to_irq[evtchn] = -1; + } + + BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); + + xen_free_irq(irq); +} + /* * Do not make any assumptions regarding the relationship between the * IRQ number returned here and the Xen pirq argument. @@ -531,6 +574,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, { int irq = -1; struct physdev_irq irq_op; + int ret; mutex_lock(&irq_mapping_update_lock); @@ -558,8 +602,13 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, goto out; } - xen_irq_info_pirq_init(irq, 0, pirq, gsi, DOMID_SELF, + ret = xen_irq_info_pirq_setup(irq, 0, pirq, gsi, DOMID_SELF, shareable ? PIRQ_SHAREABLE : 0); + if (ret < 0) { + __unbind_from_irq(irq); + irq = ret; + goto out; + } pirq_query_unmask(irq); /* We try to use the handler with the appropriate semantic for the @@ -619,7 +668,9 @@ int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc, irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_edge_irq, name); - xen_irq_info_pirq_init(irq, 0, pirq, 0, domid, 0); + ret = xen_irq_info_pirq_setup(irq, 0, pirq, 0, domid, 0); + if (ret < 0) + goto error_irq; ret = irq_set_msi_desc(irq, msidesc); if (ret < 0) goto error_irq; @@ -627,8 +678,8 @@ out: mutex_unlock(&irq_mapping_update_lock); return irq; error_irq: + __unbind_from_irq(irq); mutex_unlock(&irq_mapping_update_lock); - xen_free_irq(irq); return ret; } #endif @@ -698,9 +749,11 @@ int xen_pirq_from_irq(unsigned irq) return pirq_from_irq(irq); } EXPORT_SYMBOL_GPL(xen_pirq_from_irq); + int bind_evtchn_to_irq(unsigned int evtchn) { int irq; + int ret; mutex_lock(&irq_mapping_update_lock); @@ -714,7 +767,12 @@ int bind_evtchn_to_irq(unsigned int evtchn) irq_set_chip_and_handler_name(irq, &xen_dynamic_chip, handle_edge_irq, "event"); - xen_irq_info_evtchn_init(irq, evtchn); + ret = xen_irq_info_evtchn_setup(irq, evtchn); + if (ret < 0) { + __unbind_from_irq(irq); + irq = ret; + goto out; + } } else { struct irq_info *info = info_for_irq(irq); WARN_ON(info == NULL || info->type != IRQT_EVTCHN); @@ -731,6 +789,7 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu) { struct evtchn_bind_ipi bind_ipi; int evtchn, irq; + int ret; mutex_lock(&irq_mapping_update_lock); @@ -750,8 +809,12 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu) BUG(); evtchn = bind_ipi.port; - xen_irq_info_ipi_init(cpu, irq, evtchn, ipi); - + ret = xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi); + if (ret < 0) { + __unbind_from_irq(irq); + irq = ret; + goto out; + } bind_evtchn_to_cpu(evtchn, cpu); } else { struct irq_info *info = info_for_irq(irq); @@ -830,7 +893,12 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu) evtchn = ret; } - xen_irq_info_virq_init(cpu, irq, evtchn, virq); + ret = xen_irq_info_virq_setup(cpu, irq, evtchn, virq); + if (ret < 0) { + __unbind_from_irq(irq); + irq = ret; + goto out; + } bind_evtchn_to_cpu(evtchn, cpu); } else { @@ -846,50 +914,8 @@ out: static void unbind_from_irq(unsigned int irq) { - struct evtchn_close close; - int evtchn = evtchn_from_irq(irq); - struct irq_info *info = irq_get_handler_data(irq); - - if (WARN_ON(!info)) - return; - mutex_lock(&irq_mapping_update_lock); - - if (info->refcnt > 0) { - info->refcnt--; - if (info->refcnt != 0) - goto done; - } - - if (VALID_EVTCHN(evtchn)) { - close.port = evtchn; - if 
(HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) - BUG(); - - switch (type_from_irq(irq)) { - case IRQT_VIRQ: - per_cpu(virq_to_irq, cpu_from_evtchn(evtchn)) - [virq_from_irq(irq)] = -1; - break; - case IRQT_IPI: - per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn)) - [ipi_from_irq(irq)] = -1; - break; - default: - break; - } - - /* Closed ports are implicitly re-bound to VCPU0. */ - bind_evtchn_to_cpu(evtchn, 0); - - evtchn_to_irq[evtchn] = -1; - } - - BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); - - xen_free_irq(irq); - - done: + __unbind_from_irq(irq); mutex_unlock(&irq_mapping_update_lock); } @@ -1137,7 +1163,7 @@ void rebind_evtchn_irq(int evtchn, int irq) so there should be a proper type */ BUG_ON(info->type == IRQT_UNBOUND); - xen_irq_info_evtchn_init(irq, evtchn); + xen_irq_info_evtchn_setup(irq, evtchn); mutex_unlock(&irq_mapping_update_lock); @@ -1312,7 +1338,7 @@ static void restore_cpu_virqs(unsigned int cpu) evtchn = bind_virq.port; /* Record the new mapping. */ - xen_irq_info_virq_init(cpu, irq, evtchn, virq); + xen_irq_info_virq_setup(cpu, irq, evtchn, virq); bind_evtchn_to_cpu(evtchn, cpu); } } @@ -1336,7 +1362,7 @@ static void restore_cpu_ipis(unsigned int cpu) evtchn = bind_ipi.port; /* Record the new mapping. */ - xen_irq_info_ipi_init(cpu, irq, evtchn, ipi); + xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi); bind_evtchn_to_cpu(evtchn, cpu); } } -- 1.7.2.5
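One consequence worth spelling out: the bind_*() calls have always returned an int, so drivers that already check for a negative return need no changes; the new setup path just gives them one more way to see -ENOMEM. A usage sketch (the "foo" names are hypothetical):

	static irqreturn_t foo_interrupt(int irq, void *dev_id)
	{
		/* handle the event */
		return IRQ_HANDLED;
	}

	static int foo_connect(unsigned int evtchn, void *dev)
	{
		int irq;

		/* May now also fail if the ABI-specific per-port setup
		 * (e.g. expanding a FIFO event array) fails. */
		irq = bind_evtchn_to_irqhandler(evtchn, foo_interrupt,
						0, "foo", dev);
		if (irq < 0)
			return irq;

		/* ... store irq so it can be unbound later ... */
		return 0;
	}

The one caller that cannot act on such an error is __startup_pirq(), as noted in the cover letter.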
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 09/16] xen/events: add a evtchn_op for port setup
From: David Vrabel <david.vrabel@citrix.com>

Add a hook for port-specific setup and call it from
xen_irq_info_common_setup().

The FIFO-based ABIs may need to perform additional setup (expanding
the event array) before a bound event channel can start to receive
events.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/events/events.c          |    2 +-
 drivers/xen/events/events_internal.h |   12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c
index 9628df8..b06b913 100644
--- a/drivers/xen/events/events.c
+++ b/drivers/xen/events/events.c
@@ -117,7 +117,7 @@ static int xen_irq_info_common_setup(struct irq_info *info,
 
 	irq_clear_status_flags(irq, IRQ_NOREQUEST|IRQ_NOAUTOEN);
 
-	return 0;
+	return xen_evtchn_port_setup(info);
 }
 
 static int xen_irq_info_evtchn_setup(unsigned irq,
diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h
index ba8142f..dc96502 100644
--- a/drivers/xen/events/events_internal.h
+++ b/drivers/xen/events/events_internal.h
@@ -55,6 +55,7 @@ struct irq_info {
 #define PIRQ_SHAREABLE (1 << 1)
 
 struct evtchn_ops {
+	int (*setup)(struct irq_info *info);
 	void (*bind_to_cpu)(struct irq_info *info, unsigned cpu);
 
 	void (*clear_pending)(unsigned port);
@@ -75,6 +76,17 @@ struct irq_info *info_for_irq(unsigned irq);
 unsigned cpu_from_irq(unsigned irq);
 unsigned cpu_from_evtchn(unsigned int evtchn);
 
+/*
+ * Do any ABI specific setup for a bound event channel before it can
+ * be unmasked and used.
+ */
+static inline int xen_evtchn_port_setup(struct irq_info *info)
+{
+	if (evtchn_ops->setup)
+		return evtchn_ops->setup(info);
+	return 0;
+}
+
 static inline void xen_evtchn_port_bind_to_cpu(struct irq_info *info,
 					       unsigned cpu)
 {
--
1.7.2.5
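As a sketch of what an implementation of this hook might look like (hypothetical names; the real FIFO version appears in patch 16), an ABI with a dynamically sized event array would grow the array here and let any allocation failure propagate back through xen_irq_info_common_setup() to the bind_*() caller:

	/* Hypothetical: grow the event array so that this port is
	 * covered before it is unmasked.  foo_nr_ports and
	 * foo_expand_event_array() are illustrative only. */
	static int evtchn_foo_setup(struct irq_info *info)
	{
		if (info->evtchn >= foo_nr_ports)
			return foo_expand_event_array(info->evtchn);
		return 0;
	}

	static const struct evtchn_ops evtchn_ops_foo = {
		.setup = evtchn_foo_setup,
		/* ... the hooks from patch 7 ... */
	};

The hook is optional (the 2-level ABI does not set it), which is why xen_evtchn_port_setup() checks for NULL.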
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 10/16] xen/events: Refactor evtchn_to_irq array to be dynamically allocated
From: Malcolm Crossley <malcolm.crossley@citrix.com> Refactor static array evtchn_to_irq array to be dynamically allocated by implementing get and set functions for accesses to the array. Two new port ops are added: max_channels (maximum supported number of event channels) and nr_channels (number of currently usable event channels). For the 2-level ABI, these numbers are both the same as the shared data structure is a fixed size. For the FIFO ABI, these will be different as the event array is expanded dynamically. This allows more than 65000 event channels so an unsigned short is no longer sufficient for an event channel port number and unsigned int is used instead. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/events/events.c | 158 ++++++++++++++++++++++++---------- drivers/xen/events/events_2l.c | 11 ++- drivers/xen/events/events_internal.h | 18 ++++- 3 files changed, 137 insertions(+), 50 deletions(-) diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index b06b913..636079b 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -77,12 +77,16 @@ static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1}; /* IRQ <-> IPI mapping */ static DEFINE_PER_CPU(int [XEN_NR_IPIS], ipi_to_irq) = {[0 ... XEN_NR_IPIS-1] = -1}; -int *evtchn_to_irq; +int **evtchn_to_irq; #ifdef CONFIG_X86 static unsigned long *pirq_eoi_map; #endif static bool (*pirq_needs_eoi)(unsigned irq); +#define EVTCHN_ROW(e) (e / (PAGE_SIZE/sizeof(**evtchn_to_irq))) +#define EVTCHN_COL(e) (e % (PAGE_SIZE/sizeof(**evtchn_to_irq))) +#define EVTCHN_PER_ROW (PAGE_SIZE / sizeof(**evtchn_to_irq)) + /* Xen will never allocate port zero for any purpose. 
*/ #define VALID_EVTCHN(chn) ((chn) != 0) @@ -92,6 +96,61 @@ static struct irq_chip xen_pirq_chip; static void enable_dynirq(struct irq_data *data); static void disable_dynirq(struct irq_data *data); +static void clear_evtchn_to_irq_row(unsigned row) +{ + unsigned col; + + for (col = 0; col < EVTCHN_PER_ROW; col++) + evtchn_to_irq[row][col] = -1; +} + +static void clear_evtchn_to_irq_all(void) +{ + unsigned row; + + for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) { + if (evtchn_to_irq[row] == NULL) + continue; + clear_evtchn_to_irq_row(row); + } +} + +static int set_evtchn_to_irq(unsigned evtchn, unsigned irq) +{ + unsigned row; + unsigned col; + + if (evtchn >= xen_evtchn_max_channels()) + return -EINVAL; + + row = EVTCHN_ROW(evtchn); + col = EVTCHN_COL(evtchn); + + if (evtchn_to_irq[row] == NULL) { + /* Unallocated irq entries return -1 anyway */ + if (irq == -1) + return 0; + + evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL); + if (evtchn_to_irq[row] == NULL) + return -ENOMEM; + + clear_evtchn_to_irq_row(row); + } + + evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)] = irq; + return 0; +} + +int get_evtchn_to_irq(unsigned evtchn) +{ + if (evtchn >= xen_evtchn_max_channels()) + return -1; + if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL) + return -1; + return evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)]; +} + /* Get info for IRQ */ struct irq_info *info_for_irq(unsigned irq) { @@ -102,7 +161,7 @@ struct irq_info *info_for_irq(unsigned irq) static int xen_irq_info_common_setup(struct irq_info *info, unsigned irq, enum xen_irq_type type, - unsigned short evtchn, + unsigned evtchn, unsigned short cpu) { @@ -113,7 +172,8 @@ static int xen_irq_info_common_setup(struct irq_info *info, info->evtchn = evtchn; info->cpu = cpu; - evtchn_to_irq[evtchn] = irq; + if (set_evtchn_to_irq(evtchn, irq)) + return -ENOMEM; irq_clear_status_flags(irq, IRQ_NOREQUEST|IRQ_NOAUTOEN); @@ -121,7 +181,7 @@ static int xen_irq_info_common_setup(struct irq_info *info, } static int xen_irq_info_evtchn_setup(unsigned irq, - unsigned short evtchn) + unsigned evtchn) { struct irq_info *info = info_for_irq(irq); @@ -130,7 +190,7 @@ static int xen_irq_info_evtchn_setup(unsigned irq, static int xen_irq_info_ipi_setup(unsigned cpu, unsigned irq, - unsigned short evtchn, + unsigned evtchn, enum ipi_vector ipi) { struct irq_info *info = info_for_irq(irq); @@ -144,8 +204,8 @@ static int xen_irq_info_ipi_setup(unsigned cpu, static int xen_irq_info_virq_setup(unsigned cpu, unsigned irq, - unsigned short evtchn, - unsigned short virq) + unsigned evtchn, + unsigned virq) { struct irq_info *info = info_for_irq(irq); @@ -157,9 +217,9 @@ static int xen_irq_info_virq_setup(unsigned cpu, } static int xen_irq_info_pirq_setup(unsigned irq, - unsigned short evtchn, - unsigned short pirq, - unsigned short gsi, + unsigned evtchn, + unsigned pirq, + unsigned gsi, uint16_t domid, unsigned char flags) { @@ -186,7 +246,7 @@ static unsigned int evtchn_from_irq(unsigned irq) unsigned irq_from_evtchn(unsigned int evtchn) { - return evtchn_to_irq[evtchn]; + return get_evtchn_to_irq(evtchn); } EXPORT_SYMBOL_GPL(irq_from_evtchn); @@ -232,7 +292,7 @@ unsigned cpu_from_irq(unsigned irq) unsigned int cpu_from_evtchn(unsigned int evtchn) { - int irq = evtchn_to_irq[evtchn]; + int irq = get_evtchn_to_irq(evtchn); unsigned ret = 0; if (irq != -1) @@ -258,7 +318,7 @@ static bool pirq_needs_eoi_flag(unsigned irq) static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu) { - int irq = evtchn_to_irq[chn]; + int irq 
= get_evtchn_to_irq(chn); struct irq_info *info = info_for_irq(irq); BUG_ON(irq == -1); @@ -381,6 +441,18 @@ static void xen_free_irq(unsigned irq) irq_free_desc(irq); } +static void xen_evtchn_close(unsigned port) +{ + struct evtchn_close close; + + close.port = port; + if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) + BUG(); + + /* Closed ports are implicitly re-bound to VCPU0. */ + bind_evtchn_to_cpu(port, 0); +} + static void pirq_query_unmask(int irq) { struct physdev_irq_status_query irq_status; @@ -453,7 +525,13 @@ static unsigned int __startup_pirq(unsigned int irq) pirq_query_unmask(irq); - evtchn_to_irq[evtchn] = irq; + rc = set_evtchn_to_irq(evtchn, irq); + if (rc != 0) { + pr_err("irq%d: Failed to set port to irq mapping (%d)\n", + irq, rc); + xen_evtchn_close(evtchn); + return 0; + } bind_evtchn_to_cpu(evtchn, 0); info->evtchn = evtchn; @@ -471,10 +549,9 @@ static unsigned int startup_pirq(struct irq_data *data) static void shutdown_pirq(struct irq_data *data) { - struct evtchn_close close; unsigned int irq = data->irq; struct irq_info *info = info_for_irq(irq); - int evtchn = evtchn_from_irq(irq); + unsigned evtchn = evtchn_from_irq(irq); BUG_ON(info->type != IRQT_PIRQ); @@ -482,13 +559,8 @@ static void shutdown_pirq(struct irq_data *data) return; mask_evtchn(evtchn); - - close.port = evtchn; - if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) - BUG(); - - bind_evtchn_to_cpu(evtchn, 0); - evtchn_to_irq[evtchn] = -1; + xen_evtchn_close(evtchn); + set_evtchn_to_irq(evtchn, -1); info->evtchn = 0; } @@ -520,7 +592,6 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi); static void __unbind_from_irq(unsigned int irq) { - struct evtchn_close close; int evtchn = evtchn_from_irq(irq); struct irq_info *info = irq_get_handler_data(irq); @@ -531,9 +602,7 @@ static void __unbind_from_irq(unsigned int irq) } if (VALID_EVTCHN(evtchn)) { - close.port = evtchn; - if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) - BUG(); + xen_evtchn_close(evtchn); switch (type_from_irq(irq)) { case IRQT_VIRQ: @@ -548,10 +617,7 @@ static void __unbind_from_irq(unsigned int irq) break; } - /* Closed ports are implicitly re-bound to VCPU0. 
*/ - bind_evtchn_to_cpu(evtchn, 0); - - evtchn_to_irq[evtchn] = -1; + set_evtchn_to_irq(evtchn, -1); } BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); @@ -755,9 +821,12 @@ int bind_evtchn_to_irq(unsigned int evtchn) int irq; int ret; + if (evtchn >= xen_evtchn_max_channels()) + return -ENOMEM; + mutex_lock(&irq_mapping_update_lock); - irq = evtchn_to_irq[evtchn]; + irq = get_evtchn_to_irq(evtchn); if (irq == -1) { irq = xen_allocate_irq_dynamic(); @@ -847,7 +916,7 @@ static int find_virq(unsigned int virq, unsigned int cpu) int port, rc = -ENOENT; memset(&status, 0, sizeof(status)); - for (port = 0; port <= NR_EVENT_CHANNELS; port++) { + for (port = 0; port < xen_evtchn_max_channels(); port++) { status.dom = DOMID_SELF; status.port = port; rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status); @@ -1017,7 +1086,7 @@ EXPORT_SYMBOL_GPL(unbind_from_irqhandler); int evtchn_make_refcounted(unsigned int evtchn) { - int irq = evtchn_to_irq[evtchn]; + int irq = get_evtchn_to_irq(evtchn); struct irq_info *info; if (irq == -1) @@ -1042,12 +1111,12 @@ int evtchn_get(unsigned int evtchn) struct irq_info *info; int err = -ENOENT; - if (evtchn >= NR_EVENT_CHANNELS) + if (evtchn >= xen_evtchn_max_channels()) return -EINVAL; mutex_lock(&irq_mapping_update_lock); - irq = evtchn_to_irq[evtchn]; + irq = get_evtchn_to_irq(evtchn); if (irq == -1) goto done; @@ -1071,7 +1140,7 @@ EXPORT_SYMBOL_GPL(evtchn_get); void evtchn_put(unsigned int evtchn) { - int irq = evtchn_to_irq[evtchn]; + int irq = get_evtchn_to_irq(evtchn); if (WARN_ON(irq == -1)) return; unbind_from_irq(irq); @@ -1158,7 +1227,7 @@ void rebind_evtchn_irq(int evtchn, int irq) mutex_lock(&irq_mapping_update_lock); /* After resume the irq<->evtchn mappings are all cleared out */ - BUG_ON(evtchn_to_irq[evtchn] != -1); + BUG_ON(get_evtchn_to_irq(evtchn) != -1); /* Expect irq to have been bound before, so there should be a proper type */ BUG_ON(info->type == IRQT_UNBOUND); @@ -1443,15 +1512,14 @@ void xen_irq_resume(void) struct irq_info *info; /* New event-channel space is not ''live'' yet. */ - for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++) + for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++) mask_evtchn(evtchn); /* No IRQ <-> event-channel mappings. */ list_for_each_entry(info, &xen_irq_list_head, list) info->evtchn = 0; /* zap event-channel binding */ - for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++) - evtchn_to_irq[evtchn] = -1; + clear_evtchn_to_irq_all(); for_each_possible_cpu(cpu) { restore_cpu_virqs(cpu); @@ -1548,14 +1616,12 @@ void __init xen_init_IRQ(void) xen_evtchn_2l_init(); - evtchn_to_irq = kcalloc(NR_EVENT_CHANNELS, sizeof(*evtchn_to_irq), - GFP_KERNEL); + evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()), + sizeof(*evtchn_to_irq), GFP_KERNEL); BUG_ON(!evtchn_to_irq); - for (i = 0; i < NR_EVENT_CHANNELS; i++) - evtchn_to_irq[i] = -1; /* No event channels are ''live'' right now. 
*/ - for (i = 0; i < NR_EVENT_CHANNELS; i++) + for (i = 0; i < xen_evtchn_nr_channels(); i++) mask_evtchn(i); pirq_needs_eoi = pirq_needs_eoi_flag; diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c index 5f27c72..0d79a63 100644 --- a/drivers/xen/events/events_2l.c +++ b/drivers/xen/events/events_2l.c @@ -39,6 +39,11 @@ static DEFINE_PER_CPU(xen_ulong_t [NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD], cpu_evtchn_mask); +static unsigned evtchn_2l_max_channels(void) +{ + return NR_EVENT_CHANNELS; +} + static void evtchn_2l_bind_to_cpu(struct irq_info *info, unsigned cpu) { clear_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, info->cpu))); @@ -212,7 +217,7 @@ static void evtchn_2l_handle_events(unsigned cpu) /* Process port. */ port = (word_idx * BITS_PER_EVTCHN_WORD) + bit_idx; - irq = evtchn_to_irq[port]; + irq = get_evtchn_to_irq(port); if (irq != -1) { desc = irq_to_desc(irq); @@ -306,7 +311,7 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id) int word_idx = i / BITS_PER_EVTCHN_WORD; printk(" %d: event %d -> irq %d%s%s%s\n", cpu_from_evtchn(i), i, - evtchn_to_irq[i], + get_evtchn_to_irq(i), sync_test_bit(word_idx, BM(&v->evtchn_pending_sel)) ? "" : " l2-clear", !sync_test_bit(i, BM(sh->evtchn_mask)) @@ -322,6 +327,8 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id) } static const struct evtchn_ops evtchn_ops_2l = { + .max_channels = evtchn_2l_max_channels, + .nr_channels = evtchn_2l_max_channels, .bind_to_cpu = evtchn_2l_bind_to_cpu, .clear_pending = evtchn_2l_clear_pending, .set_pending = evtchn_2l_set_pending, diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h index dc96502..a3d9aec 100644 --- a/drivers/xen/events/events_internal.h +++ b/drivers/xen/events/events_internal.h @@ -35,7 +35,7 @@ struct irq_info { int refcnt; enum xen_irq_type type; /* type */ unsigned irq; - unsigned short evtchn; /* event channel */ + unsigned int evtchn; /* event channel */ unsigned short cpu; /* cpu bound */ union { @@ -55,6 +55,9 @@ struct irq_info { #define PIRQ_SHAREABLE (1 << 1) struct evtchn_ops { + unsigned (*max_channels)(void); + unsigned (*nr_channels)(void); + int (*setup)(struct irq_info *info); void (*bind_to_cpu)(struct irq_info *info, unsigned cpu); @@ -70,12 +73,23 @@ struct evtchn_ops { extern const struct evtchn_ops *evtchn_ops; -extern int *evtchn_to_irq; +extern int **evtchn_to_irq; +int get_evtchn_to_irq(unsigned int evtchn); struct irq_info *info_for_irq(unsigned irq); unsigned cpu_from_irq(unsigned irq); unsigned cpu_from_evtchn(unsigned int evtchn); +static inline unsigned xen_evtchn_max_channels(void) +{ + return evtchn_ops->max_channels(); +} + +static inline unsigned xen_evtchn_nr_channels(void) +{ + return evtchn_ops->nr_channels(); +} + /* * Do any ABI specific setup for a bound event channel before it can * be unmasked and used. -- 1.7.2.5
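To make the new two-level mapping concrete, here is a minimal userspace model of the lookup and the on-demand row allocation. The PAGE_SIZE and EVTCHN_* definitions below are assumptions modelled on the patch (the real macros live in the part of the patch not quoted above); the point of the scheme is that a row is only allocated once a port in it is bound, so memory use tracks the number of ports actually in use rather than the ABI maximum.

#include <stdlib.h>

#define PAGE_SIZE      4096                       /* assumed */
#define EVTCHN_PER_ROW (PAGE_SIZE / sizeof(int))  /* assumed */
#define EVTCHN_ROW(e)  ((e) / EVTCHN_PER_ROW)     /* assumed */
#define EVTCHN_COL(e)  ((e) % EVTCHN_PER_ROW)     /* assumed */

static int *evtchn_to_irq[128];    /* enough rows for 2^17 ports */

/* Model of set_evtchn_to_irq(): allocate the row on first use and
 * fill it with -1 ("unbound") before storing the new mapping. */
static int model_set_evtchn_to_irq(unsigned evtchn, int irq)
{
    unsigned row = EVTCHN_ROW(evtchn);
    unsigned col;

    if (evtchn_to_irq[row] == NULL) {
        if (irq == -1)
            return 0;    /* unallocated rows already read as -1 */
        evtchn_to_irq[row] = malloc(PAGE_SIZE);
        if (evtchn_to_irq[row] == NULL)
            return -1;
        for (col = 0; col < EVTCHN_PER_ROW; col++)
            evtchn_to_irq[row][col] = -1;
    }
    evtchn_to_irq[row][EVTCHN_COL(evtchn)] = irq;
    return 0;
}

/* Model of get_evtchn_to_irq(): a missing row means "no irq bound". */
static int model_get_evtchn_to_irq(unsigned evtchn)
{
    if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL)
        return -1;
    return evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)];
}

With the FIFO ABI's 2^17 ports, a flat int array would always cost 128 pages up front; with this scheme a guest that binds only a few hundred ports touches a single row page.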
From: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/xen/events/events.c | 18 +++++++++++------- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index 636079b..0ebb10c 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -331,6 +331,14 @@ static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu) info->cpu = cpu; } +static void xen_evtchn_mask_all(void) +{ + unsigned int evtchn; + + for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++) + mask_evtchn(evtchn); +} + /** * notify_remote_via_irq - send event to remote end of event channel via irq * @irq: irq of event channel to send event to @@ -1508,12 +1516,11 @@ EXPORT_SYMBOL_GPL(xen_test_irq_shared); void xen_irq_resume(void) { - unsigned int cpu, evtchn; + unsigned int cpu; struct irq_info *info; /* New event-channel space is not ''live'' yet. */ - for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++) - mask_evtchn(evtchn); + xen_evtchn_mask_all(); /* No IRQ <-> event-channel mappings. */ list_for_each_entry(info, &xen_irq_list_head, list) @@ -1612,8 +1619,6 @@ void xen_callback_vector(void) {} void __init xen_init_IRQ(void) { - int i; - xen_evtchn_2l_init(); evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()), @@ -1621,8 +1626,7 @@ void __init xen_init_IRQ(void) BUG_ON(!evtchn_to_irq); /* No event channels are ''live'' right now. */ - for (i = 0; i < xen_evtchn_nr_channels(); i++) - mask_evtchn(i); + xen_evtchn_mask_all(); pirq_needs_eoi = pirq_needs_eoi_flag; -- 1.7.2.5
From: David Vrabel <david.vrabel@citrix.com> Remove the check during unbind for NR_EVENT_CHANNELS as this limits support to less than 4096 ports. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/evtchn.c | 4 ---- 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index 8b3a69a..1b0fc44 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -416,10 +416,6 @@ static long evtchn_ioctl(struct file *file, if (copy_from_user(&unbind, uarg, sizeof(unbind))) break; - rc = -EINVAL; - if (unbind.port >= NR_EVENT_CHANNELS) - break; - rc = -ENOTCONN; evtchn = find_evtchn(u, unbind.port); if (!evtchn) -- 1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 13/16] xen/events: Add the hypervisor interface for the FIFO-based event channels
From: David Vrabel <david.vrabel@citrix.com> Add the hypercall sub-ops and the structures for the shared data used in the FIFO-based event channel ABI. The design document for this new ABI is available here: http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU. Key advantages of this new ABI include: - Support for over 100,000 events (2^17). - 16 different event priorities. - Improved fairness in event latency through the use of FIFOs. The ABI is available in Xen 4.4 and later. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/events/events_2l.c | 8 ++-- include/xen/interface/event_channel.h | 67 +++++++++++++++++++++++++++++++++ include/xen/interface/xen.h | 6 --- 3 files changed, 71 insertions(+), 10 deletions(-) diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c index 0d79a63..ef9faf1 100644 --- a/drivers/xen/events/events_2l.c +++ b/drivers/xen/events/events_2l.c @@ -36,12 +36,12 @@ /* Find the first set bit in a evtchn mask */ #define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_EVTCHN_WORD) -static DEFINE_PER_CPU(xen_ulong_t [NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD], +static DEFINE_PER_CPU(xen_ulong_t [EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD], cpu_evtchn_mask); static unsigned evtchn_2l_max_channels(void) { - return NR_EVENT_CHANNELS; + return EVTCHN_2L_NR_CHANNELS; } static void evtchn_2l_bind_to_cpu(struct irq_info *info, unsigned cpu) @@ -290,7 +290,7 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id) i % 8 == 0 ? "\n " : " "); printk("\nlocal cpu%d mask:\n ", cpu); - for (i = (NR_EVENT_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--) + for (i = (EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--) printk("%0*"PRI_xen_ulong"%s", (int)(sizeof(cpu_evtchn[0])*2), cpu_evtchn[i], i % 8 == 0 ? "\n " : " "); @@ -306,7 +306,7 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id) } printk("\npending list:\n"); - for (i = 0; i < NR_EVENT_CHANNELS; i++) { + for (i = 0; i < EVTCHN_2L_NR_CHANNELS; i++) { if (sync_test_bit(i, BM(sh->evtchn_pending))) { int word_idx = i / BITS_PER_EVTCHN_WORD; printk(" %d: event %d -> irq %d%s%s%s\n", diff --git a/include/xen/interface/event_channel.h b/include/xen/interface/event_channel.h index f494292..0cd1e73 100644 --- a/include/xen/interface/event_channel.h +++ b/include/xen/interface/event_channel.h @@ -190,6 +190,39 @@ struct evtchn_reset { }; typedef struct evtchn_reset evtchn_reset_t; +/* + * EVTCHNOP_init_control: initialize the control block for the FIFO ABI. + */ +#define EVTCHNOP_init_control 11 +struct evtchn_init_control { + /* IN parameters. */ + uint64_t control_gfn; + uint32_t offset; + uint32_t vcpu; + /* OUT parameters. */ + uint8_t link_bits; + uint8_t _pad[7]; +}; + +/* + * EVTCHNOP_expand_array: add an additional page to the event array. + */ +#define EVTCHNOP_expand_array 12 +struct evtchn_expand_array { + /* IN parameters. */ + uint64_t array_gfn; +}; + +/* + * EVTCHNOP_set_priority: set the priority for an event channel. + */ +#define EVTCHNOP_set_priority 13 +struct evtchn_set_priority { + /* IN parameters. 
*/ + uint32_t port; + uint32_t priority; +}; + struct evtchn_op { uint32_t cmd; /* EVTCHNOP_* */ union { @@ -207,4 +240,38 @@ struct evtchn_op { }; DEFINE_GUEST_HANDLE_STRUCT(evtchn_op); +/* + * 2-level ABI + */ + +#define EVTCHN_2L_NR_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) + +/* + * FIFO ABI + */ + +/* Events may have priorities from 0 (highest) to 15 (lowest). */ +#define EVTCHN_FIFO_PRIORITY_MAX 0 +#define EVTCHN_FIFO_PRIORITY_DEFAULT 7 +#define EVTCHN_FIFO_PRIORITY_MIN 15 + +#define EVTCHN_FIFO_MAX_QUEUES (EVTCHN_FIFO_PRIORITY_MIN + 1) + +typedef uint32_t event_word_t; + +#define EVTCHN_FIFO_PENDING 31 +#define EVTCHN_FIFO_MASKED 30 +#define EVTCHN_FIFO_LINKED 29 + +#define EVTCHN_FIFO_LINK_BITS 17 +#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1) + +#define EVTCHN_FIFO_NR_CHANNELS (1 << EVTCHN_FIFO_LINK_BITS) + +struct evtchn_fifo_control_block { + uint32_t ready; + uint32_t _rsvd; + event_word_t head[EVTCHN_FIFO_MAX_QUEUES]; +}; + #endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */ diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h index 53ec416..0cd5ca3 100644 --- a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h @@ -281,12 +281,6 @@ struct multicall_entry { }; DEFINE_GUEST_HANDLE_STRUCT(multicall_entry); -/* - * Event channel endpoints per domain: - * 1024 if a long is 32 bits; 4096 if a long is 64 bits. - */ -#define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64) - struct vcpu_time_info { /* * Updates to the following values are preceded and followed -- 1.7.2.5
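To see how the fields of an event word fit together, here is a small self-contained sketch (ordinary host-side C, not guest code) that builds and decodes a word using the constants defined by the patch above:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t event_word_t;

/* Bit positions and LINK field from the ABI header added above. */
#define EVTCHN_FIFO_PENDING   31
#define EVTCHN_FIFO_MASKED    30
#define EVTCHN_FIFO_LINKED    29
#define EVTCHN_FIFO_LINK_BITS 17
#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1)

int main(void)
{
    /* A pending, unmasked event whose queue entry links to port 42. */
    event_word_t w = (1u << EVTCHN_FIFO_PENDING)
                   | (1u << EVTCHN_FIFO_LINKED)
                   | 42;

    printf("pending=%d masked=%d linked=%d link=%u\n",
           !!(w & (1u << EVTCHN_FIFO_PENDING)),
           !!(w & (1u << EVTCHN_FIFO_MASKED)),
           !!(w & (1u << EVTCHN_FIFO_LINKED)),
           w & EVTCHN_FIFO_LINK_MASK);
    /* Prints: pending=1 masked=0 linked=1 link=42 */
    return 0;
}

The LINK field occupies the low 17 bits, which is where the 2^17 (131,072) channel limit comes from: a link must be able to name any other port in the queue.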
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 14/16] xen/events: allow event channel priority to be set
From: David Vrabel <david.vrabel@citrix.com> Add xen_irq_set_priority() to set an event channels priority. This function will only work with event channel ABIs that support priority (i.e., the FIFO-based ABI). Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/events/events.c | 17 +++++++++++++++++ include/xen/events.h | 5 +++++ 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index 0ebb10c..fd65485 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -1092,6 +1092,23 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +/** + * xen_set_irq_priority() - set an event channel priority. + * @irq:irq bound to an event channel. + * @priority: priority between XEN_IRQ_PRIORITY_MAX and XEN_IRQ_PRIORITY_MIN. + */ +int xen_set_irq_priority(unsigned irq, unsigned priority) +{ + struct evtchn_set_priority set_priority; + + set_priority.port = evtchn_from_irq(irq); + set_priority.priority = priority; + + return HYPERVISOR_event_channel_op(EVTCHNOP_set_priority, + &set_priority); +} +EXPORT_SYMBOL_GPL(xen_set_irq_priority); + int evtchn_make_refcounted(unsigned int evtchn) { int irq = get_evtchn_to_irq(evtchn); diff --git a/include/xen/events.h b/include/xen/events.h index c9ea10e..d265d7e 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,11 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +#define XEN_IRQ_PRIORITY_MAX EVTCHN_FIFO_PRIORITY_MAX +#define XEN_IRQ_PRIORITY_DEFAULT EVTCHN_FIFO_PRIORITY_DEFAULT +#define XEN_IRQ_PRIORITY_MIN EVTCHN_FIFO_PRIORITY_MIN +int xen_set_irq_priority(unsigned irq, unsigned priority); + /* * Allow extra references to event channels exposed to userspace by evtchn */ -- 1.7.2.5
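As a usage sketch, a driver that wants low-latency delivery for one of its event channels might do something like the following; xen_set_irq_priority() and XEN_IRQ_PRIORITY_MAX are the interfaces added by this patch, while the surrounding helper is hypothetical:

#include <linux/printk.h>
#include <xen/events.h>

/*
 * Hypothetical driver helper: ask for the highest event priority for
 * an irq that is already bound to an event channel.  The call fails
 * harmlessly if the ABI in use has no notion of priority.
 */
static void example_boost_evtchn_priority(unsigned irq)
{
        int ret = xen_set_irq_priority(irq, XEN_IRQ_PRIORITY_MAX);

        if (ret < 0)
                pr_info("xen: no event priority support for irq %u (%d)\n",
                        irq, ret);
}

On an event channel ABI without priority support the hypercall simply fails and the event keeps its default behaviour, which is why a caller may also choose to ignore the return value, as the timer patch that follows does.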
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 15/16] xen/x86: set VIRQ_TIMER priority to maximum
From: David Vrabel <david.vrabel@citrix.com> Commit bee980d9e (xen/events: Handle VIRQ_TIMER before any other hardirq in event loop) effectively made the VIRQ_TIMER the highest priority event when using the 2-level ABI. Set the VIRQ_TIMER priority to the highest so this behaviour is retained when using the FIFO-based ABI. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- arch/x86/xen/time.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index ee36589..10e11c1 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -447,6 +447,7 @@ void xen_setup_timer(int cpu) IRQF_NOBALANCING|IRQF_TIMER| IRQF_FORCE_RESUME, name, NULL); + (void)xen_set_irq_priority(irq, XEN_IRQ_PRIORITY_MAX); memcpy(evt, xen_clockevent, sizeof(*evt)); -- 1.7.2.5
David Vrabel
2013-Oct-08 12:49 UTC
[PATCH 16/16] xen/events: use the FIFO-based ABI if available
From: David Vrabel <david.vrabel@citrix.com> Implement all the event channel port ops for the FIFO-based ABI. If the hypervisor supports the FIFO-based ABI, enable it by initializing the control block for the boot VCPU and subsequent VCPUs as they are brought up and on resume. The event array is expanded as required when event ports are setup. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- drivers/xen/events/Makefile | 1 + drivers/xen/events/events.c | 9 +- drivers/xen/events/events_fifo.c | 409 ++++++++++++++++++++++++++++++++++ drivers/xen/events/events_internal.h | 8 + 4 files changed, 426 insertions(+), 1 deletions(-) create mode 100644 drivers/xen/events/events_fifo.c diff --git a/drivers/xen/events/Makefile b/drivers/xen/events/Makefile index 3bed6e0..d0f1581 100644 --- a/drivers/xen/events/Makefile +++ b/drivers/xen/events/Makefile @@ -1,2 +1,3 @@ obj-y += events.o obj-y += events_2l.o +obj-y += events_fifo.o diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c index fd65485..dec4d6d 100644 --- a/drivers/xen/events/events.c +++ b/drivers/xen/events/events.c @@ -1538,6 +1538,7 @@ void xen_irq_resume(void) /* New event-channel space is not ''live'' yet. */ xen_evtchn_mask_all(); + xen_evtchn_resume(); /* No IRQ <-> event-channel mappings. */ list_for_each_entry(info, &xen_irq_list_head, list) @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {} void __init xen_init_IRQ(void) { - xen_evtchn_2l_init(); + int ret; + + ret = xen_evtchn_fifo_init(); + if (ret < 0) { + printk(KERN_INFO "xen: falling back to n-level event channels"); + xen_evtchn_2l_init(); + } evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()), sizeof(*evtchn_to_irq), GFP_KERNEL); diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c new file mode 100644 index 0000000..c109639 --- /dev/null +++ b/drivers/xen/events/events_fifo.c @@ -0,0 +1,409 @@ +/* + * Xen event channels (FIFO-based ABI) + * + * Copyright (C) 2013 Citrix Systems R&D ltd. + * + * This source code is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License, or (at your option) any later version. + * + * Or, when distributed separately from the Linux kernel or + * incorporated into other software packages, subject to the following + * license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#include <linux/linkage.h> +#include <linux/interrupt.h> +#include <linux/irq.h> +#include <linux/module.h> +#include <linux/smp.h> +#include <linux/percpu.h> +#include <linux/cpu.h> + +#include <asm/sync_bitops.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/hypervisor.h> +#include <asm/xen/page.h> + +#include <xen/xen.h> +#include <xen/xen-ops.h> +#include <xen/events.h> +#include <xen/interface/xen.h> +#include <xen/interface/event_channel.h> + +#include "events_internal.h" + +#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t)) +#define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE) + +struct evtchn_fifo_queue { + uint32_t head[EVTCHN_FIFO_MAX_QUEUES]; +}; + +static DEFINE_PER_CPU(struct evtchn_fifo_control_block *, cpu_control_block); +static DEFINE_PER_CPU(struct evtchn_fifo_queue, cpu_queue); +static event_word_t *event_array[MAX_EVENT_ARRAY_PAGES] __read_mostly; +static unsigned event_array_pages __read_mostly; + +#define BM(w) ((unsigned long *)(w)) + +static inline event_word_t *event_word_from_port(unsigned port) +{ + unsigned i = port / EVENT_WORDS_PER_PAGE; + + if (i >= event_array_pages) + return NULL; + return event_array[i] + port % EVENT_WORDS_PER_PAGE; +} + +static unsigned evtchn_fifo_max_channels(void) +{ + return EVTCHN_FIFO_NR_CHANNELS; +} + +static unsigned evtchn_fifo_nr_channels(void) +{ + return event_array_pages * EVENT_WORDS_PER_PAGE; +} + +static void free_unused_array_pages(void) +{ + unsigned i; + + for (i = event_array_pages; i < MAX_EVENT_ARRAY_PAGES; i++) { + if (!event_array[i]) + break; + free_page((unsigned long)event_array[i]); + event_array[i] = NULL; + } +} + +static void init_array_page(event_word_t *array_page) +{ + unsigned i; + + for (i = 0; i < EVENT_WORDS_PER_PAGE; i++) + array_page[i] = 1 << EVTCHN_FIFO_MASKED; +} + +static int evtchn_fifo_setup(struct irq_info *info) +{ + unsigned port = info->evtchn; + unsigned i; + int ret = -ENOMEM; + + i = port / EVENT_WORDS_PER_PAGE; + + if (i >= MAX_EVENT_ARRAY_PAGES) + return -EINVAL; + + while (i >= event_array_pages) { + void *array_page; + struct evtchn_expand_array expand_array; + + /* Might already have a page if we''ve resumed. */ + array_page = event_array[event_array_pages]; + if (!array_page) { + array_page = (void *)__get_free_page(GFP_KERNEL); + if (array_page == NULL) + goto error; + event_array[event_array_pages] = array_page; + } + + /* Mask all events in this page before adding it. 
*/ + init_array_page(array_page); + + expand_array.array_gfn = virt_to_mfn(array_page); + + ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array, &expand_array); + if (ret < 0) + goto error; + + event_array_pages++; + } + return 0; + + error: + if (event_array_pages == 0) + panic("xen: unable to expand event array with initial page (%d)\n", ret); + else + printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret); + free_unused_array_pages(); + return ret; +} + +static void evtchn_fifo_bind_to_cpu(struct irq_info *info, unsigned cpu) +{ + /* no-op */ +} + +static void evtchn_fifo_clear_pending(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + sync_clear_bit(EVTCHN_FIFO_PENDING, BM(word)); +} + +static void evtchn_fifo_set_pending(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + sync_set_bit(EVTCHN_FIFO_PENDING, BM(word)); +} + +static bool evtchn_fifo_is_pending(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + return sync_test_bit(EVTCHN_FIFO_PENDING, BM(word)); +} + +static bool evtchn_fifo_test_and_set_mask(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + return sync_test_and_set_bit(EVTCHN_FIFO_MASKED, BM(word)); +} + +static void evtchn_fifo_mask(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + if (word) + sync_set_bit(EVTCHN_FIFO_MASKED, BM(word)); +} + +static void evtchn_fifo_unmask(unsigned port) +{ + event_word_t *word = event_word_from_port(port); + + BUG_ON(!irqs_disabled()); + + sync_clear_bit(EVTCHN_FIFO_MASKED, BM(word)); + if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) { + struct evtchn_unmask unmask = { .port = port }; + (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask); + } +} + +static uint32_t clear_linked(volatile event_word_t *word) +{ + event_word_t new, old, w; + + w = *word; + + do { + old = w; + new = (w & ~((1 << EVTCHN_FIFO_LINKED) + | EVTCHN_FIFO_LINK_MASK)); + } while ((w = sync_cmpxchg(word, old, new)) != old); + + return w & EVTCHN_FIFO_LINK_MASK; +} + +static void handle_irq_for_port(unsigned port) +{ + int irq; + struct irq_desc *desc; + + irq = get_evtchn_to_irq(port); + if (irq != -1) { + desc = irq_to_desc(irq); + if (desc) + generic_handle_irq_desc(irq, desc); + } +} + +static void consume_one_event(unsigned cpu, + struct evtchn_fifo_control_block *control_block, + unsigned priority, uint32_t *ready) +{ + struct evtchn_fifo_queue *q = &per_cpu(cpu_queue, cpu); + uint32_t head; + unsigned port; + event_word_t *word; + + head = q->head[priority]; + + /* Reached the tail last time? Read the new HEAD from the + control block. */ + if (head == 0) { + rmb(); /* Ensure word is up-to-date before reading head. */ + head = control_block->head[priority]; + } + + port = head; + word = event_word_from_port(port); + head = clear_linked(word); + + /* + * If the link is non-zero, there are more events in the + * queue, otherwise the queue is empty. + * + * If the queue is empty, clear this priority from our local + * copy of the ready word. 
+ */ + if (head == 0) + clear_bit(priority, BM(ready)); + + if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word)) + && !sync_test_bit(EVTCHN_FIFO_MASKED, BM(word))) + handle_irq_for_port(port); + + q->head[priority] = head; +} + +static void evtchn_fifo_handle_events(unsigned cpu) +{ + struct evtchn_fifo_control_block *control_block; + uint32_t ready; + unsigned q; + + control_block = per_cpu(cpu_control_block, cpu); + + ready = xchg(&control_block->ready, 0); + + while (ready) { + q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES); + consume_one_event(cpu, control_block, q, &ready); + ready |= xchg(&control_block->ready, 0); + } +} + +static void evtchn_fifo_resume(void) +{ + unsigned cpu; + + for_each_possible_cpu(cpu) { + void *control_block = per_cpu(cpu_control_block, cpu); + struct evtchn_init_control init_control; + int ret; + + if (!control_block) + continue; + + /* + * If this CPU is offline, take the opportunity to + * free the control block while it is not being + * used. + */ + if (!cpu_online(cpu)) { + free_page((unsigned long)control_block); + per_cpu(cpu_control_block, cpu) = NULL; + continue; + } + + init_control.control_gfn = virt_to_mfn(control_block); + init_control.offset = 0; + init_control.vcpu = cpu; + + ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control, + &init_control); + if (ret < 0) + BUG(); + } + + /* + * The event array starts out as empty again and is extended + * as normal when events are bound. The existing pages will + * be reused. + */ + event_array_pages = 0; +} + +static const struct evtchn_ops evtchn_ops_fifo = { + .max_channels = evtchn_fifo_max_channels, + .nr_channels = evtchn_fifo_nr_channels, + .setup = evtchn_fifo_setup, + .bind_to_cpu = evtchn_fifo_bind_to_cpu, + .clear_pending = evtchn_fifo_clear_pending, + .set_pending = evtchn_fifo_set_pending, + .is_pending = evtchn_fifo_is_pending, + .test_and_set_mask = evtchn_fifo_test_and_set_mask, + .mask = evtchn_fifo_mask, + .unmask = evtchn_fifo_unmask, + .handle_events = evtchn_fifo_handle_events, + .resume = evtchn_fifo_resume, +}; + +static int __cpuinit evtchn_fifo_init_control_block(unsigned cpu) +{ + struct page *control_block = NULL; + struct evtchn_init_control init_control; + int ret = -ENOMEM; + + control_block = alloc_page(GFP_KERNEL|__GFP_ZERO); + if (control_block == NULL) + goto error; + + init_control.control_gfn = virt_to_mfn(page_address(control_block)); + init_control.offset = 0; + init_control.vcpu = cpu; + + ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control, &init_control); + if (ret < 0) + goto error; + + per_cpu(cpu_control_block, cpu) = page_address(control_block); + + return 0; + + error: + __free_page(control_block); + return ret; +} + +static int __cpuinit evtchn_fifo_cpu_notification(struct notifier_block *self, + unsigned long action, + void *hcpu) +{ + int cpu = (long)hcpu; + int ret = 0; + + switch (action) { + case CPU_UP_PREPARE: + if (!per_cpu(cpu_control_block, cpu)) + ret = evtchn_fifo_init_control_block(cpu); + break; + default: + break; + } + return ret < 0 ? 
NOTIFY_BAD : NOTIFY_OK; +} + +static struct notifier_block evtchn_fifo_cpu_notifier __cpuinitdata = { + .notifier_call = evtchn_fifo_cpu_notification, +}; + +int __init xen_evtchn_fifo_init(void) +{ + int cpu = get_cpu(); + int ret; + + ret = evtchn_fifo_init_control_block(cpu); + if (ret < 0) + goto out; + + printk(KERN_INFO "xen: switching to FIFO-based event channels\n"); + + evtchn_ops = &evtchn_ops_fifo; + + register_cpu_notifier(&evtchn_fifo_cpu_notifier); +out: + put_cpu(); + return ret; +} diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h index a3d9aec..0af518f 100644 --- a/drivers/xen/events/events_internal.h +++ b/drivers/xen/events/events_internal.h @@ -69,6 +69,7 @@ struct evtchn_ops { void (*unmask)(unsigned port); void (*handle_events)(unsigned cpu); + void (*resume)(void); }; extern const struct evtchn_ops *evtchn_ops; @@ -142,6 +143,13 @@ static inline void xen_evtchn_handle_events(unsigned cpu) return evtchn_ops->handle_events(cpu); } +static inline void xen_evtchn_resume(void) +{ + if (evtchn_ops->resume) + evtchn_ops->resume(); +} + void xen_evtchn_2l_init(void); +int xen_evtchn_fifo_init(void); #endif /* #ifndef __EVENTS_INTERNAL_H__ */ -- 1.7.2.5
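To illustrate how the LINK fields chain pending events into a per-priority queue, here is a toy single-threaded model of the consume loop; the in-kernel clear_linked() performs the same update atomically with sync_cmpxchg(), and the ports and queue contents below are made up:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t event_word_t;

#define EVTCHN_FIFO_LINKED    29
#define EVTCHN_FIFO_LINK_MASK ((1u << 17) - 1)

/* Single-threaded model of clear_linked(): strip LINKED and the LINK
 * field, returning the old link (the next port in the queue, or 0 if
 * this was the tail). */
static uint32_t model_clear_linked(event_word_t *word)
{
    event_word_t old = *word;

    *word = old & ~((1u << EVTCHN_FIFO_LINKED) | EVTCHN_FIFO_LINK_MASK);
    return old & EVTCHN_FIFO_LINK_MASK;
}

int main(void)
{
    event_word_t words[16] = { 0 };
    uint32_t port;

    /* A queue holding ports 5 and 9: HEAD -> 5 -> 9 -> (end). */
    words[5] = (1u << EVTCHN_FIFO_LINKED) | 9;
    words[9] = (1u << EVTCHN_FIFO_LINKED) | 0;

    /* Walk the queue as consume_one_event() does: a link of 0 means
     * this was the last event in the queue. */
    for (port = 5; port != 0; port = model_clear_linked(&words[port]))
        printf("consumed port %u\n", port);

    return 0;
}

An event is unlinked before its handler runs, so if it is re-raised while being handled Xen can link it onto the queue tail again rather than losing it.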
David Vrabel
2013-Oct-14 13:41 UTC
Re: [PATCHv6 00/16] Linux: FIFO-based event channel ABI
On 08/10/13 13:48, David Vrabel wrote:> This is the complete Linux guest-side implementation of the FIFO-based > event channel ABI described in this design document: > > http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf > > Refer also to the Xen series.

The Xen series has now been merged. This Linux series is still missing a number of Reviewed-by or Acked-by tags.

Konrad, Boris, could you review, please?

David
Boris Ostrovsky
2013-Oct-14 15:59 UTC
Re: [PATCH 01/16] xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn()
On 10/08/2013 08:49 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > These two function did the same thing with different parameters, put > the common bits in retrigger_evtchn(). > > This changes the return value of resend_irq_on_evtchn() but the only > caller (in arch/ia64/xen/irq_xen.c) ignored the return value so this > is fine.Not directly related to this patch, but why do we still have arch/ia64/xen? Is anyone testing it? I only see a few superficial changes in the log for last year, and those appear to be result of code inspection or some global changes, not actual execution. -boris> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/xen/events.c | 27 +++++++++------------------ > 1 files changed, 9 insertions(+), 18 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index 4035e83..ddcdbb5 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -1558,13 +1558,13 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest, > return rebind_irq_to_cpu(data->irq, tcpu); > } > > -int resend_irq_on_evtchn(unsigned int irq) > +static int retrigger_evtchn(int evtchn) > { > - int masked, evtchn = evtchn_from_irq(irq); > + int masked; > struct shared_info *s = HYPERVISOR_shared_info; > > if (!VALID_EVTCHN(evtchn)) > - return 1; > + return 0; > > masked = sync_test_and_set_bit(evtchn, BM(s->evtchn_mask)); > sync_set_bit(evtchn, BM(s->evtchn_pending)); > @@ -1574,6 +1574,11 @@ int resend_irq_on_evtchn(unsigned int irq) > return 1; > } > > +int resend_irq_on_evtchn(unsigned int irq) > +{ > + return retrigger_evtchn(evtchn_from_irq(irq)); > +} > + > static void enable_dynirq(struct irq_data *data) > { > int evtchn = evtchn_from_irq(data->irq); > @@ -1608,21 +1613,7 @@ static void mask_ack_dynirq(struct irq_data *data) > > static int retrigger_dynirq(struct irq_data *data) > { > - int evtchn = evtchn_from_irq(data->irq); > - struct shared_info *sh = HYPERVISOR_shared_info; > - int ret = 0; > - > - if (VALID_EVTCHN(evtchn)) { > - int masked; > - > - masked = sync_test_and_set_bit(evtchn, BM(sh->evtchn_mask)); > - sync_set_bit(evtchn, BM(sh->evtchn_pending)); > - if (!masked) > - unmask_evtchn(evtchn); > - ret = 1; > - } > - > - return ret; > + return retrigger_evtchn(evtchn_from_irq(data->irq)); > } > > static void restore_pirqs(void)
Boris Ostrovsky
2013-Oct-14 16:30 UTC
Re: [PATCH 04/16] xen/events: replace raw bit ops with functions
On 10/08/2013 08:49 AM, David Vrabel wrote:> From: Wei Liu <wei.liu2@citrix.com> > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/xen/events.c | 3 +-- > 1 files changed, 1 insertions(+), 2 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index 359e983..fec5da4 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -1548,13 +1548,12 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest, > static int retrigger_evtchn(int evtchn) > { > int masked; > - struct shared_info *s = HYPERVISOR_shared_info; > > if (!VALID_EVTCHN(evtchn)) > return 0; > > masked = test_and_set_mask(evtchn); > - sync_set_bit(evtchn, BM(s->evtchn_pending)); > + set_evtchn(evtchn); > if (!masked) > unmask_evtchn(evtchn);There are other places in this file that manipulate evtchn_pending bits (e.g. in unmask_evtchn()). As long as this change is made, those places should probably be updated as well. You also need a commit message for this and previous patch. -boris
David Vrabel
2013-Oct-14 16:35 UTC
Re: [PATCH 01/16] xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn()
On 14/10/13 16:59, Boris Ostrovsky wrote:> On 10/08/2013 08:49 AM, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> These two function did the same thing with different parameters, put >> the common bits in retrigger_evtchn(). >> >> This changes the return value of resend_irq_on_evtchn() but the only >> caller (in arch/ia64/xen/irq_xen.c) ignored the return value so this >> is fine. > > Not directly related to this patch, but why do we still have > arch/ia64/xen? Is anyone testing it? I only see a few superficial > changes in the log for last year, and those appear to be result of code > inspection or some global changes, not actual execution.

I was going to bring it up at the Developer's meeting. I think we should think about removing it.

David
David Vrabel
2013-Oct-14 16:43 UTC
Re: [PATCH 04/16] xen/events: replace raw bit ops with functions
On 14/10/13 17:30, Boris Ostrovsky wrote:> On 10/08/2013 08:49 AM, David Vrabel wrote: >> From: Wei Liu <wei.liu2@citrix.com> >> >> Signed-off-by: Wei Liu <wei.liu2@citrix.com> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> --- >> drivers/xen/events.c | 3 +-- >> 1 files changed, 1 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/xen/events.c b/drivers/xen/events.c >> index 359e983..fec5da4 100644 >> --- a/drivers/xen/events.c >> +++ b/drivers/xen/events.c >> @@ -1548,13 +1548,12 @@ static int set_affinity_irq(struct irq_data >> *data, const struct cpumask *dest, >> static int retrigger_evtchn(int evtchn) >> { >> int masked; >> - struct shared_info *s = HYPERVISOR_shared_info; >> if (!VALID_EVTCHN(evtchn)) >> return 0; >> masked = test_and_set_mask(evtchn); >> - sync_set_bit(evtchn, BM(s->evtchn_pending)); >> + set_evtchn(evtchn); >> if (!masked) >> unmask_evtchn(evtchn); > > There are other places in this file that manipulate evtchn_pending bits > (e.g. in unmask_evtchn()). As long as this change is made, those places > should probably be updated as well.

This is in preparation for the evtchn_ops refactor, so we only change places that will turn into port ops.

> You also need a commit message for this and previous patch.

TBH, I think the existing one-liner is fine.

David
Boris Ostrovsky
2013-Oct-14 16:50 UTC
Re: [PATCH 06/16] xen/events: move 2-level specific code into its own file
On 10/08/2013 08:49 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > In preparation for alternative event channel ABIs, move all the > functions accessing the shared data structures into their own file. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > --- > drivers/xen/events/Makefile | 1 + > drivers/xen/events/events.c | 372 +--------------------------------- > drivers/xen/events/events_2l.c | 322 +++++++++++++++++++++++++++++ > drivers/xen/events/events_internal.h | 74 +++++++ > 4 files changed, 408 insertions(+), 361 deletions(-) > create mode 100644 drivers/xen/events/events_2l.c > create mode 100644 drivers/xen/events/events_internal.h...> - */ > static void __xen_evtchn_do_upcall(void) > { > - int start_word_idx, start_bit_idx; > - int word_idx, bit_idx; > - int i, irq; > - int cpu = get_cpu(); > - struct shared_info *s = HYPERVISOR_shared_info; > struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); > + int cpu = get_cpu(); > unsigned count; > > do { > - xen_ulong_t pending_words; > - xen_ulong_t pending_bits; > - struct irq_desc *desc; > - > vcpu_info->evtchn_upcall_pending = 0; > > if (__this_cpu_inc_return(xed_nesting_count) - 1) > goto out; > > - /* > - * Master flag must be cleared /before/ clearing > - * selector flag. xchg_xen_ulong must contain an > - * appropriate barrier. > - */ > - if ((irq = per_cpu(virq_to_irq, cpu)[VIRQ_TIMER]) != -1) { > - int evtchn = evtchn_from_irq(irq); > - word_idx = evtchn / BITS_PER_LONG; > - pending_bits = evtchn % BITS_PER_LONG; > - if (active_evtchns(cpu, s, word_idx) & (1ULL << pending_bits)) { > - desc = irq_to_desc(irq); > - if (desc) > - generic_handle_irq_desc(irq, desc); > - } > - }This chunk dealing with timer interrupt has been completely removed. I see that you have a later patch that raises priority of VIRQ_TIMER. If that patch is replacement for what this chunk used to do, should you be removing this code in that patch? Also, it appears that you are not simply moving code around. There are some changes to logic (for example this one) and it should be reflected in the commit message.> - > - pending_words = xchg_xen_ulong(&vcpu_info->evtchn_pending_sel, 0); > - > - start_word_idx = __this_cpu_read(current_word_idx); > - start_bit_idx = __this_cpu_read(current_bit_idx); > -...> - > + > +void unmask_evtchn(int port) > +{ > + struct shared_info *s = HYPERVISOR_shared_info; > + unsigned int cpu = get_cpu(); > + int do_hypercall = 0, evtchn_pending = 0; > + > + BUG_ON(!irqs_disabled()); > + > + if (unlikely((cpu != cpu_from_evtchn(port)))) > + do_hypercall = 1; > + else { > + sync_clear_bit(port, BM(&s->evtchn_mask[0])); > + evtchn_pending = sync_test_bit(port, BM(&s->evtchn_pending[0])); > + > + if (unlikely(evtchn_pending && xen_hvm_domain())) > + do_hypercall = 1;Why are you no longer setting back evtchn_mask bit in the ''if'' clause? Wouldn''t this make you possibly miss a pending event from Xen''s evtchn_unmask()? You also removed a comment that talks about why we clear the bit. Is this comment no longer relevant? -boris> + } > + > + /* Slow path (hypercall) if this is a non-local port or if this is > + * an hvm domain and an event is pending (hvm domains don''t have > + * their own implementation of irq_enable). 
*/ > + if (do_hypercall) { > + struct evtchn_unmask unmask = { .port = port }; > + (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask); > + } else { > + struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); > + > + /* > + * The following is basically the equivalent of > + * ''hw_resend_irq''. Just like a real IO-APIC we ''lose > + * the interrupt edge'' if the channel is masked. > + */ > + if (evtchn_pending && > + !sync_test_and_set_bit(port / BITS_PER_EVTCHN_WORD, > + BM(&vcpu_info->evtchn_pending_sel))) > + vcpu_info->evtchn_upcall_pending = 1; > + } > + > + put_cpu(); > +} > +
David Vrabel
2013-Oct-14 16:53 UTC
Re: [PATCH 06/16] xen/events: move 2-level specific code into its own file
On 14/10/13 17:50, Boris Ostrovsky wrote:> > This chunk dealing with timer interrupt has been completely removed. I > see that you have a later patch that raises priority of VIRQ_TIMER. If > that patch is replacement for what this chunk used to do, should you be > removing this code in that patch? > > Also, it appears that you are not simply moving code around. There are > some changes to logic (for example this one) and it should be reflected > in the commit message.

Oops. It was just code motion in v1 but the new file has not been kept up-to-date. I'll redo this whole patch from scratch.

David
Boris Ostrovsky
2013-Oct-14 17:26 UTC
Re: [PATCH 08/16] xen/events: allow setup of irq_info to fail
On 10/08/2013 08:49 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > The FIFO-based event ABI requires additional setup of newly bound > events (it may need to expand the event array) and this setup may > fail. > > xen_irq_info_common_init() is a useful place to put this setup so > allow this call to fail. This call and the other similar calls are > renamed to be *_setup() to reflect that they may now fail. > > This failure can only occur with new event channels not on rebind. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/xen/events/events.c | 156 +++++++++++++++++++++++++------------------ > 1 files changed, 91 insertions(+), 65 deletions(-) > > diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c > index bc35f5d..9628df8 100644 > --- a/drivers/xen/events/events.c > +++ b/drivers/xen/events/events.c > @@ -99,7 +99,7 @@ struct irq_info *info_for_irq(unsigned irq) > } > > /* Constructors for packed IRQ information. */ > -static void xen_irq_info_common_init(struct irq_info *info, > +static int xen_irq_info_common_setup(struct irq_info *info, > unsigned irq, > enum xen_irq_type type, > unsigned short evtchn, > @@ -116,45 +116,47 @@ static void xen_irq_info_common_init(struct irq_info *info, > evtchn_to_irq[evtchn] = irq; > > irq_clear_status_flags(irq, IRQ_NOREQUEST|IRQ_NOAUTOEN); > + > + return 0; > } > > -static void xen_irq_info_evtchn_init(unsigned irq, > +static int xen_irq_info_evtchn_setup(unsigned irq, > unsigned short evtchn) > { > struct irq_info *info = info_for_irq(irq); > > - xen_irq_info_common_init(info, irq, IRQT_EVTCHN, evtchn, 0); > + return xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0); > } > > -static void xen_irq_info_ipi_init(unsigned cpu, > +static int xen_irq_info_ipi_setup(unsigned cpu, > unsigned irq, > unsigned short evtchn, > enum ipi_vector ipi) > { > struct irq_info *info = info_for_irq(irq); > > - xen_irq_info_common_init(info, irq, IRQT_IPI, evtchn, 0); > - > info->u.ipi = ipi; > > per_cpu(ipi_to_irq, cpu)[ipi] = irq; > + > + return xen_irq_info_common_setup(info, irq, IRQT_IPI, evtchn, 0);

Do you need to clean up on error here (and in similar routines below)? I.e. per_cpu(ipi_to_irq, cpu)[ipi] = -1; You may be trying to do this in __unbind_from_irq() but I think a routine that fails shouldn't expect others to clean up after it.
If __unbind from irq() is used in other contexts (it probably is) then perhaps you can factor our the part that is relevant to setup routines only and call it from here.> } > > -static void xen_irq_info_virq_init(unsigned cpu, > +static int xen_irq_info_virq_setup(unsigned cpu, > unsigned irq, > unsigned short evtchn, > unsigned short virq) > { > struct irq_info *info = info_for_irq(irq); > > - xen_irq_info_common_init(info, irq, IRQT_VIRQ, evtchn, 0); > - > info->u.virq = virq; > > per_cpu(virq_to_irq, cpu)[virq] = irq; > + > + return xen_irq_info_common_setup(info, irq, IRQT_VIRQ, evtchn, 0); > } > > -static void xen_irq_info_pirq_init(unsigned irq, > +static int xen_irq_info_pirq_setup(unsigned irq, > unsigned short evtchn, > unsigned short pirq, > unsigned short gsi, > @@ -163,12 +165,12 @@ static void xen_irq_info_pirq_init(unsigned irq, > { > struct irq_info *info = info_for_irq(irq); > > - xen_irq_info_common_init(info, irq, IRQT_PIRQ, evtchn, 0); > - > info->u.pirq.pirq = pirq; > info->u.pirq.gsi = gsi; > info->u.pirq.domid = domid; > info->u.pirq.flags = flags; > + > + return xen_irq_info_common_setup(info, irq, IRQT_PIRQ, evtchn, 0); > } > > /* > @@ -516,6 +518,47 @@ int xen_irq_from_gsi(unsigned gsi) > } > EXPORT_SYMBOL_GPL(xen_irq_from_gsi); > > +static void __unbind_from_irq(unsigned int irq) > +{ > + struct evtchn_close close; > + int evtchn = evtchn_from_irq(irq); > + struct irq_info *info = irq_get_handler_data(irq); > + > + if (info->refcnt > 0) {info may be NULL.> + info->refcnt--; > + if (info->refcnt != 0) > + return; > + } > + > + if (VALID_EVTCHN(evtchn)) { > + close.port = evtchn; > + if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) > + BUG(); > + > + switch (type_from_irq(irq)) { > + case IRQT_VIRQ: > + per_cpu(virq_to_irq, cpu_from_evtchn(evtchn)) > + [virq_from_irq(irq)] = -1; > + break; > + case IRQT_IPI: > + per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn)) > + [ipi_from_irq(irq)] = -1; > + break; > + default: > + break; > + } > + > + /* Closed ports are implicitly re-bound to VCPU0. */ > + bind_evtchn_to_cpu(evtchn, 0); > + > + evtchn_to_irq[evtchn] = -1; > + } > + > + BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); > + > + xen_free_irq(irq); > +} > + > /* > * Do not make any assumptions regarding the relationship between the > * IRQ number returned here and the Xen pirq argument. > @@ -531,6 +574,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, > { > int irq = -1; > struct physdev_irq irq_op; > + int ret; > > mutex_lock(&irq_mapping_update_lock); > > @@ -558,8 +602,13 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, > goto out; > } > > - xen_irq_info_pirq_init(irq, 0, pirq, gsi, DOMID_SELF, > + ret = xen_irq_info_pirq_setup(irq, 0, pirq, gsi, DOMID_SELF, > shareable ? 
PIRQ_SHAREABLE : 0); > + if (ret < 0) { > + __unbind_from_irq(irq); > + irq = ret; > + goto out; > + } > > pirq_query_unmask(irq); > /* We try to use the handler with the appropriate semantic for the > @@ -619,7 +668,9 @@ int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc, > irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_edge_irq, > name); > > - xen_irq_info_pirq_init(irq, 0, pirq, 0, domid, 0); > + ret = xen_irq_info_pirq_setup(irq, 0, pirq, 0, domid, 0); > + if (ret < 0) > + goto error_irq; > ret = irq_set_msi_desc(irq, msidesc); > if (ret < 0) > goto error_irq; > @@ -627,8 +678,8 @@ out: > mutex_unlock(&irq_mapping_update_lock); > return irq; > error_irq: > + __unbind_from_irq(irq); > mutex_unlock(&irq_mapping_update_lock); > - xen_free_irq(irq); > return ret; > } > #endif > @@ -698,9 +749,11 @@ int xen_pirq_from_irq(unsigned irq) > return pirq_from_irq(irq); > } > EXPORT_SYMBOL_GPL(xen_pirq_from_irq); > + > int bind_evtchn_to_irq(unsigned int evtchn) > { > int irq; > + int ret; > > mutex_lock(&irq_mapping_update_lock); > > @@ -714,7 +767,12 @@ int bind_evtchn_to_irq(unsigned int evtchn) > irq_set_chip_and_handler_name(irq, &xen_dynamic_chip, > handle_edge_irq, "event"); > > - xen_irq_info_evtchn_init(irq, evtchn); > + ret = xen_irq_info_evtchn_setup(irq, evtchn); > + if (ret < 0) { > + __unbind_from_irq(irq); > + irq = ret; > + goto out; > + } > } else { > struct irq_info *info = info_for_irq(irq); > WARN_ON(info == NULL || info->type != IRQT_EVTCHN); > @@ -731,6 +789,7 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu) > { > struct evtchn_bind_ipi bind_ipi; > int evtchn, irq; > + int ret; > > mutex_lock(&irq_mapping_update_lock); > > @@ -750,8 +809,12 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu) > BUG(); > evtchn = bind_ipi.port; > > - xen_irq_info_ipi_init(cpu, irq, evtchn, ipi); > - > + ret = xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi); > + if (ret < 0) { > + __unbind_from_irq(irq); > + irq = ret; > + goto out; > + } > bind_evtchn_to_cpu(evtchn, cpu); > } else { > struct irq_info *info = info_for_irq(irq); > @@ -830,7 +893,12 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu) > evtchn = ret; > } > > - xen_irq_info_virq_init(cpu, irq, evtchn, virq); > + ret = xen_irq_info_virq_setup(cpu, irq, evtchn, virq); > + if (ret < 0) { > + __unbind_from_irq(irq); > + irq = ret; > + goto out; > + } > > bind_evtchn_to_cpu(evtchn, cpu); > } else { > @@ -846,50 +914,8 @@ out: > > static void unbind_from_irq(unsigned int irq) > { > - struct evtchn_close close; > - int evtchn = evtchn_from_irq(irq); > - struct irq_info *info = irq_get_handler_data(irq); > - > - if (WARN_ON(!info)) > - return; > - > mutex_lock(&irq_mapping_update_lock); > - > - if (info->refcnt > 0) { > - info->refcnt--; > - if (info->refcnt != 0) > - goto done; > - } > - > - if (VALID_EVTCHN(evtchn)) { > - close.port = evtchn; > - if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) > - BUG(); > - > - switch (type_from_irq(irq)) { > - case IRQT_VIRQ: > - per_cpu(virq_to_irq, cpu_from_evtchn(evtchn)) > - [virq_from_irq(irq)] = -1; > - break; > - case IRQT_IPI: > - per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn)) > - [ipi_from_irq(irq)] = -1; > - break; > - default: > - break; > - } > - > - /* Closed ports are implicitly re-bound to VCPU0. 
*/ > - bind_evtchn_to_cpu(evtchn, 0); > - > - evtchn_to_irq[evtchn] = -1; > - } > - > - BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); > - > - xen_free_irq(irq); > - > - done: > + __unbind_from_irq(irq); > mutex_unlock(&irq_mapping_update_lock); > } > > @@ -1137,7 +1163,7 @@ void rebind_evtchn_irq(int evtchn, int irq) > so there should be a proper type */ > BUG_ON(info->type == IRQT_UNBOUND); > > - xen_irq_info_evtchn_init(irq, evtchn); > + xen_irq_info_evtchn_setup(irq, evtchn);(void) xen_irq_info_evtchn_setup(irq, evtchn) ? (here and below) -boris> > mutex_unlock(&irq_mapping_update_lock); > > @@ -1312,7 +1338,7 @@ static void restore_cpu_virqs(unsigned int cpu) > evtchn = bind_virq.port; > > /* Record the new mapping. */ > - xen_irq_info_virq_init(cpu, irq, evtchn, virq); > + xen_irq_info_virq_setup(cpu, irq, evtchn, virq); > bind_evtchn_to_cpu(evtchn, cpu); > } > } > @@ -1336,7 +1362,7 @@ static void restore_cpu_ipis(unsigned int cpu) > evtchn = bind_ipi.port; > > /* Record the new mapping. */ > - xen_irq_info_ipi_init(cpu, irq, evtchn, ipi); > + xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi); > bind_evtchn_to_cpu(evtchn, cpu); > } > }
Boris Ostrovsky
2013-Oct-14 17:52 UTC
Re: [PATCH 10/16] xen/events: Refactor evtchn_to_irq array to be dynamically allocated
On 10/08/2013 08:49 AM, David Vrabel wrote:> > @@ -482,13 +559,8 @@ static void shutdown_pirq(struct irq_data *data) > return; > > mask_evtchn(evtchn); > - > - close.port = evtchn; > - if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) > - BUG(); > - > - bind_evtchn_to_cpu(evtchn, 0); > - evtchn_to_irq[evtchn] = -1; > + xen_evtchn_close(evtchn); > + set_evtchn_to_irq(evtchn, -1);Would it make sense to call set_evtchn_to_irq(evtchn, -1) from xen_evtchn_close()? If you are dismantling an event channel you probably always want to clear the irq mapping for that channel. -boris
Boris Ostrovsky
2013-Oct-14 18:06 UTC
Re: [PATCH 12/16] xen/evtchn: support more than 4096 ports
On 10/08/2013 08:49 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > Remove the check during unbind for NR_EVENT_CHANNELS as this limits > support to less than 4096 ports.This is an interface change, you will be returning a different type of error (ENOTCONN instead of EINVAL). Why not check for xen_evtchn_nr_channels()? -boris> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > --- > drivers/xen/evtchn.c | 4 ---- > 1 files changed, 0 insertions(+), 4 deletions(-) > > diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c > index 8b3a69a..1b0fc44 100644 > --- a/drivers/xen/evtchn.c > +++ b/drivers/xen/evtchn.c > @@ -416,10 +416,6 @@ static long evtchn_ioctl(struct file *file, > if (copy_from_user(&unbind, uarg, sizeof(unbind))) > break; > > - rc = -EINVAL; > - if (unbind.port >= NR_EVENT_CHANNELS) > - break; > - > rc = -ENOTCONN; > evtchn = find_evtchn(u, unbind.port); > if (!evtchn)
Boris Ostrovsky
2013-Oct-14 19:30 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 10/08/2013 08:49 AM, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Implement all the event channel port ops for the FIFO-based ABI.
>
> If the hypervisor supports the FIFO-based ABI, enable it by
> initializing the control block for the boot VCPU and subsequent VCPUs
> as they are brought up and on resume. The event array is expanded as
> required when event ports are setup.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/xen/events/Makefile          |    1 +
>  drivers/xen/events/events.c          |    9 +-
>  drivers/xen/events/events_fifo.c     |  409 ++++++++++++++++++++++++++++++++++
>  drivers/xen/events/events_internal.h |    8 +
>  4 files changed, 426 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/xen/events/events_fifo.c
>
> diff --git a/drivers/xen/events/Makefile b/drivers/xen/events/Makefile
> index 3bed6e0..d0f1581 100644
> --- a/drivers/xen/events/Makefile
> +++ b/drivers/xen/events/Makefile
> @@ -1,2 +1,3 @@
>  obj-y += events.o
>  obj-y += events_2l.o
> +obj-y += events_fifo.o
> diff --git a/drivers/xen/events/events.c b/drivers/xen/events/events.c
> index fd65485..dec4d6d 100644
> --- a/drivers/xen/events/events.c
> +++ b/drivers/xen/events/events.c
> @@ -1538,6 +1538,7 @@ void xen_irq_resume(void)
>
>          /* New event-channel space is not 'live' yet. */
>          xen_evtchn_mask_all();
> +        xen_evtchn_resume();
>
>          /* No IRQ <-> event-channel mappings. */
>          list_for_each_entry(info, &xen_irq_list_head, list)
> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>
>  void __init xen_init_IRQ(void)
>  {
> -        xen_evtchn_2l_init();
> +        int ret;
> +
> +        ret = xen_evtchn_fifo_init();
> +        if (ret < 0) {
> +                printk(KERN_INFO "xen: falling back to n-level event channels");
> +                xen_evtchn_2l_init();
> +        }

Should we provide users with the ability to choose which mechanism to
use? Is there any advantage in staying with 2-level? Stability, I
guess, would be one.

>
>          evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()),
>                                  sizeof(*evtchn_to_irq), GFP_KERNEL);
> diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
> new file mode 100644
> index 0000000..c109639
> --- /dev/null
> +++ b/drivers/xen/events/events_fifo.c
> @@ -0,0 +1,409 @@
> +/*
> + * Xen event channels (FIFO-based ABI)
> + *
> + * Copyright (C) 2013 Citrix Systems R&D ltd.
> + *
> + * This source code is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License, or (at your option) any later version.
> + *
> + * Or, when distributed separately from the Linux kernel or
> + * incorporated into other software packages, subject to the following
> + * license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/smp.h>
> +#include <linux/percpu.h>
> +#include <linux/cpu.h>
> +
> +#include <asm/sync_bitops.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/hypervisor.h>
> +#include <asm/xen/page.h>
> +
> +#include <xen/xen.h>
> +#include <xen/xen-ops.h>
> +#include <xen/events.h>
> +#include <xen/interface/xen.h>
> +#include <xen/interface/event_channel.h>
> +
> +#include "events_internal.h"
> +
> +#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
> +#define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
> +
> +struct evtchn_fifo_queue {
> +        uint32_t head[EVTCHN_FIFO_MAX_QUEUES];
> +};
> +
> +static DEFINE_PER_CPU(struct evtchn_fifo_control_block *, cpu_control_block);
> +static DEFINE_PER_CPU(struct evtchn_fifo_queue, cpu_queue);
> +static event_word_t *event_array[MAX_EVENT_ARRAY_PAGES] __read_mostly;
> +static unsigned event_array_pages __read_mostly;
> +
> +#define BM(w) ((unsigned long *)(w))

This could go into a header file (events_internal.h?) since 2-level
uses it as well.

> +
> +static inline event_word_t *event_word_from_port(unsigned port)
> +{
> +        unsigned i = port / EVENT_WORDS_PER_PAGE;
> +
> +        if (i >= event_array_pages)
> +                return NULL;
> +        return event_array[i] + port % EVENT_WORDS_PER_PAGE;
> +}
> +
> +static unsigned evtchn_fifo_max_channels(void)
> +{
> +        return EVTCHN_FIFO_NR_CHANNELS;
> +}
> +
> +static unsigned evtchn_fifo_nr_channels(void)
> +{
> +        return event_array_pages * EVENT_WORDS_PER_PAGE;
> +}
> +
> +static void free_unused_array_pages(void)
> +{
> +        unsigned i;
> +
> +        for (i = event_array_pages; i < MAX_EVENT_ARRAY_PAGES; i++) {
> +                if (!event_array[i])
> +                        break;
> +                free_page((unsigned long)event_array[i]);
> +                event_array[i] = NULL;
> +        }
> +}
> +
> +static void init_array_page(event_word_t *array_page)
> +{
> +        unsigned i;
> +
> +        for (i = 0; i < EVENT_WORDS_PER_PAGE; i++)
> +                array_page[i] = 1 << EVTCHN_FIFO_MASKED;
> +}
> +
> +static int evtchn_fifo_setup(struct irq_info *info)
> +{
> +        unsigned port = info->evtchn;
> +        unsigned i;
> +        int ret = -ENOMEM;
> +
> +        i = port / EVENT_WORDS_PER_PAGE;

Something more descriptive than 'i' would be better.

> +
> +        if (i >= MAX_EVENT_ARRAY_PAGES)
> +                return -EINVAL;
> +
> +        while (i >= event_array_pages) {
> +                void *array_page;
> +                struct evtchn_expand_array expand_array;
> +
> +                /* Might already have a page if we've resumed. */
> +                array_page = event_array[event_array_pages];
> +                if (!array_page) {
> +                        array_page = (void *)__get_free_page(GFP_KERNEL);
> +                        if (array_page == NULL)
> +                                goto error;
> +                        event_array[event_array_pages] = array_page;
> +                }
> +
> +                /* Mask all events in this page before adding it. */
> +                init_array_page(array_page);
> +
> +                expand_array.array_gfn = virt_to_mfn(array_page);
> +
> +                ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array, &expand_array);
> +                if (ret < 0)
> +                        goto error;
> +
> +                event_array_pages++;

Should this increment happen in the 'if (!array_page)' clause?

> +        }
> +        return 0;
> +
> + error:
> +        if (event_array_pages == 0)
> +                panic("xen: unable to expand event array with initial page (%d)\n", ret);
> +        else
> +                printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
> +        free_unused_array_pages();

Do you need to clean up in the hypervisor as well?

> +        return ret;
> +}
> +
> +static void evtchn_fifo_bind_to_cpu(struct irq_info *info, unsigned cpu)
> +{
> +        /* no-op */
> +}
> +
> +static void evtchn_fifo_clear_pending(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +        sync_clear_bit(EVTCHN_FIFO_PENDING, BM(word));
> +}
> +
> +static void evtchn_fifo_set_pending(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +        sync_set_bit(EVTCHN_FIFO_PENDING, BM(word));
> +}
> +
> +static bool evtchn_fifo_is_pending(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +        return sync_test_bit(EVTCHN_FIFO_PENDING, BM(word));
> +}
> +
> +static bool evtchn_fifo_test_and_set_mask(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +        return sync_test_and_set_bit(EVTCHN_FIFO_MASKED, BM(word));
> +}
> +
> +static void evtchn_fifo_mask(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +        if (word)
> +                sync_set_bit(EVTCHN_FIFO_MASKED, BM(word));

You are testing 'word' here but not in the routines above (or below).

> +}
> +
> +static void evtchn_fifo_unmask(unsigned port)
> +{
> +        event_word_t *word = event_word_from_port(port);
> +
> +        BUG_ON(!irqs_disabled());
> +
> +        sync_clear_bit(EVTCHN_FIFO_MASKED, BM(word));
> +        if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) {
> +                struct evtchn_unmask unmask = { .port = port };
> +                (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
> +        }
> +}

2-level unmasking is somewhat more elaborate, with it trying to avoid
races on pending events. Is this not a concern here?

> +
> +static uint32_t clear_linked(volatile event_word_t *word)
> +{
> +        event_word_t new, old, w;
> +
> +        w = *word;
> +
> +        do {
> +                old = w;
> +                new = (w & ~((1 << EVTCHN_FIFO_LINKED)
> +                             | EVTCHN_FIFO_LINK_MASK));
> +        } while ((w = sync_cmpxchg(word, old, new)) != old);
> +
> +        return w & EVTCHN_FIFO_LINK_MASK;
> +}
> +
> +static void handle_irq_for_port(unsigned port)
> +{
> +        int irq;
> +        struct irq_desc *desc;
> +
> +        irq = get_evtchn_to_irq(port);
> +        if (irq != -1) {
> +                desc = irq_to_desc(irq);
> +                if (desc)
> +                        generic_handle_irq_desc(irq, desc);
> +        }
> +}
> +
> +static void consume_one_event(unsigned cpu,
> +                              struct evtchn_fifo_control_block *control_block,
> +                              unsigned priority, uint32_t *ready)
> +{
> +        struct evtchn_fifo_queue *q = &per_cpu(cpu_queue, cpu);
> +        uint32_t head;
> +        unsigned port;
> +        event_word_t *word;
> +
> +        head = q->head[priority];
> +
> +        /* Reached the tail last time? Read the new HEAD from the
> +           control block. */

Comment style.

> +        if (head == 0) {
> +                rmb(); /* Ensure word is up-to-date before reading head. */
> +                head = control_block->head[priority];
> +        }
> +
> +        port = head;
> +        word = event_word_from_port(port);

Do you need to check for 'word != NULL'? You don't check it in
clear_linked() (which is maybe where this should be done).

> +        head = clear_linked(word);
> +
> +        /*
> +         * If the link is non-zero, there are more events in the
> +         * queue, otherwise the queue is empty.
> +         *
> +         * If the queue is empty, clear this priority from our local
> +         * copy of the ready word.
> +         */
> +        if (head == 0)
> +                clear_bit(priority, BM(ready));
> +
> +        if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))
> +            && !sync_test_bit(EVTCHN_FIFO_MASKED, BM(word)))
> +                handle_irq_for_port(port);
> +
> +        q->head[priority] = head;
> +}
> +
> +static void evtchn_fifo_handle_events(unsigned cpu)
> +{
> +        struct evtchn_fifo_control_block *control_block;
> +        uint32_t ready;
> +        unsigned q;
> +
> +        control_block = per_cpu(cpu_control_block, cpu);
> +
> +        ready = xchg(&control_block->ready, 0);
> +
> +        while (ready) {
> +                q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES);
> +                consume_one_event(cpu, control_block, q, &ready);
> +                ready |= xchg(&control_block->ready, 0);
> +        }
> +}
> +
> +static void evtchn_fifo_resume(void)
> +{
> +        unsigned cpu;
> +
> +        for_each_possible_cpu(cpu) {
> +                void *control_block = per_cpu(cpu_control_block, cpu);
> +                struct evtchn_init_control init_control;
> +                int ret;
> +
> +                if (!control_block)
> +                        continue;
> +
> +                /*
> +                 * If this CPU is offline, take the opportunity to
> +                 * free the control block while it is not being
> +                 * used.
> +                 */
> +                if (!cpu_online(cpu)) {
> +                        free_page((unsigned long)control_block);
> +                        per_cpu(cpu_control_block, cpu) = NULL;
> +                        continue;
> +                }

Have you tested offlining/onlining CPUs (lots of them)? I am asking
because I see EVTCHNOP_init_control both here and in
init_control_block() but I don't see anything that would "deinit" the
control block for which you are freeing the page above.

> +
> +                init_control.control_gfn = virt_to_mfn(control_block);
> +                init_control.offset = 0;
> +                init_control.vcpu = cpu;
> +
> +                ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control,
> +                                                  &init_control);
> +                if (ret < 0)
> +                        BUG();
> +        }
> +
> +        /*
> +         * The event array starts out as empty again and is extended
> +         * as normal when events are bound. The existing pages will
> +         * be reused.
> +         */
> +        event_array_pages = 0;
> +}
> +
> +static const struct evtchn_ops evtchn_ops_fifo = {
> +        .max_channels = evtchn_fifo_max_channels,
> +        .nr_channels = evtchn_fifo_nr_channels,
> +        .setup = evtchn_fifo_setup,
> +        .bind_to_cpu = evtchn_fifo_bind_to_cpu,
> +        .clear_pending = evtchn_fifo_clear_pending,
> +        .set_pending = evtchn_fifo_set_pending,
> +        .is_pending = evtchn_fifo_is_pending,
> +        .test_and_set_mask = evtchn_fifo_test_and_set_mask,
> +        .mask = evtchn_fifo_mask,
> +        .unmask = evtchn_fifo_unmask,
> +        .handle_events = evtchn_fifo_handle_events,
> +        .resume = evtchn_fifo_resume,
> +};
> +
> +static int __cpuinit evtchn_fifo_init_control_block(unsigned cpu)
> +{
> +        struct page *control_block = NULL;
> +        struct evtchn_init_control init_control;
> +        int ret = -ENOMEM;
> +
> +        control_block = alloc_page(GFP_KERNEL|__GFP_ZERO);
> +        if (control_block == NULL)
> +                goto error;
> +
> +        init_control.control_gfn = virt_to_mfn(page_address(control_block));
> +        init_control.offset = 0;
> +        init_control.vcpu = cpu;
> +
> +        ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control, &init_control);
> +        if (ret < 0)
> +                goto error;
> +
> +        per_cpu(cpu_control_block, cpu) = page_address(control_block);
> +
> +        return 0;
> +
> + error:
> +        __free_page(control_block);
> +        return ret;
> +}
> +
> +static int __cpuinit evtchn_fifo_cpu_notification(struct notifier_block *self,
> +                                                  unsigned long action,
> +                                                  void *hcpu)
> +{
> +        int cpu = (long)hcpu;

I don't understand this. Yes, it's everywhere in the kernel but it
looks weird.

> +        int ret = 0;
> +
> +        switch (action) {
> +        case CPU_UP_PREPARE:
> +                if (!per_cpu(cpu_control_block, cpu))
> +                        ret = evtchn_fifo_init_control_block(cpu);
> +                break;
> +        default:
> +                break;
> +        }

What happens when you offline a CPU?

-boris

> +        return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
> +}
> +
> +static struct notifier_block evtchn_fifo_cpu_notifier __cpuinitdata = {
> +        .notifier_call = evtchn_fifo_cpu_notification,
> +};
> +
> +int __init xen_evtchn_fifo_init(void)
> +{
> +        int cpu = get_cpu();
> +        int ret;
> +
> +        ret = evtchn_fifo_init_control_block(cpu);
> +        if (ret < 0)
> +                goto out;
> +
> +        printk(KERN_INFO "xen: switching to FIFO-based event channels\n");
> +
> +        evtchn_ops = &evtchn_ops_fifo;
> +
> +        register_cpu_notifier(&evtchn_fifo_cpu_notifier);
> +out:
> +        put_cpu();
> +        return ret;
> +}
> diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h
> index a3d9aec..0af518f 100644
> --- a/drivers/xen/events/events_internal.h
> +++ b/drivers/xen/events/events_internal.h
> @@ -69,6 +69,7 @@ struct evtchn_ops {
>          void (*unmask)(unsigned port);
>
>          void (*handle_events)(unsigned cpu);
> +        void (*resume)(void);
>  };
>
>  extern const struct evtchn_ops *evtchn_ops;
> @@ -142,6 +143,13 @@ static inline void xen_evtchn_handle_events(unsigned cpu)
>          return evtchn_ops->handle_events(cpu);
>  }
>
> +static inline void xen_evtchn_resume(void)
> +{
> +        if (evtchn_ops->resume)
> +                evtchn_ops->resume();
> +}
> +
>  void xen_evtchn_2l_init(void);
> +int xen_evtchn_fifo_init(void);
>
>  #endif /* #ifndef __EVENTS_INTERNAL_H__ */
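[The LINK-clearing semantics above can be modelled in isolation. The
following standalone sketch uses the bit positions from the FIFO design
document (PENDING=31, MASKED=30, LINKED=29, LINK in bits 0-16); these
values are assumed here rather than quoted from
xen/interface/event_channel.h, and the kernel version additionally uses
sync_cmpxchg() to make the update atomic against Xen:]

    #include <stdint.h>
    #include <stdio.h>

    #define FIFO_LINKED    29
    #define FIFO_LINK_MASK ((1u << 17) - 1)

    /* Single-threaded model of clear_linked(): return the old LINK
     * field (the next port in the queue) and clear LINKED + LINK. */
    static uint32_t model_clear_linked(uint32_t *word)
    {
            uint32_t w = *word;

            *word = w & ~((1u << FIFO_LINKED) | FIFO_LINK_MASK);
            return w & FIFO_LINK_MASK;
    }

    int main(void)
    {
            uint32_t word = (1u << FIFO_LINKED) | 42; /* linked; next port 42 */

            printf("next head: %u\n", model_clear_linked(&word)); /* 42 */
            printf("word now: %#x\n", word); /* LINKED and LINK cleared */
            return 0;
    }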
David Vrabel
2013-Oct-15 18:58 UTC
Re: [PATCH 10/16] xen/events: Refactor evtchn_to_irq array to be dynamically allocated
On 14/10/13 18:52, Boris Ostrovsky wrote:
> On 10/08/2013 08:49 AM, David Vrabel wrote:
>> @@ -482,13 +559,8 @@ static void shutdown_pirq(struct irq_data *data)
>>                  return;
>>
>>          mask_evtchn(evtchn);
>> -
>> -        close.port = evtchn;
>> -        if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
>> -                BUG();
>> -
>> -        bind_evtchn_to_cpu(evtchn, 0);
>> -        evtchn_to_irq[evtchn] = -1;
>> +        xen_evtchn_close(evtchn);
>> +        set_evtchn_to_irq(evtchn, -1);
>
> Would it make sense to call set_evtchn_to_irq(evtchn, -1) from
> xen_evtchn_close()? If you are dismantling an event channel you
> probably always want to clear the irq mapping for that channel.

I've added an xen_irq_info_cleanup() function which is paired with the
various xen_irq_info_*_setup() calls.

David
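[A rough sketch of what such a paired helper could look like, built
only from the calls visible in the hunk above (set_evtchn_to_irq() and
the info->evtchn field); the actual helper in the reworked series may
well do more:]

    /* Hypothetical sketch, not the actual patch: common teardown
     * paired with the xen_irq_info_*_setup() calls. */
    static void xen_irq_info_cleanup(struct irq_info *info)
    {
            set_evtchn_to_irq(info->evtchn, -1);
            info->evtchn = 0;
    }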
David Vrabel
2013-Oct-15 18:58 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 14/10/13 20:30, Boris Ostrovsky wrote:
> On 10/08/2013 08:49 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Implement all the event channel port ops for the FIFO-based ABI.
>>
>> If the hypervisor supports the FIFO-based ABI, enable it by
>> initializing the control block for the boot VCPU and subsequent VCPUs
>> as they are brought up and on resume. The event array is expanded as
>> required when event ports are setup.
[...]
>> --- a/drivers/xen/events/events.c
>> +++ b/drivers/xen/events/events.c
[...]
>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>  void __init xen_init_IRQ(void)
>>  {
>> -        xen_evtchn_2l_init();
>> +        int ret;
>> +
>> +        ret = xen_evtchn_fifo_init();
>> +        if (ret < 0) {
>> +                printk(KERN_INFO "xen: falling back to n-level event channels");
>> +                xen_evtchn_2l_init();
>> +        }
>
> Should we provide users with the ability to choose which mechanism to
> use? Is there any advantage in staying with 2-level? Stability, I
> guess, would be one.

If someone can demonstrate a use case where 2-level is better, then we
could consider an option. I don't think we want options for new
software features just because they might be buggy.

>> --- /dev/null
>> +++ b/drivers/xen/events/events_fifo.c
[...]
>> +#define BM(w) ((unsigned long *)(w))
>
> This could go into a header file (events_internal.h?) since 2-level
> uses it as well.

Although they look the same, they're converting between different
types: xen_ulong_t in the 2-level case and event_word_t in the
FIFO-based case, so I would prefer this to be local to both files.

>> +
>> +        if (i >= MAX_EVENT_ARRAY_PAGES)
>> +                return -EINVAL;
>> +
>> +        while (i >= event_array_pages) {
>> +                void *array_page;
>> +                struct evtchn_expand_array expand_array;
>> +
>> +                /* Might already have a page if we've resumed. */
>> +                array_page = event_array[event_array_pages];
>> +                if (!array_page) {
>> +                        array_page = (void *)__get_free_page(GFP_KERNEL);
>> +                        if (array_page == NULL)
>> +                                goto error;
>> +                        event_array[event_array_pages] = array_page;
>> +                }
>> +
>> +                /* Mask all events in this page before adding it. */
>> +                init_array_page(array_page);
>> +
>> +                expand_array.array_gfn = virt_to_mfn(array_page);
>> +
>> +                ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array,
>> +                                                  &expand_array);
>> +                if (ret < 0)
>> +                        goto error;
>> +
>> +                event_array_pages++;
>
> Should this increment happen in the 'if (!array_page)' clause?

No. event_array_pages is the number of pages Xen is aware of. Note how
we zero it when resuming on a new domain with the FIFO-based ABI
initially disabled.

>> +        }
>> +        return 0;
>> +
>> + error:
>> +        if (event_array_pages == 0)
>> +                panic("xen: unable to expand event array with initial page (%d)\n", ret);
>> +        else
>> +                printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
>> +        free_unused_array_pages();
>
> Do you need to clean up in the hypervisor as well?

There's nothing to clean up in the hypervisor here.
free_unused_array_pages() is freeing array pages that Xen is not aware
of.

>> +static void evtchn_fifo_mask(unsigned port)
>> +{
>> +        event_word_t *word = event_word_from_port(port);
>> +        if (word)
>> +                sync_set_bit(EVTCHN_FIFO_MASKED, BM(word));
>
> You are testing 'word' here but not in the routines above (or below).

I think the test can be removed. The common code used to try and mask
all events even if there were no array pages yet, but it doesn't do
this any more.

>> +}
>> +
>> +static void evtchn_fifo_unmask(unsigned port)
>> +{
>> +        event_word_t *word = event_word_from_port(port);
>> +
>> +        BUG_ON(!irqs_disabled());
>> +
>> +        sync_clear_bit(EVTCHN_FIFO_MASKED, BM(word));
>> +        if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) {
>> +                struct evtchn_unmask unmask = { .port = port };
>> +                (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
>> +        }
>> +}
>
> 2-level unmasking is somewhat more elaborate, with it trying to avoid
> races on pending events. Is this not a concern here?

The 2-level unmask is trying to avoid doing a hypercall as an
optimization. This optimization is not possible here, so the code is
much simpler.

>> +        if (head == 0) {
>> +                rmb(); /* Ensure word is up-to-date before reading head. */
>> +                head = control_block->head[priority];
>> +        }
>> +
>> +        port = head;
>> +        word = event_word_from_port(port);
>
> Do you need to check for 'word != NULL'? You don't check it in
> clear_linked() (which is maybe where this should be done).

I don't think so. The kernel trusts Xen to only set valid LINK fields.

>> +static void evtchn_fifo_resume(void)
>> +{
>> +        unsigned cpu;
>> +
>> +        for_each_possible_cpu(cpu) {
>> +                void *control_block = per_cpu(cpu_control_block, cpu);
>> +                struct evtchn_init_control init_control;
>> +                int ret;
>> +
>> +                if (!control_block)
>> +                        continue;
>> +
>> +                /*
>> +                 * If this CPU is offline, take the opportunity to
>> +                 * free the control block while it is not being
>> +                 * used.
>> +                 */
>> +                if (!cpu_online(cpu)) {
>> +                        free_page((unsigned long)control_block);
>> +                        per_cpu(cpu_control_block, cpu) = NULL;
>> +                        continue;
>> +                }
>
> Have you tested offlining/onlining CPUs (lots of them)? I am asking
> because I see EVTCHNOP_init_control both here and in
> init_control_block() but I don't see anything that would "deinit" the
> control block for which you are freeing the page above.

It's not possible to "deinit" a control block. The hypervisor
deliberately doesn't provide an operation for this.

Note that evtchn_fifo_resume() is called when the guest is resumed in a
new domain which does not have any control blocks initialized yet. So,
in the case above, we're freeing a control block that Xen isn't aware
of yet.

>> +        int ret = 0;
>> +
>> +        switch (action) {
>> +        case CPU_UP_PREPARE:
>> +                if (!per_cpu(cpu_control_block, cpu))
>> +                        ret = evtchn_fifo_init_control_block(cpu);
>> +                break;
>> +        default:
>> +                break;
>> +        }
>
> What happens when you offline a CPU?

All the control blocks remain initialized, available for use when the
CPU is onlined again. This is no different to the per-VCPU shared info.

This does all work fine[1].

David

[1] Once I fixed a recent bug I introduced into patch 10 which would
accidentally trash the IPIs/VIRQs for VCPU 0 instead of the offlined
VCPU. Oops.
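[To make the resume behaviour described above concrete, here is an
annotated walk-through assembled from the quoted evtchn_fifo_resume()
and evtchn_fifo_setup(); this is an editorial illustration, not new
code from the series:]

    /*
     * After a migration, evtchn_fifo_resume() leaves:
     *   event_array[0..N-1]    -- pages still allocated (guest side)
     *   event_array_pages == 0 -- pages the *new* Xen knows about
     *
     * The first evtchn_fifo_setup() call then re-walks those pages;
     * allocation is skipped but the expand hypercall is re-issued:
     */
    while (i >= event_array_pages) {
            array_page = event_array[event_array_pages]; /* reused page */
            /* (the !array_page allocation branch is not taken here) */

            init_array_page(array_page); /* re-mask everything */
            expand_array.array_gfn = virt_to_mfn(array_page);
            ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array,
                                              &expand_array);
            if (ret < 0)
                    goto error;
            event_array_pages++; /* Xen-side count catches up again */
    }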
David Vrabel
2013-Oct-15 19:20 UTC
Re: [PATCH 08/16] xen/events: allow setup of irq_info to fail
On 14/10/13 18:26, Boris Ostrovsky wrote:
> On 10/08/2013 08:49 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> The FIFO-based event ABI requires additional setup of newly bound
>> events (it may need to expand the event array) and this setup may
>> fail.
>>
>> xen_irq_info_common_init() is a useful place to put this setup so
>> allow this call to fail. This call and the other similar calls are
>> renamed to be *_setup() to reflect that they may now fail.
>>
>> This failure can only occur with new event channels, not on rebind.
[...]
>> --- a/drivers/xen/events/events.c
>> +++ b/drivers/xen/events/events.c
[...]
>> +static int xen_irq_info_ipi_setup(unsigned cpu,
>>                                    unsigned irq,
>>                                    unsigned short evtchn,
>>                                    enum ipi_vector ipi)
>>  {
>>          struct irq_info *info = info_for_irq(irq);
>>
>> -        xen_irq_info_common_init(info, irq, IRQT_IPI, evtchn, 0);
>> -
>>          info->u.ipi = ipi;
>>
>>          per_cpu(ipi_to_irq, cpu)[ipi] = irq;
>> +
>> +        return xen_irq_info_common_setup(info, irq, IRQT_IPI, evtchn, 0);
>
> Do you need to clean up on error here (and in similar routines below)? I.e.
>
>         per_cpu(ipi_to_irq, cpu)[ipi] = -1;
>
> You may be trying to do this in __unbind_from_irq(), but I think a
> routine that fails shouldn't expect others to clean up after it. If
> __unbind_from_irq() is used in other contexts (it probably is) then
> perhaps you can factor out the part that is relevant to setup routines
> only and call it from here.

It's a lot easier to do all the cleanup in __unbind_from_irq(), no
matter what (part-initialized) state the irq is in.

>> +static void __unbind_from_irq(unsigned int irq)
>> +{
>> +        struct evtchn_close close;
>> +        int evtchn = evtchn_from_irq(irq);
>> +        struct irq_info *info = irq_get_handler_data(irq);
>> +
>> +        if (info->refcnt > 0) {
>
> info may be NULL.

I couldn't see how. There used to be a WARN_ON(!info) but I've never
seen reports of this ever triggering and I couldn't find a case where
it might.

David
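[As a general pattern, what is being agreed here is that teardown
should tolerate partially-initialised state, so setup error paths can
call it unconditionally. A generic, self-contained sketch of that
style, where every identifier is invented for illustration and nothing
is taken from the series:]

    #include <stdlib.h>

    struct chan {
            void *page;  /* NULL until allocated */
            int bound;   /* non-zero once hypervisor-visible (stand-in) */
    };

    /* Teardown that is safe against any partially-initialised state. */
    static void chan_teardown(struct chan *c)
    {
            if (c->bound)
                    c->bound = 0;  /* stand-in for the close hypercall */
            free(c->page);         /* free(NULL) is a no-op */
            c->page = NULL;
    }

    static int chan_setup(struct chan *c)
    {
            c->page = malloc(4096);
            if (!c->page)
                    goto fail;
            c->bound = 1;          /* stand-in for the bind hypercall */
            return 0;
    fail:
            chan_teardown(c);      /* handles whatever state we got to */
            return -1;
    }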
Boris Ostrovsky
2013-Oct-15 20:39 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 10/15/2013 02:58 PM, David Vrabel wrote:
> On 14/10/13 20:30, Boris Ostrovsky wrote:
>> On 10/08/2013 08:49 AM, David Vrabel wrote:
>>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>>  void __init xen_init_IRQ(void)
>>>  {
>>> -        xen_evtchn_2l_init();
>>> +        int ret;
>>> +
>>> +        ret = xen_evtchn_fifo_init();
>>> +        if (ret < 0) {
>>> +                printk(KERN_INFO "xen: falling back to n-level event channels");
>>> +                xen_evtchn_2l_init();
>>> +        }
>> Should we provide users with the ability to choose which mechanism to use?
>> Is there any advantage in staying with 2-level? Stability, I guess,
>> would be one.
> If someone can demonstrate a use case where 2-level is better, then we
> could consider an option. I don't think we want options for new
> software features just because they might be buggy.

I think we should always try to have a way to force old behavior when a
new feature is introduced. If a user discovers a bug we don't want them
to wait for a fix when a simpler solution is possible. I can see that
having this option in the kernel may be a bit too much, but is there at
least an option to force 2-level in the hypervisor?

Actually, is it even possible to have guests using different event
mechanisms on the same system?

>
>>> +
>>> +        if (i >= MAX_EVENT_ARRAY_PAGES)
>>> +                return -EINVAL;
>>> +
>>> +        while (i >= event_array_pages) {
>>> +                void *array_page;
>>> +                struct evtchn_expand_array expand_array;
>>> +
>>> +                /* Might already have a page if we've resumed. */
>>> +                array_page = event_array[event_array_pages];
>>> +                if (!array_page) {
>>> +                        array_page = (void *)__get_free_page(GFP_KERNEL);
>>> +                        if (array_page == NULL)
>>> +                                goto error;
>>> +                        event_array[event_array_pages] = array_page;
>>> +                }
>>> +
>>> +                /* Mask all events in this page before adding it. */
>>> +                init_array_page(array_page);
>>> +
>>> +                expand_array.array_gfn = virt_to_mfn(array_page);
>>> +
>>> +                ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array,
>>> +                                                  &expand_array);
>>> +                if (ret < 0)
>>> +                        goto error;
>>> +
>>> +                event_array_pages++;
>> Should this increment happen in the 'if (!array_page)' clause?
> No. event_array_pages is the number of pages Xen is aware of. Note how
> we zero it when resuming on a new domain with the FIFO-based ABI
> initially disabled.
>
>>> +        }
>>> +        return 0;
>>> +
>>> + error:
>>> +        if (event_array_pages == 0)
>>> +                panic("xen: unable to expand event array with initial page (%d)\n", ret);
>>> +        else
>>> +                printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
>>> +        free_unused_array_pages();
>> Do you need to clean up in the hypervisor as well?
> There's nothing to clean up in the hypervisor here.
> free_unused_array_pages() is freeing array pages that Xen is not aware of.

You made calls to EVTCHNOP_expand_array, which at some point calls
map_domain_page_global(). Don't you need to do
unmap_domain_page_global()?

-boris
David Vrabel
2013-Oct-16 09:46 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 15/10/13 21:39, Boris Ostrovsky wrote:
> On 10/15/2013 02:58 PM, David Vrabel wrote:
>> On 14/10/13 20:30, Boris Ostrovsky wrote:
>>> On 10/08/2013 08:49 AM, David Vrabel wrote:
>>>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>>>  void __init xen_init_IRQ(void)
>>>>  {
>>>> -        xen_evtchn_2l_init();
>>>> +        int ret;
>>>> +
>>>> +        ret = xen_evtchn_fifo_init();
>>>> +        if (ret < 0) {
>>>> +                printk(KERN_INFO "xen: falling back to n-level event channels");
>>>> +                xen_evtchn_2l_init();
>>>> +        }
>>> Should we provide users with the ability to choose which mechanism to use?
>>> Is there any advantage in staying with 2-level? Stability, I guess,
>>> would be one.
>> If someone can demonstrate a use case where 2-level is better, then we
>> could consider an option. I don't think we want options for new
>> software features just because they might be buggy.
>
> I think we should always try to have a way to force old behavior when a
> new feature is introduced. If a user discovers a bug we don't want them
> to wait for a fix when a simpler solution is possible. I can see that
> having this option in the kernel may be a bit too much, but is there at
> least an option to force 2-level in the hypervisor?

The majority of the risk in this series is the refactoring patches and
not the new ABI code, so an option to disable it wouldn't really help.

Also, if we are not confident that a feature is production ready, such
that we need knobs to turn it off, then we shouldn't merge it.

> Actually, is it even possible to have guests using different event
> mechanisms on the same system?
>
>>>>> +        free_unused_array_pages();
>>> Do you need to clean up in the hypervisor as well?
>> There's nothing to clean up in the hypervisor here.
>> free_unused_array_pages() is freeing array pages that Xen is not aware
>> of.
>
> You made calls to EVTCHNOP_expand_array, which at some point calls
> map_domain_page_global(). Don't you need to do unmap_domain_page_global()?

This is done when Xen destroys the domain.

David
Boris Ostrovsky
2013-Oct-16 13:26 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 10/16/2013 05:46 AM, David Vrabel wrote:
> On 15/10/13 21:39, Boris Ostrovsky wrote:
>> On 10/15/2013 02:58 PM, David Vrabel wrote:
>>> On 14/10/13 20:30, Boris Ostrovsky wrote:
>>>> On 10/08/2013 08:49 AM, David Vrabel wrote:
>>>>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>>>>  void __init xen_init_IRQ(void)
>>>>>  {
>>>>> -        xen_evtchn_2l_init();
>>>>> +        int ret;
>>>>> +
>>>>> +        ret = xen_evtchn_fifo_init();
>>>>> +        if (ret < 0) {
>>>>> +                printk(KERN_INFO "xen: falling back to n-level event channels");
>>>>> +                xen_evtchn_2l_init();
>>>>> +        }
>>>> Should we provide users with the ability to choose which mechanism to use?
>>>> Is there any advantage in staying with 2-level? Stability, I guess,
>>>> would be one.
>>> If someone can demonstrate a use case where 2-level is better, then we
>>> could consider an option. I don't think we want options for new
>>> software features just because they might be buggy.
>> I think we should always try to have a way to force old behavior when a
>> new feature is introduced. If a user discovers a bug we don't want them
>> to wait for a fix when a simpler solution is possible. I can see that
>> having this option in the kernel may be a bit too much, but is there at
>> least an option to force 2-level in the hypervisor?
> The majority of the risk in this series is the refactoring patches and
> not the new ABI code, so an option to disable it wouldn't really help.
>
> Also, if we are not confident that a feature is production ready, such
> that we need knobs to turn it off, then we shouldn't merge it.

I disagree. If this were the case, then hardware wouldn't have chicken
bits and there wouldn't exist half of the "no*" options to the kernel,
for example.

We are introducing a rewrite of a critical component of the system. It
has bugs (by definition) and there should be a way to fall back to an
implementation that presumably has fewer bugs.

-boris
David Vrabel
2013-Oct-16 13:49 UTC
Re: [PATCH 16/16] xen/events: use the FIFO-based ABI if available
On 16/10/13 14:26, Boris Ostrovsky wrote:
> On 10/16/2013 05:46 AM, David Vrabel wrote:
>>
>> The majority of the risk in this series is the refactoring patches and
>> not the new ABI code, so an option to disable it wouldn't really help.
>>
>> Also, if we are not confident that a feature is production ready, such
>> that we need knobs to turn it off, then we shouldn't merge it.
>
> I disagree. If this were the case, then hardware wouldn't have chicken
> bits and there wouldn't exist half of the "no*" options to the kernel,
> for example.
>
> We are introducing a rewrite of a critical component of the system. It
> has bugs (by definition) and there should be a way to fall back to an
> implementation that presumably has fewer bugs.

Ok.

David
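[For illustration, one shape such a fallback knob could take is a
boot-time parameter checked before trying the new ABI. The parameter
name and wiring below are invented, not taken from a posted patch; this
assumes <linux/module.h> for module_param() and the xen_init_IRQ() body
quoted earlier:]

    /* Hypothetical chicken bit; not from this series. */
    static bool fifo_events = true;
    module_param(fifo_events, bool, 0);

    void __init xen_init_IRQ(void)
    {
            int ret = -EINVAL;

            if (fifo_events)
                    ret = xen_evtchn_fifo_init();
            if (ret < 0) {
                    printk(KERN_INFO
                           "xen: falling back to n-level event channels");
                    xen_evtchn_2l_init();
            }
            /* ... remainder as in the quoted patch ... */
    }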
Ian Campbell
2013-Oct-16 15:19 UTC
Re: [PATCHv6 00/16] Linux: FIFO-based event channel ABI
On Mon, 2013-10-14 at 14:41 +0100, David Vrabel wrote:
> On 08/10/13 13:48, David Vrabel wrote:
> > This is the complete Linux guest-side implementation of the FIFO-based
> > event channel ABI described in this design document:
> >
> > http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf
> >
> > Refer also to the Xen series.
>
> The Xen series has now been merged.

Congrats. We should probably arrange for some sort of baseline level of
osstest support for this, which I suppose means a test job using a new
enough kernel.

Long term I suppose this means adding a newer upstream kernel to the
flights (e.g. the next long-term supported one?), but short term should
we be testing something? Do we test a linux-next tree of any sort?

Ian.
David Vrabel
2013-Oct-16 15:36 UTC
Re: [PATCHv6 00/16] Linux: FIFO-based event channel ABI
On 16/10/13 16:19, Ian Campbell wrote:
> On Mon, 2013-10-14 at 14:41 +0100, David Vrabel wrote:
>> On 08/10/13 13:48, David Vrabel wrote:
>>> This is the complete Linux guest-side implementation of the FIFO-based
>>> event channel ABI described in this design document:
>>>
>>> http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf
>>>
>>> Refer also to the Xen series.
>>
>> The Xen series has now been merged.
>
> Congrats. We should probably arrange for some sort of baseline level of
> osstest support for this, which I suppose means a test job using a new
> enough kernel.
>
> Long term I suppose this means adding a newer upstream kernel to the
> flights (e.g. the next long-term supported one?), but short term should
> we be testing something? Do we test a linux-next tree of any sort?

Features like this could be tested by adding support to minios.
Someone other than me would have to do this work, though.

David
Ian Campbell writes ("Re: [Xen-devel] [PATCHv6 00/16] Linux: FIFO-based event channel ABI"):
> On Mon, 2013-10-14 at 14:41 +0100, David Vrabel wrote:
...
> Congrats. We should probably arrange for some sort of baseline level of
> osstest support for this, which I suppose means a test job using a new
> enough kernel.

That would be fairly straightforward. I would need to know the git URL
and branch to get this kernel from - and have a promise that it would
get only fast-forward updates. I'd set it up with its own push gate.

> Long term I suppose this means adding a newer upstream kernel to the
> flights (e.g. the next long-term supported one?), but short term should
> we be testing something? Do we test a linux-next tree of any sort?

We do test linux-linus occasionally. It doesn't normally do very well.

Ian.