Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2] Implement 3-level event channel support in Xen
This is version 2 of the patch series. I've broken the series down into small patches and added more comments in the commit logs.

Apart from normal fixes and cleanups, the differences between V1 and V2 are:
 * Use function pointers to get rid of switch statements
 * Do not manipulate VCPU state
 * No more gcc-isms in public headers
 * Consolidate some boilerplate using macros

The compat shim is not implemented at the moment; I will do this when we reach consensus on the interface. Under what circumstances we should enable 3-level event channels is still open for discussion.

Thanks
Wei.
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 01/14] Remove trailing whitespaces in event_channel.c
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 2d7afc9..9231eb0 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1,15 +1,15 @@ /****************************************************************************** * event_channel.c - * + * * Event notifications from VIRQs, PIRQs, and other domains. - * + * * Copyright (c) 2003-2006, K A Fraser. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. - * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA @@ -238,7 +238,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) lchn->u.interdomain.remote_dom = rd; lchn->u.interdomain.remote_port = (u16)rport; lchn->state = ECS_INTERDOMAIN; - + rchn->u.interdomain.remote_dom = ld; rchn->u.interdomain.remote_port = (u16)lport; rchn->state = ECS_INTERDOMAIN; @@ -255,7 +255,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) spin_unlock(&ld->event_lock); if ( ld != rd ) spin_unlock(&rd->event_lock); - + rcu_unlock_domain(rd); return rc; @@ -633,7 +633,7 @@ static void evtchn_set_pending(struct vcpu *v, int port) { vcpu_mark_events_pending(v); } - + /* Check if some VCPU might be polling for this event. */ if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) return; @@ -930,7 +930,7 @@ int evtchn_unmask(unsigned int port) /* * These operations must happen in strict order. Based on - * include/xen/event.h:evtchn_set_pending(). + * include/xen/event.h:evtchn_set_pending(). */ if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) && test_bit (port, &shared_info(d, evtchn_pending)) && -- 1.7.10.4
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 90a6537..39f85d2 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -92,7 +92,7 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ struct waitqueue_vcpu; -struct vcpu +struct vcpu { int vcpu_id; @@ -453,7 +453,7 @@ struct domain *domain_create( /* * rcu_lock_domain_by_id() is more efficient than get_domain_by_id(). * This is the preferred function if the returned domain reference - * is short lived, but it cannot be used if the domain reference needs + * is short lived, but it cannot be used if the domain reference needs * to be kept beyond the current scope (e.g., across a softirq). * The returned domain reference must be discarded using rcu_unlock_domain(). */ @@ -574,7 +574,7 @@ void sync_local_execstate(void); * sync_vcpu_execstate() will switch and commit @prev's state. */ void context_switch( - struct vcpu *prev, + struct vcpu *prev, struct vcpu *next); /* -- 1.7.10.4
This field is manipulated by the hypervisor only, so if anything goes wrong, it is a bug. The default event channel level is 2, which uses a two-level lookup structure: a selector in struct vcpu and a shared bitmap in shared info. The upcoming 3-level event channel uses a three-level lookup structure: a top-level selector and a second-level selector for every vcpu, plus shared bitmaps. When building a domain, it starts with 2-level event channels, which are guaranteed to always be supported by the hypervisor. If a domain wants to use N-level (N >= 3) event channels, it must explicitly issue a hypercall to set them up. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 1 + xen/include/xen/sched.h | 19 ++++++++++++++++++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 9231eb0..bc5db10 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1173,6 +1173,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) int evtchn_init(struct domain *d) { spin_lock_init(&d->event_lock); + d->evtchn_level = EVTCHN_DEFAULT_LEVEL; /* = 2 */ if ( get_free_port(d) != 0 ) return -EINVAL; evtchn_from_port(d, 0)->state = ECS_RESERVED; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 39f85d2..aa97407 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -50,7 +50,23 @@ extern struct domain *dom0; #else #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG) #endif -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) + +#define EVTCHN_2_LEVEL 2 +#define EVTCHN_3_LEVEL 3 +#define EVTCHN_DEFAULT_LEVEL EVTCHN_2_LEVEL +#define MAX_EVTCHNS_L2(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d)) +#define MAX_EVTCHNS_L3(d) (MAX_EVTCHNS_L2(d) * BITS_PER_EVTCHN_WORD(d)) +#define MAX_EVTCHNS(d) ({ int __v = 0; \ + switch ( d->evtchn_level ) { \ + case EVTCHN_2_LEVEL: \ + __v = MAX_EVTCHNS_L2(d); break; \ + case EVTCHN_3_LEVEL: \ + __v = MAX_EVTCHNS_L3(d); break; \ + default: \ + BUG(); \ + }; \ + __v;}) + #define EVTCHNS_PER_BUCKET 128 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) @@ -262,6 +278,7 @@ struct domain /* Event channel information. */ struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; spinlock_t event_lock; + unsigned int evtchn_level; struct grant_table *grant_table; -- 1.7.10.4
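To make the lookup structures above concrete, the following standalone sketch (illustrative only, not code from this series; it assumes a 64-bit guest with 64-bit bitmap words) shows how a port number decomposes into selector and bitmap bit positions under each scheme:

#include <stdio.h>

#define WORD_BITS 64u   /* BITS_PER_EVTCHN_WORD on a 64-bit guest */

int main(void)
{
    /* 2-level: up to 64 * 64 = 4096 ports. One selector bit in
     * struct vcpu covers one 64-bit word of the shared bitmap. */
    unsigned int port = 3000;
    printf("2-level port %u: selector bit %u, bitmap bit %u\n",
           port, port / WORD_BITS, port % WORD_BITS);

    /* 3-level: up to 64 * 64 * 64 = 262144 ports. A top-level selector
     * bit covers one word of the per-vCPU second-level selector, whose
     * bits in turn cover words of the shared bitmap pages. */
    port = 100000;
    printf("3-level port %u: top-level bit %u, second-level bit %u, "
           "bitmap bit %u\n",
           port, port / (WORD_BITS * WORD_BITS), port / WORD_BITS,
           port % WORD_BITS);
    return 0;
}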
As we move to N-level event channels we need a bigger d->evtchn array, which would bloat struct domain. So move this array out of struct domain and allocate a dedicated page for it. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 17 +++++++++++++++-- xen/include/xen/sched.h | 2 +- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index bc5db10..a5d96ab 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1172,16 +1172,27 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) int evtchn_init(struct domain *d) { + BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE); + d->evtchn = alloc_xenheap_page(); + + if ( d->evtchn == NULL ) + return -ENOMEM; + clear_page(d->evtchn); + spin_lock_init(&d->event_lock); d->evtchn_level = EVTCHN_DEFAULT_LEVEL; /* = 2 */ - if ( get_free_port(d) != 0 ) + if ( get_free_port(d) != 0 ) { + free_xenheap_page(d->evtchn); return -EINVAL; + } evtchn_from_port(d, 0)->state = ECS_RESERVED; #if MAX_VIRT_CPUS > BITS_PER_LONG d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS)); - if ( !d->poll_mask ) + if ( !d->poll_mask ) { + free_xenheap_page(d->evtchn); return -ENOMEM; + } bitmap_zero(d->poll_mask, MAX_VIRT_CPUS); #endif @@ -1215,6 +1226,8 @@ void evtchn_destroy(struct domain *d) spin_unlock(&d->event_lock); clear_global_virq_handlers(d); + + free_xenheap_page(d->evtchn); } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index aa97407..c876892 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -276,7 +276,7 @@ struct domain spinlock_t rangesets_lock; /* Event channel information. */ - struct evtchn *evtchn[NR_EVTCHN_BUCKETS]; + struct evtchn **evtchn; spinlock_t event_lock; unsigned int evtchn_level; -- 1.7.10.4
For a 64-bit build with 3-level event channels and the original value of EVTCHNS_PER_BUCKET (128), the space needed to accommodate d->evtchn would be 4 pages (PAGE_SIZE = 4096). Given that not every domain needs 3-level event channels, this wastes memory. Also, as we have restricted d->evtchn to one page, Xen cannot even build with 3-level event channels and the old bucket size. Setting EVTCHNS_PER_BUCKET to 512 makes d->evtchn occupy exactly one page. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index c876892..eae9baf 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -67,7 +67,7 @@ extern struct domain *dom0; }; \ __v;}) -#define EVTCHNS_PER_BUCKET 128 +#define EVTCHNS_PER_BUCKET 512 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) struct evtchn -- 1.7.10.4
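The one-page claim is easy to verify; this is a standalone sanity check (not part of the patch), assuming 8-byte pointers and 4096-byte pages on a 64-bit build:

/* 3-level maximum: 262144 event channels.
 * Old: 262144 / 128 = 2048 buckets * 8 bytes = 16384 bytes = 4 pages.
 * New: 262144 / 512 =  512 buckets * 8 bytes =  4096 bytes = 1 page. */
_Static_assert(262144 / 128 * 8 == 4 * 4096,
               "128 channels per bucket would need four pages");
_Static_assert(262144 / 512 * 8 == 4096,
               "512 channels per bucket fits exactly one page");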
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 06/14] Add evtchn_is_{pending, masked} and evtchn_clear_pending
Some code paths access the arrays in shared info directly. This only works with 2-level event channels. Add functions to abstract away the implementation details. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/arch/x86/irq.c | 7 +++---- xen/common/event_channel.c | 22 +++++++++++++++++++--- xen/common/keyhandler.c | 6 ++---- xen/common/schedule.c | 2 +- xen/include/xen/event.h | 6 ++++++ 5 files changed, 31 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 095c17d..121c7d3 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1452,7 +1452,7 @@ int pirq_guest_unmask(struct domain *d) { pirq = pirqs[i]->pirq; if ( pirqs[i]->masked && - !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) ) + !evtchn_is_masked(d, pirqs[i]->evtchn) ) pirq_guest_eoi(pirqs[i]); } } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); @@ -2088,13 +2088,12 @@ static void dump_irqs(unsigned char key) info = pirq_info(d, pirq); printk("%u:%3d(%c%c%c%c)", d->domain_id, pirq, - (test_bit(info->evtchn, - &shared_info(d, evtchn_pending)) ? + (evtchn_is_pending(d, info->evtchn) ? 'P' : '-'), (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d), &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ? 'S' : '-'), - (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ? + (evtchn_is_masked(d, info->evtchn) ? 'M' : '-'), (info->masked ? 'M' : '-')); if ( i != action->nr_guests ) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index a5d96ab..1df2b76 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -95,6 +95,7 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) static void evtchn_set_pending(struct vcpu *v, int port); +static void evtchn_clear_pending(struct domain *d, int port); static int virq_is_global(uint32_t virq) { @@ -156,6 +157,16 @@ static int get_free_port(struct domain *d) return port; } +int evtchn_is_pending(struct domain *d, int port) +{ + return test_bit(port, &shared_info(d, evtchn_pending)); +} + +int evtchn_is_masked(struct domain *d, int port) +{ + return test_bit(port, &shared_info(d, evtchn_mask)); +} + static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) { @@ -529,7 +540,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - clear_bit(port1, &shared_info(d1, evtchn_pending)); + evtchn_clear_pending(d1, port1); /* Reset binding to vcpu0 when the channel is freed. */ chn1->state = ECS_FREE; @@ -653,6 +664,11 @@ static void evtchn_set_pending(struct vcpu *v, int port) } } +static void evtchn_clear_pending(struct domain *d, int port) +{ + clear_bit(port, &shared_info(d, evtchn_pending)); +} + int guest_enabled_event(struct vcpu *v, uint32_t virq) { return ((v != NULL) && (v->virq_to_evtchn[virq] != 0)); } @@ -1283,8 +1299,8 @@ static void domain_dump_evtchn_info(struct domain *d) printk(" %4u [%d/%d]: s=%d n=%d x=%d", port, - !!test_bit(port, &shared_info(d, evtchn_pending)), - !!test_bit(port, &shared_info(d, evtchn_mask)), + !!evtchn_is_pending(d, port), + !!evtchn_is_masked(d, port), chn->state, chn->notify_vcpu_id, chn->xen_consumer); switch ( chn->state ) diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 2c5c230..16bc452 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -301,10 +301,8 @@ static void dump_domains(unsigned char key) printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n", d->domain_id, v->vcpu_id, VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG], - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_pending)), - test_bit(v->virq_to_evtchn[VIRQ_DEBUG], - &shared_info(d, evtchn_mask)), + evtchn_is_pending(d, v->virq_to_evtchn[VIRQ_DEBUG]), + evtchn_is_masked(d, v->virq_to_evtchn[VIRQ_DEBUG]), test_bit(v->virq_to_evtchn[VIRQ_DEBUG] / BITS_PER_EVTCHN_WORD(d), &vcpu_info(v, evtchn_pending_sel))); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index e6a90d8..1bf010e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -693,7 +693,7 @@ static long do_poll(struct sched_poll *sched_poll) goto out; rc = 0; - if ( test_bit(port, &shared_info(d, evtchn_pending)) ) + if ( evtchn_is_pending(d, port) ) goto out; } diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h index 71c3e92..d6a8269 100644 --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -54,6 +54,12 @@ int evtchn_unmask(unsigned int port); /* Move all PIRQs after a vCPU was moved to another pCPU. */ void evtchn_move_pirqs(struct vcpu *v); +/* Tell whether a given event-channel port is pending */ +int evtchn_is_pending(struct domain *d, int port); + +/* Tell whether a given event-channel port is masked */ +int evtchn_is_masked(struct domain *d, int port); + /* Allocate/free a Xen-attached event channel port. */ typedef void (*xen_event_channel_notification_t)( struct vcpu *v, unsigned int port); -- 1.7.10.4
Add struct xen_evtchn_ops *eops to struct domain to reference the current operation function set. When building a domain, the default operation set is the 2-level one. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 64 ++++++++++++++++++++++++++++++++------------ xen/include/xen/sched.h | 2 ++ 2 files changed, 49 insertions(+), 17 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 1df2b76..e8faf7d 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -51,6 +51,15 @@ #define consumer_is_xen(e) (!!(e)->xen_consumer) +/* N-level event channel should implement following operations */ +struct xen_evtchn_ops { + void (*set_pending)(struct vcpu *v, int port); + void (*clear_pending)(struct domain *d, int port); + int (*unmask)(unsigned int port); + int (*is_pending)(struct domain *d, int port); + int (*is_masked)(struct domain *d, int port); +}; + /* * The function alloc_unbound_xen_event_channel() allows an arbitrary * notifier function to be specified. However, very few unique functions @@ -94,9 +103,6 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) /* Get the notification function for a given Xen-bound event channel. */ #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) -static void evtchn_set_pending(struct vcpu *v, int port); -static void evtchn_clear_pending(struct domain *d, int port); - static int virq_is_global(uint32_t virq) { int rc; @@ -157,16 +163,25 @@ static int get_free_port(struct domain *d) return port; } -int evtchn_is_pending(struct domain *d, int port) +static int evtchn_is_pending_l2(struct domain *d, int port) { return test_bit(port, &shared_info(d, evtchn_pending)); } -int evtchn_is_masked(struct domain *d, int port) +static int evtchn_is_masked_l2(struct domain *d, int port) { return test_bit(port, &shared_info(d, evtchn_mask)); } +int evtchn_is_pending(struct domain *d, int port) +{ + return d->eops->is_pending(d, port); +} + +int evtchn_is_masked(struct domain *d, int port) +{ + return d->eops->is_masked(d, port); +} static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) { @@ -258,7 +273,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) * We may have lost notifications on the remote unbound port. Fix that up * here by conservatively always setting a notification on the local port. */ - evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport); + ld->eops->set_pending(ld->vcpu[lchn->notify_vcpu_id], lport); bind->local_port = lport; @@ -540,7 +555,7 @@ static long __evtchn_close(struct domain *d1, int port1) } /* Clear pending event to avoid unexpected behavior on re-bind. */ - evtchn_clear_pending(d1, port1); + d1->eops->clear_pending(d1, port1); /* Reset binding to vcpu0 when the channel is freed.
*/ chn1->state = ECS_FREE; @@ -605,10 +620,10 @@ int evtchn_send(struct domain *d, unsigned int lport) if ( consumer_is_xen(rchn) ) (*xen_notification_fn(rchn))(rvcpu, rport); else - evtchn_set_pending(rvcpu, rport); + rd->eops->set_pending(rvcpu, rport); break; case ECS_IPI: - evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport); + ld->eops->set_pending(ld->vcpu[lchn->notify_vcpu_id], lport); break; case ECS_UNBOUND: /* silently drop the notification */ @@ -623,7 +638,7 @@ out: return ret; } -static void evtchn_set_pending(struct vcpu *v, int port) +static void evtchn_set_pending_l2(struct vcpu *v, int port) { struct domain *d = v->domain; int vcpuid; @@ -664,7 +679,7 @@ static void evtchn_set_pending(struct vcpu *v, int port) } } -static void evtchn_clear_pending(struct domain *d, int port) +static void evtchn_clear_pending_l2(struct domain *d, int port) { clear_bit(port, &shared_info(d, evtchn_pending)); } @@ -678,6 +693,7 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq) { unsigned long flags; int port; + struct domain *d = v->domain; ASSERT(!virq_is_global(virq)); @@ -687,7 +703,7 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq) if ( unlikely(port == 0) ) goto out; - evtchn_set_pending(v, port); + d->eops->set_pending(v, port); out: spin_unlock_irqrestore(&v->virq_lock, flags); @@ -716,7 +732,7 @@ static void send_guest_global_virq(struct domain *d, uint32_t virq) goto out; chn = evtchn_from_port(d, port); - evtchn_set_pending(d->vcpu[chn->notify_vcpu_id], port); + d->eops->set_pending(d->vcpu[chn->notify_vcpu_id], port); out: spin_unlock_irqrestore(&v->virq_lock, flags); @@ -740,7 +756,7 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq) } chn = evtchn_from_port(d, port); - evtchn_set_pending(d->vcpu[chn->notify_vcpu_id], port); + d->eops->set_pending(d->vcpu[chn->notify_vcpu_id], port); } static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly; @@ -932,7 +948,7 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) } -int evtchn_unmask(unsigned int port) +static int evtchn_unmask_l2(unsigned int port) { struct domain *d = current->domain; struct vcpu *v; @@ -959,6 +975,12 @@ int evtchn_unmask(unsigned int port) return 0; } +int evtchn_unmask(unsigned int port) +{ + struct domain *d = current->domain; + return d->eops->unmask(port); +} + static long evtchn_reset(evtchn_reset_t *r) { @@ -1179,12 +1201,19 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) rd = lchn->u.interdomain.remote_dom; rport = lchn->u.interdomain.remote_port; rchn = evtchn_from_port(rd, rport); - evtchn_set_pending(rd->vcpu[rchn->notify_vcpu_id], rport); + rd->eops->set_pending(rd->vcpu[rchn->notify_vcpu_id], rport); } spin_unlock(&ld->event_lock); } +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = { + .set_pending = evtchn_set_pending_l2, + .clear_pending = evtchn_clear_pending_l2, + .unmask = evtchn_unmask_l2, + .is_pending = evtchn_is_pending_l2, + .is_masked = evtchn_is_masked_l2, +}; int evtchn_init(struct domain *d) { @@ -1197,6 +1226,7 @@ int evtchn_init(struct domain *d) spin_lock_init(&d->event_lock); d->evtchn_level = EVTCHN_DEFAULT_LEVEL; /* = 2 */ + d->eops = &xen_evtchn_ops_l2; if ( get_free_port(d) != 0 ) { free_xenheap_page(d->evtchn); return -EINVAL; @@ -1272,7 +1302,6 @@ void evtchn_move_pirqs(struct vcpu *v) spin_unlock(&d->event_lock); } - static void domain_dump_evtchn_info(struct domain *d) { unsigned int port; @@ -1334,6 +1363,7 @@ static void domain_dump_evtchn_info(struct domain *d) 
spin_unlock(&d->event_lock); } + static void dump_evtchn_info(unsigned char key) { struct domain *d; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index eae9baf..df3b877 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -69,6 +69,7 @@ extern struct domain *dom0; #define EVTCHNS_PER_BUCKET 512 #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) +struct xen_evtchn_ops; struct evtchn { @@ -279,6 +280,7 @@ struct domain struct evtchn **evtchn; spinlock_t event_lock; unsigned int evtchn_level; + struct xen_evtchn_ops *eops; struct grant_table *grant_table; -- 1.7.10.4
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 08/14] Define N-level event channel registration interface
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/event_channel.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h index 07ff321..c5194d9 100644 --- a/xen/include/public/event_channel.h +++ b/xen/include/public/event_channel.h @@ -71,6 +71,7 @@ #define EVTCHNOP_bind_vcpu 8 #define EVTCHNOP_unmask 9 #define EVTCHNOP_reset 10 +#define EVTCHNOP_register_nlevel 11 /* ` } */ typedef uint32_t evtchn_port_t; @@ -258,6 +259,38 @@ struct evtchn_reset { typedef struct evtchn_reset evtchn_reset_t; /* + * EVTCHNOP_register_nlevel: Register N-level event channel + * NOTES: + * 1. Currently only 3-level is supported. + * 2. Should fall back to 2-level if this call fails. + */ +/* 64 bit guests need 8 pages for evtchn_pending and evtchn_mask for + * 256k event channels while 32 bit ones only need 1 page for 32k + * event channels. */ +#define EVTCHN_MAX_L3_PAGES 8 +struct evtchn_register_3level { + /* IN parameters. */ + uint32_t nr_pages; + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_pending; + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_mask; + uint32_t nr_vcpus; + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_mfns; + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_offsets; +}; +typedef struct evtchn_register_3level evtchn_register_3level_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_register_3level_t); + +struct evtchn_register_nlevel { + /* IN parameters. */ + uint32_t level; + union { + XEN_GUEST_HANDLE(evtchn_register_3level_t) l3; + } u; +}; +typedef struct evtchn_register_nlevel evtchn_register_nlevel_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_register_nlevel_t); + +/* * ` enum neg_errnoval * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op) * ` -- 1.7.10.4
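For illustration, a guest might invoke this interface roughly as follows. This is a hypothetical guest-side sketch, not code from the series: the helper name and the pre-populated pfn arrays are assumptions, and a Linux-style HYPERVISOR_event_channel_op() wrapper is taken as given. Per the NOTES above, the guest should keep using the 2-level ABI if the call fails.

/* Hypothetical guest-side registration of 3-level event channels. */
static int register_3level(xen_pfn_t *pending_pfns, xen_pfn_t *mask_pfns,
                           uint32_t nr_pages,
                           xen_pfn_t *l2sel_mfns, xen_pfn_t *l2sel_offsets,
                           uint32_t nr_vcpus)
{
    struct evtchn_register_3level r3 = {
        .nr_pages = nr_pages,
        .nr_vcpus = nr_vcpus,
    };
    struct evtchn_register_nlevel reg = {
        .level = 3,
    };

    set_xen_guest_handle(r3.evtchn_pending, pending_pfns);
    set_xen_guest_handle(r3.evtchn_mask, mask_pfns);
    set_xen_guest_handle(r3.l2sel_mfns, l2sel_mfns);
    set_xen_guest_handle(r3.l2sel_offsets, l2sel_offsets);
    set_xen_guest_handle(reg.u.l3, &r3);

    /* Non-zero return: fall back to the 2-level ABI. */
    return HYPERVISOR_event_channel_op(EVTCHNOP_register_nlevel, &reg);
}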
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/public/xen.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 5593066..ff308c0 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -554,9 +554,14 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); /* * Event channel endpoints per domain: + * 2-level: * 1024 if a long is 32 bits; 4096 if a long is 64 bits. + * 3-level: + * 32k if a long is 32 bits; 256k if a long is 64 bits. */ -#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64) +#define NR_EVENT_CHANNELS_L2 (sizeof(unsigned long) * sizeof(unsigned long) * 64) +#define NR_EVENT_CHANNELS_L3 (NR_EVENT_CHANNELS_L2 * 64) +#define NR_EVENT_CHANNELS NR_EVENT_CHANNELS_L2 /* for compatibility */ struct vcpu_time_info { /* -- 1.7.10.4
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 10/14] Add control structures for 3-level event channel
References to the shared pending and mask bitmaps are embedded in struct domain, and a pointer to the second-level selector is embedded in struct vcpu. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index df3b877..d6e3a03 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -24,6 +24,7 @@ #include <public/sysctl.h> #include <public/vcpu.h> #include <public/mem_event.h> +#include <public/event_channel.h> #ifdef CONFIG_COMPAT #include <compat/vcpu.h> @@ -119,6 +120,9 @@ struct vcpu struct domain *domain; + /* For 3-level event channels */ + unsigned long *evtchn_pending_sel_l2; + struct vcpu *next_in_list; s_time_t periodic_period; @@ -281,6 +285,8 @@ struct domain spinlock_t event_lock; unsigned int evtchn_level; struct xen_evtchn_ops *eops; + unsigned long *evtchn_pending[EVTCHN_MAX_L3_PAGES]; + unsigned long *evtchn_mask[EVTCHN_MAX_L3_PAGES]; struct grant_table *grant_table; -- 1.7.10.4
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 11/14] Introduce some macros for event channels
For N-level event channels, the shared bitmaps in the hypervisor are by design not guaranteed to be contiguous. These macros are used to calculate the page number / offset within a page of a given event channel. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/asm-x86/config.h | 4 +++- xen/include/xen/sched.h | 13 +++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h index ce3a7c0..1a40d80 100644 --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -8,11 +8,13 @@ #define __X86_CONFIG_H__ #define LONG_BYTEORDER 3 +#define LONG_BITORDER 6 +#define BYTE_BITORDER 3 #define CONFIG_PAGING_LEVELS 4 #define BYTES_PER_LONG (1 << LONG_BYTEORDER) #define BITS_PER_LONG (BYTES_PER_LONG << 3) -#define BITS_PER_BYTE 8 +#define BITS_PER_BYTE (1 << BYTE_BITORDER) #define CONFIG_X86 1 #define CONFIG_X86_HT 1 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index d6e3a03..0c3af04 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -69,6 +69,19 @@ extern struct domain *dom0; __v;}) #define EVTCHNS_PER_BUCKET 512 +/* N.B. EVTCHNS_PER_PAGE is always a power of 2, use shifts to optimize */ +#define EVTCHNS_SHIFT (PAGE_SHIFT+BYTE_BITORDER) +#define EVTCHNS_PER_PAGE (_AC(1,L) << EVTCHNS_SHIFT) +#define EVTCHN_MASK (~(EVTCHNS_PER_PAGE-1)) +#define EVTCHN_PAGE_NO(chn) ((chn) >> EVTCHNS_SHIFT) +#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ~EVTCHN_MASK) + +#ifndef CONFIG_COMPAT +#define EVTCHN_WORD_BITORDER(d) LONG_BITORDER +#else +#define EVTCHN_WORD_BITORDER(d) (has_32bit_shinfo(d) ? 5 : LONG_BITORDER) +#endif + #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) struct xen_evtchn_ops; -- 1.7.10.4
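A quick worked example of these macros, under the x86-64 values just introduced (PAGE_SHIFT = 12, BYTE_BITORDER = 3, so EVTCHNS_SHIFT = 15); this standalone snippet is illustrative, not part of the patch:

#include <stdio.h>

#define EVTCHNS_SHIFT 15  /* PAGE_SHIFT (12) + BYTE_BITORDER (3) */
#define EVTCHN_PAGE_NO(chn)        ((chn) >> EVTCHNS_SHIFT)
#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ((1u << EVTCHNS_SHIFT) - 1))

int main(void)
{
    /* One 4096-byte page holds 4096 * 8 = 32768 bits, i.e. 32768
     * channels. Port 70000 therefore lives in bitmap page 2
     * (70000 >> 15) at bit offset 4464 (70000 - 2 * 32768). */
    unsigned int port = 70000;
    printf("port %u -> page %u, bit %u\n", port,
           EVTCHN_PAGE_NO(port), EVTCHN_OFFSET_IN_PAGE(port));
    return 0;
}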
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/include/xen/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 0c3af04..5ab6c61 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -82,7 +82,7 @@ extern struct domain *dom0; #define EVTCHN_WORD_BITORDER(d) (has_32bit_shinfo(d) ? 5 : LONG_BITORDER) #endif -#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET) +#define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS_L3 / EVTCHNS_PER_BUCKET) struct xen_evtchn_ops; struct evtchn -- 1.7.10.4
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
NOTE: the registration call always fails because other parts of the code are not yet completed. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 287 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 287 insertions(+) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index e8faf7d..54a847e 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -26,6 +26,7 @@ #include <xen/compat.h> #include <xen/guest_access.h> #include <xen/keyhandler.h> +#include <xen/paging.h> #include <asm/current.h> #include <public/xen.h> @@ -1008,6 +1009,267 @@ out: } +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending, + xen_pfn_t *mask, int nr_pages) +{ + int rc; + void *mapping; + struct page_info *pginfo; + unsigned long gfn; + int pending_count = 0, mask_count = 0; + +#define __MAP(src, dst, cnt) \ + for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ ) \ + { \ + rc = -EINVAL; \ + gfn = (src)[(cnt)]; \ + pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); \ + if ( !pginfo ) \ + goto err; \ + if ( !get_page_type(pginfo, PGT_writable_page) ) \ + { \ + put_page(pginfo); \ + goto err; \ + } \ + mapping = __map_domain_page_global(pginfo); \ + if ( !mapping ) \ + { \ + put_page_and_type(pginfo); \ + rc = -ENOMEM; \ + goto err; \ + } \ + (dst)[(cnt)] = mapping; \ + } + + __MAP(pending, d->evtchn_pending, pending_count) + __MAP(mask, d->evtchn_mask, mask_count) +#undef __MAP + + rc = 0; + + err: + return rc; +} + +static void __unmap_l3_arrays(struct domain *d) +{ + int i; + unsigned long mfn; + + for ( i = 0; i < EVTCHN_MAX_L3_PAGES; i++ ) + { + if ( d->evtchn_pending[i] != 0 ) + { + mfn = domain_page_map_to_mfn(d->evtchn_pending[i]); + unmap_domain_page_global(d->evtchn_pending[i]); + put_page_and_type(mfn_to_page(mfn)); + d->evtchn_pending[i] = 0; + } + if ( d->evtchn_mask[i] != 0 ) + { + mfn = domain_page_map_to_mfn(d->evtchn_mask[i]); + unmap_domain_page_global(d->evtchn_mask[i]); + put_page_and_type(mfn_to_page(mfn)); + d->evtchn_mask[i] = 0; + } + } +} + +static long __map_l2_selector(struct vcpu *v, unsigned long gfn, + unsigned long off) +{ + void *mapping; + int rc; + struct page_info *page; + struct domain *d = v->domain; + + rc = -EINVAL; /* common errno for following operations */ + + /* Sanity check: L2 selector has maximum size of sizeof(unsigned + * long) * 8, this size is equal to the size of shared bitmap + * array of 2-level event channel. */ + if ( off + sizeof(unsigned long) * 8 >= PAGE_SIZE ) + goto out; + + page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); + if ( !page ) + goto out; + + if ( !get_page_type(page, PGT_writable_page) ) + { + put_page(page); + goto out; + } + + /* Use global mapping here, because we need to map selector for + * other vcpu (v != current). However this mapping is only used by + * v when guest is running.
*/ + mapping = __map_domain_page_global(page); + + if ( mapping == NULL ) + { + put_page_and_type(page); + rc = -ENOMEM; + goto out; + } + + v->evtchn_pending_sel_l2 = mapping + off; + rc = 0; + + out: + return rc; +} + +static void __unmap_l2_selector(struct vcpu *v) +{ + unsigned long mfn; + + if ( v->evtchn_pending_sel_l2 ) + { + mfn = domain_page_map_to_mfn(v->evtchn_pending_sel_l2); + unmap_domain_page_global(v->evtchn_pending_sel_l2); + put_page_and_type(mfn_to_page(mfn)); + v->evtchn_pending_sel_l2 = NULL; + } +} + +static void __evtchn_unmap_all_3level(struct domain *d) +{ + struct vcpu *v; + for_each_vcpu ( d, v ) + __unmap_l2_selector(v); + __unmap_l3_arrays(d); +} + +static void __evtchn_setup_bitmap_l3(struct domain *d) +{ + struct vcpu *v; + + /* Easy way to setup 3-level bitmap, just move existing selector + * to next level then copy pending array and mask array */ + for_each_vcpu ( d, v ) + { + memcpy(&v->evtchn_pending_sel_l2[0], + &vcpu_info(v, evtchn_pending_sel), + sizeof(vcpu_info(v, evtchn_pending_sel))); + memset(&vcpu_info(v, evtchn_pending_sel), 0, + sizeof(vcpu_info(v, evtchn_pending_sel))); + set_bit(0, &vcpu_info(v, evtchn_pending_sel)); + } + + memcpy(d->evtchn_pending[0], &shared_info(d, evtchn_pending), + sizeof(shared_info(d, evtchn_pending))); + memcpy(d->evtchn_mask[0], &shared_info(d, evtchn_mask), + sizeof(shared_info(d, evtchn_mask))); +} + +static long evtchn_register_3level( + XEN_GUEST_HANDLE_PARAM(evtchn_register_3level_t) arg) +{ + struct domain *d = current->domain; + struct evtchn_register_3level r; + struct vcpu *v; + int rc = 0; + xen_pfn_t *evtchn_pending = NULL; + xen_pfn_t *evtchn_mask = NULL; + xen_pfn_t *l2sel_mfns = NULL; + xen_pfn_t *l2sel_offsets = NULL; + + if ( d->evtchn_level == EVTCHN_3_LEVEL ) + { + rc = -EINVAL; + goto out; + } + + if ( copy_from_guest(&r, arg, 1) ) + { + rc = -EFAULT; + goto out; + } + + if ( r.nr_vcpus > d->max_vcpus || + r.nr_pages > EVTCHN_MAX_L3_PAGES ) + { + rc = -EINVAL; + goto out; + } + + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages); + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages); + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus); + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus); + + if ( !evtchn_pending || !evtchn_mask || + !l2sel_mfns || !l2sel_offsets ) + { + rc = -ENOMEM; + goto out_free; + } + +#define __COPY_ARRAY(_d, _s, _nr) \ + if ( copy_from_guest((_d), (_s), (_nr)) ) \ + { \ + rc = -EFAULT; \ + goto out_free; \ + } + __COPY_ARRAY(evtchn_pending, r.evtchn_pending, r.nr_pages) + __COPY_ARRAY(evtchn_mask, r.evtchn_mask, r.nr_pages) + __COPY_ARRAY(l2sel_mfns, r.l2sel_mfns, r.nr_vcpus) + __COPY_ARRAY(l2sel_offsets, r.l2sel_offsets, r.nr_vcpus) +#undef __COPY_ARRAY + + rc = __map_l3_arrays(d, evtchn_pending, evtchn_mask, r.nr_pages); + if ( rc ) + goto out_free; + + for_each_vcpu ( d, v ) + { + if ( (rc = __map_l2_selector(v, l2sel_mfns[v->vcpu_id], + l2sel_offsets[v->vcpu_id])) ) + { + __evtchn_unmap_all_3level(d); + goto out_free; + } + } + + __evtchn_setup_bitmap_l3(d); + + d->evtchn_level = EVTCHN_3_LEVEL; + + out_free: + if ( evtchn_pending ) + xfree(evtchn_pending); + if ( evtchn_mask ) + xfree(evtchn_mask); + if ( l2sel_mfns ) + xfree(l2sel_mfns); + if ( l2sel_offsets ) + xfree(l2sel_offsets); + out: + return rc; +} + +static long evtchn_register_nlevel(struct evtchn_register_nlevel *reg) +{ + struct domain *d = current->domain; + int rc; + + spin_lock(&d->event_lock); + + switch ( reg->level ) + { + case EVTCHN_3_LEVEL: + rc = evtchn_register_3level(reg->u.l3); + break; 
+ default: + rc = -EINVAL; + } + + spin_unlock(&d->event_lock); + + return rc; +} + long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { long rc; @@ -1116,6 +1378,18 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } + case EVTCHNOP_register_nlevel: { + struct evtchn_register_nlevel reg; + if ( copy_from_guest(®, arg, 1) != 0 ) + return -EFAULT; + rc = evtchn_register_nlevel(®); + + /* XXX always fails this call because it is not yet completed */ + rc = -EINVAL; + + break; + } + default: rc = -ENOSYS; break; @@ -1245,6 +1519,17 @@ int evtchn_init(struct domain *d) return 0; } +static void evtchn_unmap_nlevel(struct domain *d) +{ + switch ( d->evtchn_level ) + { + case EVTCHN_3_LEVEL: + __evtchn_unmap_all_3level(d); + break; + default: + break; + } +} void evtchn_destroy(struct domain *d) { @@ -1273,6 +1558,8 @@ void evtchn_destroy(struct domain *d) clear_global_virq_handlers(d); + evtchn_unmap_nlevel(d); + free_xenheap_page(d->evtchn); } -- 1.7.10.4
Wei Liu
2013-Jan-21 14:30 UTC
[RFC PATCH V2 14/14] Implement 3-level event channel routines
Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- xen/common/event_channel.c | 150 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 122 insertions(+), 28 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 54a847e..ae58f00 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -169,11 +169,25 @@ static int evtchn_is_pending_l2(struct domain *d, int port) return test_bit(port, &shared_info(d, evtchn_pending)); } +static int evtchn_is_pending_l3(struct domain *d, int port) +{ + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + return test_bit(offset, d->evtchn_pending[page_no]); +} + static int evtchn_is_masked_l2(struct domain *d, int port) { return test_bit(port, &shared_info(d, evtchn_mask)); } +static int evtchn_is_masked_l3(struct domain *d, int port) +{ + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + return test_bit(offset, d->evtchn_mask[page_no]); +} + int evtchn_is_pending(struct domain *d, int port) { return d->eops->is_pending(d, port); @@ -639,10 +653,33 @@ out: return ret; } +static void __check_vcpu_polling(struct vcpu *v, int port) +{ + int vcpuid; + struct domain *d = v->domain; + + /* Check if some VCPU might be polling for this event. */ + if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) + return; + + /* Wake any interested (or potentially interested) pollers. */ + for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); + vcpuid < d->max_vcpus; + vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) + { + v = d->vcpu[vcpuid]; + if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && + test_and_clear_bit(vcpuid, d->poll_mask) ) + { + v->poll_evtchn = 0; + vcpu_unblock(v); + } + } +} + static void evtchn_set_pending_l2(struct vcpu *v, int port) { struct domain *d = v->domain; - int vcpuid; /* * The following bit operations must happen in strict order. @@ -661,23 +698,35 @@ static void evtchn_set_pending_l2(struct vcpu *v, int port) vcpu_mark_events_pending(v); } - /* Check if some VCPU might be polling for this event. */ - if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) ) - return; + __check_vcpu_polling(v, port); +} - /* Wake any interested (or potentially interested) pollers. */ - for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus); - vcpuid < d->max_vcpus; - vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) ) +static void evtchn_set_pending_l3(struct vcpu *v, int port) +{ + struct domain *d = v->domain; + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1); + unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d); + + /* + * The following bit operations must happen in strict order. + * NB. On x86, the atomic bit operations also act as memory barriers. + * There is therefore sufficiently strict ordering for this architecture -- + * others may require explicit memory barriers. 
+ */ + + if ( test_and_set_bit(offset, d->evtchn_pending[page_no]) ) + return; + + if ( !test_bit(offset, d->evtchn_mask[page_no]) && + !test_and_set_bit(l2bit, v->evtchn_pending_sel_l2) && + !test_and_set_bit(l1bit, &vcpu_info(v, evtchn_pending_sel)) ) { - v = d->vcpu[vcpuid]; - if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) && - test_and_clear_bit(vcpuid, d->poll_mask) ) - { - v->poll_evtchn = 0; - vcpu_unblock(v); - } + vcpu_mark_events_pending(v); } + + __check_vcpu_polling(v, port); } static void evtchn_clear_pending_l2(struct domain *d, int port) @@ -685,6 +734,13 @@ static void evtchn_clear_pending_l2(struct domain *d, int port) clear_bit(port, &shared_info(d, evtchn_pending)); } +static void evtchn_clear_pending_l3(struct domain *d, int port) +{ + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + clear_bit(offset, d->evtchn_pending[page_no]); +} + int guest_enabled_event(struct vcpu *v, uint32_t virq) { return ((v != NULL) && (v->virq_to_evtchn[virq] != 0)); @@ -976,6 +1032,37 @@ static int evtchn_unmask_l2(unsigned int port) return 0; } +static int evtchn_unmask_l3(unsigned int port) +{ + struct domain *d = current->domain; + struct vcpu *v; + unsigned int page_no = EVTCHN_PAGE_NO(port); + unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port); + unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1); + unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d); + + ASSERT(spin_is_locked(&d->event_lock)); + + if ( unlikely(!port_is_valid(d, port)) ) + return -EINVAL; + + v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id]; + + /* + * These operations must happen in strict order. Based on + * include/xen/event.h:evtchn_set_pending(). + */ + if ( test_and_clear_bit(offset, d->evtchn_mask[page_no]) && + test_bit (offset, d->evtchn_pending[page_no]) && + !test_and_set_bit (l2bit, v->evtchn_pending_sel_l2) && + !test_and_set_bit (l1bit, &vcpu_info(v, evtchn_pending_sel)) ) + { + vcpu_mark_events_pending(v); + } + + return 0; +} + int evtchn_unmask(unsigned int port) { struct domain *d = current->domain; @@ -1163,6 +1250,22 @@ static void __evtchn_setup_bitmap_l3(struct domain *d) sizeof(shared_info(d, evtchn_mask))); } +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = { + .set_pending = evtchn_set_pending_l2, + .clear_pending = evtchn_clear_pending_l2, + .unmask = evtchn_unmask_l2, + .is_pending = evtchn_is_pending_l2, + .is_masked = evtchn_is_masked_l2, +}; + +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l3 = { + .set_pending = evtchn_set_pending_l3, + .clear_pending = evtchn_clear_pending_l3, + .unmask = evtchn_unmask_l3, + .is_pending = evtchn_is_pending_l3, + .is_masked = evtchn_is_masked_l3, +}; + static long evtchn_register_3level( XEN_GUEST_HANDLE_PARAM(evtchn_register_3level_t) arg) { @@ -1235,6 +1338,7 @@ static long evtchn_register_3level( __evtchn_setup_bitmap_l3(d); d->evtchn_level = EVTCHN_3_LEVEL; + d->eops = &xen_evtchn_ops_l3; out_free: if ( evtchn_pending ) @@ -1383,10 +1487,6 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(®, arg, 1) != 0 ) return -EFAULT; rc = evtchn_register_nlevel(®); - - /* XXX always fails this call because it is not yet completed */ - rc = -EINVAL; - break; } @@ -1481,14 +1581,6 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) spin_unlock(&ld->event_lock); } -static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = { - .set_pending = evtchn_set_pending_l2, - .clear_pending = 
evtchn_clear_pending_l2, - .unmask = evtchn_unmask_l2, - .is_pending = evtchn_is_pending_l2, - .is_masked = evtchn_is_masked_l2, -}; - int evtchn_init(struct domain *d) { BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE); @@ -1597,8 +1689,10 @@ static void domain_dump_evtchn_info(struct domain *d) bitmap_scnlistprintf(keyhandler_scratch, sizeof(keyhandler_scratch), d->poll_mask, d->max_vcpus); printk("Event channel information for domain %d:\n" + "Using %d-level event channel\n" "Polling vCPUs: {%s}\n" - " port [p/m]\n", d->domain_id, keyhandler_scratch); + " port [p/m]\n", + d->domain_id, d->evtchn_level, keyhandler_scratch); spin_lock(&d->event_lock); -- 1.7.10.4
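To make the selector arithmetic in evtchn_set_pending_l3() and evtchn_unmask_l3() concrete, here is an illustrative standalone trace for one port on a 64-bit guest (EVTCHN_WORD_BITORDER = 6, EVTCHNS_SHIFT = 15); it is not code from the patch:

#include <stdio.h>

int main(void)
{
    unsigned int port = 100000;
    unsigned int word_bitorder = 6;    /* 64-bit bitmap words */

    unsigned int page_no = port >> 15;                    /* EVTCHN_PAGE_NO         */
    unsigned int offset  = port & ((1u << 15) - 1);       /* EVTCHN_OFFSET_IN_PAGE  */
    unsigned int l2bit   = port >> word_bitorder;         /* 2nd-level selector bit */
    unsigned int l1bit   = port >> (word_bitorder << 1);  /* top-level selector bit */

    /* Expect: page 3, offset 1696, l2bit 1562, l1bit 24. */
    printf("page %u, offset %u, l2bit %u, l1bit %u\n",
           page_no, offset, l2bit, l1bit);
    return 0;
}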
Jan Beulich
2013-Jan-21 16:36 UTC
Re: [RFC PATCH V2 07/14] Generalized event channel operations
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = {

"const" instead of "__read_mostly".

> + .set_pending = evtchn_set_pending_l2,
> + .clear_pending = evtchn_clear_pending_l2,
> + .unmask = evtchn_unmask_l2,
> + .is_pending = evtchn_is_pending_l2,
> + .is_masked = evtchn_is_masked_l2,
> +};
> @@ -1272,7 +1302,6 @@ void evtchn_move_pirqs(struct vcpu *v)
> spin_unlock(&d->event_lock);
> }
>
> -
> static void domain_dump_evtchn_info(struct domain *d)
> {
> unsigned int port;
> @@ -1334,6 +1363,7 @@ static void domain_dump_evtchn_info(struct domain *d)
> spin_unlock(&d->event_lock);
> }
>
> +
> static void dump_evtchn_info(unsigned char key)
> {
> struct domain *d;

Stray newline adjustments?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -69,6 +69,7 @@ extern struct domain *dom0;
>
> #define EVTCHNS_PER_BUCKET 512
> #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
> +struct xen_evtchn_ops;

No need to forward declare this here.

>
> struct evtchn
> {
> @@ -279,6 +280,7 @@ struct domain
> struct evtchn **evtchn;
> spinlock_t event_lock;
> unsigned int evtchn_level;
> + struct xen_evtchn_ops *eops;
>
> struct grant_table *grant_table;

Jan
Jan Beulich
2013-Jan-21 16:38 UTC
Re: [RFC PATCH V2 08/14] Define N-level event channel registration interface
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> @@ -258,6 +259,38 @@ struct evtchn_reset {
> typedef struct evtchn_reset evtchn_reset_t;
>
> /*
> + * EVTCHNOP_register_nlevel: Register N-level event channel
> + * NOTES:
> + * 1. Currently only 3-level is supported.
> + * 2. Should fall back to 2-level if this call fails.
> + */
> +/* 64 bit guests need 8 pages for evtchn_pending and evtchn_mask for
> + * 256k event channels while 32 bit ones only need 1 page for 32k
> + * event channels. */
> +#define EVTCHN_MAX_L3_PAGES 8
> +struct evtchn_register_3level {
> + /* IN parameters. */
> + uint32_t nr_pages;
> + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_pending;
> + XEN_GUEST_HANDLE(xen_pfn_t) evtchn_mask;
> + uint32_t nr_vcpus;

Any reason not to put this adjacent to the other uint32_t?

> + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_mfns;
> + XEN_GUEST_HANDLE(xen_pfn_t) l2sel_offsets;
> +};
> +typedef struct evtchn_register_3level evtchn_register_3level_t;
> +DEFINE_XEN_GUEST_HANDLE(evtchn_register_3level_t);
> +
> +struct evtchn_register_nlevel {
> + /* IN parameters. */
> + uint32_t level;
> + union {
> + XEN_GUEST_HANDLE(evtchn_register_3level_t) l3;

Do you really need the extra level of indirection here (i.e. can't you embed the structure rather than having a handle to it)?

> + } u;
> +};
> +typedef struct evtchn_register_nlevel evtchn_register_nlevel_t;
> +DEFINE_XEN_GUEST_HANDLE(evtchn_register_nlevel_t);
> +
> +/*
> * ` enum neg_errnoval
> * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op)
> * `

Jan
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -554,9 +554,14 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
>
> /*
> * Event channel endpoints per domain:
> + * 2-level:
> * 1024 if a long is 32 bits; 4096 if a long is 64 bits.
> + * 3-level:
> + * 32k if a long is 32 bits; 256k if a long is 64 bits.
> */
> -#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> +#define NR_EVENT_CHANNELS_L2 (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> +#define NR_EVENT_CHANNELS_L3 (NR_EVENT_CHANNELS_L2 * 64)
> +#define NR_EVENT_CHANNELS NR_EVENT_CHANNELS_L2 /* for compatibility */

Might consider putting an #if !defined(__XEN__) && !defined(__XEN_TOOLS__) around the last line, to make sure no references to the old symbol remain (or re-appear).

Jan

>
> struct vcpu_time_info {
> /*
Jan Beulich
2013-Jan-21 16:46 UTC
Re: [RFC PATCH V2 11/14] Introduce some macros for event channels
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> --- a/xen/include/asm-x86/config.h
> +++ b/xen/include/asm-x86/config.h
> @@ -8,11 +8,13 @@
> #define __X86_CONFIG_H__
>
> #define LONG_BYTEORDER 3
> +#define LONG_BITORDER 6
> +#define BYTE_BITORDER 3

The former ought to use the latter.

> #define CONFIG_PAGING_LEVELS 4
>
> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
> -#define BITS_PER_BYTE 8
> +#define BITS_PER_BYTE (1 << BYTE_BITORDER)
>
> #define CONFIG_X86 1
> #define CONFIG_X86_HT 1

Missing a similar change for ARM?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -69,6 +69,19 @@ extern struct domain *dom0;
> __v;})
>
> #define EVTCHNS_PER_BUCKET 512
> +/* N.B. EVTCHNS_PER_PAGE is always powers of 2, use shifts to optimize */
> +#define EVTCHNS_SHIFT (PAGE_SHIFT+BYTE_BITORDER)
> +#define EVTCHNS_PER_PAGE (_AC(1,L) << EVTCHNS_SHIFT)
> +#define EVTCHN_MASK (~(EVTCHNS_PER_PAGE-1))
> +#define EVTCHN_PAGE_NO(chn) ((chn) >> EVTCHNS_SHIFT)
> +#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ~EVTCHN_MASK)
> +
> +#ifndef CONFIG_COMPAT
> +#define EVTCHN_WORD_BITORDER(d) LONG_BITORDER
> +#else
> +#define EVTCHN_WORD_BITORDER(d) (has_32bit_shinfo(d) ? 5 : LONG_BITORDER)
> +#endif
> +
> #define NR_EVTCHN_BUCKETS (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
> struct xen_evtchn_ops;

All these event channel related definitions look misplaced in xen/sched.h, now that it's more than a couple of lines. The first option for moving them of course is into the source file (when they're not being used elsewhere). The next best option is either xen/events.h or a new header.

Jan
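A minimal sketch of the derivation Jan is suggesting for the first point (assuming that is the intent; this is not code from the thread):

#define BYTE_BITORDER  3                                /* log2(bits per byte)  */
#define LONG_BYTEORDER 3                                /* log2(bytes per long) */
#define LONG_BITORDER  (LONG_BYTEORDER + BYTE_BITORDER) /* log2(bits per long) = 6 */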
Jan Beulich
2013-Jan-21 16:50 UTC
Re: [RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages);
> + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages);
> + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus);
> + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus);

While the former two are okay, the latter two aren't since this can easily amount to an allocation of more than a page. This needs to be broken up.

Jan
Jan Beulich
2013-Jan-21 16:53 UTC
Re: [RFC PATCH V2 14/14] Implement 3-level event channel routines
>>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> @@ -1163,6 +1250,22 @@ static void __evtchn_setup_bitmap_l3(struct domain *d)
> sizeof(shared_info(d, evtchn_mask)));
> }
>
> +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = {
> + .set_pending = evtchn_set_pending_l2,
> + .clear_pending = evtchn_clear_pending_l2,
> + .unmask = evtchn_unmask_l2,
> + .is_pending = evtchn_is_pending_l2,
> + .is_masked = evtchn_is_masked_l2,
> +};
> +
> +static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l3 = {
> + .set_pending = evtchn_set_pending_l3,
> + .clear_pending = evtchn_clear_pending_l3,
> + .unmask = evtchn_unmask_l3,
> + .is_pending = evtchn_is_pending_l3,
> + .is_masked = evtchn_is_masked_l3,
> +};
> +
> static long evtchn_register_3level(
> XEN_GUEST_HANDLE_PARAM(evtchn_register_3level_t) arg)
> {

Could you arrange for the movement of xen_evtchn_ops_l2 to not be necessary here (perhaps by adjusting earlier patches)?

Jan

> @@ -1481,14 +1581,6 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
> spin_unlock(&ld->event_lock);
> }
>
> -static struct xen_evtchn_ops __read_mostly xen_evtchn_ops_l2 = {
> - .set_pending = evtchn_set_pending_l2,
> - .clear_pending = evtchn_clear_pending_l2,
> - .unmask = evtchn_unmask_l2,
> - .is_pending = evtchn_is_pending_l2,
> - .is_masked = evtchn_is_masked_l2,
> -};
> -
> int evtchn_init(struct domain *d)
> {
> BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE);
Wei Liu
2013-Jan-28 17:21 UTC
Re: [RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
On Mon, 2013-01-21 at 16:50 +0000, Jan Beulich wrote:
> >>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> > + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages);
> > + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages);
> > + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus);
> > + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus);
>
> While the former two are okay, the latter two aren't since
> this can easily amount to an allocation of more than a page.
> This needs to be broken up.
>

Judging from the code, the underlying _xmalloc is able to handle the situation when an allocation is more than a page, isn't it?

Wei.

> Jan
>
Jan Beulich
2013-Jan-29 08:43 UTC
Re: [RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
>>> On 28.01.13 at 18:21, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, 2013-01-21 at 16:50 +0000, Jan Beulich wrote:
>> >>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
>> > + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages);
>> > + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages);
>> > + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus);
>> > + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus);
>>
>> While the former two are okay, the latter two aren't since
>> this can easily amount to an allocation of more than a page.
>> This needs to be broken up.
>>
>
> Judging from the code, the underlying _xmalloc is able to handle
> the situation when an allocation is more than a page, isn't it?

Oh, yes, it is capable of doing so, and it is fine to call it that way at boot time. But any such allocation at run time is prone to fail just because of memory fragmentation (and they are particularly bad when tmem is active). We've gone through the code several times to eliminate such run time allocations, so I don't see us permitting you to re-introduce such.

And you should, just for future consideration, also take note of xmalloc() allocations being inefficient for exact page size (or multiple thereof) allocations.

Jan
Wei Liu
2013-Jan-29 18:10 UTC
Re: [RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
On Tue, 2013-01-29 at 08:43 +0000, Jan Beulich wrote:
> >>> On 28.01.13 at 18:21, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Mon, 2013-01-21 at 16:50 +0000, Jan Beulich wrote:
> >> >>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
> >> > + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages);
> >> > + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages);
> >> > + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus);
> >> > + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus);
> >>
> >> While the former two are okay, the latter two aren't since
> >> this can easily amount to an allocation of more than a page.
> >> This needs to be broken up.
> >>
> >
> > Judging from the code, the underlying _xmalloc is able to handle
> > the situation when an allocation is more than a page, isn't it?
>
> Oh, yes, it is capable of doing so, and it is fine to call it that
> way at boot time. But any such allocation at run time is prone
> to fail just because of memory fragmentation (and they are
> particularly bad when tmem is active). We've gone through

So what exactly does tmem do in this case? Will it fragment xenheap eventually?

> the code several times to eliminate such run time allocations,
> so I don't see us permitting you to re-introduce such.
>

_xmalloc calls alloc_xenheap_pages if size is bigger than PAGE_SIZE. If tmem fragments xenheap, which API should I use here?

> And you should, just for future consideration, also take note
> of xmalloc() allocations being inefficient for exact page size
> (or multiple thereof) allocations.
>

NP.

Wei.

> Jan
>
Jan Beulich
2013-Jan-30 08:04 UTC
Re: [RFC PATCH V2 13/14] Infrastructure for manipulating 3-level event channel pages
>>> On 29.01.13 at 19:10, Wei Liu <wei.liu2@citrix.com> wrote:
> On Tue, 2013-01-29 at 08:43 +0000, Jan Beulich wrote:
>> >>> On 28.01.13 at 18:21, Wei Liu <wei.liu2@citrix.com> wrote:
>> > On Mon, 2013-01-21 at 16:50 +0000, Jan Beulich wrote:
>> >> >>> On 21.01.13 at 15:30, Wei Liu <wei.liu2@citrix.com> wrote:
>> >> > + evtchn_pending = xzalloc_array(xen_pfn_t, r.nr_pages);
>> >> > + evtchn_mask = xzalloc_array(xen_pfn_t, r.nr_pages);
>> >> > + l2sel_mfns = xzalloc_array(xen_pfn_t, r.nr_vcpus);
>> >> > + l2sel_offsets = xzalloc_array(xen_pfn_t, r.nr_vcpus);
>> >>
>> >> While the former two are okay, the latter two aren't since
>> >> this can easily amount to an allocation of more than a page.
>> >> This needs to be broken up.
>> >>
>> >
>> > Judging from the code, the underlying _xmalloc is able to handle
>> > the situation when an allocation is more than a page, isn't it?
>>
>> Oh, yes, it is capable of doing so, and it is fine to call it that
>> way at boot time. But any such allocation at run time is prone
>> to fail just because of memory fragmentation (and they are
>> particularly bad when tmem is active). We've gone through
>
> So what exactly does tmem do in this case? Will it fragment xenheap
> eventually?

Fragment or exhaust. In the latter case an allocation request will trigger it to release _individual_ pages, and hence to have only a very slim chance to get an allocation request for any number of contiguous pages to be satisfied.

>> the code several times to eliminate such run time allocations,
>> so I don't see us permitting you to re-introduce such.
>>
>
> _xmalloc calls alloc_xenheap_pages if size is bigger than PAGE_SIZE. If
> tmem fragments xenheap, which API should I use here?

None, you should break up the allocation (if it needs to be this big in the first place). Considering the use case here, using the recently added vmap() to map the individual pages into a contiguous linear range might be an option. If you go that route, we would then also need to evaluate whether the address range reserved for this is big enough.

Jan
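For reference, one possible reading of "break up the allocation" (without resorting to vmap()) is to consume the guest arrays in bounded on-stack chunks, so that no run-time allocation larger than a page is needed. This is a hedged sketch, not code from the series: the helper name and chunk size are assumptions, and it reuses __map_l2_selector() from patch 13 plus the standard copy_from_guest_offset() accessor:

#define L2SEL_CHUNK 64

static long map_l2_selectors_chunked(struct domain *d,
                                     struct evtchn_register_3level *r)
{
    xen_pfn_t mfns[L2SEL_CHUNK], offsets[L2SEL_CHUNK];
    unsigned int i, j, n;
    long rc;

    for ( i = 0; i < r->nr_vcpus; i += n )
    {
        n = r->nr_vcpus - i < L2SEL_CHUNK ? r->nr_vcpus - i : L2SEL_CHUNK;

        /* Read a slice of each guest array into the on-stack buffers. */
        if ( copy_from_guest_offset(mfns, r->l2sel_mfns, i, n) ||
             copy_from_guest_offset(offsets, r->l2sel_offsets, i, n) )
            return -EFAULT;

        for ( j = 0; j < n; j++ )
            if ( (rc = __map_l2_selector(d->vcpu[i + j],
                                         mfns[j], offsets[j])) != 0 )
                return rc;    /* caller unmaps whatever was mapped */
    }

    return 0;
}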