To help simplify the load configuration of servers, I developed the
idea of a "domain runstate".  The "runstate" of a domain is defined
by the runstate of its constituent vcpus.  The "states" are defined
as follows:

 Blocked: All vcpus blocked
 Partial run: Some vcpus running, some blocked
 Partial contention: Some vcpus waiting for the cpu, some blocked
 Full run: All vcpus running
 Concurrency hazard: Some vcpus running, some waiting for the cpu
 Full contention: All vcpus waiting for the cpu

The idea is that by looking at the amount of time in each state
over the last unit of time, an administrator (or an automated
load-balancing program) can determine whether their vcpu and/or
domain configuration needs to be tweaked.  For example:

If a VM spends the majority of its time in "full run", it may
not have enough vcpus to do its work.

If a VM spends a large amount of time in full contention, the system
is probably too busy, and some workloads should be moved onto other
servers.

If a VM spends a large amount of time in concurrency hazard, it means
that the VM may be wasting a lot of time in spinlocks and other
synchronization (like cache misses).  Either work should be moved off
the machine, or the number of vcpus assigned to the VM should be
reduced.

The state is protected by a lock, but to avoid tricky deadlock
issues, if the lock cannot be acquired, the state is simply not
updated.  The number of lost state updates is recorded.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domain.c
--- a/xen/common/domain.c   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/domain.c   Tue Nov 23 11:25:50 2010 +0000
@@ -238,6 +238,7 @@
     spin_lock_init_prof(d, domain_lock);
     spin_lock_init_prof(d, page_alloc_lock);
     spin_lock_init(&d->hypercall_deadlock_mutex);
+    spin_lock_init(&d->runstate_lock);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domctl.c
--- a/xen/common/domctl.c   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/domctl.c   Tue Nov 23 11:25:50 2010 +0000
@@ -970,6 +970,24 @@
     }
     break;
 
+    case XEN_DOMCTL_get_runstate_info:
+    {
+        struct domain *d;
+
+        ret = -ESRCH;
+        d = rcu_lock_domain_by_id(op->domain);
+        if ( d != NULL )
+        {
+            domain_runstate_get(d, &op->u.domain_runstate);
+            ret = 0;
+
+            rcu_unlock_domain(d);
+
+            if ( copy_to_guest(u_domctl, op, 1) )
+                ret = -EFAULT;
+        }
+        break;
+    }
     default:
         ret = arch_do_domctl(op, u_domctl);
         break;
diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/schedule.c
--- a/xen/common/schedule.c Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/schedule.c Tue Nov 23 11:25:50 2010 +0000
@@ -135,10 +135,31 @@
     }
 }
 
+/* Used to quickly map the vcpu runstate mask to a domain runstate */
+static int mask_to_state[] = {
+    /* 000: Nothing in any runstate.  Should never happen. */
+    -1,
+    /* 001: All running */
+    DOMAIN_RUNSTATE_full_run,
+    /* 010: All runnable */
+    DOMAIN_RUNSTATE_full_contention,
+    /* 011: Some running, some runnable */
+    DOMAIN_RUNSTATE_concurrency_hazard,
+    /* 100: All blocked / offline */
+    DOMAIN_RUNSTATE_blocked,
+    /* 101: Some running, some blocked / offline */
+    DOMAIN_RUNSTATE_partial_run,
+    /* 110: Some blocked / offline, some runnable */
+    DOMAIN_RUNSTATE_partial_contention,
+    /* 111: Some in every state.  Mixed running + runnable is most important. */
+    DOMAIN_RUNSTATE_concurrency_hazard
+};
+
 static inline void vcpu_runstate_change(
     struct vcpu *v, int new_state, s_time_t new_entry_time)
 {
     s_time_t delta;
+    struct domain *d = v->domain;
 
     ASSERT(v->runstate.state != new_state);
 
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
@@ -155,6 +176,37 @@
     }
 
     v->runstate.state = new_state;
+
+    /* Update domain runstate */
+    if(spin_trylock(&d->runstate_lock)) {
+        unsigned mask=0;
+        struct vcpu *ov;
+
+        BUG_ON(d->runstate.state > DOMAIN_RUNSTATE_partial_contention);
+
+        d->runstate.time[d->runstate.state] +=
+            (new_entry_time - d->runstate.state_entry_time);
+        d->runstate.state_entry_time = new_entry_time;
+
+        /* Determine new runstate.  First, see what states we have */
+        for_each_vcpu(d, ov) {
+            /* Don't count vcpus that have been taken offline by the guest */
+            if(! (ov->runstate.state == RUNSTATE_offline
+                  && test_bit(_VPF_down, &ov->pause_flags)) )
+                mask |= (1 << ov->runstate.state);
+        }
+
+        BUG_ON(mask == 0);
+
+        /* Offline & blocked are the same */
+        mask |= ((1 << RUNSTATE_offline) & mask) >> 1;
+
+        d->runstate.state = mask_to_state[mask&0x7];
+
+        spin_unlock(&d->runstate_lock);
+    } else {
+        atomic_inc(&d->runstate_missed_changes);
+    }
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
@@ -173,6 +225,18 @@
     vcpu_schedule_unlock_irq(v);
 }
 
+void domain_runstate_get(struct domain *d,
+                         domain_runstate_info_t *runstate)
+{
+    spin_lock(&d->runstate_lock);
+
+    memcpy(runstate, &d->runstate, sizeof(*runstate));
+    runstate->time[d->runstate.state] += NOW() - runstate->state_entry_time;
+    runstate->missed_changes = atomic_read(&d->runstate_missed_changes);
+
+    spin_unlock(&d->runstate_lock);
+}
+
 uint64_t get_cpu_idle_time(unsigned int cpu)
 {
     struct vcpu_runstate_info state;
diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/public/domctl.h
--- a/xen/include/public/domctl.h   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/include/public/domctl.h   Tue Nov 23 11:25:50 2010 +0000
@@ -806,6 +806,46 @@
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuextstate_t);
 #endif
 
+/*
+ * Return information about the state and running time of a domain.
+ * The "domain runstate" is based on the runstates of all the vcpus of the
+ * domain (see below).
+ * @extra_arg == pointer to domain_runstate_info structure.
+ */
+struct xen_domctl_runstate_info {
+    /* Domain's current state (DOMAIN_RUNSTATE_*). */
+    uint32_t      state;
+    /* Number of times we missed an update due to contention */
+    uint32_t      missed_changes;
+    /* When was current state entered (system time, ns)? */
+    uint64_t      state_entry_time;
+    /*
+     * Time spent in each RUNSTATE_* (ns). The sum of these times is
+     * NOT guaranteed not to drift from system time.
+     */
+    uint64_t      time[6];
+};
+typedef struct xen_domctl_runstate_info xen_domctl_runstate_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_runstate_info_t);
+
+/* All vcpus are running */
+#define DOMAIN_RUNSTATE_full_run           0
+
+/* All vcpus are runnable (i.e., waiting for cpu) */
+#define DOMAIN_RUNSTATE_full_contention    1
+
+/* Some vcpus are running, some are runnable */
+#define DOMAIN_RUNSTATE_concurrency_hazard 2
+
+/* All vcpus are blocked / offline */
+#define DOMAIN_RUNSTATE_blocked            3
+
+/* Some vcpus are running, some are blocked */
+#define DOMAIN_RUNSTATE_partial_run        4
+
+/* Some vcpus are runnable, some are blocked */
+#define DOMAIN_RUNSTATE_partial_contention 5
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -868,6 +908,7 @@
 #define XEN_DOMCTL_getpageframeinfo3             61
 #define XEN_DOMCTL_setvcpuextstate               62
 #define XEN_DOMCTL_getvcpuextstate               63
+#define XEN_DOMCTL_get_runstate_info             64
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -923,6 +964,7 @@
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_runstate_info     domain_runstate;
         uint8_t                             pad[128];
     } u;
 };
diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/xen/sched.h
--- a/xen/include/xen/sched.h   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/include/xen/sched.h   Tue Nov 23 11:25:50 2010 +0000
@@ -200,6 +200,8 @@
     int             xen_port;
 };
 
+typedef struct xen_domctl_runstate_info domain_runstate_info_t;
+
 struct domain
 {
     domid_t          domain_id;
@@ -332,6 +334,11 @@
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* Domain runstate tracking */
+    spinlock_t runstate_lock;
+    atomic_t runstate_missed_changes;
+    domain_runstate_info_t runstate;
 };
 
 struct domain_setup_info
@@ -614,6 +621,7 @@
 int vcpu_set_affinity(struct vcpu *v, cpumask_t *affinity);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
+void domain_runstate_get(struct domain *d, domain_runstate_info_t *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
 
 /*
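[For reference, a standalone sketch of the mask-to-state mapping the patch
uses.  The RUNSTATE_* values mirror xen/include/public/vcpu.h; everything
else here is purely illustrative and not part of the patch.]

/* Illustration only: how per-vcpu runstate bits collapse into a domain
 * runstate.  Bit 0 = running, bit 1 = runnable, bit 2 = blocked,
 * bit 3 = offline, exactly as in the public vcpu interface. */
#include <stdio.h>

#define RUNSTATE_running  0
#define RUNSTATE_runnable 1
#define RUNSTATE_blocked  2
#define RUNSTATE_offline  3

static const char *mask_to_state[] = {
    "(impossible)",        /* 000 */
    "full_run",            /* 001: all running */
    "full_contention",     /* 010: all runnable */
    "concurrency_hazard",  /* 011: running + runnable */
    "blocked",             /* 100: all blocked / offline */
    "partial_run",         /* 101: running + blocked */
    "partial_contention",  /* 110: runnable + blocked */
    "concurrency_hazard",  /* 111: all three present */
};

int main(void)
{
    /* Example: a 4-vcpu domain with one vcpu in each state */
    int vcpu_state[] = { RUNSTATE_running, RUNSTATE_runnable,
                         RUNSTATE_blocked, RUNSTATE_offline };
    unsigned int mask = 0;
    int i;

    for ( i = 0; i < 4; i++ )
        mask |= 1u << vcpu_state[i];

    /* Fold "offline" (bit 3) into "blocked" (bit 2), as the patch does */
    mask |= ((1u << RUNSTATE_offline) & mask) >> 1;

    printf("domain runstate: %s\n", mask_to_state[mask & 0x7]);
    return 0;
}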
Is this really any more useful than just pulling all the individual
vcpustate records out via domctl (not supported yet except in a digested
form, but easily added), and then aggregating in userspace?  It doesn't
look like it.

Also, I would never take a patch for a new control interface without the
thing that uses it being supplied at the same time; otherwise you can just
as well maintain it out of tree.

 -- Keir
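[For comparison, the userspace aggregation being suggested might look
roughly like the sketch below, which samples the existing per-vcpu info via
libxc and classifies the domain at each sample.  The handle type and the
xc_vcpuinfo_t fields are as I recall them from the libxc of this era, so
treat them as assumptions; note also that periodic sampling can only
approximate how the vcpu states were correlated in time, which is the
distinction raised in the follow-up.]

/* Rough sketch only: classify a domain from an instantaneous sample of its
 * vcpus.  max_vcpu would typically come from xc_domain_getinfo(). */
#include <xenctrl.h>

static const char *classify(xc_interface *xch, uint32_t domid, int max_vcpu)
{
    int v, running = 0, runnable = 0, blocked = 0;

    for ( v = 0; v <= max_vcpu; v++ )
    {
        xc_vcpuinfo_t info;

        if ( xc_vcpu_getinfo(xch, domid, v, &info) || !info.online )
            continue;
        if ( info.running )
            running++;
        else if ( info.blocked )
            blocked++;
        else
            runnable++;
    }

    /* Same precedence as the patch's mask_to_state table */
    if ( running && runnable )  return "concurrency hazard";
    if ( running && blocked )   return "partial run";
    if ( runnable && blocked )  return "partial contention";
    if ( running )              return "full run";
    if ( runnable )             return "full contention";
    return "blocked";
}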
George Dunlap
2010-Nov-23 14:23 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On 23/11/10 12:51, Keir Fraser wrote:
> Is this really any more useful than just pulling all the individual
> vcpustate records out via domctl (not supported yet except in a digested
> form, but easily added), and then aggregating in userspace?  It doesn't
> look like it.

There is a difference: how the various runstates of the vcpus were
correlated.

So for example, suppose you have 2 vcpus with the following runstates:
 v0: running 50%, runnable 25%, blocked 25%
 v1: running 50%, runnable 25%, blocked 25%

That could break down like this:
 full run: 25%
 concurrency hazard: 50%
 blocked: 25%

Or like this:
 full run: 5%
 concurrency hazard: 50%
 partial run: 40%
 blocked: 5%

Or like this:
 full contention: 20%
 full run: 50%
 concurrency hazard: 5%
 blocked: 25%

To me those tell very different stories about what the VM's workload is
like and what it's getting cpu-wise from Xen.

> Also, I would never take a patch for a new control interface
> without the thing that uses it being supplied at the same time; otherwise
> you can just as well maintain it out of tree.

Would "the thing that uses it" be a libxc call, or an actual program
using the data (like xentop)?

FWIW, xapi uses this call in XenServer (just for customer reporting).
We could carry it out-of-tree, but we're trying to minimize the
patchqueue.  But if modifying xentop to be able to report these was a
requisite for acceptance, I'd make an effort. :-)

 -George
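[A libxc call wrapping the new domctl would presumably follow the usual
pattern of the other wrappers.  A hypothetical sketch follows: the wrapper
name is made up here, and DECLARE_DOMCTL / do_domctl are the libxc-internal
helpers from xc_private.h used by the existing xc_domain.c wrappers.]

/* Hypothetical libxc wrapper (not part of the posted patch). */
#include "xc_private.h"

int xc_domain_get_runstate_info(xc_interface *xch, uint32_t domid,
                                xen_domctl_runstate_info_t *info)
{
    int rc;
    DECLARE_DOMCTL;

    domctl.cmd = XEN_DOMCTL_get_runstate_info;
    domctl.domain = (domid_t)domid;

    rc = do_domctl(xch, &domctl);
    if ( rc == 0 )
        *info = domctl.u.domain_runstate;

    return rc;
}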
On 23/11/2010 14:23, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> There is a difference: how the various runstates of the vcpus were
> correlated.
>
> So for example, suppose you have 2 vcpus with the following runstates:
>  v0: running 50%, runnable 25%, blocked 25%
>  v1: running 50%, runnable 25%, blocked 25%
>
> That could break down like this:
>  full run: 25%
>  concurrency hazard: 50%
>  blocked: 25%
>
> Or like this:
>  full run: 5%
>  concurrency hazard: 50%
>  partial run: 40%
>  blocked: 5%
>
> Or like this:
>  full contention: 20%
>  full run: 50%
>  concurrency hazard: 5%
>  blocked: 25%
>
> To me those tell very different stories about what the VM's workload is
> like and what it's getting cpu-wise from Xen.

The concurrency hazard %age might be useful for telling you how much Xen
scheduling is screwing the VM, I suppose.  Still, I wonder how much of this
could typically be deduced from per-vcpu stats, and how much this improves
load balancer performance beyond very basic per-domain stats.  Overall you
can count me as not 100% convinced.  But my own personal opinion on the
pros/cons of this particular toolstack interface is only part of my
argument...

>> Also, I would never take a patch for a new control interface
>> without the thing that uses it being supplied at the same time; otherwise
>> you can just as well maintain it out of tree.
>
> Would "the thing that uses it" be a libxc call, or an actual program
> using the data (like xentop)?
>
> FWIW, xapi uses this call in XenServer (just for customer reporting).
> We could carry it out-of-tree, but we're trying to minimize the
> patchqueue.  But if modifying xentop to be able to report these was a
> requisite for acceptance, I'd make an effort. :-)

Well, actually what we care about is #users-who-care.  Plumbing into xentop
to satisfy an acceptance criterion is probably misguided.  Reducing
patch-queue length just to make a patch-queue short is also probably
misguided.

I think this patch can live in the XCP patchqueue quite happily.  If the
feature proves itself and others want it as a more general facility, for
other toolstacks or whatever, it can be considered for upstream in due
course.  There's no rush.

 -- Keir
George Dunlap
2010-Nov-23 15:48 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On 23/11/10 15:00, Keir Fraser wrote:
> Well, actually what we care about is #users-who-care.  Plumbing into xentop
> to satisfy an acceptance criterion is probably misguided.  Reducing
> patch-queue length just to make a patch-queue short is also probably
> misguided.

I wouldn't offer to put it in xentop just because I wanted to get the
patch out of the patchqueue.  I think it's a useful feature, that many
people would use if they knew about it.  If I didn't think it was useful,
by far the simplest way to get it out of the patchqueue would be to simply
remove the functionality from the next release of XenServer.

I think the "wait for people to ask for it" won't work very well for this
particular feature, because it's not the kind of thing I think most people
would think to ask for.  A bit like tmem -- I think it's a solution to a
problem that people have that's very non-obvious, and must therefore be
suggested to people rather than waiting for them to ask for it.  I could
write a blog post about it or otherwise try to promote the idea, but how
many people are going to download a patch in order to try it out?

(I'm aware the difference is that tmem's effectiveness can be measured
with benchmarks while this feature cannot; I'm just saying that in both
cases, you'd be waiting for a long time for people to demand it.)

xenalyze will calculate and report domain runstate numbers based on vcpu
runstate_change records, and I've found it useful to get a rough idea
what's going on when I get traces in from the field.

Of course there's no rush, but it's likely to be forgotten about for
another two years. :-)  (It's already been in the XS patchqueue since
September 2008.)

 -George
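[To give an idea of how a xentop-style consumer would digest the numbers:
sample the structure twice, some interval apart, and report the deltas as
percentages of the interval.  A sketch; the function and array names are
illustrative only, and per the patch's own comment the summed times may
drift slightly from wall-clock time.]

/* Sketch: "% of time in each state over the last interval" from two
 * samples of xen_domctl_runstate_info.time[], taken interval_ns apart.
 * Index order matches the DOMAIN_RUNSTATE_* values in the patch. */
#include <stdio.h>
#include <stdint.h>

static const char *state_name[6] = {
    "full run", "full contention", "concurrency hazard",
    "blocked", "partial run", "partial contention",
};

void print_runstate_pct(const uint64_t prev_time[6],
                        const uint64_t cur_time[6],
                        uint64_t interval_ns)
{
    int i;

    for ( i = 0; i < 6; i++ )
    {
        uint64_t delta = cur_time[i] - prev_time[i];

        printf("%20s: %5.1f%%\n", state_name[i],
               interval_ns ? 100.0 * delta / interval_ns : 0.0);
    }
}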
Pasi Kärkkäinen
2010-Nov-23 18:41 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On Tue, Nov 23, 2010 at 03:48:33PM +0000, George Dunlap wrote:
> I wouldn't offer to put it in xentop just because I wanted to get the
> patch out of the patchqueue.  I think it's a useful feature, that many
> people would use if they knew about it.

This feature definitely sounds useful to me :)

-- Pasi
George Dunlap writes ("Re: [Xen-devel] [PATCH] xen: Implement domain runstates"):> I wouldn''t offer to put it in xentop just because I wanted to get the > patch out of the patchqueue. I think it''s a useful feature, that many > people would use if they knew about it. If I didn''t think it was > useful, by far the simplest way to get it out of the patchqueue would be > to simply remove the functionality from the next release of XenServer.FWIW I think this sounds like a useful thing to have in xentop on its own account, not just to satisfy Keir''s (sensible) requirement. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, Nov 23, 2010 at 7:48 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On 23/11/10 15:00, Keir Fraser wrote:
>> Well, actually what we care about is #users-who-care.  Plumbing into xentop
>> to satisfy an acceptance criterion is probably misguided.  Reducing
>> patch-queue length just to make a patch-queue short is also probably
>> misguided.
>
> I wouldn't offer to put it in xentop just because I wanted to get the patch
> out of the patchqueue.  I think it's a useful feature, that many people
> would use if they knew about it.  If I didn't think it was useful, by far
> the simplest way to get it out of the patchqueue would be to simply remove
> the functionality from the next release of XenServer.
>
> I think the "wait for people to ask for it" won't work very well for this
> particular feature, because it's not the kind of thing I think most people
> would think to ask for.  A bit like tmem -- I think it's a solution to a
> problem that people have that's very non-obvious, and must therefore be
> suggested to people rather than waiting for them to ask for it.  I could
> write a blog post about it or otherwise try to promote the idea, but how
> many people are going to download a patch in order to try it out?

Irrespective of the nature of the suggested functionality, I do agree that
xen users will be more likely to try something if it's part of the default
build, especially if the benefit is not immediately obvious.  With all the
complexity associated with any virtualization project, I think that many
integrators don't have the time to experiment with every possible option
included in the "off the shelf" Xen, not to mention those that require
explicit patching and manual inclusion.

That said I would try this as a "VCPU allocation suitability" metric for
any xen system.  The ability to see this data 'at a glance' would be
beneficial to admins and system architects.  A lot of data is available if
you know how to get it.  That is very different from presenting that data
in an easily digestible format requiring no work to use.

-Bruce
On 25/11/2010 19:11, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("Re: [Xen-devel] [PATCH] xen: Implement domain
> runstates"):
>> I wouldn't offer to put it in xentop just because I wanted to get the
>> patch out of the patchqueue.  I think it's a useful feature, that many
>> people would use if they knew about it.
>
> FWIW I think this sounds like a useful thing to have in xentop on its
> own account, not just to satisfy Keir's (sensible) requirement.

My further fear about domain runstates is that it adds an extra cost on
every VCPU schedule/deschedule, linear in number of vcpus.  We want to keep
that path short, not add extra accounting crap to it.

 -- Keir
On 25/11/2010 20:02, "Bruce Edge" <bruce.edge@gmail.com> wrote:
> Irrespective of the nature of the suggested functionality, I do agree that
> xen users will be more likely to try something if it's part of the default
> build, especially if the benefit is not immediately obvious.  With all the
> complexity associated with any virtualization project, I think that many
> integrators don't have the time to experiment with every possible option
> included in the "off the shelf" Xen, not to mention those that require
> explicit patching and manual inclusion.
>
> That said I would try this as a "VCPU allocation suitability" metric for
> any xen system.  The ability to see this data 'at a glance' would be
> beneficial to admins and system architects.  A lot of data is available if
> you know how to get it.  That is very different from presenting that data
> in an easily digestible format requiring no work to use.

You may be under the impression the feature is free.  It isn't -- it adds a
lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
That could be frequent for VCPUs doing a lot of short blocking, for example.
And the system-wide work done to support this feature will scale
quadratically on average with vcpus-per-domain.  My gut feeling is this
isn't a good thing to have on your context switch path.

 -- Keir
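[One way to see the quadratic claim, assuming each vcpu changes runstate at
some roughly constant rate c: every change now walks the domain's vcpu
list, so for a domain with N vcpus the extra accounting work per unit time
is roughly

    (changes/sec for the domain, ~ c*N) x (vcpus walked per change, N) = c*N^2,

i.e. quadratic in vcpus-per-domain, on top of the runstate_lock traffic.]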
On Thu, Nov 25, 2010 at 2:24 PM, Keir Fraser <keir@xen.org> wrote:
> You may be under the impression the feature is free.  It isn't -- it adds a
> lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
> That could be frequent for VCPUs doing a lot of short blocking, for example.
> And the system-wide work done to support this feature will scale
> quadratically on average with vcpus-per-domain.  My gut feeling is this
> isn't a good thing to have on your context switch path.

Agreed, probably have to be a compile time option.  That would still give
it more exposure than if it required a patch.

We build debug & release versions of everything, with max instrumentation
on the former and max optimizations on the latter.  I could see this being
informative for the debug variant, although that may not be the exact use
case that George was after.  I see this more as a tool for confirming one's
VCPU allocations on a single system with multiple VMs rather than an admin
tool for monitoring generic virtualization servers.

-Bruce
George Dunlap
2010-Nov-26 11:52 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On Thu, Nov 25, 2010 at 10:24 PM, Keir Fraser <keir@xen.org> wrote:
> You may be under the impression the feature is free.  It isn't -- it adds a
> lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
> That could be frequent for VCPUs doing a lot of short blocking, for example.
> And the system-wide work done to support this feature will scale
> quadratically on average with vcpus-per-domain.  My gut feeling is this
> isn't a good thing to have on your context switch path.

That's certainly reasonable.  If I get a chance to do some perf testing
over the next few weeks, I'll post some performance numbers, and we can
quantify the slowdown (although catching potential cache "cliffs" may be
more difficult).

Would you consider accepting it as a feature off by default, enabled by a
Xen command-line parameter?

 -George
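[For what the command-line gating might look like, a sketch following Xen's
existing boolean_param() pattern.  The option name, the helper function and
its placement are assumptions for illustration, not part of the posted
patch; boolean_param() comes from xen/init.h.]

/* Off by default; enable with "domain_runstate" on the Xen command line. */
static bool_t __read_mostly opt_domain_runstate;
boolean_param("domain_runstate", opt_domain_runstate);

/*
 * vcpu_runstate_change() would call this right after setting
 * v->runstate.state; with the option off it bails out before ever
 * touching d->runstate_lock, so the fast path stays unchanged.
 */
static void domain_runstate_update(struct domain *d, s_time_t new_entry_time)
{
    if ( !opt_domain_runstate )
        return;

    if ( !spin_trylock(&d->runstate_lock) )
    {
        atomic_inc(&d->runstate_missed_changes);
        return;
    }

    /* ... same time accounting and vcpu-mask computation as in the patch,
     * using new_entry_time for d->runstate.state_entry_time ... */

    spin_unlock(&d->runstate_lock);
}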