To help simplify the load configuration of servers, I developed the
idea of a "domain runstate".  The "runstate" of a domain is defined
by the runstate of its constituent vcpus.  The "states" are defined
as follows:

 Blocked: All vcpus blocked
 Partial run: Some vcpus running, some blocked
 Partial contention: Some vcpus waiting for the cpu, some blocked
 Full run: All vcpus running
 Concurrency hazard: Some vcpus running, some waiting for the cpu
 Full contention: All vcpus waiting for the cpu

The idea is that by looking at the amount of time in each state
over the last unit of time, an administrator (or an automated
load-balancing program) can determine whether their vcpu and/or
domain configuration needs to be tweaked.  For example:

If a VM spends the majority of its time in "full run", it may
not have enough vcpus to do its work.

If a VM spends a large amount of time in full contention, the system
is probably too busy, and some workloads should be moved onto other
servers.

If a VM spends a large amount of time in concurrency hazard, it means
that the VM may be wasting a lot of time in spinlocks and other
synchronization (like cache misses).  Either work should be moved off
the machine, or the number of vcpus assigned to the VM should be
reduced.

The state is protected by a lock, but to avoid tricky deadlock
issues, if the lock cannot be acquired, the state is simply not
updated.  The number of lost state updates is recorded.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domain.c
--- a/xen/common/domain.c   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/domain.c   Tue Nov 23 11:25:50 2010 +0000
@@ -238,6 +238,7 @@
     spin_lock_init_prof(d, domain_lock);
     spin_lock_init_prof(d, page_alloc_lock);
     spin_lock_init(&d->hypercall_deadlock_mutex);
+    spin_lock_init(&d->runstate_lock);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domctl.c
--- a/xen/common/domctl.c   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/domctl.c   Tue Nov 23 11:25:50 2010 +0000
@@ -970,6 +970,24 @@
     }
     break;
 
+    case XEN_DOMCTL_get_runstate_info:
+    {
+        struct domain *d;
+
+        ret = -ESRCH;
+        d = rcu_lock_domain_by_id(op->domain);
+        if ( d != NULL )
+        {
+            domain_runstate_get(d, &op->u.domain_runstate);
+            ret = 0;
+
+            rcu_unlock_domain(d);
+
+            if ( copy_to_guest(u_domctl, op, 1) )
+                ret = -EFAULT;
+        }
+        break;
+    }
     default:
         ret = arch_do_domctl(op, u_domctl);
         break;
diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/schedule.c
--- a/xen/common/schedule.c Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/common/schedule.c Tue Nov 23 11:25:50 2010 +0000
@@ -135,10 +135,31 @@
     }
 }
 
+/* Used to quickly map the vcpu runstate mask to a domain runstate */
+static int mask_to_state[] = {
+    /* 000: Nothing in any runstate.  Should never happen. */
+    -1,
+    /* 001: All running */
+    DOMAIN_RUNSTATE_full_run,
+    /* 010: All runnable */
+    DOMAIN_RUNSTATE_full_contention,
+    /* 011: Some running, some runnable */
+    DOMAIN_RUNSTATE_concurrency_hazard,
+    /* 100: All blocked / offline */
+    DOMAIN_RUNSTATE_blocked,
+    /* 101: Some running, some blocked / offline */
+    DOMAIN_RUNSTATE_partial_run,
+    /* 110: Some blocked / offline, some runnable */
+    DOMAIN_RUNSTATE_partial_contention,
+    /* 111: Some in every state.  Mixed running + runnable is most important. */
+    DOMAIN_RUNSTATE_concurrency_hazard
+};
+
 static inline void vcpu_runstate_change(
     struct vcpu *v, int new_state, s_time_t new_entry_time)
 {
     s_time_t delta;
+    struct domain *d = v->domain;
 
     ASSERT(v->runstate.state != new_state);
 
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
@@ -155,6 +176,37 @@
     }
 
     v->runstate.state = new_state;
+
+    /* Update domain runstate */
+    if(spin_trylock(&d->runstate_lock)) {
+        unsigned mask=0;
+        struct vcpu *ov;
+
+        BUG_ON(d->runstate.state > DOMAIN_RUNSTATE_partial_contention);
+
+        d->runstate.time[d->runstate.state] +=
+            (new_entry_time - d->runstate.state_entry_time);
+        d->runstate.state_entry_time = new_entry_time;
+
+        /* Determine new runstate.  First, see what states we have */
+        for_each_vcpu(d, ov) {
+            /* Don't count vcpus that have been taken offline by the guest */
+            if(! (ov->runstate.state == RUNSTATE_offline
+                  && test_bit(_VPF_down, &ov->pause_flags)) )
+                mask |= (1 << ov->runstate.state);
+        }
+
+        BUG_ON(mask == 0);
+
+        /* Offline & blocked are the same */
+        mask |= ((1 << RUNSTATE_offline) & mask) >> 1;
+
+        d->runstate.state = mask_to_state[mask&0x7];
+
+        spin_unlock(&d->runstate_lock);
+    } else {
+        atomic_inc(&d->runstate_missed_changes);
+    }
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
@@ -173,6 +225,18 @@
     vcpu_schedule_unlock_irq(v);
 }
 
+void domain_runstate_get(struct domain *d,
+                         domain_runstate_info_t *runstate)
+{
+    spin_lock(&d->runstate_lock);
+
+    memcpy(runstate, &d->runstate, sizeof(*runstate));
+    runstate->time[d->runstate.state] += NOW() - runstate->state_entry_time;
+    runstate->missed_changes = atomic_read(&d->runstate_missed_changes);
+
+    spin_unlock(&d->runstate_lock);
+}
+
 uint64_t get_cpu_idle_time(unsigned int cpu)
 {
     struct vcpu_runstate_info state;
diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/public/domctl.h
--- a/xen/include/public/domctl.h   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/include/public/domctl.h   Tue Nov 23 11:25:50 2010 +0000
@@ -806,6 +806,46 @@
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuextstate_t);
 #endif
 
+/*
+ * Return information about the state and running time of a domain.
+ * The "domain runstate" is based on the runstates of all the vcpus of the
+ * domain (see below).
+ * @extra_arg == pointer to domain_runstate_info structure.
+ */
+struct xen_domctl_runstate_info {
+    /* Domain's current state (DOMAIN_RUNSTATE_*). */
+    uint32_t      state;
+    /* Number of times we missed an update due to contention */
+    uint32_t      missed_changes;
+    /* When was current state entered (system time, ns)? */
+    uint64_t      state_entry_time;
+    /*
+     * Time spent in each RUNSTATE_* (ns). The sum of these times is
+     * NOT guaranteed not to drift from system time.
+     */
+    uint64_t      time[6];
+};
+typedef struct xen_domctl_runstate_info xen_domctl_runstate_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_runstate_info_t);
+
+/* All vcpus are running */
+#define DOMAIN_RUNSTATE_full_run           0
+
+/* All vcpus are runnable (i.e., waiting for cpu) */
+#define DOMAIN_RUNSTATE_full_contention    1
+
+/* Some vcpus are running, some are runnable */
+#define DOMAIN_RUNSTATE_concurrency_hazard 2
+
+/* All vcpus are blocked / offline */
+#define DOMAIN_RUNSTATE_blocked            3
+
+/* Some vcpus are running, some are blocked */
+#define DOMAIN_RUNSTATE_partial_run        4
+
+/* Some vcpus are runnable, some are blocked */
+#define DOMAIN_RUNSTATE_partial_contention 5
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -868,6 +908,7 @@
 #define XEN_DOMCTL_getpageframeinfo3             61
 #define XEN_DOMCTL_setvcpuextstate               62
 #define XEN_DOMCTL_getvcpuextstate               63
+#define XEN_DOMCTL_get_runstate_info             64
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -923,6 +964,7 @@
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_runstate_info     domain_runstate;
         uint8_t                             pad[128];
     } u;
 };
diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/xen/sched.h
--- a/xen/include/xen/sched.h   Mon Nov 22 19:16:34 2010 +0000
+++ b/xen/include/xen/sched.h   Tue Nov 23 11:25:50 2010 +0000
@@ -200,6 +200,8 @@
     int             xen_port;
 };
 
+typedef struct xen_domctl_runstate_info domain_runstate_info_t;
+
 struct domain
 {
     domid_t          domain_id;
@@ -332,6 +334,11 @@
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* Domain runstate tracking */
+    spinlock_t runstate_lock;
+    atomic_t runstate_missed_changes;
+    domain_runstate_info_t runstate;
 };
 
 struct domain_setup_info
@@ -614,6 +621,7 @@
 int vcpu_set_affinity(struct vcpu *v, cpumask_t *affinity);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
+void domain_runstate_get(struct domain *d, domain_runstate_info_t *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
 
 /*
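[For reference, a standalone sketch of the mask-to-state mapping the patch
uses.  The RUNSTATE_* values mirror xen/include/public/vcpu.h; everything
else here is purely illustrative and not part of the patch.]

/* Illustration only: how per-vcpu runstate bits collapse into a domain
 * runstate.  Bit 0 = running, bit 1 = runnable, bit 2 = blocked,
 * bit 3 = offline, exactly as in the public vcpu interface. */
#include <stdio.h>

#define RUNSTATE_running  0
#define RUNSTATE_runnable 1
#define RUNSTATE_blocked  2
#define RUNSTATE_offline  3

static const char *mask_to_state[] = {
    "(impossible)",        /* 000 */
    "full_run",            /* 001: all running */
    "full_contention",     /* 010: all runnable */
    "concurrency_hazard",  /* 011: running + runnable */
    "blocked",             /* 100: all blocked / offline */
    "partial_run",         /* 101: running + blocked */
    "partial_contention",  /* 110: runnable + blocked */
    "concurrency_hazard",  /* 111: all three present */
};

int main(void)
{
    /* Example: a 4-vcpu domain with one vcpu in each state */
    int vcpu_state[] = { RUNSTATE_running, RUNSTATE_runnable,
                         RUNSTATE_blocked, RUNSTATE_offline };
    unsigned int mask = 0;
    int i;

    for ( i = 0; i < 4; i++ )
        mask |= 1u << vcpu_state[i];

    /* Fold "offline" (bit 3) into "blocked" (bit 2), as the patch does */
    mask |= ((1u << RUNSTATE_offline) & mask) >> 1;

    printf("domain runstate: %s\n", mask_to_state[mask & 0x7]);
    return 0;
}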
Is this really any more useful than just pulling all the individual
vcpustate records out via domctl (not supported yet except in a digested
form, but easily added), and then aggregating in userspace?  It doesn't
look like it.

Also, I would never take a patch for a new control interface without the
thing that uses it being supplied at the same time; otherwise you can just
as well maintain it out of tree.

 -- Keir
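[For comparison, the userspace aggregation being suggested might look
roughly like the sketch below, which samples the existing per-vcpu info via
libxc and classifies the domain at each sample.  The handle type and the
xc_vcpuinfo_t fields are as I recall them from the libxc of this era, so
treat them as assumptions; note also that periodic sampling can only
approximate how the vcpu states were correlated in time, which is the
distinction raised in the follow-up.]

/* Rough sketch only: classify a domain from an instantaneous sample of its
 * vcpus.  max_vcpu would typically come from xc_domain_getinfo(). */
#include <xenctrl.h>

static const char *classify(xc_interface *xch, uint32_t domid, int max_vcpu)
{
    int v, running = 0, runnable = 0, blocked = 0;

    for ( v = 0; v <= max_vcpu; v++ )
    {
        xc_vcpuinfo_t info;

        if ( xc_vcpu_getinfo(xch, domid, v, &info) || !info.online )
            continue;
        if ( info.running )
            running++;
        else if ( info.blocked )
            blocked++;
        else
            runnable++;
    }

    /* Same precedence as the patch's mask_to_state table */
    if ( running && runnable )  return "concurrency hazard";
    if ( running && blocked )   return "partial run";
    if ( runnable && blocked )  return "partial contention";
    if ( running )              return "full run";
    if ( runnable )             return "full contention";
    return "blocked";
}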
George Dunlap
2010-Nov-23 14:23 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On 23/11/10 12:51, Keir Fraser wrote:
> Is this really any more useful than just pulling all the individual
> vcpustate records out via domctl (not supported yet except in a digested
> form, but easily added), and then aggregating in userspace?  It doesn't
> look like it.

There is a difference: how the various runstates of the vcpus were
correlated.

So for example, suppose you have 2 vcpus with the following runstates:
 v0: running 50%, runnable 25%, blocked 25%
 v1: running 50%, runnable 25%, blocked 25%

That could break down like this:
 full run: 25%
 concurrency hazard: 50%
 blocked: 25%

Or like this:
 full run: 5%
 concurrency hazard: 50%
 partial run: 40%
 blocked: 5%

Or like this:
 full contention: 20%
 full run: 50%
 concurrency hazard: 5%
 blocked: 25%

To me those tell very different stories about what the VM's workload is
like and what it's getting cpu-wise from Xen.

> Also, I would never take a patch for a new control interface
> without the thing that uses it being supplied at the same time; otherwise
> you can just as well maintain it out of tree.

Would "the thing that uses it" be a libxc call, or an actual program
using the data (like xentop)?

FWIW, xapi uses this call in XenServer (just for customer reporting).
We could carry it out-of-tree, but we're trying to minimize the
patchqueue.  But if modifying xentop to be able to report these was a
requisite for acceptance, I'd make an effort. :-)

 -George
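[A libxc call wrapping the new domctl would presumably follow the usual
pattern of the other wrappers.  A hypothetical sketch follows: the wrapper
name is made up here, and DECLARE_DOMCTL / do_domctl are the libxc-internal
helpers from xc_private.h used by the existing xc_domain.c wrappers.]

/* Hypothetical libxc wrapper (not part of the posted patch). */
#include "xc_private.h"

int xc_domain_get_runstate_info(xc_interface *xch, uint32_t domid,
                                xen_domctl_runstate_info_t *info)
{
    int rc;
    DECLARE_DOMCTL;

    domctl.cmd = XEN_DOMCTL_get_runstate_info;
    domctl.domain = (domid_t)domid;

    rc = do_domctl(xch, &domctl);
    if ( rc == 0 )
        *info = domctl.u.domain_runstate;

    return rc;
}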
On 23/11/2010 14:23, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> There is a difference: how the various runstates of the vcpus were
> correlated.
>
> So for example, suppose you have 2 vcpus with the following runstates:
>  v0: running 50%, runnable 25%, blocked 25%
>  v1: running 50%, runnable 25%, blocked 25%
>
> That could break down like this:
>  full run: 25%
>  concurrency hazard: 50%
>  blocked: 25%
>
> Or like this:
>  full run: 5%
>  concurrency hazard: 50%
>  partial run: 40%
>  blocked: 5%
>
> Or like this:
>  full contention: 20%
>  full run: 50%
>  concurrency hazard: 5%
>  blocked: 25%
>
> To me those tell very different stories about what the VM's workload is
> like and what it's getting cpu-wise from Xen.

The concurrency hazard %age might be useful for telling you how much Xen
scheduling is screwing the VM, I suppose.  Still, I wonder how much of this
could typically be deduced from per-vcpu stats, and how much this improves
load balancer performance beyond very basic per-domain stats.  Overall you
can count me as not 100% convinced.  But my own personal opinion on the
pros/cons of this particular toolstack interface is only part of my
argument...

>> Also, I would never take a patch for a new control interface
>> without the thing that uses it being supplied at the same time; otherwise
>> you can just as well maintain it out of tree.
>
> Would "the thing that uses it" be a libxc call, or an actual program
> using the data (like xentop)?
>
> FWIW, xapi uses this call in XenServer (just for customer reporting).
> We could carry it out-of-tree, but we're trying to minimize the
> patchqueue.  But if modifying xentop to be able to report these was a
> requisite for acceptance, I'd make an effort. :-)

Well, actually what we care about is #users-who-care.  Plumbing into xentop
to satisfy an acceptance criterion is probably misguided.  Reducing
patch-queue length just to make a patch-queue short is also probably
misguided.

I think this patch can live in the XCP patchqueue quite happily.  If the
feature proves itself and others want it as a more general facility, for
other toolstacks or whatever, it can be considered for upstream in due
course.  There's no rush.

 -- Keir
George Dunlap
2010-Nov-23 15:48 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On 23/11/10 15:00, Keir Fraser wrote:
> Well, actually what we care about is #users-who-care.  Plumbing into xentop
> to satisfy an acceptance criterion is probably misguided.  Reducing
> patch-queue length just to make a patch-queue short is also probably
> misguided.

I wouldn't offer to put it in xentop just because I wanted to get the
patch out of the patchqueue.  I think it's a useful feature, that many
people would use if they knew about it.  If I didn't think it was useful,
by far the simplest way to get it out of the patchqueue would be to simply
remove the functionality from the next release of XenServer.

I think the "wait for people to ask for it" won't work very well for this
particular feature, because it's not the kind of thing I think most people
would think to ask for.  A bit like tmem -- I think it's a solution to a
problem that people have that's very non-obvious, and must therefore be
suggested to people rather than waiting for them to ask for it.  I could
write a blog post about it or otherwise try to promote the idea, but how
many people are going to download a patch in order to try it out?

(I'm aware the difference is that tmem's effectiveness can be measured
with benchmarks while this feature cannot; I'm just saying that in both
cases, you'd be waiting for a long time for people to demand it.)

xenalyze will calculate and report domain runstate numbers based on vcpu
runstate_change records, and I've found it useful to get a rough idea
what's going on when I get traces in from the field.

Of course there's no rush, but it's likely to be forgotten about for
another two years. :-)  (It's already been in the XS patchqueue since
September 2008.)

 -George
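[To give an idea of how a xentop-style consumer would digest the numbers:
sample the structure twice, some interval apart, and report the deltas as
percentages of the interval.  A sketch; the function and array names are
illustrative only, and per the patch's own comment the summed times may
drift slightly from wall-clock time.]

/* Sketch: "% of time in each state over the last interval" from two
 * samples of xen_domctl_runstate_info.time[], taken interval_ns apart.
 * Index order matches the DOMAIN_RUNSTATE_* values in the patch. */
#include <stdio.h>
#include <stdint.h>

static const char *state_name[6] = {
    "full run", "full contention", "concurrency hazard",
    "blocked", "partial run", "partial contention",
};

void print_runstate_pct(const uint64_t prev_time[6],
                        const uint64_t cur_time[6],
                        uint64_t interval_ns)
{
    int i;

    for ( i = 0; i < 6; i++ )
    {
        uint64_t delta = cur_time[i] - prev_time[i];

        printf("%20s: %5.1f%%\n", state_name[i],
               interval_ns ? 100.0 * delta / interval_ns : 0.0);
    }
}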
Pasi Kärkkäinen
2010-Nov-23 18:41 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On Tue, Nov 23, 2010 at 03:48:33PM +0000, George Dunlap wrote:
> I wouldn't offer to put it in xentop just because I wanted to get the
> patch out of the patchqueue.  I think it's a useful feature, that many
> people would use if they knew about it.

This feature definitely sounds useful to me :)

-- Pasi
George Dunlap writes ("Re: [Xen-devel] [PATCH] xen: Implement domain runstates"):> I wouldn''t offer to put it in xentop just because I wanted to get the > patch out of the patchqueue. I think it''s a useful feature, that many > people would use if they knew about it. If I didn''t think it was > useful, by far the simplest way to get it out of the patchqueue would be > to simply remove the functionality from the next release of XenServer.FWIW I think this sounds like a useful thing to have in xentop on its own account, not just to satisfy Keir''s (sensible) requirement. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, Nov 23, 2010 at 7:48 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On 23/11/10 15:00, Keir Fraser wrote:
>> Well, actually what we care about is #users-who-care.  Plumbing into xentop
>> to satisfy an acceptance criterion is probably misguided.  Reducing
>> patch-queue length just to make a patch-queue short is also probably
>> misguided.
>
> I wouldn't offer to put it in xentop just because I wanted to get the patch
> out of the patchqueue.  I think it's a useful feature, that many people
> would use if they knew about it.  If I didn't think it was useful, by far
> the simplest way to get it out of the patchqueue would be to simply remove
> the functionality from the next release of XenServer.
>
> I think the "wait for people to ask for it" won't work very well for this
> particular feature, because it's not the kind of thing I think most people
> would think to ask for.  A bit like tmem -- I think it's a solution to a
> problem that people have that's very non-obvious, and must therefore be
> suggested to people rather than waiting for them to ask for it.  I could
> write a blog post about it or otherwise try to promote the idea, but how
> many people are going to download a patch in order to try it out?

Irrespective of the nature of the suggested functionality, I do agree that
xen users will be more likely to try something if it's part of the default
build, especially if the benefit is not immediately obvious.  With all the
complexity associated with any virtualization project, I think that many
integrators don't have the time to experiment with every possible option
included in the "off the shelf" Xen, not to mention those that require
explicit patching and manual inclusion.

That said I would try this as a "VCPU allocation suitability" metric for
any xen system.  The ability to see this data 'at a glance' would be
beneficial to admins and system architects.  A lot of data is available if
you know how to get it.  That is very different from presenting that data
in an easily digestible format requiring no work to use.

-Bruce
On 25/11/2010 19:11, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("Re: [Xen-devel] [PATCH] xen: Implement domain
> runstates"):
>> I wouldn't offer to put it in xentop just because I wanted to get the
>> patch out of the patchqueue.  I think it's a useful feature, that many
>> people would use if they knew about it.
>
> FWIW I think this sounds like a useful thing to have in xentop on its
> own account, not just to satisfy Keir's (sensible) requirement.

My further fear about domain runstates is that it adds an extra cost on
every VCPU schedule/deschedule, linear in number of vcpus.  We want to keep
that path short, not add extra accounting crap to it.

 -- Keir
On 25/11/2010 20:02, "Bruce Edge" <bruce.edge@gmail.com> wrote:
> Irrespective of the nature of the suggested functionality, I do agree that
> xen users will be more likely to try something if it's part of the default
> build, especially if the benefit is not immediately obvious.  With all the
> complexity associated with any virtualization project, I think that many
> integrators don't have the time to experiment with every possible option
> included in the "off the shelf" Xen, not to mention those that require
> explicit patching and manual inclusion.
>
> That said I would try this as a "VCPU allocation suitability" metric for
> any xen system.  The ability to see this data 'at a glance' would be
> beneficial to admins and system architects.  A lot of data is available if
> you know how to get it.  That is very different from presenting that data
> in an easily digestible format requiring no work to use.

You may be under the impression the feature is free.  It isn't -- it adds a
lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
That could be frequent for VCPUs doing a lot of short blocking, for example.
And the system-wide work done to support this feature will scale
quadratically on average with vcpus-per-domain.  My gut feeling is this
isn't a good thing to have on your context switch path.

 -- Keir
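[One way to see the quadratic claim, assuming each vcpu changes runstate at
some roughly constant rate c: every change now walks the domain's vcpu
list, so for a domain with N vcpus the extra accounting work per unit time
is roughly

    (changes/sec for the domain, ~ c*N) x (vcpus walked per change, N) = c*N^2,

i.e. quadratic in vcpus-per-domain, on top of the runstate_lock traffic.]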
On Thu, Nov 25, 2010 at 2:24 PM, Keir Fraser <keir@xen.org> wrote:
> You may be under the impression the feature is free.  It isn't -- it adds a
> lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
> That could be frequent for VCPUs doing a lot of short blocking, for example.
> And the system-wide work done to support this feature will scale
> quadratically on average with vcpus-per-domain.  My gut feeling is this
> isn't a good thing to have on your context switch path.

Agreed, probably have to be a compile time option.  That would still give
it more exposure than if it required a patch.

We build debug & release versions of everything, with max instrumentation
on the former and max optimizations on the latter.  I could see this being
informative for the debug variant, although that may not be the exact use
case that George was after.  I see this more as a tool for confirming one's
VCPU allocations on a single system with multiple VMs rather than an admin
tool for monitoring generic virtualization servers.

-Bruce
George Dunlap
2010-Nov-26 11:52 UTC
Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On Thu, Nov 25, 2010 at 10:24 PM, Keir Fraser <keir@xen.org> wrote:
> You may be under the impression the feature is free.  It isn't -- it adds a
> lock and loop over all vcpus on every vcpu scheduling/descheduling decision.
> That could be frequent for VCPUs doing a lot of short blocking, for example.
> And the system-wide work done to support this feature will scale
> quadratically on average with vcpus-per-domain.  My gut feeling is this
> isn't a good thing to have on your context switch path.

That's certainly reasonable.  If I get a chance to do some perf testing
over the next few weeks, I'll post some performance numbers, and we can
quantify the slowdown (although catching potential cache "cliffs" may be
more difficult).

Would you consider accepting it as a feature off by default, enabled by a
Xen command-line parameter?

 -George
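[For what the command-line gating might look like, a sketch following Xen's
existing boolean_param() pattern.  The option name, the helper function and
its placement are assumptions for illustration, not part of the posted
patch; boolean_param() comes from xen/init.h.]

/* Off by default; enable with "domain_runstate" on the Xen command line. */
static bool_t __read_mostly opt_domain_runstate;
boolean_param("domain_runstate", opt_domain_runstate);

/*
 * vcpu_runstate_change() would call this right after setting
 * v->runstate.state; with the option off it bails out before ever
 * touching d->runstate_lock, so the fast path stays unchanged.
 */
static void domain_runstate_update(struct domain *d, s_time_t new_entry_time)
{
    if ( !opt_domain_runstate )
        return;

    if ( !spin_trylock(&d->runstate_lock) )
    {
        atomic_inc(&d->runstate_missed_changes);
        return;
    }

    /* ... same time accounting and vcpu-mask computation as in the patch,
     * using new_entry_time for d->runstate.state_entry_time ... */

    spin_unlock(&d->runstate_lock);
}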