As an experiment trying to reduce the latency when scheduling dom0
vcpus, I applied the following patch to __runq_insert() in xen 4.2:
diff -r 8643ca19d356 -r 91b13479c1a2 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -205,6 +205,15 @@
     BUG_ON( __vcpu_on_runq(svc) );
     BUG_ON( cpu != svc->vcpu->processor );
 
+    /* if svc is a dom0 vcpu, put it always before all the other vcpus
+     * in the runq, so that dom0 vcpus always have priority
+     */
+    if (svc->vcpu->domain->domain_id == 0) {
+        svc->pri = CSCHED_PRI_TS_BOOST; /* make sure no vcpu goes in front of this one until this vcpu is scheduled */
+        list_add(&svc->runq_elem, (struct list_head *)runq);
+        return;
+    }
+
     list_for_each( iter, runq )
     {
         const struct csched_vcpu * const iter_svc = __runq_elem(iter);
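(For reference, the remainder of the unmodified insertion loop is roughly the
snippet below, with the yield handling omitted, so it may not match the 4.2
tree exactly; the hunk above skips this priority-ordered insert entirely for
dom0 vcpus.)

    list_for_each( iter, runq )
    {
        const struct csched_vcpu * const iter_svc = __runq_elem(iter);
        /* stop at the first vcpu in the runq with lower priority */
        if ( svc->pri > iter_svc->pri )
            break;
    }

    /* insert just before that lower-priority vcpu (or at the tail) */
    list_add_tail(&svc->runq_elem, iter);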
However, this patch seems to have had the opposite effect, and I would
like to understand why. A win7 guest now takes hours to start up, and I
believe this is due to dom0 taking on the order of 10ms to serve each
vm i/o request, even though the dom0 vcpus and the guest vcpu are on
different pcpus.
xenalyze-a.out: http://pastelink.me/getfile.php?key=390a25
xentrace-D-T5.out: http://pastelink.me/getfile.php?key=b3d584
Any ideas why this is the case?
thanks,
Marcus
--
xenalyze-a.out head:
--
0.006977926 ------ x d32768v23 runstate_change d4v0 blocked->runnable
Creating domain 4
Creating vcpu 0 for dom 4
] 0.006979023 ------ x d32768v23 28004(2:8:4) 2 [ 4 0 ]
] 0.006980999 ------ x d32768v23 2800e(2:8:e) 2 [ 7fff edd9df ]
] 0.006981126 ------ x d32768v23 2800f(2:8:f) 3 [ 4 e82 1c9c380 ]
] 0.006981403 ------ x d32768v23 2800a(2:8:a) 4 [ 7fff 17 4 0 ]
0.006981687 ------ x d32768v23 runstate_change d32767v23 running->runnable
Creating vcpu 23 for dom 32767
Using first_tsc for d32767v23 (9024 cycles)
0.006982783 ------ x d?v? runstate_change d4v0 runnable->running
] 0.006996466 ------ x d4v0 28006(2:8:6) 2 [ 4 0 ]
] 0.006997600 ------ x d4v0 2800e(2:8:e) 2 [ 4 4d19 ]
] 0.006997726 ------ x d4v0 2800f(2:8:f) 3 [ 7fff 4d19 ffffffff ]
] 0.006997881 ------ x d4v0 2800a(2:8:a) 4 [ 4 0 7fff 17 ]
0.006998070 ------ x d4v0 runstate_change d4v0 running->blocked
0.006998242 ------ x d?v? runstate_change d32767v23 runnable->running
0.014874949 ----x- - d32767v4 runstate_change d0v4 blocked->runnable
] 0.014879473 ----x- - d32767v4 28004(2:8:4) 2 [ 0 4 ]
0.014880331 -x---- - d32767v1 runstate_change d0v1 blocked->runnable
] 0.014884417 ----x- - d32767v4 2800e(2:8:e) 2 [ 7fff 97fc06 ]
] 0.014884544 ----x- - d32767v4 2800f(2:8:f) 3 [ 0 1978 1c9c380 ]
] 0.014884916 ----x- - d32767v4 2800a(2:8:a) 4 [ 7fff 4 0 4 ]
] 0.014885022 -x---- - d32767v1 28004(2:8:4) 2 [ 0 1 ]
0.014885134 ----x- - d32767v4 runstate_change d32767v4 running->runnable
0.014885251 --x- - - d32767v2 runstate_change d0v2 blocked->runnable
] 0.014889526 -x-- - - d32767v1 2800e(2:8:e) 2 [ 7fff 97cdd8 ]
] 0.014889731 -x-- - - d32767v1 2800f(2:8:f) 3 [ 0 1b68 1c9c380 ]
] 0.014889949 -x-- - - d32767v1 2800a(2:8:a) 4 [ 7fff 1 0 1 ]
0.014890084 ----x- - d?v? runstate_change d0v4 runnable->running
0.014890176 -x--|- - d32767v1 runstate_change d32767v1 running->runnable
] 0.014890291 - x-|- - d32767v2 28004(2:8:4) 2 [ 0 2 ]
0.014890374 - -x|- - d32767v3 runstate_change d0v3 blocked->runnable
0.014891134 -x--|- - d?v? runstate_change d0v1 runnable->running
] 0.014891811 -|x-|- - d32767v2 2800e(2:8:e) 2 [ 7fff 96f8a4 ]
] 0.014891905 -|-x|- - d32767v3 28004(2:8:4) 2 [ 0 3 ]
] 0.014891936 -|x-|- - d32767v2 2800f(2:8:f) 3 [ 0 1c23 1c9c380 ]
] 0.014892155 -|x-|- - d32767v2 2800a(2:8:a) 4 [ 7fff 2 0 2 ]
0.014892362 -|--|x - d32767v5 runstate_change d0v5 blocked->runnable
0.014892395 -|x-|- - d32767v2 runstate_change d32767v2 running->runnable
] 0.014893226 -| x|- - d32767v3 2800e(2:8:e) 2 [ 7fff 982ddb ]
] 0.014893343 -| x|- - d32767v3 2800f(2:8:f) 3 [ 0 c64 1c9c380 ]
0.014893386 -|x-|- - d?v? runstate_change d0v2 runnable->running
] 0.014893556 -||x|- - d32767v3 2800a(2:8:a) 4 [ 7fff 3 0 3 ]
] 0.014893778 -||-|x - d32767v5 28004(2:8:4) 2 [ 0 5 ]
0.014893867 -||x|- - d32767v3 runstate_change d32767v3 running->runnable
0.014894811 -||x|- - d?v? runstate_change d0v3 runnable->running
] 0.014895067 -||||x - d32767v5 2800e(2:8:e) 2 [ 7fff 982654 ]
] 0.014895192 -||||x - d32767v5 2800f(2:8:f) 3 [ 0 c3c 1c9c380 ]
] 0.014895439 -||||x - d32767v5 2800a(2:8:a) 4 [ 7fff 5 0 5 ]
0.014895815 -||||x - d32767v5 runstate_change d32767v5 running->runnable
0.014896751 -||||x - d?v? runstate_change d0v5 runnable->running
] 0.014908155 -|||x| - d0v4 28006(2:8:6) 2 [ 0 4 ]
] 0.014908228 -||||x - d0v5 28006(2:8:6) 2 [ 0 5 ]
] 0.014908405 -x|||| - d0v1 28006(2:8:6) 2 [ 0 1 ]
] 0.014909231 -||x|| - d0v3 28006(2:8:6) 2 [ 0 3 ]
] 0.014910265 -|||x| - d0v4 2800e(2:8:e) 2 [ 0 7f14 ]
] 0.014910384 -|||x| - d0v4 2800f(2:8:f) 3 [ 7fff 7f14 ffffffff ]
] 0.014910550 -|||x| - d0v4 2800a(2:8:a) 4 [ 0 4 7fff 4 ]
] 0.014910566 -x|||| - d0v1 2800e(2:8:e) 2 [ 0 6743 ]
] 0.014910679 -x|||| - d0v1 2800f(2:8:f) 3 [ 7fff 6743 ffffffff ]
] 0.014910707 -||||x - d0v5 2800e(2:8:e) 2 [ 0 3f80 ]
0.014910783 -|||x| - d0v4 runstate_change d0v4 running->blocked
] 0.014910803 -x|| | - d0v1 2800a(2:8:a) 4 [ 0 1 7fff 1 ]
] 0.014910819 -||| x - d0v5 2800f(2:8:f) 3 [ 7fff 3f80 ffffffff ]
] 0.014910944 -||| x - d0v5 2800a(2:8:a) 4 [ 0 5 7fff 5 ]
0.014911030 -x|| | - d0v1 runstate_change d0v1 running->blocked
0.014911109 - || x - d0v5 runstate_change d0v5 running->blocked
] 0.014911307 - |x - d0v3 2800e(2:8:e) 2 [ 0 4c74 ]
0.014911367 - || x - d?v? runstate_change d32767v5 runnable->running
] 0.014911417 - |x - - d0v3 2800f(2:8:f) 3 [ 7fff 4c74 ffffffff ]
0.014911471 - ||x- - d?v? runstate_change d32767v4 runnable->running
0.014911512 -x||-- - d?v? runstate_change d32767v1 runnable->running
] 0.014911530 --|x-- - d0v3 2800a(2:8:a) 4 [ 0 3 7fff 3 ]
0.014911687 --|x-- - d0v3 runstate_change d0v3 running->blocked
0.014912276 --|x-- - d?v? runstate_change d32767v3 runnable->running
] 0.015036914 --x--- - d0v2 28006(2:8:6) 2 [ 0 2 ]
] 0.015038191 --x--- - d0v2 2800e(2:8:e) 2 [ 0 28d83 ]
] 0.015038313 --x--- - d0v2 2800f(2:8:f) 3 [ 7fff 28d83 ffffffff ]
] 0.015038445 --x--- - d0v2 2800a(2:8:a) 4 [ 0 2 7fff 2 ]
0.015038617 --x--- - d0v2 runstate_change d0v2 running->blocked
0.015039232 --x--- - d?v? runstate_change d32767v2 runnable->running
0.020630385 ------ x d32767v23 runstate_change d4v0 blocked->runnable
] 0.020631491 ------ x d32767v23 28004(2:8:4) 2 [ 4 0 ]
] 0.020633401 ------ x d32767v23 2800e(2:8:e) 2 [ 7fff edb796 ]
] 0.020633555 ------ x d32767v23 2800f(2:8:f) 3 [ 4 d97 1c9c380 ]
] 0.020633813 ------ x d32767v23 2800a(2:8:a) 4 [ 7fff 17 4 0 ]
0.020634086 ------ x d32767v23 runstate_change d32767v23 running->runnable
0.020635147 ------ x d?v? runstate_change d4v0 runnable->running
] 0.020650487 ------ x d4v0 28006(2:8:6) 2 [ 4 0 ]
] 0.020651616 ------ x d4v0 2800e(2:8:e) 2 [ 4 5400 ]
] 0.020651739 ------ x d4v0 2800f(2:8:f) 3 [ 7fff 5400 ffffffff ]
] 0.020651876 ------ x d4v0 2800a(2:8:a) 4 [ 4 0 7fff 17 ]
0.020652054 ------ x d4v0 runstate_change d4v0 running->blocked

On Fri, 2013-05-31 at 18:18 +0100, Marcus Granado wrote:
> As an experiment trying to reduce the latency when scheduling dom0
> vcpus, I applied the following patch to __runq_insert() in xen 4.2:
>
> diff -r 8643ca19d356 -r 91b13479c1a2 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -205,6 +205,15 @@
>      BUG_ON( __vcpu_on_runq(svc) );
>      BUG_ON( cpu != svc->vcpu->processor );
>
> +    /* if svc is a dom0 vcpu, put it always before all the other vcpus
> +     * in the runq, so that dom0 vcpus always have priority
> +     */
> +    if (svc->vcpu->domain->domain_id == 0) {
> +        svc->pri = CSCHED_PRI_TS_BOOST; /* make sure no vcpu goes in front of this one until this vcpu is scheduled */
> +        list_add(&svc->runq_elem, (struct list_head *)runq);
> +        return;
> +    }
> +
>      list_for_each( iter, runq )
>      {
>          const struct csched_vcpu * const iter_svc = __runq_elem(iter);
>
Mmm... are we talking about wakeup latency (which, BTW, is what TS_BOOST
is all about, AFAIUI)? In that case, isn't a waking vcpu, whether or not
it belongs to Dom0, already boosted in csched_vcpu_wake()?
__runq_insert() is called right after that, so I think it already sees
the boosting, without the need for the above.

If it's not only wakeup latency that you're trying to address, then I'm
not sure, but still, __runq_insert() does not look like the right place
for such logic, at least to my personal taste. :-)

> However, this patch seems to have had the opposite effect, and I would
> like to understand why. A win7 guest now takes hours to start up, and I
> believe this is due to dom0 taking on the order of 10ms to serve each
> vm i/o request, even though the dom0 vcpus and the guest vcpu are on
> different pcpus.
>
Well, just shooting in the dark, but __runq_insert() is also called in
csched_schedule(). Perhaps your modification interacts badly with the
current scheduling logic?

Another way of trying to achieve what you seem to be after could be to
put an "is it dom0?" check in csched_vcpu_acct() and, if true, not clear
the boosting. Beware, I'm not saying that it makes sense, or that I like
it; it just seems cleaner (at least to me) than hijacking
__runq_insert(). What do you think?

> xenalyze-a.out: http://pastelink.me/getfile.php?key=390a25
> xentrace-D-T5.out: http://pastelink.me/getfile.php?key=b3d584
>
Sorry, I can't look at the traces right now... If I find 5 minutes for
them and spot something weird, I'll let you know.

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
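
For concreteness, the csched_vcpu_acct() idea suggested above might look
something like the sketch below. It is a rough, untested sketch against the
4.2 credit scheduler: it assumes the de-boost from CSCHED_PRI_TS_BOOST to
CSCHED_PRI_TS_UNDER happens in csched_vcpu_acct(), and the dom0 check is
purely illustrative.

    /* In csched_vcpu_acct(): reset a boosted vcpu to UNDER as usual,
     * but (hypothetically) skip the de-boost for dom0 vcpus so that
     * they keep their elevated priority. */
    if ( svc->pri == CSCHED_PRI_TS_BOOST &&
         svc->vcpu->domain->domain_id != 0 )
        svc->pri = CSCHED_PRI_TS_UNDER;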