As an experiment aimed at reducing the latency of scheduling dom0
vcpus, I applied the following patch to __runq_insert() in xen 4.2:

diff -r 8643ca19d356 -r 91b13479c1a2 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -205,6 +205,15 @@
     BUG_ON( __vcpu_on_runq(svc) );
     BUG_ON( cpu != svc->vcpu->processor );
 
+    /* if svc is a dom0 vcpu, put it always before all the other vcpus in the runq,
+     * so that dom0 vcpus always have priority
+     */
+    if (svc->vcpu->domain->domain_id == 0) {
+        svc->pri = CSCHED_PRI_TS_BOOST; /* make sure no vcpu goes in front of this one until this vcpu is scheduled */
+        list_add(&svc->runq_elem, (struct list_head *)runq);
+        return;
+    }
+
     list_for_each( iter, runq )
     {
         const struct csched_vcpu * const iter_svc = __runq_elem(iter);

However, this patch seems to have had the opposite effect, and I would
like to understand why. A win7 guest now takes hours to start up, and I
believe this is due to dom0 taking on the order of 10ms to serve each
VM I/O request, even though the dom0 vcpus and the guest vcpu are on
different pcpus.

xenalyze-a.out: http://pastelink.me/getfile.php?key=390a25
xentrace-D-T5.out: http://pastelink.me/getfile.php?key=b3d584

Any ideas why this is the case?

thanks,
Marcus

--
xenalyze-a.out head:
--
0.006977926 ------ x d32768v23 runstate_change d4v0 blocked->runnable
Creating domain 4
Creating vcpu 0 for dom 4
] 0.006979023 ------ x d32768v23 28004(2:8:4) 2 [ 4 0 ]
] 0.006980999 ------ x d32768v23 2800e(2:8:e) 2 [ 7fff edd9df ]
] 0.006981126 ------ x d32768v23 2800f(2:8:f) 3 [ 4 e82 1c9c380 ]
] 0.006981403 ------ x d32768v23 2800a(2:8:a) 4 [ 7fff 17 4 0 ]
0.006981687 ------ x d32768v23 runstate_change d32767v23 running->runnable
Creating vcpu 23 for dom 32767
Using first_tsc for d32767v23 (9024 cycles)
0.006982783 ------ x d?v? runstate_change d4v0 runnable->running
] 0.006996466 ------ x d4v0 28006(2:8:6) 2 [ 4 0 ]
] 0.006997600 ------ x d4v0 2800e(2:8:e) 2 [ 4 4d19 ]
] 0.006997726 ------ x d4v0 2800f(2:8:f) 3 [ 7fff 4d19 ffffffff ]
] 0.006997881 ------ x d4v0 2800a(2:8:a) 4 [ 4 0 7fff 17 ]
0.006998070 ------ x d4v0 runstate_change d4v0 running->blocked
0.006998242 ------ x d?v? runstate_change d32767v23 runnable->running
0.014874949 ----x- - d32767v4 runstate_change d0v4 blocked->runnable
] 0.014879473 ----x- - d32767v4 28004(2:8:4) 2 [ 0 4 ]
0.014880331 -x---- - d32767v1 runstate_change d0v1 blocked->runnable
] 0.014884417 ----x- - d32767v4 2800e(2:8:e) 2 [ 7fff 97fc06 ]
] 0.014884544 ----x- - d32767v4 2800f(2:8:f) 3 [ 0 1978 1c9c380 ]
] 0.014884916 ----x- - d32767v4 2800a(2:8:a) 4 [ 7fff 4 0 4 ]
] 0.014885022 -x---- - d32767v1 28004(2:8:4) 2 [ 0 1 ]
0.014885134 ----x- - d32767v4 runstate_change d32767v4 running->runnable
0.014885251 --x- - - d32767v2 runstate_change d0v2 blocked->runnable
] 0.014889526 -x-- - - d32767v1 2800e(2:8:e) 2 [ 7fff 97cdd8 ]
] 0.014889731 -x-- - - d32767v1 2800f(2:8:f) 3 [ 0 1b68 1c9c380 ]
] 0.014889949 -x-- - - d32767v1 2800a(2:8:a) 4 [ 7fff 1 0 1 ]
0.014890084 ----x- - d?v? runstate_change d0v4 runnable->running
0.014890176 -x--|- - d32767v1 runstate_change d32767v1 running->runnable
] 0.014890291 - x-|- - d32767v2 28004(2:8:4) 2 [ 0 2 ]
0.014890374 - -x|- - d32767v3 runstate_change d0v3 blocked->runnable
0.014891134 -x--|- - d?v? runstate_change d0v1 runnable->running
] 0.014891811 -|x-|- - d32767v2 2800e(2:8:e) 2 [ 7fff 96f8a4 ]
] 0.014891905 -|-x|- - d32767v3 28004(2:8:4) 2 [ 0 3 ]
] 0.014891936 -|x-|- - d32767v2 2800f(2:8:f) 3 [ 0 1c23 1c9c380 ]
] 0.014892155 -|x-|- - d32767v2 2800a(2:8:a) 4 [ 7fff 2 0 2 ]
0.014892362 -|--|x - d32767v5 runstate_change d0v5 blocked->runnable
0.014892395 -|x-|- - d32767v2 runstate_change d32767v2 running->runnable
] 0.014893226 -| x|- - d32767v3 2800e(2:8:e) 2 [ 7fff 982ddb ]
] 0.014893343 -| x|- - d32767v3 2800f(2:8:f) 3 [ 0 c64 1c9c380 ]
0.014893386 -|x-|- - d?v? runstate_change d0v2 runnable->running
] 0.014893556 -||x|- - d32767v3 2800a(2:8:a) 4 [ 7fff 3 0 3 ]
] 0.014893778 -||-|x - d32767v5 28004(2:8:4) 2 [ 0 5 ]
0.014893867 -||x|- - d32767v3 runstate_change d32767v3 running->runnable
0.014894811 -||x|- - d?v? runstate_change d0v3 runnable->running
] 0.014895067 -||||x - d32767v5 2800e(2:8:e) 2 [ 7fff 982654 ]
] 0.014895192 -||||x - d32767v5 2800f(2:8:f) 3 [ 0 c3c 1c9c380 ]
] 0.014895439 -||||x - d32767v5 2800a(2:8:a) 4 [ 7fff 5 0 5 ]
0.014895815 -||||x - d32767v5 runstate_change d32767v5 running->runnable
0.014896751 -||||x - d?v? runstate_change d0v5 runnable->running
] 0.014908155 -|||x| - d0v4 28006(2:8:6) 2 [ 0 4 ]
] 0.014908228 -||||x - d0v5 28006(2:8:6) 2 [ 0 5 ]
] 0.014908405 -x|||| - d0v1 28006(2:8:6) 2 [ 0 1 ]
] 0.014909231 -||x|| - d0v3 28006(2:8:6) 2 [ 0 3 ]
] 0.014910265 -|||x| - d0v4 2800e(2:8:e) 2 [ 0 7f14 ]
] 0.014910384 -|||x| - d0v4 2800f(2:8:f) 3 [ 7fff 7f14 ffffffff ]
] 0.014910550 -|||x| - d0v4 2800a(2:8:a) 4 [ 0 4 7fff 4 ]
] 0.014910566 -x|||| - d0v1 2800e(2:8:e) 2 [ 0 6743 ]
] 0.014910679 -x|||| - d0v1 2800f(2:8:f) 3 [ 7fff 6743 ffffffff ]
] 0.014910707 -||||x - d0v5 2800e(2:8:e) 2 [ 0 3f80 ]
0.014910783 -|||x| - d0v4 runstate_change d0v4 running->blocked
] 0.014910803 -x|| | - d0v1 2800a(2:8:a) 4 [ 0 1 7fff 1 ]
] 0.014910819 -||| x - d0v5 2800f(2:8:f) 3 [ 7fff 3f80 ffffffff ]
] 0.014910944 -||| x - d0v5 2800a(2:8:a) 4 [ 0 5 7fff 5 ]
0.014911030 -x|| | - d0v1 runstate_change d0v1 running->blocked
0.014911109 - || x - d0v5 runstate_change d0v5 running->blocked
] 0.014911307 - |x - d0v3 2800e(2:8:e) 2 [ 0 4c74 ]
0.014911367 - || x - d?v? runstate_change d32767v5 runnable->running
] 0.014911417 - |x - - d0v3 2800f(2:8:f) 3 [ 7fff 4c74 ffffffff ]
0.014911471 - ||x- - d?v? runstate_change d32767v4 runnable->running
0.014911512 -x||-- - d?v? runstate_change d32767v1 runnable->running
] 0.014911530 --|x-- - d0v3 2800a(2:8:a) 4 [ 0 3 7fff 3 ]
0.014911687 --|x-- - d0v3 runstate_change d0v3 running->blocked
0.014912276 --|x-- - d?v? runstate_change d32767v3 runnable->running
] 0.015036914 --x--- - d0v2 28006(2:8:6) 2 [ 0 2 ]
] 0.015038191 --x--- - d0v2 2800e(2:8:e) 2 [ 0 28d83 ]
] 0.015038313 --x--- - d0v2 2800f(2:8:f) 3 [ 7fff 28d83 ffffffff ]
] 0.015038445 --x--- - d0v2 2800a(2:8:a) 4 [ 0 2 7fff 2 ]
0.015038617 --x--- - d0v2 runstate_change d0v2 running->blocked
0.015039232 --x--- - d?v? runstate_change d32767v2 runnable->running
0.020630385 ------ x d32767v23 runstate_change d4v0 blocked->runnable
] 0.020631491 ------ x d32767v23 28004(2:8:4) 2 [ 4 0 ]
] 0.020633401 ------ x d32767v23 2800e(2:8:e) 2 [ 7fff edb796 ]
] 0.020633555 ------ x d32767v23 2800f(2:8:f) 3 [ 4 d97 1c9c380 ]
] 0.020633813 ------ x d32767v23 2800a(2:8:a) 4 [ 7fff 17 4 0 ]
0.020634086 ------ x d32767v23 runstate_change d32767v23 running->runnable
0.020635147 ------ x d?v? runstate_change d4v0 runnable->running
] 0.020650487 ------ x d4v0 28006(2:8:6) 2 [ 4 0 ]
] 0.020651616 ------ x d4v0 2800e(2:8:e) 2 [ 4 5400 ]
] 0.020651739 ------ x d4v0 2800f(2:8:f) 3 [ 7fff 5400 ffffffff ]
] 0.020651876 ------ x d4v0 2800a(2:8:a) 4 [ 4 0 7fff 17 ]
0.020652054 ------ x d4v0 runstate_change d4v0 running->blocked
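
For context, the stock __runq_insert() in 4.2 keeps the runqueue sorted
by priority, which is the invariant the hunk above short-circuits for
dom0. A minimal sketch of that unmodified insertion path, assuming the
4.2 sched_credit.c names and omitting the yield special case:

/* Sketch of the unmodified insertion path (4.2 names assumed, yield
 * handling omitted): the runq stays sorted by svc->pri, FIFO within
 * each priority level. */
static inline void
__runq_insert(unsigned int cpu, struct csched_vcpu *svc)
{
    const struct list_head * const runq = RUNQ(cpu);
    struct list_head *iter;

    BUG_ON( __vcpu_on_runq(svc) );
    BUG_ON( cpu != svc->vcpu->processor );

    /* Stop at the first entry with a strictly lower priority... */
    list_for_each( iter, runq )
    {
        const struct csched_vcpu * const iter_svc = __runq_elem(iter);
        if ( svc->pri > iter_svc->pri )
            break;
    }

    /* ...and queue svc in front of it, i.e. behind every vcpu of
     * equal or higher priority already on the runq. */
    list_add_tail(&svc->runq_elem, iter);
}

Since a vcpu already at CSCHED_PRI_TS_BOOST sorts ahead of every
UNDER/OVER vcpu through this same path, the ordering the patch forces by
hand is largely what a boosted vcpu gets anyway.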
On Fri, 2013-05-31 at 18:18 +0100, Marcus Granado wrote:
> As an experiment aimed at reducing the latency of scheduling dom0
> vcpus, I applied the following patch to __runq_insert() in xen 4.2:
>
> diff -r 8643ca19d356 -r 91b13479c1a2 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -205,6 +205,15 @@
>      BUG_ON( __vcpu_on_runq(svc) );
>      BUG_ON( cpu != svc->vcpu->processor );
>
> +    /* if svc is a dom0 vcpu, put it always before all the other vcpus in the runq,
> +     * so that dom0 vcpus always have priority
> +     */
> +    if (svc->vcpu->domain->domain_id == 0) {
> +        svc->pri = CSCHED_PRI_TS_BOOST; /* make sure no vcpu goes in front of this one until this vcpu is scheduled */
> +        list_add(&svc->runq_elem, (struct list_head *)runq);
> +        return;
> +    }
> +
>      list_for_each( iter, runq )
>      {
>          const struct csched_vcpu * const iter_svc = __runq_elem(iter);
>
Mmm... are we talking about wakeup latency (which, BTW, is what TS_BOOST
is all about, AFAIUI)? In that case, isn't a waking vcpu, whether or not
it belongs to Dom0, already being boosted in csched_vcpu_wake()?
__runq_insert() is called right after that, so I think it sees the
boosting already, without any need for the above.

If it's not only wakeup latency issues that you're trying to address,
then I'm not sure; but still, __runq_insert() does not look like the
right place for such logic, at least to my personal taste. :-)

> However, this patch seems to have had the opposite effect, and I would
> like to understand why. A win7 guest now takes hours to start up, and I
> believe this is due to dom0 taking on the order of 10ms to serve each
> VM I/O request, even though the dom0 vcpus and the guest vcpu are on
> different pcpus.
>
Well, just shooting in the dark here, but __runq_insert() is also called
from csched_schedule(). Perhaps your modification interacts badly with
the existing scheduling logic?

Another way of trying to achieve what you seem to be after could be to
put an "is it dom0?" check in csched_vcpu_acct() and, if true, not clear
the boosting. Beware, I'm not saying that it makes sense, or that I like
it; it just seems cleaner (at least to me) than hijacking
__runq_insert(). What do you think?

> xenalyze-a.out: http://pastelink.me/getfile.php?key=390a25
> xentrace-D-T5.out: http://pastelink.me/getfile.php?key=b3d584
>
Sorry, I can't look at the traces right now... If I find 5 mins for them
and spot something weird, I'll let you know.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
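
For concreteness, the csched_vcpu_acct() alternative suggested above
would amount to something like the sketch below: keep the usual reset of
the wakeup boost, but skip it for dom0 vcpus. This is an untested
illustration assuming the 4.2 sched_credit.c names, with the rest of the
function elided; the dom0 test mirrors the one in the original patch.

/* Sketch only (4.2 names assumed, untested): keep dom0 vcpus boosted by
 * skipping the usual de-boost in csched_vcpu_acct(). */
static void
csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
{
    struct csched_vcpu * const svc = CSCHED_VCPU(current);

    /* ... existing assertions and checks ... */

    /*
     * A vcpu found running here normally loses the boost it was given
     * when it woke up.  Skipping that step for dom0 leaves its vcpus at
     * CSCHED_PRI_TS_BOOST, so __runq_insert() keeps queueing them ahead
     * of every UNDER/OVER vcpu, which is what the original patch wanted.
     */
    if ( svc->pri == CSCHED_PRI_TS_BOOST &&
         svc->vcpu->domain->domain_id != 0 )
        svc->pri = CSCHED_PRI_TS_UNDER;

    /* ... credit accounting and runq sorting unchanged ... */
}

Whether permanently boosted dom0 vcpus behave sensibly under the credit
accounting is a separate question; the point is only that this hooks
into the place where the boost is normally cleared, rather than into
__runq_insert().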