Hi everyone,

I've been playing around with the atropos scheduler for the last couple
of days, and I'm quite convinced that it *does not* enforce the soft
real-time guarantees. Maybe I'm using the wrong parameters or something,
so let me describe my experiment:

o first I create 2 VMs -- VM1 and VM2

o then I change their atropos params as follows:

  $ xm atropos 1 10 100 1 1
  $ xm atropos 2 70 100 1 1

  Ideally, this should guarantee that VM1 gets 10ns of CPU time every
  100ns, and VM2 gets 70ns every 100ns, and that any leftover CPU time
  will be shared between the two.

o after this I write a simple program that computes Fibonacci numbers
  using naive recursion to eat away all the CPU, and loops around
  indefinitely (a minimal sketch of the sort of thing I mean follows at
  the end of this mail). Programs in both VMs are identical, and I start
  them within seconds of each other.

o I take a reading from xm list a few seconds after the programs start,
  as my base reference:

                               CPU-TIME
  VM1   1   63   0   -----        173.5   9601
  VM2   2   63   0   -----         10.9   9602

o Thereafter I take readings every few seconds. The absolute values of
  CPU time are not that important; if the *rate* at which CPU time
  increases in both VMs reflects the atropos scheduling, that is just
  as well. Here are some of the subsequent readings:

  VM1   1   63   0   -----        178.0   9601
  VM2   2   63   0   -----         15.4   9602

  VM1   1   63   0   -----        216.9   9601
  VM2   2   63   0   -----         54.1   9602

  VM1   1   63   0   -----        308.4   9601
  VM2   2   63   0   -----        145.3   9602

  VM1   1   63   0   -----        428.4   9601
  VM2   2   63   0   -----        265.1   9602

As can be seen, the CPU times of both VMs are increasing at almost
*identical* rates. If the atropos params were working, VM2's CPU time
should have been increasing a lot faster.

Has anyone had this problem before? I'll start looking at the code, but
since I'm not familiar with Xen's scheduling code, it might be a while.
In the meanwhile, if anyone has any pointers, that would be great. TIA

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
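For reference, the burner is something along these lines -- a minimal
sketch rather than the exact program I ran:

/* fib-burn.c -- hypothetical stand-in for the CPU-bound load described
 * above: naive recursive Fibonacci in an endless loop, so the domain
 * always wants the CPU. */

static unsigned long fib(unsigned int n)
{
    /* Naive recursion: exponential work, no memoization -- deliberately
     * wasteful so the process stays CPU bound. */
    return (n < 2) ? n : fib(n - 1) + fib(n - 2);
}

int main(void)
{
    /* volatile sink stops the compiler optimising the work away. */
    volatile unsigned long sink = 0;
    for ( ; ; )
        sink += fib(30);
    return 0;
}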
> I've been playing around with the atropos scheduler for the last
> couple of days, and I'm quite convinced that it *does not* enforce
> the soft real-time guarantees.

It is quite possible our current implementation is bugged -- we've not
gotten around to extensive testing in the recent past.

> Maybe I'm using the wrong parameters or something, so let me describe
> my experiment:
>
> o first I create 2 VMs -- VM1 and VM2
> o then I change their atropos params as follows:
>
>   $ xm atropos 1 10 100 1 1
>   $ xm atropos 2 70 100 1 1
>
> Ideally, this should guarantee that VM1 gets 10ns of CPU time every
> 100ns, and VM2 gets 70ns every 100ns, and that any leftover CPU time
> will be shared between the two.

Well, your parameters are somewhat aggressive -- although times are
specified in nanoseconds, this is for precision rather than for allowing
10ns slices and 100ns periods (which translates into at least 10 million
context switches a second). x86 CPUs don't really turn corners too fast,
so this is a considerable overhead. Atropos doesn't work if it's in
overload (>= 100%), which includes both allocated slices and all
overhead for context switching, running through the scheduler, and
certain irq handling.

Your latency values are also rather aggressive -- 1ns means that if a
domain blocks for any reason (e.g. to do I/O) then when it unblocks its
new period will start at most 1ns after the current pass through the
scheduler. There's a small modification in the current implementation
which means this may not bite quite as hard as it could, but even so,
any domain waiting more than 100ns for something could cause an
immediate reentry into the scheduler after unblocking.

One simple thing to try is to scale your scheduling parameters to
something more reasonable, e.g.:

$ xm atropos 1 10000 100000 50000 1
$ xm atropos 2 70000 100000 50000 1

Let us know how well this works -- if this is also broken, then we have
a real bug.

cheers,

S.

p.s. you're not running on SMP are you? if so, the domains will be on
different CPUs and hence the x flag will cause each of them to get
approximately the same allocation, just as you observed.
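To make the overload arithmetic concrete, here's a back-of-the-envelope
check (a hypothetical helper, not part of the Xen tools): a reservation
set is only feasible if the slices fit in their periods with room left
over for scheduling overhead.

/* overload-check.c -- hypothetical sketch: sums slice/period
 * utilisation for a set of atropos reservations and reports the total.
 * With 100ns periods the per-period context-switch and scheduler
 * overhead alone can push the real load past 100%, i.e. overload. */

#include <stdio.h>

struct reservation {
    const char   *name;
    unsigned long slice_ns;   /* CPU time guaranteed per period */
    unsigned long period_ns;  /* length of the period */
};

int main(void)
{
    /* The original (aggressive) parameters from the experiment above. */
    struct reservation rsv[] = {
        { "VM1", 10, 100 },
        { "VM2", 70, 100 },
    };
    double total = 0.0;
    unsigned int i;

    for ( i = 0; i < sizeof(rsv) / sizeof(rsv[0]); i++ )
    {
        double u = (double)rsv[i].slice_ns / (double)rsv[i].period_ns;
        printf("%s: %lu ns / %lu ns = %.1f%%\n",
               rsv[i].name, rsv[i].slice_ns, rsv[i].period_ns, 100.0 * u);
        total += u;
    }

    /* 80% fits on paper, but only if per-period overhead is negligible
     * relative to the period -- which it isn't at 100ns. */
    printf("total reserved: %.1f%% (plus switching/scheduler overhead)\n",
           100.0 * total);
    return 0;
}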
> > o first I create 2 VMs -- VM1 and VM2
> > o then I change their atropos params as follows:
> >
> >   $ xm atropos 1 10 100 1 1
> >   $ xm atropos 2 70 100 1 1
> >
> > Ideally, this should guarantee that VM1 gets 10ns of CPU time every
> > 100ns, and VM2 gets 70ns every 100ns, and that any leftover CPU time
> > will be shared between the two.

You might find the following program useful while testing out the
scheduler. It prints the amount of CPU it's getting once a second.

Atropos was working fine for CPU-bound domains a few months back, but
had some fairly odd behaviour for IO-intensive domains. Because no one
has been using it, it has probably rotted a bit. The original algorithm
(used in the Nemesis OS) worked just fine, so this is just an
implementation issue.

Ian

/******************************************************************************
 * slurp.c
 *
 * Slurps spare CPU cycles and prints a percentage estimate every second.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* rpcc: get full 64-bit Pentium TSC value */
static __inline__ unsigned long long int rpcc(void)
{
    unsigned int __h, __l;
    __asm__ __volatile__ ("rdtsc" : "=a" (__l), "=d" (__h));
    return (((unsigned long long)__h) << 32) + __l;
}

/*
 * find_cpu_speed:
 *   Interrogates /proc/cpuinfo for the processor clock speed.
 *
 * Returns: speed of processor in MHz, rounded down to nearest whole MHz.
 */
#define MAX_LINE_LEN 50
int find_cpu_speed(void)
{
    FILE *f;
    char s[MAX_LINE_LEN], *a, *b;

    if ( (f = fopen("/proc/cpuinfo", "r")) == NULL ) goto out;

    while ( fgets(s, MAX_LINE_LEN, f) )
    {
        if ( strstr(s, "cpu MHz") )
        {
            /* Find the start of the speed value, and stop at the dec point. */
            if ( !(a = strpbrk(s, "0123456789")) || !(b = strpbrk(a, ".")) )
                break;
            *b = '\0';
            fclose(f);
            return(atoi(a));
        }
    }

 out:
    fprintf(stderr, "find_cpu_speed: error parsing /proc/cpuinfo for cpu MHz");
    exit(1);
}

int main(void)
{
    int mhz, i;

    /*
     * no_preempt_estimate is our estimate, in clock cycles, of how long it
     * takes to execute one iteration of the main loop when we aren't
     * preempted. 50000 cycles is an overestimate, which we want because:
     *  (a) On the first pass through the loop, diff will be almost 0,
     *      which will knock the estimate down to <40000 immediately.
     *  (b) It's safer to approach the real value from above than from below --
     *      note that this algorithm is unstable if n_p_e gets too small!
     */
    unsigned int no_preempt_estimate = 50000;

    /*
     * prev  = timestamp on previous iteration;
     * this  = timestamp on this iteration;
     * diff  = difference between the above two stamps;
     * start = timestamp when we last printed CPU % estimate;
     */
    unsigned long long int prev, this, diff, start;

    /*
     * preempt_time = approx. cycles we've been preempted for since last
     * stats display.
     */
    unsigned long long int preempt_time = 0;

    /* Required in order to print intermediate results at fixed period. */
    mhz = find_cpu_speed();
    printf("CPU speed = %d MHz\n", mhz);

    start = prev = rpcc();

    for ( ; ; )
    {
        /*
         * By looping for a while here we hope to reduce the effect of getting
         * preempted in the critical "timestamp swapping" section of the loop.
         * In addition, it should ensure that 'no_preempt_estimate' stays
         * reasonably large, which helps keep this algorithm stable.
         */
        for ( i = 0; i < 10000; i++ );

        /*
         * The critical bit! Getting preempted here will shaft us a bit,
         * but the loop above should make this a rare occurrence.
         */
        this = rpcc();
        diff = this - prev;
        prev = this;

        /* i.e. if ( diff > (1.5 * no_preempt_estimate) ) */
        if ( diff > (no_preempt_estimate + (no_preempt_estimate>>1)) )
        {
            /* We were probably preempted for a while. */
            preempt_time += diff - no_preempt_estimate;
        }
        else
        {
            /*
             * Looks like we weren't preempted -- update our time estimate:
             *  New estimate = 0.75*old_est + 0.25*curr_diff
             */
            no_preempt_estimate =
                (no_preempt_estimate>>1) + (no_preempt_estimate>>2) +
                (diff>>2);
        }

        /* Dump CPU time every second. */
        if ( (this - start) / mhz > 1000000 )
        {
            printf("Slurped %.2f%% CPU, TSC %08x\n",
                   100.0*((this-start-preempt_time)/((double)this-start)),
                   (unsigned int)this);
            start        = this;
            preempt_time = 0;
        }
    }

    return(0);
}
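One caveat if you build it yourself (assuming gcc): the empty delay loop
in main() will be thrown away if the compiler optimises, so compile
without optimisation, e.g.:

$ gcc -O0 -o slurp slurp.c
$ ./slurp

(gcc's default is no optimisation, so a plain "gcc slurp.c" does the
same thing.)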
> It is quite possible our current implementation is bugged -- we've
> not gotten around to extensive testing in the recent past.

AFAIK it doesn't behave quite correctly. There's some difficult-to-spot
bug somewhere in the code -- it may well only be a small tweak once
tracked down. This currently looks unlikely to be fixed for 2.0 but
hopefully will be fully operational in 2.1.

> p.s. you're not running on SMP are you? if so, the domains will be on
> different CPUs and hence the x flag will cause each of them to get
> approximately the same allocation, just as you observed.

That's also a good point: Xen effectively runs a uniprocessor scheduler
ON EACH CPU, with load balancing across CPUs achieved by CPU pinning in
domain configs or using xm. If you have one domain on each CPU with the
xtratime flag set, they'll get all the CPU they want...

HTH,

Mark
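For example, to keep both test domains on the same CPU so their atropos
parameters actually compete (this is from memory -- check xm help, the
exact syntax may have changed):

$ xm pincpu 1 0    # pin domain 1 to physical CPU 0
$ xm pincpu 2 0    # pin domain 2 to physical CPU 0

or set cpu = 0 in each domain's config file before creating it.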
Hi everyone,

Thanks for the replies. Here's the update: I used Ian's slurp program,
with the params suggested by Steven. Actually, I had myself been
thinking about the small values that I had been using, but I was not
sure what kind of impact they would have. Here are the params I used
(kind of an extreme case, but I just wanted to be sure that if there
was *some* change, I would be able to see it):

xm atropos 1 10000 200000 50000 0
xm atropos 2 150000 200000 50000 0

With a 200000ns period and xtratime off for both, VM1's 10000ns slice
should cap it near 10000/200000 = 5% of the CPU, while VM2's 150000ns
slice should guarantee it about 75%.

So with these changes, here's a snippet of slurp's output from both the
VMs (VM2 is started a few seconds after VM1):

VM1:
CPU speed = 498 MHz
Slurped 90.72% CPU, TSC 6f70c573
Slurped 97.76% CPU, TSC 8d20a825
Slurped 98.77% CPU, TSC aacff8d9
Slurped 99.14% CPU, TSC c87f42c0
Slurped 99.36% CPU, TSC e62ef069
Slurped 98.22% CPU, TSC 03ded2ca
Slurped 98.71% CPU, TSC 218de63c
Slurped 76.88% CPU, TSC 3f40d8a3
Slurped 46.75% CPU, TSC 5cf03633
Slurped 39.86% CPU, TSC 7ab377bc
Slurped 47.18% CPU, TSC 986e75c5
Slurped 59.25% CPU, TSC b61ddba2
Slurped 51.54% CPU, TSC d3ccf714

VM2:
Slurped 53.26% CPU, TSC 52e34564
Slurped 55.14% CPU, TSC 70ae5af6
Slurped 57.19% CPU, TSC 8e5d809e
Slurped 42.62% CPU, TSC ac0cbb65
Slurped 42.80% CPU, TSC c9bc96d9
Slurped 56.01% CPU, TSC e7766b1c
Slurped 54.60% CPU, TSC 0530e391
Slurped 57.15% CPU, TSC 22e0003b
Slurped 56.18% CPU, TSC 40a1e234
Slurped 57.09% CPU, TSC 5e50c733
Slurped 56.75% CPU, TSC 7c125064
Slurped 55.62% CPU, TSC 99c1f384
Slurped 59.20% CPU, TSC b77143db
Slurped 47.35% CPU, TSC d5365862
Slurped 37.21% CPU, TSC f2e61a4e
Slurped 54.03% CPU, TSC 1095a58f
Slurped 59.79% CPU, TSC 2e675a34

Observations:

o When VM2 is not running, VM1 effectively gets *all* the CPU, even if
  the xtratime bit is set to 0.

o When VM2 starts running, it looks like they get roughly equal CPU.
  There doesn't seem to be any 'atropos' scheduling happening.

So how should one go about debugging Xen?

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker