One of my users discovered large deviations in execution time for his MPI jobs on xenUs. I can reproduce the problem running his job on a single VM. On a native Linux box the job completes in 64 secs +/- a second or so. On a xenU, it completes somewhere between 64 and 250 secs. This is true on 2.0.5 (2.6.10-xenU) and 2.0-testing (2.6.11-xenU). I tried xen-unstable, but it seemed any task was taking 4 times as long as on 2.0, so I guess it's still too unstable.

Any suggestions I can try?

Software is Debian sarge with lam4-7.1.1 on xen-2.0-testing (Apr 22). stracing mpirun and lamd shows no system calls being made during the computation phase, and that phase is where the extra time goes. Starting and stopping do not cause the delay.

Xen is running the default bvt scheduler at default settings. Raising the priority of the xenU made no difference. The domains on the box are an idle xen0 and the xenU running the app. /lib/tls is moved to tls.disabled in both domains, and on native Linux.

Hardware is a Dell PowerEdge 1650 (dual CPU sockets but only one CPU installed, 2GB mem). The app itself uses 375MB of memory. The xenU was configured for HIGHMEM4GB but was created with 640MB. No swap space is consumed on the system. I saw similar compute-time variation running this job on a dual IBM x335.

Raw results for 2.0-testing 2.6.11-xenU Linux:

    Run Time = 104.590
    Run Time = 247.370
    Run Time = 89.050
    Run Time = 64.090
    Run Time = 63.430
    Run Time = 80.360
    Run Time = 64.410
    Run Time = 131.070
    Run Time = 236.850
    Run Time = 75.470
    Run Time = 134.570
    Run Time = 65.350
    Run Time = 65.480
    Run Time = 64.970
    Run Time = 202.650

Raw results for native 2.6.10 Linux:

    Run Time = 64.120
    Run Time = 63.170
    Run Time = 63.540
    Run Time = 64.670
    Run Time = 64.990
    Run Time = 64.070
    Run Time = 64.930
    Run Time = 64.640
    Run Time = 64.030
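A minimal sketch of the kind of wrapper that produces these "Run Time" lines; the mpirun arguments and binary name are placeholders, not the actual job:

    #!/bin/bash
    # Time the MPI job N times; GNU time's %e is elapsed wall-clock seconds.
    # "mpirun C" is LAM's way of saying "one process per available CPU".
    for i in $(seq 1 15); do
        /usr/bin/time -f "Run Time = %e" mpirun C ./app > /dev/null
    done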
> One of my users discovered large deviations in execution time for his
> mpi jobs on xenUs. I can reproduce the problem running his job on
> a single VM. On a native linux box the job completes in 64 secs +/-
> a second or so. On a xenU, it completes somewhere between 64 and 250
> secs. This is true on 2.0.5 (2.6.10-xenU) and 2.0-testing (2.6.11-xenU).
> I tried xen-unstable but it seemed any task was taking 4 times as
> long as on 2.0 so I guess it's still too unstable.
>
> Any suggestions I can try?

Can you repeat the experiment using the round-robin scheduler in Xen (i.e. boot with the xen option "sched=rrobin")?

cheers,

S.
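For anyone trying this, the option goes on the xen.gz line in the boot loader. A sketch of a GRUB menu.lst entry of that era; the kernel paths and dom0_mem value are illustrative, not from this thread:

    title Xen 2.0 (rrobin scheduler)
        kernel /boot/xen.gz dom0_mem=131072 sched=rrobin
        module /boot/vmlinuz-2.6.11-xen0 root=/dev/sda1 ro console=tty0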
On 23 Apr 2005, at 15:52, David Becker wrote:

> One of my users discovered large deviations in execution time for his
> mpi jobs on xenUs. I can reproduce the problem running his job on
> a single VM. On a native linux box the job completes in 64 secs +/-
> a second or so. On a xenU, it completes somewhere between 64 and 250
> secs. This is true on 2.0.5 (2.6.10-xenU) and 2.0-testing (2.6.11-xenU).
> I tried xen-unstable but it seemed any task was taking 4 times as
> long as on 2.0 so I guess it's still too unstable.
>
> Any suggestions I can try?

Does the domU have the same amount of memory as the native Linux? Is the native Linux running on a single cpu, just like the domU? Is the domU definitely quiescent apart from the mpi job? Have you actually directly observed the app taking 250 seconds, or are you relying on wall-clock time within the domU? (It would be very odd if its time was so badly wrong, but this sounds like a very odd situation.)

If the app is cpu-bound, there are no other apps running in the domain, and no other domains are contending for that cpu, then it is hard to imagine where the slowdown could come from.

 -- Keir
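On the quiescence question, a quick check inside the domU during the compute phase would be something like this (generic procps invocation, nothing Xen-specific):

    # Show the top CPU consumers; only the MPI job should be near 100%.
    ps -eo pid,pcpu,comm --sort=-pcpu | head -5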
On 23 Apr 2005, at 16:06, Steven Hand wrote:

>> One of my users discovered large deviations in execution time for his
>> mpi jobs on xenUs. I can reproduce the problem running his job on
>> a single VM. On a native linux box the job completes in 64 secs +/-
>> a second or so. On a xenU, it completes somewhere between 64 and 250
>> secs. This is true on 2.0.5 (2.6.10-xenU) and 2.0-testing (2.6.11-xenU).
>> I tried xen-unstable but it seemed any task was taking 4 times as
>> long as on 2.0 so I guess it's still too unstable.
>>
>> Any suggestions I can try?
>
> Can you repeat the experiment using the round robin scheduler in Xen
> (i.e. boot with xen option "sched=rrobin")?

Probably it will be worse -- I'm not certain that the rrobin scheduler is even smart enough not to include the idle domain in its round-robin schedule. That's an important reason why I killed it in the unstable tree.

 -- Keir
I turned on rrobin:

    rack116-xen:~# xm dmesg | grep -i sched
    (XEN) Using scheduler: Round-Robin Scheduler (rrobin)

and get the same range of execution times as with bvt:

    Run Time = 183.780
    Run Time = 157.980
    Run Time = 65.770
    Run Time = 65.530
    Run Time = 86.000
    Run Time = 65.530
    Run Time = 79.270
    Run Time = 88.150
    Run Time = 69.600
    Run Time = 64.900
    Run Time = 246.310
    Run Time = 252.230
    Run Time = 64.880

" Does the domU have the same amount of memory as the native Linux?

Yes. I reran on native Linux with 512MB and the job ran in 64s every time.

" Is the native Linux running on a single cpu, just like the domU?

Yes. The Dell 1650 has 1 cpu installed, and no HT (PIII). I've seen the effect on dual P4s as well.

" Is the domU definitely quiescent apart from the mpi job?

There are some background daemons like gmond and rwhod, but that is the same on all setups.

" Have you actually directly observed the app taking 250 seconds?

Good question. I wondered the same thing, so I have now made the script ssh to the ntp server to print the date between each run. And ... yes, the elapsed times match the wall clock from the ntp server.

" If the app is cpu-bound, there are no other apps running in the domain,
" and no other domains contending for that cpu, then it is hard to
" imagine where the slowdown could come from.

Agreed. If the native Linux execution time weren't so consistent, I'd blame the app. I sent mail upstream to the app authors to see if they have a suggestion. It is part of the CardioWave simulation of electrical pulses that flow through the heart (http://cardiowave.duke.edu).

I tried some tight loops and got consistent durations for time scales from fractions of a second to 2000 seconds. The loops are like this:

    time for ((i=0;i<100000;++i)); do : ; done

Here are /proc stats during the app compute phase.

xenU vmstat:

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     1  0      0   4584    456 248044    0    0     8     0  108    15 100  0  0  0
     1  0      0   4584    456 248044    0    0     0     0  106    11 100  0  0  0
     1  0      0   4584    456 248044    0    0     0     0  106     9 100  0  0  0
     1  0      0   4584    456 248044    0    0     0    24  110    17 100  0  0  0

xen0 vmstat:

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     0  0   9108   1648   2324  10532    0    0     0    24   59    28  0  0 100  0
     0  0   9108   1648   2324  10532    0    0     0     0   38    14  0  0 100  0
     0  0   9108   1640   2332  10532    0    0     0    88   62    37  0  1  94  5

xenU interrupts per second:

    irq128:     0  Dynamic-irq misdire
    irq129:     0  Dynamic-irq ctrl-if
    irq130:   100  Dynamic-irq timer
    irq131:     0  Dynamic-irq blkif
    irq132:     7  Dynamic-irq eth0

xen0 interrupts per second:

    irq  1:     0  Phys-irq i8042
    irq  6:     0
    irq 12:     0
    irq 14:     0  Phys-irq ide0
    irq 17:     6  Phys-irq eth0
    irq 18:     6  Phys-irq aic7xxx
    irq 19:     0  Phys-irq aic7xxx
    irq128:     0  Dynamic-irq misdire
    irq129:     0  Dynamic-irq ctrl-if
    irq130:    38  Dynamic-irq timer
    irq131:     0  Dynamic-irq console
    irq132:     0  Dynamic-irq net-be-
    irq133:     0  Dynamic-irq blkif-b
    irq134:     0  Dynamic-irq vif2.0
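The ssh check above amounts to bracketing each run with an external clock reading. Roughly, with the hostname and job command as placeholders:

    # Print the NTP server's wall clock around each run, so times measured
    # inside the domU can be checked against an external reference.
    ssh ntp-server date
    /usr/bin/time -f "Run Time = %e" mpirun C ./app > /dev/null
    ssh ntp-server date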
On 23 Apr 2005, at 17:50, David Becker wrote:

> Agreed. If the native Linux execution time weren't so consistent, I'd
> blame the app. I sent mail upstream to the app authors to see if they
> have a suggestion. It is part of the CardioWave simulation of
> electrical pulses that flow through the heart
> (http://cardiowave.duke.edu).

Could the app be thrashing, or otherwise causing I/O implicitly (e.g., mmap'ed I/O rather than via syscalls)?

 -- Keir
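One way to test for that kind of implicit I/O is to watch the process's major-fault count, which climbs when mmap'ed pages have to be fetched from disk. A sketch; the binary name is a placeholder and it assumes the process name contains no spaces:

    #!/bin/bash
    # Sample major page faults (field 12 of /proc/PID/stat) once a second.
    # A steadily rising count during the compute phase would point to
    # mmap'ed I/O or thrashing; strace won't show these as syscalls.
    pid=$(pidof app)    # placeholder binary name
    while sleep 1; do
        awk '{print "majflt:", $12}' /proc/$pid/stat
    done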
> One of my users discovered large deviations in execution time
> for his mpi jobs on xenUs. I can reproduce the problem
> running his job on a single VM. On a native linux box the
> job completes in 64 secs +/- a second or so. On a xenU, it
> completes somewhere between 64 and 250 secs. This is true on
> 2.0.5 (2.6.10-xenU) and 2.0-testing (2.6.11-xenU).
> I tried xen-unstable but it seemed any task was taking 4
> times as long as on 2.0 so I guess it's still too unstable.
>
> Any suggestions I can try?

I'd focus on figuring out why it's always slow on unstable. For 32-bit non-SMP guests the unstable tree is believed to be at least as stable as 2.0.

Do you get predictable performance running in domain0?

Thanks,
Ian
" Do you get predictable performance running in domain0? d''oh I thought of it, and then forgot. Trying it now, 2.0-testing dom-0 does give consistent run times. Today I also get consistent times with xen-unstable. Turns out unstable dom-0 needs more memory than the 2.0 system, so my default setup was swapping when I tried unstable on Friday. If I give the unstable dom-0 more memory, xen0 and xenU do run the app with consistent times. And, if I run the tests in xen0 and xenU simulataneously, cpu time is shared evenly on unstable and on 2.0. Thats all good. So the only domains showing the 4x deviation in run time are the 2.0 xenU domains. I''ll have to discuss with the users if we want to upgrade or investigate 2.0 further. Upgrading Xen is a pretty disruptive task so close to the end of the term, so I may end up figuring out whats going on in 2.0 anyway. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 24 Apr 2005, at 18:28, David Becker wrote:

> Today I also get consistent times with xen-unstable. It turns out
> unstable dom0 needs more memory than the 2.0 system, so my default
> setup was swapping when I tried unstable on Friday. If I give the
> unstable dom0 more memory, xen0 and xenU do run the app with
> consistent times. And if I run the tests in xen0 and xenU
> simultaneously, CPU time is shared evenly, on unstable and on 2.0.
> That's all good.

How much bigger is our domain0 memory footprint in unstable? That kind of info is incredibly useful for keeping tabs on excessive resource and performance hogs.

 -- Keir
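A rough way to compare, run in dom0 on each tree after an otherwise identical boot (the awk pattern matches the "-/+ buffers/cache:" line that procps free printed at the time; treat this as a sketch):

    # Memory dom0 is actually using, net of page cache.
    free -m | awk '/buffers\/cache/ {print "used MB:", $3}'
    # And each domain's current allocation as Xen sees it.
    xm list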