xuehai zhang
2005-May-03 09:11 UTC
[Xen-devel] New MPI benchmark performance results (update)
Hi all,

In the following post I sent in early April (http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00091.html), I reported a performance gap when running the PMB SendRecv benchmark on both native Linux and domU. I have now prepared a webpage comparing the performance of 8 PMB benchmarks under 4 scenarios (native Linux, dom0, domU with SMP, and domU without SMP) at http://people.cs.uchicago.edu/~hai/vm1/vcluster/PMB/.

In the graphs presented on the webpage, we take the results of native Linux as the reference and normalize the other 3 scenarios to it. We observe a general pattern: dom0 usually performs better than domU with SMP, which in turn performs better than domU without SMP (here better performance means lower latency and higher throughput). However, we also notice a very big performance gap between domU (w/o SMP) and native Linux (or dom0, because dom0 generally performs very similarly to native Linux). Some distinct examples are: 8-node SendRecv latency (max domU/Linux score ~ 18), 8-node Allgather latency (max domU/Linux score ~ 17), and 8-node Alltoall latency (max domU/Linux > 60). The performance difference in the last example is HUGE and we cannot think of a reasonable explanation for why a 512-byte message size behaves so differently from the other sizes. We would appreciate any insight you can provide into such a big performance problem in these benchmarks.

BTW, all the benchmarking is based on the original Xen code. That is, we did not modify net_rx_action in netback to kick the frontend after every packet, as suggested by Ian in the following post (http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00180.html).

Please let me know if you have any questions about the configuration of the benchmarking experiments. I am looking forward to your insightful explanations.

Thanks.
Xuehai
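In case it helps to see what these benchmarks exercise, below is a minimal sketch of the kind of timed ring exchange the SendRecv benchmark performs. It is not the PMB source; it assumes a generic MPI-1 implementation, and the iteration count, buffer handling, and the fixed 512-byte message size are only illustrative.

/*
 * Minimal sketch of a SendRecv-style ring exchange (not the PMB source).
 * Assumes a generic MPI-1 implementation; iteration count, buffer handling
 * and the fixed 512-byte message size are illustrative only.
 * Build with:  mpicc -O2 -o sendrecv_sketch sendrecv_sketch.c
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, i, iters = 1000;
    int msglen = 512;                    /* message size in bytes */
    char *sendbuf, *recvbuf;
    int left, right;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(msglen);
    recvbuf = malloc(msglen);
    memset(sendbuf, 0, msglen);

    left  = (rank - 1 + size) % size;    /* neighbours in a periodic chain */
    right = (rank + 1) % size;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        /* each rank sends to its right neighbour and receives from its left */
        MPI_Sendrecv(sendbuf, msglen, MPI_BYTE, right, 0,
                     recvbuf, msglen, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, &status);
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("SendRecv %d bytes: %.2f usec per iteration\n",
               msglen, (t1 - t0) / iters * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

PMB itself sweeps over many message sizes and reports minimum/average/maximum times across ranks, but the communication pattern is essentially the periodic chain above.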
Steven Hand
2005-May-03 09:28 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
> Please let me know if you have any questions about the configuration
> of the benchmarking experiments. I am looking forward to your
> insightful explanations.

Erm, what version of Xen are you using for these? I notice that the dom0 kernel seems to be using 2.4.28 which is not current in any of the trees. Since you're using SMP guests, I'm guessing this is some old version of xen-unstable?

Your results are kinda interesting but I think you'd probably be better off trying to compare like with like so that we can isolate the performance issues due to Xen/XenLinux, i.e.

- use the same kernel (or ported kernel) in each case;
- use the same amount of memory in each case.

Otherwise you end up comparing 2.4 to 2.6, or 128MB/360MB/512MB, ...

Also you should probably use the current unstable tree since there have been a number of performance fixes.

cheers,

S.
Ian Pratt
2005-May-03 13:56 UTC
RE: [Xen-devel] New MPI benchmark performance results (update)
> In the graphs presented on the webpage, we take the results of native
> Linux as the reference and normalize the other 3 scenarios to it. We
> observe a general pattern: dom0 usually performs better than domU with
> SMP, which in turn performs better than domU without SMP (here better
> performance means lower latency and higher throughput). However, we
> also notice a very big performance gap between domU (w/o SMP) and
> native Linux (or dom0, because dom0 generally performs very similarly
> to native Linux). Some distinct examples are: 8-node SendRecv latency
> (max domU/Linux score ~ 18), 8-node Allgather latency (max domU/Linux
> score ~ 17), and 8-node Alltoall latency (max domU/Linux > 60). The
> performance difference in the last example is HUGE and we cannot think
> of a reasonable explanation for why a 512-byte message size behaves so
> differently from the other sizes. We would appreciate any insight you
> can provide into such a big performance problem in these benchmarks.

I still don't quite understand your experimental setup. What version of Xen are you using? How many CPUs does each node have? How many domUs do you run on a single node?

As regards the anomalous result for 512B AlltoAll performance, the best way to track this down would be to use xen-oprofile. Is it reliably repeatable?

Really bad results are usually due to packets being dropped somewhere -- there hasn't been a whole lot of effort put into UDP performance because so few applications use it.

Ian
Mark Williamson
2005-May-03 16:13 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
> I will grab the current unstable tree and rerun the experiments,
> integrating the above configuration improvements. I will send a new
> result update when I finish.

Any bug fixes should also have gone into the -testing tree, which is part of the 2.0 series.

Cheers,
Mark
xuehai zhang
2005-May-03 16:36 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
Steven,

Thanks for the response.

>> Please let me know if you have any questions about the configuration
>> of the benchmarking experiments. I am looking forward to your
>> insightful explanations.
>
> Erm, what version of Xen are you using for these? I notice that the
> dom0 kernel seems to be using 2.4.28 which is not current in any of
> the trees. Since you're using SMP guests, I'm guessing this is some
> old version of xen-unstable?

The Xen version is 2.0 for all the experiments. I am not sure whether the SMP mentioned in my email is the same as the "SMP guests" you mention. To clarify, "domU with SMP" means Xen is booted with SMP support (no "nosmp" option) and I pin dom0 to the 1st CPU and domU to the 2nd CPU; "domU without SMP" means Xen is booted without SMP support (with the "nosmp" option) and both dom0 and domU share the same single CPU.

> Your results are kinda interesting but I think you'd probably be
> better off trying to compare like with like so that we can isolate
> the performance issues due to Xen/XenLinux, i.e.

I agree with your suggestion.

> - use the same kernel (or ported kernel) in each case;

I will use a 2.6 kernel for both dom0 and domU. For native Linux, the current kernel version is 2.4 and I will have to convince the cluster administrator to upgrade it to 2.6 for a fair comparison, as you point out.

> - use the same amount of memory in each case.

It is hard to use the same amount of memory, especially for domU, because dom0 will occupy part of the 512MB of physical memory. BTW, we think memory is unlikely to be a key factor in the performance, because the maximum message size is 4MB, we only test up to an 8-node cluster (8 processes), and the memory will not be overallocated.

> Otherwise you end up comparing 2.4 to 2.6, or 128MB/360MB/512MB, ...
>
> Also you should probably use the current unstable tree since there
> have been a number of performance fixes.

I will grab the current unstable tree and rerun the experiments, integrating the above configuration improvements. I will send a new result update when I finish.

Thanks again for the help.

Xuehai

> cheers,
>
> S.
xuehai zhang
2005-May-03 16:48 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
Ian,

Thanks for the response.

>> In the graphs presented on the webpage, we take the results of native
>> Linux as the reference and normalize the other 3 scenarios to it. We
>> observe a general pattern: dom0 usually performs better than domU with
>> SMP, which in turn performs better than domU without SMP (here better
>> performance means lower latency and higher throughput). However, we
>> also notice a very big performance gap between domU (w/o SMP) and
>> native Linux (or dom0, because dom0 generally performs very similarly
>> to native Linux). Some distinct examples are: 8-node SendRecv latency
>> (max domU/Linux score ~ 18), 8-node Allgather latency (max domU/Linux
>> score ~ 17), and 8-node Alltoall latency (max domU/Linux > 60). The
>> performance difference in the last example is HUGE and we cannot think
>> of a reasonable explanation for why a 512-byte message size behaves so
>> differently from the other sizes. We would appreciate any insight you
>> can provide into such a big performance problem in these benchmarks.
>
> I still don't quite understand your experimental setup. What version of
> Xen are you using? How many CPUs does each node have? How many domUs do
> you run on a single node?

The Xen version is 2.0. Each node has 2 CPUs. "domU with SMP" in my previous email means Xen is booted with SMP support (no "nosmp" option) and I pin dom0 to the 1st CPU and domU to the 2nd CPU; "domU without SMP" means Xen is booted without SMP support (with the "nosmp" option) and both dom0 and domU share the same single CPU. There is only 1 domU running on a single node in each experiment.

> As regards the anomalous result for 512B AlltoAll performance, the best
> way to track this down would be to use xen-oprofile.

I am not very familiar with xen-oprofile. I notice there are some discussions about it on the mailing list. I wonder if there are any other documents I can refer to. Thanks.

> Is it reliably repeatable?

Yes, we observe this anomaly repeatably. Each reported data point in the graph is the average of 10 different runs of the same experiment at different times.

> Really bad results are usually due to packets being dropped
> somewhere -- there hasn't been a whole lot of effort put into UDP
> performance because so few applications use it.

To clarify: do you mean that a benchmark like Alltoall might use UDP rather than TCP as its transport protocol?

Thanks again for the help.

Xuehai
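To help isolate the 512B case, I am also thinking of timing that single message size outside of PMB. Below is a minimal sketch of what I have in mind; it assumes a generic MPI-1 implementation, and the iteration count and the max-over-ranks reporting are my own choices rather than anything taken from PMB.

/*
 * Minimal sketch for timing MPI_Alltoall at the anomalous 512-byte size in
 * isolation (not taken from PMB).  Assumes a generic MPI-1 implementation;
 * the iteration count and the max-over-ranks reporting are my own choices.
 * Build with:  mpicc -O2 -o alltoall_sketch alltoall_sketch.c
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, i, iters = 100;
    int msglen = 512;                        /* bytes contributed per peer */
    char *sendbuf, *recvbuf;
    double t0, t1, local_usec, max_usec;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* every rank exchanges msglen bytes with every other rank */
    sendbuf = malloc((size_t)msglen * size);
    recvbuf = malloc((size_t)msglen * size);
    memset(sendbuf, rank & 0xff, (size_t)msglen * size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, msglen, MPI_BYTE,
                     recvbuf, msglen, MPI_BYTE, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    local_usec = (t1 - t0) / iters * 1e6;    /* per-iteration time, this rank */

    /* report the slowest rank, which is what a "max" latency figure reflects */
    MPI_Reduce(&local_usec, &max_usec, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("Alltoall %d bytes x %d ranks: max %.2f usec per iteration\n",
               msglen, size, max_usec);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Running the same binary on native Linux, dom0, and domU on the same 8 nodes should at least show whether the anomaly follows the MPI_Alltoall call itself or the way PMB drives it.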
xuehai zhang
2005-May-03 16:58 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
Mark Williamson wrote:

>> I will grab the current unstable tree and rerun the experiments,
>> integrating the above configuration improvements. I will send a new
>> result update when I finish.
>
> Any bug fixes should also have gone into the -testing tree, which is
> part of the 2.0 series.

Mark,

Thanks for the reminder. I will try the -testing tree instead of the -unstable tree then.

BTW, could you please provide me some information about the current status of the Atropos scheduler? I sent out an email earlier to ask about it but have had no response so far.

Thanks again.

Xuehai
Santos, Jose Renato G (Jose Renato Santos)
2005-May-03 19:09 UTC
RE: [Xen-devel] New MPI benchmark performance results (update)
> I am not very familiar with xen-oprofile. I notice there are some
> discussions about it on the mailing list. I wonder if there are any
> other documents I can refer to. Thanks.

Please see http://xenoprof.sourceforge.net for a description of xenoprof and for downloading the patches. (You will need 3 patches: one for Xen, one for Linux, and one for OProfile.)

You need to be familiar with OProfile to use xenoprof. Please check http://oprofile.sourceforge.net/ for more info on OProfile.

Xenoprof is currently available only for Xen 2.0.5. I am working on porting it to xen-unstable, but there is a problem with NMI handling that has not been solved yet.

I have also attached a text file that gives an overview of xenoprof.

Renato
Nivedita Singhvi
2005-May-03 20:24 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
xuehai zhang wrote:

> Hi all,
>
> In the following post I sent in early April
> (http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00091.html),

Hi, thanks for sharing the data - it was interesting. I tried to find additional data on the benchmarks using the link you have for the user manual, but it gave me a 404 error.

It wasn't clear whether your benchmarks use TCP or UDP, or possibly raw sockets.

As has been pointed out by several people, running the 2.6 kernel and comparing apples to apples as much as possible would help.

Is there any chance you kept some of the system statistics and settings (netstat -s, sysctl -a info)? Did you tune the settings for the system at all?

> Alltoall latency (max domU/Linux > 60). The performance difference in
> the last example is HUGE and we cannot think of a reasonable
> explanation for why a 512-byte message size behaves so differently from
> the other sizes. We would appreciate any insight you can provide into
> such a big performance problem in these benchmarks.

You have an anomalous point on most of the results - and again, knowing what kind of traffic this is would really help.

thanks,
Nivedita
xuehai zhang
2005-May-03 22:05 UTC
Re: [Xen-devel] New MPI benchmark performance results (update)
Hi Nivedita,

Thanks for the response and the suggestions!

>> Hi all,
>>
>> In the following post I sent in early April
>> (http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00091.html),
>
> Hi, thanks for sharing the data - it was interesting. I tried to find
> additional data on the benchmarks using the link you have for the user
> manual, but it gave me a 404 error.

I corrected the link error and now you can access the user manual through the link.

> It wasn't clear whether your benchmarks use TCP or UDP, or possibly raw
> sockets.

I have read through the PMB user manual and it does not mention the communication protocol it uses. However, I have read that "typically TCP/IP is the protocol used over Ethernet networks for MPI communications" in several references.

> As has been pointed out by several people, running the 2.6 kernel and
> comparing apples to apples as much as possible would help.

I fully agree, and I am currently trying to rerun the experiments using the same kernel version for both dom0 and domU (and maybe native Linux too).

> Is there any chance you kept some of the system statistics and settings
> (netstat -s, sysctl -a info)?

I did not collect them while running the benchmarks, but I will try to log them when I rerun the experiments.

> Did you tune the settings for the system at all?

No, I did not do anything specific to tune the system.

>> Alltoall latency (max domU/Linux > 60). The performance difference in
>> the last example is HUGE and we cannot think of a reasonable
>> explanation for why a 512-byte message size behaves so differently from
>> the other sizes. We would appreciate any insight you can provide into
>> such a big performance problem in these benchmarks.
>
> You have an anomalous point on most of the results - and again, knowing
> what kind of traffic this is would really help.

I will try to dig into the source code and find out.

Thanks again for the help.

Xuehai

> thanks,
> Nivedita