Andrew Theurer
2005-May-31 22:01 UTC
[Xen-devel] More network tests with xenoprofile this time
I had a chance to run a couple of the netperf tests with xenoprofile. I am still having some trouble with multi-domain profiles (probably user error), but I have been able to profile dom0 while running 2 types of tests. I was surprised to see as much as 50% cpu in hypervisor on these tests:

netperf tcp_stream 16k msg size, dom1 -> dom2
dom0 is on cpu0, HT thread 0; dom1 is on cpu1, HT thread 0; dom2 is on cpu1, HT thread 1.

Throughput is ~900 Mbps

xenoprofile opreport:

4914314  61.2189  vmlinux-2.6.11-xen0-up
3022609  37.6534  xen-unstable-syms
  79516   0.9906  oprofiled
   3602   0.0449  libc-2.3.3.so
   2764   0.0344  libpython2.3.so.1.0

xenoprofile opreport -l:

1656571  20.64  vmlinux-2.6.11-xen0-up  skb_copy_bits
 457043   5.69  vmlinux-2.6.11-xen0-up  net_tx_action
 361259   4.50  xen-unstable-syms       do_mmuext_op
 325335   4.05  xen-unstable-syms       find_domain_by_id
 277331   3.45  xen-unstable-syms       __copy_from_user_ll
 242850   3.03  xen-unstable-syms       do_update_va_mapping
 208405   2.60  vmlinux-2.6.11-xen0-up  kfree
 200640   2.50  xen-unstable-syms       do_mmu_update
 199645   2.49  xen-unstable-syms       get_page_from_l1e
 189219   2.36  xen-unstable-syms       put_page_from_l1e
 185831   2.31  xen-unstable-syms       get_page_type
 172362   2.15  vmlinux-2.6.11-xen0-up  make_rx_response
 171329   2.13  vmlinux-2.6.11-xen0-up  nf_iterate
 165977   2.07  vmlinux-2.6.11-xen0-up  net_rx_action
 165341   2.06  xen-unstable-syms       mod_l1_entry
 156055   1.94  vmlinux-2.6.11-xen0-up  nf_hook_slow
 116876   1.46  xen-unstable-syms       alloc_domheap_pages
 116650   1.45  xen-unstable-syms       evtchn_send
 116215   1.45  vmlinux-2.6.11-xen0-up  fdb_insert
 111314   1.39  vmlinux-2.6.11-xen0-up  make_tx_response
 108480   1.35  xen-unstable-syms       alloc_heap_pages
 107896   1.34  xen-unstable-syms       hypercall
  99013   1.23  vmlinux-2.6.11-xen0-up  netif_be_start_xmit
  91792   1.14  vmlinux-2.6.11-xen0-up  br_handle_frame

netperf tcp_stream 16k msg size, dom1 -> external host
dom0 is on cpu0, HT thread 0; dom1 is on cpu1, HT thread 1.

Throughput is ~940 Mbps, wire speed.

xenoprofile opreport:

4244562  49.9375  xen-unstable-syms
4110594  48.3614  vmlinux-2.6.11-xen0-up
 132643   1.5606  oprofiled
   4212   0.0496  libc-2.3.3.so
   2892   0.0340  libpython2.3.so.1.0

xenoprofile opreport -l:

828587  9.75  xen-unstable-syms       end_level_ioapic_irq
712035  8.38  xen-unstable-syms       mask_and_ack_level_ioapic_irq
370265  4.36  vmlinux-2.6.11-xen0-up  net_tx_action
323797  3.81  vmlinux-2.6.11-xen0-up  ohci_irq
282005  3.32  vmlinux-2.6.11-xen0-up  tg3_interrupt
273161  3.21  xen-unstable-syms       find_domain_by_id
234726  2.76  xen-unstable-syms       hypercall
206693  2.43  xen-unstable-syms       do_update_va_mapping
203758  2.40  xen-unstable-syms       __copy_from_user_ll
201665  2.37  xen-unstable-syms       do_mmuext_op
195020  2.29  vmlinux-2.6.11-xen0-up  nf_iterate
184295  2.17  vmlinux-2.6.11-xen0-up  nf_hook_slow
172110  2.02  vmlinux-2.6.11-xen0-up  tg3_rx
164337  1.93  vmlinux-2.6.11-xen0-up  net_rx_action
141999  1.67  xen-unstable-syms       do_mmu_update
139120  1.64  vmlinux-2.6.11-xen0-up  fdb_insert
122483  1.44  xen-unstable-syms       mod_l1_entry
122017  1.44  xen-unstable-syms       put_page_from_l1e
111159  1.31  xen-unstable-syms       get_page_from_l1e
109921  1.29  xen-unstable-syms       do_IRQ
 99847  1.17  vmlinux-2.6.11-xen0-up  br_handle_frame
 99709  1.17  xen-unstable-syms       get_page_type
 93613  1.10  vmlinux-2.6.11-xen0-up  kfree
 90885  1.07  vmlinux-2.6.11-xen0-up  end_pirq

-Andrew
Ian Pratt
2005-May-31 22:16 UTC
RE: [Xen-devel] More network tests with xenoprofile this time
> I had a chance to run a couple of the netperf tests with
> xenoprofile. I am still having some trouble with
> multi-domain profiles (probably user error), but I have been
> able to profile dom0 while running 2 types of tests. I was
> surprised to see as much as 50% cpu in hypervisor on these tests:
>
> netperf tcp_stream 16k msg size, dom1 -> dom2
> dom0 is on cpu0, HT thread 0; dom1 is on cpu1, HT thread 0;
> dom2 is on cpu1, HT thread 1.

Let's ignore the domU <-> domU results for the moment, as we know about the problem with lack of batching in this scenario. Let's dig into the dom1 -> external case.

First off, are these figures just for CPU 0 HT 0? i.e. just dom0, so we don't see where time goes in the domU? How is idle time on the CPU reported?

Spending 18% of the time handling interrupts in Xen is surprising (at least to me).

What interrupt rate are you observing? What are the default tg3 interrupt coalescing settings? What interrupt rate do you get on native? Also, what hypercall rate are you seeing?

(It would be good to put this in context of the rx/tx packet rates.)

Is the Ethernet NIC sharing an interrupt with the USB controller, per chance?

Seeing find_domain_by_id and copy_from_user so high up the list is pretty surprising.

Cheers,
Ian
Andrew Theurer
2005-May-31 22:38 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
On Tuesday 31 May 2005 17:16, Ian Pratt wrote:
> Let's ignore the domU <-> domU results for the moment, as we know
> about the problem with lack of batching in this scenario. Let's dig
> into the dom1 -> external case.
>
> First off, are these figures just for CPU 0 HT 0? i.e. just dom0, so
> we don't see where time goes in the domU? How is idle time on the CPU
> reported?

Yes, this is just for CPU 0 HT 0. DomU is pinned to its own cpu, which is CPU 1 HT 0. I have cpu util from polling xc_domain_get_cpu_usage() for both domains, which is (an excerpt from the whole run, in 3 second intervals):

cpu0: [100.4] d0-0[100.4]   cpu2: [045.1] d1-0[045.1]
cpu0: [100.0] d0-0[100.0]   cpu2: [045.1] d1-0[045.1]
cpu0: [099.6] d0-0[099.6]   cpu2: [045.1] d1-0[045.1]
cpu0: [101.3] d0-0[101.3]   cpu2: [045.3] d1-0[045.3]
cpu0: [099.7] d0-0[099.7]   cpu2: [045.1] d1-0[045.1]
cpu0: [099.7] d0-0[099.7]   cpu2: [045.0] d1-0[045.0]

This is fairly consistent for the whole test.

> Spending 18% of the time handling interrupts in Xen is surprising
> (at least to me).
>
> What interrupt rate are you observing? What are the default tg3
> interrupt coalescing settings? What interrupt rate do you get on
> native? Also, what hypercall rate are you seeing?
>
> (It would be good to put this in context of the rx/tx packet rates.)

I don't have that data from this test, but I am queuing up another with sar, so I should have it soon. I will also queue up a test with just baremetal linux so we can compare int rates, etc.

> Is the Ethernet NIC sharing an interrupt with the USB controller, per
> chance?

Not as far as I can tell:

           CPU0
  1:         8    Phys-irq     i8042
  3:         0    Phys-irq     acpi
  4:      3031    Phys-irq     serial
 11:   6764395    Phys-irq     ohci_hcd
 12:        93    Phys-irq     i8042
 15:     38687    Phys-irq     ide1
 18:     39398    Phys-irq     qla2300
 22:     47905    Phys-irq     ioc0
 24:   6037311    Phys-irq     eth0
256:         7    Dynamic-irq  ctrl-if
257:    182396    Dynamic-irq  timer0
258:         0    Dynamic-irq  net-be-dbg
259:     83437    Dynamic-irq  blkif-backend
260:   1688517    Dynamic-irq  vif1.0

> Seeing find_domain_by_id and copy_from_user so high up the list is
> pretty surprising.

Yes.

-Andrew
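[As a side note on the interrupt-rate question: the counts in /proc/interrupts above are cumulative, so the rate has to be taken as a delta over an interval. A minimal illustrative sketch of doing that in C is below; it only sums the CPU0 column shown above, is not part of the test setup described in this thread, and its parsing is deliberately crude.]

/* Rough sketch: estimate overall interrupts/sec by summing the per-IRQ
 * counts in /proc/interrupts, sleeping, and re-reading.  Only the first
 * numeric field after the "NNN:" prefix (the CPU0 column) is counted. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static unsigned long long total_interrupts(void)
{
    FILE *f = fopen("/proc/interrupts", "r");
    char line[512];
    unsigned long long total = 0;

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        char *colon = strchr(line, ':');
        if (colon != NULL)
            total += strtoull(colon + 1, NULL, 10);  /* CPU0 column only */
    }
    fclose(f);
    return total;
}

int main(void)
{
    unsigned long long before = total_interrupts();
    sleep(10);
    printf("%llu interrupts/sec\n", (total_interrupts() - before) / 10);
    return 0;
}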
Ian Pratt
2005-May-31 22:48 UTC
RE: [Xen-devel] More network tests with xenoprofile this time
> I have cpu util from polling xc_domain_get_cpu_usage() for
> both domains, which is (an excerpt from the whole run, in 3
> second intervals):
>
> cpu0: [100.4] d0-0[100.4]
> cpu2: [045.1] d1-0[045.1]

OK, so you're confident idle time would be reported OK if there was any.

> > Is the Ethernet NIC sharing an interrupt with the USB
> > controller, per chance?
>
> Not as far as I can tell:
>
>            CPU0
>  11:   6764395    Phys-irq     ohci_hcd
>  24:   6037311    Phys-irq     eth0
> 260:   1688517    Dynamic-irq  vif1.0

Anyone care to suggest why ohci_hcd is taking so many interrupts? Looks very fishy to me. I take it you're not using a USB Ethernet NIC? :-)

What happens if you boot 'nousb'?

> > Seeing find_domain_by_id and copy_from_user so high up the list is
> > pretty surprising.
>
> Yes.

Definitely worth looking into... Sorry for all the questions. This work is much appreciated.

Ian
Santos, Jose Renato G
2005-Jun-01 00:15 UTC
RE: [Xen-devel] More network tests with xenoprofile this time
Andrew,

You may want to take a look at the following paper, which is being presented at VEE'05 (June 11 and 12, 2005):

http://www.hpl.hp.com/research/dca/system/papers/xenoprof-vee05.pdf

It presents network performance results using xenoprof. This was done for Xen 2.0.3. The profile you reported has some similarities with our results, although the exact numbers are different. That is expected, since you are running a different version of Xen on different hardware.

We have seen that a significant amount of time was spent on handling interrupts in Xen as well. We have also seen that a significant amount of time is spent in the hypervisor (+/- 40%) for the dom1 <-> external case, measured both at dom1 and at dom0 (in our case we instrumented the receive side). When we run the benchmark on dom0, the time spent in Xen is reduced to +/- 20%. Most of this extra Xen overhead when running a guest seems to come from the page transfer between domain 0 and the guest (see table 6 and discussion in the paper).

The paper omits the complete oprofile reports for brevity. I will be happy to send you any detailed oprofile report we have generated for the paper, if you want to compare it with your results. Just let me know ...

Renato
Jon Mason
2005-Jun-01 20:03 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
On Tuesday 31 May 2005 05:48 pm, Ian Pratt wrote:
> > I have cpu util from polling xc_domain_get_cpu_usage() for
> > both domains, which is (an excerpt from the whole run, in 3
> > second intervals):
> >
> > cpu0: [100.4] d0-0[100.4]
> > cpu2: [045.1] d1-0[045.1]
>
> OK, so you're confident idle time would be reported OK if there was any.
>
> > > Is the Ethernet NIC sharing an interrupt with the USB
> > > controller, per chance?
> >
> > Not as far as I can tell:
> >
> >            CPU0
> >  11:   6764395    Phys-irq     ohci_hcd
> >  24:   6037311    Phys-irq     eth0
> > 260:   1688517    Dynamic-irq  vif1.0
>
> Anyone care to suggest why ohci_hcd is taking so many interrupts? Looks
> very fishy to me. I take it you're not using a USB Ethernet NIC? :-)

The bladecenters have a shared USB connected to all the blades. I would imagine it is the keyboard/mouse or USB CDROM connected to this bus that is generating all of these interrupts.

> What happens if you boot 'nousb'?

This shouldn't hurt anything, unless Andrew needs access to kdb or cdrom.

Thanks,
Jon
Andrew Theurer
2005-Jun-01 20:21 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
On Wednesday 01 June 2005 15:03, Jon Mason wrote:
> On Tuesday 31 May 2005 05:48 pm, Ian Pratt wrote:
> > Anyone care to suggest why ohci_hcd is taking so many interrupts?
> > Looks very fishy to me. I take it you're not using a USB Ethernet
> > NIC? :-)
>
> The bladecenters have a shared USB connected to all the blades. I
> would imagine it is the keyboard/mouse or USB CDROM connected to this
> bus that is generating all of these interrupts.
>
> > What happens if you boot 'nousb'?
>
> This shouldn't hurt anything, unless Andrew needs access to kdb or
> cdrom.

This is on an x336 system, P4 Xeon, not much USB really needed. I did not see any difference in performance or the profile with nousb.

I also tried disabling the locks in find_domain_by_id and saw no difference. I'm curious to see how things differ with dom0 on CPU-0 HT-0 and dom1 on CPU-0 HT-1. I will probably try that next.

FWIW, baremetal linux used about 33% of one cpu to drive the same throughput. The interrupt rate was 41k/sec for baremetal vs 59k/sec for dom0. I don't have the breakdown of ints/sec per interrupt number yet.

-Andrew
Andrew Theurer
2005-Jun-02 14:53 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
On Wednesday 01 June 2005 15:21, Andrew Theurer wrote:
> This is on an x336 system, P4 Xeon, not much USB really needed. I did
> not see any difference in performance or the profile with nousb.
>
> I also tried disabling the locks in find_domain_by_id and saw no
> difference. I'm curious to see how things differ with dom0 on CPU-0
> HT-0 and dom1 on CPU-0 HT-1. I will probably try that next.
>
> FWIW, baremetal linux used about 33% of one cpu to drive the same
> throughput. The interrupt rate was 41k/sec for baremetal vs 59k/sec
> for dom0. I don't have the breakdown of ints/sec per interrupt number
> yet.

Wanted to follow up with one correction: I did not have usb disabled properly. With usb properly removed, there is a slight reduction in irq handling overhead as a result:

542129  6.2205  xen-unstable-syms       mask_and_ack_level_ioapic_irq
506060  5.8067  xen-unstable-syms       end_level_ioapic_irq
475786  5.4593  vmlinux-2.6.11-xen0-up  net_tx_action
376309  4.3179  vmlinux-2.6.11-xen0-up  tg3_interrupt
263008  3.0178  xen-unstable-syms       find_domain_by_id
239789  2.7514  xen-unstable-syms       hypercall
224547  2.5765  vmlinux-2.6.11-xen0-up  nf_iterate

...vs about 8-9% each for the top two functions before. The interrupt rate for the tg3 adapter is still very high, about 24k/sec. At that rate it does not appear to have any interrupt coalescing going on, so I am going to look into that.

-Andrew
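[For anyone wanting to check the tg3 coalescing parameters from dom0, a minimal sketch using the standard ethtool ioctl (ETHTOOL_GCOALESCE) is below. The interface name eth0 comes from the report above; whether the 2.6.11-era tg3 driver fills in every field is an assumption, not something verified in this thread.]

/* Sketch: query NIC interrupt-coalescing settings via the ethtool ioctl. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    struct ethtool_coalesce ec;
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* NIC from the report above */
    memset(&ec, 0, sizeof(ec));
    ec.cmd = ETHTOOL_GCOALESCE;
    ifr.ifr_data = (char *)&ec;

    if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_GCOALESCE");
        return 1;
    }
    printf("rx-usecs=%u rx-frames=%u tx-usecs=%u tx-frames=%u\n",
           ec.rx_coalesce_usecs, ec.rx_max_coalesced_frames,
           ec.tx_coalesce_usecs, ec.tx_max_coalesced_frames);
    close(fd);
    return 0;
}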
William Cohen
2005-Jun-07 21:47 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
Santos, Jose Renato G wrote:
> You may want to take a look at the following paper, which is being
> presented at VEE'05 (June 11 and 12, 2005):
>
> http://www.hpl.hp.com/research/dca/system/papers/xenoprof-vee05.pdf
>
> It presents network performance results using xenoprof.

Hi Renato,

The article was an interesting application of xenoprof.

It seems like it would be useful to also have data collected using the cycle counts (GLOBAL_POWER_EVENTS on P4) to give some indication of areas with high-overhead operations. There may be some areas with a few very expensive instructions. Calling attention to those areas would help improve performance.

The increases in I-TLB and D-TLB events for Xen-domain0 shown in Figure 4 are surprising. Why would the working sets be that much larger for Xen-domain0 than regular linux, particularly for code? Is there a table similar to table 3 for I-TLB event sample locations?

Can't the VMM use a 4-MB page? The Xen-domain0 kernel shouldn't be that much larger than the regular linux kernel. How were TLB flushes ruled out as a cause? Could the PERFCOUNTER_CPU counters in perfc_defn.h be used to see if the VMM is doing a lot of TLB flushes?

Also, how much of the I-TLB and D-TLB events are due to the P4 architecture? Are the results as dramatic for Athlon or AMD64 processors?

-Will
Andrew Theurer
2005-Jun-08 19:20 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
> The increases in I-TLB and D-TLB events for Xen-domain0 shown in
> Figure 4 are surprising. Why would the working sets be that much
> larger for Xen-domain0 than regular linux, particularly for code? Is
> there a table similar to table 3 for I-TLB event sample locations?
>
> Can't the VMM use a 4-MB page? The Xen-domain0 kernel shouldn't be
> that much larger than the regular linux kernel. How were TLB flushes
> ruled out as a cause? Could the PERFCOUNTER_CPU counters in
> perfc_defn.h be used to see if the VMM is doing a lot of TLB flushes?

I had the same concern as you, and IMO it seemed unlikely that the working set for dom0 would be so much larger as to cause a significant amount of TLB misses. I also suspect TLB flushes to be the problem, but I have not had a chance to look at it. I hope to very soon.

-Andrew
Santos, Jose Renato G
2005-Jun-17 19:39 UTC
RE: [Xen-devel] More network tests with xenoprofile this time
William and Andrew,

Sorry for the delay in replying. I have been traveling and did not have email access while away.

> It seems like it would be useful to also have data collected using the
> cycle counts (GLOBAL_POWER_EVENTS on P4) to give some indication of
> areas with high-overhead operations. There may be some areas with a few
> very expensive instructions. Calling attention to those areas would
> help improve performance.

Yes, you are right. We did in fact collect GLOBAL_POWER_EVENTS, but did not include them in the paper due to space limitations. I have attached oprofile results for our ttcp-like benchmark (receive side) for the case with 1 NIC (both cycle counts and instructions). As you can see, there are some functions with very expensive instructions. For example, "hypercall" adds only 0.6% additional instructions, but these consume 3.0% more clock cycles; "unmask_IO_APIC_irq" adds 0.25% instructions but consumes 5% more cycles. It would be interesting to investigate these and see if we can optimize them.

> The increases in I-TLB and D-TLB events for Xen-domain0 shown in Figure
> 4 are surprising. Why would the working sets be that much larger for
> Xen-domain0 than regular linux, particularly for code? Is there a table
> similar to table 3 for I-TLB event sample locations?

Yes, we were also surprised by these results. I have attached the complete I-TLB and D-TLB oprofile results (for the 3-NIC case). (Note these are from a different type of machine than the other two attached oprofile results.)

Aravind instrumented the macros in xen/include/asm-x86/flushtlb.h. I am not sure if he used PERFCOUNTER_CPU or if he added his own instrumentation. With this instrumentation we did not observe any TLB flushes, but I suppose we could have missed TLB flushes that did not use the macro... I think it would be a good idea to investigate this further to confirm that TLB flushes are not happening.

One additional observation is that, in general, the number of misses is NOT proportional to the size of the working set. It is possible that a small increase in the working set significantly increases the number of misses. Therefore it is possible that the increase in TLB misses is in fact due to a larger working set. But I agree we have to investigate this further to get confirmation ...

> Can't the VMM use a 4-MB page? The Xen-domain0 kernel shouldn't be
> that much larger than the regular linux kernel. How were TLB flushes
> ruled out as a cause? Could the PERFCOUNTER_CPU counters in perfc_defn.h
> be used to see if the VMM is doing a lot of TLB flushes?
>
> Also, how much of the I-TLB and D-TLB events are due to the P4
> architecture? Are the results as dramatic for Athlon or AMD64 processors?

We did not try this on any other architecture. Right now xenoprof is only supported on P4. Support for other architectures is not at the top of our priority list.

Regards,

Renato
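[A minimal sketch of the kind of instrumentation being discussed, assuming Xen's perfc machinery: the PERFCOUNTER_CPU declaration style from perfc_defn.h is named in the thread, but the increment helper (perfc_incrc()) and the flush macro name shown here are assumptions and may differ in a given tree.]

/* In xen/include/asm-x86/perfc_defn.h -- declare a per-CPU counter
 * (assumes the PERFCOUNTER_CPU declaration style mentioned above). */
PERFCOUNTER_CPU(tlb_flushes_local, "local TLB flushes")

/* In xen/include/asm-x86/flushtlb.h -- bump the counter wherever the
 * local flush is actually issued.  The wrapped macro name below is
 * illustrative; hook whichever flush path the tree really uses. */
#define local_flush_tlb_counted()               \
    do {                                        \
        perfc_incrc(tlb_flushes_local);         \
        local_flush_tlb();                      \
    } while ( 0 )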
Andrew Theurer
2005-Jun-22 02:22 UTC
Re: [Xen-devel] More network tests with xenoprofile this time
FWIW, I took a look at find_domain_by_id() and swapped out the rw lock for a spin lock for domlist_lock. The time spent in this function was reduced by 18%. This function was certainly not the "hottest" one I recorded while running netperf, but every little bit helps, I suppose. Below are before/after xenoprofile snippets:

before:

547036  6.32  xen-unstable-syms       mask_and_ack_level_ioapic_irq
510448  5.90  xen-unstable-syms       end_level_ioapic_irq
463386  5.35  vmlinux-2.6.11-xen0-up  net_tx_action
371072  4.29  tg3.ko                  tg3_interrupt
261341  3.02  xen-unstable-syms       find_domain_by_id
237601  2.75  xen-unstable-syms       hypercall
228649  2.64  vmlinux-2.6.11-xen0-up  nf_iterate
215634  2.49  xen-unstable-syms       do_update_va_mapping
214077  2.47  vmlinux-2.6.11-xen0-up  net_rx_action

after:

549276  6.35  xen-unstable-syms       mask_and_ack_level_ioapic_irq
511693  5.91  xen-unstable-syms       end_level_ioapic_irq
466873  5.39  vmlinux-2.6.11-xen0-up  net_tx_action
375702  4.34  tg3.ko                  tg3_interrupt
239219  2.76  xen-unstable-syms       hypercall
230641  2.67  vmlinux-2.6.11-xen0-up  nf_iterate
220480  2.55  xen-unstable-syms       do_update_va_mapping
217472  2.51  tg3.ko                  tg3_rx
217029  2.51  vmlinux-2.6.11-xen0-up  net_rx_action
214271  2.48  xen-unstable-syms       find_domain_by_id

Has anyone thought about using read-copy-update in Xen? I plan on looking at the two irq functions next.

-Andrew
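[For reference on the read-copy-update question: an RCU-style lookup would let readers walk the domain list without taking domlist_lock at all. The sketch below is purely illustrative, using Linux-kernel-style RCU primitives (rcu_read_lock/rcu_dereference) and hypothetical names (domain_hash, DOMAIN_HASH, next_in_hashbucket, get_domain); it is not the actual Xen code, and writers would still serialize on a lock and defer freeing via call_rcu()/synchronize_rcu().]

/* Illustrative only: a lockless, RCU-style reader for a domain lookup.
 * Structure and field names are hypothetical. */
struct domain *find_domain_by_id_rcu(domid_t dom)
{
    struct domain *d;

    rcu_read_lock();                       /* no domlist_lock on the read side */
    for ( d = rcu_dereference(domain_hash[DOMAIN_HASH(dom)]);
          d != NULL;
          d = rcu_dereference(d->next_in_hashbucket) )
    {
        if ( d->domain_id == dom )
        {
            if ( unlikely(!get_domain(d)) ) /* refcount against destruction */
                d = NULL;
            break;
        }
    }
    rcu_read_unlock();

    return d;
}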