Hey, everyone,

I'm having some performance issues with a Xen domU. We have an application that cross-compiles a Linux distribution from scratch for embedded systems. We're attempting to run this application inside a Xen domU (paravirtualized, modified guest kernel), and the performance is really bad. The culprit seems to be high I/O wait times related to the network interface.

The host machine is a Dell PowerEdge 1950 with 2 x dual-core Xeon processors (Xeon 5150 @ 2.66GHz, 1333MHz FSB). The domU is configured with 2 vCPUs and 1GB of RAM. The dom0 OS is SuSE Linux Enterprise Server 10, and the domU OS is CentOS 4 update 5. For comparison, we're benchmarking against a physical machine that has a Pentium 4 3.2GHz processor on an 800MHz FSB with 4GB of RAM.

RAM does not seem to be an issue - on the domU, about half of the 1024MB is used by active processes and the other half is left for buffering and caching. Only about 4 kB of swap is in use. Building these Linux distributions on the physical system takes 70-80 minutes (real time); on the domU it takes 130-140 minutes. The few times that I've seen the system "lock up" the way the engineers who are using it claim it is doing, one of the domU vCPUs goes to 100% wait for 20-30 seconds. This seems to coincide with network operations in the domU.

So I'm wondering: what can I do to improve network performance and eliminate these I/O wait times on this CPU? The server itself doesn't seem to be having any problems - while this occurs in the domU, other physical systems can access the server just fine without any issues. Any light that anyone can shed on the situation - how I can improve network performance and eliminate these I/O waits - would be most appreciated.

Here's the network config line from my Xen configuration file:

vif = [ 'mac=00:16:3e:75:0d:be,bridge=xenbr108', ]

And I found a suggestion for traffic control in the guest that I applied - maybe I did something wrong there, too. Here's the output from tc qdisc show:

qdisc tbf 8001: dev eth0 rate 50Mbit burst 50Kb lat 20.0ms

Thanks,
Nick
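One quick test would be to rule the token-bucket shaper in or out by removing it and watching iowait during a build. This is just a sketch, assuming the tbf qdisc shown above was added as the root qdisc on eth0 inside the guest:

  # inside the domU: remove the shaper (reverts eth0 to the default pfifo_fast qdisc)
  tc qdisc del dev eth0 root

  # then watch the "wa" column while the build runs
  vmstat 2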
Hi, again... haven't had any responses to this yet. I'd really appreciate any light that anyone could shed on the issue.

By way of update, it doesn't necessarily seem to be the network; it just seems to happen more frequently with network-based traffic. Local block devices experience the same issues, just not quite as long sitting and waiting. I'm open to any suggestions. I may try setting it to a single vCPU and see if that helps, but I can't understand why this only happens in the guest and I see absolutely no signs of it in the dom0. Any suggestions for how to track down why it's happening would also be appreciated.

Thanks,
Nick

>>> Nick Couchman 10/18/07 11:05 AM >>>
> Hey, everyone, I'm having some performance issues with a Xen domU. ...
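The single-vCPU test can be done either in the guest config or at runtime; a sketch, with the domain name as a placeholder:

  # in the domU config file
  vcpus = 1

  # or at runtime, without rebooting the guest
  xm vcpu-set [domU-name] 1
  xm vcpu-list [domU-name]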
Nick Couchman wrote:
> Hi, again... haven't had any responses to this yet.
>
> Thanks,
> Nick

Hi,

have you tried pinning your vCPUs to dedicated cores? It might be suboptimal if a domU uses, say, core 0 of CPU 0 and core 0 of CPU 1... or anything not on the same CPU.

Config example - use the second CPU on a 2x dual-core system; the count starts at 0:

cpus = [ 2,3 ]

If your domU is bridged, you might try disabling transmission checksums on the NICs:

ethtool -K [nicname] tx off

Anyway, you mentioned high I/O latency due to network issues. Is your domU using some kind of NFS or something? By the way, I might have missed this part, but are you using PV or HVM guests?

Greetings

--
Stephan Seitz
Senior System Administrator
netz-haut e.K.
multimediale kommunikation
zweierweg 22, 97074 würzburg
fon: +49 931 2876247, fax: +49 931 2876248
web: www.netz-haut.de
registriergericht: amtsgericht würzburg, hra 5054
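The same pinning can also be applied to a running guest with xm; a sketch, with the domain name as a placeholder:

  # pin vCPU 0 and 1 of the guest to physical cores 2 and 3
  xm vcpu-pin [domU-name] 0 2
  xm vcpu-pin [domU-name] 1 3

  # verify the placement
  xm vcpu-list [domU-name]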
Stephan,
Thanks for the tips - I'll give them a try. One clarification - for disabling transmission checksums, is that inside the domU or inside dom0?

Some of the tests were using NFS, but lately I've been testing with iSCSI and even local block devices, and it seems to happen on those, too, though not to the same degree.

Thanks,
Nick Couchman
SEAKR Engineering, Inc.
6221 South Racine Circle
Centennial, CO 80111
Main: (303) 790-8499
Fax: (303) 790-8720
Web: http://www.seakr.com

>>> Stephan Seitz <s.seitz@netz-haut.de> 10/23/07 8:03 PM >>>
> have you tried pinning your vCPUs to dedicated cores? ...
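For reference, the offload settings can be inspected and changed on both sides; a sketch, assuming the guest's interface is eth0 and using domain ID 1 as an example (the dom0 backend device for a guest is named vif<domid>.<devid>):

  # inside the domU
  ethtool -k eth0          # show current offload settings
  ethtool -K eth0 tx off   # disable TX checksum offload

  # in dom0, against the backend device for that guest
  ethtool -k vif1.0
  ethtool -K vif1.0 tx off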
Steve Senator (Senator Ent)
2007-Oct-24 03:17 UTC
Re: [Xen-users] Re: Performance Issues: I/O Wait
Xen can exacerbate Linux SMP issues. Do you have hyperthreading turned on in your CPUs? If so, at least for testing, try turning it off.

Also, beyond turning off the TX offloading in both the dom0 and domU, is there any chance that there's another device attached to that bridge which would cause network delays? In particular, is there a device that may incorrectly see the domU IP as coming from the dom0 due to an ARP conflict? I see that you've specified a fixed MAC address. Is there any chance that that same MAC address is used by the dom0? Perhaps the initrd is the one from dom0, and it has the MAC address set in the initrd to be the same as the one in the dom0?

Try tcpdumping from both domains and see if you see any retransmissions, or perhaps even a smoking gun like a system ARPing for itself when it should know better.

It's also possible that there's a transmission size problem. There have been reported problems of dom0<->domU traffic not honoring the MTU of the bridge or virtual device, which then forces retransmission when the receiving side cannot handle the larger buffer. If NFS, try changing from TCP to UDP or modifying the rsize and wsize buffering to fit within the MTU of your (virtual) ethernet devices.

Hope this helps,
-Steve Senator

Quoting Nick Couchman <Nick.Couchman@seakr.com>:
> Hi, again... haven't had any responses to this yet.
>
>>>> Nick Couchman 10/18/07 11:05 AM >>>
> Hey, everyone,
> I'm having some performance issues with a Xen domU.
> ... The culprit seems to be high I/O wait times related to the
> network interface.
>
> The host machine is a Dell PowerEdge 1950 with 2 x dual-core Xeon
> processors (Xeon 5150 @ 2.66GHz, 1333MHz FSB). ... Building these
> Linux distributions on the physical system takes 70-80 minutes
> (real time) - on the domU system it takes 130-140 minutes.
> ...
> vif = [ 'mac=00:16:3e:75:0d:be,bridge=xenbr108', ]
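A sketch of the kind of capture this suggests, with the NFS server and interface names as placeholders (the bridge name is taken from the vif line above):

  # in dom0, watching the bridge the vif is attached to
  tcpdump -ni xenbr108 host [nfs-server] and port 2049

  # inside the domU, on its own interface
  tcpdump -ni eth0 host [nfs-server] and port 2049

  # ARP traffic only, to spot a conflict
  tcpdump -ni xenbr108 arp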
Nick Couchman wrote:
> Stephan,
> Thanks for the tips - I'll give them a try. One clarification - for
> disabling transmission checksums, is that inside the domU or inside
> dom0?

Hi Nick,

I don't know if it's really necessary, but we're disabling transmission checksums on every domain, including dom0. By the way, some weeks ago I noticed a thread on the list about problems with tg3 cards. As we don't own any cards with the Broadcom Tigon3, I didn't mention it before.

> Some of the tests were using NFS, but lately I've been testing with iSCSI
> and even local block devices, and it seems to happen on those, too,
> though not to the same degree.

This sounds really strange. We've tried different kinds of domU disks and are currently using dom0 LVM volumes on locally attached RAID arrays. It is really paranoid, but we avoided disk images with the same filesystem as the underlying one, so we didn't test ext3 on ext3 or XFS on XFS. Besides this restriction, we didn't notice any BIG performance problems.

Greetings
Stephan

--
Stephan Seitz
Senior System Administrator
netz-haut e.K.
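Disabling TX checksumming on every guest's backend interface in dom0 could be scripted; a rough sketch, assuming the usual vif<domid>.<devid> naming and that at least one vif exists:

  # in dom0: turn off TX checksum offload on all guest backend interfaces
  for dev in /sys/class/net/vif*; do
      ethtool -K "$(basename "$dev")" tx off
  done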
Stephan,
I'm not using tg3 cards, but they are Broadcom (Dell PE1950 on-board). There is an add-in, dual-port Intel card present, though, so maybe I'll give that a try.

My "local" disks are also LVM volumes in dom0 on a hardware RAID1. I then present the LVM volume as xvda, xvdb, etc., to the guest and then partition it in the guest. The iSCSI volume that I'm using for testing is a software-initiated (open-iscsi) connection inside of dom0 that's part of a volume group. I then created a volume in that group (a different group from the volumes on the on-board RAID) and presented that to the domU as xvdb. I'm using ext3 currently, but XFS is on my list of filesystems to try.

Thanks,
Nick

>>> Stephan Seitz <s.seitz@netz-haut.de> 10/24/07 7:17 AM >>>
> I don't know if it's really necessary, but we're disabling transmission
> checksums on every domain, including dom0. ...
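That layout would correspond to disk lines roughly like the following in the domU config; the volume group and logical volume names here are made up for illustration:

  disk = [ 'phy:/dev/vg_raid1/swbox1_root,xvda,w',
           'phy:/dev/vg_iscsi/swbox1_test,xvdb,w' ]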
Yeah, HT is off (I don't even know if you can turn it on in the PE1950s!). I'm getting some interesting stuff from tcpdump:

09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42499521 win 5080 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42500969 win 5804 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42502417 win 6528 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42503865 win 7252 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42505313 win 7976 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42506761 win 8700 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639375 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42508209 win 9424 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639396 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42509657 win 10148 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639647 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194578178: reply ERR 1448
09:04:14.639657 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.1879243268: reply ERR 1448
09:04:14.639661 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194609922: reply ERR 1448
09:04:14.639665 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4009949700: reply ERR 1448
09:04:14.639670 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194630148: reply ERR 1448
09:04:14.639674 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.2533620228: reply ERR 1448
09:04:14.639720 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194630148: reply ERR 1448

I've briefly looked at some Google results for "reply ERR 1448" but haven't come up with anything really concrete. I'm going to keep looking at that one to see if it may lead somewhere. In the meantime, I've disabled TX checksums in the domU and am running a couple more tests to see if I can reproduce the long I/O waits at all. I'll let you know how that turns out. I also get some "reply ERR 1084" messages sprinkled in there, too.

I'll also try out some of the NFS settings to see if anything there helps and let you know. Thanks for the help - much appreciated!

--Nick

>>> On Tue, Oct 23, 2007 at 9:17 PM, "Steve Senator (Senator Ent)" <sts@senator.net> wrote:
> Xen can exacerbate Linux SMP issues. Do you have hyperthreading turned on
> in your CPUs? If so, at least for testing, try turning it off. ...
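The NFS tuning Steve suggested would look roughly like this on the domU side; the export path and mount point are made up, and rsize/wsize are chosen small enough that a single request fits within a 1500-byte MTU:

  # remount the build share over UDP with small request sizes (illustrative paths)
  mount -t nfs -o udp,rsize=1024,wsize=1024 alpha_0:/export/build /mnt/build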