thr3ads.net - Xen devel - [Xen-devel] freezing when using GPLPV drivers (including Dom0) [Dec 2008]

James Harper

2008-Dec-31 02:46 UTC

[Xen-devel] freezing when using GPLPV drivers (including Dom0)

I'm trying to resolve an issue in my GPLPV drivers that has come about
in doing some restores using Backup Exec across the network.

The server running Backup Exec can be a DomU or a completely separate
machine (connected via gigabit Ethernet).

When restoring a large file (30G Exchange mailbox store), everything
locks up for a bit, long enough for ARP to time out and the TCP
connection for the backup data to drop, failing the backup. This can
happen anywhere from 500MB to 20G into the restore, but normally around
the 2G mark.

Investigating is a bit tricky as even Dom0 is not usable - any command I
type at a shell doesn't do anything until it unfreezes. When everything
comes back, it all comes back at once. There are no messages in the
kernel logs or the Xen logs.

I am suspecting that maybe the problem is disk starvation but I don't
quite understand why the lockup happens for so long. I'm also not sure
why I'm only seeing the problem when using my GPLPV drivers - one
possibility is that the increased performance puts more load on the
storage system.

Any suggestions?

Thanks

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tian, Kevin

2008-Dec-31 02:53 UTC

[Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

Did the restore process finish even though you saw a freeze in the middle?

Thanks,
Kevin
>From: James Harper
>Sent: Wednesday, December 31, 2008 10:46 AM
>
>I'm trying to resolve an issue in my GPLPV drivers that has come about
>in doing some restores using Backup Exec across the network.
>
>The server running Backup Exec can be a DomU or a completely separate
>machine (connected via gigabit Ethernet).
>
>When restoring a large file (30G Exchange mailbox store), everything
>locks up for a bit, long enough for ARP to time out and the TCP
>connection for the backup data to drop, failing the backup. This can
>happen anywhere from 500MB to 20G into the restore, but normally around
>the 2G mark.
>
>Investigating is a bit tricky as even Dom0 is not usable - any command I
>type at a shell doesn't do anything until it unfreezes. When everything
>comes back, it all comes back at once. There are no messages in the
>kernel logs or the Xen logs.
>
>I am suspecting that maybe the problem is disk starvation but I don't
>quite understand why the lockup happens for so long. I'm also not sure
>why I'm only seeing the problem when using my GPLPV drivers - one
>possibility is that the increased performance puts more load on the
>storage system.
>
>Any suggestions?
>
>Thanks
>
>James
>

James Harper

2008-Dec-31 02:55 UTC

[Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>
> Did the restore process finish even though you saw a freeze in the middle?
>
No. The TCP connection is closed because of the delay (the ARP cache
times out), which causes the restore to fail.

James

Tian, Kevin

2008-Dec-31 03:00 UTC

[Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>From: James Harper
>Sent: Wednesday, December 31, 2008 10:56 AM
>
>>
>> Did the restore process finish even though you saw a freeze in the middle?
>>
>
>No. The TCP connection is closed because of the delay (the ARP cache
>times out), which causes the restore to fail.
>
>James
>
Then if you kill this Windows guest, does dom0 come back to normal?
If yes, the freeze is possibly due to servicing Windows guest activity,
such as heavy disk I/O, as you guess. If not, it may indicate a hang
condition within either dom0 or Xen; in that case you may first find
the hang point and then dig into the details. It would also be good to
check both the dom0 and Xen dmesg for any warnings already reported.

Thanks,
Kevin

Tian, Kevin

2008-Dec-31 03:07 UTC

[Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>From: James Harper
>Sent: Wednesday, December 31, 2008 10:46 AM
>
>I am suspecting that maybe the problem is disk starvation but I don't
>quite understand why the lockup happens for so long. I'm also not sure
>why I'm only seeing the problem when using my GPLPV drivers - one
>possibility is that the increased performance puts more load on the
>storage system.
>
Maybe you can check the cycles spent in the kernel thread/event handler
on the backend driver side. I'm not sure whether heavy communication
between be/fe could disturb the dom0 scheduler if care is not taken in
the current design. E.g. the backend kernel thread may eat too many
cycles before yielding, or your GPLPV fe driver may issue too many
events and break the be side...

Just my two cents. :-)

Thanks,
Kevin

James Harper

2008-Dec-31 03:16 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>
> >From: James Harper
> >Sent: Wednesday, December 31, 2008 10:46 AM
> >
> >I am suspecting that maybe the problem is disk starvation but I don't
> >quite understand why the lockup happens for so long. I'm also not sure
> >why I'm only seeing the problem when using my GPLPV drivers - one
> >possibility is that the increased performance puts more load on the
> >storage system.
> >
>
> Maybe you can check the cycles spent in the kernel thread/event handler
> on the backend driver side. I'm not sure whether heavy communication
> between be/fe could disturb the dom0 scheduler if care is not taken in
> the current design. E.g. the backend kernel thread may eat too many
> cycles before yielding, or your GPLPV fe driver may issue too many
> events and break the be side...
>
I am running the restore again and monitoring using:
. xentop running in dom0
. arping to the DomU running from an external machine
. ping to Dom0 running from an external machine

With arping and ping running I have noticed that the freeze is not
always long enough to cause the TCP connections to time out - I was only
noticing the ones that were long enough.

During the freeze, xentop shows very low Dom0 and DomU CPU, arping stops
receiving replies to the ARP requests, but the ping to Dom0 keeps going.
The freeze that just occurred was not long enough for me to tell if the
DomU xentop counters for network and disk were increasing or not.
(xentop keeps running, lending weight to the idea that the freeze only
affects tasks that want to access the disk).
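
One way to put numbers on the freezes seen with arping and ping is to
record the arrival time of each reply and look for gaps between
consecutive replies. The following is only a minimal Python sketch of
that idea, not something used in this thread: the function name
find_freezes and the 2-second threshold are illustrative assumptions,
and collecting the timestamps is left to whatever wraps ping/arping.

```python
from typing import List, Tuple


def find_freezes(reply_times: List[float], max_gap: float) -> List[Tuple[float, float]]:
    """Return (start_time, gap_seconds) for every pause longer than max_gap.

    reply_times is a sorted list of timestamps (in seconds) at which a
    ping or arping reply was received. Both the name and the threshold
    are hypothetical; pick a max_gap a little above the probe interval.
    """
    freezes = []
    # Compare each reply with the one that follows it.
    for previous, current in zip(reply_times, reply_times[1:]):
        gap = current - previous
        if gap > max_gap:
            freezes.append((previous, gap))
    return freezes


# Example: replies every second, then nothing between t=2 and t=10.
print(find_freezes([0.0, 1.0, 2.0, 10.0, 11.0], max_gap=2.0))
```

Running the same check on the Dom0 ping log and the DomU arping log
side by side would show whether the two stall at the same moments.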

Is there a way under Linux of monitoring disk queue length? I am using
LVM on top of a low end HP 'Smart Array' (E200) running two RAID1
volumes using SATA disks.

Thanks

James

Tian, Kevin

2008-Dec-31 03:23 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>From: James Harper [mailto:james.harper@bendigoit.com.au]
>Sent: Wednesday, December 31, 2008 11:16 AM
>
>>
>> >From: James Harper
>> >Sent: Wednesday, December 31, 2008 10:46 AM
>> >
>> >I am suspecting that maybe the problem is disk starvation but I don't
>> >quite understand why the lockup happens for so long. I'm also not sure
>> >why I'm only seeing the problem when using my GPLPV drivers - one
>> >possibility is that the increased performance puts more load on the
>> >storage system.
>> >
>>
>> Maybe you can check the cycles spent in the kernel thread/event handler
>> on the backend driver side. I'm not sure whether heavy communication
>> between be/fe could disturb the dom0 scheduler if care is not taken in
>> the current design. E.g. the backend kernel thread may eat too many
>> cycles before yielding, or your GPLPV fe driver may issue too many
>> events and break the be side...
>>
>
>I am running the restore again and monitoring using:
>. xentop running in dom0
>. arping to the DomU running from an external machine
>. ping to Dom0 running from an external machine
>
>With arping and ping running I have noticed that the freeze is not
>always long enough to cause the TCP connections to time out - I was only
>noticing the ones that were long enough.
>
>During the freeze, xentop shows very low Dom0 and DomU CPU, arping stops
>receiving replies to the ARP requests, but the ping to Dom0 keeps going.
>The freeze that just occurred was not long enough for me to tell if the
>DomU xentop counters for network and disk were increasing or not.
>(xentop keeps running, lending weight to the idea that the freeze only
>affects tasks that want to access the disk).
>
>Is there a way under Linux of monitoring disk queue length? I am using
>LVM on top of a low end HP 'Smart Array' (E200) running two RAID1
>volumes using SATA disks.
>
'sar' could provide such info, IMO.
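
For reference, the number of requests currently in flight on each block
device is also exposed directly in /proc/diskstats (the ninth
statistics field after the device name). Below is a minimal Python
sketch that extracts it; the function names and the sample values are
hypothetical, and lines with fewer fields (which some older kernels
print for partitions) are simply skipped.

```python
from typing import Dict


def parse_in_flight(diskstats_text: str) -> Dict[str, int]:
    """Map device name -> I/Os currently in progress.

    Expects the text of /proc/diskstats. Each full line is:
    major minor name, followed by at least 11 counters; the 9th
    counter is the number of I/Os currently in progress.
    """
    in_flight = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Short or empty line: not the full per-device format.
            continue
        in_flight[fields[2]] = int(fields[11])
    return in_flight


def read_in_flight(path: str = "/proc/diskstats") -> Dict[str, int]:
    """Read the live counters on a Linux host (not called here)."""
    with open(path) as handle:
        return parse_in_flight(handle.read())


# Made-up sample lines, for illustration only.
SAMPLE = """\
 104    0 cciss/c0d0 100 5 2000 300 50 2 1000 400 3 500 900
 253    0 dm-0 10 0 80 20 5 0 40 10 0 25 30
"""
print(parse_in_flight(SAMPLE))
```

Polling read_in_flight() once a second during a restore would show
whether requests pile up on one layer (for example a loop or dm device)
while the underlying array sits idle.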

Thanks,
Kevin

James Harper

2008-Dec-31 03:37 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>
> >Is there a way under Linux of monitoring disk queue length? I am using
> >LVM on top of a low end HP 'Smart Array' (E200) running two RAID1
> >volumes using SATA disks.
> >
>
> 'sar' could provide such info, IMO.
>
iostat shows very very low disk usage when things are frozen. I am
finding that I can type 'sync' and things will unfreeze again...
unfreezing before the sync completes. I haven't done this enough times
to know if things would have unfrozen on their own though.

James

Venefax

2008-Dec-31 03:45 UTC

[Xen-devel] Question

Dear Gentlemen
Suppose you need to know the overall load on the host, in terms of CPU,
bandwidth and disk IO, not per domu, but aggregated, and split per domu and
dom0. Xentop does not show aggregated totals, and it also does not show
percentages relative to available resources, so for management it is kind of
useless. The only tool that shows (somehow) that information is graphical, the
libvirt virtual machine manager, but is there a text mode tool to manage a
node?
Federico


Tian, Kevin

2008-Dec-31 04:18 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>From: James Harper [mailto:james.harper@bendigoit.com.au]
>Sent: Wednesday, December 31, 2008 11:37 AM
>> >Is there a way under Linux of monitoring disk queue length? I am using
>> >LVM on top of a low end HP 'Smart Array' (E200) running two RAID1
>> >volumes using SATA disks.
>> >
>>
>> 'sar' could provide such info, IMO.
>>
>
>iostat shows very very low disk usage when things are frozen. I am
>finding that I can type 'sync' and things will unfreeze again...
>unfreezing before the sync completes. I haven't done this enough times
>to know if things would have unfrozen on their own though.
>
That looks interesting. Now both CPU and disk utilization are low, but the
system is unresponsive for an unknown time... Does time in dom0 look sane?
I guess you may have to check the behavior/statistics of the fe/be drivers
in depth, e.g. event count/s, whether the kernel thread is woken
effectively, how many requests are handled per event notification, etc.,
and then judge whether those stats are as expected.

Thanks,
Kevin

James Harper

2008-Dec-31 04:51 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>
> >From: James Harper [mailto:james.harper@bendigoit.com.au]
> >Sent: Wednesday, December 31, 2008 11:37 AM
> >> >Is there a way under Linux of monitoring disk queue length? I am using
> >> >LVM on top of a low end HP 'Smart Array' (E200) running two RAID1
> >> >volumes using SATA disks.
> >> >
> >>
> >> 'sar' could provide such info, IMO.
> >>
> >
> >iostat shows very very low disk usage when things are frozen. I am
> >finding that I can type 'sync' and things will unfreeze again...
> >unfreezing before the sync completes. I haven't done this enough times
> >to know if things would have unfrozen on their own though.
> >
>
> That looks interesting. Now both CPU and disk utilization are low, but the
> system is unresponsive for an unknown time... Does time in dom0 look sane?
> I guess you may have to check the behavior/statistics of the fe/be drivers
> in depth, e.g. event count/s, whether the kernel thread is woken
> effectively, how many requests are handled per event notification, etc.,
> and then judge whether those stats are as expected.
>
I have written a script that does 'sync ; sleep 5' in a loop. My restore
is now at 20G and still going. I'll follow up if it completes.
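
The script itself was not posted. As a rough illustration only, the
same 'sync ; sleep 5' workaround could be sketched in Python as below,
assuming Python 3 on Linux; the function name sync_loop and its
parameters are hypothetical, and the sync/sleep callables are
injectable purely so the loop can be exercised without touching the
disk.

```python
import os
import time
from typing import Callable


def sync_loop(iterations: int,
              interval: float = 5.0,
              sync_fn: Callable[[], None] = os.sync,
              sleep_fn: Callable[[float], None] = time.sleep) -> None:
    """Flush dirty buffers to disk, wait, and repeat.

    Equivalent in spirit to running 'sync ; sleep 5' in a shell loop.
    iterations bounds the loop so the sketch always terminates.
    """
    for _ in range(iterations):
        sync_fn()           # ask the kernel to write out dirty data
        sleep_fn(interval)  # then pause before the next flush


# Dry run with stand-in callables, so nothing is actually synced.
log = []
sync_loop(2, interval=5.0,
          sync_fn=lambda: log.append("sync"),
          sleep_fn=lambda seconds: log.append(seconds))
print(log)
```

Note that this only works around the stall by keeping the amount of
unwritten data small; it does not explain why the stall happens.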

I'm not sure where to look for this problem though... When I use the
qemu emulated devices instead of GPLPV, the restore runs to completion,
but it also runs slower, so maybe the problem isn't the GPLPV drivers
but more that the qemu drivers can't get the i/o load up high enough to
see the problem.

As I said earlier in the thread, the system is using an HP E200 'Smart'
array controller, with no battery backup, and 2 pairs of RAID1 arrays on
SATA disks. Obviously not the highest performing setup ever.

I have a 500G disk I can attach to one of the onboard SATA ports, but
I'm not sure that that will actually prove anything either way.

One other thing I didn't mention - I am using sparse files as my disk
images, using 'file:' under Xen. Again, not the highest performing
configuration, but the restore process we are using needs to see disks
at least as big as those that were backed up originally, and I just
don't have 2TB of disk lying around! The data access is DomU -> blkback
-> /dev/loopX -> file(sparse) -> filesystem(xfs) -> LVM -> E200...
that's a lot of room for stuff to go wrong in isn't it?
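
Because the images are sparse, every write that lands in a hole makes
the filesystem allocate new blocks, which is one more thing happening
in that chain during a large restore. A minimal Python sketch for
seeing how much of an image is actually allocated is shown below; the
helper names and the example file name are hypothetical, and it relies
on st_blocks being counted in 512-byte units, as it is on Linux.

```python
import os
from typing import Tuple


def make_sparse(path: str, size: int) -> None:
    """Create a file of the given apparent size without writing data."""
    with open(path, "wb") as handle:
        handle.truncate(size)


def apparent_and_allocated(path: str) -> Tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk)."""
    info = os.stat(path)
    # st_blocks counts 512-byte units regardless of filesystem block size.
    return info.st_size, info.st_blocks * 512


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        image = os.path.join(workdir, "example-disk.img")  # hypothetical name
        make_sparse(image, 1 << 30)
        apparent, allocated = apparent_and_allocated(image)
        print("apparent:", apparent, "allocated:", allocated)
```

Comparing the allocated figure before and after a freeze would show
whether the image was growing quickly at the time.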

I could try switching to tap:aio but I don't think that my GPLPV drivers
work in that configuration... maybe time to find out why :)

Thanks

James


Ian Pratt

2008-Dec-31 09:21 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

> One other thing I didn't mention - I am using sparse files as my disk
> images, using 'file:' under Xen. Again, not the highest performing
> configuration, but the restore process we are using needs to see disks
> at least as big as those that were backed up originally, and I just
> don't have 2TB of disk lying around! The data access is DomU -> blkback
> -> /dev/loopX -> file(sparse) -> filesystem(xfs) -> LVM -> E200...
> that's a lot of room for stuff to go wrong in isn't it?

Which loop driver are you using? The std loop driver is well known to
deadlock under high write load. I think this may have been fixed with
loop-ng, but you'd likely be better off using tap:aio.

Ian

> I could try switching to tap:aio but I don't think that my GPLPV drivers
> work in that configuration... maybe time to find out why :)
>
> Thanks
>
> James

James Harper

2008-Dec-31 10:21 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

>
> > One other thing I didn't mention - I am using sparse files as my disk
> > images, using 'file:' under Xen. Again, not the highest performing
> > configuration, but the restore process we are using needs to see disks
> > at least as big as those that were backed up originally, and I just
> > don't have 2TB of disk lying around! The data access is DomU -> blkback
> > -> /dev/loopX -> file(sparse) -> filesystem(xfs) -> LVM -> E200...
> > that's a lot of room for stuff to go wrong in isn't it?
>
> Which loop driver are you using? The std loop driver is well known to
> deadlock under high write load. I think this may have been fixed with
> loop-ng, but you'd likely be better off using tap:aio.
>
I've never even heard of loop-ng... I just did a 'find' for any kernel
module with 'loop' in the name and didn't see anything called
'loop-ng'... is it something I need to enable in the kernel config?

I just tried tap:aio but the DomU hung for ages after starting the
restore... I'm just about to investigate.

Thanks

James


Ian Pratt

2008-Dec-31 10:41 UTC

RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

> I've never even heard of loop-ng... I just did a 'find' for any kernel
> module with 'loop' in the name and didn't see anything called
> 'loop-ng'... is it something I need to enable in the kernel config?

It's now part of device mapper, and called dm-loop. The key improvement
is that it's supposed to avoid dirtying unbounded amounts of memory and
then deadlocking.
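
One way to see whether dirty memory is building up in dom0 during the
restore is to watch the Dirty and Writeback lines of /proc/meminfo,
which report kilobytes waiting to be written and currently being
written. The Python sketch below is an illustration only: the function
name and the sample figures are hypothetical, and on a live system the
text would come from reading /proc/meminfo.

```python
from typing import Dict


def dirty_writeback_kb(meminfo_text: str) -> Dict[str, int]:
    """Extract the Dirty and Writeback values (in kB) from meminfo text."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Dirty:            2048 kB".
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            values[key] = int(rest.split()[0])
    return values


# Made-up sample, for illustration only.
SAMPLE_MEMINFO = """\
MemTotal:        1048576 kB
Dirty:              2048 kB
Writeback:            16 kB
WritebackTmp:          0 kB
"""
print(dirty_writeback_kb(SAMPLE_MEMINFO))
```

If Dirty climbs steadily up to the moment of a freeze and drops after a
manual sync, that would fit the loop-driver explanation; if it stays
low, the cause is more likely elsewhere.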

[However, this type of deadlock is fairly terminal -- I've never tried,
but I don't think 'sync' would unwedge it, so you may have a different
issue.]

> I just tried tap:aio but the DomU hung for ages after starting the
> restore... I'm just about to investigate.

Blktap certainly doesn't suffer from memory deadlock issues as it opens
the file O_DIRECT.

BTW: To my mind we should switch over from blktap to blktap2 soon.
Blktap2 isn't as mature yet, but it's more aesthetically pleasing and
has equivalent performance.

Ian

Keir Fraser

2008-Dec-31 10:50 UTC

Re: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

On 31/12/2008 10:41, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:
> BTW: To my mind we should switch over from blktap to blktap2 soon.
> Blktap2 isn't as mature yet, but it's more aesthetically pleasing and
> has equivalent performance.

There's currently discussion between Andy's team and Intel to get blktap2
working with Intel's test setup. When that works, blktap2 will be going
into xen-unstable.

 -- Keir


