thr3ads.net - Xen users - [Xen-users] Xen 4.0.0

Heiko Wundram

2010-May-04 13:09 UTC

[Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

Hey all!

I'm currently migrating a (Gentoo-based) Xen server to Xen 4.0.0 (using
the Xen ebuilds from bugs.gentoo.org), and I'm having severe problems with
tapdisk2 (which I want to use for I/O prioritization with CFQ on the
LVM-based backing storage of a virtual server).

It seems that after a period of heavy I/O in the virtual domain, the
communication between the (paravirtualized) DomU and Dom0 (the tapdisk2
process) breaks: no more interrupts are delivered to Dom0 for I/O requests
from the virtual domain, and so the virtual host "loses" its hard disk (but
does not otherwise "break", apart from not responding). The network
frontend/backend is not affected by this communication loss, AFAICT.

The virtual host can be destroyed with xm destroy, but the blktap2 interface
that was created does not disappear until the next reboot and cannot be
removed through the respective sysfs entries (echoing a 1 into "remove"
blocks as well and is "unkillable", i.e. it stays in kernel space). Once a
blktap2 device has entered this broken state, no more hosts can be created
with xm create (that blocks, too), and the host system must be rebooted to
get back to a usable state.

I have not been able to provoke this breakage with "normal" I/O (i.e., when
the hosts run normally), but I can provoke it with bonnie: after a short
period of sustained read/write I/O at 120+ MB/s, the blktap2 device freezes.

The Dom0 and DomU kernels in use are xen-sources-2.6.32-r1 (which, AFAIK,
are the OpenSuSE Xen kernel sources based on xen-stable 2.6.32.10 [11?])
from the official portage tree; the kernel configuration that's in use is
attached.

I've tried iommu=off for Xen (the mobo doesn't support VT-d anyway, so Xen
never turns it on), and I've also looked for any sign of errors with
verbosity 9 set for the blktap2 module and loglvl=all and guest_loglvl=all
set for Xen, but I haven't seen any errors so far.

Strace-ing the tapdisk2 process reveals that it is blocked in select(), and
none of the descriptors it is polling ever becomes readable (which is the
condition tapdisk2 waits for); instead, the call always times out after
600s.
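
The symptom above (select() returning at its timeout with nothing readable)
can be reproduced in isolation. The following is a minimal Python sketch,
not tapdisk2 code: a pipe stands in for the descriptors tapdisk2 polls, and
the names wait_readable and demo are made up for illustration. With nothing
written to the pipe, select() returns an empty list once the timeout
expires, which matches what strace shows every 600s.

```python
import os
import select


def wait_readable(fds, timeout_s):
    """Return the descriptors from fds that became readable within timeout_s."""
    readable, _, _ = select.select(fds, [], [], timeout_s)
    return readable


def demo():
    """Show select() timing out on an idle pipe, then waking once data arrives."""
    r, w = os.pipe()  # stand-in for a descriptor the daemon polls (assumption)
    try:
        # Nothing has been written yet: select() gives up after the timeout
        # and reports no readable descriptors.
        idle = wait_readable([r], 0.1)
        # Once the other side writes (i.e. signals work), the read end
        # is reported as readable.
        os.write(w, b"x")
        ready = wait_readable([r], 0.1)
        return idle, ready == [r]
    finally:
        os.close(r)
        os.close(w)
```

In the hang described here, the "write" side never happens again, so every
select() call ends like the first call in demo().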

Thanks in advance for any hint as to what is causing this, or whether
there's anything I might try to get things working...

PS: I have to boot with acpi=off, as the mobo won't reboot when ACPI is
turned on for Dom0 (not even when disabling ACPI reboots), but booting with
ACPI enabled doesn't change the fact that blktap2 blocks.

--- Heiko.








_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Andrew Lyon

2010-May-05 18:02 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

I have had exactly the same problem and ended up going back to tapdisk1.

I was able to replicate the problem using the entire SLE11-SP1 kernel
source patch set, which shows that the bug exists upstream. Unfortunately
I am very busy with other projects at the moment, so I have not had time
to debug it at all.

The SLE11-SP1 tree has been updated since xen-sources-2.6.32-r1. I will
make an updated set of patches for you to try, but it will take me a
couple of days.

Andy


Heiko Wundram

2010-May-05 18:48 UTC

AW: Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

Thanks for the info!

I've already tried a PV_OPS kernel in the meantime, and blktap2 seems to
work with it, but it bombs somewhere else and I haven't had the time to
analyze that crash.

The main reason I'm interested in blktap2 is that I can't get a host with
an XFS root filesystem to boot with blktap1: the PV kernel loads, but
freezes when trying to access the root filesystem (this doesn't happen for
another host which uses an ext3 root). The aftermath of the XFS freeze is
the same: the dom0 is fscked and needs to be rebooted. If all else fails,
I'll just stick with blkback for the moment. ;-)

Anyway, thanks for the info, and if you get around to packaging the new
kernel, just give me a heads up and I'll happily try it!



-- Heiko, from the Palm Pre

Fajar A. Nugraha

2010-May-06 06:31 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

On Thu, May 6, 2010 at 1:02 AM, Andrew Lyon <andrew.lyon@gmail.com> wrote:
> On Tue, May 4, 2010 at 2:09 PM, Heiko Wundram <modelnine@modelnine.org> wrote:
>> It seems that after a while of heavy I/O in the virtual domain, the
>> communication between the (paravirtualized) DomU and Dom0 (the
>> tapdisk2-process) breaks
>> I've not been able to provoke this breakage by "normal" I/O (i.e., when
>> the hosts run normally), but I have been able to provoke it by using
>> bonnie, which after a short period of sustained read/write I/O of
>> +120MB/s will freeze the blktap2 device.
> I have had exactly the same problem and ended up going back to tapdisk1.

That's odd. I don't see that behavior (only tested with bonnie++ on a
PV domU). Then again, my throughput maxed out at about 35MB/s, which might
not be high enough to trigger the bug.

> I was able to replicate the problem using the entire SLE11-SP1 kernel
> source patch set which proves that the bug exists upstream,
> unfortunately I am very busy on other projects at the moment so did
> not have time to debug it at all.
>
> The SLE11-SP1 tree has been updated since xen-sources-2.6.32-r1, I
> will make a updated set of patches for you to try but it will take me
> a couple of days.

Thanks. We'll be waiting :D

-- 
Fajar


Fajar A. Nugraha

2010-May-06 08:14 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

On Thu, May 6, 2010 at 1:31 PM, Fajar A. Nugraha <fajar@fajar.net> wrote:
> On Thu, May 6, 2010 at 1:02 AM, Andrew Lyon <andrew.lyon@gmail.com> wrote:
>> I have had exactly the same problem and ended up going back to tapdisk1.

Just curious: how did you go back to tapdisk1? And how did you verify
that it does indeed use tapdisk1?

When I tried "xm block-attach 0 tap:aio:...", lsmod showed that the blktap
module (which is the new blktap2 module) was in use. Also, /dev/tapdev*,
/dev/xen/blktap-2/blktap*, and /dev/xen/blktap-2/tapdev* were created.
So it seems that blktap2 is used unconditionally? CMIIW.
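
A small sketch of the check described above, in Python. It assumes that the
module name (blktap) and the device node patterns are exactly those
mentioned in this message; the function names and the dev_root and
proc_modules parameters are hypothetical and exist only so the check can be
pointed at a test directory. It reports what exists; it does not by itself
tell blktap1 and blktap2 apart.

```python
import glob
import os

# Device node patterns taken from the message above, relative to /dev.
PATTERNS = ("tapdev*", "xen/blktap-2/blktap*", "xen/blktap-2/tapdev*")


def list_blktap_nodes(dev_root="/dev"):
    """Map each pattern to the sorted list of matching paths under dev_root."""
    return {p: sorted(glob.glob(os.path.join(dev_root, p))) for p in PATTERNS}


def module_loaded(name, proc_modules="/proc/modules"):
    """Return True if a module called name is listed in proc_modules.

    /proc/modules is the file lsmod reads; the first field of each line
    is the module name.
    """
    try:
        with open(proc_modules) as f:
            return any(line.split()[0] == name for line in f if line.strip())
    except OSError:
        return False
```

Calling list_blktap_nodes() and module_loaded("blktap") on the Dom0 after a
block-attach gives the same information as the lsmod and /dev inspection
above, in a form that can be logged.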

-- 
Fajar


Andrew Lyon

2010-May-17 09:07 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

Hi,

I have uploaded updated 2.6.32 patches and an ebuild to
http://code.google.com/p/gentoo-xen-kernel/downloads/list. Note that the
patches should be applied to 2.6.32.13.

They should be added to portage in a few days' time, provided no
problems are found.

Andy


Fajar A. Nugraha

2010-May-17 10:28 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

On Mon, May 17, 2010 at 4:07 PM, Andrew Lyon <andrew.lyon@gmail.com> wrote:
> I have uploaded updated 2.6.32 patches and ebuild to
> http://code.google.com/p/gentoo-xen-kernel/downloads/list, note that
> patches should be applied to 2.6.32.13.
>
> They should be added to portage in a few days time, provided no
> problems are found.

Quick question. Have you had the chance to test this version yet? You
mentioned that you had problems with tapdisk2 and the previous
version.

-- 
Fajar


Heiko Wundram

2010-May-17 10:51 UTC

AW: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

I'm in the process of testing this version (i.e., the kernel is being
compiled as I write this); I should be able to comment on whether this
fixes things some time in the afternoon.

--- Heiko.


Fajar A. Nugraha

2010-May-17 14:56 UTC

Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

On Mon, May 17, 2010 at 5:51 PM, Heiko Wundram <modelnine@modelnine.org> wrote:
> I'm in the process of testing this version (i.e., the kernel is being
> compiled as I write this); I should be able to comment on whether this
> fixes things some time in the afternoon.

Thanks.

Could you also start and destroy some domUs repeatedly (two or three
times should be enough), and see whether the devices in
/dev/xen/blktap-2/* keep increasing instead of reusing unused device
nodes? I just want to find out whether the other problem I had is specific
to my environment (distro version, config, etc.) or whether it's in the
kernel.
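
The comparison asked for above can be sketched in Python as follows. The
node directory is the one named in this message; snapshot and new_nodes are
hypothetical helper names, and starting and destroying the domU is left to
whatever tool is already in use (the sketch deliberately does not call it).

```python
import os


def snapshot(node_dir="/dev/xen/blktap-2"):
    """Return the set of entry names currently present in node_dir."""
    try:
        return set(os.listdir(node_dir))
    except FileNotFoundError:
        # Directory not created yet (no tap device has been set up).
        return set()


def new_nodes(before, after):
    """Return the sorted names that exist in after but not in before."""
    return sorted(after - before)
```

Take one snapshot before the first start and another after each destroy.
If new_nodes keeps returning fresh names after every cycle, the nodes are
accumulating rather than being reused.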

-- 
Fajar


Heiko Wundram

2010-May-17 15:45 UTC

AW: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

No, this update does not fix the behaviour I'm seeing with tapdisk2 (i.e.,
64-bit HVM, multi-VCPU domains causing the tapdisk2 process to block
indefinitely or fail), and I also have the problem that tapdev* devices
don't disappear after restarting a DomU.

So, I guess this is no "fix" yet. I'm still waiting for the patch mentioned
on xen-devel, which queues the requests for tapdisk2 to a single thread and
might possibly fix this (as I'm not seeing the problem on uni-VCPU
domains).

--- Heiko.



Heiko Wundram

2010-May-21 07:50 UTC

AW: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

Just to let you know: I left the machine that tested the 2.6.32-xen-r2
kernel running with it (i.e., I didn't downgrade again), and the machine
froze completely yesterday (out of the blue, without any specific strain
on the machines running on it, using the "known working" Xen setup). This
didn't happen for me with 2.6.32-xen-r1.

--- Heiko.


