I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work, and I get the following error messages from the DomU kernel during boot:

[    1.783658] Using IPI No-Shortcut mode
[   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
[   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)
[   11.880079] XENBUS: Timeout connecting to device: device/vif/0 (state 0)
[   11.880146] md: Waiting for all devices to be available before autodetect

The DomU VM runs linux version 2.6.30.1 and has worked perfectly on other systems running a 2.6.18 kernel under Xen 3.4.

Any ideas?

thanks,

Anthony Wright.
On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
>
> [    1.783658] Using IPI No-Shortcut mode
> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)
> [   11.880079] XENBUS: Timeout connecting to device: device/vif/0 (state 0)
> [   11.880146] md: Waiting for all devices to be available before autodetect
>
> The DomU VM runs linux version 2.6.30.1 and has worked perfectly on other systems running a 2.6.18 kernel under Xen 3.4.
>
> Any ideas?
>

You should post your domU config file. Maybe the problem is some syntax change from Xen 3.4 to Xen 4.1, or from the 2.6.18 kernel to the 3.0 kernel.

Thanks,
Todd

--
Todd Deshane
http://www.linkedin.com/in/deshantm
http://www.xen.org/products/cloudxen.html
http://runningxen.com/
Anthony Wright
2011-Jul-28 15:36 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
On 28/07/2011 16:01, Todd Deshane wrote:
> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
>>
>> [    1.783658] Using IPI No-Shortcut mode
>> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
>> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)
>> [   11.880079] XENBUS: Timeout connecting to device: device/vif/0 (state 0)
>> [   11.880146] md: Waiting for all devices to be available before autodetect
>>
>> The DomU VM runs linux version 2.6.30.1 and has worked perfectly on other systems running a 2.6.18 kernel under Xen 3.4.
>>
>> Any ideas?
>>
> You should post your domU config file. Maybe the problem is some
> syntax change from Xen 3.4 to Xen 4.1 or from 2.6.18 kernel to 3.0
> kernel.
>
> Thanks,
> Todd

I've attached the Dom0 & the DomU kernel configs. Dom0 is running linux 3.0, DomU is running linux 2.6.30.1.

I'm somewhat surprised that the DomU kernel should be important, since I know it runs under Xen 3.4. Does this mean that to upgrade from Xen 3.4 to 4.1 I've also got to upgrade all my VMs?

thanks,

Anthony.
On Thu, Jul 28, 2011 at 3:36 PM, Anthony Wright <anthony@overnetdata.com> wrote:> On 28/07/2011 16:01, Todd Deshane wrote: >> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote: >>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don''t work and I get the following error messages from the DomU kernel during boot: >>> >>> [ 1.783658] Using IPI No-Shortcut mode >>> [ 11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3) >>> [ 11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3) >>> [ 11.880079] XENBUS: Timeout connecting to device: device/vif/0 (state 0) >>> [ 11.880146] md: Waiting for all devices to be available before autodetect >>> >>> The DomU VM runs linux version 2.6.30.1 and has worked perfectly on other systems running a 2.6.18 kernel under Xen 3.4. >>> >>> Any ideas? >>> >> You should post your domU config file. Maybe the problem is some >> syntax change from Xen 3.4 to Xen 4.1 or from 2.6.18 kernel to 3.0 >> kernel. >> >> Thanks, >> Todd > I''ve attached the Dom0 & the DomU kernel configs. Dom0 is running linux > 3.0, DomU is running linux 2.6.30.1.I meant the domU guest configuration file (the xm/xl one). I meant that you might be specifying the disk line incorrectly.> > I''m somewhat surprised that the DomU kernel should be important since I > know it runs under Xen 3.4. Does this mean that to upgrade from Xen 3.4 > to 4.1 I''ve also got to upgrade all my VMs?-- Todd Deshane http://www.linkedin.com/in/deshantm http://www.xen.org/products/cloudxen.html http://runningxen.com/ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Jul-28 16:00 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
On 28/07/2011 16:46, Todd Deshane wrote:> On Thu, Jul 28, 2011 at 3:36 PM, Anthony Wright <anthony@overnetdata.com> wrote: >> On 28/07/2011 16:01, Todd Deshane wrote: >>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote: >>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don''t work and I get the following error messages from the DomU kernel during boot: >>>> >>>> [ 1.783658] Using IPI No-Shortcut mode >>>> [ 11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3) >>>> [ 11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3) >>>> [ 11.880079] XENBUS: Timeout connecting to device: device/vif/0 (state 0) >>>> [ 11.880146] md: Waiting for all devices to be available before autodetect >>>> >>>> The DomU VM runs linux version 2.6.30.1 and has worked perfectly on other systems running a 2.6.18 kernel under Xen 3.4. >>>> >>>> Any ideas? >>>> >>> You should post your domU config file. Maybe the problem is some >>> syntax change from Xen 3.4 to Xen 4.1 or from 2.6.18 kernel to 3.0 >>> kernel. >>> >>> Thanks, >>> Todd >> I''ve attached the Dom0 & the DomU kernel configs. Dom0 is running linux >> 3.0, DomU is running linux 2.6.30.1. > I meant the domU guest configuration file (the xm/xl one). I meant > that you might be specifying the disk line incorrectly.I''ve attached the startup config. I''ve tried more RAM, but that doesn''t help, and I can''t find anything useful in the xen logs. thanks, Anthony. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
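The startup config itself went out as an attachment and is not preserved in the archive. For readers following the thread, here is a rough sketch of the kind of xm/xl PV guest config being discussed; the disk entries match the fragment quoted back later in the thread, while the name, kernel path, memory and vif settings are assumptions:

    # sketch only - disk lines from the thread, everything else assumed
    name   = "XenFileServer"
    kernel = "/path/to/domU/vmlinuz-2.6.30.1"
    memory = 256
    vif    = [ '' ]            # one vif with default settings
    disk   = [ 'tap:aio:/workspace/agent/appliances/XenFileServer-3.18/rootfs,xvda1,r'
             , 'tap:aio:/var/agent/running/1/sysconfig,xvda2,r'
             , 'phy:/dev/Master/Workspace-1,xvdb1,w'
             , 'phy:/dev/Master/Filesystem,xvdc1,w'
             ]
    root   = "/dev/xvda1 ro"   # assumed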
On Thu, 2011-07-28 at 11:36 -0400, Anthony Wright wrote:
> I'm somewhat surprised that the DomU kernel should be important since
> I know it runs under Xen 3.4. Does this mean that to upgrade from Xen
> 3.4 to 4.1 I've also got to upgrade all my VMs?

The domU ABI is stable so this should never be necessary (of course bugs do happen).

Ian.
Anthony Wright
2011-Jul-29 07:53 UTC
[Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with the vga-support patch backported). I still can't get my DomUs to work, because the phy disks and vifs time out in the DomU. Looking through my logs this morning, I'm getting a consistent kernel bug report with xen mentioned at the top of the stack trace and vifdisconnect on its third line, so I suspect it's related to the problem I'm seeing. I don't remember seeing the stack trace with 4.1.0 xen, but it's possible I missed it.

I've had the report on two consecutive boots and attach the message log from both, plus the Dom0 kernel config.

Anthony.

On 28/07/2011 17:28, Ian Campbell wrote:
> On Thu, 2011-07-28 at 11:36 -0400, Anthony Wright wrote:
>> I'm somewhat surprised that the DomU kernel should be important since
>> I know it runs under Xen 3.4. Does this mean that to upgrade from Xen
>> 3.4 to 4.1 I've also got to upgrade all my VMs?
> The domU ABI is stable so this should never be necessary (of course bugs
> do happen).
>
> Ian.
>
Anthony Wright
2011-Jul-29 15:48 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU (only on certain hardware)
Ok, the plot thickens...

I have installed virtually identical systems on two physical machines - identical (and I mean identical) xen, dom0 and domU, with possibly a slightly different configuration for the domU. They're as close as I can get without imaging the disk. On both machines Dom0 starts normally, but on one physical machine the DomU starts correctly, seeing the two tap:aio disks, the two phy disks and the vif, while on the other physical machine the DomU only sees the two tap:aio disks.

The two pieces of hardware are quite different: one is a 32 bit processor from 2000/2001 (this is the one that works), the other is a much more modern 64 bit processor about a year old.

On 28/07/2011 17:28, Ian Campbell wrote:
> On Thu, 2011-07-28 at 11:36 -0400, Anthony Wright wrote:
>> I'm somewhat surprised that the DomU kernel should be important since
>> I know it runs under Xen 3.4. Does this mean that to upgrade from Xen
>> 3.4 to 4.1 I've also got to upgrade all my VMs?
> The domU ABI is stable so this should never be necessary (of course bugs
> do happen).
>
> Ian.
>
Konrad Rzeszutek Wilk
2011-Jul-29 15:55 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
On Thu, Jul 28, 2011 at 05:00:13PM +0100, Anthony Wright wrote:
> On 28/07/2011 16:46, Todd Deshane wrote:
> > On Thu, Jul 28, 2011 at 3:36 PM, Anthony Wright <anthony@overnetdata.com> wrote:
> >> On 28/07/2011 16:01, Todd Deshane wrote:
> >>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
> >>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
> >>>>
> >>>> [    1.783658] Using IPI No-Shortcut mode
> >>>> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
> >>>> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)

What device does that correspond to (hint: run xl block-list or xm block-list)?

> disk = [
>   'tap:aio:/workspace/agent/appliances/XenFileServer-3.18/rootfs,xvda1,r'
>   ,'tap:aio:/var/agent/running/1/sysconfig,xvda2,r'
>   ,'phy:/dev/Master/Workspace-1,xvdb1,w'
>   ,'phy:/dev/Master/Filesystem,xvdc1,w'
Konrad Rzeszutek Wilk
2011-Jul-29 16:06 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU (only on certain hardware)
On Fri, Jul 29, 2011 at 04:48:15PM +0100, Anthony Wright wrote:
> Ok, the plot thickens...

Please don't top post.

>
> I have installed virtually identical systems on two physical machines -
> identical (and I mean identical) xen, dom0, domU with possibly a

md5sum match?

> slightly different configuration for the domU. They're as close as I can
> get without imaging the disk. On both machines Dom0 starts normally, but
> on one physical machine the DomU starts correctly seeing the two tap:aio
> disks, the two phy disks and the vif, while on the other physical
> machine the DomU only sees the two tap:aio disks.
>
> The two pieces of hardware are quite different one is a 32 bit processor
> from 2000/2001 (this is the one that works), the other is a much more
> modern 64 bit processor about a year old.
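A concrete way to do the comparison being asked for; the paths below are examples only, the point is simply to checksum the same Xen, dom0 kernel and domU images on both boxes and compare the output:

    # run on each machine and compare the results (example paths)
    md5sum /boot/xen-4.1.gz /boot/vmlinuz-3.0 /path/to/domU/vmlinuz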
Anthony Wright
2011-Jul-29 18:40 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
On 29/07/2011 16:55, Konrad Rzeszutek Wilk wrote:
> On Thu, Jul 28, 2011 at 05:00:13PM +0100, Anthony Wright wrote:
>> On 28/07/2011 16:46, Todd Deshane wrote:
>>> On Thu, Jul 28, 2011 at 3:36 PM, Anthony Wright <anthony@overnetdata.com> wrote:
>>>> On 28/07/2011 16:01, Todd Deshane wrote:
>>>>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
>>>>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
>>>>>>
>>>>>> [    1.783658] Using IPI No-Shortcut mode
>>>>>> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
>>>>>> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)
> What device does that correspond to (hint: run xl block-list or xm block-list)?
>

The output from block-list is:

Vdev   BE  handle  state  evt-ch  ring-ref  BE-path
51729  0   764     3      10      10        /local/domain/0/backend/vbd/764/51729
51745  0   764     3      11      11        /local/domain/0/backend/vbd/764/51745
51713  0   764     4      8       8         /local/domain/0/backend/qdisk/764/51713
51714  0   764     4      9       9         /local/domain/0/backend/qdisk/764/51714

The two vbds map to two LVM logical volumes in two different volume groups.

On 29/07/2011 17:06, Konrad Rzeszutek Wilk wrote:
>> > I have installed virtually identical systems on two physical machines -
>> > identical (and I mean identical) xen, dom0, domU with possibly a
> md5sum match?

Yes - md5sum match on all the key components, i.e. xen, dom0 kernel, 99.9% of the root filesystem, the domU kernel & 99.9% of the domU filesystem. Where there isn't a precise match is on some of the config files. I don't think these should have any effect, but I will have a go at mirroring the disks (I can't swap disks since one is SATA & the other IDE).

I also was having problems with the vif device, and got a kernel bug report that could potentially relate to it. I've attached two syslogs.

Anthony.
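As an aside, the Vdev numbers in that output map straight back to guest device names under the standard xvd numbering (block major 202, minor = disk << 4 | partition); a small sketch of the arithmetic:

    vdev=51729
    echo $(( vdev >> 8 ))            # 202 -> the xvd block major
    echo $(( (vdev & 0xff) >> 4 ))   # 1   -> second disk, i.e. xvdb
    echo $(( vdev & 0xf ))           # 1   -> partition 1

So 51729 is xvdb1 and 51745 is xvdc1, the two phy: disks from the config, while 51713 and 51714 are xvda1 and xvda2, the two tap:aio disks that come up fine.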
Konrad Rzeszutek Wilk
2011-Jul-29 20:01 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
[Ian, I copied you on this b/c of the netbk issue - read on]> >>>>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote: > >>>>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don''t work and I get the following error messages from the DomU kernel during boot: > >>>>>> > >>>>>> [ 1.783658] Using IPI No-Shortcut mode > >>>>>> [ 11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3) > >>>>>> [ 11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)Hm, which version of DomU were these? I wonder if this is related to the ''feature-barrier'' that is not supported with 3.0. Do you see anything in the DomU about the disks? or xen-blkfront? Can you run the guests with ''initcall_debug loglevel=8 debug'' to see if if the blkfront is actually running on those disks. Any idea where the source for those DomU''s is? If it is an issue with ''feature-barrier'' it looks like it can''t handle not having that option visible which it should.> > What device does that correspond to (hint: run xl block-list or xm block-list)? > > > The output from block-list is: > > Vdev BE handle state evt-ch ring-ref BE-path > 51729 0 764 3 10 10 /local/domain/0/backend/vbd/764/51729 > 51745 0 764 3 11 11 /local/domain/0/backend/vbd/764/51745 > 51713 0 764 4 8 8 > /local/domain/0/backend/qdisk/764/51713 > 51714 0 764 4 9 9 > /local/domain/0/backend/qdisk/764/51714 > > The two vbds map to two LVM logical volumes in two different volume groups.qdisk.. ok so it does swap over to QEMU internal AIO path. From the output it looks like the ones that hang are the ''phy'' types? Is that right?> > On 29/07/2011 17:06, Konrad Rzeszutek Wilk wrote: > >> > I have installed virtually identical systems on two physical machines - > >> > identical (and I mean identical) xen, dom0, domU with possibly a > > md5sum match? > Yes - md5sum match on all the key components, i.e. xen, dom0 kernel, > 99.9% of the root filesystem, the domU kernel & 99.9% of the domU > filesystem. Where there isn''t a precise match is on some of the config > files. I don''t think these should have any effect, but I will have a go > at mirroring the disks (I can''t swap disks since one is SATA & the other > IDE). > > I also was having problems with the vif device, and got a kernel bug > report that could potentially relate to it. I''ve attached two syslogs.Yeah, that is bad. I actually see a similar issue if I kill forcibly the guests. I hadn''t yet narrowed it down - .. you are looking to be using 4.1.. But not 4.1.1 right? Can you describe to me how you get the netbk crash?> 2011 Jul 29 07:02:10 kernel: [ 33.242680] vbd vbd-1-51745: 1 mapping ring-ref 11 port 11 > 2011 Jul 29 07:02:10 kernel: [ 33.253038] vif vif-1-0: vif1.0: failed to map tx ring. err=-12 status=-1 > 2011 Jul 29 07:02:10 kernel: [ 33.253065] vif vif-1-0: 1 mapping shared-frames 768/769 port 12 > 2011 Jul 29 07:02:43 kernel: [ 66.103514] vif vif-1-0: 2 reading script > 2011 Jul 29 07:02:43 kernel: [ 66.106265] br-internal: port 1(vif1.0) entering disabled state > 2011 Jul 29 07:02:43 kernel: [ 66.106309] libfcoe_device_notification: NETDEV_UNREGISTER vif1.0 > 2011 Jul 29 07:02:43 kernel: [ 66.106333] br-internal: port 1(vif1.0) entering disabled state > 2011 Jul 29 07:02:43 kernel: [ 66.106372] br-internal: mixed no checksumming and other settings. 
> 2011 Jul 29 07:02:43 kernel: [ 66.114097] ------------[ cut here ]------------ > 2011 Jul 29 07:02:43 kernel: [ 66.114878] kernel BUG at mm/vmalloc.c:2164! > 2011 Jul 29 07:02:43 kernel: [ 66.115058] invalid opcode: 0000 [#1] SMP > 2011 Jul 29 07:02:43 kernel: [ 66.115376] Modules linked in: > 2011 Jul 29 07:02:43 kernel: [ 66.115376] > 2011 Jul 29 07:02:43 kernel: [ 66.115376] Pid: 20, comm: xenwatch Not tainted 3.0.0 #1 MSI MS-7309/MS-7309 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] EIP: 0061:[<c0494bff>] EFLAGS: 00010203 CPU: 1 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] EIP is at free_vm_area+0xf/0x19 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] EAX: 00000000 EBX: cf866480 ECX: 00000018 EDX: 00000000 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] ESI: cfa06800 EDI: d076c400 EBP: cfa06c00 ESP: d0ce7eb4 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] Process xenwatch (pid: 20, ti=d0ce6000 task=d0c55140 task.ti=d0ce6000) > 2011 Jul 29 07:02:43 kernel: [ 66.115376] Stack: > 2011 Jul 29 07:02:43 kernel: [ 66.115376] cfa06c00 c09e87aa fffc6e63 c0c4bd65 d0ce7ecc cfa06844 d0ce7ecc d0ce7ecc > 2011 Jul 29 07:02:43 kernel: [ 66.115376] cfa06c00 cfa06800 d076c400 cfa06c94 c09eace0 d04cd380 00000000 fffffffe > 2011 Jul 29 07:02:43 kernel: [ 66.115376] d0ce7f9c c061fe74 d04cd2e0 d076c420 d076c400 d0ce7f9c c09e9f8c d076c400 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] Call Trace: > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09e87aa>] ? xen_netbk_unmap_frontend_rings+0xbf/0xd3 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0c4bd65>] ? netdev_run_todo+0x1b7/0x1cc > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09eace0>] ? xenvif_disconnect+0xd0/0xe4 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c061fe74>] ? xenbus_rm+0x37/0x3e > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09e9f8c>] ? netback_remove+0x40/0x5d > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c062075d>] ? xenbus_dev_remove+0x2c/0x3d > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06620e6>] ? __device_release_driver+0x42/0x79 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06621ac>] ? device_release_driver+0xf/0x17 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0661818>] ? bus_remove_device+0x75/0x84 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0660693>] ? device_del+0xe6/0x125 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06606da>] ? device_unregister+0x8/0x10 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06205f0>] ? xenbus_dev_changed+0x71/0x129 > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0405394>] ? check_events+0x8/0xc > 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c061f711>] ? xenwatch_thread+0xeb/0x113 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c044792c>] ? wake_up_bit+0x53/0x53 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c061f626>] ? xenbus_thread+0x1cc/0x1cc > 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c0447616>] ? kthread+0x63/0x68 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c04475b3>] ? kthread_worker_fn+0x122/0x122 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c0e0f036>] ? 
kernel_thread_helper+0x6/0x10 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] Code: c1 00 00 00 01 89 f0 e8 a1 ff ff ff 81 6b 08 00 10 00 00 eb 02 31 db 89 d8 5b 5e c3 53 89 c3 8b 40 04 e8 9b ff ff ff 39 d8 74 04 <0f> 0b eb fe 5b e9 73 95 00 00 57 89 d7 56 31 f6 53 89 c3 eb 09 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] EIP: [<c0494bff>] free_vm_area+0xf/0x19 SS:ESP 0069:d0ce7eb4 > 2011 Jul 29 07:02:43 kernel: [ 66.129624] ---[ end trace 7bb110af96f32256 ]---_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
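For anyone trying to reproduce this, the debug options Konrad suggests go on the guest kernel command line, which for a PV domU is most easily done from the guest config; a minimal sketch (only the three options come from the suggestion above, anything else on the line is whatever the guest already uses):

    # domU config file - appended to the guest kernel command line
    extra = "initcall_debug loglevel=8 debug"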
Anthony Wright
2011-Jul-30 17:05 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
On 29/07/2011 21:01, Konrad Rzeszutek Wilk wrote:
> [Ian, I copied you on this b/c of the netbk issue - read on]
>
>>>>>>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
>>>>>>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
>>>>>>>>
>>>>>>>> [    1.783658] Using IPI No-Shortcut mode
>>>>>>>> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
>>>>>>>> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)
> Hm, which version of DomU were these? I wonder if this is related to the 'feature-barrier'
> that is not supported with 3.0. Do you see anything in the DomU about the disks?
> or xen-blkfront? Can you run the guests with 'initcall_debug loglevel=8 debug' to see
> if the blkfront is actually running on those disks.

I have attached the domU console output with these options set.

I have also spent a fair amount of time trying to narrow down the conditions that cause it, with lots of hardware switching & disk imaging. The conclusion I came to is that it's not hardware related; there's a subtle interaction going on with LVM that's causing the problem, but I'm struggling to work out how to narrow it down any further than that.

I started with a setup that works (Machine 1 with HDD 1, IDE) and a setup that didn't (Machine 2 with HDD 2, SATA). Machine 2 has an IDE port, so I unplugged HDD 2 and put HDD 1 in Machine 2, and that setup worked, thus excluding most of the hardware. Next I imaged HDD 3 (SATA) from HDD 1 (IDE), unplugged HDD 1 and put HDD 3 in Machine 2, and that setup also worked, thus excluding an IDE/SATA issue and giving me a disk I could safely play with.

The disks are organised into two partitions: partition 1 is for Dom0, and partition 2 is an LVM volume group used for the DomUs. One LV (called Main) in this volume group is used by Dom0 to hold the DomU kernels, config information and other static data & executables; the rest of the VG is issued as LVs to the various DomUs as needed, with a fair amount of free space left in the VG. I took the Main LV from HDD 2 (didn't work) and imaged it onto HDD 3, and by judicious LV renaming booted against this image - and the setup failed. Great, I thought - it looks like a very subtle config issue. Next I created a third LV, this time imaged from the Main LV that worked, giving me three Main LVs (I called them Main-Works, Main-Broken & Main-Testing), and I simply used lvrename to select the one I wanted as active. However, now I couldn't get the setup to work with any of these three Main LVs, including the one that originally worked. Removing the LVs I had recently created and going back to the original Main LV, the setup started working again.

I'm going to try an up to date version of LVM (the one I'm using is a little out of date) and see if that makes any difference, but the version I have at the moment has worked without problem in the past.

> Any idea where the source for those DomU's is? If it is an issue with 'feature-barrier'
> it looks like it can't handle not having that option visible which it should.
>

We build the DomUs with a tightly controlled internal build system, so I have a full manifest for the DomU.

>>> What device does that correspond to (hint: run xl block-list or xm block-list)?
>>> >> The output from block-list is: >> >> Vdev BE handle state evt-ch ring-ref BE-path >> 51729 0 764 3 10 10 /local/domain/0/backend/vbd/764/51729 >> 51745 0 764 3 11 11 /local/domain/0/backend/vbd/764/51745 >> 51713 0 764 4 8 8 >> /local/domain/0/backend/qdisk/764/51713 >> 51714 0 764 4 9 9 >> /local/domain/0/backend/qdisk/764/51714 >> >> The two vbds map to two LVM logical volumes in two different volume groups. > qdisk.. ok so it does swap over to QEMU internal AIO path. From the output it looks > like the ones that hang are the ''phy'' types? Is that right? >The ones that hang are phy and are the first two, with vdev numbers of 51729 & 51745.>> On 29/07/2011 17:06, Konrad Rzeszutek Wilk wrote: >>>>> I have installed virtually identical systems on two physical machines - >>>>> identical (and I mean identical) xen, dom0, domU with possibly a >>> md5sum match? >> Yes - md5sum match on all the key components, i.e. xen, dom0 kernel, >> 99.9% of the root filesystem, the domU kernel & 99.9% of the domU >> filesystem. Where there isn''t a precise match is on some of the config >> files. I don''t think these should have any effect, but I will have a go >> at mirroring the disks (I can''t swap disks since one is SATA & the other >> IDE). >> >> I also was having problems with the vif device, and got a kernel bug >> report that could potentially relate to it. I''ve attached two syslogs. > Yeah, that is bad. I actually see a similar issue if I kill forcibly the guests. > I hadn''t yet narrowed it down - .. you are looking to be using 4.1.. But not > 4.1.1 right?I started with 4.1.0, but upgraded to 4.1.1 in the hope that it might fix the problem. The vif timeouts have happened with both versions, but I think the kernel errors have only been happening since I upgraded to xen 4.1.1, however I''m not sure. I''ve also had a number of kernel Oops in place of the kernel errors as well.> Can you describe to me how you get the netbk crash?The DomU when it realises it has a problem with one of it''s disks issues a warning message and then shuts itself down. The netbk crash happens partway through that shutdown process, but not when the DomU is touching the network (as far as I know) - it''s issuing SIG KILLs to all processes. It''s always at the same point in the shutdown process, but the shutdown process pauses at that point for quite a while and since it doesn''t touch the network, I''m not convinced it''s triggered by something that DomU is doing. The netbk crash only happens the first time the DomU starts up & shuts down, it doesn''t happen on subsequent DomU startup/shutdown cycles. It also doesn''t happen if the disks work correctly. I do have a setup that consistently produces it.>> 2011 Jul 29 07:02:10 kernel: [ 33.242680] vbd vbd-1-51745: 1 mapping ring-ref 11 port 11 >> 2011 Jul 29 07:02:10 kernel: [ 33.253038] vif vif-1-0: vif1.0: failed to map tx ring. err=-12 status=-1 >> 2011 Jul 29 07:02:10 kernel: [ 33.253065] vif vif-1-0: 1 mapping shared-frames 768/769 port 12 >> 2011 Jul 29 07:02:43 kernel: [ 66.103514] vif vif-1-0: 2 reading script >> 2011 Jul 29 07:02:43 kernel: [ 66.106265] br-internal: port 1(vif1.0) entering disabled state >> 2011 Jul 29 07:02:43 kernel: [ 66.106309] libfcoe_device_notification: NETDEV_UNREGISTER vif1.0 >> 2011 Jul 29 07:02:43 kernel: [ 66.106333] br-internal: port 1(vif1.0) entering disabled state >> 2011 Jul 29 07:02:43 kernel: [ 66.106372] br-internal: mixed no checksumming and other settings. 
>> 2011 Jul 29 07:02:43 kernel: [ 66.114097] ------------[ cut here ]------------ >> 2011 Jul 29 07:02:43 kernel: [ 66.114878] kernel BUG at mm/vmalloc.c:2164! >> 2011 Jul 29 07:02:43 kernel: [ 66.115058] invalid opcode: 0000 [#1] SMP >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] Modules linked in: >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] Pid: 20, comm: xenwatch Not tainted 3.0.0 #1 MSI MS-7309/MS-7309 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] EIP: 0061:[<c0494bff>] EFLAGS: 00010203 CPU: 1 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] EIP is at free_vm_area+0xf/0x19 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] EAX: 00000000 EBX: cf866480 ECX: 00000018 EDX: 00000000 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] ESI: cfa06800 EDI: d076c400 EBP: cfa06c00 ESP: d0ce7eb4 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] Process xenwatch (pid: 20, ti=d0ce6000 task=d0c55140 task.ti=d0ce6000) >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] Stack: >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] cfa06c00 c09e87aa fffc6e63 c0c4bd65 d0ce7ecc cfa06844 d0ce7ecc d0ce7ecc >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] cfa06c00 cfa06800 d076c400 cfa06c94 c09eace0 d04cd380 00000000 fffffffe >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] d0ce7f9c c061fe74 d04cd2e0 d076c420 d076c400 d0ce7f9c c09e9f8c d076c400 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] Call Trace: >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09e87aa>] ? xen_netbk_unmap_frontend_rings+0xbf/0xd3 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0c4bd65>] ? netdev_run_todo+0x1b7/0x1cc >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09eace0>] ? xenvif_disconnect+0xd0/0xe4 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c061fe74>] ? xenbus_rm+0x37/0x3e >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c09e9f8c>] ? netback_remove+0x40/0x5d >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c062075d>] ? xenbus_dev_remove+0x2c/0x3d >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06620e6>] ? __device_release_driver+0x42/0x79 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06621ac>] ? device_release_driver+0xf/0x17 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0661818>] ? bus_remove_device+0x75/0x84 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0660693>] ? device_del+0xe6/0x125 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06606da>] ? device_unregister+0x8/0x10 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c06205f0>] ? xenbus_dev_changed+0x71/0x129 >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c0405394>] ? check_events+0x8/0xc >> 2011 Jul 29 07:02:43 kernel: [ 66.115376] [<c061f711>] ? xenwatch_thread+0xeb/0x113 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c044792c>] ? wake_up_bit+0x53/0x53 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c061f626>] ? xenbus_thread+0x1cc/0x1cc >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c0447616>] ? kthread+0x63/0x68 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c04475b3>] ? kthread_worker_fn+0x122/0x122 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] [<c0e0f036>] ? 
kernel_thread_helper+0x6/0x10 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] Code: c1 00 00 00 01 89 f0 e8 a1 ff ff ff 81 6b 08 00 10 00 00 eb 02 31 db 89 d8 5b 5e c3 53 89 c3 8b 40 04 e8 9b ff ff ff 39 d8 74 04 <0f> 0b eb fe 5b e9 73 95 00 00 57 89 d7 56 31 f6 53 89 c3 eb 09 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] EIP: [<c0494bff>] free_vm_area+0xf/0x19 SS:ESP 0069:d0ce7eb4 >> 2011 Jul 29 07:02:43 kernel: [ 66.129624] ---[ end trace 7bb110af96f32256 ]---_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-01 11:03 UTC
Re: [Xen-devel] phy disks and vifs timing out in DomU
> > Hm, which version of DomU were these? I wonder if this is related to the 'feature-barrier' that is not supported with 3.0. Do you see anything in the DomU about the disks? or xen-blkfront? Can you run the guests with 'initcall_debug loglevel=8 debug' to see if the blkfront is actually running on those disks.
>
> I have attached the domU console output with these options set.
>
> I have also spent a fair amount of time trying to narrow down the conditions that cause it, with lots of hardware switching & disk imaging. The conclusion I came to was that it's not hardware related, but there's a subtle interaction going on with LVM that's causing the problem, but I'm struggling to work out how to narrow it down any further than that.
>
> I started with a setup that works: Machine 1 with HDD 1 (IDE), and a setup that didn't: Machine 2 with HDD 2 (SATA). Machine 2 has an IDE port so I unplugged HDD 2 and put HDD 1 in Machine 2 and that setup worked, thus excluding most of the hardware. Next I imaged HDD 3 (SATA) from HDD 1 (IDE), unplugged HDD 1 and put HDD 3 in Machine 2, and that setup worked, thus excluding an IDE/SATA issue, and giving me a disk I could safely play with. The disks are organised into two partitions: partition 1 is for Dom0, partition 2 is an LVM volume group used for the DomUs. One LV (called Main) in this volume group is used by Dom0 to hold the DomU kernels, config information and other static data & executables; the rest of the VG is issued as LVs to the various DomUs as needed, with a fair amount of free space left in the VG. I took the Main LV from HDD 2 (didn't work) and imaged it onto HDD 3, and by judicious LV renaming booted against this image, and the setup failed - great, I thought, it looks like a very subtle config issue. Next I created a third LV, this time imaged from the Main LV that worked, giving me three Main LVs (I called them Main-Works, Main-Broken & Main-Testing), and I simply used lvrename to select the one I wanted as active. However now I couldn't get the setup to work with any of these three Main LVs, including the one that originally worked. Removing the LVs I had recently created, and going back to the original Main LV, the setup started working again.
>
> I'm going to try an up to date version of LVM (the one I'm using is a little out of date), and see if that makes any difference, but the version I have at the moment has worked without problem in the past.

I've managed to isolate it a little tighter, but it's very strange. I've also updated to the latest version of LVM, but it makes no difference.

I have a system with two partitions, the second of which is an LVM volume group. I have a VM which has one vif, two tap:aio disks based on files in an LV within the volume group, and two phy disks based on LVs within the volume group. I have managed to get to the situation where I can boot the physical machine and the VM starts correctly. If however I create a new LV, of any size and with any name, and restart the physical machine, the VM fails to start correctly: the two phy disks time out, the vif times out, and I get a kernel bug 90% of the time and a kernel oops 10% of the time. If I remove this new LV and reboot the physical machine, the VM starts correctly again.

There is no reason within my code that would cause the new LV to have an effect on the VM, but somehow it does.

Anthony.
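To make the reproduction steps concrete, something along these lines appears to be all it takes; a sketch based on the description above, with the volume group name taken from the disk config quoted earlier and an arbitrary test LV:

    # domU currently boots fine
    lvcreate -L 1G -n ScratchTest Master   # any name, any size, in the same VG
    reboot                                 # reboot the physical machine
    # -> the domU's two phy: disks and the vif time out, usually followed by the dom0 kernel BUG
    lvremove /dev/Master/ScratchTest
    reboot
    # -> the domU starts correctly again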
Konrad Rzeszutek Wilk
2011-Aug-03 15:28 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > the vga-support patch backported). I can''t get my DomU''s to work due to > the phy disks and vifs timing out in DomU and looking through my logs > this morning I''m getting a consistent kernel bug report with xen > mentioned at the top of the stack trace and vifdisconnect mentioned onYikes! Ian any ideas what to try? Anthony, can you compile the kernel with debug=y and when this happens see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which should dump the grants in use.. that might help a bit.> 2011 Jul 29 07:18:50 kernel: [ 33.213500] vif vif-1-0: vif1.0: failed to map tx ring. err=-12 status=-1 > 2011 Jul 29 07:18:50 kernel: [ 33.213516] vif vif-1-0: 1 mapping shared-frames 768/769 port 12 > 2011 Jul 29 07:19:01 /usr/sbin/cron[3719]: (root) CMD (/usr/monitor/monitor) > 2011 Jul 29 07:19:23 kernel: [ 66.043164] vif vif-1-0: 2 reading script > 2011 Jul 29 07:19:23 kernel: [ 66.045984] br-internal: port 1(vif1.0) entering disabled state > 2011 Jul 29 07:19:23 kernel: [ 66.046044] libfcoe_device_notification: NETDEV_UNREGISTER vif1.0 > 2011 Jul 29 07:19:23 kernel: [ 66.046082] br-internal: port 1(vif1.0) entering disabled state > 2011 Jul 29 07:19:23 kernel: [ 66.046279] br-internal: mixed no checksumming and other settings. > 2011 Jul 29 07:19:23 kernel: [ 66.050077] ------------[ cut here ]------------ > 2011 Jul 29 07:19:23 kernel: [ 66.050858] kernel BUG at mm/vmalloc.c:2164! > 2011 Jul 29 07:19:23 kernel: [ 66.051034] invalid opcode: 0000 [#1] SMP > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Modules linked in: > 2011 Jul 29 07:19:23 kernel: [ 66.051034] > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Pid: 20, comm: xenwatch Not tainted 3.0.0 #1 MSI MS-7309/MS-7309 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP: 0061:[<c0494bff>] EFLAGS: 00010207 CPU: 1 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP is at free_vm_area+0xf/0x19 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] EAX: 00000000 EBX: d0799700 ECX: 00000018 EDX: 00000000 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] ESI: cf9e5800 EDI: d051a600 EBP: cf9e5c00 ESP: d0ce7eb4 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Process xenwatch (pid: 20, ti=d0ce6000 task=d0c55140 task.ti=d0ce6000) > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Stack: > 2011 Jul 29 07:19:23 kernel: [ 66.051034] cf9e5c00 c09e87aa fffc6e23 c0c4bd65 d0ce7ecc cf9e5844 d0ce7ecc d0ce7ecc > 2011 Jul 29 07:19:23 kernel: [ 66.051034] cf9e5c00 cf9e5800 d051a600 cf9e5c94 c09eace0 cffdbfe0 00000000 fffffffe > 2011 Jul 29 07:19:23 kernel: [ 66.051034] d0ce7f9c c061fe74 cffdbe60 d051a620 d051a600 d0ce7f9c c09e9f8c d051a600 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Call Trace: > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09e87aa>] ? xen_netbk_unmap_frontend_rings+0xbf/0xd3 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0c4bd65>] ? netdev_run_todo+0x1b7/0x1cc > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09eace0>] ? xenvif_disconnect+0xd0/0xe4 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061fe74>] ? xenbus_rm+0x37/0x3e > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09e9f8c>] ? netback_remove+0x40/0x5d > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c062075d>] ? xenbus_dev_remove+0x2c/0x3d > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06620e6>] ? 
__device_release_driver+0x42/0x79 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06621ac>] ? device_release_driver+0xf/0x17 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0661818>] ? bus_remove_device+0x75/0x84 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0660693>] ? device_del+0xe6/0x125 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06606da>] ? device_unregister+0x8/0x10 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06205f0>] ? xenbus_dev_changed+0x71/0x129 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0405394>] ? check_events+0x8/0xc > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061f711>] ? xenwatch_thread+0xeb/0x113 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c044792c>] ? wake_up_bit+0x53/0x53 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061f626>] ? xenbus_thread+0x1cc/0x1cc > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0447616>] ? kthread+0x63/0x68 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c04475b3>] ? kthread_worker_fn+0x122/0x122 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0e0f036>] ? kernel_thread_helper+0x6/0x10 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] Code: c1 00 00 00 01 89 f0 e8 a1 ff ff ff 81 6b 08 00 10 00 00 eb 02 31 db 89 d8 5b 5e c3 53 89 c3 8b 40 04 e8 9b ff ff ff 39 d8 74 04 <0f> 0b eb fe 5b e9 73 95 00 00 57 89 d7 56 31 f6 53 89 c3 eb 09 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP: [<c0494bff>] free_vm_area+0xf/0x19 SS:ESP 0069:d0ce7eb4 > 2011 Jul 29 07:19:23 kernel: [ 66.051034] ---[ end trace b47a8d30fa29735c ]--- > 2011 Jul 29 07:19:23 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/qdisk/1/51714 > 2011 Jul 29 07:19:23 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/qdisk/1/51713_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
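For reference, the debug information being asked for is gathered from dom0 roughly like this, once the timeouts have been reproduced:

    xl dmesg            # dump the hypervisor console ring
    xl debug-keys g     # ask Xen to dump grant table usage
    xl dmesg            # the debug-key output lands on the hypervisor console, so read it back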
Konrad Rzeszutek Wilk
2011-Aug-09 16:35 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Wed, Aug 03, 2011 at 11:28:41AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:
> > I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with
> > the vga-support patch backported). I can't get my DomU's to work due to
> > the phy disks and vifs timing out in DomU and looking through my logs
> > this morning I'm getting a consistent kernel bug report with xen
> > mentioned at the top of the stack trace and vifdisconnect mentioned on
>
> Yikes! Ian any ideas what to try?

Actually, the patch that Stefano posted might be worth trying. See the attached file.
Anthony Wright
2011-Aug-19 10:22 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote:> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >> the vga-support patch backported). I can''t get my DomU''s to work due to >> the phy disks and vifs timing out in DomU and looking through my logs >> this morning I''m getting a consistent kernel bug report with xen >> mentioned at the top of the stack trace and vifdisconnect mentioned on > Yikes! Ian any ideas what to try? > > Anthony, can you compile the kernel with debug=y and when this happens > see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > should dump the grants in use.. that might help a bit.I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other config values appeared at this point, and I took defaults for them). The output from /var/log/messages & ''xl dmesg'' is attached. There was no output from ''xl debug-keys g''. I''m going to try 3.0.3 for monday.>> 2011 Jul 29 07:18:50 kernel: [ 33.213500] vif vif-1-0: vif1.0: failed to map tx ring. err=-12 status=-1 >> 2011 Jul 29 07:18:50 kernel: [ 33.213516] vif vif-1-0: 1 mapping shared-frames 768/769 port 12 >> 2011 Jul 29 07:19:01 /usr/sbin/cron[3719]: (root) CMD (/usr/monitor/monitor) >> 2011 Jul 29 07:19:23 kernel: [ 66.043164] vif vif-1-0: 2 reading script >> 2011 Jul 29 07:19:23 kernel: [ 66.045984] br-internal: port 1(vif1.0) entering disabled state >> 2011 Jul 29 07:19:23 kernel: [ 66.046044] libfcoe_device_notification: NETDEV_UNREGISTER vif1.0 >> 2011 Jul 29 07:19:23 kernel: [ 66.046082] br-internal: port 1(vif1.0) entering disabled state >> 2011 Jul 29 07:19:23 kernel: [ 66.046279] br-internal: mixed no checksumming and other settings. >> 2011 Jul 29 07:19:23 kernel: [ 66.050077] ------------[ cut here ]------------ >> 2011 Jul 29 07:19:23 kernel: [ 66.050858] kernel BUG at mm/vmalloc.c:2164! >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] invalid opcode: 0000 [#1] SMP >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Modules linked in: >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Pid: 20, comm: xenwatch Not tainted 3.0.0 #1 MSI MS-7309/MS-7309 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP: 0061:[<c0494bff>] EFLAGS: 00010207 CPU: 1 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP is at free_vm_area+0xf/0x19 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] EAX: 00000000 EBX: d0799700 ECX: 00000018 EDX: 00000000 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] ESI: cf9e5800 EDI: d051a600 EBP: cf9e5c00 ESP: d0ce7eb4 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Process xenwatch (pid: 20, ti=d0ce6000 task=d0c55140 task.ti=d0ce6000) >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Stack: >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] cf9e5c00 c09e87aa fffc6e23 c0c4bd65 d0ce7ecc cf9e5844 d0ce7ecc d0ce7ecc >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] cf9e5c00 cf9e5800 d051a600 cf9e5c94 c09eace0 cffdbfe0 00000000 fffffffe >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] d0ce7f9c c061fe74 cffdbe60 d051a620 d051a600 d0ce7f9c c09e9f8c d051a600 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Call Trace: >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09e87aa>] ? xen_netbk_unmap_frontend_rings+0xbf/0xd3 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0c4bd65>] ? netdev_run_todo+0x1b7/0x1cc >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09eace0>] ? 
xenvif_disconnect+0xd0/0xe4 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061fe74>] ? xenbus_rm+0x37/0x3e >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c09e9f8c>] ? netback_remove+0x40/0x5d >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c062075d>] ? xenbus_dev_remove+0x2c/0x3d >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06620e6>] ? __device_release_driver+0x42/0x79 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06621ac>] ? device_release_driver+0xf/0x17 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0661818>] ? bus_remove_device+0x75/0x84 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0660693>] ? device_del+0xe6/0x125 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06606da>] ? device_unregister+0x8/0x10 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c06205f0>] ? xenbus_dev_changed+0x71/0x129 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0405394>] ? check_events+0x8/0xc >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061f711>] ? xenwatch_thread+0xeb/0x113 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c044792c>] ? wake_up_bit+0x53/0x53 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c061f626>] ? xenbus_thread+0x1cc/0x1cc >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0447616>] ? kthread+0x63/0x68 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c04475b3>] ? kthread_worker_fn+0x122/0x122 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] [<c0e0f036>] ? kernel_thread_helper+0x6/0x10 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] Code: c1 00 00 00 01 89 f0 e8 a1 ff ff ff 81 6b 08 00 10 00 00 eb 02 31 db 89 d8 5b 5e c3 53 89 c3 8b 40 04 e8 9b ff ff ff 39 d8 74 04 <0f> 0b eb fe 5b e9 73 95 00 00 57 89 d7 56 31 f6 53 89 c3 eb 09 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] EIP: [<c0494bff>] free_vm_area+0xf/0x19 SS:ESP 0069:d0ce7eb4 >> 2011 Jul 29 07:19:23 kernel: [ 66.051034] ---[ end trace b47a8d30fa29735c ]--- >> 2011 Jul 29 07:19:23 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/qdisk/1/51714 >> 2011 Jul 29 07:19:23 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/qdisk/1/51713_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-19 12:56 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote:> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > > On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > >> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > >> the vga-support patch backported). I can''t get my DomU''s to work due to > >> the phy disks and vifs timing out in DomU and looking through my logs > >> this morning I''m getting a consistent kernel bug report with xen > >> mentioned at the top of the stack trace and vifdisconnect mentioned on > > Yikes! Ian any ideas what to try? > > > > Anthony, can you compile the kernel with debug=y and when this happens > > see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > > should dump the grants in use.. that might help a bit. > I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > config values appeared at this point, and I took defaults for them). > > The output from /var/log/messages & ''xl dmesg'' is attached. There was no > output from ''xl debug-keys g''.Ok, so I am hitting this too - I was hoping that the patch from Stefano would have fixed the issue, but sadly it did not. Let me (I am traveling right now) see if I can come up with an internim solution until Ian comes with the right fix. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-22 11:02 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >>>> the vga-support patch backported). I can''t get my DomU''s to work due to >>>> the phy disks and vifs timing out in DomU and looking through my logs >>>> this morning I''m getting a consistent kernel bug report with xen >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on >>> Yikes! Ian any ideas what to try? >>> >>> Anthony, can you compile the kernel with debug=y and when this happens >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which >>> should dump the grants in use.. that might help a bit. >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other >> config values appeared at this point, and I took defaults for them). >> >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no >> output from ''xl debug-keys g''. > Ok, so I am hitting this too - I was hoping that the patch from Stefano > would have fixed the issue, but sadly it did not. > > Let me (I am traveling right now) see if I can come up with an internim > solution until Ian comes with the right fix. >I''ve just tested with a vanilla 3.0.3 kernel and I get exactly the same result. Anthony. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-25 20:31 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >>>> the vga-support patch backported). I can''t get my DomU''s to work due to >>>> the phy disks and vifs timing out in DomU and looking through my logs >>>> this morning I''m getting a consistent kernel bug report with xen >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on >>> Yikes! Ian any ideas what to try? >>> >>> Anthony, can you compile the kernel with debug=y and when this happens >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which >>> should dump the grants in use.. that might help a bit. >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other >> config values appeared at this point, and I took defaults for them). >> >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no >> output from ''xl debug-keys g''. > Ok, so I am hitting this too - I was hoping that the patch from Stefano > would have fixed the issue, but sadly it did not. > > Let me (I am traveling right now) see if I can come up with an internim > solution until Ian comes with the right fix. >Hi Konrad - any progress on this - it''s a bit of a show stopper for me. One thing to add is that I''ve got a qemu-dm process running for a para-virtualised DomU, which I''m told shouldn''t happen. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-25 21:11 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >>>> the vga-support patch backported). I can''t get my DomU''s to work due to >>>> the phy disks and vifs timing out in DomU and looking through my logs >>>> this morning I''m getting a consistent kernel bug report with xen >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on >>> Yikes! Ian any ideas what to try? >>> >>> Anthony, can you compile the kernel with debug=y and when this happens >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which >>> should dump the grants in use.. that might help a bit. >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other >> config values appeared at this point, and I took defaults for them). >> >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no >> output from ''xl debug-keys g''. > Ok, so I am hitting this too - I was hoping that the patch from Stefano > would have fixed the issue, but sadly it did not. > > Let me (I am traveling right now) see if I can come up with an internim > solution until Ian comes with the right fix. >On different hardware with the same software I''m also getting problems starting DomUs, but this time the error is different. I''ve attached a copy of the xl console output, but basically the server hang at "Mount-cache hash table entries: 512". Again the VM is paravirtualised, and again I get a qemu-dm process for it. The references to this message are normally related to memory issues, but the server has only 1000M of ram, so can''t see it causing too much of a problem. Is this related to the other problems I''m seeing or completely separate? thanks, Anthony _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Sander Eikelenboom
2011-Aug-26 07:10 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
Hello Anthony, Perhaps you could try running with xend instead of the xl toolstack ? Since you have also changed the hypervisor version to 4.1.1, i think you were previously using xend instead of xl ? So in theory it could also be a problem in the xl toolstack causing the extra qemu processes when building the domain. -- Sander Thursday, August 25, 2011, 11:11:44 PM, you wrote:> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: >> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: >>> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: >>>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >>>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >>>>> the vga-support patch backported). I can''t get my DomU''s to work due to >>>>> the phy disks and vifs timing out in DomU and looking through my logs >>>>> this morning I''m getting a consistent kernel bug report with xen >>>>> mentioned at the top of the stack trace and vifdisconnect mentioned on >>>> Yikes! Ian any ideas what to try? >>>> >>>> Anthony, can you compile the kernel with debug=y and when this happens >>>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which >>>> should dump the grants in use.. that might help a bit. >>> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other >>> config values appeared at this point, and I took defaults for them). >>> >>> The output from /var/log/messages & ''xl dmesg'' is attached. There was no >>> output from ''xl debug-keys g''. >> Ok, so I am hitting this too - I was hoping that the patch from Stefano >> would have fixed the issue, but sadly it did not. >> >> Let me (I am traveling right now) see if I can come up with an internim >> solution until Ian comes with the right fix. >> > On different hardware with the same software I''m also getting problems > starting DomUs, but this time the error is different. I''ve attached a > copy of the xl console output, but basically the server hang at > "Mount-cache hash table entries: 512". Again the VM is paravirtualised, > and again I get a qemu-dm process for it.> The references to this message are normally related to memory issues, > but the server has only 1000M of ram, so can''t see it causing too much > of a problem.> Is this related to the other problems I''m seeing or completely separate?> thanks,> Anthony-- Best regards, Sander mailto:linux@eikelenboom.it _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2011-Aug-26 11:23 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, Aug 26, 2011 at 09:10:55AM +0200, Sander Eikelenboom wrote:> Hello Anthony, > > Perhaps you could try running with xend instead of the xl toolstack ? > Since you have also changed the hypervisor version to 4.1.1, i think you were previously using xend instead of xl ? > > So in theory it could also be a problem in the xl toolstack causing the extra qemu processes when building the domain. >Are you using pvfb for the domU? If yes, pvfb needs qemu-dm for the VNC server.. -- Pasi> > Sander > > Thursday, August 25, 2011, 11:11:44 PM, you wrote: > > > On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: > >> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: > >>> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > >>>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > >>>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > >>>>> the vga-support patch backported). I can''t get my DomU''s to work due to > >>>>> the phy disks and vifs timing out in DomU and looking through my logs > >>>>> this morning I''m getting a consistent kernel bug report with xen > >>>>> mentioned at the top of the stack trace and vifdisconnect mentioned on > >>>> Yikes! Ian any ideas what to try? > >>>> > >>>> Anthony, can you compile the kernel with debug=y and when this happens > >>>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > >>>> should dump the grants in use.. that might help a bit. > >>> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > >>> config values appeared at this point, and I took defaults for them). > >>> > >>> The output from /var/log/messages & ''xl dmesg'' is attached. There was no > >>> output from ''xl debug-keys g''. > >> Ok, so I am hitting this too - I was hoping that the patch from Stefano > >> would have fixed the issue, but sadly it did not. > >> > >> Let me (I am traveling right now) see if I can come up with an internim > >> solution until Ian comes with the right fix. > >> > > On different hardware with the same software I''m also getting problems > > starting DomUs, but this time the error is different. I''ve attached a > > copy of the xl console output, but basically the server hang at > > "Mount-cache hash table entries: 512". Again the VM is paravirtualised, > > and again I get a qemu-dm process for it. > > > The references to this message are normally related to memory issues, > > but the server has only 1000M of ram, so can''t see it causing too much > > of a problem. > > > Is this related to the other problems I''m seeing or completely separate? > > > thanks, > > > Anthony > > > > > > -- > Best regards, > Sander mailto:linux@eikelenboom.it > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-26 12:15 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 26/08/2011 13:16, Stefano Stabellini wrote:> On Thu, 25 Aug 2011, Anthony Wright wrote: >> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: >>> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: >>>> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: >>>>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: >>>>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with >>>>>> the vga-support patch backported). I can''t get my DomU''s to work due to >>>>>> the phy disks and vifs timing out in DomU and looking through my logs >>>>>> this morning I''m getting a consistent kernel bug report with xen >>>>>> mentioned at the top of the stack trace and vifdisconnect mentioned on >>>>> Yikes! Ian any ideas what to try? >>>>> >>>>> Anthony, can you compile the kernel with debug=y and when this happens >>>>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which >>>>> should dump the grants in use.. that might help a bit. >>>> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other >>>> config values appeared at this point, and I took defaults for them). >>>> >>>> The output from /var/log/messages & ''xl dmesg'' is attached. There was no >>>> output from ''xl debug-keys g''. >>> Ok, so I am hitting this too - I was hoping that the patch from Stefano >>> would have fixed the issue, but sadly it did not. >>> >>> Let me (I am traveling right now) see if I can come up with an internim >>> solution until Ian comes with the right fix. >>> >> On different hardware with the same software I''m also getting problems >> starting DomUs, but this time the error is different. I''ve attached a >> copy of the xl console output, but basically the server hang at >> "Mount-cache hash table entries: 512". Again the VM is paravirtualised, >> and again I get a qemu-dm process for it. >> >> The references to this message are normally related to memory issues, >> but the server has only 1000M of ram, so can''t see it causing too much >> of a problem. >> >> Is this related to the other problems I''m seeing or completely separate? > Could you please post your VM config file?Attached are two VM config files. The file xen-config-A is the xen server that fails at the Mount-cache line. The file xen-config-B is the xen server that timeout attaching to some of the xvds and the vif. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Aug-26 12:16 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 25 Aug 2011, Anthony Wright wrote:> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > >>>> the vga-support patch backported). I can''t get my DomU''s to work due to > >>>> the phy disks and vifs timing out in DomU and looking through my logs > >>>> this morning I''m getting a consistent kernel bug report with xen > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on > >>> Yikes! Ian any ideas what to try? > >>> > >>> Anthony, can you compile the kernel with debug=y and when this happens > >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > >>> should dump the grants in use.. that might help a bit. > >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > >> config values appeared at this point, and I took defaults for them). > >> > >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no > >> output from ''xl debug-keys g''. > > Ok, so I am hitting this too - I was hoping that the patch from Stefano > > would have fixed the issue, but sadly it did not. > > > > Let me (I am traveling right now) see if I can come up with an internim > > solution until Ian comes with the right fix. > > > On different hardware with the same software I''m also getting problems > starting DomUs, but this time the error is different. I''ve attached a > copy of the xl console output, but basically the server hang at > "Mount-cache hash table entries: 512". Again the VM is paravirtualised, > and again I get a qemu-dm process for it. > > The references to this message are normally related to memory issues, > but the server has only 1000M of ram, so can''t see it causing too much > of a problem. > > Is this related to the other problems I''m seeing or completely separate?Could you please post your VM config file? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Aug-26 12:32 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, 26 Aug 2011, Anthony Wright wrote:> On 26/08/2011 13:16, Stefano Stabellini wrote: > > On Thu, 25 Aug 2011, Anthony Wright wrote: > >> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: > >>> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: > >>>> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > >>>>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > >>>>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > >>>>>> the vga-support patch backported). I can''t get my DomU''s to work due to > >>>>>> the phy disks and vifs timing out in DomU and looking through my logs > >>>>>> this morning I''m getting a consistent kernel bug report with xen > >>>>>> mentioned at the top of the stack trace and vifdisconnect mentioned on > >>>>> Yikes! Ian any ideas what to try? > >>>>> > >>>>> Anthony, can you compile the kernel with debug=y and when this happens > >>>>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > >>>>> should dump the grants in use.. that might help a bit. > >>>> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > >>>> config values appeared at this point, and I took defaults for them). > >>>> > >>>> The output from /var/log/messages & ''xl dmesg'' is attached. There was no > >>>> output from ''xl debug-keys g''. > >>> Ok, so I am hitting this too - I was hoping that the patch from Stefano > >>> would have fixed the issue, but sadly it did not. > >>> > >>> Let me (I am traveling right now) see if I can come up with an internim > >>> solution until Ian comes with the right fix. > >>> > >> On different hardware with the same software I''m also getting problems > >> starting DomUs, but this time the error is different. I''ve attached a > >> copy of the xl console output, but basically the server hang at > >> "Mount-cache hash table entries: 512". Again the VM is paravirtualised, > >> and again I get a qemu-dm process for it. > >> > >> The references to this message are normally related to memory issues, > >> but the server has only 1000M of ram, so can''t see it causing too much > >> of a problem. > >> > >> Is this related to the other problems I''m seeing or completely separate? > > Could you please post your VM config file? > Attached are two VM config files. The file xen-config-A is the xen > server that fails at the Mount-cache line. The file xen-config-B is the > xen server that timeout attaching to some of the xvds and the vif. >can you try to use losetup to setup a loop device for each of the tap:aio files you have and then specify phy:/dev/loopN in the config file rather than tap:aio? For example I mean: losetup /dev/loop0 /workspace/agent/appliances/XenFileServer-3.20/rootfs then in the config file: phy:/dev/loop0,xvda1,r _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-26 14:26 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote:> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > >>>> the vga-support patch backported). I can''t get my DomU''s to work due to > >>>> the phy disks and vifs timing out in DomU and looking through my logs > >>>> this morning I''m getting a consistent kernel bug report with xen > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on > >>> Yikes! Ian any ideas what to try? > >>> > >>> Anthony, can you compile the kernel with debug=y and when this happens > >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > >>> should dump the grants in use.. that might help a bit. > >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > >> config values appeared at this point, and I took defaults for them). > >> > >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no > >> output from ''xl debug-keys g''. > > Ok, so I am hitting this too - I was hoping that the patch from Stefano > > would have fixed the issue, but sadly it did not. > > > > Let me (I am traveling right now) see if I can come up with an internim > > solution until Ian comes with the right fix. > > > Hi Konrad - any progress on this - it''s a bit of a show stopper for me.What is interesting is that it happens only with 32-bit guests and with not-so fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard (with what kind of processor?). I''ve a 64-bit hypervisor - not sure if you are using a 32-bit or 64-bit. I hadn''t tried to reproduce this on the Atom D510 with a 64-bit Dom0. But I was wondering if you had this setup before - with a 64-bit dom0? Or is that really not an option with your CPU? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-26 14:44 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, Aug 26, 2011 at 10:26:06AM -0400, Konrad Rzeszutek Wilk wrote:> On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote: > > On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote: > > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote: > > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote: > > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote: > > >>>> I''ve just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with > > >>>> the vga-support patch backported). I can''t get my DomU''s to work due to > > >>>> the phy disks and vifs timing out in DomU and looking through my logs > > >>>> this morning I''m getting a consistent kernel bug report with xen > > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on > > >>> Yikes! Ian any ideas what to try? > > >>> > > >>> Anthony, can you compile the kernel with debug=y and when this happens > > >>> see what ''xl dmesg'' gives? Also there is also the ''xl debug-keys g'' which > > >>> should dump the grants in use.. that might help a bit. > > >> I''ve compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other > > >> config values appeared at this point, and I took defaults for them). > > >> > > >> The output from /var/log/messages & ''xl dmesg'' is attached. There was no > > >> output from ''xl debug-keys g''. > > > Ok, so I am hitting this too - I was hoping that the patch from Stefano > > > would have fixed the issue, but sadly it did not. > > > > > > Let me (I am traveling right now) see if I can come up with an internim > > > solution until Ian comes with the right fix. > > > > > Hi Konrad - any progress on this - it''s a bit of a show stopper for me. > > What is interesting is that it happens only with 32-bit guests and with > not-so fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard > (with what kind of processor?). I''ve a 64-bit hypervisor - not sure if you > are using a 32-bit or 64-bit. > > I hadn''t tried to reproduce this on the Atom D510 with a 64-bit Dom0. > But I was wondering if you had this setup before - with a 64-bit dom0? > Or is that really not an option with your CPU?So while I am still looking at the hypervisor code to figure out why it would give me: (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 I''ve cobbled this patch^H^H^Hhack to retry the transaction to see if this is a tempory issue (race) or really - somehow that L1 PTE is gone. If you could, can you try it out and see if the errors that are spit are repeated - mainly the "Could not find L1 PTE". You will need to run the hypervisor with "loglvl=all" to get that information. to compile the hypervisor with debug=y to get that diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index fd00f25..7bee981 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1607,7 +1607,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, struct gnttab_map_grant_ref op; struct xen_netif_tx_sring *txs; struct xen_netif_rx_sring *rxs; - + int retry = 3; int err = -ENOMEM; vif->tx_comms_area = alloc_vm_area(PAGE_SIZE); @@ -1620,7 +1620,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, gnttab_set_map_op(&op, (unsigned long)vif->tx_comms_area->addr, GNTMAP_host_map, tx_ring_ref, vif->domid); - + op.status = 0; +retry_tx: if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); @@ -1628,6 +1629,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, netdev_warn(vif->dev, "failed to map tx ring. 
err=%d status=%d\n", err, op.status); + if (retry-- > 0) + goto retry_tx; err = op.status; goto err; } @@ -1641,6 +1644,9 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, gnttab_set_map_op(&op, (unsigned long)vif->rx_comms_area->addr, GNTMAP_host_map, rx_ring_ref, vif->domid); + retry = 3; + op.status = 0; +retry_rx: if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); @@ -1648,6 +1654,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, netdev_warn(vif->dev, "failed to map rx ring. err=%d status=%d\n", err, op.status); + if (retry-- > 0) + goto retry_rx; err = op.status; goto err; }> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-29 12:13 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 26/08/2011 15:44, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 26, 2011 at 10:26:06AM -0400, Konrad Rzeszutek Wilk wrote: >> What is interesting is that it happens only with 32-bit guests and with >> not-so fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard >> (with what kind of processor?). I've a 64-bit hypervisor - not sure if you >> are using a 32-bit or 64-bit. >> >> I hadn't tried to reproduce this on the Atom D510 with a 64-bit Dom0. >> But I was wondering if you had this setup before - with a 64-bit dom0? >> Or is that really not an option with your CPU? > So while I am still looking at the hypervisor code to figure out why > it would give me: > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > > I've cobbled this patch^H^H^Hhack to retry the transaction to see if this is > a temporary issue (race) or really - somehow that L1 PTE is gone. > > If you could, can you try it out and see if the errors that are spit > are repeated - mainly the "Could not find L1 PTE". You will need to > run the hypervisor with "loglvl=all" to get that information. > > to compile the hypervisor with debug=y to get that

I built xen with debug on, I think (make debug=y world ; make debug=y install). I've taken linux 3.0.3 and added the patch, which seems to have compiled correctly. The DomUs continue to fail as before. Attached are a number of logs:

dmesg.0.log - After Dom0 had booted, but before any DomUs had started
dmesg.1.log - After the first DomU had started (subsequent DomUs generated no further messages)
domU-1.log - the console log of the first time the DomU was started. This DomU fails to run, and generates a kernel bug report
domU-2.log - the console log of the second time the DomU was started. This DomU also fails to run, but with different output. It does not generate a kernel bug report
messages - the relevant /var/log/messages output

Anthony.

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Wright
2011-Aug-29 17:33 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 26/08/2011 15:26, Konrad Rzeszutek Wilk wrote:> On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote: >> Hi Konrad - any progress on this - it's a bit of a show stopper for me. > What is interesting is that it happens only with 32-bit guests and with > not-so fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard > (with what kind of processor?). I've a 64-bit hypervisor - not sure if you > are using a 32-bit or 64-bit. > > I hadn't tried to reproduce this on the Atom D510 with a 64-bit Dom0. > But I was wondering if you had this setup before - with a 64-bit dom0? > Or is that really not an option with your CPU?

The processor for the system I'm having problems with is an "AMD Athlon II X2 250"; I've attached the cpuinfo output. It's a 64 bit processor that supports HVM. I run everything as 32 bit and use paravirtualisation, so that I can work with a wide range of systems, so we have a 32 bit Dom0 running 32 bit DomUs using paravirtualisation on a 64 bit processor that supports HVM.

As an experiment I've spent my day building a 64 bit dom0 kernel, which I then discovered also needs a 64 bit Xen. I built a 64 bit Xen, but the 32 bit xen tools don't seem to want to work with the 64 bit xen, and at that point things started going wrong, since trying to switch the entire system to 64 bit is going to be very painful (all the xen tools, all the libraries and all the other applications that rely on those libraries).

Is there a simple way to get the 32 bit xen tools to control a 64 bit xen and dom0 kernel, or am I stuck?

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Vrabel
2011-Aug-31 16:58 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote:> > So while I am still looking at the hypervisor code to figure out why > it would give me [when trying to map a grant page]: > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

It is failing in guest_map_l1e() because the page for the vmalloc'd virtual address PTEs is not present. The test that fails is:

(l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT

I think this is because the GNTTABOP_map_grant_ref hypercall is done when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into init_mm, so when Xen looks in the page tables it doesn't find the entries because they're not there yet.

Putting a call to vmalloc_sync_all() after alloc_vm_area() and before the hypercall makes it work for me. Classic Xen kernels used to have such a call.

This presumably works on some systems/configurations and not others depending on what else is using vmalloc(), i.e. if another kernel thread (?) calls vmalloc() etc. then there will be a page for the vmalloc area PTEs and it will work.

I'll try and post a patch tomorrow.

Thanks to Ian Campbell for pointing me in the right direction.

David

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
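[For readers following the thread: the shape of the workaround David describes is roughly the sketch below - sync the freshly created vmalloc PTEs into every page table before handing the address to the grant-map hypercall. This is only an illustration, not the patch David later posted; the helper name map_frontend_ring() is made up here, and it assumes the 3.0-era single-argument alloc_vm_area().]

#include <linux/vmalloc.h>
#include <xen/grant_table.h>
#include <asm/xen/hypercall.h>

/* Illustrative only: map one frontend ring page into a vmalloc'd area. */
static int map_frontend_ring(domid_t domid, grant_ref_t ring_ref,
			     struct vm_struct **area)
{
	struct gnttab_map_grant_ref op;

	*area = alloc_vm_area(PAGE_SIZE);	/* 3.0-era single-argument form */
	if (*area == NULL)
		return -ENOMEM;

	/*
	 * alloc_vm_area() only instantiates the PTEs in init_mm.  Xen cannot
	 * take a kernel fault on our behalf while it walks the page tables
	 * during the hypercall below, so make the PTEs visible in all page
	 * tables first.  This is the call being discussed in this thread.
	 */
	vmalloc_sync_all();

	gnttab_set_map_op(&op, (unsigned long)(*area)->addr,
			  GNTMAP_host_map, ring_ref, domid);

	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
		BUG();

	return op.status;
}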
Konrad Rzeszutek Wilk
2011-Aug-31 17:07 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote:> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: > > > > So while I am still looking at the hypervisor code to figure out why > > it would give me [when trying to map a grant page]: > > > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > > It is failing in guest_map_l1e() because the page for the vmalloc''d > virtual address PTEs is not present. > > The test that fails is: > > (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT > > I think this is because the GNTTABOP_map_grant_ref hypercall is done > when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into > init_mm so when Xen looks in the page tables it doesn''t find the entries > because they''re not there yet. > > Putting a call to vmalloc_sync_all() after create_vm_area() and before > the hypercall makes it work for me. Classic Xen kernels used to have > such a call.That sounds quite reasonable.> > This presumably works on some systems/configuration and not others > depending on what else is using vmalloc(). i.e., if another kernel > thread (?) calls vmalloc() etc. then there will be a page for vmalloc > area PTEs and it will work. > > I''ll try and post a patch tomorrow. > > Thanks to Ian Campbell for pointing me in the right direction.Great! Thanks for hunting this one down.> > David_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-01 07:42 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote:> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: > > On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: > > > > > > So while I am still looking at the hypervisor code to figure out why > > > it would give me [when trying to map a grant page]: > > > > > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > > > > It is failing in guest_map_l1e() because the page for the vmalloc''d > > virtual address PTEs is not present. > > > > The test that fails is: > > > > (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT > > > > I think this is because the GNTTABOP_map_grant_ref hypercall is done > > when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into > > init_mm so when Xen looks in the page tables it doesn''t find the entries > > because they''re not there yet. > > > > Putting a call to vmalloc_sync_all() after create_vm_area() and before > > the hypercall makes it work for me. Classic Xen kernels used to have > > such a call. > > That sounds quite reasonable.I was wondering why upstream was missing the vmalloc_sync_all() in alloc_vm_area() since the out-of-tree kernels did have it and the function was added by us. I found this: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a commit ef691947d8a3d479e67652312783aedcf629320a Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Date: Wed Dec 1 15:45:48 2010 -0800 vmalloc: remove vmalloc_sync_all() from alloc_vm_area() There''s no need for it: it will get faulted into the current pagetable as needed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> The flaw in the reasoning here is that you cannot take a kernel fault while processing a hypercall, so hypercall arguments must have been faulted in beforehand and that is what the sync_all was for. It''s probably fair to say that the Xen specific caller should take care of that Xen-specific requirement rather than pushing it into common code. On the other hand Xen is the only user and creating a Xen specific helper/wrapper seems a bit pointless. Ian.> > > > This presumably works on some systems/configuration and not others > > depending on what else is using vmalloc(). i.e., if another kernel > > thread (?) calls vmalloc() etc. then there will be a page for vmalloc > > area PTEs and it will work. > > > > I''ll try and post a patch tomorrow. > > > > Thanks to Ian Campbell for pointing me in the right direction. > > Great! Thanks for hunting this one down. > > > > David_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Sep-01 14:23 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote:> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote: > > On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: > > > On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: > > > > > > > > So while I am still looking at the hypervisor code to figure out why > > > > it would give me [when trying to map a grant page]: > > > > > > > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > > > > > > It is failing in guest_map_l1e() because the page for the vmalloc''d > > > virtual address PTEs is not present. > > > > > > The test that fails is: > > > > > > (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT > > > > > > I think this is because the GNTTABOP_map_grant_ref hypercall is done > > > when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into > > > init_mm so when Xen looks in the page tables it doesn''t find the entries > > > because they''re not there yet. > > > > > > Putting a call to vmalloc_sync_all() after create_vm_area() and before > > > the hypercall makes it work for me. Classic Xen kernels used to have > > > such a call. > > > > That sounds quite reasonable. > > I was wondering why upstream was missing the vmalloc_sync_all() in > alloc_vm_area() since the out-of-tree kernels did have it and the > function was added by us. I found this: > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a > > commit ef691947d8a3d479e67652312783aedcf629320a > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > Date: Wed Dec 1 15:45:48 2010 -0800 > > vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > There''s no need for it: it will get faulted into the current pagetable > as needed. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > The flaw in the reasoning here is that you cannot take a kernel fault > while processing a hypercall, so hypercall arguments must have been > faulted in beforehand and that is what the sync_all was for. > > It''s probably fair to say that the Xen specific caller should take care > of that Xen-specific requirement rather than pushing it into common > code. On the other hand Xen is the only user and creating a Xen specific > helper/wrapper seems a bit pointless.Perhaps then doing the vmalloc_sync_all() (or are more precise one: vmalloc_sync_one) should be employed in the netback code then? And obviously guarded by the CONFIG_HIGHMEM case? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Vrabel
2011-Sep-01 15:12 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 01/09/11 15:23, Konrad Rzeszutek Wilk wrote:> On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote: >> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote: >>> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: >>>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: >>>>> >>>>> So while I am still looking at the hypervisor code to figure out why >>>>> it would give me [when trying to map a grant page]: >>>>> >>>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 >>>> >>>> It is failing in guest_map_l1e() because the page for the vmalloc''d >>>> virtual address PTEs is not present. >>>> >>>> The test that fails is: >>>> >>>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT >>>> >>>> I think this is because the GNTTABOP_map_grant_ref hypercall is done >>>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into >>>> init_mm so when Xen looks in the page tables it doesn''t find the entries >>>> because they''re not there yet. >>>> >>>> Putting a call to vmalloc_sync_all() after create_vm_area() and before >>>> the hypercall makes it work for me. Classic Xen kernels used to have >>>> such a call. >>> >>> That sounds quite reasonable. >> >> I was wondering why upstream was missing the vmalloc_sync_all() in >> alloc_vm_area() since the out-of-tree kernels did have it and the >> function was added by us. I found this: >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a >> >> commit ef691947d8a3d479e67652312783aedcf629320a >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> Date: Wed Dec 1 15:45:48 2010 -0800 >> >> vmalloc: remove vmalloc_sync_all() from alloc_vm_area() >> >> There''s no need for it: it will get faulted into the current pagetable >> as needed. >> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> >> The flaw in the reasoning here is that you cannot take a kernel fault >> while processing a hypercall, so hypercall arguments must have been >> faulted in beforehand and that is what the sync_all was for. >> >> It''s probably fair to say that the Xen specific caller should take care >> of that Xen-specific requirement rather than pushing it into common >> code. On the other hand Xen is the only user and creating a Xen specific >> helper/wrapper seems a bit pointless. > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > vmalloc_sync_one) should be employed in the netback code then? > > And obviously guarded by the CONFIG_HIGHMEM case?Perhaps. But I think the correct thing to do initially is revert the change and then look at possible improvements. Particularly as the fix needs to be a backported to stable. David _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
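[Concretely, the revert David proposes would put the sync back inside alloc_vm_area() itself, so every caller gets pre-faulted PTEs. The sketch below is paraphrased from memory of mm/vmalloc.c of that era, so details such as the callback helper may differ from the actual revert.]

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* apply_to_page_range() does all the hard work of allocating the PTEs. */
static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
{
	return 0;
}

struct vm_struct *alloc_vm_area(size_t size)
{
	struct vm_struct *area;

	area = get_vm_area_caller(size, VM_IOREMAP,
				  __builtin_return_address(0));
	if (area == NULL)
		return NULL;

	/* Construct page tables for this region and map it into init_mm. */
	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
				size, f, NULL)) {
		free_vm_area(area);
		return NULL;
	}

	/*
	 * Re-added by the revert: make the new PTEs visible in all page
	 * tables, not just init_mm, before the area is used as a hypercall
	 * argument (commit ef691947d8a3 removed exactly this call).
	 */
	vmalloc_sync_all();

	return area;
}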
Ian Campbell
2011-Sep-01 15:12 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 15:23 +0100, Konrad Rzeszutek Wilk wrote:> On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote: > > On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote: > > > On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: > > > > On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: > > > > > > > > > > So while I am still looking at the hypervisor code to figure out why > > > > > it would give me [when trying to map a grant page]: > > > > > > > > > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > > > > > > > > It is failing in guest_map_l1e() because the page for the vmalloc''d > > > > virtual address PTEs is not present. > > > > > > > > The test that fails is: > > > > > > > > (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT > > > > > > > > I think this is because the GNTTABOP_map_grant_ref hypercall is done > > > > when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into > > > > init_mm so when Xen looks in the page tables it doesn''t find the entries > > > > because they''re not there yet. > > > > > > > > Putting a call to vmalloc_sync_all() after create_vm_area() and before > > > > the hypercall makes it work for me. Classic Xen kernels used to have > > > > such a call. > > > > > > That sounds quite reasonable. > > > > I was wondering why upstream was missing the vmalloc_sync_all() in > > alloc_vm_area() since the out-of-tree kernels did have it and the > > function was added by us. I found this: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a > > > > commit ef691947d8a3d479e67652312783aedcf629320a > > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > Date: Wed Dec 1 15:45:48 2010 -0800 > > > > vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > > > There''s no need for it: it will get faulted into the current pagetable > > as needed. > > > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > > > The flaw in the reasoning here is that you cannot take a kernel fault > > while processing a hypercall, so hypercall arguments must have been > > faulted in beforehand and that is what the sync_all was for. > > > > It''s probably fair to say that the Xen specific caller should take care > > of that Xen-specific requirement rather than pushing it into common > > code. On the other hand Xen is the only user and creating a Xen specific > > helper/wrapper seems a bit pointless. > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > vmalloc_sync_one) should be employed in the netback code then?Not just netback but everywhere which uses this interface.> And obviously guarded by the CONFIG_HIGHMEM case?I don''t think this has anything to do with highmem, does it? It is potentially just as much of a problem on 64 bit for example. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Sep-01 15:37 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
> >> vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > >> > >> There''s no need for it: it will get faulted into the current pagetable > >> as needed. > >> > >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > >> > >> The flaw in the reasoning here is that you cannot take a kernel fault > >> while processing a hypercall, so hypercall arguments must have been > >> faulted in beforehand and that is what the sync_all was for. > >> > >> It''s probably fair to say that the Xen specific caller should take care > >> of that Xen-specific requirement rather than pushing it into common > >> code. On the other hand Xen is the only user and creating a Xen specific > >> helper/wrapper seems a bit pointless. > > > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > > vmalloc_sync_one) should be employed in the netback code then? > > > > And obviously guarded by the CONFIG_HIGHMEM case? > > Perhaps. But I think the correct thing to do initially is revert the > change and then look at possible improvements. Particularly as the fix > needs to be a backported to stable.I disagree. Ian pointed out properly that this a Xen requirment - and there is no reason for us to slow down non-Xen runs with vmalloc_sync_all plucked in a generic path. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Sep-01 15:38 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
> > > The flaw in the reasoning here is that you cannot take a kernel fault > > > while processing a hypercall, so hypercall arguments must have been > > > faulted in beforehand and that is what the sync_all was for. > > > > > > It's probably fair to say that the Xen specific caller should take care > > > of that Xen-specific requirement rather than pushing it into common > > > code. On the other hand Xen is the only user and creating a Xen specific > > > helper/wrapper seems a bit pointless. > > > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > > vmalloc_sync_one) should be employed in the netback code then? > > Not just netback but everywhere which uses this interface.

Which is for right now netback :-). But yes - wherever we use that we should follow it with some sort of vmalloc sync.

> > > And obviously guarded by the CONFIG_HIGHMEM case? > I don't think this has anything to do with highmem, does it? It is > potentially just as much of a problem on 64 bit for example.

You are right. I somehow had vmalloc == highmem equated, but that is bogus.

> > Ian.

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-01 15:43 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 16:37 +0100, Konrad Rzeszutek Wilk wrote:> > >> vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > >> > > >> There''s no need for it: it will get faulted into the current pagetable > > >> as needed. > > >> > > >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > >> > > >> The flaw in the reasoning here is that you cannot take a kernel fault > > >> while processing a hypercall, so hypercall arguments must have been > > >> faulted in beforehand and that is what the sync_all was for. > > >> > > >> It''s probably fair to say that the Xen specific caller should take care > > >> of that Xen-specific requirement rather than pushing it into common > > >> code. On the other hand Xen is the only user and creating a Xen specific > > >> helper/wrapper seems a bit pointless. > > > > > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > > > vmalloc_sync_one) should be employed in the netback code then? > > > > > > And obviously guarded by the CONFIG_HIGHMEM case? > > > > Perhaps. But I think the correct thing to do initially is revert the > > change and then look at possible improvements. Particularly as the fix > > needs to be a backported to stable. > > I disagree. Ian pointed out properly that this a Xen requirment - and there > is no reason for us to slow down non-Xen runs with vmalloc_sync_all plucked in > a generic path.There is literally no other caller of alloc_vm_area than xen so you won''t be slowing anyone else down. Maybe we should add alloc_vm_area_sync and use that everywhere? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-01 15:44 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 16:38 +0100, Konrad Rzeszutek Wilk wrote:> > > > The flaw in the reasoning here is that you cannot take a kernel fault > > > > while processing a hypercall, so hypercall arguments must have been > > > > faulted in beforehand and that is what the sync_all was for. > > > > > > > > It''s probably fair to say that the Xen specific caller should take care > > > > of that Xen-specific requirement rather than pushing it into common > > > > code. On the other hand Xen is the only user and creating a Xen specific > > > > helper/wrapper seems a bit pointless. > > > > > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > > > vmalloc_sync_one) should be employed in the netback code then? > > > > Not just netback but everywhere which uses this interface. > > Which is for right now netback :-). But yes - wherever we use that > we should do follow with some sort of vmalloc.blkback, xenbus_client and the grant table stuff all use it as well and AFAICT have the same requirement for syncing. $ git grep alloc_vm_area arch/x86/include/asm/xen/grant_table.h:#define xen_alloc_vm_area(size) alloc_vm_area(size) -- this macro is unused... arch/x86/xen/grant-table.c: xen_alloc_vm_area(PAGE_SIZE * max_nr_gframes); drivers/block/xen-blkback/xenbus.c: blkif->blk_ring_area = alloc_vm_area(PAGE_SIZE); drivers/net/xen-netback/netback.c: vif->tx_comms_area = alloc_vm_area(PAGE_SIZE); drivers/net/xen-netback/netback.c: vif->rx_comms_area = alloc_vm_area(PAGE_SIZE); drivers/xen/xenbus/xenbus_client.c: area = xen_alloc_vm_area(PAGE_SIZE); Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Sep-01 16:07 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, Sep 01, 2011 at 04:43:05PM +0100, Ian Campbell wrote:> On Thu, 2011-09-01 at 16:37 +0100, Konrad Rzeszutek Wilk wrote: > > > >> vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > > >> > > > >> There''s no need for it: it will get faulted into the current pagetable > > > >> as needed. > > > >> > > > >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > > >> > > > >> The flaw in the reasoning here is that you cannot take a kernel fault > > > >> while processing a hypercall, so hypercall arguments must have been > > > >> faulted in beforehand and that is what the sync_all was for. > > > >> > > > >> It''s probably fair to say that the Xen specific caller should take care > > > >> of that Xen-specific requirement rather than pushing it into common > > > >> code. On the other hand Xen is the only user and creating a Xen specific > > > >> helper/wrapper seems a bit pointless. > > > > > > > > Perhaps then doing the vmalloc_sync_all() (or are more precise one: > > > > vmalloc_sync_one) should be employed in the netback code then? > > > > > > > > And obviously guarded by the CONFIG_HIGHMEM case? > > > > > > Perhaps. But I think the correct thing to do initially is revert the > > > change and then look at possible improvements. Particularly as the fix > > > needs to be a backported to stable. > > > > I disagree. Ian pointed out properly that this a Xen requirment - and there > > is no reason for us to slow down non-Xen runs with vmalloc_sync_all plucked in > > a generic path. > > There is literally no other caller of alloc_vm_area than xen so you > won''t be slowing anyone else down.Duh! I totally missed that. Sounds plausible then - let me ping Andrew Morton on re-adding the vmalloc back.> > Maybe we should add alloc_vm_area_sync and use that everywhere?That is an option too.> > Ian. > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Sep-01 17:32 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 09/01/2011 12:42 AM, Ian Campbell wrote:> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote: >> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: >>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: >>>> So while I am still looking at the hypervisor code to figure out why >>>> it would give me [when trying to map a grant page]: >>>> >>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 >>> It is failing in guest_map_l1e() because the page for the vmalloc''d >>> virtual address PTEs is not present. >>> >>> The test that fails is: >>> >>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT >>> >>> I think this is because the GNTTABOP_map_grant_ref hypercall is done >>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into >>> init_mm so when Xen looks in the page tables it doesn''t find the entries >>> because they''re not there yet. >>> >>> Putting a call to vmalloc_sync_all() after create_vm_area() and before >>> the hypercall makes it work for me. Classic Xen kernels used to have >>> such a call. >> That sounds quite reasonable. > I was wondering why upstream was missing the vmalloc_sync_all() in > alloc_vm_area() since the out-of-tree kernels did have it and the > function was added by us. I found this: > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a > > commit ef691947d8a3d479e67652312783aedcf629320a > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > Date: Wed Dec 1 15:45:48 2010 -0800 > > vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > There''s no need for it: it will get faulted into the current pagetable > as needed. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > The flaw in the reasoning here is that you cannot take a kernel fault > while processing a hypercall, so hypercall arguments must have been > faulted in beforehand and that is what the sync_all was for.That''s a good point. (Maybe Xen should have generated pagefaults when hypercall arg pointers are bad...)> It''s probably fair to say that the Xen specific caller should take care > of that Xen-specific requirement rather than pushing it into common > code. On the other hand Xen is the only user and creating a Xen specific > helper/wrapper seems a bit pointless.There''s already a wrapper: xen_alloc_vm_area(), which is just a #define. But we could easily add a sync_all to it (and use it in netback, like we do in grant-table and xenbus). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
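[For what it's worth, Jeremy's suggestion could look something like the sketch below - turning the pass-through #define into a wrapper that does the sync itself, so netback, blkback, xenbus and the grant-table code all get pre-faulted PTEs. Illustrative only; the eventual patch may name or place this differently.]

#include <linux/vmalloc.h>

/*
 * Instead of: #define xen_alloc_vm_area(size) alloc_vm_area(size)
 * wrap it so every Xen user gets the vmalloc sync for free.
 */
static inline struct vm_struct *xen_alloc_vm_area(size_t size)
{
	struct vm_struct *area = alloc_vm_area(size);

	/*
	 * Hypercalls cannot fault vmalloc PTEs in lazily, so make sure they
	 * are present in all page tables before the area is handed to
	 * GNTTABOP_map_grant_ref and friends.
	 */
	if (area)
		vmalloc_sync_all();
	return area;
}

[Callers such as netback's xen_netbk_map_frontend_rings(), quoted earlier in the thread, would then switch from alloc_vm_area() to the wrapper.]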
Jeremy Fitzhardinge
2011-Sep-01 17:34 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 09/01/2011 08:44 AM, Ian Campbell wrote:> > blkback, xenbus_client and the grant table stuff all use it as well and > AFAICT have the same requirement for syncing. > > $ git grep alloc_vm_area > arch/x86/include/asm/xen/grant_table.h:#define xen_alloc_vm_area(size) alloc_vm_area(size) > > -- this macro is unused... > > arch/x86/xen/grant-table.c: xen_alloc_vm_area(PAGE_SIZE * max_nr_gframes); > drivers/block/xen-blkback/xenbus.c: blkif->blk_ring_area = alloc_vm_area(PAGE_SIZE); > drivers/net/xen-netback/netback.c: vif->tx_comms_area = alloc_vm_area(PAGE_SIZE); > drivers/net/xen-netback/netback.c: vif->rx_comms_area = alloc_vm_area(PAGE_SIZE); > drivers/xen/xenbus/xenbus_client.c: area = xen_alloc_vm_area(PAGE_SIZE);Well, 3/5ths unused. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-01 19:19 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 18:34 +0100, Jeremy Fitzhardinge wrote:> On 09/01/2011 08:44 AM, Ian Campbell wrote: > > > > blkback, xenbus_client and the grant table stuff all use it as well and > > AFAICT have the same requirement for syncing. > > > > $ git grep alloc_vm_area > > arch/x86/include/asm/xen/grant_table.h:#define xen_alloc_vm_area(size) alloc_vm_area(size) > > > > -- this macro is unused... > > > > arch/x86/xen/grant-table.c: xen_alloc_vm_area(PAGE_SIZE * max_nr_gframes); > > drivers/block/xen-blkback/xenbus.c: blkif->blk_ring_area = alloc_vm_area(PAGE_SIZE); > > drivers/net/xen-netback/netback.c: vif->tx_comms_area = alloc_vm_area(PAGE_SIZE); > > drivers/net/xen-netback/netback.c: vif->rx_comms_area = alloc_vm_area(PAGE_SIZE); > > drivers/xen/xenbus/xenbus_client.c: area = xen_alloc_vm_area(PAGE_SIZE); > > Well, 3/5ths unused.Hmm, yes, no sure how I missed that. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-01 19:21 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 18:32 +0100, Jeremy Fitzhardinge wrote:> On 09/01/2011 12:42 AM, Ian Campbell wrote: > > On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote: > >> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote: > >>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote: > >>>> So while I am still looking at the hypervisor code to figure out why > >>>> it would give me [when trying to map a grant page]: > >>>> > >>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 > >>> It is failing in guest_map_l1e() because the page for the vmalloc''d > >>> virtual address PTEs is not present. > >>> > >>> The test that fails is: > >>> > >>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT > >>> > >>> I think this is because the GNTTABOP_map_grant_ref hypercall is done > >>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into > >>> init_mm so when Xen looks in the page tables it doesn''t find the entries > >>> because they''re not there yet. > >>> > >>> Putting a call to vmalloc_sync_all() after create_vm_area() and before > >>> the hypercall makes it work for me. Classic Xen kernels used to have > >>> such a call. > >> That sounds quite reasonable. > > I was wondering why upstream was missing the vmalloc_sync_all() in > > alloc_vm_area() since the out-of-tree kernels did have it and the > > function was added by us. I found this: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a > > > > commit ef691947d8a3d479e67652312783aedcf629320a > > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > Date: Wed Dec 1 15:45:48 2010 -0800 > > > > vmalloc: remove vmalloc_sync_all() from alloc_vm_area() > > > > There''s no need for it: it will get faulted into the current pagetable > > as needed. > > > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > > > The flaw in the reasoning here is that you cannot take a kernel fault > > while processing a hypercall, so hypercall arguments must have been > > faulted in beforehand and that is what the sync_all was for. > > That''s a good point. (Maybe Xen should have generated pagefaults when > hypercall arg pointers are bad...)I think it would be a bit tricky to do in practice, you''d either have to support recursive hypercalls in the middle of other hypercalls (because the page fault handler is surely going to want to do some) or proper hypercall restart (so you can fully return to guest context to handle the fault then retry) or something along those and complexifying up the hypervisor one way or another. Probably not impossible if you were building something form the ground up, but not trivial.> > It''s probably fair to say that the Xen specific caller should take care > > of that Xen-specific requirement rather than pushing it into common > > code. On the other hand Xen is the only user and creating a Xen specific > > helper/wrapper seems a bit pointless. > > There''s already a wrapper: xen_alloc_vm_area(), which is just a > #define. But we could easily add a sync_all to it (and use it in > netback, like we do in grant-table and xenbus).OOI what was the wrapper for originally? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Sep-01 20:34 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 09/01/2011 12:21 PM, Ian Campbell wrote:
> On Thu, 2011-09-01 at 18:32 +0100, Jeremy Fitzhardinge wrote:
>> [...]
>> That's a good point. (Maybe Xen should have generated pagefaults when
>> hypercall arg pointers are bad...)
> I think it would be a bit tricky to do in practice. You'd either have to
> support recursive hypercalls in the middle of other hypercalls (because
> the page fault handler is surely going to want to do some), or proper
> hypercall restart (so you can fully return to guest context to handle the
> fault and then retry), or something along those lines, complexifying the
> hypervisor one way or another. Probably not impossible if you were
> building something from the ground up, but not trivial.

Well, Xen already has the continuation machinery for dealing with
hypercall restart, so that could be reused. And accesses to guest
memory are already special events which must be checked so that EFAULT
can be returned. If, rather than failing with EFAULT, Xen set up a
pagefault exception for the guest CPU with the return set up to retry
the hypercall, it should all work...

Of course, if the guest isn't expecting that - or it's buggy - then it
could end up in an infinite loop. But maybe a flag (set a high bit in
the hypercall number?), or a feature, or something? Might be worthwhile
if it saves guests having to do something expensive (like a
vmalloc_sync_all), even if they have to also deal with old hypervisors.

>> There's already a wrapper: xen_alloc_vm_area(), which is just a
>> #define. But we could easily add a sync_all to it (and use it in
>> netback, like we do in grant-table and xenbus).
> OOI what was the wrapper for originally?

Not sure; I brought it over from 2.6.18-xen.

BTW, vmalloc_sync_all() is much hated, and is slated for removal at some
point - there are definitely sights trained on it. So we should think
about not needing it.

    J
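As a sketch of the "add a sync_all to the wrapper" idea: the thread says xen_alloc_vm_area() is currently just a #define, so the inline form below is an assumed shape rather than the existing definition. The point is simply to do the Xen-specific sync once, in the wrapper, instead of at every call site.

#include <linux/vmalloc.h>

/* Sketch only: not the current #define wrapper. */
static inline struct vm_struct *xen_alloc_vm_area(size_t size)
{
        struct vm_struct *area = alloc_vm_area(size);

        if (area != NULL)
                vmalloc_sync_all();     /* new PTEs visible before any hypercall */
        return area;
}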
Ian Campbell
2011-Sep-02 07:17 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Thu, 2011-09-01 at 21:34 +0100, Jeremy Fitzhardinge wrote:
> On 09/01/2011 12:21 PM, Ian Campbell wrote:
> > [...] or proper
> > hypercall restart (so you can fully return to guest context to handle the
> > fault and then retry), or something along those lines, complexifying the
> > hypervisor one way or another. Probably not impossible if you were
> > building something from the ground up, but not trivial.
>
> Well, Xen already has the continuation machinery for dealing with
> hypercall restart, so that could be reused.

That requires special support beyond just calling the continuation in
each hypercall (often extending into the ABI) for pickling progress and
picking it up again; only a small number of (usually long-running)
hypercalls have that support today. It also uses the guest context to
store the state, which perhaps isn't helpful if you want to return to the
guest, although I suppose building a nested frame would work.

The guys doing paging and sharing etc. looked into this and came to the
conclusion that it would be intractably difficult to do this fully --
hence we now have the ability to sleep in hypercalls, which works
because the pager/sharer is in a different domain/vcpu.

> And accesses to guest
> memory are already special events which must be checked so that EFAULT
> can be returned. If, rather than failing with EFAULT, Xen set up a
> pagefault exception for the guest CPU with the return set up to retry
> the hypercall, it should all work...
>
> Of course, if the guest isn't expecting that - or it's buggy - then it
> could end up in an infinite loop. But maybe a flag (set a high bit in
> the hypercall number?), or a feature, or something? Might be worthwhile
> if it saves guests having to do something expensive (like a
> vmalloc_sync_all), even if they have to also deal with old hypervisors.

The vmalloc_sync_all is a pretty event even on Xen though, isn't it?

> >> There's already a wrapper: xen_alloc_vm_area(), which is just a
> >> #define. But we could easily add a sync_all to it (and use it in
> >> netback, like we do in grant-table and xenbus).
> > OOI what was the wrapper for originally?
>
> Not sure; I brought it over from 2.6.18-xen.
>
> BTW, vmalloc_sync_all() is much hated, and is slated for removal at some
> point - there are definitely sights trained on it. So we should think
> about not needing it.
>
> J
Jeremy Fitzhardinge
2011-Sep-02 20:26 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 09/02/2011 12:17 AM, Ian Campbell wrote:
> On Thu, 2011-09-01 at 21:34 +0100, Jeremy Fitzhardinge wrote:
>> [...]
>> Well, Xen already has the continuation machinery for dealing with
>> hypercall restart, so that could be reused.
> That requires special support beyond just calling the continuation in
> each hypercall (often extending into the ABI) for pickling progress and
> picking it up again; only a small number of (usually long-running)
> hypercalls have that support today. It also uses the guest context to
> store the state, which perhaps isn't helpful if you want to return to the
> guest, although I suppose building a nested frame would work.

I guess it depends on how many hypercalls do work before touching guest
memory, but any hypercall should be like that anyway, or at least be
able to wind back work done if a later read EFAULTs.

I was vaguely speculating about a scheme along the lines of:

 1. In copy_to/from_user, if we touch a bad address, save it in a
    per-vcpu "bad_guest_addr"
 2. when returning to the guest, if the errno is EFAULT and
    bad_guest_addr is set, then generate a memory fault frame with
    cr2 = bad_guest_addr, and with the exception return restarting
    the hypercall

Perhaps there should be an EFAULT_RETRY error return to trigger this
behaviour, rather than doing it for all EFAULTs, so the faulting
behaviour can be added incrementally.

Maybe this is a lost cause for x86, but perhaps it's worth considering
for new ports?

> The guys doing paging and sharing etc. looked into this and came to the
> conclusion that it would be intractably difficult to do this fully --
> hence we now have the ability to sleep in hypercalls, which works
> because the pager/sharer is in a different domain/vcpu.

Hmm. Were they looking at injecting faults back into the guest, or
forwarding "missing page" events off to another domain?

>> And accesses to guest
>> memory are already special events which must be checked so that EFAULT
>> can be returned. If, rather than failing with EFAULT, Xen set up a
>> pagefault exception for the guest CPU with the return set up to retry
>> the hypercall, it should all work...
>
> The vmalloc_sync_all is a pretty event even on Xen though, isn't it?

Looks like an important word is missing there. But it's very expensive,
if that's what you're saying.

    J
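To make the control flow of that two-step scheme concrete, here is a toy user-space model of it. It is not hypervisor code: EFAULT_RETRY, page_present, bad_guest_addr and the other names are all invented for the illustration, which only demonstrates the record-fault / inject-fault / restart loop being proposed.

/* Toy model of the speculative EFAULT_RETRY scheme - not Xen code. */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES        16
#define EFAULT_RETRY  1000              /* hypothetical new error code */

static bool page_present[NPAGES];       /* stand-in for the guest's pagetable */
static unsigned long bad_guest_addr;    /* step 1: per-vcpu record of the fault */

/* Guest-memory accessor: record the bad address instead of just failing. */
static int copy_from_guest(unsigned long page)
{
    if (page >= NPAGES || !page_present[page]) {
        bad_guest_addr = page;
        return -EFAULT_RETRY;
    }
    return 0;                           /* the real copy would happen here */
}

/* A hypercall whose argument lives in not-yet-faulted-in guest memory. */
static int do_hypercall(unsigned long arg_page)
{
    return copy_from_guest(arg_page);
}

/* Guest #PF handler: fault the page in, as normal vmalloc faulting would. */
static void guest_page_fault(unsigned long cr2)
{
    printf("guest: #PF at page %lu, mapping it\n", cr2);
    page_present[cr2] = true;
}

int main(void)
{
    unsigned long arg = 5;              /* page 5 is not present yet */
    int rc = do_hypercall(arg);

    /* Step 2: instead of returning EFAULT, reflect a fault to the guest
     * and restart the hypercall once the guest has handled it. */
    while (rc == -EFAULT_RETRY) {
        guest_page_fault(bad_guest_addr);
        rc = do_hypercall(arg);
    }

    printf("hypercall completed, rc=%d\n", rc);
    return 0;
}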
Ian Campbell
2011-Sep-03 10:27 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On Fri, 2011-09-02 at 21:26 +0100, Jeremy Fitzhardinge wrote:
> On 09/02/2011 12:17 AM, Ian Campbell wrote:
> > [...]
> > That requires special support beyond just calling the continuation in
> > each hypercall (often extending into the ABI) for pickling progress and
> > picking it up again; only a small number of (usually long-running)
> > hypercalls have that support today. It also uses the guest context to
> > store the state, which perhaps isn't helpful if you want to return to the
> > guest, although I suppose building a nested frame would work.
>
> I guess it depends on how many hypercalls do work before touching guest
> memory, but any hypercall should be like that anyway, or at least be
> able to wind back work done if a later read EFAULTs.
>
> I was vaguely speculating about a scheme along the lines of:
>
>  1. In copy_to/from_user, if we touch a bad address, save it in a
>     per-vcpu "bad_guest_addr"
>  2. when returning to the guest, if the errno is EFAULT and
>     bad_guest_addr is set, then generate a memory fault frame with
>     cr2 = bad_guest_addr, and with the exception return restarting
>     the hypercall
>
> Perhaps there should be an EFAULT_RETRY error return to trigger this
> behaviour, rather than doing it for all EFAULTs, so the faulting
> behaviour can be added incrementally.

The kernel uses -ERESTARTSYS for something similar, doesn't it?

Does this scheme work if the hypercall causing the exception was itself
running in an exception handler? I guess it depends on the architecture
and OS's handling of nested faults.

> Maybe this is a lost cause for x86, but perhaps it's worth considering
> for new ports?

Certainly worth thinking about.

> > The guys doing paging and sharing etc. looked into this and came to the
> > conclusion that it would be intractably difficult to do this fully --
> > hence we now have the ability to sleep in hypercalls, which works
> > because the pager/sharer is in a different domain/vcpu.
>
> Hmm. Were they looking at injecting faults back into the guest, or
> forwarding "missing page" events off to another domain?

Sharing and swapping are transparent to the domain; another domain runs
the swapper/unshare process (actually, unshare might be in the h/v
itself, not sure).

> > The vmalloc_sync_all is a pretty event even on Xen though, isn't it?
>
> Looks like an important word is missing there. But it's very expensive,
> if that's what you're saying.

Oops. "rare" was the missing word.
Anthony Wright
2011-Sep-07 12:57 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 01/09/2011 16:12, David Vrabel wrote:
> On 01/09/11 15:23, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote:
>>> [...]
>>> It's probably fair to say that the Xen specific caller should take care
>>> of that Xen-specific requirement rather than pushing it into common
>>> code. On the other hand Xen is the only user and creating a Xen specific
>>> helper/wrapper seems a bit pointless.
>>
>> Perhaps then doing the vmalloc_sync_all() (or a more precise one:
>> vmalloc_sync_one) should be employed in the netback code then?
>>
>> And obviously guarded by the CONFIG_HIGHMEM case?
>
> Perhaps. But I think the correct thing to do initially is revert the
> change and then look at possible improvements. Particularly as the fix
> needs to be backported to stable.
>
> David

I have implemented a patch which does essentially this, i.e. calls
vmalloc_sync_all() after every alloc_vm_area() call (all 5 of them).

Now my VMs start correctly, but I still get error messages in the xen
dmesg output (attached). Is this expected?

Anthony
Konrad Rzeszutek Wilk
2011-Sep-07 18:35 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
> (XEN) mm.c:907:d0 Error getting mfn 3a09c (pfn 55555555) from L1 entry 000000003a09c023 for l1e_owner=0, pg_owner=0
> (XEN) mm.c:907:d0 Error getting mfn 3a09d (pfn 55555555) from L1 entry 000000003a09d023 for l1e_owner=0, pg_owner=0
> (XEN) mm.c:907:d0 Error getting mfn 3a09e (pfn 55555555) from L1 entry 000000003a09e023 for l1e_owner=0, pg_owner=0
> (XEN) mm.c:907:d0 Error getting mfn 3a09f (pfn 55555555) from L1 entry 000000003a09f023 for l1e_owner=0, pg_owner=0
> (XEN) traps.c:2388:d0 Domain attempted WRMSR c0010004 from 0x0000ab23d6d622da to 0x000000000000abcd.

Do they show up during bootup? As in, do you see those _when_ you launch
your guests?

To figure out this particular issue you should try using 'console_to_ring'
(so that dom0 output and Xen output are mingled together), and also post
this under a new subject so as not to confuse this email thread.
Anthony Wright
2011-Sep-23 12:35 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 03/09/2011 11:27, Ian Campbell wrote:
> On Fri, 2011-09-02 at 21:26 +0100, Jeremy Fitzhardinge wrote:
>> [...]
>>> The vmalloc_sync_all is a pretty event even on Xen though, isn't it?
>> Looks like an important word is missing there. But it's very expensive,
>> if that's what you're saying.
> Oops. "rare" was the missing word.

Is there any progress on an official patch for this? I have my own
unofficial patch which places a vmalloc_sync_all() after every
alloc_vm_area() call and it works, but from the thread it sounds like
there should be a more sophisticated solution to the problem.
David Vrabel
2011-Sep-23 12:49 UTC
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 23/09/11 13:35, Anthony Wright wrote:
> Is there any progress on an official patch for this [unsync'd vmalloc
> address space bug]? I have my own unofficial patch which places a
> vmalloc_sync_all() after every alloc_vm_area() call and it works, but
> from the thread it sounds like there should be a more sophisticated
> solution to the problem.

The simple patch (re-adding the vmalloc_sync_all()) has been applied to
3.1-rc7 and should be in the next 3.0-stable release.

I'm still working on a more elegant fix.

David
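For reference, the shape of that simple fix is the sync call restored inside alloc_vm_area() itself, rather than at the call sites. The function below is a reconstruction for illustration of roughly how mm/vmalloc.c looked in that era with the call re-added; it is not quoted from the actual 3.1-rc7 commit and details may differ.

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Callback for apply_to_page_range(): the PTE allocation it forces is the
 * only side effect we want, so the callback itself does nothing. */
static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
{
        return 0;
}

struct vm_struct *alloc_vm_area(size_t size)
{
        struct vm_struct *area;

        area = get_vm_area_caller(size, VM_IOREMAP,
                                  __builtin_return_address(0));
        if (area == NULL)
                return NULL;

        /* Construct the page-table entries for this region in init_mm. */
        if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
                                size, f, NULL)) {
                free_vm_area(area);
                return NULL;
        }

        /*
         * The re-added call: make sure the new PTEs are present in all
         * pagetables, not just init_mm, since Xen will not fault them in
         * when a hypercall references this range.
         */
        vmalloc_sync_all();

        return area;
}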