thr3ads.net - Xen devel - Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could). [Mar 2012]

Konrad Rzeszutek Wilk

2012-Mar-27 18:13 UTC

Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?

Please see attached logs.




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Jan Beulich

2012-Mar-28 12:59 UTC

Re: Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

>>> On 27.03.12 at 20:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
> Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?
> 
> Please see attached logs.
Is this with the same kernel? I suspect not, in particular because of

(XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80

vs

(XEN) Dom0 memory clipped to 130846720 pages

(the former suggesting a kernel making use of XEN_ELFNOTE_INIT_P2M,
i.e. a forward ported one based on ours, the latter suggesting one
that doesn't, e.g. pv-ops). If booting fails completely, I'd suppose
the clipping calculation might be off by a few pages. Does output look
the same with "sync_console"? If so, does "watchdog" allow you to
get a stack trace and register dump of where execution hangs?

I also suppose that the second kernel boots fine when you pass
dom0_mem= with a value below 400G.

Jan
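
Note: the two log lines quoted above can be cross-checked with a little arithmetic. The sketch below assumes 4 KiB pages and one 8-byte Phys-Mach (P2M) entry per page; neither figure is stated in the thread, and the helper names are made up for illustration.

```python
# Assumptions (not stated in the thread): 4 KiB pages, 8-byte P2M entries.
PAGE_SIZE = 4096
P2M_ENTRY_SIZE = 8
GIB = 2**30


def pages_to_gib(pages):
    """Convert a page count into GiB."""
    return pages * PAGE_SIZE / GIB


def p2m_range_to_pages(start, end):
    """Number of pages described by a P2M array spanning [start, end)."""
    return (end - start) // P2M_ENTRY_SIZE


# "(XEN) Dom0 memory clipped to 130846720 pages"
clipped_pages = 130846720
# "(XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80"
p2m_pages = p2m_range_to_pages(0xffffea0000000000, 0xffffea007e5acc80)

print(f"clipped dom0: {pages_to_gib(clipped_pages):.2f} GiB")  # 499.14 GiB
print(f"P2M covers:   {pages_to_gib(p2m_pages):.2f} GiB")      # 1010.84 GiB
```

Under those assumptions, the first log corresponds to a dom0 of roughly 1 TB and the second to one clipped to just under 500 GiB.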

Konrad Rzeszutek Wilk

2012-Mar-28 14:34 UTC

Re: Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

On Wed, Mar 28, 2012 at 01:59:07PM +0100, Jan Beulich wrote:
> >>> On 27.03.12 at 20:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
> > Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?
> > 
> > Please see attached logs.
> 
> Is this with the same kernel? I suspect not, in particular because of

No, it is a pvops kernel (the older was a 2.6.32 classic one).

> (XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80
> 
> vs
> 
> (XEN) Dom0 memory clipped to 130846720 pages
> 
> (the former suggesting a kernel making use of XEN_ELFNOTE_INIT_P2M,
> i.e. a forward ported one based on ours, the latter suggesting one
> that doesn't, e.g. pv-ops). If booting fails completely, I'd suppose
> the clipping calculation might be off by a few pages. Does output look
> the same with "sync_console"? If so, does "watchdog" allow you to
> get a stack trace and register dump of where execution hangs?

Ok, will try those out.

> I also suppose that the second kernel boots fine when you pass
> dom0_mem= with a value below 400G.

It does indeed. Though the value that was used was a more conservative
one of 4G.

> Jan

Jan Beulich

2012-Mar-28 14:53 UTC

Re: Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

>>> On 28.03.12 at 16:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 28, 2012 at 01:59:07PM +0100, Jan Beulich wrote:
>> >>> On 27.03.12 at 20:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
>> > Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?
>> > 
>> > Please see attached logs.
>> 
>> Is this with the same kernel? I suspect not, in particular because of
> 
> No, it is a pvops kernel (the older was a 2.6.32 classic one).
>> 
>> (XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80
>> 
>> vs
>> 
>> (XEN) Dom0 memory clipped to 130846720 pages
>> 
>> (the former suggesting a kernel making use of XEN_ELFNOTE_INIT_P2M,
>> i.e. a forward ported one based on ours, the latter suggesting one
>> that doesn't, e.g. pv-ops). If booting fails completely, I'd suppose
>> the clipping calculation might be off by a few pages. Does output look
>> the same with "sync_console"? If so, does "watchdog" allow you to
>> get a stack trace and register dump of where execution hangs?
> 
> Ok, will try those out.
>> 
>> I also suppose that the second kernel boots fine when you pass
>> dom0_mem= with a value below 400G.
> 
> It does indeed. Though the value that was used was a more conservative
> one of 4G.

Okay, so it's more a kernel shortcoming than a Xen regression (I
suppose that the same kernels used on 4.0/4.1 will behave the
same way irrespective of Xen version).

Jan

Konrad Rzeszutek Wilk

2012-Mar-28 14:55 UTC

Re: Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

On Wed, Mar 28, 2012 at 03:53:56PM +0100, Jan Beulich wrote:
> >>> On 28.03.12 at 16:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Wed, Mar 28, 2012 at 01:59:07PM +0100, Jan Beulich wrote:
> >> >>> On 27.03.12 at 20:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> > With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
> >> > Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?
> >> > 
> >> > Please see attached logs.
> >> 
> >> Is this with the same kernel? I suspect not, in particular because of
> > 
> > No, it is a pvops kernel (the older was a 2.6.32 classic one).
> >> 
> >> (XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80
> >> 
> >> vs
> >> 
> >> (XEN) Dom0 memory clipped to 130846720 pages
> >> 
> >> (the former suggesting a kernel making use of XEN_ELFNOTE_INIT_P2M,
> >> i.e. a forward ported one based on ours, the latter suggesting one
> >> that doesn't, e.g. pv-ops). If booting fails completely, I'd suppose
> >> the clipping calculation might be off by a few pages. Does output look
> >> the same with "sync_console"? If so, does "watchdog" allow you to
> >> get a stack trace and register dump of where execution hangs?
> > 
> > Ok, will try those out.
> >> 
> >> I also suppose that the second kernel boots fine when you pass
> >> dom0_mem= with a value below 400G.
> > 
> > It does indeed. Though the value that was used was a more conservative
> > one of 4G.
> 
> Okay, so it's more a kernel shortcoming than a Xen regression (I
> suppose that the same kernels used on 4.0/4.1 will behave the
> same way irrespective of Xen version).

Not sure (on using Xen 4.0 on that box). Once I am done with my bug-list I
was thinking to take a deeper spin on that box. I do remember that we could
only do up to 500GB - but I can't recall the details.

Jan Beulich

2012-Mar-28 15:12 UTC

Re: Xen 4.1 regression - can't boot on 1TB anymore (Xen 4.0 could).

>>> On 28.03.12 at 16:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 28, 2012 at 03:53:56PM +0100, Jan Beulich wrote:
>> >>> On 28.03.12 at 16:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > On Wed, Mar 28, 2012 at 01:59:07PM +0100, Jan Beulich wrote:
>> >> >>> On 27.03.12 at 20:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> >> > With Xen 4.0 we could boot up dom0 with 1TB of memory. But with
>> >> > Xen 4.1 that is no longer the case. Any ideas of what might be the culprit?
>> >> > 
>> >> > Please see attached logs.
>> >> 
>> >> Is this with the same kernel? I suspect not, in particular because of
>> > 
>> > No, it is a pvops kernel (the older was a 2.6.32 classic one).
>> >> 
>> >> (XEN)  Phys-Mach map: ffffea0000000000->ffffea007e5acc80
>> >> 
>> >> vs
>> >> 
>> >> (XEN) Dom0 memory clipped to 130846720 pages
>> >> 
>> >> (the former suggesting a kernel making use of XEN_ELFNOTE_INIT_P2M,
>> >> i.e. a forward ported one based on ours, the latter suggesting one
>> >> that doesn't, e.g. pv-ops). If booting fails completely, I'd suppose
>> >> the clipping calculation might be off by a few pages. Does output look
>> >> the same with "sync_console"? If so, does "watchdog" allow you to
>> >> get a stack trace and register dump of where execution hangs?
>> > 
>> > Ok, will try those out.
>> >> 
>> >> I also suppose that the second kernel boots fine when you pass
>> >> dom0_mem= with a value below 400G.
>> > 
>> > It does indeed. Though the value that was used was a more conservative
>> > one of 4G.
>> 
>> Okay, so it's more a kernel shortcoming than a Xen regression (I
>> suppose that the same kernels used on 4.0/4.1 will behave the
>> same way irrespective of Xen version).
> 
> Not sure (on using Xen 4.0 on that box). Once I am done with my bug-list I
> was thinking to take a deeper spin on that box. I do remember that we could
> only do up to 500GB - but I can't recall the details.

The details are: With the initial mapping starting at 2GB from the top
of address space, and with the initial P2M mapping being part of it,
you just can't get beyond that. Both meaningfully variable size pieces
of the initial mapping (p2m and initrd) can and should be moved out
of that rather limited address space.

Jan
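
Note: the constraint described in the last message can be illustrated with a rough upper bound. The sketch assumes 4 KiB pages and 8-byte P2M entries (neither is stated in the thread), and the 64 MiB reserved for kernel and initrd is a made-up figure for illustration only. It is not Xen's actual clipping calculation, which evidently produces a lower limit (about 499 GiB in the log above).

```python
# Assumptions (not from the thread): 4 KiB pages, 8-byte P2M entries.
PAGE_SIZE = 4096
P2M_ENTRY_SIZE = 8
GIB = 2**30
TIB = 2**40

# Initial mapping window: 2 GB from the top of the address space,
# as described in the message above.
WINDOW = 2 * GIB


def p2m_bytes(mem_bytes):
    """Size of the P2M array needed to describe mem_bytes of memory."""
    return mem_bytes // PAGE_SIZE * P2M_ENTRY_SIZE


def max_mem_bytes(window_bytes, other_bytes=0):
    """Largest memory size whose P2M still fits in the window,
    after other_bytes are taken by kernel, initrd, and so on."""
    return (window_bytes - other_bytes) // P2M_ENTRY_SIZE * PAGE_SIZE


# A 1 TiB dom0 needs a P2M that fills the whole window by itself.
print(p2m_bytes(TIB) / GIB)                      # 2.0

# Hypothetical 64 MiB of kernel + initrd lowers the bound accordingly.
print(max_mem_bytes(WINDOW, 64 * 2**20) / GIB)   # 992.0
```

Because the P2M grows linearly with memory while the window is fixed, any space taken by the kernel or initrd directly reduces the memory that can be described, which is why moving the P2M and initrd out of the window lifts the limit.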
