thr3ads.net - Xen devel - [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure [Jan 2011]

Kay, Allen M

2011-Jan-25 02:26 UTC

[Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

I'm encountering the following boot failure with the latest pvops 2.6.32.27
dom0 and the latest xen staging tree on my system.  The attached file contains
the full serial console log.

Has anyone seen it?

....

init_memory_mapping: 0000000000000000-000000009b000000
init_memory_mapping: 0000000100000000-000000023a6f4000
(XEN) mm.c:802:d0 Bad L1 flags 400000
(XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 365
(XEN) mm.c:2142:d0 Error while validating mfn 1c3eb4 (pfn 1454c) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:2965:d0 Error while pinning mfn 1c3eb4
(XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff8100cd18>]
(XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
(XEN) rax: 00000000ffffffea   rbx: ffff88001454c000   rcx: ffffffff8261f000
(XEN) rdx: 00000000deadbeef   rsi: 00000000deadbeef   rdi: 00000000deadbeef
(XEN) rbp: ffffffff816c1bf8   rsp: ffffffff816c1b98   r8:  00003ffffffff000
(XEN) r9:  ffff880000000000   r10: 00000000deadbeef   r11: 0000000000000000
(XEN) r12: ffffffffff400b10   r13: 0000000000000162   r14: ffffffffff440000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000001c1001000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff816c1b98:
(XEN)    ffffffff8261f000 0000000000000000 ffffffff8100cd18 000000010000e030
(XEN)    0000000000010082 ffffffff816c1bd8 000000000000e02b ffffffff8100cd14
(XEN)    ffffffff00000000 00000000001c3eb4 ffffffff81702c18 ffff88001454c000
(XEN)    ffffffff816c1c18 ffffffff81980f27 ffffffff816c1c38 000000000001454c
(XEN)    ffffffff816c1c38 ffffffff81036cd6 ffffffffff400b10 000000016c400000
(XEN)    ffffffff816c1cd8 ffffffff819ba427 00000001c3350067 8000000000000163
(XEN)    0000000100000001 ffffffffff400000 000000008100cb50 80000000000001e3
(XEN)    8000000000000163 000000023a6f4000 000000016c600000 0000000000000000
(XEN)    ffffffff816c1ca8 000000001454c000 ffffffff816c1cd8 ffff880001002028
(XEN)    ffffffffff400000 0000000140000000 0000000140000000 000000023a6f4000
(XEN)    ffffffff816c1d68 ffffffff819ba65d ffffffff8100c407 8000000000000163
(XEN)    ffff880001002000 0000000000000000 0000000000000000 0000000000000004
(XEN)    00000001816c1d68 0000000000000000 0000000000000000 00000000143e9000
(XEN)    ffffffff8108a086 000000023a6f4000 ffffffff81001880 ffff88023a6f4000
(XEN)    000000023a6f4000 ffff88023a6f4000 ffffffff816c1dc8 ffffffff819ba7b8
(XEN)    ffff880100000000 0000000000000000 ffffffff816c1d98 000000009b000000
(XEN)    ffffffff816c1dc8 ffffffff816c1e20 0000000000000001 0000000000000001
(XEN)    000000023a6f4000 0000000000000000 ffffffff816c1eb8 ffffffff81465e66
(XEN)    0000000000000000 0000000000000000 0000050100000000 ffffffff816c1e88
(XEN)    0000000000000000 0000000100000000 0000000100000000 000000023a6f4000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
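The type value on the "Error while validating" line, 0x1000000000000000, is a Xen type_info word. The minimal Python sketch below decodes its top four bits, assuming the Xen 4.1 x86_64 PGT_* encoding (page type stored in bits 60-63, with 1 meaning an L1 page table). PGT_NAMES and page_type are illustrative names, not Xen code. Under that assumption, the page Linux asked Xen to pin was being validated as an L1 page table, which is consistent with the alloc_l1_table failure logged just before it.

```python
# Illustrative decoder for the page-type field of a Xen x86_64 type_info
# word. ASSUMPTION: Xen 4.1 PGT_* encoding, type stored in bits 60-63.
PGT_TYPE_SHIFT = 60

PGT_NAMES = {
    0: "none",
    1: "l1_page_table",
    2: "l2_page_table",
    3: "l3_page_table",
    4: "l4_page_table",
    5: "seg_desc_page",
    7: "writable_page",
}


def page_type(type_info: int) -> str:
    """Return the symbolic page type held in the top nibble of type_info."""
    nibble = (type_info >> PGT_TYPE_SHIFT) & 0xF
    return PGT_NAMES.get(nibble, f"unknown({nibble})")


# Values taken from the console log above.
print(page_type(0x1000000000000000))  # requested type -> l1_page_table
print(page_type(0x1000000000000001))  # taf= value     -> l1_page_table
```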


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Jan-25 14:39 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Mon, Jan 24, 2011 at 06:26:49PM -0800, Kay, Allen M wrote:
> I'm encountering following boot failure with the latest pvops
> 2.6.32.27 dom0 and xen staging tree on my system.  Attached file contains the
> full serial console log.
> 
> Has anyone seen it?
What does the 0xffffffff8100cd18 translate to in your System.map of your Linux
kernel?
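Translating a RIP through System.map means finding the symbol with the highest address that is still at or below the RIP; the difference is the offset into that function. A minimal Python sketch, using only the two System.map lines Allen quotes in his reply (the full file was an attachment); parse_system_map and resolve are hypothetical helper names:

```python
import bisect

# Two lines from the dom0 System.map quoted in this thread.
SYSTEM_MAP_EXCERPT = """\
ffffffff8100cce2 t pin_pagetable_pfn
ffffffff8100cd1e t p2m_mid_mfn_init
"""


def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoff' for the nearest symbol at or below address, else None."""
    starts = [addr for addr, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"


symbols = parse_system_map(SYSTEM_MAP_EXCERPT)
print(resolve(symbols, 0xffffffff8100cd18))  # RIP from the crash dump
```

With those two lines, 0xffffffff8100cd18 resolves to pin_pagetable_pfn+0x36: it lies inside pin_pagetable_pfn, just before p2m_mid_mfn_init begins.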

Kay, Allen M

2011-Jan-25 18:49 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

[This email is either empty or too large to be displayed at this time]

Konrad Rzeszutek Wilk

2011-Jan-25 19:07 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Tue, Jan 25, 2011 at 10:49:52AM -0800, Kay, Allen M wrote:
> Looks like it translates to pin_pagetable_pfn.  I have also attached the
> entire System.map file.
> 
> ...
> ffffffff8100cce2 t pin_pagetable_pfn
> ffffffff8100cd1e t p2m_mid_mfn_init
Ok, then it probably is related to the issues we had with the P2M
or MFN list being incorrect... and your E820:

(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009bc00 (usable)
(XEN)  000000000009bc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040000000 (usable)
(XEN)  0000000040000000 - 0000000040200000 (reserved)
(XEN)  0000000040200000 - 000000009acd3000 (usable)
(XEN)  000000009acd3000 - 000000009ad67000 (reserved)
(XEN)  000000009ad67000 - 000000009afe7000 (ACPI NVS)
(XEN)  000000009afe7000 - 000000009afff000 (ACPI data)
(XEN)  000000009afff000 - 000000009b000000 (usable)
(XEN)  000000009b000000 - 000000009fa00000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed10000 - 00000000fed14000 (reserved)
(XEN)  00000000fed18000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff980000 - 00000000ffc00000 (reserved)
(XEN)  00000000ffd80000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 00000001de600000 (usable)

is like swiss-cheese with the RAM regions.
What machine is this and how can I get my hands on it?
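To put numbers on the "swiss-cheese" remark, a small Python sketch can parse the Xen-e820 lines and count the usable regions and their total size. The regular expression assumes exactly the line format printed above; parse_e820 and usable_summary are hypothetical helpers.

```python
import re

# Xen-e820 RAM map as printed in the boot log above.
E820_LOG = """\
(XEN)  0000000000000000 - 000000000009bc00 (usable)
(XEN)  000000000009bc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040000000 (usable)
(XEN)  0000000040000000 - 0000000040200000 (reserved)
(XEN)  0000000040200000 - 000000009acd3000 (usable)
(XEN)  000000009acd3000 - 000000009ad67000 (reserved)
(XEN)  000000009ad67000 - 000000009afe7000 (ACPI NVS)
(XEN)  000000009afe7000 - 000000009afff000 (ACPI data)
(XEN)  000000009afff000 - 000000009b000000 (usable)
(XEN)  000000009b000000 - 000000009fa00000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed10000 - 00000000fed14000 (reserved)
(XEN)  00000000fed18000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff980000 - 00000000ffc00000 (reserved)
(XEN)  00000000ffd80000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 00000001de600000 (usable)
"""

ENTRY_RE = re.compile(r"([0-9a-f]{16}) - ([0-9a-f]{16}) \(([^)]+)\)")


def parse_e820(text):
    """Return (start, end, kind) tuples; end is exclusive, as Xen prints it."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            start, end, kind = match.groups()
            entries.append((int(start, 16), int(end, 16), kind))
    return entries


def usable_summary(entries):
    """Count usable regions and sum their sizes in bytes."""
    usable = [(start, end) for start, end, kind in entries if kind == "usable"]
    return len(usable), sum(end - start for start, end in usable)


count, total = usable_summary(parse_e820(E820_LOG))
print(f"{count} usable regions, {total / 2**20:.1f} MiB in total")
```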

Does it boot if you have 'dom0_mem=max:512MB' (it is important
to have the 'max' there)?

Kay, Allen M

2011-Jan-25 19:24 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

The machine is an Intel Sandybridge Desktop SDP (Software Development Platform).

Setting 'dom0_mem=max:1024MB' worked.  Booting with
'dom0_mem=max:512MB' panicked in mount_block_root().

Allen

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
Sent: Tuesday, January 25, 2011 11:08 AM
To: Kay, Allen M
Cc: xen-devel
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

Konrad Rzeszutek Wilk

2011-Jan-25 20:10 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Tue, Jan 25, 2011 at 11:24:54AM -0800, Kay, Allen M wrote:
> The machine is an Intel Sandybridge Desktop SDP (Software Development
> Platform).
> 
> Setting 'dom0_mem=max:1024MB' worked.  Booting with
> "dom0_mem=max:512MB" panic'ed in mount_block_root().
OK, do you see anything on the Xen console if you turn up the debug options?

I'm wondering if you see something akin to this:

(XEN) mm.c:889:d0 Error getting mfn 110000 (pfn 5555555555555555) from L1 entry 8000000110000463 for l1e_owner=0, pg_owner=0
(Xrror while pinning mfn 20c8c0

Kay, Allen M

2011-Jan-25 21:26 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

I do not see any messages from mm.c if the dom0_mem parameter is used.  If
dom0_mem is not used, I see the following error messages in the serial console
log.  They are part of the log I sent in my original bug report:

(XEN) mm.c:802:d0 Bad L1 flags 400000
(XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 365
(XEN) mm.c:2142:d0 Error while validating mfn 1c3eb4 (pfn 1454c) for type 100000

Kay, Allen M

2011-Jan-26 02:41 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

I noticed that one of my e820 entries is not page aligned:
> (XEN)  0000000000000000 - 000000000009bc00 (usable)
It might be similar to the problem reported by Michael Young in the attached email.
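A quick way to spot entries like that is to test each boundary against the 4 KiB x86 page size. A minimal Python sketch over the first four entries of the Xen-e820 map from this thread; find_unaligned and whole_pages are hypothetical helpers, and rounding inward to whole pages is shown only as one plausible way to treat such an entry, not as what the patch discussed in this thread does.

```python
PAGE_SIZE = 4096  # x86 base page size (4 KiB)

# First four entries of the Xen-e820 map quoted in this thread.
E820_EXCERPT = [
    (0x0000000000000000, 0x000000000009BC00, "usable"),
    (0x000000000009BC00, 0x00000000000A0000, "reserved"),
    (0x00000000000E0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x0000000020000000, "usable"),
]


def find_unaligned(entries, page_size=PAGE_SIZE):
    """Return (start, end, kind, [bad boundaries]) for misaligned entries."""
    result = []
    for start, end, kind in entries:
        bad = [name for name, value in (("start", start), ("end", end))
               if value % page_size]
        if bad:
            result.append((start, end, kind, bad))
    return result


def whole_pages(start, end, page_size=PAGE_SIZE):
    """Count pages fully inside [start, end): round start up and end down."""
    first = -(-start // page_size)  # ceiling division
    last = end // page_size
    return max(0, last - first)


for start, end, kind, bad in find_unaligned(E820_EXCERPT):
    print(f"{start:#x}-{end:#x} ({kind}): unaligned {', '.join(bad)}")
print(whole_pages(0x0, 0x9BC00), "whole pages in the first usable entry")
```

The 0x9bc00 boundary leaves 0xc00 bytes (3 KiB) of a partial page at the top of the first usable region, and the same boundary is the start of the reserved entry that follows it.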

Konrad Rzeszutek Wilk

2011-Jan-26 16:14 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Tue, Jan 25, 2011 at 06:41:30PM -0800, Kay, Allen M wrote:
> I noticed one of my e820 entry is not page aligned:
> 
> > (XEN)  0000000000000000 - 000000000009bc00 (usable)
> 
> It might be similar to the problem reported by Michael Young in attached
> email.
Did you try their patch?

Kay, Allen M

2011-Jan-26 18:46 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

I just tried it, and it now boots successfully without the need for the
dom0_mem=max:1024MB parameter.

Is the patch going to be checked into the pvops tree?  It does not seem to be
in the 2.6.32.27 dom0 pvops tree yet.

Konrad Rzeszutek Wilk

2011-Jan-26 21:28 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Wed, Jan 26, 2011 at 10:46:13AM -0800, Kay, Allen M wrote:
> I just tried it and it can now boot successfully without the need for
> dom0_mem=max:1024MB parameter.
Woot! Great.
> 
> Is the patch going to be checked into pvops tree?  It does not seems to be
> in 2.6.32.27 dom0 pvops tree yet.
It is in 2.6.38. I hadn't yet done it for 2.6.32, so let me spin up a patch
for it.
Can I put a Tested-by on it from you?

Kay, Allen M

2011-Jan-26 21:53 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

> Can I put a Tested-by on it from you?
Sure.  I have attached Stefano's patch that I used, just to make sure
that we are referring to the same patch.

I would also like to use 2.6.38.  What commands should I use to pull it?  It is
not clear to me from reading the pvops wiki page
(http://wiki.xensource.com/xenwiki/XenParavirtOps).

Kay, Allen M

2011-Jan-27 01:16 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

Looks like I spoke too soon.  I have encountered additional failures on
subsequent boots, even with Stefano's patch.

Stefano Stabellini

2011-Jan-27 11:59 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Thu, 27 Jan 2011, Kay, Allen M wrote:
> Looks like I spoke too soon.  I have encountered additional failures even
> with the Stefano's patch in subsequent boots.
> 
Could you please post the error you are getting? The full xen+kernel
serial output would be nice.

> I would also like to use 2.6.38.  What commands should I use to pull it?
> It is not clear to me from reading the pvops wiki page
> (http://wiki.xensource.com/xenwiki/XenParavirtOps).
> 
Try this branch:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-rc2-fixes

It is 2.6.38-rc2 plus three bug fixes. This branch boots on all my
test boxes.

Konrad Rzeszutek Wilk

2011-Jan-27 14:45 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

> > Can I put a Tested-by on it from you?
> 
> Sure.  I have attached Stefano's patch that I used, just to make
> sure that we are referring to the same patch.
> 
> I would also like to use 2.6.38.  What commands should I use to pull it?
> It is not clear to me from reading the pvops wiki page
> (http://wiki.xensource.com/xenwiki/XenParavirtOps).
Well, you can just run the vanilla one, BUT it is still a work in progress,
so with any bugs you find, well, you might have better luck just fixing them
yourself until we have gotten past most of the bootup problems.

Kay, Allen M

2011-Jan-27 18:51 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

[This email is either empty or too large to be displayed at this time]

Konrad Rzeszutek Wilk

2011-Jan-28 15:28 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
> Following are the brief error messages from the serial console log.  I have
> also attached the full serial console log and dom0 system map.
> 
> (XEN) mm.c:802:d0 Bad L1 flags 400000
On a second look, this is a different issue from the one I had encountered.

The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
is not right. Googling for this shows that I had fixed this with an
Xorg server at some point, but I can't remember the details, so that is
not that useful :-(
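A minimal sketch of how a value like 400000 in the "Bad L1 flags" message can be mapped back to a PTE bit. It assumes the x86_64 Xen convention (as I understand `get_pte_flags()`/`put_pte_flags()`) of packing PTE bits 0-11 into flag bits 0-11 and PTE bits 52-63 into flag bits 12-23. The helper names are hypothetical, and whether bit 22 really is PAGE_GNTTAB should be checked against the headers of the hypervisor tree being debugged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the single set bit in a flag mask. */
static int single_bit_index(uint32_t mask)
{
    assert(mask != 0 && (mask & (mask - 1)) == 0); /* exactly one bit set */
    int idx = 0;
    while ((mask >>= 1) != 0)
        idx++;
    return idx;
}

/*
 * Assumed packing (the idea behind Xen's put_pte_flags() on x86_64):
 * flag bits 0-11 are PTE bits 0-11, flag bits 12-23 are PTE bits 52-63.
 * Returns the corresponding bits of a full 64-bit PTE.
 */
static uint64_t packed_flags_to_pte_bits(uint32_t flags)
{
    return ((uint64_t)(flags & ~0xFFFu) << 40) | (flags & 0xFFFu);
}
```

With these helpers, 0x400000 comes out as flag bit 22, which under the assumed packing is PTE bit 62, well outside the low permission bits.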

You said it works if you give the domain 1024MB, but I wonder if
it also works if you disable the IOMMU? What happens then?
> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

Konrad Rzeszutek Wilk

2011-Jan-28 15:47 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
> > Following are the brief error messages from the serial console log.  I
> > have also attached the full serial console log and dom0 system map.
> > 
> > (XEN) mm.c:802:d0 Bad L1 flags 400000
> 
> On a second look, this is a different issue than I had encountered.
> 
> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
> is not right. Googling for this shows that I had fixed this with a
> Xorg server at some point, but I can't remember the details so that is not
> that useful :-(
> 
> You said it works if you give the domain 1024MB, but I wonder if
> it also works if you disable the IOMMU? What happens then?
Can you also patch your Xen hypervisor with this patch? It will print out the
other entries (89 down to 0) so we can see what type of values they have.
You might need to move it a bit, as this is for xen-unstable.

diff -r 003acf02d416 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Thu Jan 20 17:04:06 2011 +0000
+++ b/xen/arch/x86/mm.c	Fri Jan 28 10:46:13 2011 -0500
@@ -1201,11 +1201,12 @@
     return 0;
 
  fail:
-    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
+    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
     while ( i-- > 0 )
-        if ( is_guest_l1_slot(i) )
+        if ( is_guest_l1_slot(i) ) {
+            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
             put_page_from_l1e(pl1e[i], d);
-
+	}
     unmap_domain_page(pl1e);
     return -EINVAL;
 }
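The patch prints each earlier L1 slot as a raw 64-bit value. A minimal sketch for splitting such a value into its frame number and flag bits, assuming the standard x86-64 4KiB PTE layout with the frame address in bits 12-51. The helper names are hypothetical, and the sample value in the checks is made up (it reuses mfn 1c3eb4 from the original report), not taken from any log in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)

/* Frame field: bits 12-51 of an x86-64 PTE (assumes 52-bit physical addresses). */
#define PTE_MFN_MASK (((1ULL << 52) - 1) & ~0xFFFULL)

/* Machine frame number the entry points at. */
static uint64_t pte_mfn(uint64_t pte)
{
    return (pte & PTE_MFN_MASK) >> 12;
}

/* Everything outside the frame field: low bits 0-11 plus high bits 52-63. */
static uint64_t pte_flag_bits(uint64_t pte)
{
    return pte & ~PTE_MFN_MASK;
}

static int pte_is_present(uint64_t pte)  { return (pte & PTE_PRESENT) != 0; }
static int pte_is_writable(uint64_t pte) { return (pte & PTE_RW) != 0; }

/* Build a PTE from a frame number and flag bits (inverse of the two above). */
static uint64_t pte_make(uint64_t mfn, uint64_t flags)
{
    assert(mfn < (1ULL << 40));          /* fits in bits 12-51 */
    assert((flags & PTE_MFN_MASK) == 0); /* flags must not overlap the frame field */
    return (mfn << 12) | flags;
}
```

Comparing the frame numbers and high flag bits across the printed entries should make an odd one out (such as entry 90 with bit 62 set) easy to spot.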
> 
> > (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
> > (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
> > (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
> > (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
> > (XEN) domain_crash_sync called from entry.S
> > (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

Kay, Allen M

2011-Feb-11 01:03 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

Konrad/Stefano,

Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I reported
a few weeks ago.

I finally got around to narrowing the problem down to the call to
xen_add_extra_mem() in xen_memory_setup() (arch/x86/xen/setup.c).  This call
increases the top of E820 memory in dom0 beyond what is actually available.

Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:

    0000000100000000 - 000000016b45a000 (usable)

After xen_add_extra_mem() is called, the last entry of dom0 e820 table becomes:

    0000000100000000 - 000000023a6f4000 (usable)

This pushes the top of RAM beyond what was reported by Xen's e820 table,
which is:

(XEN)  0000000100000000 - 00000001de600000 (usable)

AFAICT, the failure is caused by dom0 accessing non-existent physical memory. 
The failure went away after I removed the call to xen_add_extra_mem().
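The size of the mismatch can be computed directly from the three ranges above. A minimal sketch; the struct and function names are hypothetical, and it treats the printed end addresses as exclusive (start + size), which I believe is how these kernels and Xen print the map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of one e820 range [start, end). */
struct e820_range {
    uint64_t start;
    uint64_t end; /* exclusive */
};

/* Length of a range in bytes. */
static uint64_t range_bytes(struct e820_range r)
{
    assert(r.end >= r.start);
    return r.end - r.start;
}

/* Bytes by which 'grown' extends past 'limit' (0 if it does not). */
static uint64_t overshoot(struct e820_range grown, uint64_t limit)
{
    return grown.end > limit ? grown.end - limit : 0;
}
```

With the values from this report, xen_add_extra_mem() adds 0x23a6f4000 - 0x16b45a000 = 0xcf29a000 bytes (about 3.2 GiB), and the new top lies 0x5c0f4000 bytes (about 1.4 GiB) above the end of Xen's usable range at 0x1de600000.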

Another potential problem I noticed with e820 processing is that there is a
discrepancy between how Xen processes e820 and how dom0 does it.  In Xen
(start_xen() in arch/x86/setup.c), e820 entries are aligned on
L2_PAGETABLE_SHIFT (2MB) boundaries, while the dom0 e820 code does not align
them.  As a result, one of my e820 entries that is 1 page in size got dropped
by Xen but got picked up in dom0.  This does not cause a problem in my case,
but the inconsistency in how memory is used by Xen and dom0 could potentially
be a problem.
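A minimal sketch of why a one-page entry can vanish when ranges are trimmed inward to 2MB boundaries (assuming L2_PAGETABLE_SHIFT is 21, as on x86_64): rounding the start up and the end down leaves nothing of a 4KiB region. This only illustrates the effect; it is not the code in start_xen(), and the helper names and the small sample addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_L2_SHIFT 21                        /* 2MB superpages on x86_64 */
#define ASSUMED_L2_SIZE  (1ULL << ASSUMED_L2_SHIFT)

static uint64_t round_up_2m(uint64_t x)
{
    return (x + ASSUMED_L2_SIZE - 1) & ~(ASSUMED_L2_SIZE - 1);
}

static uint64_t round_down_2m(uint64_t x)
{
    return x & ~(ASSUMED_L2_SIZE - 1);
}

/*
 * Trim [start, end) inward to 2MB boundaries and return the length that
 * survives (0 if the range disappears entirely).
 */
static uint64_t aligned_len(uint64_t start, uint64_t end)
{
    assert(end >= start);
    uint64_t s = round_up_2m(start);
    uint64_t e = round_down_2m(end);
    return e > s ? e - s : 0;
}
```

A single 4KiB entry such as [0x9d000, 0x9e000) yields 0 here, whereas code that does no alignment keeps all 4096 bytes; the already aligned Xen range 0x100000000-0x1de600000 from this report is unaffected.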

Allen


Jeremy Fitzhardinge

2011-Feb-11 02:56 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/10/2011 05:03 PM, Kay, Allen M wrote:
> Konrad/Stefano,
>
> Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I
> reported a few weeks ago.
>
> I finally got around to narrowing the problem down to the call to
> xen_add_extra_mem() in xen_memory_setup() (arch/x86/xen/setup.c).  This call
> increases the top of E820 memory in dom0 beyond what is actually available.
>
> Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
>
>     0000000100000000 - 000000016b45a000 (usable)
>
> After xen_add_extra_mem() is called, the last entry of dom0 e820 table
> becomes:
>
>     0000000100000000 - 000000023a6f4000 (usable)
>
> This pushes the top of RAM beyond what was reported by Xen's e820 table,
> which is:
>
> (XEN)  0000000100000000 - 00000001de600000 (usable)
>
> AFAICT, the failure is caused by dom0 accessing non-existent physical
> memory.  The failure went away after I removed the call to xen_add_extra_mem().
That "extra memory" stuff is reserving some physical address space for
ballooning.  It should be completely unused (and unbacked by any pages)
until the balloon driver populates it; it is reserved memory in the
meantime.

How is that memory getting referenced in your case?
> Another potential problem I noticed with e820 processing is that there is a
> discrepancy between how Xen processes e820 and how dom0 does it.  In Xen
> (start_xen() in arch/x86/setup.c), e820 entries are aligned on
> L2_PAGETABLE_SHIFT (2MB) boundaries, while the dom0 e820 code does not align
> them.  As a result, one of my e820 entries that is 1 page in size got dropped
> by Xen but got picked up in dom0.  This does not cause a problem in my case,
> but the inconsistency in how memory is used by Xen and dom0 could potentially
> be a problem.

I don't think that matters.  Xen can choose not to use non-2M aligned
pieces of memory if it wants, but that doesn't really affect the dom0
kernel's use of the host E820, because dom0 is only looking for possible
device memory, rather than RAM.

    J

Kay, Allen M

2011-Feb-11 03:07 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

> That "extra memory" stuff is reserving some physical address space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.

On my system, the entire chunk is marked as usable memory:

    0000000100000000 - 000000023a6f4000 (usable)

When you said it is reserved memory, are you saying it should be marked as
"reserved", or is there somewhere else in the code that keeps track of
which portion of this e820 chunk is backed by real memory and which portion
is "extra memory"?



Stefano Stabellini

2011-Feb-11 14:51 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Fri, 11 Feb 2011, Jeremy Fitzhardinge wrote:
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
> > Konrad/Stefano,
> >
> > Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I
> > reported a few weeks ago.
> >
> > I finally got around to narrowing the problem down to the call to
> > xen_add_extra_mem() in xen_memory_setup() (arch/x86/xen/setup.c).  This
> > call increases the top of E820 memory in dom0 beyond what is actually
> > available.
> >
> > Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
> >
> >     0000000100000000 - 000000016b45a000 (usable)
> >
> > After xen_add_extra_mem() is called, the last entry of dom0 e820 table
> > becomes:
> >
> >     0000000100000000 - 000000023a6f4000 (usable)
> >
> > This pushes the top of RAM beyond what was reported by Xen's e820 table,
> > which is:
> >
> > (XEN)  0000000100000000 - 00000001de600000 (usable)
> >
> > AFAICT, the failure is caused by dom0 accessing non-existent physical
> > memory.  The failure went away after I removed the call to
> > xen_add_extra_mem().
> 
> That "extra memory" stuff is reserving some physical address space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
> 
> How is that memory getting referenced in your case?
> 
In particular it would be very interesting to know what the RIP of the
crash resolves to.
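Resolving the RIP (for example e033:[<ffffffff8100cd18>] from the first crash dump) means finding the System.map symbol with the highest address that is not above it. A minimal sketch of that lookup, assuming the map has already been parsed into an array sorted by address; the struct, function and symbol names in the checks are placeholders, not entries from the attached System.map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint64_t addr;      /* symbol start address, as listed in System.map */
    const char *name;
};

/*
 * Binary search for the last symbol whose address is <= rip.
 * 'syms' must be sorted by ascending address and 'offset' must be non-NULL.
 * Returns NULL if rip lies below the first symbol; otherwise stores
 * rip - symbol address in *offset.
 */
static const struct ksym *resolve_rip(const struct ksym *syms, size_t n,
                                      uint64_t rip, uint64_t *offset)
{
    size_t lo = 0, hi = n;          /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= rip)
            lo = mid + 1;           /* candidate is mid or later */
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;                /* rip precedes every symbol */
    *offset = rip - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

With a vmlinux built with debug info, `addr2line -e vmlinux <rip>` gives the same answer plus a source line.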



Jeremy Fitzhardinge

2011-Feb-11 17:06 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning.  It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
>     0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as
> "reserved", or is there somewhere else in the code that keeps track of
> which portion of this e820 chunk is backed by real memory and which portion
> is "extra memory"?

Yes, it is marked as usable in the E820 so that the kernel will allocate
page structures for it.  But then the extra part is reserved with
memblock_x86_reserve_range(), which should prevent the kernel from ever
trying to use that memory (i.e., it will never get added to the pools of
memory the allocator allocates from).  The balloon driver backs these
pseudo-physical pageframes with real memory pages, and then releases them
into the pool for allocation.
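The scheme described above can be modeled in a few lines. This is a toy sketch, not the kernel's memblock or balloon code: the frame array, the `reserved`/`in_pool` states and all function names are hypothetical, chosen only to show that frames in the extra region exist but are never handed out until the balloon side populates and releases them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES 8     /* toy pseudo-physical frame count */

struct frame {
    bool reserved;      /* in the "extra" region: never allocatable */
    bool in_pool;       /* currently available to the page allocator */
};

static struct frame frames[NR_FRAMES];

/* All frames count as "usable", but [extra_start, NR_FRAMES) is reserved. */
static void setup_memory(size_t extra_start)
{
    assert(extra_start <= NR_FRAMES);
    for (size_t i = 0; i < NR_FRAMES; i++) {
        frames[i].reserved = (i >= extra_start);
        frames[i].in_pool = !frames[i].reserved;
    }
}

/* Page allocator: only ever returns frames that are in the pool. */
static long alloc_frame(void)
{
    for (size_t i = 0; i < NR_FRAMES; i++) {
        if (frames[i].in_pool) {
            frames[i].in_pool = false;
            return (long)i;
        }
    }
    return -1;          /* out of memory */
}

/* Balloon side: mark a reserved frame as backed, then release it to the pool. */
static bool balloon_populate(size_t pfn)
{
    if (pfn >= NR_FRAMES || !frames[pfn].reserved)
        return false;
    frames[pfn].reserved = false;   /* stands in for backing with a real page */
    frames[pfn].in_pool = true;     /* released for allocation */
    return true;
}
```

The point the model isolates is that "usable in the E820" and "allocatable" are separate properties: the first gives the frames page structures, the second is controlled by the reservation.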

    J

Kay, Allen M

2011-Feb-11 19:00 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

The code for memblock_x86_reserve_range() does not exist in the 2.6.32.27 pvops
dom0.  I did find it in Konrad's tree at
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.

So is this a problem for the 2.6.32.27 stable tree?  If so, which pvops dom0 tree
should I be using?

Allen

-----Original Message-----
From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] 
Sent: Friday, February 11, 2011 9:07 AM
To: Kay, Allen M
Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/10/2011 07:07 PM, Kay, Allen M wrote:>> That "extra memory" stuff is reserving some physical address
space for
>> ballooning.  It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
>     0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as
"reserved" or is there somewhere else in the code that keeps track of
which portion of this e820 chunk is back by real memory and which chunk is
"extra memory"?
Yes, it is marked as usable in the E820 so that the kernel will allocate
page structures for it.  But then the extra part is reserved with
memblock_x86_reserve_range(), which should prevent the kernel from ever
trying to use that memory (ie, it will never get added to the pools of
memory the allocator allocates from).  The balloon driver backs these
pseudo-physical pageframes with real memory pages, and then releases
into the pool for allocation.

    J
> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] 
> Sent: Thursday, February 10, 2011 6:56 PM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>> Konrad/Stefano,
>>
>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP I
reported a few weeks ago.
>>
>> I finally got around to narrow down the problem the call to
xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup().  This call
increase the top of E820 memory in dom0 beyond what is actually available.
>>
>> Before xen_add_extra_mem() is called, the last entry of dom0 e820 table
is:
>>
>>     0000000100000000 - 000000016b45a000 (usable)
>>
>> After xen_add_extra_mem() is called, the last entry of dom0 e820 table
becomes:
>>
>>     0000000100000000 - 000000023a6f4000 (usable)
>>
>> This pushes the top of RAM beyond what was reported by Xen''s
e820 table, which is:
>>
>> (XEN)  0000000100000000 - 00000001de600000 (usable)
>>
>> AFAICT, the failure is caused by dom0 accessing non-existent physical
memory.  The failure went away after I removed the call to xen_add_extra_mem().
> That "extra memory" stuff is reserving some physical address
space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?
>
>> Another potential problem I noticed with e820 processing is that there
is a discrepancy between how Xen processes e820 and how dom0 does it.  In Xen
(arch/x86/setup.c/start_xen()), e820 entries are aligned on L2_PAGETABLE_SHIFT
boundary while dom0 e820 code does not.  As a result, one of my e820 entry that
is 1 page in size got dropped by Xen but got picked up in dom0.  This does not
cause problem in my case but the inconsistency on how memory is used by xen and
dom0 can potentially be a problem.
> I don''t think that matters.  Xen can choose not to use non-2M
aligned
> pieces of memory if it wants, but that doesn''t really affect the
dom0
> kernel''s use of the host E820, because dom0 is only looking for
possible
> device memory, rather than RAM.
>
>     J
>> Allen
>>
>> -----Original Message-----
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
>> Sent: Friday, January 28, 2011 7:48 AM
>> To: Kay, Allen M
>> Cc: xen-devel; Stefano Stabellini
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot
failure
>>
>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>> Following are the brief error messages from the serial console
log.  I have also attached the full serial console log and dom0 system map.
>>>>
>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>> On a second look, this is a different issue than I had encountered.
>>>
>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but
that
>>> is not right. Googling for this shows that I had fixed this with a
>>> Xorg server at some point, but I can''t remember the
details so that is not
>>> that useful :-(
>>>
>>> You said it works if you give the domain 1024MB, but I wonder if
>>> it also works if you disable the IOMMU? What happens then?
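Konrad's reading of `Bad L1 flags 400000` can be checked with a short sketch. It assumes, from memory of Xen's x86_64 headers rather than from this thread, that the logged flags value packs PTE bits 0-11 and 52-63 into a 24-bit mask (the way Xen's get_pte_flags() does). Under that assumption 0x400000 is flag bit 22, i.e. PTE bit 62, the bit Konrad identifies as PAGE_GNTTAB. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Sketch of the assumed 24-bit flag packing: PTE bits 0-11 stay in place,
 * PTE bits 52-63 move down to flag bits 12-23 (bits 40-51 are masked off).
 */
static uint32_t sketch_pte_flags(uint64_t pte)
{
    return ((uint32_t)(pte >> 40) & ~UINT32_C(0xFFF)) |
           ((uint32_t)pte & UINT32_C(0xFFF));
}

/* Map a bit of the packed flag mask back to the PTE bit it came from. */
static int sketch_flag_bit_to_pte_bit(int flag_bit)
{
    return flag_bit < 12 ? flag_bit : flag_bit + 40;
}
```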
>> Can you also patch your Xen hypervisor with this patch? It will print out
>> the other 89 entries so we can see what type of values they have. You might
>> need to move it a bit, as this is for xen-unstable.
>>
>> diff -r 003acf02d416 xen/arch/x86/mm.c
>> --- a/xen/arch/x86/mm.c	Thu Jan 20 17:04:06 2011 +0000
>> +++ b/xen/arch/x86/mm.c	Fri Jan 28 10:46:13 2011 -0500
>> @@ -1201,11 +1201,12 @@
>>      return 0;
>>  
>>   fail:
>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>      while ( i-- > 0 )
>> -        if ( is_guest_l1_slot(i) )
>> +        if ( is_guest_l1_slot(i) ) {
>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>              put_page_from_l1e(pl1e[i], d);
>> -
>> +	}
>>      unmap_domain_page(pl1e);
>>      return -EINVAL;
>>  }
>>
>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>> (XEN) domain_crash_sync called from entry.S
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel

Kay, Allen M

2011-Feb-11 19:11 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

By the way, I pulled 2.6.32.27 from
git://git.kernel.org/pub/scm/linux/Jeremy/xen.git.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Kay, Allen M
Sent: Friday, February 11, 2011 11:01 AM
To: Jeremy Fitzhardinge
Cc: Stefano Stabellini; Keir Fraser; xen-devel; Konrad Rzeszutek Wilk
Subject: RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

The code for memblock_x86_reserve_range() does not exist in the 2.6.32.27 pvops
dom0.  I did find it in Konrad's tree at
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.

So is this a problem for the 2.6.32.27 stable tree?  If so, which pvops dom0 tree
should I be using?

Allen

-----Original Message-----
From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] 
Sent: Friday, February 11, 2011 9:07 AM
To: Kay, Allen M
Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning.  It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
>     0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as
> "reserved", or is there somewhere else in the code that keeps track of
> which portion of this e820 chunk is backed by real memory and which is
> "extra memory"?
Yes, it is marked as usable in the E820 so that the kernel will allocate
page structures for it.  But then the extra part is reserved with
memblock_x86_reserve_range(), which should prevent the kernel from ever
trying to use that memory (i.e., it will never get added to the pools of
memory the allocator allocates from).  The balloon driver backs these
pseudo-physical pageframes with real memory pages, and then releases them
into the pool for allocation.

    J
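Jeremy's description can be modelled as two lists: the E820 says "usable" up to the new top, a separate reserved list covers the extra range, and an early allocator only hands out pages that are usable and not reserved. This is a toy model, not the kernel's memblock API; the types and function names are hypothetical, and the pfns are Allen's reported addresses shifted right by 12 (4 KiB pages assumed).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open pfn range [start, end). */
struct sketch_range {
    uint64_t start;
    uint64_t end;
};

static bool sketch_in_range(const struct sketch_range *r, uint64_t pfn)
{
    return pfn >= r->start && pfn < r->end;
}

/*
 * A pfn may be handed out by the (toy) early allocator only if some
 * usable E820 range covers it and no reserved range does.
 */
static bool sketch_pfn_allocatable(const struct sketch_range *usable, size_t n_usable,
                                   const struct sketch_range *reserved, size_t n_reserved,
                                   uint64_t pfn)
{
    bool is_usable = false;

    for (size_t i = 0; i < n_usable; i++)
        if (sketch_in_range(&usable[i], pfn))
            is_usable = true;
    if (!is_usable)
        return false;

    for (size_t i = 0; i < n_reserved; i++)
        if (sketch_in_range(&reserved[i], pfn))
            return false; /* e.g. balloon "extra memory" not yet backed */
    return true;
}
```

With usable = { 0x100000, 0x23a6f4 } and reserved = { 0x16b45a, 0x23a6f4 }, pfn 0x16b459 is allocatable while 0x16b45a (the first extra page) is not.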
> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] 
> Sent: Thursday, February 10, 2011 6:56 PM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>> Konrad/Stefano,
>>
>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I
>> reported a few weeks ago.
>>
>> I finally got around to narrowing the problem down to the call to
>> xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup().  This call
>> increases the top of E820 memory in dom0 beyond what is actually available.
>>
>> Before xen_add_extra_mem() is called, the last entry of the dom0 e820 table
>> is:
>>
>>     0000000100000000 - 000000016b45a000 (usable)
>>
>> After xen_add_extra_mem() is called, the last entry of the dom0 e820 table
>> becomes:
>>
>>     0000000100000000 - 000000023a6f4000 (usable)
>>
>> This pushes the top of RAM beyond what was reported by Xen's e820 table,
>> which is:
>>
>> (XEN)  0000000100000000 - 00000001de600000 (usable)
>>
>> AFAICT, the failure is caused by dom0 accessing non-existent physical
>> memory.  The failure went away after I removed the call to xen_add_extra_mem().
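The three tops in the E820 lines above can be compared with a little arithmetic. This sketch uses only those addresses and assumes 4 KiB pages; the helper and constant names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Number of whole 4 KiB pages between two byte addresses (hi >= lo). */
static uint64_t sketch_pages_between(uint64_t lo, uint64_t hi)
{
    return (hi - lo) >> SKETCH_PAGE_SHIFT;
}

/* Addresses copied from the E820 lines quoted above. */
static const uint64_t dom0_top_before = UINT64_C(0x16b45a000); /* before xen_add_extra_mem() */
static const uint64_t dom0_top_after  = UINT64_C(0x23a6f4000); /* after xen_add_extra_mem() */
static const uint64_t xen_top         = UINT64_C(0x1de600000); /* top of Xen's last usable entry */
```

sketch_pages_between(dom0_top_before, dom0_top_after) gives 848538 pages (about 3.2 GiB) of added pseudo-physical space, and sketch_pages_between(xen_top, dom0_top_after) gives 377076 pages (about 1.4 GiB) lying above the top of Xen's usable E820 range, while the original dom0 top sits below it.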
> That "extra memory" stuff is reserving some physical address space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?
>
>> Another potential problem I noticed with e820 processing is a discrepancy
>> between how Xen processes e820 and how dom0 does it.  In Xen
>> (arch/x86/setup.c/start_xen()), e820 entries are aligned on L2_PAGETABLE_SHIFT
>> boundaries, while the dom0 e820 code does no such alignment.  As a result, one
>> of my e820 entries, which is 1 page in size, got dropped by Xen but picked up
>> by dom0.  This does not cause a problem in my case, but the inconsistency in
>> how Xen and dom0 use memory could potentially be a problem.
> I don't think that matters.  Xen can choose not to use non-2M aligned
> pieces of memory if it wants, but that doesn't really affect the dom0
> kernel's use of the host E820, because dom0 is only looking for possible
> device memory, rather than RAM.
>
>     J
>> Allen
>>
>> -----Original Message-----
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
>> Sent: Friday, January 28, 2011 7:48 AM
>> To: Kay, Allen M
>> Cc: xen-devel; Stefano Stabellini
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>
>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>> Following are the brief error messages from the serial console log.  I have
>>>> also attached the full serial console log and dom0 system map.
>>>>
>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>> On a second look, this is a different issue than I had encountered.
>>>
>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
>>> is not right. Googling for this shows that I had fixed this with an
>>> Xorg server at some point, but I can't remember the details, so that is
>>> not that useful :-(
>>>
>>> You said it works if you give the domain 1024MB, but I wonder if
>>> it also works if you disable the IOMMU? What happens then?
>> Can you also patch your Xen hypervisor with this patch? It will print out
>> the other 89 entries so we can see what type of values they have. You might
>> need to move it a bit, as this is for xen-unstable.
>>
>> diff -r 003acf02d416 xen/arch/x86/mm.c
>> --- a/xen/arch/x86/mm.c	Thu Jan 20 17:04:06 2011 +0000
>> +++ b/xen/arch/x86/mm.c	Fri Jan 28 10:46:13 2011 -0500
>> @@ -1201,11 +1201,12 @@
>>      return 0;
>>  
>>   fail:
>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>      while ( i-- > 0 )
>> -        if ( is_guest_l1_slot(i) )
>> +        if ( is_guest_l1_slot(i) ) {
>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>              put_page_from_l1e(pl1e[i], d);
>> -
>> +	}
>>      unmap_domain_page(pl1e);
>>      return -EINVAL;
>>  }
>>
>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>> (XEN) domain_crash_sync called from entry.S
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

Kay, Allen M

2011-Feb-11 22:10 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

I switched to the next-2.6.37 branch in Jeremy's tree, which has the
memblock_x86_reserve_range() function.

The boot failure still occurs.  RIP points to the BUG() call in
arch/x86/xen/mm.c/pin_pagetable_pfn().

Any suggestions?

Allen

-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] 
Sent: Friday, February 11, 2011 6:51 AM
To: Jeremy Fitzhardinge
Cc: Kay, Allen M; Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Fri, 11 Feb 2011, Jeremy Fitzhardinge wrote:
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
> > Konrad/Stefano,
> >
> > Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I
> > reported a few weeks ago.
> >
> > I finally got around to narrowing the problem down to the call to
> > xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup().  This call
> > increases the top of E820 memory in dom0 beyond what is actually available.
> >
> > Before xen_add_extra_mem() is called, the last entry of the dom0 e820
> > table is:
> >
> >     0000000100000000 - 000000016b45a000 (usable)
> >
> > After xen_add_extra_mem() is called, the last entry of the dom0 e820 table
> > becomes:
> >
> >     0000000100000000 - 000000023a6f4000 (usable)
> >
> > This pushes the top of RAM beyond what was reported by Xen's e820 table,
> > which is:
> >
> > (XEN)  0000000100000000 - 00000001de600000 (usable)
> >
> > AFAICT, the failure is caused by dom0 accessing non-existent physical
> > memory.  The failure went away after I removed the call to xen_add_extra_mem().
> 
> That "extra memory" stuff is reserving some physical address space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
> 
> How is that memory getting referenced in your case?
> 
In particular it would be very interesting to know what the RIP of the
crash resolves to.
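Resolving a RIP against a System.map amounts to finding the symbol with the greatest start address that is not above the RIP. A minimal sketch, assuming a symbol table already parsed and sorted by address; the two entries are back-computed from the `symbol+offset` pairs in Allen's later call trace (pin_pagetable_pfn+0x5c, xen_alloc_pte_init+0x2f), so they describe that build only, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry, as parsed from a System.map line. */
struct sketch_sym {
    uint64_t addr;
    const char *name;
};

/*
 * Binary-search a table sorted by ascending address for the last symbol
 * whose address is <= rip. Returns NULL if rip precedes every symbol;
 * otherwise stores rip minus the symbol address in *offset.
 */
static const struct sketch_sym *sketch_resolve(const struct sketch_sym *tab, size_t n,
                                               uint64_t rip, uint64_t *offset)
{
    size_t lo = 0, hi = n; /* find the first entry with addr > rip in [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= rip)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = rip - tab[lo - 1].addr;
    return &tab[lo - 1];
}

/* Entries back-computed from the call trace posted later in this thread. */
static const struct sketch_sym sketch_table[] = {
    { UINT64_C(0xffffffff8100579c), "pin_pagetable_pfn" },
    { UINT64_C(0xffffffff81c964ff), "xen_alloc_pte_init" },
};
```

For example, resolving 0xffffffff810057f8 against sketch_table yields pin_pagetable_pfn with offset 0x5c. A real System.map also needs the next symbol's address to tell whether the RIP is past the end of a function.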



Jeremy Fitzhardinge

2011-Feb-11 22:55 UTC

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/11/2011 11:00 AM, Kay, Allen M wrote:
> The code for memblock_x86_reserve_range() does not exist in 2.6.32.27 pvops
> dom0.
No, the function has a different name there, but the concept is the same.
>   I did find it in Konrad's tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.
>
> So is this a problem for 2.6.32.27 stable tree?  If so, which pvops dom0
> tree should I be using?
I *just* pushed .32.27 and haven't had a chance to test it.  The
xen/stable-2.6.32.x branch contains the version of xen/next-2.6.32 that
has passed at least some testing (i.e., it boots on something, at the
very least).

    J
> Allen

Kay, Allen M

2011-Feb-15 04:28 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

The failure occurs when dom0 tries to pin an L1 table entry, because the L1 table
entry for the failed address is garbage.  The memory range being pinned is in
the range of this extra balloon driver memory that is not backed by real RAM.  Is
this intended?

Here is the stack trace.  I don't see any code that tries to restrict which
memory in the e820 table can be pinned.

Call Trace:

 [<ffffffff810057f8>] pin_pagetable_pfn+0x5c/0x69
 [<ffffffff81c9652e>] xen_alloc_pte_init+0x2f/0x34
 [<ffffffff81cd4bb4>] phys_pmd_init+0x234/0x2a0
 [<ffffffff81006e5b>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
 [<ffffffff81ca8b94>] ? early_memremap+0x13/0x15
 [<ffffffff81cd4e20>] phys_pud_init+0x200/0x2d3
 [<ffffffff81cd5215>] kernel_physical_mapping_init+0xd2/0x1ce
 [<ffffffff8146adfe>] init_memory_mapping+0x446/0x5b3
 [<ffffffff8100542f>] ? xen_mc_issue+0x3b/0x57
 [<ffffffff81cb551e>] ? memblock_reserve+0x1f/0x21
 [<ffffffff81c975f6>] setup_arch+0x65a/0xb2b
 [<ffffffff8107b972>] ? clockevents_register_notifier+0x1b/0x45
 [<ffffffff8147f033>] ? printk+0x41/0x46
 [<ffffffff81c91ad4>] start_kernel+0xe4/0x41e
 [<ffffffff81c912cb>] x86_64_start_reservations+0xb6/0xba
 [<ffffffff81c9528f>] xen_start_kernel+0x5c9/0x5d0


Stefano Stabellini

2011-Feb-15 14:58 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Tue, 15 Feb 2011, Kay, Allen M wrote:
> The failure occurs when dom0 tries to pin an l1 table entry because L1
> table entry for the failed address is garbage.  The memory range being pinned is
> in the range of this extra balloon driver memory not backed by real RAM.  Is
> this intended?
> 
The pfn in question is returned by alloc_low_page and is equal to
e820_table_end, which should fall into the range
e820_table_start-e820_table_top.
This range is allocated by find_early_table_space in the initial
pagetable mappings between 0x8000 and max_pfn_mapped, respecting the
reserved memblock regions.
Maybe we didn't reserve some ranges in there that should have been
reserved?
What happens if you apply this patch?


diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..73a21db 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1653,9 +1653,6 @@ static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 		for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
 			pte_t pte;
 
-			if (pfn > max_pfn_mapped)
-				max_pfn_mapped = pfn;
-
 			if (!pte_none(pte_page[pteidx]))
 				continue;
 
@@ -1713,6 +1710,8 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	pud_t *l3;
 	pmd_t *l2;
 
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
+
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 


Kay, Allen M

2011-Feb-16 03:08 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

Setting max_pfn_mapped here has no effect.  It still fails the same way.

Later on in setup_arch(), max_pfn_mapped is set to max_low_pfn_mapped.  For 64-bit
dom0, it then tries to set it again for memory above 4GB by calling
init_memory_mapping(1UL<<32, max_pfn<<PAGE_SHIFT); this is where it
eventually fails.

The problem is that max_pfn also covers the non-RAM extra memory.  Should max_pfn
be set to xen_low_pfn_mapped before calling init_memory_mapping for memory above 4GB?

Allen 
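To put numbers on Allen's point that max_pfn includes the extra memory, here is a rough count of the 4 KiB L1 page-table pages the above-4GB init_memory_mapping() range would need, with and without the extra range. It assumes the direct map above 4GB is built only from 4 KiB PTEs (so one L1 page covers 2 MiB) and that the start is 2 MiB aligned, and it uses the E820 tops Allen reported earlier; it is illustrative arithmetic, not a model of phys_pmd_init(), and the names are hypothetical.

```c
#include <stdint.h>

/* Assumption: one L1 table (512 PTEs x 4 KiB) maps 2 MiB. */
#define SKETCH_L1_COVERAGE (UINT64_C(1) << 21)

/* L1 tables needed to map [start, end) with 4 KiB PTEs; start assumed 2 MiB aligned. */
static uint64_t sketch_l1_tables(uint64_t start, uint64_t end)
{
    return (end - start + SKETCH_L1_COVERAGE - 1) / SKETCH_L1_COVERAGE;
}
```

On these assumptions, mapping 4GB up to 0x23a6f4000 (with the extra memory) needs 2516 L1 pages, versus 859 up to 0x16b45a000 (without it), so the extra range roughly triples the early page-table demand.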

-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] 
Sent: Tuesday, February 15, 2011 6:58 AM
To: Kay, Allen M
Cc: Jeremy Fitzhardinge; Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure


Stefano Stabellini

2011-Feb-16 17:19 UTC

RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Wed, 16 Feb 2011, Kay, Allen M wrote:> Setting max_pfn_mapped here has no effect.  It still fails the same way.
> 
> Later on in setup_arch(), max_pfn_mapped is set to max_lower_pfn_mapped. 
For 64 bit dom0, it will try to set it again for memory above 4GB by calling
init_memory_mapping(1UL<<32, max_pfn<<PAGE_SHIFT) - this is where it
eventually fails.
> 
It shouldn''t make any difference what value max_pfn_mapped holds at
that
point. It mattered before because it changes the way the memory for the
initial page table is allocated (see
arch/x86/mm/init.c:find_early_table_space).
> The problem is max_pfn also contains non-RAM extra memory.  Should max_pfn
set to xen_low_pfn_mapped before calling init_memory_mapping for memory above
4GB?
> 
That shouldn't be a problem because it is marked as reserved in
xen_add_extra_mem, so it shouldn't be used to store the pagetable pages
(or anything else really).
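The point about xen_add_extra_mem can be illustrated with a generic first-fit search: any range recorded as reserved is skipped, so a reserved extra-memory range is never handed out for pagetable pages. This is a hypothetical sketch, not the memblock API: the region type and find_free_range are invented for illustration, and it assumes a list of reserved ranges sorted by start address and a search window such as [0x8000, max_pfn_mapped << PAGE_SHIFT).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* A reserved physical range [start, end), e.g. the kernel image,
 * mfn_list, or extra memory reserved by xen_add_extra_mem. */
struct region {
    uint64_t start;
    uint64_t end;
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical first-fit search: lowest page-aligned address in [lo, hi)
 * where `size` bytes overlap no reserved region. `res` must be sorted by
 * start. Returns UINT64_MAX if nothing fits. */
static uint64_t find_free_range(uint64_t lo, uint64_t hi, uint64_t size,
                                const struct region *res, size_t nres)
{
    uint64_t cand = align_up(lo, PAGE_SIZE);

    for (size_t i = 0; i < nres; i++) {
        if (res[i].end <= cand)
            continue;                           /* entirely below candidate */
        if (res[i].start >= cand + size)
            break;                              /* sorted: no later overlap */
        cand = align_up(res[i].end, PAGE_SIZE); /* skip past the reservation */
    }
    if (cand + size > hi)
        return UINT64_MAX;                      /* window too small */
    return cand;
}
```

For example, with reservations at [0x8000, 0x9000) and [0xa000, 0x20000), a 16 KiB request in [0x8000, 0x100000) lands at 0x20000, while a 4 KiB request fits in the gap at 0x9000.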

