Kay, Allen M
2011-Jan-25 02:26 UTC
[Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
I'm encountering the following boot failure with the latest pvops 2.6.32.27 dom0 and xen staging tree on my system. Attached file contains the full serial console log.

Has anyone seen it?

....

init_memory_mapping: 0000000000000000-000000009b000000
init_memory_mapping: 0000000100000000-000000023a6f4000
(XEN) mm.c:802:d0 Bad L1 flags 400000
(XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 365
(XEN) mm.c:2142:d0 Error while validating mfn 1c3eb4 (pfn 1454c) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:2965:d0 Error while pinning mfn 1c3eb4
(XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff8100cd18>]
(XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest
(XEN) rax: 00000000ffffffea rbx: ffff88001454c000 rcx: ffffffff8261f000
(XEN) rdx: 00000000deadbeef rsi: 00000000deadbeef rdi: 00000000deadbeef
(XEN) rbp: ffffffff816c1bf8 rsp: ffffffff816c1b98 r8: 00003ffffffff000
(XEN) r9: ffff880000000000 r10: 00000000deadbeef r11: 0000000000000000
(XEN) r12: ffffffffff400b10 r13: 0000000000000162 r14: ffffffffff440000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 00000001c1001000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff816c1b98:
(XEN) ffffffff8261f000 0000000000000000 ffffffff8100cd18 000000010000e030
(XEN) 0000000000010082 ffffffff816c1bd8 000000000000e02b ffffffff8100cd14
(XEN) ffffffff00000000 00000000001c3eb4 ffffffff81702c18 ffff88001454c000
(XEN) ffffffff816c1c18 ffffffff81980f27 ffffffff816c1c38 000000000001454c
(XEN) ffffffff816c1c38 ffffffff81036cd6 ffffffffff400b10 000000016c400000
(XEN) ffffffff816c1cd8 ffffffff819ba427 00000001c3350067 8000000000000163
(XEN) 0000000100000001 ffffffffff400000 000000008100cb50 80000000000001e3
(XEN) 8000000000000163 000000023a6f4000 000000016c600000 0000000000000000
(XEN) ffffffff816c1ca8 000000001454c000 ffffffff816c1cd8 ffff880001002028
(XEN) ffffffffff400000 0000000140000000 0000000140000000 000000023a6f4000
(XEN) ffffffff816c1d68 ffffffff819ba65d ffffffff8100c407 8000000000000163
(XEN) ffff880001002000 0000000000000000 0000000000000000 0000000000000004
(XEN) 00000001816c1d68 0000000000000000 0000000000000000 00000000143e9000
(XEN) ffffffff8108a086 000000023a6f4000 ffffffff81001880 ffff88023a6f4000
(XEN) 000000023a6f4000 ffff88023a6f4000 ffffffff816c1dc8 ffffffff819ba7b8
(XEN) ffff880100000000 0000000000000000 ffffffff816c1d98 000000009b000000
(XEN) ffffffff816c1dc8 ffffffff816c1e20 0000000000000001 0000000000000001
(XEN) 000000023a6f4000 0000000000000000 ffffffff816c1eb8 ffffffff81465e66
(XEN) 0000000000000000 0000000000000000 0000050100000000 ffffffff816c1e88
(XEN) 0000000000000000 0000000100000000 0000000100000000 000000023a6f4000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jan-25 14:39 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Mon, Jan 24, 2011 at 06:26:49PM -0800, Kay, Allen M wrote:
> I'm encountering following boot failure with the latest pvops 2.6.32.27 dom0 and xen staging tree on my system. Attached file contains the full serial console log.
>
> Has anyone seen it?

What does the 0xffffffff8100cd18 translate to in your System.map of your Linux kernel?
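Resolving an address against System.map amounts to finding the last symbol whose start address is not greater than the address in question. The following is a minimal Python sketch of that lookup, not something posted in the thread; `parse_system_map` and `resolve` are hypothetical helper names, and the two sample entries are the ones quoted in the follow-up reply in this thread.

```python
import bisect


def parse_system_map(text):
    """Parse System.map text into a sorted list of (address, type, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, sym_type, name = parts
        symbols.append((int(addr, 16), sym_type, name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) of the symbol that covers addr, or None."""
    starts = [s[0] for s in symbols]
    # Index of the last symbol whose start address is <= addr.
    idx = bisect.bisect_right(starts, addr) - 1
    if idx < 0:
        return None
    start, _sym_type, name = symbols[idx]
    return name, addr - start


# Only the two entries quoted in the thread; a real System.map has thousands.
SAMPLE = """\
ffffffff8100cce2 t pin_pagetable_pfn
ffffffff8100cd1e t p2m_mid_mfn_init
"""

if __name__ == "__main__":
    name, offset = resolve(parse_system_map(SAMPLE), 0xFFFFFFFF8100CD18)
    print(f"{name}+{offset:#x}")
```

With those two entries, 0xffffffff8100cd18 lands inside pin_pagetable_pfn, 0x36 bytes past its start.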
Kay, Allen M
2011-Jan-25 18:49 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
[This email is either empty or too large to be displayed at this time]
Konrad Rzeszutek Wilk
2011-Jan-25 19:07 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Tue, Jan 25, 2011 at 10:49:52AM -0800, Kay, Allen M wrote:
> Looks like it translates to pin_pagetable_pfn. I have also attached the entire System.map file.
>
> ...
> ffffffff8100cce2 t pin_pagetable_pfn
> ffffffff8100cd1e t p2m_mid_mfn_init

Ok, then it probably is related to the issues we had with the P2M or MFN list being incorrect... and your E820:

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009bc00 (usable)
(XEN) 000000000009bc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000020000000 (usable)
(XEN) 0000000020000000 - 0000000020200000 (reserved)
(XEN) 0000000020200000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040200000 (reserved)
(XEN) 0000000040200000 - 000000009acd3000 (usable)
(XEN) 000000009acd3000 - 000000009ad67000 (reserved)
(XEN) 000000009ad67000 - 000000009afe7000 (ACPI NVS)
(XEN) 000000009afe7000 - 000000009afff000 (ACPI data)
(XEN) 000000009afff000 - 000000009b000000 (usable)
(XEN) 000000009b000000 - 000000009fa00000 (reserved)
(XEN) 00000000f8000000 - 00000000fc000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed10000 - 00000000fed14000 (reserved)
(XEN) 00000000fed18000 - 00000000fed1a000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff980000 - 00000000ffc00000 (reserved)
(XEN) 00000000ffd80000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 00000001de600000 (usable)

is like swiss-cheese with the RAM regions. What machine is this and how can I get my hands on it?

Does it boot if you have 'dom0_mem=max:512MB' (it is important to have the 'max' there)?
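To make the "swiss-cheese" remark concrete, the E820 dump above can be parsed and reduced to its usable runs and the holes between them. This is only an illustrative Python sketch; `parse_e820`, `usable_runs` and `holes` are hypothetical names, the regular expression assumes the exact "start - end (type)" layout printed in this log, and nothing here reflects how Xen or the dom0 kernel actually walks the map.

```python
import re

# Matches "<16 hex digits> - <16 hex digits> (<type>)" as printed by Xen above.
E820_LINE = re.compile(r"([0-9a-f]{16}) - ([0-9a-f]{16}) \(([^)]+)\)")

E820_LOG = """\
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009bc00 (usable)
(XEN) 000000000009bc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000020000000 (usable)
(XEN) 0000000020000000 - 0000000020200000 (reserved)
(XEN) 0000000020200000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040200000 (reserved)
(XEN) 0000000040200000 - 000000009acd3000 (usable)
(XEN) 000000009acd3000 - 000000009ad67000 (reserved)
(XEN) 000000009ad67000 - 000000009afe7000 (ACPI NVS)
(XEN) 000000009afe7000 - 000000009afff000 (ACPI data)
(XEN) 000000009afff000 - 000000009b000000 (usable)
(XEN) 000000009b000000 - 000000009fa00000 (reserved)
(XEN) 00000000f8000000 - 00000000fc000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed10000 - 00000000fed14000 (reserved)
(XEN) 00000000fed18000 - 00000000fed1a000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff980000 - 00000000ffc00000 (reserved)
(XEN) 00000000ffd80000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 00000001de600000 (usable)
"""


def parse_e820(log):
    """Return a list of (start, end, kind) tuples found in the log text."""
    return [(int(s, 16), int(e, 16), kind) for s, e, kind in E820_LINE.findall(log)]


def usable_runs(entries):
    """Merge back-to-back usable entries into (start, end) runs."""
    runs = []
    for start, end, kind in sorted(entries):
        if kind != "usable":
            continue
        if runs and runs[-1][1] == start:
            runs[-1] = (runs[-1][0], end)
        else:
            runs.append((start, end))
    return runs


def holes(runs):
    """Return the (start, end) gaps between consecutive usable runs."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(runs, runs[1:])]


if __name__ == "__main__":
    runs = usable_runs(parse_e820(E820_LOG))
    for start, end in runs:
        print(f"usable {start:#014x} - {end:#014x} ({(end - start) >> 10} KiB)")
    for start, end in holes(runs):
        print(f"hole   {start:#014x} - {end:#014x}")
```

On this map the usable memory comes out as six separate runs with five holes between them, including the 2 MB reserved windows at 0x20000000 and 0x40000000.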
Kay, Allen M
2011-Jan-25 19:24 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
The machine is an Intel Sandybridge Desktop SDP (Software Development Platform).

Setting 'dom0_mem=max:1024MB' worked. Booting with "dom0_mem=max:512MB" panic'ed in mount_block_root().

Allen
Konrad Rzeszutek Wilk
2011-Jan-25 20:10 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Tue, Jan 25, 2011 at 11:24:54AM -0800, Kay, Allen M wrote:
> The machine is an Intel Sandybridge Desktop SDP (Software Development Platform).
>
> Setting 'dom0_mem=max:1024MB' worked. Booting with "dom0_mem=max:512MB" panic'ed in mount_block_root().

OK, do you see anything on the Xen console (if you up the debug options)? I am wondering if you see something akin to this:

(XEN) mm.c:889:d0 Error getting mfn 110000 (pfn 5555555555555555) from L1 entry 8000000110000463 for l1e_owner=0, pg_owner=0
(Xrror while pinning mfn 20c8c0
Kay, Allen M
2011-Jan-25 21:26 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
I do not see any message from mm.c if the dom0_mem param is used. If dom0_mem is not used, then I see the following error messages in the serial console log. It is part of the log I sent out in my original bug report:

(XEN) mm.c:802:d0 Bad L1 flags 400000
(XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 365
(XEN) mm.c:2142:d0 Error while validating mfn 1c3eb4 (pfn 1454c) for type 100000
Kay, Allen M
2011-Jan-26 02:41 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
I noticed one of my e820 entries is not page aligned:

> (XEN) 0000000000000000 - 000000000009bc00 (usable)

It might be similar to the problem reported by Michael Young in the attached email.
Konrad Rzeszutek Wilk
2011-Jan-26 16:14 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Tue, Jan 25, 2011 at 06:41:30PM -0800, Kay, Allen M wrote:
> I noticed one of my e820 entry is not page aligned:
>
> > (XEN) 0000000000000000 - 000000000009bc00 (usable)
>
> It might be similar to the problem reported by Michael Young in attached email.

Did you try their patch?
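What "not page aligned" means for the entry ending at 0x9bc00 can be shown with a few lines of arithmetic. The Python sketch below assumes 4 KiB pages and uses hypothetical helper names (`is_page_aligned`, `shrink_to_pages`); it only illustrates the rounding involved and is not the patch referred to in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def is_page_aligned(addr, page_size=PAGE_SIZE):
    """True if addr sits exactly on a page boundary."""
    return addr % page_size == 0


def shrink_to_pages(start, end, page_size=PAGE_SIZE):
    """Shrink [start, end) inward to whole pages: start rounds up, end rounds down."""
    aligned_start = -(-start // page_size) * page_size  # ceiling to a page boundary
    aligned_end = (end // page_size) * page_size        # floor to a page boundary
    # A range smaller than one page collapses to an empty range.
    return aligned_start, max(aligned_start, aligned_end)


if __name__ == "__main__":
    start, end = 0x0, 0x9BC00  # first usable E820 entry from the log
    print(f"end aligned: {is_page_aligned(end)}")
    new_start, new_end = shrink_to_pages(start, end)
    print(f"whole pages: {new_start:#x} - {new_end:#x}")
```

For the first usable entry this leaves 0x0 - 0x9b000 as whole pages, with the trailing 0xc00 bytes belonging to a page that is shared with the following reserved region.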
Kay, Allen M
2011-Jan-26 18:46 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
I just tried it and it can now boot successfully without the need for the dom0_mem=max:1024MB parameter.

Is the patch going to be checked into the pvops tree? It does not seem to be in the 2.6.32.27 dom0 pvops tree yet.
Konrad Rzeszutek Wilk
2011-Jan-26 21:28 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Wed, Jan 26, 2011 at 10:46:13AM -0800, Kay, Allen M wrote:
> I just tried it and it can now boot successfully without the need for dom0_mem=max:1024MB parameter.

Woot! Great.

> Is the patch going to be checked into pvops tree? It does not seem to be in 2.6.32.27 dom0 pvops tree yet.

It is in 2.6.38. Hadn't yet done it for 2.6.32 - let me spin up a patch for it. Can I put a Tested-by on it from you?
Kay, Allen M
2011-Jan-26 21:53 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
> Can I put a Tested-by on it from you?

Sure. I have attached Stefano's patch that I used, just to make sure that we are referring to the same patch.

I would also like to use 2.6.38. What commands should I use to pull it? It is not clear to me from reading the pvops wiki page (http://wiki.xensource.com/xenwiki/XenParavirtOps).
Kay, Allen M
2011-Jan-27 01:16 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
Looks like I spoke too soon. I have encountered additional failures even with Stefano's patch in subsequent boots.
Stefano Stabellini
2011-Jan-27 11:59 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Thu, 27 Jan 2011, Kay, Allen M wrote:
> Looks like I spoke too soon. I have encountered additional failures even with Stefano's patch in subsequent boots.

Could you please post the error you are getting? The full xen+kernel serial output would be nice.

> I would also like to use 2.6.38. What commands should I use to pull it? It is not clear to me from reading the pvops wiki page (http://wiki.xensource.com/xenwiki/XenParavirtOps).

try this branch:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-rc2-fixes

it is 2.6.38-rc2 plus three bug fixes. This branch is able to boot on all my testboxes.
Konrad Rzeszutek Wilk
2011-Jan-27 14:45 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
> > Can I put a Tested-by on it from you?
>
> Sure. I have attached Stefano's patch I used just to make sure that we are referring to the same patch.
>
> I would also like to use 2.6.38. What commands should I use to pull it? It is not clear to me from reading the pvops wiki page (http://wiki.xensource.com/xenwiki/XenParavirtOps).

Well, you can just run the vanilla one -- BUT it is still work in progress so any bugs you find - well - you might have better luck just fixing them yourself until we have gotten past most of the bootup problems.
Kay, Allen M
2011-Jan-27 18:51 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
[This email is either empty or too large to be displayed at this time]
Konrad Rzeszutek Wilk
2011-Jan-28 15:28 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
> Following are the brief error messages from the serial console log. I have also attached the full serial console log and dom0 system map.
>
> (XEN) mm.c:802:d0 Bad L1 flags 400000

On a second look, this is a different issue than I had encountered.

The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that is not right. Googling for this shows that I had fixed this with a Xorg server at some point, but I can't remember the details so that is not that useful :-(

You said it works if you give the domain 1024MB, but I wonder if it also works if you disable the IOMMU? What happens then?

> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
Konrad Rzeszutek Wilk
2011-Jan-28 15:47 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
> > Following are the brief error messages from the serial console log. I have also attached the full serial console log and dom0 system map.
> >
> > (XEN) mm.c:802:d0 Bad L1 flags 400000
>
> On a second look, this is a different issue than I had encountered.
>
> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
> is not right. Googling for this shows that I had fixed this with a
> Xorg server at some point, but I can't remember the details so that is not
> that useful :-(
>
> You said it works if you give the domain 1024MB, but I wonder if
> it also works if you disable the IOMMU? What happens then?

Can you also patch your Xen hypervisor with this patch? It will print out the other 89 entries so we can see what type of values they have.. You might need to move it a bit as this is for xen-unstable.

diff -r 003acf02d416 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Thu Jan 20 17:04:06 2011 +0000
+++ b/xen/arch/x86/mm.c Fri Jan 28 10:46:13 2011 -0500
@@ -1201,11 +1201,12 @@
     return 0;
 
  fail:
-    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
+    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
     while ( i-- > 0 )
-        if ( is_guest_l1_slot(i) )
+        if ( is_guest_l1_slot(i) ) {
+            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
             put_page_from_l1e(pl1e[i], d);
-
+        }
     unmap_domain_page(pl1e);
     return -EINVAL;
 }
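The `while ( i-- > 0 )` loop in Konrad's patch walks backwards from the slot just below the failing one down to slot 0. The following is a standalone sketch of that iteration order only. `sketch_unwind_order` is a hypothetical helper written for this illustration; it does not exist in Xen and it skips the `is_guest_l1_slot()` filter.

```c
#include <stddef.h>

/* Hypothetical helper: records, in visit order, the indices that a
 * `while (i-- > 0)` unwind loop touches after a failure at index `failed`.
 * Returns the number of indices written to `out` (at most `cap`). */
size_t sketch_unwind_order(int failed, int *out, size_t cap)
{
    size_t n = 0;
    int i = failed;

    while (i-- > 0) {          /* first visited index is failed - 1 */
        if (n == cap)
            break;
        out[n++] = i;
    }
    return n;
}
```

For the failure at entry 90 reported in the log, the loop starts at index 89 and ends at index 0, so the patched hypervisor would log every slot below the failing one that passes the guest-slot check.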
Kay, Allen M
2011-Feb-11 01:03 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
Konrad/Stefano,

Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I reported a few weeks ago.

I finally got around to narrowing down the problem to the call to xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup(). This call increases the top of E820 memory in dom0 beyond what is actually available.

Before xen_add_extra_mem() is called, the last entry of the dom0 e820 table is:

0000000100000000 - 000000016b45a000 (usable)

After xen_add_extra_mem() is called, the last entry of the dom0 e820 table becomes:

0000000100000000 - 000000023a6f4000 (usable)

This pushes the top of RAM beyond what was reported by Xen's e820 table, which is:

(XEN) 0000000100000000 - 00000001de600000 (usable)

AFAICT, the failure is caused by dom0 accessing non-existent physical memory. The failure went away after I removed the call to xen_add_extra_mem().

Another potential problem I noticed with e820 processing is that there is a discrepancy between how Xen processes e820 and how dom0 does it. In Xen (arch/x86/setup.c/start_xen()), e820 entries are aligned on an L2_PAGETABLE_SHIFT boundary, while the dom0 e820 code does not align them. As a result, one of my e820 entries that is 1 page in size got dropped by Xen but got picked up in dom0. This does not cause a problem in my case, but the inconsistency in how memory is used by Xen and dom0 can potentially be a problem.

Allen
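The comparison Allen makes between the dom0 e820 top (0x23a6f4000) and the top reported by Xen (0x1de600000) can be worked through as plain arithmetic. This is a minimal sketch with hypothetical helper names. It only reproduces his subtraction and makes no claim about whether the two address spaces are directly comparable, which the thread goes on to discuss.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* 4 KiB pages */

/* Bytes by which a guest range end exceeds the host range end (0 if none). */
uint64_t sketch_overshoot_bytes(uint64_t guest_end, uint64_t host_end)
{
    return guest_end > host_end ? guest_end - host_end : 0;
}

/* Converts a byte count to a count of whole 4 KiB pages. */
uint64_t sketch_bytes_to_pages(uint64_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}
```

With the figures from the message, `sketch_overshoot_bytes(0x23a6f4000, 0x1de600000)` gives 0x5c0f4000 bytes, about 1.4 GiB or 0x5c0f4 pages. The same helper returns 0 for the pre-`xen_add_extra_mem()` top of 0x16b45a000.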
Jeremy Fitzhardinge
2011-Feb-11 02:56 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On 02/10/2011 05:03 PM, Kay, Allen M wrote:
> Konrad/Stefano,
>
> Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I reported a few weeks ago.
>
> I finally got around to narrowing down the problem to the call to xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup(). This call increases the top of E820 memory in dom0 beyond what is actually available.
>
> Before xen_add_extra_mem() is called, the last entry of the dom0 e820 table is:
>
> 0000000100000000 - 000000016b45a000 (usable)
>
> After xen_add_extra_mem() is called, the last entry of the dom0 e820 table becomes:
>
> 0000000100000000 - 000000023a6f4000 (usable)
>
> This pushes the top of RAM beyond what was reported by Xen's e820 table, which is:
>
> (XEN) 0000000100000000 - 00000001de600000 (usable)
>
> AFAICT, the failure is caused by dom0 accessing non-existent physical memory. The failure went away after I removed the call to xen_add_extra_mem().

That "extra memory" stuff is reserving some physical address space for ballooning. It should be completely unused (and unbacked by any pages) until the balloon driver populates it; it is reserved memory in the meantime.

How is that memory getting referenced in your case?

> Another potential problem I noticed with e820 processing is that there is a discrepancy between how Xen processes e820 and how dom0 does it. In Xen (arch/x86/setup.c/start_xen()), e820 entries are aligned on an L2_PAGETABLE_SHIFT boundary, while the dom0 e820 code does not align them. As a result, one of my e820 entries that is 1 page in size got dropped by Xen but got picked up in dom0. This does not cause a problem in my case, but the inconsistency in how memory is used by Xen and dom0 can potentially be a problem.

I don't think that matters. Xen can choose not to use non-2M aligned pieces of memory if it wants, but that doesn't really affect the dom0 kernel's use of the host E820, because dom0 is only looking for possible device memory, rather than RAM.

J
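To make the 2M alignment point concrete, here is a small sketch of how trimming a range inward to 2 MiB boundaries makes a one-page region vanish. It assumes the alignment rounds the start up and the end down, which is one plausible reading of Allen's description and has not been checked against start_xen(). The names are hypothetical; only the shift value of 21 (2 MiB on x86-64) is standard.

```c
#include <stdint.h>

#define SKETCH_L2_SHIFT 21                       /* 2 MiB on x86-64 */
#define SKETCH_L2_SIZE  (UINT64_C(1) << SKETCH_L2_SHIFT)

/* Shrinks [*start, *end) inward to 2 MiB boundaries.
 * Returns 1 if anything remains, 0 if the range becomes empty
 * (in which case *start and *end are left unchanged). */
int sketch_align_inward(uint64_t *start, uint64_t *end)
{
    uint64_t s = (*start + SKETCH_L2_SIZE - 1) & ~(SKETCH_L2_SIZE - 1);
    uint64_t e = *end & ~(SKETCH_L2_SIZE - 1);

    if (s >= e)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

A 4 KiB range such as 0x100001000 - 0x100002000 (an invented example) comes back empty, while the already aligned 0x100000000 - 0x1de600000 range from the log is returned unchanged.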
Kay, Allen M
2011-Feb-11 03:07 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
> That "extra memory" stuff is reserving some physical address space for
> ballooning. It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.

On my system, the entire chunk is marked as usable memory:

0000000100000000 - 000000023a6f4000 (usable)

When you said it is reserved memory, are you saying it should be marked as "reserved", or is there somewhere else in the code that keeps track of which portion of this e820 chunk is backed by real memory and which portion is "extra memory"?
Stefano Stabellini
2011-Feb-11 14:51 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Fri, 11 Feb 2011, Jeremy Fitzhardinge wrote:
> That "extra memory" stuff is reserving some physical address space for
> ballooning. It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?

In particular it would be very interesting to know what the RIP of the crash resolves to.
Jeremy Fitzhardinge
2011-Feb-11 17:06 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning. It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
> 0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as "reserved", or is there somewhere else in the code that keeps track of which portion of this e820 chunk is backed by real memory and which portion is "extra memory"?

Yes, it is marked as usable in the E820 so that the kernel will allocate page structures for it. But then the extra part is reserved with memblock_x86_reserve_range(), which should prevent the kernel from ever trying to use that memory (ie, it will never get added to the pools of memory the allocator allocates from). The balloon driver backs these pseudo-physical pageframes with real memory pages, and then releases them into the pool for allocation.

J
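Jeremy's description (usable in the E820, but reserved so the allocator never hands it out) can be pictured with a toy model. This is a hypothetical sketch, not kernel code: the struct and functions are invented here, and the choice of 0x16b45a000 - 0x23a6f4000 as the reserved part is an assumption derived from Allen's before/after e820 entries.

```c
#include <stdint.h>

/* Hypothetical toy model, not kernel code: one "usable" range plus one
 * reserved sub-range that the allocator must never hand out. */
struct sketch_range {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
};

/* Returns 1 if addr lies inside [r->start, r->end), 0 otherwise. */
int sketch_contains(const struct sketch_range *r, uint64_t addr)
{
    return addr >= r->start && addr < r->end;
}

/* A page is allocatable only if it is usable AND not reserved. */
int sketch_allocatable(const struct sketch_range *usable,
                       const struct sketch_range *reserved,
                       uint64_t addr)
{
    return sketch_contains(usable, addr) && !sketch_contains(reserved, addr);
}
```

In this model an address at 0x100000000 is allocatable, while anything from 0x16b45a000 upward is not until something (the balloon driver, in Jeremy's account) backs it and releases it. If the reservation step is missing, the second check disappears and the whole "usable" range is fair game.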
Kay, Allen M
2011-Feb-11 19:00 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
The code for memblock_x86_reserve_range() does not exist in 2.6.32.27 pvops dom0. I did find it in Konrad's tree at git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.

So is this a problem for 2.6.32.27 stable tree? If so, which pvops dom0 tree should I be using?

Allen
Kay, Allen M
2011-Feb-11 19:11 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
By the way, I pulled 2.6.32.27 from git://git.kernel.org/pub/scm/linux/Jeremy/xen.git.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Kay, Allen M
Sent: Friday, February 11, 2011 11:01 AM
To: Jeremy Fitzhardinge
Cc: Stefano Stabellini; Keir Fraser; xen-devel; Konrad Rzeszutek Wilk
Subject: RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

The code for memblock_x86_reserve_range() does not exist in 2.6.32.27 pvops dom0. I did find it in Konrad's tree at git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.

So is this a problem for 2.6.32.27 stable tree? If so, which pvops dom0 tree should I be using?

Allen

-----Original Message-----
From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
Sent: Friday, February 11, 2011 9:07 AM
To: Kay, Allen M
Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning. It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
> 0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as "reserved" or is there somewhere else in the code that keeps track of which portion of this e820 chunk is back by real memory and which chunk is "extra memory"?

Yes, it is marked as usable in the E820 so that the kernel will allocate page structures for it. But then the extra part is reserved with memblock_x86_reserve_range(), which should prevent the kernel from ever trying to use that memory (ie, it will never get added to the pools of memory the allocator allocates from). The balloon driver backs these pseudo-physical pageframes with real memory pages, and then releases into the pool for allocation.

J

> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Thursday, February 10, 2011 6:56 PM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>> Konrad/Stefano,
>>
>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP I reported a few weeks ago.
>>
>> I finally got around to narrow down the problem the call to xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup(). This call increase the top of E820 memory in dom0 beyond what is actually available.
>>
>> Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
>>
>> 0000000100000000 - 000000016b45a000 (usable)
>>
>> After xen_add_extra_mem() is called, the last entry of dom0 e820 table becomes:
>>
>> 0000000100000000 - 000000023a6f4000 (usable)
>>
>> This pushes the top of RAM beyond what was reported by Xen's e820 table, which is:
>>
>> (XEN) 0000000100000000 - 00000001de600000 (usable)
>>
>> AFAICT, the failure is caused by dom0 accessing non-existent physical memory. The failure went away after I removed the call to xen_add_extra_mem().
> That "extra memory" stuff is reserving some physical address space for
> ballooning. It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?
>
>> Another potential problem I noticed with e820 processing is that there is a discrepancy between how Xen processes e820 and how dom0 does it. In Xen (arch/x86/setup.c/start_xen()), e820 entries are aligned on L2_PAGETABLE_SHIFT boundary while dom0 e820 code does not. As a result, one of my e820 entry that is 1 page in size got dropped by Xen but got picked up in dom0. This does not cause problem in my case but the inconsistency on how memory is used by xen and dom0 can potentially be a problem.
> I don't think that matters. Xen can choose not to use non-2M aligned
> pieces of memory if it wants, but that doesn't really affect the dom0
> kernel's use of the host E820, because dom0 is only looking for possible
> device memory, rather than RAM.
>
> J
>> Allen
>>
>> -----Original Message-----
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Friday, January 28, 2011 7:48 AM
>> To: Kay, Allen M
>> Cc: xen-devel; Stefano Stabellini
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>
>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>> Following are the brief error messages from the serial console log. I have also attached the full serial console log and dom0 system map.
>>>>
>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>> On a second look, this is a different issue than I had encountered.
>>>
>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
>>> is not right. Googling for this shows that I had fixed this with a
>>> Xorg server at some point, but I can't remember the details so that is not
>>> that useful :-(
>>>
>>> You said it works if you give the domain 1024MB, but I wonder if
>>> it also works if you disable the IOMMU? What happens then?
>> Can you also patch your Xen hypervisor with this patch? It will print out the
>> other 89 entries so we can see what type of values they have.. You might need to
>> move it a bit as this is for xen-unstable.
>>
>> diff -r 003acf02d416 xen/arch/x86/mm.c
>> --- a/xen/arch/x86/mm.c Thu Jan 20 17:04:06 2011 +0000
>> +++ b/xen/arch/x86/mm.c Fri Jan 28 10:46:13 2011 -0500
>> @@ -1201,11 +1201,12 @@
>>      return 0;
>>
>>   fail:
>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>      while ( i-- > 0 )
>> -        if ( is_guest_l1_slot(i) )
>> +        if ( is_guest_l1_slot(i) ) {
>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>              put_page_from_l1e(pl1e[i], d);
>> -
>> +        }
>>      unmap_domain_page(pl1e);
>>      return -EINVAL;
>>  }
>>
>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>> (XEN) domain_crash_sync called from entry.S
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Kay, Allen M
2011-Feb-11 22:10 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
I switched to the next-2.6.37 branch in Jeremy's tree, which has the memblock_x86_reserve_range() function. The boot failure still occurs. RIP points to the BUG() call in arch/x86/xen/mm.c/pin_pagetable_pfn(). Any suggestions?

Allen

-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
Sent: Friday, February 11, 2011 6:51 AM
To: Jeremy Fitzhardinge
Cc: Kay, Allen M; Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

On Fri, 11 Feb 2011, Jeremy Fitzhardinge wrote:
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
> > Konrad/Stefano,
> >
> > Getting back to the xen/dom0 boot failure on my Sandybridge SDP I reported a few weeks ago.
> >
> > I finally got around to narrow down the problem the call to xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup(). This call increase the top of E820 memory in dom0 beyond what is actually available.
> >
> > Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
> >
> > 0000000100000000 - 000000016b45a000 (usable)
> >
> > After xen_add_extra_mem() is called, the last entry of dom0 e820 table becomes:
> >
> > 0000000100000000 - 000000023a6f4000 (usable)
> >
> > This pushes the top of RAM beyond what was reported by Xen's e820 table, which is:
> >
> > (XEN) 0000000100000000 - 00000001de600000 (usable)
> >
> > AFAICT, the failure is caused by dom0 accessing non-existent physical memory. The failure went away after I removed the call to xen_add_extra_mem().
>
> That "extra memory" stuff is reserving some physical address space for
> ballooning. It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?
>

In particular it would be very interesting to know what the RIP of the crash resolves to.
Jeremy Fitzhardinge
2011-Feb-11 22:55 UTC
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On 02/11/2011 11:00 AM, Kay, Allen M wrote:
> The code for memblock_x86_reserve_range() does not exist in 2.6.32.27 pvops dom0.

No, the function changed name, but the concept is the same.

> I did find it in Konrad's tree at git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.
>
> So is this a problem for 2.6.32.27 stable tree? If so, which pvops dom0 tree should I be using?

I *just* pushed .32.27 and haven't had a chance to test it. The xen/stable-2.6.32.x branch contains the version of xen/next-2.6.32 which has at least passed some amount of testing (ie, boots on something at the very least).

J
Kay, Allen M
2011-Feb-15 04:28 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
The failure occurs when dom0 tries to pin an L1 table entry, because the L1 table entry for the failed address is garbage. The memory range being pinned is in the range of this extra balloon driver memory not backed by real RAM. Is this intended?

Here is the stack trace. I don't see any code that tries to restrict what memory is pin-able in the e820 table.

Call Trace:
 [<ffffffff810057f8>] pin_pagetable_pfn+0x5c/0x69
 [<ffffffff81c9652e>] xen_alloc_pte_init+0x2f/0x34
 [<ffffffff81cd4bb4>] phys_pmd_init+0x234/0x2a0
 [<ffffffff81006e5b>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
 [<ffffffff81ca8b94>] ? early_memremap+0x13/0x15
 [<ffffffff81cd4e20>] phys_pud_init+0x200/0x2d3
 [<ffffffff81cd5215>] kernel_physical_mapping_init+0xd2/0x1ce
 [<ffffffff8146adfe>] init_memory_mapping+0x446/0x5b3
 [<ffffffff8100542f>] ? xen_mc_issue+0x3b/0x57
 [<ffffffff81cb551e>] ? memblock_reserve+0x1f/0x21
 [<ffffffff81c975f6>] setup_arch+0x65a/0xb2b
 [<ffffffff8107b972>] ? clockevents_register_notifier+0x1b/0x45
 [<ffffffff8147f033>] ? printk+0x41/0x46
 [<ffffffff81c91ad4>] start_kernel+0xe4/0x41e
 [<ffffffff81c912cb>] x86_64_start_reservations+0xb6/0xba
 [<ffffffff81c9528f>] xen_start_kernel+0x5c9/0x5d0
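The "usable in E820 but reserved" scheme that Jeremy describes earlier in the thread can be pictured with a small toy model in C. This is only a sketch, not kernel code: struct range, has_page_struct() and allocatable() are hypothetical names invented for illustration, and the reserved span is assumed to be exactly the difference between the two dom0 E820 tops quoted in the thread (0x16b45a000 and 0x23a6f4000).

```c
/* Toy model only; none of these names exist in the Linux kernel. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start; /* inclusive, in bytes */
    uint64_t end;   /* exclusive, in bytes */
};

/* dom0's last E820 entry after xen_add_extra_mem(), as quoted in the thread. */
static const struct range usable = { 0x100000000ULL, 0x23a6f4000ULL };

/* Assumption: the extra (balloon) part is everything past the old top. */
static const struct range reserved[] = {
    { 0x16b45a000ULL, 0x23a6f4000ULL },
};

static bool in_range(const struct range *r, uint64_t addr)
{
    return addr >= r->start && addr < r->end;
}

/* Covered by a usable E820 entry, so a page structure would exist for it. */
bool has_page_struct(uint64_t addr)
{
    return in_range(&usable, addr);
}

/* The allocator may hand an address out only if it is usable and not reserved. */
bool allocatable(uint64_t addr)
{
    if (!in_range(&usable, addr))
        return false;
    for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
        if (in_range(&reserved[i], addr))
            return false;
    return true;
}
```

In this model an address such as 0x200000000 has a page structure but is never allocatable, which is the state the extra memory is meant to stay in until the balloon driver backs it. The model says nothing about which ranges the early page-table setup walks, which is the open question in this message.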
Stefano Stabellini
2011-Feb-15 14:58 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Tue, 15 Feb 2011, Kay, Allen M wrote:
> The failure occurs when dom0 tries to pin an l1 table entry because L1 table entry for the failed address is garbage. The memory range being pinned is in the range of this extra balloon driver memory not backed by real RAM. Is this intended?
>

The pfn in question is returned by alloc_low_page and is equal to e820_table_end, which should fall into the range e820_table_start-e820_table_top. This range is allocated by find_early_table_space in the initial pagetable mappings between 0x8000 and max_pfn_mapped, respecting the reserved memblock regions.

Maybe we didn't reserve some ranges in there that should have been reserved? What happens if you apply this patch?

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..73a21db 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1653,9 +1653,6 @@ static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 	for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
 		pte_t pte;
 
-		if (pfn > max_pfn_mapped)
-			max_pfn_mapped = pfn;
-
 		if (!pte_none(pte_page[pteidx]))
 			continue;
 
@@ -1713,6 +1710,8 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	pud_t *l3;
 	pmd_t *l2;
 
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
+
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
Kay, Allen M
2011-Feb-16 03:08 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
Setting max_pfn_mapped here has no effect. It still fails the same way.

Later on in setup_arch(), max_pfn_mapped is set to max_lower_pfn_mapped. For a 64-bit dom0, it will try to set it again for memory above 4GB by calling init_memory_mapping(1UL<<32, max_pfn<<PAGE_SHIFT) - this is where it eventually fails.

The problem is max_pfn also contains non-RAM extra memory. Should max_pfn be set to xen_low_pfn_mapped before calling init_memory_mapping for memory above 4GB?

Allen
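To put numbers on the range that init_memory_mapping(1UL<<32, max_pfn<<PAGE_SHIFT) covers, here is a short C sketch using only the E820 figures quoted in this thread. It assumes 4 KiB pages and that max_pfn corresponds to the top of dom0's last E820 entry (which matches the "init_memory_mapping: 0000000100000000-000000023a6f4000" line in the original console log); the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages, assumed */

/* Byte addresses quoted in the thread. */
#define FOUR_GB         (1ULL << 32)
#define DOM0_TOP_BEFORE 0x16b45a000ULL /* before xen_add_extra_mem() */
#define DOM0_TOP_AFTER  0x23a6f4000ULL /* after xen_add_extra_mem() */
#define XEN_HOST_TOP    0x1de600000ULL /* top of Xen's own e820 entry */

/* Assumed max_pfn: the page frame just past dom0's enlarged E820 top. */
uint64_t max_pfn(void)
{
    return DOM0_TOP_AFTER >> PAGE_SHIFT;
}

/* Pages above 4GB that the init_memory_mapping() call would cover. */
uint64_t mapped_pages_above_4g(void)
{
    return (DOM0_TOP_AFTER - FOUR_GB) >> PAGE_SHIFT;
}

/* Pages above 4GB that dom0 had before the extra memory was added. */
uint64_t backed_pages_above_4g(void)
{
    return (DOM0_TOP_BEFORE - FOUR_GB) >> PAGE_SHIFT;
}

/* Pages appended by xen_add_extra_mem(). */
uint64_t extra_pages(void)
{
    return (DOM0_TOP_AFTER - DOM0_TOP_BEFORE) >> PAGE_SHIFT;
}

/* Pages of dom0's E820 that lie beyond the host top reported by Xen. */
uint64_t pages_above_host_top(void)
{
    return (DOM0_TOP_AFTER - XEN_HOST_TOP) >> PAGE_SHIFT;
}
```

With these figures, extra_pages() comes to 0xcf29a pages (roughly 3.2 GiB) and pages_above_host_top() to 0x5c0f4 pages (roughly 1.4 GiB), so about two thirds of the range mapped above 4GB is extra memory rather than pages dom0 had at boot.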
Stefano Stabellini
2011-Feb-16 17:19 UTC
RE: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On Wed, 16 Feb 2011, Kay, Allen M wrote:
> Setting max_pfn_mapped here has no effect. It still fails the same way.
>
> Later on in setup_arch(), max_pfn_mapped is set to max_lower_pfn_mapped. For 64 bit dom0, it will try to set it again for memory above 4GB by calling init_memory_mapping(1UL<<32, max_pfn<<PAGE_SHIFT) - this is where it eventually fails.
>

It shouldn't make any difference what value max_pfn_mapped holds at that point. It mattered before because it changes the way the memory for the initial page table is allocated (see arch/x86/mm/init.c:find_early_table_space).

> The problem is max_pfn also contains non-RAM extra memory. Should max_pfn set to xen_low_pfn_mapped before calling init_memory_mapping for memory above 4GB?
>

That shouldn't be a problem because it is marked as reserved in xen_add_extra_mem, so it shouldn't be used to store the pagetable pages (or anything else really).