On Wed, Jan 18, 2012 at 11:41:22AM -0500, Tom Goetz wrote:

CC-ing xen-devel and David.

> We have dom0_mem=672MB for Xen and mem=672MB for linux.

Ok, if you don't have the mem=X and have the "x86: use 'dom0_mem' to limit
the number of pages for dom0" changeset (c/s 23790) in your hypervisor, what happens?

And if you also have 'dom0_mem=max:672MB', do you get the same issue?

>
> [    0.000000] e820 remove range: 000000002a000000 - ffffffffffffffff (usable)
>
> appears to come from
>
> static int __init parse_memopt(char *p)
> {
> 	u64 mem_size;
>
> 	if (!p)
> 		return -EINVAL;
>
> 	if (!strcmp(p, "nopentium")) {
> #ifdef CONFIG_X86_32
> 		setup_clear_cpu_cap(X86_FEATURE_PSE);
> 		return 0;
> #else
> 		printk(KERN_WARNING "mem=nopentium ignored! (only supported on x86_32)\n");
> 		return -EINVAL;
> #endif
> 	}
>
> 	userdef = 1;
> 	mem_size = memparse(p, &p);
> 	/* don't remove all of memory when handling "mem={invalid}" param */
> 	if (mem_size == 0)
> 		return -EINVAL;
> 	e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1); <-----------------------------
>
> 	return 0;
> }
> early_param("mem", parse_memopt);
>
> but we have the same mem opt for 2.6.38 and 3.2 and the mem code still has the e820_remove_range in 3.2. Dom0 is showing the right amount of mem when booted on other machines, so I don't think the mem= option is failing.

The 'mem=X' argument I remember being a work-around. The original bug had
been fixed in both the hypervisor and the kernel.

>
> I'm taking a break for lunch now and I'll dig in further on the mem= option after.
>
> On Jan 18, 2012, at 11:34 AM, Konrad Rzeszutek Wilk wrote:
>
> > On Wed, Jan 18, 2012 at 11:02:48AM -0500, Tom Goetz wrote:
> >> The E820s are different:
> >>
> >> Xen E820:
> >>
> >> (XEN) Xen-e820 RAM map:
> >> (XEN)  0000000000000000 - 000000000009f000 (usable)
> >> (XEN)  000000000009f000 - 00000000000a0000 (reserved)
> >> (XEN)  0000000000100000 - 00000000bf65b800 (usable)
> >> (XEN)  00000000bf65b800 - 00000000c0000000 (reserved)
> >> (XEN)  00000000f8000000 - 00000000fc000000 (reserved)
> >> (XEN)  00000000fec00000 - 00000000fec10000 (reserved)
> >> (XEN)  00000000fed18000 - 00000000fed1c000 (reserved)
> >> (XEN)  00000000fed20000 - 00000000fed90000 (reserved)
> >> (XEN)  00000000feda0000 - 00000000feda6000 (reserved)
> >> (XEN)  00000000fee00000 - 00000000fee10000 (reserved)
> >> (XEN)  00000000ffe00000 - 0000000100000000 (reserved)
> >>
> >> 2.6.38 E820:
> >>
> >> [    0.000000] BIOS-provided physical RAM map:
> >> [    0.000000]  Xen: 0000000000000000 - 000000000009f000 (usable)
> >> [    0.000000]  Xen: 000000000009f000 - 0000000000100000 (reserved)
> >> [    0.000000]  Xen: 0000000000100000 - 000000002a000000 (usable)
> >> [    0.000000]  Xen: 000000002a000000 - 00000000bf65b000 (unusable)
> >
> > Good. That is correct.
> >
> >> [    0.000000]  Xen: 00000000bf65b800 - 00000000c0000000 (reserved)
> >> [    0.000000]  Xen: 00000000f8000000 - 00000000fc000000 (reserved)
> >> [    0.000000]  Xen: 00000000fec00000 - 00000000fec10000 (reserved)
> >> [    0.000000]  Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
> >> [    0.000000]  Xen: 00000000fed20000 - 00000000fed90000 (reserved)
> >> [    0.000000]  Xen: 00000000feda0000 - 00000000feda6000 (reserved)
> >> [    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
> >> [    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
> >> [    0.000000]  Xen: 0000000100000000 - 000000019565b000 (usable)
> >> [    0.000000] e820 remove range: 000000002a000000 - ffffffffffffffff (usable)
> >> [    0.000000] NX (Execute Disable) protection: active
> >> [    0.000000] user-defined physical RAM map:
> >> [    0.000000]  user: 0000000000000000 - 000000000009f000 (usable) - 1
> >> [    0.000000]  user: 000000000009f000 - 0000000000100000 (reserved) - 2
> >> [    0.000000]  user: 0000000000100000 - 000000002a000000 (usable) - 3
> >> [    0.000000]  user: 000000002a000000 - 00000000bf65b000 (unusable) - 4 <------------ This isn't in the Xen version either.
> >
> > Yup, that is OK. We want that region to be mapped as 'unusable'.
> >
> > That will make the intel-agp code _not_ use that region (which we
> > should not as that is a RAM region).
> >
> >> [    0.000000]  user: 00000000bf65b800 - 00000000c0000000 (reserved) - 5
> >> [    0.000000]  user: 00000000f8000000 - 00000000fc000000 (reserved) - 6
> >> [    0.000000]  user: 00000000fec00000 - 00000000fec10000 (reserved) - 7
> >> [    0.000000]  user: 00000000fed18000 - 00000000fed1c000 (reserved) - 8
> >> [    0.000000]  user: 00000000fed20000 - 00000000fed90000 (reserved) - 9
> >> [    0.000000]  user: 00000000feda0000 - 00000000feda6000 (reserved) - 10
> >> [    0.000000]  user: 00000000fee00000 - 00000000fee10000 (reserved) - 11
> >> [    0.000000]  user: 00000000ffe00000 - 0000000100000000 (reserved) - 12
> >> [    0.000000] DMI 2.4 present.
> >> [    0.000000] DMI: Dell Inc. Latitude D830 /0HN341, BIOS A05 11/05/2007
> >> [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) <---- 3.2 is also missing these lines
> >> [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
> >>
> >>
> >> 3.2 E820:
> >>
> >> [    0.000000] Set 264710 page(s) to 1-1 mapping
> >>
> >> [    0.000000] BIOS-provided physical RAM map:
> >> [    0.000000]  Xen: 0000000000000000 - 000000000009f000 (usable)
> >> [    0.000000]  Xen: 000000000009f000 - 0000000000100000 (reserved)
> >> [    0.000000]  Xen: 0000000000100000 - 00000000bf65b000 (usable)
> >
> > So here, we should have had the
> >
> > 2a000 -> bf65b marked as unusable.

On second thought, that is OK too. Keeping 2a000->bf65b in the map will
protect the region from being slurped up by the PCI code as a "gap" region.

> >
> > You booted the kernel with the same dom0_mem=X argument right?
> >
> >> [    0.000000]  Xen: 00000000bf65b800 - 00000000c0000000 (reserved)
> >> [    0.000000]  Xen: 00000000f8000000 - 00000000fc000000 (reserved)
> >> [    0.000000]  Xen: 00000000fec00000 - 00000000fec10000 (reserved)
> >> [    0.000000]  Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
> >> [    0.000000]  Xen: 00000000fed20000 - 00000000fed90000 (reserved)
> >> [    0.000000]  Xen: 00000000feda0000 - 00000000feda6000 (reserved)
> >> [    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
> >> [    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
> >> [    0.000000] NX (Execute Disable) protection: active
> >>
> >> [    0.000000] user-defined physical RAM map:
> >> [    0.000000]  user: 0000000000000000 - 000000000009f000 (usable) - 1
> >> [    0.000000]  user: 000000000009f000 - 0000000000100000 (reserved) - 2
> >> [    0.000000]  user: 0000000000100000 - 000000002a000000 (usable) - 3

Ah, and this now punches the E820 with 2a000->bf65b as a "gap" and
it ends up being used by the PCI subsystem.

That is the problem. So ...
can you make sure you have that hypervisor fix in and boot it without 'mem'
and see what the E820 comes out as?

Thanks!

> >> [    0.000000]  user: 00000000bf65b800 - 00000000c0000000 (reserved) - 5
> >> [    0.000000]  user: 00000000f8000000 - 00000000fc000000 (reserved) - 6
> >> [    0.000000]  user: 00000000fec00000 - 00000000fec10000 (reserved) - 7
> >> [    0.000000]  user: 00000000fed18000 - 00000000fed1c000 (reserved) - 8
> >> [    0.000000]  user: 00000000fed20000 - 00000000fed90000 (reserved) - 9
> >> [    0.000000]  user: 00000000feda0000 - 00000000feda6000 (reserved) - 10
> >> [    0.000000]  user: 00000000fee00000 - 00000000fee10000 (reserved) - 11
> >> [    0.000000]  user: 00000000ffe00000 - 0000000100000000 (reserved) - 12
> >>
> >> On Jan 17, 2012, at 4:09 PM, Konrad Rzeszutek Wilk wrote:
> >>
> >>> On Tue, Jan 17, 2012 at 03:58:11PM -0500, Tom Goetz wrote:
> >>>> Konrad,
> >>>>
> >>>> We're seeing a crash on an Intel video Core2Duo. The crash looks similar to this one: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726. The last comment gives a commit ID for a fix. I don't find that commit in any of our trees. Do you know anything about this?
> >>>
> >>> Yes. It was 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948
> >>>
> >>> but that was a fix in 2.6.39 (I think) and you are using 3.2.
> >>>
> >>> Which could be related to the fact that in 3.2 the E820 code
> >>> (arch/x86/xen/setup.c) went through some surgery to make it easier.
> >>>
> >>> But the code in it looks like it handles it correctly. Hm,
> >>> any chance you can see what the Xen E820 looks like in 3.2 vs anything
> >>> before v3.2?
> >>>
> >>>>
> >>>> Thanks for any help,
> >>>>
> >>>> Tom
> >>>>
> >>>> Dom0 mem was restricted to 672MB. The machine has 3GB.
> >>>>
> >>>>
> >>>> [    2.463600] agpgart-intel 0000:00:00.0: Intel 965GM Chipset
> >>>> (XEN) mm.c:878:d0 Error getting mfn 30600 (pfn 5555555555555555) from L1 entry 8000000030600473 for l1e_owner=0, pg_owner=0
> >>>> (XEN) mm.c:4664:d0 ptwr_emulate: could not get_page_from_l1e()
> >>>> [    2.463891] BUG: unable to handle kernel paging request at ffff880023f28c30
> >>>> [    2.463904] IP: [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
> >>>> [    2.463921] PGD 1a06067 PUD 1a0a067 PMD 209d067 PTE 8010000023f28065
> >>>> [    2.463934] Oops: 0003 [#1] SMP
> >>>> [    2.463943] CPU 1
> >>>> [    2.463946] Modules linked in: intel_agp(+) intel_gtt
> >>>> [    2.463957]
> >>>> [    2.463961] Pid: 128, comm: modprobe Not tainted 3.2.1-orc #102 Dell Inc. Latitude D830 /0HN341
> >>>> [    2.463974] RIP: e030:[<ffffffff81008bee>]  [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
> >>>> [    2.463984] RSP: e02b:ffff880004b91ac8  EFLAGS: 00010297
> >>>> [    2.463990] RAX: 0000000000000000 RBX: 8000000030600473 RCX: 8000000030600473
> >>>> [    2.463996] RDX: 0000000000000000 RSI: ffffc90000186000 RDI: ffffffff81a38020
> >>>> [    2.464002] RBP: ffff880004b91b18 R08: ffff880004d87d80 R09: 00000000000000d0
> >>>> [    2.464009] R10: ffffe8ffffffffff R11: ffffc90000000000 R12: ffff880023f28c30
> >>>> [    2.464015] R13: 0000000000030600 R14: ffff880023f28c30 R15: ffffc90000187000
> >>>> [    2.464024] FS:  00007f11b34db720(0000) GS:ffff880029fd1000(0000) knlGS:0000000000000000
> >>>> [    2.464031] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>>> [    2.464037] CR2: ffff880023f28c30 CR3: 0000000004bf1000 CR4: 0000000000002660
> >>>> [    2.464044] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> [    2.464050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>> [    2.464057] Process modprobe (pid: 128, threadinfo ffff880004b90000, task ffff880004ac96b0)
> >>>> [    2.464063] Stack:
> >>>> [    2.464067]  ffffc90000186000 ffffffff81a38020 ffffffff810051ed ffffc90000000000
> >>>> [    2.464079]  ffffe8ffffffffff ffffc90000186000 ffff880023f28c30 0000000000030600
> >>>> [    2.464091]  8000000000000573 ffffc90000187000 ffff880004b91bc8 ffffffff812b01e4
> >>>> [    2.464104] Call Trace:
> >>>> [    2.464111]  [<ffffffff810051ed>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
> >>>> [    2.464121]  [<ffffffff812b01e4>] ioremap_page_range+0x214/0x2f0
> >>>> [    2.464130]  [<ffffffff8113b6a2>] ? insert_vmalloc_vmlist+0x22/0x80
> >>>> [    2.464140]  [<ffffffff8103dc43>] __ioremap_caller+0x283/0x390
> >>>> [    2.464149]  [<ffffffffa000070a>] ? i9xx_setup+0x20a/0x2e0 [intel_gtt]
> >>>> [    2.464158]  [<ffffffff81579cee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> >>>> [    2.464166]  [<ffffffff8103dda7>] ioremap_nocache+0x17/0x20
> >>>> [    2.464173]  [<ffffffffa000070a>] i9xx_setup+0x20a/0x2e0 [intel_gtt]
> >>>> [    2.464181]  [<ffffffffa0001739>] intel_gmch_probe+0x369/0xa08 [intel_gtt]
> >>>> [    2.464190]  [<ffffffffa0009e8a>] agp_intel_probe+0x48/0x19f [intel_agp]
> >>>> [    2.464198]  [<ffffffff812d794c>] local_pci_probe+0x5c/0xd0
> >>>> [    2.464205]  [<ffffffff812d9201>] pci_device_probe+0x101/0x120
> >>>> [    2.464214]  [<ffffffff81392f5e>] driver_probe_device+0x7e/0x1b0
> >>>> [    2.464222]  [<ffffffff8139313b>] __driver_attach+0xab/0xb0
> >>>> [    2.464229]  [<ffffffff81393090>] ? driver_probe_device+0x1b0/0x1b0
> >>>> [    2.464236]  [<ffffffff81393090>] ? driver_probe_device+0x1b0/0x1b0
> >>>> [    2.464244]  [<ffffffff81391f1c>] bus_for_each_dev+0x5c/0x90
> >>>> [    2.464252]  [<ffffffff81392bee>] driver_attach+0x1e/0x20
> >>>> [    2.464259]  [<ffffffff81392840>] bus_add_driver+0x1a0/0x270
> >>>> [    2.464266]  [<ffffffffa000d000>] ? 0xffffffffa000cfff
> >>>> [    2.464273]  [<ffffffff813936a6>] driver_register+0x76/0x140
> >>>> [    2.464280]  [<ffffffff8157d89d>] ? notifier_call_chain+0x4d/0x70
> >>>> [    2.464287]  [<ffffffffa000d000>] ? 0xffffffffa000cfff
> >>>> [    2.464294]  [<ffffffff812d8ed5>] __pci_register_driver+0x55/0xd0
> >>>> [    2.464303]  [<ffffffff81089173>] ? __blocking_notifier_call_chain+0x63/0x80
> >>>> [    2.464312]  [<ffffffffa000d02c>] agp_intel_init+0x2c/0x2e [intel_agp]
> >>>> [    2.464320]  [<ffffffff81002040>] do_one_initcall+0x40/0x180
> >>>> [    2.464328]  [<ffffffff810a0561>] sys_init_module+0x91/0x200
> >>>> [    2.464336]  [<ffffffff81581b02>] system_call_fastpath+0x16/0x1b
> >>>> [    2.464341] Code: e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 48 89 7d b8 48 89 75 b0 49 89 d6 48 89 cb 66 66 66 66 90 e8 57 1b 03 00 83 f8 01 74 75 <49> 89 1e 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b
> >>>> [    2.464450] RIP  [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
> >>>> [    2.464459]  RSP <ffff880004b91ac8>
> >>>> [    2.464463] CR2: ffff880023f28c30
> >>>> [    2.464469] ---[ end trace 5223388e4a422cb4 ]---
> >>>>
> >>>>
> >>>> ---
> >>>> Tom Goetz
> >>>> tom.goetz@virtualcomputer.com
> >>>>
> >>
> >> ---
> >> Tom Goetz
> >> tom.goetz@virtualcomputer.com
> >>
>
> ---
> Tom Goetz
> tom.goetz@virtualcomputer.com
>
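
To make the "gap" effect described above concrete, here is a minimal user-space sketch in C. The entry type `struct region` and the helper `find_largest_gap` are hypothetical names introduced for illustration only; the code loosely mirrors the idea of looking for the largest hole below 4 GiB in which PCI resources could be placed, and it is not the kernel's implementation. The two tables are copied from the 3.2 "Xen:" and "user:" maps quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified E820 entry covering [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* 3.2 "Xen:" map from the thread, before mem= is applied. */
static const struct region map_before_mem[] = {
	{ 0x00000000ULL, 0x0009f000ULL },
	{ 0x0009f000ULL, 0x00100000ULL },
	{ 0x00100000ULL, 0xbf65b000ULL },
	{ 0xbf65b800ULL, 0xc0000000ULL },
	{ 0xf8000000ULL, 0xfc000000ULL },
	{ 0xfec00000ULL, 0xfec10000ULL },
	{ 0xfed18000ULL, 0xfed1c000ULL },
	{ 0xfed20000ULL, 0xfed90000ULL },
	{ 0xfeda0000ULL, 0xfeda6000ULL },
	{ 0xfee00000ULL, 0xfee10000ULL },
	{ 0xffe00000ULL, 0x100000000ULL },
};

/* 3.2 "user:" map: mem=672MB removed all RAM above 0x2a000000. */
static const struct region map_after_mem[] = {
	{ 0x00000000ULL, 0x0009f000ULL },
	{ 0x0009f000ULL, 0x00100000ULL },
	{ 0x00100000ULL, 0x2a000000ULL },
	{ 0xbf65b800ULL, 0xc0000000ULL },
	{ 0xf8000000ULL, 0xfc000000ULL },
	{ 0xfec00000ULL, 0xfec10000ULL },
	{ 0xfed18000ULL, 0xfed1c000ULL },
	{ 0xfed20000ULL, 0xfed90000ULL },
	{ 0xfeda0000ULL, 0xfeda6000ULL },
	{ 0xfee00000ULL, 0xfee10000ULL },
	{ 0xffe00000ULL, 0x100000000ULL },
};

/*
 * Return the size of the largest hole below 4 GiB that no entry covers,
 * and store where it begins in *gap_start. Entries must be sorted by
 * start address and must not overlap. Returns 0 if there is no hole.
 */
static uint64_t find_largest_gap(const struct region *map, size_t n,
				 uint64_t *gap_start)
{
	const uint64_t four_gib = 0x100000000ULL;
	uint64_t prev_end = 0, best = 0;
	size_t i;

	assert(map != NULL && gap_start != NULL);
	*gap_start = 0;
	for (i = 0; i <= n; i++) {
		/* Past the last entry, the 4 GiB limit acts as the next start. */
		uint64_t next = (i < n && map[i].start < four_gib)
				? map[i].start : four_gib;

		if (next > prev_end && next - prev_end > best) {
			best = next - prev_end;
			*gap_start = prev_end;
		}
		if (next == four_gib)
			break;
		if (map[i].end > prev_end)
			prev_end = map[i].end;
	}
	return best;
}
```

With `map_after_mem` the largest hole starts at 0x2a000000 and is 0x9565b800 bytes long. MFN 30600 from the Xen error above corresponds to address 0x30600000, which lies inside that hole. With `map_before_mem` the largest hole is 0xc0000000 - 0xf8000000, well away from RAM.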
On 18/01/12 16:51, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:41:22AM -0500, Tom Goetz wrote:
>
> CC-ing xen-devel and David.
>
>> We have dom0_mem=672MB for Xen and mem=672MB for linux.
>
> Ok, if you don't have the mem=X and have the "x86: use 'dom0_mem' to limit
> the number of pages for dom0" changeset (c/s 23790) in your hypervisor, what happens?
>
> And if you also have 'dom0_mem=max:672MB', do you get the same issue?

The kernel's mem option should be marking the extra memory as unusable
instead of just removing it from the E820. I'll take a look at this; it
should be pretty straightforward.

I would recommend doing what Konrad says above. That ought to work.

David

>> [    0.000000] e820 remove range: 000000002a000000 - ffffffffffffffff (usable)
>>
>> appears to come from
>>
>> static int __init parse_memopt(char *p)
>> {
>> 	u64 mem_size;
>>
>> 	if (!p)
>> 		return -EINVAL;
>>
>> 	if (!strcmp(p, "nopentium")) {
>> #ifdef CONFIG_X86_32
>> 		setup_clear_cpu_cap(X86_FEATURE_PSE);
>> 		return 0;
>> #else
>> 		printk(KERN_WARNING "mem=nopentium ignored! (only supported on x86_32)\n");
>> 		return -EINVAL;
>> #endif
>> 	}
>>
>> 	userdef = 1;
>> 	mem_size = memparse(p, &p);
>> 	/* don't remove all of memory when handling "mem={invalid}" param */
>> 	if (mem_size == 0)
>> 		return -EINVAL;
>> 	e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1); <-----------------------------
>>
>> 	return 0;
>> }
>> early_param("mem", parse_memopt);
>>
>> but we have the same mem opt for 2.6.38 and 3.2 and the mem code still has the e820_remove_range in 3.2. Dom0 is showing the right amount of mem when booted on other machines, so I don't think the mem= option is failing.
>
> The 'mem=X' argument I remember being a work-around. The original bug had been fixed in both
> the hypervisor and the kernel.
>
>>
>> I'm taking a break for lunch now and I'll dig in further on the mem= option after.
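
The difference between removing RAM above the limit and marking it unusable can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `struct region`, `clip_ram`, and `type_at` are hypothetical names, the type constants merely mirror the usual E820 type numbers, and none of this is the kernel's `e820_remove_range` or an actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type values mirroring the usual E820 numbering. */
enum { REGION_HOLE = 0, REGION_RAM = 1, REGION_RESERVED = 2, REGION_UNUSABLE = 5 };

/* Hypothetical, simplified E820 entry covering [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
	int type;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_REGIONS 32

/* 3.2 "Xen:" map from the thread, before mem= is applied. */
static const struct region xen_map[] = {
	{ 0x00000000ULL, 0x0009f000ULL, REGION_RAM },
	{ 0x0009f000ULL, 0x00100000ULL, REGION_RESERVED },
	{ 0x00100000ULL, 0xbf65b000ULL, REGION_RAM },
	{ 0xbf65b800ULL, 0xc0000000ULL, REGION_RESERVED },
	{ 0xf8000000ULL, 0xfc000000ULL, REGION_RESERVED },
	{ 0xfec00000ULL, 0xfec10000ULL, REGION_RESERVED },
	{ 0xfed18000ULL, 0xfed1c000ULL, REGION_RESERVED },
	{ 0xfed20000ULL, 0xfed90000ULL, REGION_RESERVED },
	{ 0xfeda0000ULL, 0xfeda6000ULL, REGION_RESERVED },
	{ 0xfee00000ULL, 0xfee10000ULL, REGION_RESERVED },
	{ 0xffe00000ULL, 0x100000000ULL, REGION_RESERVED },
};

/*
 * Limit RAM to [0, limit). RAM above the limit is either dropped from the
 * map (mark_unusable == 0), which leaves a hole, or kept and retyped as
 * REGION_UNUSABLE (mark_unusable != 0). Non-RAM entries are copied as-is.
 * Returns the number of entries written to out, or 0 if out is too small.
 */
static size_t clip_ram(const struct region *in, size_t n, uint64_t limit,
		       int mark_unusable, struct region *out, size_t out_cap)
{
	size_t i, m = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		struct region r = in[i];

		if (r.type == REGION_RAM && r.end > limit) {
			if (r.start < limit) {
				/* Keep the part below the limit as RAM. */
				if (m == out_cap)
					return 0;
				out[m++] = (struct region){ r.start, limit, REGION_RAM };
				r.start = limit;
			}
			if (!mark_unusable)
				continue;	/* dropped: becomes a hole */
			r.type = REGION_UNUSABLE;
		}
		if (m == out_cap)
			return 0;
		out[m++] = r;
	}
	return m;
}

/* Return the type of the entry covering addr, or REGION_HOLE if none does. */
static int type_at(const struct region *map, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= map[i].start && addr < map[i].end)
			return map[i].type;
	return REGION_HOLE;
}
```

Clipping `xen_map` at 672MB (0x2a000000) with `mark_unusable` set to 0 leaves address 0x30600000 in a hole, which matches the 3.2 "user:" map in this thread. With `mark_unusable` set to 1 the same address is covered by an unusable entry, which matches the 2.6.38 map.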
On Jan 18, 2012, at 11:51 AM, Konrad Rzeszutek Wilk wrote:

> On Wed, Jan 18, 2012 at 11:41:22AM -0500, Tom Goetz wrote:
>
> CC-ing xen-devel and David.
>
>> We have dom0_mem=672MB for Xen and mem=672MB for linux.
>
> Ok, if you don't have the mem=X and have the "x86: use 'dom0_mem' to limit
> the number of pages for dom0" changeset (c/s 23790) in your hypervisor, what happens?
>
> And if you also have 'dom0_mem=max:672MB', do you get the same issue?

With dom0_mem= and no mem=, it boots fine.

>>
>> but we have the same mem opt for 2.6.38 and 3.2 and the mem code still has the e820_remove_range in 3.2. Dom0 is showing the right amount of mem when booted on other machines, so I don't think the mem= option is failing.
>
> The 'mem=X' argument I remember being a work-around. The original bug had been fixed in both
> the hypervisor and the kernel.

I checked with the guy who added the mem= option and he doesn't remember why. I'm going to remove it and go from there.

Thanks!