Mukesh Rathor
2009-Feb-10 03:08 UTC
[Xen-devel] bring up Hypervisor on large (512GB) memory
Hi all,

Trying to bring up the hypervisor on 512GB:

1. Started with xen 3.1.4 (last oracle release), and 512GB, the system
panic'd right away:

(XEN) Early fatal page fault at e008:ffff828c801ce0ad (cr2=ffff8300cfc00000, ec=0002)

I noticed an anomaly here in the RAM map:

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cfd4c000 (usable)
(XEN) 00000000cfd4c000 - 00000000cfd56000 (ACPI data)
(XEN) 00000000cfd56000 - 00000000cfd57000 (usable)
(XEN) 00000000cfd57000 - 00000000e0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000802ffff000 (usable) <-------

Seems that should be 0000008000000000 ???

2. Reduce some RAM, and booting with 440GB, map makes more sense:

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cfd4c000 (usable)
(XEN) 00000000cfd4c000 - 00000000cfd56000 (ACPI data)
(XEN) 00000000cfd56000 - 00000000cfd57000 (usable)
(XEN) 00000000cfd57000 - 00000000e0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000006e00000000 (usable)

Panic again:

(XEN) Early fatal page fault at e008:ffff828c80210460 (cr2=ffff8300cfc00000, ec=0002)

The panic is trying to allocate bitmap in init_boot_allocator(). The bitmap
starts at cfac0000 and given the size dc1000, won't fit.

3. Moving to unstable 19164, looks like things are even tighter!! I
couldn't even boot with 64GB.

bitmap_start:cfac0000, bitmap_size:201000, alloc_bitmap:ffff8300cfac0000

The only solution I can think of is moving the bitmap elsewhere, above 4GB in
this case:
  figure the size of bitmap, DIRECT map space, allocate the map,
  mark it reserved in the RAM map, and should work!

  I'd have to add a loop around init_boot_allocator() in __start_xen()
  iterating thru the RAM map again, and finding space above 16M.

Am I on the right track?

Thanks,
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
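The "won't fit" observation can be checked with a little arithmetic. The sketch below is only an illustration, not Xen code: it takes the bitmap start, the two reported bitmap sizes, and the end of the usable e820 region from the message above, and assumes that cr2 is a direct-map address, so that ffff8300cfc00000 corresponds to physical cfc00000 (the alloc_bitmap value printed for bitmap_start cfac0000 suggests that mapping). The helper names are made up for the example.

```python
# Values copied from the boot output and e820 map quoted in the message.
BITMAP_START = 0xcfac0000   # bitmap_start from the debug print
REGION_END = 0xcfd4c000     # end of the usable e820 region that contains it
FAULT_ADDR = 0xcfc00000     # cr2 with the assumed direct-map base removed

# Bitmap sizes as reported in the thread.
REPORTED_SIZES = {
    "440GB on 3.1.4": 0xdc1000,
    "64GB on unstable 19164": 0x201000,
}


def fits(start, size, limit):
    """Return True if [start, start + size) ends at or below limit."""
    return start + size <= limit


print(f"room up to end of e820 region: {REGION_END - BITMAP_START:#x}")
print(f"room up to faulting address:   {FAULT_ADDR - BITMAP_START:#x}")
for label, size in REPORTED_SIZES.items():
    print(f"{label}: size {size:#x},",
          "fits in region:", fits(BITMAP_START, size, REGION_END),
          "| stays below fault address:", fits(BITMAP_START, size, FAULT_ADDR))
```

Under these assumptions the 440GB bitmap overruns the usable region itself, while the 64GB bitmap would fit in the region but still extends past the address that faulted in the 3.1.4 runs.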
Tian, Kevin
2009-Feb-10 04:52 UTC
RE: [Xen-devel] bring up Hypervisor on large (512GB) memory
>From: Mukesh Rathor
>Sent: Tuesday, February 10, 2009 11:09 AM
>
>Hi all,
>
>Trying to bring up the hypervisor on 512GB:
>
>1. Started with xen 3.1.4 (last oracle release), and 512GB, the system
>panic'd right away:
> (XEN) Early fatal page fault at e008:ffff828c801ce0ad
> (cr2=ffff8300cfc00000, ec=0002)
>
>I noticed an anomaly here in the RAM map:
>(XEN) Xen-e820 RAM map:
>(XEN) 0000000000000000 - 000000000009f400 (usable)
>(XEN) 000000000009f400 - 00000000000a0000 (reserved)
>(XEN) 00000000000f0000 - 0000000000100000 (reserved)
>(XEN) 0000000000100000 - 00000000cfd4c000 (usable)
>(XEN) 00000000cfd4c000 - 00000000cfd56000 (ACPI data)
>(XEN) 00000000cfd56000 - 00000000cfd57000 (usable)
>(XEN) 00000000cfd57000 - 00000000e0000000 (reserved)
>(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
>(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
>(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
>(XEN) 0000000100000000 - 000000802ffff000 (usable) <-------
>
>Seems that should be 0000008000000000 ???

802ffff000 seems more reasonable, since there's a big hole reserved under 4G
and thus the max page in the address space is shifted up to exceed the 512GB
size.

>2. Reduce some RAM, and booting with 440GB, map makes more sense:
>
>(XEN) Xen-e820 RAM map:
>(XEN) 0000000000000000 - 000000000009f400 (usable)
>(XEN) 000000000009f400 - 00000000000a0000 (reserved)
>(XEN) 00000000000f0000 - 0000000000100000 (reserved)
>(XEN) 0000000000100000 - 00000000cfd4c000 (usable)
>(XEN) 00000000cfd4c000 - 00000000cfd56000 (ACPI data)
>(XEN) 00000000cfd56000 - 00000000cfd57000 (usable)
>(XEN) 00000000cfd57000 - 00000000e0000000 (reserved)
>(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
>(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
>(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
>(XEN) 0000000100000000 - 0000006e00000000 (usable)

When you say "reducing some RAM", did you unplug DIMMs physically or just use
the "mem=" option to have Xen ignore pages beyond that size? If the latter,
then it makes sense to observe 6e00000000.

>Panic again:
>(XEN) Early fatal page fault at e008:ffff828c80210460
>(cr2=ffff8300cfc00000, ec=0002)
>
>The panic is trying to allocate bitmap in init_boot_allocator(). The bitmap
>starts at cfac0000 and given the size dc1000, won't fit.
>
>3. Moving to unstable 19164, looks like things are even tighter!! I
> couldn't even boot with 64GB. bitmap_start:cfac0000, bitmap_size:201000,
> alloc_bitmap:ffff8300cfac0000
>
>The only solution I can think of is moving the bitmap elsewhere, above 4GB in
>this case:
> figure the size of bitmap, DIRECT map space, allocate the map,
> mark it reserved in the RAM map, and should work!
>
> I'd have to add a loop around init_boot_allocator() in __start_xen()
> iterating thru the RAM map again, and finding space above 16M.
>
>Am I on the right track?

That bitmap follows the Xen image immediately, and there's a limitation to
only choose an e820 entry (< BOOTSTRAP_DIRECTMAP_END) for Xen relocation. I
don't know the tricks behind that limitation. But if you want to move the
bitmap higher, is it more generic to allow the Xen image itself to be
relocated to the >4G trunk, which also solves your case here? Of course I may
be missing some background here... Or an alternative is to have reloc_size
cover the bitmap size, which is a smaller change.

Thanks,
Kevin
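Kevin's point about the hole under 4G can be verified from the quoted map. The following is a rough sketch that sums the usable ranges of the 512GB e820 map exactly as printed in the thread; the variable and function names are chosen for this example only.

```python
GIB = 1 << 30

# The 512GB e820 map as quoted in the thread: (start, end, type).
E820_512GB = [
    (0x0000000000000000, 0x000000000009f400, "usable"),
    (0x000000000009f400, 0x00000000000a0000, "reserved"),
    (0x00000000000f0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x00000000cfd4c000, "usable"),
    (0x00000000cfd4c000, 0x00000000cfd56000, "ACPI data"),
    (0x00000000cfd56000, 0x00000000cfd57000, "usable"),
    (0x00000000cfd57000, 0x00000000e0000000, "reserved"),
    (0x00000000fec00000, 0x00000000fed00000, "reserved"),
    (0x00000000fee00000, 0x00000000fee10000, "reserved"),
    (0x00000000ffc00000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x000000802ffff000, "usable"),
]


def usable_bytes(e820):
    """Sum the lengths of all ranges marked usable."""
    return sum(end - start for start, end, kind in e820 if kind == "usable")


def top_of_ram(e820):
    """Highest end address of any usable range."""
    return max(end for _, end, kind in e820 if kind == "usable")


total = usable_bytes(E820_512GB)
top = top_of_ram(E820_512GB)
print(f"usable RAM: {total:#x} ({total / GIB:.3f} GiB)")
print(f"top of RAM: {top:#x} ({top / GIB:.3f} GiB)")
print(f"top of RAM exceeds 512 GiB by {top - 512 * GIB:#x} bytes")
```

The usable total comes out slightly below 512 GiB, while the top of RAM sits about 768 MiB above it, which is roughly the size of the gap between cfd57000 and 4G. That is consistent with the last range ending at 802ffff000 rather than 8000000000.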
Keir Fraser
2009-Feb-10 05:34 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
On 10/02/2009 03:08, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> The only solution I can think of is moving the bitmap elsewhere, above 4GB in
> this case:
> figure the size of bitmap, DIRECT map space, allocate the map,
> mark it reserved in the RAM map, and should work!
>
> I'd have to add a loop around init_boot_allocator() in __start_xen()
> iterating thru the RAM map again, and finding space above 16M.
>
> Am I on the right track?

A 512GB system still needs only 16MB of allocator bitmap. There's no need
for a complicated solution, moving it above 4GB or anything.

I've actually broken xen-unstable and forgotten to account for the allocator
bitmap overhead when I relocate Xen. I will fix that, and properly account
for the bitmap overhead rather than add a fixed overhead (which is probably
what breaks you on 3.1.4). I'll let you know when that's done.

-- Keir
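The 16MB figure follows from one bit per 4 KiB page. As a sketch only: the formula below (one bit per page up to the top of RAM, plus one longword of padding, rounded up to a whole page) is an assumption inferred from the sizes reported earlier in the thread, not something quoted from the Xen source, and the function names are illustrative. It does reproduce the reported dc1000 and 201000 if the top of RAM is taken as 6e00000000 and exactly 64 GiB respectively.

```python
PAGE_SIZE = 4096
LONG_BYTES = 8  # sizeof(unsigned long) on x86-64 (assumed padding)


def round_pgup(nbytes):
    """Round a byte count up to a multiple of the page size."""
    return (nbytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def bitmap_size(top_of_ram):
    """Assumed allocator bitmap size in bytes for a given top-of-RAM address."""
    max_page = top_of_ram // PAGE_SIZE        # number of page frames to track
    return round_pgup(max_page // 8 + LONG_BYTES)


for label, top in [
    ("64 GiB", 0x1000000000),
    ("440GB map (top 6e00000000)", 0x6e00000000),
    ("512GB map (top 802ffff000)", 0x802ffff000),
]:
    size = bitmap_size(top)
    print(f"{label}: bitmap {size:#x} bytes ({size / (1 << 20):.2f} MiB)")
```

For the 512GB map this gives a little over 16 MiB, in line with the figure quoted above.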
Jan Beulich
2009-Feb-10 08:11 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
Minus the regression Keir talks about, we have indications that Xen (even
3.3 iirc) itself boots fine even with 1TB. The issues come with Dom0 booting:
I recently submitted a patch to add infrastructure in Xen to allow Dom0 to
get a valid initial memory map, but the kernel side changes we have could
not easily be applied to the 2.6.18 tree, so that kernel will continue to
not be suitable for this large an amount of memory. Nor can I at this point
really tell whether those changes were all that's needed to get things going,
as the testing got stalled by the testers losing access to the single machine
that had this much memory.

For SLES10 (3.2 based), where we backported the necessary hypervisor changes,
we have indications that there might be other issues (as the testers reported
even booting with mem=400G failed), but we haven't got hard data (read: back
traces), and I haven't been able to spot further issues by static code
inspection.

Jan
Keir Fraser
2009-Feb-10 08:47 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
On 10/02/2009 08:11, "Jan Beulich" <jbeulich@novell.com> wrote:

> Minus the regression Keir talks about, we have indications that Xen (even
> 3.3 iirc) itself boots fine even with 1Tb.

Yes, with your patch to upsize the Xen heap automatically (or with xenheap
override specified on the Xen command line) it should work fine. Good to
hear it worked at least once in practice too.

The regression is an omission on my part. I just need to bring your
'auto-sizing' logic back in, to correctly size the relocation area that I
allocate for Xen-plus-bitmap. I'll get it done later today hopefully.

-- Keir
Mukesh Rathor
2009-Feb-10 23:43 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
Keir Fraser wrote:
> On 10/02/2009 03:08, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> The only solution I can think of is moving the bitmap elsewhere, above 4GB in
>> this case:
>> figure the size of bitmap, DIRECT map space, allocate the map,
>> mark it reserved in the RAM map, and should work!
>>
>> I'd have to add a loop around init_boot_allocator() in __start_xen()
>> iterating thru the RAM map again, and finding space above 16M.
>>
>> Am I on the right track?
>
> A 512GB system still needs only 16MB of allocator bitmap. There's no need
> for a complicated solution, moving it above 4GB or anything.
>
> I've actually broken xen-unstable and forgotten to account for the allocator
> bitmap overhead when I relocate Xen. I will fix that, and properly account
> for the bitmap overhead rather than add a fixed overhead (which is probably
> what breaks you on 3.1.4). I'll let you know when that's done.
>
> -- Keir
>

OK, that sounds like a better solution. Let me know.

Next on my list is a customer unable to boot a 256GB PV guest, stay tuned.

Thanks,
Mukesh
Keir Fraser
2009-Feb-11 10:43 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
On 10/02/2009 08:47, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> The regression is an omission on my part. I just need to bring your
> 'auto-sizing' logic back in, to correctly size the relocation area that I
> allocate for Xen-plus-bitmap. I'll get it done later today hopefully.

It was less broken than I remembered and the patch is really small. It's
changeset 19188 (also attached to this email).

-- Keir
Mukesh Rathor
2009-Feb-27 04:42 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
Keir Fraser wrote:
> On 10/02/2009 08:47, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> The regression is an omission on my part. I just need to bring your
>> 'auto-sizing' logic back in, to correctly size the relocation area that I
>> allocate for Xen-plus-bitmap. I'll get it done later today hopefully.
>
> It was less broken than I remembered and the patch is really small. It's
> changeset 19188 (also attached to this email).
>
> -- Keir
>

I thought I'd give it a try to see if I can "port" this fix to 3.1.4,
but didn't get very far. Looks like xenheap is getting exhausted. It
appears it has to be in the low 2GB.

Do you think it's even worth exploring this further, or are there too
many issues backporting this?

Thanks a lot,
Mukesh
Keir Fraser
2009-Feb-27 09:36 UTC
Re: [Xen-devel] bring up Hypervisor on large (512GB) memory
On 27/02/2009 04:42, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

>> It was less broken than I remembered and the patch is really small. It's
>> changeset 19188 (also attached to this email).
>>
>> -- Keir
>>
>
> I thought I'd give it a try to see if I can "port" this fix to 3.1.4,
> but didn't get very far. Looks like xenheap is getting exhausted. It
> appears it has to be in the low 2GB.
>
> Do you think it's even worth exploring this further, or are there too
> many issues backporting this?

Xen 3.1 does not relocate Xen, and it will be hard to backport those
patches. You'll have to specify xenheap_megabytes=64 (or whatever) manually
on the command line if you want a bigger Xen heap. The specific 'fix' you
mention above was actually only for a bug in xen-unstable anyway.

-- Keir