Hi!

Changeset 22706:ca10302ac285 causes the boot crash below.
Reverting it makes Xen boot again.

The RIP points to xen/common/page_alloc.c:543

Christoph

(XEN) NUMA: Using 8 for the hash shift.
(XEN) Early fatal page fault at e008:ffff82c480114cd2 (cr2=ffff82c400000404, ec=0002)
(XEN) Stack dump: ffff82f600002020 ffff82c4802abd00 0000000000000000 0000000000000008 ffff82c480297d48 0000000000000101 0000000000000101 0000000000000000 0000000000000001 0000000000000000 ffff82c400000000 0000000000000000 ffffffffffffffff 0000000000000101 ffff82f600002020 0000000000000002 ffff82c480114cd2 000000000000e008 0000000000010093 ffff82c480297d08 0000000000000000 ffff82c480114c15 ffff82c480297d48 0000000000000000 ffff82c480297d28 0000000000000000 0000000000000000 0000000000000000 ffff82c4802abd00 0000000000000eff ffff82c480297da8 ffff82c48011653a 0000000000000030 00000000fff7ae3d ffff82f600002020 ffff82f600002020 00ff82c48026bd5f ffff830000100000 0000000000000008 0000000000000000 0000000000000000 ffffffffffffffff ffff82c480297de8 ffff82c480260de3 0000000000000101 00000000000ffe6a 0000000000000001 000004ffffffffff 000000012f2a7000 000000000000000e ffff82c480297f08 ffff82c48027d55f 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000000 ffff830000037c91 ffff83000004b020 0000000000000000 0100000000000000 0000000000000000 00000000cfa00000 0000000000000000 0000000000000000 ffffffffffffffff ffff83000003b320 000000000003b320 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff82c48028d6cc ffffffff00000000 0000000001000000 ffff82c48025d500 ffff83000003b320 0000000800000000 000000010000006e 0000000000000003 00000000000002f8 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000067abc ffff82c4801000b5 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000fffff000 0000000000000000 0000000000000000

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Managing directors: Alberto Bozzo, Andrew Bowd
Registered office: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Register court: Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Wed, Jan 12, Christoph Egger wrote:

> Hi!
>
> Changeset 22706:ca10302ac285 causes below boot crash.
> Reverting it makes xen boot again.
>
> The rip points to xen/common/page_alloc.c:543

Yes, that change was not well done. Sorry for that. I'm sure it doesn't crash if set_gpfn_from_mfn() is called from free_domheap_pages().

Looking at free_heap_pages(), the page owner is now cleared at the beginning of the loop. But later in the loop it is checked whether a TLB flush is required. So the set_gpfn_from_mfn() call should at least be moved past this check, even if that doesn't fix the crash you are seeing.

I will see if I can come up with a better version.

Olaf
>>> On 12.01.11 at 10:45, Christoph Egger <Christoph.Egger@amd.com> wrote:
> Changeset 22706:ca10302ac285 causes below boot crash.
> Reverting it makes xen boot again.
>
> The rip points to xen/common/page_alloc.c:543

Assuming the change was tested on x86-64, it must be something unusual on your system that results in free_heap_pages() getting called before the compat m2p table (and possibly the native one too, as that one gets written after the compat one) was set up. To understand that, the actual call stack would need to be worked out from the raw stack dump you provided.

Perhaps we'll need a global variable indicating when those tables are ready to be used...

Jan
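Editor's sketch of the "global variable" idea Jan mentions above. This is a toy model under stated assumptions, not Xen code: every `toy_` name, the table size and the invalid-entry value are hypothetical, and a static array stands in for the real m2p table (which, unlike this array, cannot be touched safely before it is set up).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FRAMES 16          /* toy table size (hypothetical) */
#define TOY_INVALID   (~0UL)      /* stand-in for an "invalid entry" marker */

static unsigned long toy_m2p[TOY_NR_FRAMES]; /* stand-in for the m2p table */
static bool toy_m2p_ready;                   /* hypothetical readiness flag */

/* Stand-in for the late step that sets the table up. */
void toy_m2p_init(void)
{
    for (size_t i = 0; i < TOY_NR_FRAMES; i++)
        toy_m2p[i] = TOY_INVALID;
    toy_m2p_ready = true;
}

/*
 * Guarded update: before toy_m2p_init() has run, the write is skipped
 * and false is returned instead of touching the table.
 */
bool toy_set_gpfn_from_mfn(unsigned long mfn, unsigned long gpfn)
{
    assert(mfn < TOY_NR_FRAMES);
    if (!toy_m2p_ready)
        return false;
    toy_m2p[mfn] = gpfn;
    return true;
}

/* Read back an entry (only meaningful once the table is ready). */
unsigned long toy_get_gpfn_from_mfn(unsigned long mfn)
{
    assert(mfn < TOY_NR_FRAMES);
    return toy_m2p[mfn];
}
```

In this model an early caller, such as the free path during heap initialisation, would get `false` back rather than faulting; whether silently skipping the update is acceptable in the real allocator is a separate question that the sketch does not answer.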
On 12/01/2011 10:09, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 12.01.11 at 10:45, Christoph Egger <Christoph.Egger@amd.com> wrote:
>> Changeset 22706:ca10302ac285 causes below boot crash.
>> Reverting it makes xen boot again.
>>
>> The rip points to xen/common/page_alloc.c:543
>
> Assuming the change was tested on x86-64, it must be something
> unusual on your system that results in free_heap_pages() getting
> called before the compat m2p table (and possibly the native one
> too, as that one gets written after the compat one) was set up.
> For understanding that, the actual call stack would need to be
> worked out from the raw stack dump you provided.
>
> Perhaps we'll need a global variable indicating when those tables
> are ready to be used...

I'll revert the patch for now; I think we'll need to revisit after 4.1.0. Not really an issue, I'd say, as it's pretty clear that xenpaging is not going to be in a production-ready state for 4.1.0 anyway. The best we can hope for is to get some tested patches backported for 4.1.1 or 4.1.2.

The basic issue here seems to be a dependency violation in trying to access the m2p at all from the heap allocator. The code that sets up the m2p itself allocates from the domheap, so having the heap, or domheap, subsystem address the m2p seems flawed. We'd get away with it from free_domheap_pages(), as no pages get freed until later, as it happens, but: (a) it's not nice to depend on that; (b) it's nice to do it in free_heap_pages() and cover Xenheap as well as domheap.

One way around this might be to set up the m2p tables earlier, perhaps when we currently init_frametable(), and get the required pages via alloc_bootheap_pages(). But this is too big a change for 4.1 now.
 -- Keir

> Jan
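Editor's sketch of the ordering Keir suggests: back the table with boot-time memory first, so that the heap's free path can write to it from the very first call. This is a toy model only. All `toy_` names and sizes are hypothetical, the bump allocator merely stands in for a boot allocator, and it is an assumption of the sketch that the free path marks the freed frame's entry invalid.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_FRAMES  8          /* toy number of frames (hypothetical) */
#define TOY_BOOT_WORDS 64         /* toy boot pool size (hypothetical) */
#define TOY_INVALID    (~0UL)     /* stand-in for an "invalid entry" marker */

/* Toy bump allocator standing in for a boot-time allocator. */
static unsigned long toy_boot_pool[TOY_BOOT_WORDS];
static size_t toy_boot_used;

static unsigned long *toy_alloc_boot_words(size_t n)
{
    assert(toy_boot_used + n <= TOY_BOOT_WORDS);
    unsigned long *p = &toy_boot_pool[toy_boot_used];
    toy_boot_used += n;
    return p;
}

static unsigned long *toy_m2p;    /* NULL until the early setup has run */

/* Step 1 (early): take the table's memory from the boot pool. */
void toy_early_m2p_init(void)
{
    toy_m2p = toy_alloc_boot_words(TOY_NR_FRAMES);
    for (size_t i = 0; i < TOY_NR_FRAMES; i++)
        toy_m2p[i] = 0;
}

/* Step 2 (later): the free path may now touch the table safely. */
void toy_free_heap_page(unsigned long mfn)
{
    assert(toy_m2p != NULL);      /* would trip if step 1 were skipped */
    assert(mfn < TOY_NR_FRAMES);
    toy_m2p[mfn] = TOY_INVALID;   /* assumed behaviour of the free path */
}

/* Hand every toy frame to the heap; returns how many were freed. */
size_t toy_end_boot_allocator(void)
{
    size_t freed = 0;
    for (unsigned long mfn = 0; mfn < TOY_NR_FRAMES; mfn++, freed++)
        toy_free_heap_page(mfn);
    return freed;
}

/* Read back one entry of the toy table. */
unsigned long toy_m2p_entry(unsigned long mfn)
{
    assert(toy_m2p != NULL && mfn < TOY_NR_FRAMES);
    return toy_m2p[mfn];
}
```

Calling `toy_end_boot_allocator()` before `toy_early_m2p_init()` trips the first assertion in `toy_free_heap_page()`, which is the toy equivalent of the ordering problem discussed in this thread.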
On Wed, Jan 12, Olaf Hering wrote:

> On Wed, Jan 12, Christoph Egger wrote:
>
> > Hi!
> >
> > Changeset 22706:ca10302ac285 causes below boot crash.
> > Reverting it makes xen boot again.
> >
> > The rip points to xen/common/page_alloc.c:543
>
> Yes, that change was not well done. Sorry for that. I'm sure it doesnt
> crash if set_gpfn_from_mfn() is called from free_domheap_pages().
>
> Looking at free_heap_pages(), now the page owner is cleared at the
> beginning of the loop. But later in the loop it is checked wether a TLB
> flush is required. So the set_gpfn_from_mfn() should be at least moved
> past this check. Even if that doesnt fix the crash you are seening.
>
> I will see if I can come up with a better version.

It crashes in end_boot_allocator -> init_heap_pages -> free_heap_pages. paging_init() initializes the machine_to_phys_mapping[] array, but it is called after end_boot_allocator().

As Keir said, the handling of the machine_to_phys_mapping[] array needs a more complete change.

Olaf