Hi,

This issue can be easily reproduced by continuously and almost concurrently
rebooting 12 Xen HVM VMs on a single physical server. The reproduction hits
the backtrace about 6 to 14 hours after it starts. I have several similar Xen
backtraces; please refer to the end of this mail. The first three backtraces
are almost the same and happened in domain_kill, while the last one happened
in do_multicall.

Going through the Xen code in /xen-4.0.0/xen/arch/x86/mm.c, it appears the
author was aware of the race with domain_relinquish_resources in the code
presented below. It occurred to me to simply move lines 2765 and 2766 before
line 2764, that is, to move put_page_and_type(page) inside the spin_lock
region to avoid the race.

2753             /* A page is dirtied when its pin status is set. */
2754             paging_mark_dirty(pg_owner, mfn);
2755
2756             /* We can race domain destruction (domain_relinquish_resources). */
2757             if ( unlikely(pg_owner != d) )
2758             {
2759                 int drop_ref;
2760                 spin_lock(&pg_owner->page_alloc_lock);
2761                 drop_ref = (pg_owner->is_dying &&
2762                             test_and_clear_bit(_PGT_pinned,
2763                                                &page->u.inuse.type_info));
2764                 spin_unlock(&pg_owner->page_alloc_lock);
2765                 if ( drop_ref )
2766                     put_page_and_type(page);
2767             }
2768
2769             break;
2770         }

From the results of running the reproduction on the patched code, the patch
appears to work well, since the test survived a 48-hour long run. But I am not
sure of the side effects it brings. I would appreciate it in advance if
someone could give more clues, thanks.
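Concretely, the proposed reordering amounts to the following (a sketch of the
change described above, with the original line numbers dropped):

    /* We can race domain destruction (domain_relinquish_resources). */
    if ( unlikely(pg_owner != d) )
    {
        int drop_ref;
        spin_lock(&pg_owner->page_alloc_lock);
        drop_ref = (pg_owner->is_dying &&
                    test_and_clear_bit(_PGT_pinned,
                                       &page->u.inuse.type_info));
        if ( drop_ref )
            put_page_and_type(page);   /* moved inside the locked region */
        spin_unlock(&pg_owner->page_alloc_lock);
    }

============= Trace 1 =============

(XEN) ----[ Xen-4.0.0  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48011617c>] free_heap_pages+0x55a/0x575
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: 0000001fffffffe0   rbx: ffff82f60b8bbfc0   rcx: ffff83063fe01a20
(XEN) rdx: ffff8315ffffffe0   rsi: ffff8315ffffffe0   rdi: 00000000ffffffff
(XEN) rbp: ffff82c48037fc98   rsp: ffff82c48037fc58   r8:  0000000000000000
(XEN) r9:  ffffffffffffffff   r10: ffff82c48020e770   r11: 0000000000000282
(XEN) r12: 00007d0a00000000   r13: 0000000000000000   r14: ffff82f60b8bbfe0
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000232914000   cr2: ffff8315ffffffe4
(XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48037fc58:
(XEN)    0000000000000016 0000000000000000 00000000000001a2 ffff8304afc40000
(XEN)    0000000000000000 ffff82f60b8bbfe0 00000000000330fe ffff82f60b8bc000
(XEN)    ffff82c48037fcd8 ffff82c48011647e 0000000100000000 ffff82f60b8bbfe0
(XEN)    ffff8304afc40020 0000000000000000 ffff8304afc40000 0000000000000000
(XEN)    ffff82c48037fcf8 ffff82c480160caf ffff8304afc40000 ffff82f60b8bbfe0
(XEN)    ffff82c48037fd68 ffff82c48014deaf 0000000000000ca3 ffff8304afc40fd8
(XEN)    ffff8304afc40fd8 ffff8304afc40fd8 4000000000000000 ffff82c48037ff28
(XEN)    0000000000000000 ffff8304afc40000 ffff8304afc40000 000000000099e000
(XEN)    00000000ffffffda 0000000000000001 ffff82c48037fd98 ffff82c4801504de
(XEN)    ffff8304afc40000 0000000000000000 000000000099e000 00000000ffffffda
(XEN)    ffff82c48037fdb8 ffff82c4801062ee 000000000099e000 fffffffffffffff3
(XEN)    ffff82c48037ff08 ffff82c480104cd7 ffff82c40000f800 0000000000000286
(XEN)    0000000000000286 ffff8300bf76c000 000000ea864b1814 ffff8300bf76c030
(XEN)    ffff83023ff1ded8 ffff83023ff1ded0 ffff82c48037fe38 ffff82c48011c9f5
(XEN)    ffff82c48037ff08 ffff82c480272100 ffff8300bf76c000 ffff82c48037fe48
(XEN)    ffff82c48011f557 ffff82c480272100 0000000600000002 000000004700000a
(XEN)    000000004700bf2c 0000000000000000 000000004700c158 0000000000000000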
(XEN)    00002b3b59e7d050 0000000000000000 0000007f00b14140 00002b3b5f257a80
(XEN)    0000000000996380 00002aaaaaad0830 00002b3b5f257a80 00000000009bb690
(XEN)    00002aaaaaad0830 000000398905abf3 000000000078de60 00002b3b5f257aa4
(XEN) Xen call trace:
(XEN)    [<ffff82c48011617c>] free_heap_pages+0x55a/0x575
(XEN)    [<ffff82c48011647e>] free_domheap_pages+0x2e7/0x3ab
(XEN)    [<ffff82c480160caf>] put_page+0x69/0x70
(XEN)    [<ffff82c48014deaf>] relinquish_memory+0x36e/0x499
(XEN)    [<ffff82c4801504de>] domain_relinquish_resources+0x1ac/0x24c
(XEN)    [<ffff82c4801062ee>] domain_kill+0x93/0xe4
(XEN)    [<ffff82c480104cd7>] do_domctl+0xa1c/0x1205
(XEN)    [<ffff82c4801f71bf>] syscall_enter+0xef/0x149
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN)  L4[0x106] = 00000000bf589027 5555555555555555
(XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

============= Trace 2 =============

(XEN) Xen call trace:
(XEN)    [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN)    [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN)    [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
(XEN)    [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
(XEN)    [<ffff82c480105fe0>] domain_kill+0x80/0xf0
(XEN)    [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
(XEN)    [<ffff82c48010739b>] evtchn_set_pending+0xab/0x1b0
(XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN)  L4[0x106] = 00000000bf569027 5555555555555555
(XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN) stdvga.c:147:d60 entering stdvga and caching modes
(XEN)
(XEN) ****************************************
(XEN) HVM60: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

============= Trace 3 =============

(XEN) Xen call trace:
(XEN)    [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN)    [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN)    [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
(XEN)    [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
(XEN)    [<ffff82c480105fe0>] domain_kill+0x80/0xf0
(XEN)    [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
(XEN)    [<ffff82c480117804>] csched_acct+0x384/0x430
(XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN)  L4[0x106] = 00000000bf569027 5555555555555555
(XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

============= Trace 4 =============

(XEN) Xen call trace:
(XEN)    [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN)    [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN)    [<ffff82c48015b0e5>] free_page_type+0x4c5/0x670
(XEN)    [<ffff82c48015a218>] get_page+0x28/0xf0
(XEN)    [<ffff82c48015b439>] __put_page_type+0x1a9/0x290
(XEN)    [<ffff82c48016211f>] do_mmuext_op+0xf3f/0x1320
(XEN)    [<ffff82c480113d7e>] do_multicall+0x14e/0x340
(XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN)  L4[0x106] = 00000000bf569027 5555555555555555
(XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

-----------------------------------------------------

On Sun, 07 Feb 2010 11:56:26 +0000, Keir Fraser wrote:

> I'll have to decode the backtrace a bit, but I would guess most likely is
> that some memory got corrupted along the way, which would be rather nasty. I
> already need to follow up on a report of apparent memory corruption in a
> domU userspace (testing with the 'memtester' utility), so with a bit of luck
> they could be manifestations of the same bug.
>
> -- Keir
>
> On 06/02/2010 22:56, "Mark Hurenkamp" <mark.hurenkamp@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> While playing with my xen server (which is running xen-unstable/linux
>> pvops), it suddenly crashed with the following messages on the serial port.
>> This is a recent version of xen-unstable, but I am a few updates behind.
>> I've seen this only once, so perhaps it is hard to reproduce. I hope this
>> info is still of use to someone.
>>
>> Regards,
>> Mark.
>>
>> (XEN) tmem: all pools frozen for all domains
>> (XEN) tmem: all pools frozen for all domains
>> (XEN) tmem: all pools thawed for all domains
>> (XEN) tmem: all pools thawed for all domains
>> (XEN) paging.c:170: paging_free_log_dirty_bitmap: used 19 pages for domain 3 dirty logging
>> (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Tainted:    C ]----
>> (XEN) CPU:    2
>> (XEN) RIP:    e008:[<ffff82c4801150c5>] free_heap_pages+0x53a/0x555
>> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
>> (XEN) rax: ffff82c4803004c0   rbx: ffff82f600ae4b40   rcx: ffff8315ffffffe0
>> (XEN) rdx: 00000000ffffffff   rsi: ffff8315ffffffe0   rdi: ffff82f600000000
>> (XEN) rbp: ffff83013ff27bc8   rsp: ffff83013ff27b68   r8:  0000000000000000
>> (XEN) r9:  0200000000000000   r10: 0000000000000001   r11: 0080000000000000
>> (XEN) r12: ffff82f600ae4b60   r13: 0000000000000000   r14: 00007d0a00000000
>> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 0000000101001000   cr2: ffff8315ffffffe4
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff83013ff27b68:
>> (XEN)    c2c2c2c2c2c2c2c2 0000000000000064 0000000000000000 0000000000000012
>> (XEN)    0000000000000297 000000000000017a ffff82c48011e1e3 0000000000000000
>> (XEN)    ffff83010fc50000 ffff82f600ae4b60 0000000000069f65 ffff82f600ae4b80
>> (XEN)    ffff83013ff27c18 ffff82c4801153ee 0000000000000001 0000000000000001
>> (XEN)    ffff82f600ae49c8 ffff82f600ae4b60 0000000000800727 ffff83013fef0000
>> (XEN)    ffff82f600ae4b60 ffff83010fc50000 ffff83013ff27c38 ffff82c48015d4d0
>> (XEN)    000000000000e010 800000005725b727 ffff83013ff27c78 ffff82c48015f8d8
>> (XEN)    80000000571bf727 ffff8300aae3ac60 ffff83013fef0000 ffff8300aae3b000
>> (XEN)    ffff83013ff27f28 0000000000000000 ffff83013ff27cd8 ffff82c48015eaf4
>> (XEN)    ffff83013ff27d08 ffff82c48015fe3d ffff83013ff27cf8 ffff82c48015d4fe
>> (XEN)    ffff83013ff27cc8 1400000000000001 ffff82f60155c740 ffff82f60155c740
>> (XEN)    ffff83013ff27f28 007fffffffffffff ffff83013ff27d28 ffff82c48015f11c
>> (XEN)    000000003fef0000 ffff82f60155c750 ffff83013ff27d38 ffff83013fef0000
>> (XEN)    0000000000000000 ffffc9000000c2b0 00000000000aae3a ffff83013ff27f28
>> (XEN)    ffff83013ff27d38 ffff82c48015f2f8 ffff83013ff27e38 ffff82c480163a4f
>> (XEN)    ffff83013fef0018 00007ff03fef0000 0000000000000000 ffff82c480264db0
>> (XEN)    ffff82c480264db8 ffff83013ff27f28 ffff83013ff27f28 ffff83013fef0218
>> (XEN)    ffff8300bf524000 ffff83013fef0000 ffff8300bf524000 ffff83013fef0000
>> (XEN)    ffff83013fff3da8 0000000100000002 ffff830100000000 ffff82f60155c740
>> (XEN)    800000008eadf063 ffff880000000001 ffff83013ff27de8 000000003fff3d90
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c4801150c5>] free_heap_pages+0x53a/0x555
>> (XEN)    [<ffff82c4801153ee>] free_domheap_pages+0x30e/0x3cc
>> (XEN)    [<ffff82c48015d4d0>] put_page+0x6c/0x73
>> (XEN)    [<ffff82c48015f8d8>] put_page_from_l1e+0x19f/0x1b5
>> (XEN)    [<ffff82c48015eaf4>] free_page_type+0x25c/0x7b0
>> (XEN)    [<ffff82c48015f11c>] __put_page_type+0xd4/0x292
>> (XEN)    [<ffff82c48015f2f8>] put_page_type+0xe/0x23
>> (XEN)    [<ffff82c480163a4f>] do_mmuext_op+0x6ff/0x14b8
>> (XEN)    [<ffff82c480114235>] do_multicall+0x285/0x410
>> (XEN)    [<ffff82c4801f01bf>] syscall_enter+0xef/0x149
>> (XEN)
>> (XEN) Pagetable walk from ffff8315ffffffe4:
>> (XEN)  L4[0x106] = 00000000bf4f5027 5555555555555555
>> (XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0002]
>> (XEN) Faulting linear address: ffff8315ffffffe4
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
On 26/08/2010 05:49, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:

> This issue can be easily reproduced by continuously and almost concurrently
> rebooting 12 Xen HVM VMs on a single physical server. The reproduction hits
> the backtrace about 6 to 14 hours after it starts. [...] The first three
> backtraces are almost the same and happened in domain_kill, while the last
> one happened in do_multicall.
>
> Going through the Xen code in /xen-4.0.0/xen/arch/x86/mm.c, it appears the
> author was aware of the race with domain_relinquish_resources in the code
> presented below. It occurred to me to simply move lines 2765 and 2766 before
> line 2764, that is, to move put_page_and_type(page) inside the spin_lock
> region to avoid the race.

Well, thanks for the detailed bug report: it is good to have a report that
includes an attempt at a fix!

In the below code, the put_page_and_type() is outside the locked region for
good reason. put_page_and_type() -> put_page() -> free_domheap_pages(), which
acquires d->page_alloc_lock. Because we do not use spin_lock_recursive() in
the below code, this recursive acquisition of the lock in free_domheap_pages()
would deadlock!

Now, I do not think this fix really affected your testing anyway, because the
below code is part of the MMUEXT_PIN_... hypercalls, and further is only
triggered when a domain executes one of those hypercalls on *another* domain's
memory. The *only* time that should happen is when dom0 builds a *PV* VM. So,
since all your testing is on HVM guests, I wouldn't expect the code in the
if() statement below to be executed ever. Well, maybe unless you are using
qemu stub domains, or pvgrub.

But even if the below code is being executed, I don't think your change is a
fix, or anything that should greatly affect the system apart from introducing
a deadlock. Is it instead possible that you somehow were testing a broken
build of Xen before, and simply re-building Xen with your change is what fixed
things? I wonder if the bug stays gone if you revert your change and re-build?

If it still appears that your fix is good, I would add tracing to the below
code and find out a bit more about when/why it is being executed.

 -- Keir

> 2753             /* A page is dirtied when its pin status is set. */
> 2754             paging_mark_dirty(pg_owner, mfn);
> 2755
> 2756             /* We can race domain destruction (domain_relinquish_resources). */
> 2757             if ( unlikely(pg_owner != d) )
> 2758             {
> 2759                 int drop_ref;
> 2760                 spin_lock(&pg_owner->page_alloc_lock);
> 2761                 drop_ref = (pg_owner->is_dying &&
> 2762                             test_and_clear_bit(_PGT_pinned,
> 2763                                                &page->u.inuse.type_info));
> 2764                 spin_unlock(&pg_owner->page_alloc_lock);
> 2765                 if ( drop_ref )
> 2766                     put_page_and_type(page);
> 2767             }
> 2768
> 2769             break;
> 2770         }
>
> From the results of running the reproduction on the patched code, the patch
> appears to work well, since the test survived a 48-hour long run. But I am
> not sure of the side effects it brings. I would appreciate it in advance if
> someone could give more clues, thanks.
>
> [traces 1-4 snipped; see above]
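To make the deadlock Keir describes concrete, here is a minimal, self-contained
sketch of the chain put_page_and_type() -> put_page() -> free_domheap_pages()
with a non-recursive spinlock. This is illustrative only; the toy_* names are
invented, and this is not Xen's actual spinlock or heap code:

    #include <stdio.h>

    /* Toy non-recursive spinlock: a second acquisition by the current
     * holder spins forever, because nothing records who owns the lock. */
    typedef struct { volatile int held; } toy_lock_t;

    static void toy_spin_lock(toy_lock_t *l)
    {
        while (__sync_lock_test_and_set(&l->held, 1))
            ;  /* spin */
    }

    static void toy_spin_unlock(toy_lock_t *l)
    {
        __sync_lock_release(&l->held);
    }

    static toy_lock_t page_alloc_lock;  /* stands in for pg_owner->page_alloc_lock */

    /* Stands in for free_domheap_pages(), which takes the owner's lock. */
    static void toy_free_domheap_pages(void)
    {
        toy_spin_lock(&page_alloc_lock);   /* second acquisition: spins forever */
        /* ... return the page to the heap ... */
        toy_spin_unlock(&page_alloc_lock);
    }

    /* Stands in for put_page_and_type() -> put_page(), whose final
     * reference drop ends up in free_domheap_pages(). */
    static void toy_put_page_and_type(void)
    {
        toy_free_domheap_pages();
    }

    int main(void)
    {
        toy_spin_lock(&page_alloc_lock);
        toy_put_page_and_type();           /* the proposed patch calls this here */
        toy_spin_unlock(&page_alloc_lock);
        printf("never reached: the call above deadlocks\n");
        return 0;
    }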
Thanks for the detail.

I see the spin_lock issue in the code I referred to, which as you mentioned
would introduce a deadlock. In fact, during the 48-hour long run, one VM hung,
and in the output of the xm list command its CPU time was quite high (in the
tens of thousands), but the other VMs worked fine. I don't know whether that
is related to the potential deadlock, since Xen itself still worked.

So a quick question: if we replace the spin_lock with spin_lock_recursive,
could we avoid this deadlock?

The if statement was executed during the test, since I happened to put a log
statement there and got the output.

As a matter of fact, the HVMs (all Windows 2003) under my test all have PV
drivers installed. I think that is why the patch takes effect.

Besides, I have been working on this issue for some time; it is not possible
that I made a build mistake, since I have been careful all along.

Anyway, I plan to kick off two reproductions on two physical servers, one with
this patch enabled (using spin_lock_recursive instead of spin_lock) and the
other with no change, on completely clean code. It would be useful if you have
some tracing to be added to the test. I will keep you informed.

In addition, my kernel is
2.6.31.13-pvops-patch #1 SMP Tue Aug 24 11:23:51 CST 2010 x86_64 x86_64 x86_64 GNU/Linux
and Xen is 4.0.0.

Thanks.

> Date: Thu, 26 Aug 2010 08:39:03 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@eu.citrix.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>
> [Keir's reply of 26 Aug quoted in full; snipped -- see above]
On 26/08/2010 09:59, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:

> I see the spin_lock issue in the code I referred to, which as you mentioned
> would introduce a deadlock. In fact, during the 48-hour long run, one VM
> hung, and in the output of the xm list command its CPU time was quite high
> (in the tens of thousands), but the other VMs worked fine. I don't know
> whether that is related to the potential deadlock, since Xen itself still
> worked.
>
> So a quick question: if we replace the spin_lock with spin_lock_recursive,
> could we avoid this deadlock?

Yes. But we don't understand why this change to MMUEXT_PIN_xxx would fix your
observed bug, and without that understanding I wouldn't accept the change into
the tree.

> The if statement was executed during the test, since I happened to put a log
> statement there and got the output.

Tell us more. Like, for example, the domain IDs of 'd' and 'pg_owner', and
whether they are PV or HVM domains.

> As a matter of fact, the HVMs (all Windows 2003) under my test all have PV
> drivers installed. I think that is why the patch takes effect.

Nope. That hypercall is to do with PV pagetable management. An HVM guest with
PV drivers still has HVM pagetable management.

> Besides, I have been working on this issue for some time; it is not possible
> that I made a build mistake, since I have been careful all along.
>
> Anyway, I plan to kick off two reproductions on two physical servers, one
> with this patch enabled (using spin_lock_recursive instead of spin_lock) and
> the other with no change, on completely clean code. It would be useful if
> you have some tracing to be added to the test. I will keep you informed.

Whether this fixes your problem is a good data point, but without full
understanding of the bug and why this is the correct and best fix, it will not
be accepted I'm afraid.

 -- Keir

> In addition, my kernel is
> 2.6.31.13-pvops-patch #1 SMP Tue Aug 24 11:23:51 CST 2010 x86_64 x86_64 x86_64 GNU/Linux
> and Xen is 4.0.0.
>
> Thanks.
>
>> [Keir's earlier reply, quoted again; snipped]
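For background on the primitive discussed above: a recursive spinlock avoids
the self-deadlock by letting the CPU that already holds the lock nest its
acquisitions. A minimal sketch of the idea, not Xen's actual
spin_lock_recursive implementation (the rec_* names and the this_cpu() helper
are invented for illustration):

    #include <stdatomic.h>

    #define NO_OWNER (-1)

    typedef struct {
        atomic_flag held;
        int owner_cpu;        /* CPU currently holding the lock, or NO_OWNER */
        unsigned int recurse; /* nesting depth for the owning CPU */
    } rec_lock_t;

    int this_cpu(void);       /* stands in for smp_processor_id() */

    void rec_spin_lock(rec_lock_t *l)
    {
        int cpu = this_cpu();

        if (l->owner_cpu == cpu) {
            l->recurse++;     /* already ours: just deepen the nesting */
            return;
        }
        while (atomic_flag_test_and_set(&l->held))
            ;                 /* spin until another CPU releases it */
        l->owner_cpu = cpu;
        l->recurse = 1;
    }

    void rec_spin_unlock(rec_lock_t *l)
    {
        if (--l->recurse == 0) {
            l->owner_cpu = NO_OWNER;
            atomic_flag_clear(&l->held);
        }
    }

With this, the free_domheap_pages() re-acquisition in the chain sketched
earlier would simply bump the nesting count instead of spinning on itself,
which is why spin_lock_recursive would avoid the deadlock (a real
implementation also needs properly ordered atomic accesses to owner_cpu).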
Hi Keir:

You are right about the if statement execution. After I reran the
reproduction, I never saw the output log. Obviously I made a mistake before,
and I apologize.

Here is more of what I found:

1) We kicked off two reproductions. One ran successfully for more than 3 days
with the server idle (idle means only the tests were run on the server, no
other workload). The other also ran 3 days idle, but when we did some other
work (I remember it was compiling a kernel), the bug showed up. What is weird
is that normally the bug shows up in less than 24 hours, based on our former
tests.

2) Judging from the previous failures of our test, the bug showing up may not
be solely related to VM reboot; some other operations (such as tapdisk) might
trigger the unexpected operation on a domain's pages. So when the VM is
destroyed and its pages are walked by free_heap_pages, we finally hit the
panic. (This would also indicate that frequent reboots merely help to expose
the bug earlier.) Is this possible?

3) Every panic points to the same address, ffff8315ffffffe4, which is not a
valid page address. I printed the pages of the domain in assign_pages, and
they all look like ffff82f60bd64000; at least the ffff82f6 prefix is the same.

I have somewhat lost direction on how to go further. Thanks.

> Date: Thu, 26 Aug 2010 10:11:21 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@eu.citrix.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>
> [Keir's reply of 26 Aug 10:11 quoted in full; snipped -- see above]
On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:

> 3) Every panic points to the same address, ffff8315ffffffe4, which is not a
> valid page address. I printed the pages of the domain in assign_pages, and
> they all look like ffff82f60bd64000; at least the ffff82f6 prefix is the
> same.

Yes, well you may not be crashing on a supposed page address. Certainly the
page pointer that relinquish_memory() is working on, and passed to
put_page->free_domheap_pages, is valid enough to not cause any of those
functions to crash when dereferencing it. At the moment you really have no
idea what is causing free_heap_pages() to crash.

> I have somewhat lost direction on how to go further. Thanks.

You need to find out which line of code in free_heap_pages() is crashing, and
what variable it is trying to dereference when it crashes. You have a nice
backtrace with an EIP value, so you can 'objdump -d xen-syms' and search for
the EIP in the disassembly. If you have a debug build of Xen you can even do
'objdump -S xen-syms' and have the disassembly annotated with corresponding
source lines.

Have you seen this on more than one physical machine? If not, have you run
memtest on the offending machine?

 -- Keir
Thanks for the quick response.

Actually I did some decoding of the backtrace last Friday. According to the
RIP ffff82c4801153c3, I cut the relevant part out of "objdump -dS xen-syms"
(please see below). It looks like the bug happens during traversal of the
domain's page list, which is beyond my understanding. In my understanding,
those domain pages come from the kernel memory zone; they always reside in
physical memory, and their addresses should have no chance of changing, right?
If so, what is the relationship between all those panics and free_heap_pages()?

Several servers (at least 3) experienced the same panic in the same test.
Those servers have identical hardware, kernel and Xen configuration. Right
now memtest is running on one server; it should finish in a few hours (24G of
memory).

------------------------------------------------------------------------------------
169 static inline void
170 page_list_del(struct page_info *page, struct page_list_head *head)
171 {
172     struct page_info *next = pdx_to_page(page->list.next);
173     struct page_info *prev = pdx_to_page(page->list.prev);
174 ffff82c4801153b8:   8b 73 04                mov    0x4(%rbx),%esi
175 ffff82c4801153bb:   49 8d 0c 06             lea    (%r14,%rax,1),%rcx
176 ffff82c4801153bf:   48 8d 05 fa 10 26 00    lea    2494714(%rip),%rax    # ffff82c4803764c0 <_heap>
177 ffff82c4801153c6:   48 c1 e1 04             shl    $0x4,%rcx
178 ffff82c4801153ca:   4a 03 0c f8             add    (%rax,%r15,8),%rcx
179 }
180 static inline void
181 page_list_del(struct page_info *page, struct page_list_head *head)
182 {
183     struct page_info *next = pdx_to_page(page->list.next);
184 ffff82c4801153ce:   8b 03                   mov    (%rbx),%eax
185 ffff82c4801153d0:   48 c1 e0 05             shl    $0x5,%rax
186 ffff82c4801153d4:   48 29 e8                sub    %rbp,%rax
187 ffff82c4801153d7:   48 3b 19                cmp    (%rcx),%rbx
188 ffff82c4801153da:   0f 84 95 01 00 00       je     ffff82c480115575 <free_heap_pages+0x405>
189     struct page_info *prev = pdx_to_page(page->list.prev);
190 ffff82c4801153e0:   89 f2                   mov    %esi,%edx
191 ffff82c4801153e2:   48 c1 e2 05             shl    $0x5,%rdx
192 ffff82c4801153e6:   48 29 ea                sub    %rbp,%rdx
193 ffff82c4801153e9:   48 3b 59 08             cmp    0x8(%rcx),%rbx
194 ffff82c4801153ed:   0f 84 bd 01 00 00       je     ffff82c4801155b0 <free_heap_pages+0x440>
195
196     if ( !__page_list_del_head(page, head, next, prev) )
197     {
198
------------------------------------------------------------------------------------

> Date: Mon, 30 Aug 2010 10:02:05 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@eu.citrix.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>
> [Keir's reply of 30 Aug quoted in full; snipped -- see above]
On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Actually I did some decoding of the backtrace last Friday. Based on
> the RIP ffff82c4801153c3, I cut the relevant part out of "objdump -dS
> xen-syms" (please see below). It looks like the bug happens during
> traversal of the domain page list

ffff82c4801153c3 isn't the start of an instruction in your disassembly below. Hence you didn't disassemble exactly the build of Xen which crashed. It needs to be exactly the same image.

-- keir
Hi Keir:

Thank you for correcting my mistakes. Here are the latest panic and its objdump. I am not familiar with assembly language and the usage of those registers; I will try to spend some more time on it to understand it better. What's your opinion?

btw, the memtest is still running; so far so good. Thanks.

------------------objdump------------------------------------------------------------------------
ffff82c480115396:  48 c1 e1 04           shl    $0x4,%rcx
ffff82c48011539a:  4a 03 0c f8           add    (%rax,%r15,8),%rcx
}
static inline void
page_list_del(struct page_info *page, struct page_list_head *head)
{
    struct page_info *next = pdx_to_page(page->list.next);
ffff82c48011539e:  8b 03                 mov    (%rbx),%eax
ffff82c4801153a0:  48 c1 e0 05           shl    $0x5,%rax
ffff82c4801153a4:  48 29 e8              sub    %rbp,%rax
ffff82c4801153a7:  48 3b 19              cmp    (%rcx),%rbx
ffff82c4801153aa:  0f 84 95 01 00 00     je     ffff82c480115545 <free_heap_pages+0x405>
    struct page_info *prev = pdx_to_page(page->list.prev);
ffff82c4801153b0:  89 f2                 mov    %esi,%edx
ffff82c4801153b2:  48 c1 e2 05           shl    $0x5,%rdx
ffff82c4801153b6:  48 29 ea              sub    %rbp,%rdx
ffff82c4801153b9:  48 3b 59 08           cmp    0x8(%rcx),%rbx
ffff82c4801153bd:  0f 84 bd 01 00 00     je     ffff82c480115580 <free_heap_pages+0x440>

    if ( !__page_list_del_head(page, head, next, prev) )
    {
        next->list.prev = page->list.prev;
ffff82c4801153c3:  89 70 04              mov    %esi,0x4(%rax)
        prev->list.next = page->list.next;
ffff82c4801153c6:  8b 03                 mov    (%rbx),%eax
ffff82c4801153c8:  89 02                 mov    %eax,(%rdx)
ffff82c4801153ca:  49 89 dd              mov    %rbx,%r13
ffff82c4801153cd:  41 83 c4 01           add    $0x1,%r12d
ffff82c4801153d1:  41 83 fc 12           cmp    $0x12,%r12d
ffff82c4801153d5:  0f 84 e3 00 00 00     je     ffff82c4801154be <free_heap_pages+0x37e>
ffff82c4801153db:  48 bd 00 00 00 00 0a  mov    $0x7d0a00000000,%rbp
ffff82c4801153e2:  7d 00 00
ffff82c4801153e5:  44 89 e1              mov    %r12d,%ecx
ffff82c4801153e8:  be 01 00 00 00        mov    $0x1,%esi
---------------------------------------------------------------------------------------------------

blktap_sysfs_create: adding attributes for dev ffff880239496c00
(XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
(XEN) rax: ffff8315ffffffe0   rbx: ffff82f6093b0040   rcx: ffff83063fc01a20
(XEN) rdx: ffff8315ffffffe0   rsi: 00000000ffffffff   rdi: 000000000049d802
(XEN) rbp: 00007d0a00000000   rsp: ffff83023ff37cb8   r8:  0000000000000000
(XEN) r9:  ffffffffffffffff   r10: ffff83060a3c0018   r11: 0000000000000282
(XEN) r12: 0000000000000000   r13: ffff82f6093b0060   r14: 00000000000001a2
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000008da54000   cr2: ffff8315ffffffe4
(XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83023ff37cb8:
(XEN)    ffff82f6093b7f80 00000000ffffffe0 00000000000001a2 ffff83060a3c0000
(XEN)    0000000000000000 0000000000000001 ffff82f6093b0060 0000000000000000
(XEN)    ffff82f6093b0080 ffff82c480115732 00000001093b7cc0 ffff82f6093b0060
(XEN)    ffff83060a3c0018 0000000000000000 ffff83060a3c0000 ffff83060a3c0fa8
(XEN)    0000000000000000 ffff82c48014aaa6 ffff83060a3c0fa8 ffff83060a3c0fa8
(XEN)    ffff83060a3c0014 4000000000000000 ffff83023ff37f28 ffff83060a3c0018
(XEN)    0000000000000000 ffff83060a3c0000 0000000000305000 0000000000000009
(XEN)    0000000000000009 ffff82c48014b2fd 00ffffffffffffff ffff83060a3c0000
(XEN)    0000000000000000 ffff83023ff37e28 0000000000305000 ffff82c480105fe0
(XEN)    ffff82c480255240 fffffffffffffff3 0000000002599000 ffff82c4801043ce
(XEN)    ffff82c4801447da 0000000000000080 ffff83023ff37f28 0000000000000096
(XEN)    ffff83023ff37f28 00000000000000fc 0000000600000002 00000000023c0031
(XEN)    0000000000000001 00000039890a8e2a 0000003000000018 000000004523af30
(XEN)    000000004523ae70 0000000000000000 00007fc608ea8a70 000000398903c8a4
(XEN)    000000004523af44 0000000000000000 000000004523b158 0000000000000000
(XEN)    0000007f024f6d20 00007fc60a094750 000000000255ff40 00007fc607be5ea8
(XEN)    fffffffffffffff5 0000000000000246 00000039880cc557 0000000000000100
(XEN)    00000039880cc557 0000000000000033 0000000000000246 ffff8300bf562000
(XEN)    ffff8801db8d3e78 000000004523aec0 0000000000305000 0000000000000009
(XEN)    0000000000000009 ffff82c4801e3169 0000000000000009 0000000000000009
(XEN) Xen call trace:
(XEN)    [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN)    [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN)    [<ffff82c48014aaa6>] relinquish_memory+0x186/0x530
(XEN)    [<ffff82c48014b2fd>] domain_relinquish_resources+0x1ad/0x280
(XEN)    [<ffff82c480105fe0>] domain_kill+0x80/0xf0
(XEN)    [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
(XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
(XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN)  L4[0x106] = 00000000bf569027 5555555555555555
(XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
---------------------------------------------------------------------------------------------------
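The crash site ties directly to the list-unlink statements in the objdump above. Paraphrasing that path (the register reading below is an interpretation of this particular dump, not something the disassembly itself states):

    struct page_info *next = pdx_to_page(page->list.next);
    struct page_info *prev = pdx_to_page(page->list.prev);

    if ( !__page_list_del_head(page, head, next, prev) )
    {
        next->list.prev = page->list.prev;   /* mov %esi,0x4(%rax) -- the faulting store */
        prev->list.next = page->list.next;   /* mov %eax,(%rdx) */
    }

In the register dump, rsi (the raw 32-bit page->list.prev link) is 00000000ffffffff, and rax and rdx (the decoded 'next' and 'prev' pointers) are both ffff8315ffffffe0. The code computes those pointers as pdx*32 - 0x7d0a00000000 (note rbp holds 00007d0a00000000), which modulo 2^64 equals frame_table + pdx*32 if frame_table sits at its usual x86-64 address of 0xffff82f600000000. So both pdx links of the page being unlinked held 0xffffffff, and the fault at cr2 = ffff8315ffffffe4 is the 4-byte store to next->list.prev at offset 4.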
Do you have a line in the Xen boot output that starts "PFN compression on bits"? If so, what does it say?

My suspicion is that Jan Beulich's patches to implement a consolidated page array for sparse memory maps have broken the assumption in some Xen code that:

    page_to_mfn(mfn_to_page(x)+y) == x+y, for all valid mfns x, and all y up to some pretty big limit.

Looking in free_heap_pages() I see we do a whole bunch of chunk merging in our buddy allocator, doing arithmetic on the variable 'pg' to find neighbouring chunks. It's a bit dodgy, I suspect.

I'm cc'ing Jan to see what we can get away with in doing arithmetic on page_info pointers. What is the guaranteed smallest aligned contiguous range of mfns in the frame_table now, Jan? (I.e., ranges in which adjacent page_info structs relate to adjacent MFNs.)

If this is the problem I'm pretty sure we can come up with a patch quite easily, but depending on the answer to my above question to Jan, we may need to do some code auditing.

-- Keir
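For reference, the pdx compression in question maps a PFN to a frame_table index by squeezing out a run of always-zero address bits. A simplified sketch of the mapping, using the variable names from pfn_pdx_hole_setup() (quoted later in this thread); this is an illustration as I read the 4.0 code, not verbatim Xen source:

    static unsigned long pfn_pdx_bottom_mask, pfn_hole_mask, pfn_top_mask;
    static unsigned int pfn_pdx_hole_shift;

    /* Drop the hole bits: bits below the hole keep their place,
     * bits above it shift down by the width of the hole. */
    static inline unsigned long pfn_to_pdx(unsigned long pfn)
    {
        return (pfn & pfn_pdx_bottom_mask) |
               ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
    }

    /* Inverse: re-insert the always-zero hole bits. */
    static inline unsigned long pdx_to_pfn(unsigned long pdx)
    {
        return (pdx & pfn_pdx_bottom_mask) |
               ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
    }

Since frame_table is indexed by pdx rather than by raw MFN, mfn_to_page(x) + y only equals mfn_to_page(x+y) while x..x+y stays clear of a compressed hole; pointer arithmetic that walks across one lands on the page_info of an unrelated MFN.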
On 31/08/2010 15:49, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> I'm cc'ing Jan to see what we can get away with in doing arithmetic on
> page_info pointers. What is the guaranteed smallest aligned contiguous
> range of mfns in the frame_table now, Jan?
>
> If this is the problem I'm pretty sure we can come up with a patch quite
> easily, but depending on the answer to my above question to Jan, we may
> need to do some code auditing.

Actually I think we get away with it if it is guaranteed that:

    pfn_pdx_bottom_mask >= (1<<MAX_ORDER)-1

I don't see that this is guaranteed by pfn_pdx_hole_setup(), but it would be easy to do, would do no real harm to the technique's space saving, and I think all of our existing page_info pointer arithmetic would then be guaranteed to just work as it always has done.

Anyway, I need to know whether you have the line about "PFN compression" in your 'xm dmesg' boot output.

-- Keir
>>> On 31.08.10 at 16:49, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> I'm cc'ing Jan to see what we can get away with in doing arithmetic on
> page_info pointers. What is the guaranteed smallest aligned contiguous
> range of mfns in the frame_table now, Jan?

Any range of struct page_info-s that crosses a 2Mb boundary is unsafe to make assumptions upon (with a 32-byte struct page_info that means 256Mb of memory, but if struct page_info grows, that range might shrink). If that limit is too low, we might need to enforce a lower limit on the bit positions on which compression may be done (possibly at the price of doing less compression).

Jan
On 31/08/2010 16:07, "Jan Beulich" <JBeulich@novell.com> wrote:
> Any range of struct page_info-s that crosses a 2Mb boundary is
> unsafe to make assumptions upon

Where is even that constraint ensured in the code? I can't see it (I would have assumed that pfn_pdx_hole_setup() would be ensuring it).

-- Keir
>>> On 31.08.10 at 18:01, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Where is even that constraint ensured in the code? I can't see it (I would
> have assumed that pfn_pdx_hole_setup() would be ensuring it).

That's somewhat implicit: srat_parse_regions() gets passed an address that is at least BOOTSTRAP_DIRECTMAP_END (i.e. 4G). Thus srat_parse_regions() starts off with a mask with the lower 32 bits all set (only more bits can get set subsequently). Thus the earliest zero bit pfn_pdx_hole_setup() can find is bit 20 (due to the >> PAGE_SHIFT in the invocation). Consequently the smallest chunk within which arithmetic is valid really is 4Gb, not 256Mb as I first wrote.

Jan
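Jan's bound can be sanity-checked in isolation. A minimal standalone sketch (the values are hypothetical, assuming PAGE_SHIFT == 12 as on x86, and are not taken from any particular machine):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* srat_parse_regions() starts with at least the low 32 address
         * bits set; pfn_pdx_hole_setup() sees that mask shifted right
         * by PAGE_SHIFT. */
        uint64_t addr_mask = 0xffffffffUL;     /* low 32 bits set */
        uint64_t pfn_mask  = addr_mask >> 12;  /* PAGE_SHIFT */

        unsigned int first_zero = 0;
        while ( pfn_mask & (1UL << first_zero) )
            first_zero++;

        /* Prints 20: no hole can be found below bit 20 of the PFN, so
         * contiguous pdx runs cover at least 2^20 pages = 4Gb. */
        printf("first possible hole bit: %u\n", first_zero);
        return 0;
    }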
On 31/08/2010 17:22, "Jan Beulich" <JBeulich@novell.com> wrote:
> That's somewhat implicit: srat_parse_regions() gets passed an
> address that is at least BOOTSTRAP_DIRECTMAP_END (i.e. 4G).
> Thus srat_parse_regions() starts off with a mask with the lower
> 32 bits all set (only more bits can get set subsequently). Thus
> the earliest zero bit pfn_pdx_hole_setup() can find is bit 20
> (due to the >> PAGE_SHIFT in the invocation). Consequently
> the smallest chunk within which arithmetic is valid really is 4Gb,
> not 256Mb as I first wrote.

Well, that's a bit too implicit for me. How about we initialise 'j' to MAX_ORDER in pfn_pdx_hole_setup(), with a comment about supporting page_info pointer arithmetic within allocatable multi-page regions?

Something like the appended (but with a code comment)?

-- Keir

--- a/xen/arch/x86/x86_64/mm.c Mon Aug 30 14:59:12 2010 +0100
+++ b/xen/arch/x86/x86_64/mm.c Tue Aug 31 17:34:34 2010 +0100
@@ -165,7 +165,8 @@
 {
     unsigned int i, j, bottom_shift, hole_shift;
 
-    for ( hole_shift = bottom_shift = j = 0; ; )
+    hole_shift = bottom_shift = 0;
+    for ( j = MAX_ORDER-1; ; )
     {
         i = find_next_zero_bit(&mask, BITS_PER_LONG, j);
         j = find_next_bit(&mask, BITS_PER_LONG, i);
On 31/08/2010 17:35, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> Well, that's a bit too implicit for me. How about we initialise 'j' to
> MAX_ORDER in pfn_pdx_hole_setup(), with a comment about supporting
> page_info pointer arithmetic within allocatable multi-page regions?

Well, I agree with your logic anyway, so I don't see that this can be the cause of MaoXiaoyun's bug -- at least not directly. But then I'm stumped as to why the page arithmetic and checks in free_heap_pages() are (apparently) resulting in a page pointer way outside the frame-table region, and actually in the directmap region.

I think an obvious next step would be to get your boot output, MaoXiaoyun. Can you please post it? And you may as well stop your memtest if you haven't already: if you've seen the issue on more than one machine then it certainly isn't due to that kind of hardware failure.

-- Keir
Thank you for the details. There is no "PFN compression on bits" line in the Xen boot output. I added some extra logging, and found that pfn_pdx_hole_setup() returns early at xen/arch/x86/x86_64/mm.c, line 183. Please refer to the boot log below. I may be able to add some assertions on the page addresses after chunk merging.

Thank you for the mails you forwarded; I will go through all of them later.

--------------------------pfn_pdx_hole_setup-----------------
164 void __init pfn_pdx_hole_setup(unsigned long mask)
165 {
166     unsigned int i, j, bottom_shift, hole_shift;
167     printk("-------in pfn\n");
168
169     for ( hole_shift = bottom_shift = j = 0; ; )
170     {
171         i = find_next_zero_bit(&mask, BITS_PER_LONG, j);
172         j = find_next_bit(&mask, BITS_PER_LONG, i);
173         if ( j >= BITS_PER_LONG )
174             break;
175         if ( j - i > hole_shift )
176         {
177             hole_shift = j - i;
178             bottom_shift = i;
179         }
180     }
181     if ( !hole_shift ){
182         printk("-------hole shift returned\n");
183         return;
184     }
185     printk("-------in pfn middle \n");
186
187     printk(KERN_INFO "PFN compression on bits %u...%u\n",
188            bottom_shift, bottom_shift + hole_shift - 1);
189     printk("----PFN compression on bits %u...%u\n",
190            bottom_shift, bottom_shift + hole_shift - 1);
191
192     pfn_pdx_hole_shift  = hole_shift;
193     pfn_pdx_bottom_mask = (1UL << bottom_shift) - 1;
194     ma_va_bottom_mask   = (PAGE_SIZE << bottom_shift) - 1;
195     pfn_hole_mask       = ((1UL << hole_shift) - 1) << bottom_shift;
196     pfn_top_mask        = ~(pfn_pdx_bottom_mask | pfn_hole_mask);
197     ma_top_mask         = pfn_top_mask << PAGE_SHIFT;
198 }

------------------------------------------xen boot log---------------------
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e4bb0 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bf790000 (usable)
(XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
(XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
(XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
(XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000640000000 (usable)
(XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
(XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
(XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
(XEN) ACPI: FACS BF79E000, 0040
(XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
(XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
(XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
(XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
(XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
(XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
(XEN) --------------844
(XEN) ---------srat enter
(XEN) ---------prepare enter into pfn
(XEN) -------in pfn
(XEN) -------hole shift returned
(XEN) --------------849
(XEN) System RAM: 24542MB (25131224kB)
(XEN) Domain heap initialised DMA width 31 bits

> Date: Tue, 31 Aug 2010 15:49:29 +0100
> From: keir.fraser@eu.citrix.com
>
> Do you have a line in Xen boot output that starts "PFN compression on
> bits"? If so what does it say?
As I go through the chunk-merging code in free_heap_pages(), one thing I'd like to mention: previously, when I printed out all domain pages at allocation time, the order passed to assign_pages() in xen-4.0.0/xen/common/page_alloc.c:1087 was always either 0 or 9. I later learned that is because domain U populates its physmap 2MB at a time (2MB = 512 x 4kB pages, i.e. order 9). Here in the while statement, however, the order is compared against MAX_ORDER, which is 20. I wonder if that might be a clue. Thanks.

-------------------------------
531
532     /* Merge chunks as far as possible. */
533     while ( order < MAX_ORDER )
534     {
535         mask = 1UL << order;
On 01/09/2010 08:17, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Here in the while statement, however, the order is compared against
> MAX_ORDER, which is 20. I wonder if that might be a clue.

Xen's buddy allocator merges pairs of adjacent free chunks up to a maximum size of 2**20 pages. That merging needs to be careful it doesn't merge off the end of RAM. I'm just guessing that maybe there's an issue with that on your fairly large-memory system.

-- Keir
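For context, the merge loop under discussion looks roughly like this in 4.0's page_alloc.c (paraphrased from memory; treat it as a sketch, since exact predicates and macro names may differ slightly from the build at hand):

    /* Merge chunks as far as possible. */
    while ( order < MAX_ORDER )
    {
        mask = 1UL << order;

        if ( (page_to_mfn(pg) & mask) )
        {
            /* Merge with predecessor block? */
            if ( !mfn_valid(page_to_mfn(pg-mask)) ||
                 !page_state_is(pg-mask, free) ||
                 (PFN_ORDER(pg-mask) != order) )
                break;
            pg -= mask;
            page_list_del(pg, &heap(node, zone, order));
        }
        else
        {
            /* Merge with successor block? */
            if ( !mfn_valid(page_to_mfn(pg+mask)) ||
                 !page_state_is(pg+mask, free) ||
                 (PFN_ORDER(pg+mask) != order) )
                break;
            page_list_del(pg + mask, &heap(node, zone, order));
        }

        order++;
    }

Note that the candidate neighbour is checked for being valid, free and of matching order, but not for being on the particular list passed to page_list_del(); those page_list_del() calls are where a stale or mismatched link would be followed.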
>>> On 31.08.10 at 18:35, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Well, that's a bit too implicit for me. How about we initialise 'j' to
> MAX_ORDER in pfn_pdx_hole_setup(), with a comment about supporting
> page_info pointer arithmetic within allocatable multi-page regions?
>
> Something like the appended (but with a code comment)?

Yes, that would seem reasonable (and it does not affect current behavior).

Jan
>>> On 31.08.10 at 19:03, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Well, I agree with your logic anyway, so I don't see that this can be the
> cause of MaoXiaoyun's bug -- at least not directly. But then I'm stumped
> as to why the page arithmetic and checks in free_heap_pages() are
> (apparently) resulting in a page pointer way outside the frame-table
> region, and actually in the directmap region.

There must be some unchecked use of PAGE_LIST_NULL, i.e. running off a list end without taking notice (0xffff8315ffffffe4 exactly corresponds with that).

Jan
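That correspondence can be checked mechanically. The constants below are assumptions matching the debug=n traces earlier in the thread (frame_table at 0xffff82f600000000, a 32-byte struct page_info, PAGE_LIST_NULL == 0xffffffff, list.prev at offset 4), not values extracted from this particular build:

    #include <stdio.h>

    #define FRAME_TABLE      0xffff82f600000000UL  /* assumed base */
    #define PAGE_INFO_SIZE   32UL                  /* sizeof(struct page_info) */
    #define PAGE_LIST_NULL   0xffffffffUL          /* end-of-list pdx marker */
    #define LIST_PREV_OFFSET 4UL                   /* list.next at +0, list.prev at +4 */

    int main(void)
    {
        /* pdx_to_page(PAGE_LIST_NULL) on a box with no pdx hole ... */
        unsigned long bogus_next = FRAME_TABLE + PAGE_LIST_NULL * PAGE_INFO_SIZE;

        /* ... prints 0xffff8315ffffffe0 (rax/rdx in the register dump)
         * and 0xffff8315ffffffe4 (cr2, the store to next->list.prev). */
        printf("%#lx %#lx\n", bogus_next, bogus_next + LIST_PREV_OFFSET);
        return 0;
    }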
>>> On 01.09.10 at 09:17, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> As I go through the chunk merge code in free_heap_pages, one thing I'd
> like to mention: previously I printed out all domain pages when they were
> allocated, and the order passed to assign_pages in
> /xen-4.0.0/xen/common/page_alloc.c:1087 was always either 0 or 9; later I
> learned that is because domain U populates its physmap 2 MBytes at a time.
>
> And here in the while statement, the order is compared with MAX_ORDER,
> which is 20.

Are you sure it's 20? MAX_ORDER should be 18 for x86 afaict.

Jan
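[For scale, assuming 4 KiB pages:

    order 18: 2^18 pages * 4 KiB = 1 GiB   (the x86 MAX_ORDER)
    order 20: 2^20 pages * 4 KiB = 4 GiB   (likely the generic default in
                                            the common headers, and
                                            presumably what was read as
                                            "20" above)

so on x86 the merge loop can coalesce free runs of up to 1 GiB.]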
Yes, you are right. I have it printed out, and it is 18. Thanks for
correcting me.

I am interested in your assumption about the list NULL check in your last
mail. How can I set up a test to verify it?
On 01/09/2010 09:02, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Well I agree with your logic anyway. So I don't see that this can be the
>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as to
>> why the page arithmetic and checks in free_heap_pages are (apparently)
>> resulting in a page pointer way outside the frame-table region and actually
>> in the directmap region.
>
> There must be some unchecked use of PAGE_LIST_NULL, i.e.
> running off a list end without taking notice (0xffff8315ffffffe4
> exactly corresponds with that).

Okay, my next guess then is that we are deleting a chunk from the wrong list
head. I don't see any check that the adjacent chunks we are considering to
merge are from the same node and zone. I suppose the zone logic does just
work as we're dealing with 2**x aligned and sized regions. But, shouldn't
the merging logic in free_heap_pages be checking that the merging candidate
is from the same NUMA node? I see I have an ASSERTion later in the same
function, but it's too weak and wishful I suspect.

MaoXiaoyun: can you please test with the attached patch? If I'm right, you
will crash on one of the BUG_ON checks that I added, rather than crashing on
a pointer dereference. You may even crash during boot. Anyhow, what is
interesting is whether this patch always makes you crash on BUG_ON before
you would normally crash on pointer dereference. If so this is trivial to
fix.

Thanks,
Keir
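[The patch itself went out as an attachment and is not preserved in this
archive. From the description, it presumably added node assertions along
these lines to the free_heap_pages() merge loop -- a hypothetical
reconstruction, not Keir's actual diff:]

    /* Hypothetical reconstruction of the debugging check (predecessor
     * case shown; the successor case gets the same assertion on pg+mask). */
    if ( !mfn_valid(page_to_mfn(pg-mask)) ||
         !page_state_is(pg-mask, free) ||
         (PFN_ORDER(pg-mask) != order) )
        break;
    BUG_ON(phys_to_nid(page_to_maddr(pg-mask)) != node);  /* added check */
    pg -= mask;
    page_list_del(pg, &heap(node, zone, order));

[The point of a BUG_ON rather than a silent break: it makes a cross-node
merge crash at the moment the bad candidate is chosen, instead of later on
a corrupted list.]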
>>> On 01.09.10 at 10:49, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Okay, my next guess then is that we are deleting a chunk from the wrong list
> head. I don't see any check that the adjacent chunks we are considering to
> merge are from the same node and zone. I suppose the zone logic does just
> work as we're dealing with 2**x aligned and sized regions. But, shouldn't
> the merging logic in free_heap_pages be checking that the merging candidate
> is from the same NUMA node? I see I have an ASSERTion later in the same
> function, but it's too weak and wishful I suspect.

Hmm, we're keeping a page reserved if node boundaries aren't well aligned
(at the end of init_heap_pages()), so that shouldn't be possible.

MaoXiaoyun: Would it be possible that we get to see a *full* boot log (at
maximum log level), so we know the characteristics of the machine?

Jan
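[The safeguard Jan refers to works roughly as follows -- a sketch of the
tail of the 4.0-era init_heap_pages(), reconstructed from memory and from
Keir's later quotation of it, so treat names and details as approximate.
The idea: a page on which the node changes is never freed unless the
boundary is MAX_ORDER-aligned, and since every buddy chunk of order <=
MAX_ORDER is aligned to its own size, no such aligned boundary can fall
strictly inside a chunk -- so merging should never cross nodes.]

    /* Sketch only -- not verbatim 4.0 source. */
    nid_prev = phys_to_nid(page_to_maddr(pg-1));
    for ( i = 0; i < nr_pages; nid_prev = nid_curr, i++ )
    {
        nid_curr = phys_to_nid(page_to_maddr(pg+i));

        /*
         * Free pages on the same node as their predecessor, or pages on a
         * MAX_ORDER-aligned boundary (which a merge can never straddle);
         * hold back any other node-boundary page so the buddy merger
         * cannot walk across nodes.
         */
        if ( (nid_curr == nid_prev) ||
             !(page_to_maddr(pg+i) & ((1UL << (MAX_ORDER + PAGE_SHIFT)) - 1)) )
            free_heap_pages(pg+i, 0);
        else
            printk("Reserving non-aligned node boundary @ mfn %lx\n",
                   page_to_mfn(pg+i));
    }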
Thanks Keir. I will run the test and keep you updated.
Well. It did crash on every startup. below is what I got. --------------------------------------------------- root (hd0,0) Filesystem type is ext2fs, partition type 0x83 kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_ vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000 ] module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0 [Multiboot-module @ 0x39b000, 0x3214d0 bytes] __ __ _ _ ___ ___ \ \/ /___ _ __ | || | / _ \ / _ \ * \ // _ \ ''_ \ | || |_| | | | | | | * / \ __/ | | | |__ _| |_| | |_| | * * /_/\_\___|_| |_| |_|(_)___(_)___/ ************************************** hich entry is highlighted. (XEN) Xen version 4.0.0 (root@dev.sd.aliyun.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010 (XEN) Latest ChangeSet: unavailableto modify the kernel arguments (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16automatically in 3 seconds. (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds (XEN) EDID info not retrieved because no DDC retrieval method detected (XEN) Disc information: (XEN) Found 6 MBR signatures (XEN) Found 6 EDD information structures (XEN) Xen-e820 RAM map: (XEN) 0000000000000000 - 000000000009a800 (usable) (XEN) 000000000009a800 - 00000000000a0000 (reserved) (XEN) 00000000000e4bb0 - 0000000000100000 (reserved) (XEN) 0000000000100000 - 00000000bf790000 (usable) (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data) (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS) (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved) (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved) (XEN) 00000000e0000000 - 00000000f0000000 (reserved) (XEN) 00000000fee00000 - 00000000fee01000 (reserved) (XEN) 00000000fff00000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000640000000 (usable) (XEN) --------------849 (XEN) --------------849 (XEN) --------------849 (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM) (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97) (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97) (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117) (XEN) ACPI: FACS BF79E000, 0040 (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97) (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97) (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97) (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1) (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97) (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117) (XEN) --------------847 (XEN) ---------srat enter (XEN) ---------prepare enter into pfn (XEN) -------in pfn (XEN) -------hole shift returned (XEN) --------------849 (XEN) System RAM: 24542MB (25131224kB) (XEN) Unknown interrupt (cr2=0000000000000000) (XEN) 00000000000000ab 0000000000000000 ffff82f600004020 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000 ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000 0000000000000163 0000000900000000 00000000000000ab 
0000000000000201 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020 0000000000001000 0000000000000004 0000000000000080 0000000000000001 ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc 0000000000540000 00000000005fde36 0000000000540000 0000000000100000 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000 0000000800000000 000000010000006e 0000000000000003 00000000000002f8 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000fffff000> Date: Wed, 1 Sep 2010 09:49:18 +0100 > Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT > From: keir.fraser@eu.citrix.com > To: JBeulich@novell.com > CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com > > On 01/09/2010 09:02, "Jan Beulich" <JBeulich@novell.com> wrote: > > >> Well I agree with your logic anyway. So I don''t see that this can be the > >> cause of MaoXiaoyun''s bug. At least not directly. But then I''m stumped as to > >> why the page arithmetic and checks in free_heap_pages are (apparently) > >> resulting in a page pointer way outside the frame-table region and actually > >> in the directmap region. > > > > There must be some unchecked use of PAGE_LIST_NULL, i.e. > > running off a list end without taking notice (0xffff8315ffffffe4 > > exactly corresponds with that). > > Okay, my next guess then is that we are deleting a chunk from the wrong list > head. I don''t see any check that the adjacent chunks we are considering to > merge are from the same node and zone. I suppose the zone logic does just > work as we''re dealing with 2**x aligned and sized regions. But, shouldn''t > the merging logic in free_heap_pages be checking that the merging candidate > is from the same NUMA node? I see I have an ASSERTion later in the same > function, but it''s too weak and wishful I suspect. > > MaoXiaoyun: can you please test with the attached patch? If I''m right, you > will crash on one of the BUG_ON checks that I added, rather than crashing on > a pointer dereference. You may even crash during boot. Anyhow, what is > interesting is whether this patch always makes you crash on BUG_ON before > you would normally crash on pointer dereference. If so this is trivial to > fix. > > Thanks, > Keir >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 01/09/2010 10:01, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> On 01.09.10 at 10:49, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>> Okay, my next guess then is that we are deleting a chunk from the wrong list
>> head. I don't see any check that the adjacent chunks we are considering to
>> merge are from the same node and zone. [...]
>
> Hmm, we're keeping a page reserved if node boundaries aren't
> well aligned (at the end of init_heap_pages()), so that shouldn't
> be possible.

Oh yes, that ought to be sufficient really.

 -- Keir
See below log, is this sufficient ? Thanks. --------------------------------------------------- root (hd0,0) Filesystem type is ext2fs, partition type 0x83 kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_ vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000 ] module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0 __ __ _ _ ___ ___ \ \/ /___ _ __ | || | / _ \ / _ \ * \ // _ \ ''_ \ | || |_| | | | | | | * / \ __/ | | | |__ _| |_| | |_| | * /_/\_\___|_| |_| |_|(_)___(_)___/ * * ************************************** (XEN) Xen version 4.0.0 (root@dev.sd.hello.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Wed Sep 1 17:39:07 CST 2010 (XEN) Latest ChangeSet: unavailableted OS, ''e'' to edit the (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot (XEN) Video information: ''c'' for a command-line. (XEN) VGA is text mode 80x25, font 8x16 (XEN) VBE/DDC methods: none; EDID transfer time: 0 secondsseconds. (XEN) EDID info not retrieved because no DDC retrieval method detected (XEN) Disc information: (XEN) Found 6 MBR signatures (XEN) Found 6 EDD information structures (XEN) Xen-e820 RAM map:ebooting the system... (XEN) 0000000000000000 - 000000000009a800 (usable) (XEN) 000000000009a800 - 00000000000a0000 (reserved) (XEN) 00000000000e4bb0 - 0000000000100000 (reserved) (XEN) 0000000000100000 - 00000000bf790000 (usable) (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data) (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS) (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved) (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved) (XEN) 00000000e0000000 - 00000000f0000000 (reserved) (XEN) 00000000fee00000 - 00000000fee01000 (reserved) (XEN) 00000000fff00000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000640000000 (usable) (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM) (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97) (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97) (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117) (XEN) ACPI: FACS BF79E000, 0040 (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97) (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97) (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97) (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1) (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97) (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117) (XEN) System RAM: 24542MB (25131224kB) (XEN) Domain heap initialised DMA width 31 bits (XEN) Processor #0 7:10 APIC version 21 (XEN) Processor #16 7:10 APIC version 21 (XEN) Processor #2 7:10 APIC version 21 (XEN) Processor #18 7:10 APIC version 21 (XEN) Processor #4 7:10 APIC version 21 (XEN) Processor #20 7:10 APIC version 21 (XEN) Processor #6 7:10 APIC version 21 (XEN) Processor #22 7:10 APIC version 21 (XEN) Processor #1 7:10 APIC version 21 (XEN) Processor #17 7:10 APIC version 21 (XEN) Processor #3 7:10 APIC version 21 (XEN) Processor #19 7:10 APIC version 21 (XEN) Processor #5 7:10 APIC version 21 (XEN) Processor #21 7:10 APIC version 21 (XEN) Processor #7 7:10 APIC version 21 (XEN) Processor #23 7:10 APIC version 21 (XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 (XEN) IOAPIC[1]: apic_id 9, version 32, 
address 0xfec8a000, GSI 24-47 (XEN) Enabling APIC mode: Phys. Using 2 I/O APICs (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Detected 2400.126 MHz processor. (XEN) Initing memory sharing. (XEN) VMX: Supported advanced features: (XEN) - APIC MMIO access virtualisation (XEN) - APIC TPR shadow (XEN) - Extended Page Tables (EPT) (XEN) - Virtual-Processor Identifiers (VPID) (XEN) - Virtual NMI (XEN) - MSR direct-access bitmap (XEN) HVM: ASIDs enabled. (XEN) HVM: VMX enabled (XEN) HVM: Hardware Assisted Paging detected. (XEN) I/O virtualisation disabled (XEN) MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0. (XEN) Bank 8: ea1e1200008000b0[ 0] (XEN) Total of 16 processors activated. (XEN) ENABLING IO-APIC IRQs (XEN) -> Using new ACK method (XEN) TSC is reliable, synchronization unnecessary (XEN) Platform timer is 14.318MHz HPET ?(XEN) Allocated console ring of 32 KiB. (XEN) Brought up 16 CPUs (XEN) *** LOADING DOMAIN 0 *** (XEN) Xen kernel: 64-bit, lsb, compat32 (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1845000 (XEN) PHYSICAL MEMORY ARRANGEMENT: (XEN) Dom0 alloc.: 0000000238000000->000000023c000000 (2605056 pages to be allocated) (XEN) VIRTUAL MEMORY ARRANGEMENT: (XEN) Loaded kernel: ffffffff81000000->ffffffff81845000 (XEN) Init. ramdisk: ffffffff81845000->ffffffff81d9f200 (XEN) Phys-Mach map: ffffffff81da0000->ffffffff831a0000 (XEN) Start info: ffffffff831a0000->ffffffff831a04b4 (XEN) Page tables: ffffffff831a1000->ffffffff831be000 (XEN) Boot stack: ffffffff831be000->ffffffff831bf000 (XEN) TOTAL: ffffffff80000000->ffffffff83400000 (XEN) ENTRY ADDRESS: ffffffff816d0060 (XEN) Dom0 has maximum 4 VCPUs (XEN) Scrubbing Free RAM: ............................................................................................................................................done. (XEN) Xen trace buffers: disabled (XEN) Std. Loglevel: Errors and warnings (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings) (XEN) Xen is relinquishing VGA console. (XEN) *** Serial input -> Xen (type ''CTRL-a'' three times to switch input to DOM0) (XEN) Freed 156kB init memory. mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) ioapic_guest_write: apic=0, pin=2, irq=0 (XEN) ioapic_guest_write: new_entry=000100f0 (XEN) ioapic_guest_write: old_entry=000000f0 pirq=0 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! (XEN) irq.c:1445: dom0: pirq 0 or irq 3 already mapped (XEN) ioapic_guest_write: apic=0, pin=4, irq=4 (XEN) ioapic_guest_write: new_entry=000100f1 (XEN) ioapic_guest_write: old_entry=000000f1 pirq=0 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! (XEN) irq.c:1445: dom0: pirq 0 or irq 5 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 6 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 7 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 8 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 9 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 10 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 11 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 12 already mapped (XEN) irq.c:1445: dom0: pirq 0 or irq 13 already mapped (XEN) ioapic_guest_write: apic=0, pin=0, irq=0 (XEN) ioapic_guest_write: new_entry=000000f0 (XEN) ioapic_guest_write: old_entry=00010a7d pirq=0 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! 
Initializing cgroup subsys cpuset Initializing cgroup subsys cpu Linux version 2.6.31.13-pvops-patch (root@houyi-chunk2.dev.sd.hello.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 24 11:23:51 CST 2010 Command line: ro root=LABEL=/ hda=noprobe console=hvc0 KERNEL supported cpus: Intel GenuineIntel AMD AuthenticAMD Centaur CentaurHauls xen_release_chunk: looking at area pfn bf7e0-bf7ec: 12 pages freed xen_release_chunk: looking at area pfn c0000-e0000: 131072 pages freed xen_release_chunk: looking at area pfn f0000-fec00: 60416 pages freed xen_release_chunk: looking at area pfn fec01-fec8a: 137 pages freed xen_release_chunk: looking at area pfn fec8b-fee00: 373 pages freed xen_release_chunk: looking at area pfn fee01-fff00: 4351 pages freed released 196361 pages of unused memory BIOS-provided physical RAM map: Xen: 0000000000000000 - 000000000009a800 (usable) Xen: 000000000009a800 - 0000000000100000 (reserved) Xen: 0000000000100000 - 00000000bf790000 (usable) Xen: 00000000bf790000 - 00000000bf79e000 (ACPI data) Xen: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS) Xen: 00000000bf7d0000 - 00000000bf7e0000 (reserved) Xen: 00000000bf7ec000 - 00000000c0000000 (reserved) Xen: 00000000e0000000 - 00000000f0000000 (reserved) Xen: 00000000fec00000 - 00000000fec01000 (reserved) Xen: 00000000fec8a000 - 00000000fec8b000 (reserved) Xen: 00000000fee00000 - 00000000fee01000 (reserved) Xen: 00000000fff00000 - 0000000100000000 (reserved) Xen: 0000000100000000 - 0000000280000000 (usable) DMI present. AMI BIOS detected: BIOS may corrupt low RAM, working around it. last_pfn = 0x280000 max_arch_pfn = 0x400000000 last_pfn = 0xbf790 max_arch_pfn = 0x400000000 init_memory_mapping: 0000000000000000-00000000bf790000 init_memory_mapping: 0000000100000000-0000000280000000 RAMDISK: 01845000 - 01d9f200 ACPI: RSDP 00000000000f9dd0 00024 (v02 ACPIAM) ACPI: XSDT 00000000bf790100 0005C (v01 112309 XSDT1113 20091123 MSFT 00000097) ACPI: FACP 00000000bf790290 000F4 (v04 112309 FACP1113 20091123 MSFT 00000097) ACPI: DSDT 00000000bf7904b0 04D6A (v02 CTSAV CTSAV122 00000122 INTL 20051117) ACPI: FACS 00000000bf79e000 00040 ACPI: APIC 00000000bf790390 000D8 (v02 112309 APIC1113 20091123 MSFT 00000097) ACPI: MCFG 00000000bf790470 0003C (v01 112309 OEMMCFG 20091123 MSFT 00000097) ACPI: OEMB 00000000bf79e040 0007A (v01 112309 OEMB1113 20091123 MSFT 00000097) ACPI: SRAT 00000000bf79a4b0 001D0 (v01 112309 OEMSRAT 00000001 INTL 00000001) ACPI: HPET 00000000bf79a680 00038 (v01 112309 OEMHPET 20091123 MSFT 00000097) ACPI: SSDT 00000000bf7a1a00 00363 (v01 DpgPmm CpuPm 00000012 INTL 20051117) (9 early reservations) ==> bootmem [0000000000 - 0280000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000] #1 [00031a1000 - 00031be000] XEN PAGETABLES ==> [00031a1000 - 00031be000] #2 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000] #3 [0001000000 - 0001824630] TEXT DATA BSS ==> [0001000000 - 0001824630] #4 [0001845000 - 0001d9f200] RAMDISK ==> [0001845000 - 0001d9f200] #5 [0001da0000 - 00031a1000] XEN START INFO ==> [0001da0000 - 00031a1000] #6 [0001825000 - 0001825187] BRK ==> [0001825000 - 0001825187] #7 [0000100000 - 00006e0000] PGTABLE ==> [0000100000 - 00006e0000] #8 [00031be000 - 0003dc4000] PGTABLE ==> [00031be000 - 0003dc4000] Zone PFN ranges: DMA 0x00000010 -> 0x00001000 DMA32 0x00001000 -> 0x00100000 Normal 0x00100000 -> 0x00280000 Movable zone start PFN for each node early_node_map[3] active PFN ranges 0: 0x00000010 -> 0x0000009a 0: 0x00000100 -> 0x000bf790 0: 0x00100000 -> 
0x00280000 ACPI: PM-Timer IO Port: 0x808 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x10] enabled) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x12] enabled) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x14] enabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] enabled) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x16] enabled) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x01] enabled) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x11] enabled) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x03] enabled) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x13] enabled) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x05] enabled) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x15] enabled) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x07] enabled) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x17] enabled) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 ACPI: IOAPIC (id[0x09] address[0xfec8a000] gsi_base[24]) IOAPIC[1]: apic_id 9, version 32, address 0xfec8a000, GSI 24-47 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Using ACPI (MADT) for SMP configuration information ACPI: HPET id: 0x8086a301 base: 0xfed00000 SMP: Allowing 4 CPUs, 0 hotplug CPUs PM: Registered nosave memory: 000000000009a000 - 000000000009b000 PM: Registered nosave memory: 000000000009b000 - 0000000000100000 PM: Registered nosave memory: 00000000bf790000 - 00000000bf79e000 PM: Registered nosave memory: 00000000bf79e000 - 00000000bf7d0000 PM: Registered nosave memory: 00000000bf7d0000 - 00000000bf7e0000 PM: Registered nosave memory: 00000000bf7e0000 - 00000000bf7ec000 PM: Registered nosave memory: 00000000bf7ec000 - 00000000c0000000 PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000 PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000 PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000 PM: Registered nosave memory: 00000000fec00000 - 00000000fec01000 PM: Registered nosave memory: 00000000fec01000 - 00000000fec8a000 PM: Registered nosave memory: 00000000fec8a000 - 00000000fec8b000 PM: Registered nosave memory: 00000000fec8b000 - 00000000fee00000 PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000 PM: Registered nosave memory: 00000000fee01000 - 00000000fff00000 PM: Registered nosave memory: 00000000fff00000 - 0000000100000000 Allocating PCI resources starting at c0000000 (gap: c0000000:20000000) NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:4 nr_node_ids:1 PERCPU: Allocated 22 4k pages, static data 89696 bytes Xen: using vcpu_info placement Built 1 zonelists in Zone order, mobility grouping on. Total pages: 2319671 Kernel command line: ro root=LABEL=/ hda=noprobe console=hvc0 PID hash table entries: 4096 (order: 12, 32768 bytes) Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes) Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes) Initializing CPU#0 PCI-DMA: Using Xen software bounce buffering for IO (Xen-SWIOTLB) Placing 64MB Xen software IO TLB between ffff880020000000 - ffff880024000000 Xen software IO TLB at phys 0x20000000 - 0x24000000 Memory: 9154192k/10485760k available (4104k kernel code, 1057688k absent, 272952k reserved, 2777k data, 480k init) SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 Hierarchical RCU implementation. 
NR_IRQS:4352 nr_irqs:848 xen_set_ioapic_routing: irq 0 gsi 0 vector 0 ioapic 0 pin 0 triggering 0 polarity 0 xen_set_ioapic_routing: irq 1 gsi 1 vector 1 ioapic 0 pin 1 triggering 0 polarity 0 xen_set_ioapic_routing: irq 3 gsi 3 vector 3 ioapic 0 pin 3 triggering 0 polarity 0 xen_set_ioapic_routing: irq 4 gsi 4 vector 4 ioapic 0 pin 4 triggering 0 polarity 0 xen_set_ioapic_routing: irq 5 gsi 5 vector 5 ioapic 0 pin 5 triggering 0 polarity 0 xen_set_ioapic_routing: irq 6 gsi 6 vector 6 ioapic 0 pin 6 triggering 0 polarity 0 xen_set_ioapic_routing: irq 7 gsi 7 vector 7 ioapic 0 pin 7 triggering 0 polarity 0 xen_set_ioapic_routing: irq 8 gsi 8 vector 8 ioapic 0 pin 8 triggering 0 polarity 0 xen_set_ioapic_routing: irq 9 gsi 9 vector 9 ioapic 0 pin 9 triggering 1 polarity 0 xen_set_ioapic_routing: irq 10 gsi 10 vector 10 ioapic 0 pin 10 triggering 0 polarity 0 xen_set_ioapic_routing: irq 11 gsi 11 vector 11 ioapic 0 pin 11 triggering 0 polarity 0 xen_set_ioapic_routing: irq 12 gsi 12 vector 12 ioapic 0 pin 12 triggering 0 polarity 0 xen_set_ioapic_routing: irq 13 gsi 13 vector 13 ioapic 0 pin 13 triggering 0 polarity 0 xen_set_ioapic_routing: irq 14 gsi 14 vector 14 ioapic 0 pin 14 triggering 0 polarity 0 xen_set_ioapic_routing: irq 15 gsi 15 vector 15 ioapic 0 pin 15 triggering 0 polarity 0 Detected 2400.126 MHz processor. Console: colour VGA+ 80x25 console [hvc0] enabled allocated 94371840 bytes of page_cgroup please try ''cgroup_disable=memory'' option if you don''t want memory cgroups installing Xen timer for CPU 0 xen: vcpu_time_info placement not supported Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.25 BogoMIPS (lpj=2400126) Security Framework initialized SELinux: Initializing. Mount-cache hash table entries: 256 Initializing cgroup subsys ns Initializing cgroup subsys cpuacct Initializing cgroup subsys memory Initializing cgroup subsys devices Initializing cgroup subsys freezer Initializing cgroup subsys net_cls CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 256K CPU: L3 cache: 8192K CPU: Unsupported number of siblings 16 mce: CPU supports 9 MCE banks Performance Counters: unsupported p6 CPU model 26 no PMU driver, software counters only. SMP alternatives: switching to UP code ACPI: Core revision 20090521 ftrace: converting mcount calls to 0f 1f 44 00 00 ftrace: allocating 24253 entries in 96 pages installing Xen timer for CPU 1 SMP alternatives: switching to SMP code Initializing CPU#1 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 256K CPU: L3 cache: 8192K CPU: Unsupported number of siblings 16 mce: CPU supports 9 MCE banks installing Xen timer for CPU 2 Initializing CPU#2 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 256K CPU: L3 cache: 8192K CPU: Unsupported number of siblings 16 mce: CPU supports 9 MCE banks installing Xen timer for CPU 3 Initializing CPU#3 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 256K CPU: L3 cache: 8192K CPU: Unsupported number of siblings 16 mce: CPU supports 9 MCE banks Brought up 4 CPUs Booting paravirtualized kernel on Xen Xen version: 4.0.0 (preserve-AD) (dom0) Grant tables using version 2 layout. 
Grant table initialized regulator: core version 0.5 NET: Registered protocol family 16 xenbus_probe_init ok ACPI: bus type pci registered PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255 PCI: MCFG area at e0000000 reserved in E820 PCI: Using MMCONFIG at e0000000 - efffffff PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0 ACPI: Interpreter enabled ACPI: (supports S0 S5) ACPI: Using IOAPIC for interrupt routing ACPI: No dock devices found. ACPI: PCI Root Bridge [PCI0] (0000:00) pci 0000:00:00.0: PME# supported from D0 D3hot D3cold pci 0000:00:00.0: PME# disabled pci 0000:00:01.0: PME# supported from D0 D3hot D3cold pci 0000:00:01.0: PME# disabled pci 0000:00:03.0: PME# supported from D0 D3hot D3cold pci 0000:00:03.0: PME# disabled pci 0000:00:07.0: PME# supported from D0 D3hot D3cold pci 0000:00:07.0: PME# disabled pci 0000:00:09.0: PME# supported from D0 D3hot D3cold pci 0000:00:09.0: PME# disabled pci 0000:00:13.0: PME# supported from D0 D3hot D3cold pci 0000:00:13.0: PME# disabled pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold pci 0000:00:1a.7: PME# disabled pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold pci 0000:00:1d.7: PME# disabled pci 0000:01:00.0: PME# supported from D0 D3hot D3cold pci 0000:01:00.0: PME# disabled pci 0000:01:00.1: PME# supported from D0 D3hot D3cold pci 0000:01:00.1: PME# disabled pci 0000:00:1e.0: transparent bridge ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 10 11 12 14 *15) ACPI: PCI Interrupt Link [LNKB] (IRQs *5) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 6 7 10 11 12 *14 15) xenbus_probe_backend_init bus registered ok xen_balloon: Initialising balloon driver with page order 0. SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing IO APIC resources couldn''t be allocated. 
NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default pnp: PnP ACPI init ACPI: bus type pnp registered xen_allocate_pirq: returning irq 8 for gsi 8 xen_set_ioapic_routing: irq 8 gsi 8 vector 8 ioapic 0 pin 8 triggering 0 polarity 0 xen_allocate_pirq: returning irq 13 for gsi 13 xen_set_ioapic_routing: irq 13 gsi 13 vector 13 ioapic 0 pin 13 triggering 0 polarity 0 xen_allocate_pirq: returning irq 4 for gsi 4 xen_set_ioapic_routing: irq 4 gsi 4 vector 4 ioapic 0 pin 4 triggering 0 polarity 0 pnp: PnP ACPI: found 12 devices ACPI: ACPI bus type pnp unregistered system 00:01: iomem range 0xfbf00000-0xfbffffff has been reserved system 00:01: iomem range 0xfc000000-0xfcffffff has been reserved system 00:01: iomem range 0xfd000000-0xfdffffff has been reserved system 00:01: iomem range 0xfe000000-0xfebfffff has been reserved system 00:06: ioport range 0x4d0-0x4d1 has been reserved system 00:06: ioport range 0x800-0x87f has been reserved system 00:06: ioport range 0x500-0x57f has been reserved system 00:06: iomem range 0xfed1c000-0xfed1ffff has been reserved system 00:06: iomem range 0xfed20000-0xfed3ffff has been reserved system 00:06: iomem range 0xfed40000-0xfed8ffff has been reserved system 00:08: iomem range 0xfec00000-0xfec00fff has been reserved system 00:08: iomem range 0xfee00000-0xfee00fff has been reserved system 00:0a: iomem range 0xe0000000-0xefffffff has been reserved system 00:0b: iomem range 0x0-0x9ffff could not be reserved system 00:0b: iomem range 0xc0000-0xcffff could not be reserved system 00:0b: iomem range 0xe0000-0xfffff could not be reserved system 00:0b: iomem range 0x100000-0xbf8fffff could not be reserved system 00:0b: iomem range 0xfed90000-0xffffffff could not be reserved PM-Timer failed consistency check (0x0xffffff) - aborting. pci 0000:00:01.0: PCI bridge, secondary bus 0000:01 pci 0000:00:01.0: IO window: disabled pci 0000:00:01.0: MEM window: 0xf6000000-0xf9ffffff pci 0000:00:01.0: PREFETCH window: disabled pci 0000:00:03.0: PCI bridge, secondary bus 0000:02 pci 0000:00:03.0: IO window: disabled pci 0000:00:03.0: MEM window: disabled pci 0000:00:03.0: PREFETCH window: disabled pci 0000:00:07.0: PCI bridge, secondary bus 0000:03 pci 0000:00:07.0: IO window: disabfba00000-0xfbdfffff pci 0000:00:09.0: PREFETCH window: disabled pci 0000:00:1e.0: PCI bridge, secondary bus 0000:05 pci te cache hash table entries: 524288 (order: 10, 4194304 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered NET: Registered protocol family 1 Trying to unpack rootfs image as initramfs... 
Freeing initrd memory: 5480k freed audit: initializing netlink socket (disabled) type=2000 audit(1283362843.039:1): initialized HugeTLB registered 2 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.2 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) msgmni has been set to 17891 alg: No test for stdrng (krng) Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 pciehp: PCI Express Hot Plug Controller Driver version: 0.4 acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 input: Power Button as /class/input/input0 ACPI: Power Button [PWRF] input: Power Button as /class/input/input1 ACPI: Power Button [PWRB] ACPI: SSDT 00000000bf79e0c0 02FB4 (v01 DpgPmm P001Ist 00000011 INTL 20051117) ACPI: SSDT 00000000bf7a1080 00980 (v01 PmRef P001Cst 00003001 INTL 20051117) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) ACPI: CPU-1 (power states: C1[C1] C2[C2] C3[C3]) Event-channel device installed. blktap_device_init: blktap device major 253 blktap_ring_init: blktap ring major: 251 registering netback hpet_acpi_add: no address or irqs in _CRS Non-volatile memory driver v1.3 Linux agpgart interface v0.103 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled brd: module loaded input: Macintosh mouse button emulation as /class/input/input2 Fixed MDIO Bus: probed PNP: No PS/2 controller found. Probing ports directly. 
serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice rtc_cmos 00:03: RTC can wake from S4 rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0 rtc0: alarms up to one month, y3k, 114 bytes nvram device-mapper: uevent: version 1.0.3 device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com cpuidle: using governor lusb 3-1: new full speed USB device using uhci_hcd and address 2 usb 3-1: New USB device found, idVendor=12d1, idProduct=0003 usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 3-1: Product: Huawei Keyboard/Mouse V100 usb 3-1: Manufacturer: Huawei Technologies usb 3-1: configuration #1 chosen from 1 choice input: Huawei Technologies Huawei Keyboard/Mouse V100 as /class/input/input3 generic-usb 0003:12D1:0003.0001: input,hidraw0: USB HID v1.10 Keyboard [Huawei Technologies Huawei Keyboard/Mouse V100] on usb-0000:00:1a.0-1/input0 input: Huawei Technologies Huawei Keyboard/Mouse V100 as /class/input/input4 generic-usb 0003:12D1:0003.0002: input,hidraw1: USB HID v1.10 Mouse [Huawei Technologies Huawei Keyboard/Mouse V100] on usb-0000:00:1a.0-1/input1 scsi0 : ioc0: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=266, IRQ=32 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221000000000000 scsi 0:0:0:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:0:0: Attached scsi generic sg0 type 0 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x1221000001000000 sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) scsi 0:0:1:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:1:0: Attached scsi generic sg1 type 0 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 2, phy 2, sas_addr 0x1221000002000000 sd 0:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) scsi 0:0:2:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:2:0: Attached scsi generic sg2 type 0 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 3, phy 3, sas_addr 0x1221000003000000 sd 0:0:2:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) scsi 0:0:3:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:3:0: Attached scsi generic sg3 type 0 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 4, phy 4, sas_addr 0x1221000004000000 sd 0:0:3:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) scsi 0:0:4:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:4:0: Attached scsi generic sg4 type 0 mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 5, phy 5, sas_addr 0x1221000005000000 sd 0:0:4:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) scsi 0:0:5:0: Direct-Access ATA ST31000340NS SN06 PQ: 0 ANSI: 5 sd 0:0:5:0: Attached scsi generic sg5 type 0 sd 0:0:5:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn''t support DPO or FUA sd 0:0:1:0: [sdb] Write Protect is off sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn''t support DPO or FUA sd 0:0:2:0: [sdc] Write Protect is off sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, doesn''t support DPO or FUA sd 0:0:3:0: [sdd] Write Protect is off sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, doesn''t support DPO or FUA sd 0:0:4:0: [sde] Write Protect is off sd 0:0:5:0: [sdf] Write Protect is off sd 0:0:4:0: [sde] Write cache: enabled, read 
cache: enabled, doesn''t support DPO or FUA sd 0:0:5:0: [sdf] Write cache: enabled, read cache: enabled, doesn''t support DPO or FUA sda: sdb: sdc: sdd: sda1 sda2 sda3 sda4 < sdb1 sde: sdf: sdc1 sda5 sdd1 sda6 > sdf1 sde1 sd 0:0:1:0: [sdb] Attached SCSI disk sd 0:0:2:0: [sdc] Attached SCSI disk sd 0:0:0:0: [sda] Attached SCSI disk sd 0:0:3:0: [sdd] Attached SCSI disk sd 0:0:4:0: [sde] Attached SCSI disk sd 0:0:5:0: [sdf] Attached SCSI disk Loading shpchp.ko module shpchp: Standard Hot Plug PCI Controller Driver version: 0.4 Loading ata_piix.ko module xen_allocate_pirq: returning irq 19 for gsi 19 xen_set_ioapic_routing: irq 19 gsi 19 vector 19 ioapic 0 pin 19 triggering 1 polarity 1 (XEN) ioapic_guest_write: apic=0, pin=19, irq=19 (XEN) ioapic_guest_write: new_entry=0001a0a8 (XEN) ioapic_guest_write: old_entry=0000a0a8 pirq=19 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! ata_piix 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] scsi1 : ata_piix scsi2 : ata_piix ata1: SATA max UDMA/133 cmd 0xb400 ctl 0xb080 bmdma 0xa880 irq 19 ata2: SATA max UDMA/133 cmd 0xb000 ctl 0xac00 bmdma 0xa888 irq 19 xen_allocate_pirq: returning irq 19 for gsi 19 xen_set_ioapic_routing: irq 19 gsi 19 vector 19 ioapic 0 pin 19 triggering 1 polarity 1 (XEN) ioapic_guest_write: apic=0, pin=19, irq=19 (XEN) ioapic_guest_write: new_entry=0001a0a8 (XEN) ioapic_guest_write: old_entry=0000a0a8 pirq=19 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! ata_piix 0000:00:1f.5: PCI INT B -> GSI 19 (level, low) -> IRQ 19 ata_piix 0000:00:1f.5: MAP [ P0 -- P1 -- ] scsi3 : ata_piix scsi4 : ata_piix ata3: SATA max UDMA/133 cmd 0xc400 ctl 0xc080 bmdma 0xb880 irq 19 ata4: SATA max UDMA/133 cmd 0xc000 ctl 0xbc00 bmdma 0xb888 irq 19 ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) ata1.00: SATA link down (SStatus 0 SControl 300) ata1.01: SATA link down (SStatus 0 SControl 300) ata2.00: SATA link down (SStatus 0 SControl 300) ata2.01: SATA link down (SStatus 0 SControl 300) Loading aacraid.ko module Adaptec aacraid driver 1.1-5[2461]-ms Scanning and configuring dmraid supported devices Trying to resume from LABEL=SWAP-sda5 No suspend signature on swap, not resuming. Creating root device. Mounting root filesystem. kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. Setting up other filesystems. Setting up new root fs no fstab.sys, mounting internal defaults Switching to new root and running init. unmounting old /dev unmounting old /proc unmounting old /sys SELinux: Disabled at runtime. type=1404 audit(1283362855.133:2): selinux=0 auid=4294967295 ses=4294967295 INIT: version 2.86 booting Welcome to Red Hat Enterprise Linux Server Press ''I'' to enter interactive startup. Cannot access the Hardware Clock via any known method. Use the --debug option to see the details of our search for an access method. Setting clock (localtime): Thu Sep 2 01:40:55 CST 2010 [ OK ] Starting udev: (XEN) ioapic_guest_write: apic=0, pin=18, irq=18 (XEN) ioapic_guest_write: new_entry=0001a0a0 (XEN) ioapic_guest_write: old_entry=0000a0a0 pirq=18 (XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ! [ OK ] Loading default keymap (us): [ OK ] Setting hostname houyi-chunk2.dev.sd.hello.com: [ OK ] DM multipath kernel driver version too old No devices found Setting up Logical Volume Management: [ OK ] Checking filesystems Checking all file systems. 
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda3
/: clean, 506608/38404096 files, 24515093/38399366 blocks
[/sbin/fsck.ext3 (1) -- /apsara] fsck.ext3 -a /dev/sda2
/apsara: clean, 11/38404096 files, 1250655/38399366 blocks
[/sbin/fsck.ext3 (1) -- /apsarapangu] fsck.ext3 -a /dev/sda6
/apsarapangu: clean, 52/167018496 files, 27754537/166999683 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
/boot: clean, 97/128520 files, 239490/514048 blocks
Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
merging across node boundaries. Nonetheless the code is simpler and more
obvious if we put a further merging constraint in free_heap_pages() instead.
It's also more correct, since I'm not sure that the
phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
if pg-1 is not a RAM page and is not in a known NUMA node range.

Please give the attached patch a spin. (You should revert the previous
patch, of course.)

Thanks,
Keir

On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Well. It did crash on every startup.
>
> Below is what I got. [...]
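[Again the actual diff travelled as an attachment. The constraint being
described would look something like this -- a hypothetical sketch of the
predecessor half; the successor half gets the same extra condition on
pg+mask:]

    /* Sketch of the extra merging constraint -- not the attached diff. */
    if ( !mfn_valid(page_to_mfn(pg-mask)) ||
         !page_state_is(pg-mask, free) ||
         (PFN_ORDER(pg-mask) != order) ||
         (phys_to_nid(page_to_maddr(pg-mask)) != node) )  /* new condition */
        break;

[Unlike the BUG_ON variant, a foreign-node buddy now simply stops the merge,
so this is a fix rather than a diagnostic.]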
>>> On 01.09.10 at 11:48, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> See below log, is this sufficient? Thanks.

Unfortunately not - you missed adding "loglvl=all" to the Xen command line
(the SRAT messages are info-level only, and hence invisible by default).

Jan
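[Concretely, that means booting with something like the following grub
entry -- illustrative only, based on the entry shown earlier in the thread;
guest_loglvl=all is optional but often useful alongside it:]

    kernel /xen-4.0.0.gz loglvl=all guest_loglvl=all msi=1 iommu=off \
           x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 \
           dom0_vcpus_pin console=com1,vga com1=115200,8n1 \
           conswitch=ax noreboot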
Thanks Keir. I did the test below myself.

In page_alloc.c, check_page() will panic on any page whose address has '3'
as its 6th hex character (both bit 41 and bit 40 set); the last argument
indicates which call site panicked. The output below shows the panic comes
from line 558, and the page address is ffff82f600002040, while its next
page is ffff8315ffffffe0 -- compare that to the faulting address in the
previous panics (ffff8315ffffffe4), which is very similar. I think this
should imply something.

---------------------------------------
(XEN) -----------18
(XEN) System RAM: 24542MB (25131224kB)
(XEN) SRAT: No PXM for e820 range: 0000000000000000 - 000000000009a7ff
(XEN) SRAT: SRAT not used.
(XEN) ----------------pgb ffff82f600002040 pg ffff8315ffffffe0, mask 1, order 0, 0
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) xmao invalid page address assigned
(XEN) ****************************************
(XEN)
----------------------------------------------------
485 static int check_page(struct page_info* pgb, struct page_info* pg, unsigned long mask, unsigned int order, int i){
486
487     if((unsigned long)pg & 0x0000020000000000 &&
488        (unsigned long)pg & 0x0000010000000000
489       ){
490         printk("----------------pgb %p pg %p, mask %lx, order %d, %d\n", pgb, pg, mask, order, i);
491         panic("xmao invalid page address assigned \n");
492     }
493     return 0;
494 }

549     if ( (page_to_mfn(pg) & mask) )
550     {
551         /* Merge with predecessor block? */
552         if ( !mfn_valid(page_to_mfn(pg-mask)) ||
553              !page_state_is(pg-mask, free) ||
554              (PFN_ORDER(pg-mask) != order) )
555             break;
556         pg -= mask;
557
558         check_page(pg, pdx_to_page(pg->list.next), mask, order, 0);
559         check_page(pg, pdx_to_page(pg->list.prev), mask, order, 1);
560
561         page_list_del(pg, &heap(node, zone, order));
562     }
563     else
564     {
565         /* Merge with successor block? */
566         if ( !mfn_valid(page_to_mfn(pg+mask)) ||
567              !page_state_is(pg+mask, free) ||
568              (PFN_ORDER(pg+mask) != order) )
569             break;
570
571         pgt = pg + mask;
572         check_page(pg, pdx_to_page(pgt->list.next), mask, order, 2);
573         check_page(pg, pdx_to_page(pgt->list.prev), mask, order, 3);
574
> > --------------------------------------------------- > > root (hd0,0) > > Filesystem type is ext2fs, partition type 0x83 > > kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M > > dom0_max_ > > vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot > > [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, > > entry=0x100000 > > ] > > module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0 > > [Multiboot-module @ 0x39b000, 0x3214d0 bytes] > > > > > > __ __ _ _ > > ___ ___ > > \ \/ /___ _ __ | || | / _ \ / _ \ * > > \ // _ \ ''_ \ | || |_| | | | | | | * > > / \ __/ | | | |__ _| |_| | |_| | * * > > /_/\_\___|_| |_| |_|(_)___(_)___/ ************************************** > > hich entry is highlighted. > > (XEN) Xen version 4.0.0 (root@dev.sd.aliyun.com) (gcc version 4.1.2 20080704 > > (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010 > > (XEN) Latest ChangeSet: unavailableto modify the kernel arguments > > (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M > > dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax > > noreboot > > (XEN) Video information: > > (XEN) VGA is text mode 80x25, font 8x16automatically in 3 seconds. > > (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds > > (XEN) EDID info not retrieved because no DDC retrieval method detected > > (XEN) Disc information: > > (XEN) Found 6 MBR signatures > > (XEN) Found 6 EDD information structures > > (XEN) Xen-e820 RAM map: > > (XEN) 0000000000000000 - 000000000009a800 (usable) > > (XEN) 000000000009a800 - 00000000000a0000 (reserved) > > (XEN) 00000000000e4bb0 - 0000000000100000 (reserved) > > (XEN) 0000000000100000 - 00000000bf790000 (usable) > > (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data) > > (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS) > > (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved) > > (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved) > > (XEN) 00000000e0000000 - 00000000f0000000 (reserved) > > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > > (XEN) 00000000fff00000 - 0000000100000000 (reserved) > > (XEN) 0000000100000000 - 0000000640000000 (usable) > > (XEN) --------------849 > > (XEN) --------------849 > > (XEN) --------------849 > > (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM) > > (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97) > > (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97) > > (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117) > > (XEN) ACPI: FACS BF79E000, 0040 > > (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97) > > (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97) > > (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97) > > (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1) > > (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97) > > (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117) > > (XEN) --------------847 > > (XEN) ---------srat enter > > (XEN) ---------prepare enter into pfn > > (XEN) -------in pfn > > (XEN) -------hole shift returned > > (XEN) --------------849 > > (XEN) System RAM: 24542MB (25131224kB) > > (XEN) Unknown interrupt (cr2=0000000000000000) > > (XEN) 00000000000000ab 0000000000000000 ffff82f600004020 > > 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000 > > 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008 > > 0000000000000000 00000000000001ff 
> > 00000000000001ff 0000000000000000
> > ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
> > 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
> > 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
> > 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
> > 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
> > 0000000000001000 0000000000000004 0000000000000080 0000000000000001
> > ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
> > 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
> > 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
> > 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
> > 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
> > 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
> > 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
> > 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
> > 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 00000000fffff000
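[A quick calculation supports the PAGE_LIST_NULL theory. Assuming the
frame table starts at 0xffff82f600000000 and sizeof(struct page_info) is
32 bytes -- both consistent with the addresses in these logs -- converting
the 32-bit sentinel index 0xffffffff with pdx_to_page() lands exactly on
the bad pointer seen above:

      0xffff82f600000000 + 0xffffffff * 32
    = 0xffff82f600000000 + 0x0000001fffffffe0
    = 0xffff8315ffffffe0          (the "next" page printed above)

and the original fault address 0xffff8315ffffffe4 is then plausibly a
4-byte field read at offset 4 into that nonexistent page_info.]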
That doesn't imply anything. It is perfectly valid for a page's prev or next
index to be PAGE_LIST_NULL, if that page is not in a list, or if it is at
the head and/or tail of a list.

 -- Keir

On 01/09/2010 11:21, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:

> Thanks Keir.
>
> I did the test below in page_alloc.c: check_page() will panic on any page
> whose address has '3' as its sixth character; the last argument, i, is
> used to indicate which call site panicked.
>
> The output below shows the panic comes from line 558. The page address is
> ffff82f600002040, while its "next" page is ffff8315ffffffe0 -- compare
> that to the fault address in the earlier panic (ffff8315ffffffe4), which
> is very close. I think this should imply something.
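[To make Keir's point concrete: instrumentation that walks list.next or
list.prev has to treat the sentinel specially before converting it to a
pointer. A minimal sketch, reusing PAGE_LIST_NULL and pdx_to_page() as
they appear elsewhere in this thread; the helper name is made up:]

    /* Return the neighbouring page_info, or NULL at a list end.
     * Converting PAGE_LIST_NULL with pdx_to_page() would fabricate a
     * pointer far beyond the frame table (e.g. 0xffff8315ffffffe0). */
    static struct page_info *pdx_to_page_checked(unsigned long pdx)
    {
        if ( pdx == PAGE_LIST_NULL )
            return NULL;
        return pdx_to_page(pdx);
    }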
More interesting would be to turn the BUG_ON statements in my first patch
into if() statements and print out that kind of info before panic()ing. It
would tell us which BUG_ON() fired, the page addresses (and maybe MFNs) and
order, mask, node, and zone info.

 -- Keir

On 01/09/2010 11:25, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> That doesn't imply anything. It is perfectly valid for a page's prev or
> next index to be PAGE_LIST_NULL, if that page is not in a list, or if it
> is at the head and/or tail of a list.
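[A sketch of the diagnostic Keir is suggesting, applied to one plausible
check from his first patch (a same-node constraint on the merge candidate).
The exact checks in the attached patch are not shown in this thread, so
treat the condition itself as an assumption:]

    /* Instead of BUG_ON(phys_to_nid(page_to_maddr(pg)) != node),
     * print the context and then panic: */
    if ( phys_to_nid(page_to_maddr(pg)) != node )
    {
        printk("free_heap_pages: bad merge: pg %p mfn %lx order %u "
               "mask %lx node %u zone %u\n",
               pg, page_to_mfn(pg), order, mask, node, zone);
        panic("free_heap_pages: merge candidate on wrong node\n");
    }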
>>> On 01.09.10 at 12:21, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> I did the test below in page_alloc.c: check_page() will panic on any page
> whose address has '3' as its sixth character; the last argument, i, is
> used to indicate which call site panicked.
>
> The output below shows the panic comes from line 558. The page address is
> ffff82f600002040, while its "next" page is ffff8315ffffffe0 -- compare
> that to the fault address in the earlier panic (ffff8315ffffffe4), which
> is very close.
>
> I think this should imply something.

No, you didn't do it right. When merging backwards pg->list.next may
validly be PAGE_LIST_NULL, and hence calling check_page() with it passed
as second argument isn't correct. Similarly when forward merging,
pgt->list.prev may validly be PAGE_LIST_NULL.

Jan
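[Applying Jan's correction to the instrumentation quoted earlier would look
roughly like this -- a sketch only, reusing check_page() and pdx_to_page()
from MaoXiaoyun's debug patch:]

    /* Backward merge: pg may sit at the head/tail of its free list, so
     * list.next/list.prev can legitimately be PAGE_LIST_NULL; only
     * check real neighbours. */
    if ( pg->list.next != PAGE_LIST_NULL )
        check_page(pg, pdx_to_page(pg->list.next), mask, order, 0);
    if ( pg->list.prev != PAGE_LIST_NULL )
        check_page(pg, pdx_to_page(pg->list.prev), mask, order, 1);

    /* Forward merge: same guard for the successor block pgt. */
    if ( pgt->list.next != PAGE_LIST_NULL )
        check_page(pg, pdx_to_page(pgt->list.next), mask, order, 2);
    if ( pgt->list.prev != PAGE_LIST_NULL )
        check_page(pg, pdx_to_page(pgt->list.prev), mask, order, 3);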
When I put the BUG_ON code into if() statements, the server can start. I
must have made another stupid mistake while manually copying the patch
earlier; I apologize.

Anyway, I now have one server running with patch one, with the BUG_ONs
moved into if() statements, so I shall get the page address and other
information if it panics. Meanwhile, I'll have another server run the
second patch. I'll keep you updated, thanks.

> Date: Wed, 1 Sep 2010 10:58:54 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@eu.citrix.com
> To: tinnycloud@hotmail.com; jbeulich@novell.com
> CC: xen-devel@lists.xensource.com
>
> Hm, well, it is a bit weird. The check in init_heap_pages() ought to
> prevent merging across node boundaries. Nonetheless the code is simpler
> and more obvious if we put a further merging constraint in
> free_heap_pages() instead.
>
> Please give the attached patch a spin. (You should revert the previous
> patch, of course).
>
> Thanks,
> Keir