Christophe Saout
2006-Oct-01 20:01 UTC
[Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
Hello list,

I just got this ominous bug on my machine, one that has already been seen several times:

http://lists.xensource.com/archives/html/xen-devel/2006-01/msg00180.html

The machine is very similar: it's a box with two dual-core Opterons, running one of the latest xen-3.0.3-unstable builds (20060926 hypervisor, plus a vanilla 2.6.18 with the Xen patch from Fedora, dated 20060915). The machine had been running since yesterday and I had just done some compiling.

1 20:53:48 waff ----------- [cut here ] --------- [please bite here ] ---------
1 20:53:48 waff Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
1 20:53:48 waff invalid opcode: 0000 [1] SMP
1 20:53:48 waff CPU 3
1 20:53:48 waff Modules linked in: xt_NOTRACK iptable_raw iptable_mangle xt_MARK ipt_MASQUERADE iptable_nat xt_physdev ipt
1 20:53:48 waff Pid: 31297, comm: ebuild.sh Not tainted 2.6.18-cs1-xen0 #1
1 20:53:48 waff RIP: e030:[<ffffffff80285c75>] [<ffffffff80285c75>] xen_pgd_pin+0x55/0x70
1 20:53:48 waff RSP: e02b:ffff8800210e9d88 EFLAGS: 00010282
1 20:53:48 waff RAX: 00000000ffffffea RBX: ffff88003ea1e1c0 RCX: 0000000000034d18
1 20:53:48 waff RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800210e9d88
1 20:53:48 waff RBP: ffff8800210e9da8 R08: ffff88001c05cff8 R09: ffff88001c05cff8
1 20:53:48 waff R10: 0000000000007ff0 R11: ffff88000ecbcff8 R12: 0000000000000000
1 20:53:48 waff R13: ffff8800000ca840 R14: 0000000001200011 R15: ffff8800000ca840
1 20:53:48 waff FS: 00002b3887c7ce30(0000) GS:ffffffff80773180(0000) knlGS:0000000000000000
1 20:53:48 waff CS: e033 DS: 0000 ES: 0000
1 20:53:48 waff Process ebuild.sh (pid: 31297, threadinfo ffff8800210e8000, task ffff88001f331080)
1 20:53:48 waff Stack: 0000000000000003 0000000000098718 0000000001200011 ffff88000ecbcff8
1 20:53:48 waff ffff8800210e9dc8 ffffffff80285573 0000000000000000 ffff88000c1263d8
1 20:53:48 waff ffff8800210e9dd8 ffffffff80285622
1 20:53:48 waff Call Trace:
1 20:53:48 waff [<ffffffff80285573>] mm_pin+0x183/0x220
1 20:53:48 waff [<ffffffff80285622>] _arch_dup_mmap+0x12/0x20
1 20:53:48 waff [<ffffffff802220b0>] copy_process+0xc50/0x1870
1 20:53:48 waff [<ffffffff8023680f>] do_fork+0xef/0x210
1 20:53:48 waff [<ffffffff8029c652>] recalc_sigpending+0x12/0x20
1 20:53:48 waff [<ffffffff8022005d>] sigprocmask+0xfd/0x110
1 20:53:48 waff [<ffffffff80269662>] system_call+0x86/0x8b
1 20:53:48 waff [<ffffffff802767c3>] sys_clone+0x23/0x30
1 20:53:48 waff [<ffffffff80269a71>] ptregscall_common+0x3d/0x64
1 20:53:48 waff
1 20:53:48 waff
1 20:53:48 waff Code: 0f 0b 68 b8 76 5a 80 c2 c5 00 90 c9 c3 00 00 00 00 00 00 00
1 20:53:48 waff RIP [<ffffffff80285c75>] xen_pgd_pin+0x55/0x70
1 20:53:48 waff RSP <ffff8800210e9d88>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Oct-01 20:09 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
> I just got this ominous bug on my machine, that has already been seen
> several times:
>
> http://lists.xensource.com/archives/html/xen-devel/2006-01/msg00180.html

That's an old issue, not relevant on 3.0.3.

> The machine is very similar, it's a machine with two dual-core opterons,
> running one of the latest xen-3.0.3-unstable (20060926 hypervisor, and a
> vanilla 2.6.18 + xen patch from Fedora from 20060915).

Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than the Fedora one? I suspect not.

Ian
Christophe Saout
2006-Oct-03 21:39 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Sunday, 2006-10-01 at 21:09 +0100, Ian Pratt wrote:

> That's an old issue, not relevant on 3.0.3.

Well, it turns out that was only one way it could crash. I was able to reproduce this several times. Most of the time I got a "bad page state" followed by hitting a BUG in rmap.c, or something like that. Then, most of the time, one or two CPUs would lock up, and somewhat later the whole system.

> > The machine is very similar, it's a machine with two dual-core opterons,
> > running one of the latest xen-3.0.3-unstable (20060926 hypervisor, and a
> > vanilla 2.6.18 + xen patch from Fedora from 20060915).
>
> Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than
> the Fedora one? I suspect not.

Well, I cannot reproduce these exact bugs, but the same test case is able to kill the whole machine as well: CPU lockups on Dom0 or any DomU (depending on where the load was) that spread to the other domains until everything locks up. At some point the Dom0 even stops answering pings. The only thing that still works is pressing Ctrl-A three times to get the message that the serial console was switched, but even 'h' no longer prints the help text.

I would like to think that this is a memory problem, but the machine is brand new and survived memtest86. And as long as I wasn't running anything except Dom0, I was able to compile a whole Gentoo system for hours; once I started adding some DomUs, the problems showed up within minutes.

The best way to reproduce this was to run an rsync on lots of files from one DomU to another (via a bridge in Dom0, filesystems on exported physical block devices) and start a compile job in any of the machines. After 5-10 minutes: boom.
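The reproduction recipe above can be sketched as a small script. Everything here is hypothetical glue (the host names domu-a/domu-b, the /usr/portage tree, and the kernel build standing in for "a compile job" are assumptions, not from the report); the point is only the shape of the load: many small files streamed across the Dom0 bridge plus concurrent memory pressure.

```shell
#!/bin/sh
# Hypothetical DomU hostnames and file tree; adjust for the actual setup.
SRC=domu-a
DST=domu-b
TREE=/usr/portage

# Build the rsync invocation that streams lots of small files from one
# DomU to the other across the Dom0 bridge.
repro_cmd() {
    printf 'rsync -a %s:%s/ %s:%s-copy/\n' "$SRC" "$TREE" "$DST" "$TREE"
}

# Run the copy in the background while a parallel compile job adds
# memory pressure in one of the domains, e.g.:
#   $(repro_cmd) &
#   make -C /usr/src/linux -j4
repro_cmd
```

With the reported setup, the crash typically followed within 5-10 minutes of starting both loads.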
Christophe Saout
2006-Oct-04 00:19 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Tuesday, 2006-10-03 at 23:39 +0200, Christophe Saout wrote:

> > Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than
> > the Fedora one? I suspect not.
>
> Well, I cannot reproduce these bugs, but the same test case is able to
> kill the whole machine as well. CPU lockups on Dom0 or any DomU
> (depending on where the load was) that spread to the other domains until
> everything locks up. At some point the Dom0 even stops answering pings.

Well, with the 2.6.16.29-based Xen kernel I just noticed that I was getting similar "bad page state" messages and other weird occurrences as well, just a bit further down in the list of kernel messages. Notice that there are soft lockups that seem to go away, turn up again, and then there's a bad page state followed by other nasty stuff. I'm not sure how it works exactly, but the CPUs always seem to be stuck in some memory-management-related hypervisor calls.

As stated before, without running any DomUs everything is just fine; Dom0 can run compile jobs for hours. But running a DomU with network and disk I/O causes the same types of kernel BUGs/lockups, not only in a DomU but also in the Dom0 when some load is generated there.

Here's an excerpt of a DomU log from when it happens. I always see some combination of these before the system goes down entirely (assuming it doesn't go down so fast that I don't see anything before everything locks up):

Oct 3 23:24:26 tuek BUG: soft lockup detected on CPU#0!
Oct 3 23:24:26 tuek CPU 0:
Oct 3 23:24:26 tuek Modules linked in: nfsd exportfs
Oct 3 23:24:26 tuek Pid: 3998, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:24:26 tuek RIP: e030:[<ffffffff80107348>] <ffffffff80107348>{hypercall_page+840}
Oct 3 23:24:26 tuek RSP: e02b:ffff88003db47900 EFLAGS: 00000246
Oct 3 23:24:26 tuek RAX: 000000000000001a RBX: ffff88000007cb78 RCX: ffffffff8010734a
Oct 3 23:24:26 tuek RDX: 0000000000000000 RSI: 0000000080000001 RDI: ffff88003db47918
Oct 3 23:24:26 tuek RBP: ffff88003db47938 R08: 0000000000000001 R09: ffffffff80453440
Oct 3 23:24:26 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 80000001aa074167
Oct 3 23:24:26 tuek R13: ffff880003b59ce0 R14: 00002aaaaef9c000 R15: 0000000000000ce0
Oct 3 23:24:26 tuek FS: 00002aea16c0ffe0(0063) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:24:26 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:24:26 tuek
Oct 3 23:24:26 tuek Call Trace: <ffffffff80117c4a>{xen_invlpg_mask+58} <ffffffff801123a3>{flush_tlb_page+19}
Oct 3 23:24:26 tuek <ffffffff8015c479>{__handle_mm_fault+3753} <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:24:26 tuek <ffffffff803d57a7>{do_page_fault+3527} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:26 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8014f50e>{file_read_actor+62}
Oct 3 23:24:26 tuek <ffffffff8014f57c>{file_read_actor+172} <ffffffff8014d19c>{do_generic_mapping_read+412}
Oct 3 23:24:26 tuek <ffffffff8014f4d0>{file_read_actor+0} <ffffffff8014dce8>{__generic_file_aio_read+424}
Oct 3 23:24:26 tuek <ffffffff8014dd98>{generic_file_aio_read+56} <ffffffff801f8f51>{nfs_file_read+129}
Oct 3 23:24:26 tuek <ffffffff80172dd0>{do_sync_read+240} <ffffffff80161981>{vma_link+129}
Oct 3 23:24:26 tuek <ffffffff80140500>{autoremove_wake_function+0} <ffffffff80162b02>{do_mmap_pgoff+1458}
Oct 3 23:24:26 tuek <ffffffff8017381b>{vfs_read+187} <ffffffff80173ce0>{sys_read+80}
Oct 3 23:24:26 tuek <ffffffff8010afbe>{system_call+134} <ffffffff8010af38>{system_call+0}

Oct 3 23:24:49 tuek BUG: soft lockup detected on CPU#1!
Oct 3 23:24:49 tuek CPU 1:
Oct 3 23:24:49 tuek Modules linked in: nfsd exportfs
Oct 3 23:24:49 tuek Pid: 3998, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:24:49 tuek RIP: e030:[<ffffffff8010722a>] <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:24:49 tuek RSP: e02b:ffff88003db479e0 EFLAGS: 00000246
Oct 3 23:24:49 tuek RAX: 0000000000030000 RBX: ffff880001df9b98 RCX: ffffffff8010722a
Oct 3 23:24:49 tuek RDX: ffffffffff5fd000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 3 23:24:49 tuek RBP: ffff88003db479f8 R08: 0000000000000000 R09: 0000000000000001
Oct 3 23:24:49 tuek R10: 0000000000000060 R11: 0000000000000246 R12: 0000000000000f18
Oct 3 23:24:49 tuek R13: ffff88003db47d38 R14: 0000000000006000 R15: 0000000000000002
Oct 3 23:24:49 tuek FS: 00002aea16c0ffe0(0063) GS:ffffffff804bf080(0000) knlGS:0000000000000000
Oct 3 23:24:49 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:24:49 tuek Call Trace: <ffffffff802dc47e>{force_evtchn_callback+14}
Oct 3 23:24:49 tuek <ffffffff803d4ab6>{do_page_fault+214} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:49 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:49 tuek <ffffffff8014f50e>{file_read_actor+62} <ffffffff8014f57c>{file_read_actor+172}
Oct 3 23:24:49 tuek <ffffffff8014d19c>{do_generic_mapping_read+412} <ffffffff8014f4d0>{file_read_actor+0}
Oct 3 23:24:49 tuek <ffffffff8014dce8>{__generic_file_aio_read+424} <ffffffff8014dd98>{generic_file_aio_read+56}
Oct 3 23:24:49 tuek <ffffffff801f8f51>{nfs_file_read+129} <ffffffff80172dd0>{do_sync_read+240}
Oct 3 23:24:49 tuek <ffffffff80161981>{vma_link+129} <ffffffff80140500>{autoremove_wake_function+0}
Oct 3 23:24:49 tuek <ffffffff80162b02>{do_mmap_pgoff+1458} <ffffffff8017381b>{vfs_read+187}
Oct 3 23:24:49 tuek <ffffffff80173ce0>{sys_read+80} <ffffffff8010afbe>{system_call+134}
Oct 3 23:24:49 tuek <ffffffff8010af38>{system_call+0}

Oct 3 23:27:28 tuek BUG: soft lockup detected on CPU#0!
Oct 3 23:27:28 tuek CPU 0:
Oct 3 23:27:28 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:28 tuek Pid: 3988, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:27:28 tuek RIP: e030:[<ffffffff8010722a>] <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:27:28 tuek RSP: e02b:ffff88003e32f9e0 EFLAGS: 00000246
Oct 3 23:27:28 tuek RAX: 0000000000030000 RBX: ffff8800017ea448 RCX: ffffffff8010722a
Oct 3 23:27:28 tuek RDX: ffffffffff5fd000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 3 23:27:28 tuek RBP: ffff88003e32f9f8 R08: 0000000000000000 R09: 0000000000000000
Oct 3 23:27:28 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000001000
Oct 3 23:27:28 tuek R13: ffff88003e32fd38 R14: 0000000000005000 R15: 0000000000000002
Oct 3 23:27:28 tuek FS: 00002aeaaa684b00(0000) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:27:28 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:28 tuek
Oct 3 23:27:28 tuek Call Trace: <ffffffff802dc47e>{force_evtchn_callback+14}
Oct 3 23:27:28 tuek <ffffffff803d4ab6>{do_page_fault+214} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:28 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8014f50e>{file_read_actor+62}
Oct 3 23:27:28 tuek <ffffffff8014f57c>{file_read_actor+172} <ffffffff8014d19c>{do_generic_mapping_read+412}
Oct 3 23:27:28 tuek <ffffffff8014f4d0>{file_read_actor+0} <ffffffff8014dce8>{__generic_file_aio_read+424}
Oct 3 23:27:28 tuek <ffffffff8014dd98>{generic_file_aio_read+56} <ffffffff801f8f51>{nfs_file_read+129}
Oct 3 23:27:28 tuek <ffffffff80172dd0>{do_sync_read+240} <ffffffff80161981>{vma_link+129}
Oct 3 23:27:28 tuek <ffffffff80140500>{autoremove_wake_function+0} <ffffffff80162b02>{do_mmap_pgoff+1458}
Oct 3 23:27:28 tuek <ffffffff8017381b>{vfs_read+187} <ffffffff80173ce0>{sys_read+80}
Oct 3 23:27:28 tuek <ffffffff8010afbe>{system_call+134} <ffffffff8010af38>{system_call+0}

Oct 3 23:27:52 tuek Bad page state in process 'bash'
Oct 3 23:27:52 tuek page:ffff880001c72bc8 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:1
Oct 3 23:27:52 tuek Trying to fix it up, but a reboot is needed
Oct 3 23:27:52 tuek Backtrace:
Oct 3 23:27:52 tuek
Oct 3 23:27:52 tuek Call Trace: <ffffffff801512ad>{bad_page+93} <ffffffff80151d57>{get_page_from_freelist+775}
Oct 3 23:27:52 tuek <ffffffff80151f1d>{__alloc_pages+157} <ffffffff80152249>{get_zeroed_page+73}
Oct 3 23:27:52 tuek <ffffffff80158cf4>{__pmd_alloc+36} <ffffffff8015e55e>{copy_page_range+1262}
Oct 3 23:27:52 tuek <ffffffff802a6bea>{rb_insert_color+250} <ffffffff80127cb7>{copy_process+3079}
Oct 3 23:27:52 tuek <ffffffff80128c8e>{do_fork+238} <ffffffff801710d6>{fd_install+54}
Oct 3 23:27:52 tuek <ffffffff80134e8c>{sigprocmask+220} <ffffffff8010afbe>{system_call+134}
Oct 3 23:27:52 tuek <ffffffff801094b3>{sys_clone+35} <ffffffff8010b3e9>{ptregscall_common+61}
Oct 3 23:27:52 tuek ----------- [cut here ] --------- [please bite here ] ---------
Oct 3 23:27:52 tuek Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:198
Oct 3 23:27:52 tuek invalid opcode: 0000 [1] SMP
Oct 3 23:27:52 tuek CPU 3
Oct 3 23:27:52 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:52 tuek Pid: 4617, comm: bash Tainted: G B 2.6.16.29-xen-xenU #2
Oct 3 23:27:52 tuek RIP: e030:[<ffffffff80117cb5>] <ffffffff80117cb5>{xen_pgd_pin+85}
Oct 3 23:27:52 tuek RSP: e02b:ffff880038ed9d58 EFLAGS: 00010282
Oct 3 23:27:52 tuek RAX: 00000000ffffffea RBX: ffff880000e098c0 RCX: 000000000001dc48
Oct 3 23:27:52 tuek RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880038ed9d58
Oct 3 23:27:52 tuek RBP: ffff880038ed9d78 R08: ffff880038e7fff8 R09: ffff880038e7fff8
Oct 3 23:27:52 tuek R10: 0000000000007ff0 R11: ffff880002d39008 R12: 0000000000000000
Oct 3 23:27:52 tuek R13: ffff8800006383c0 R14: 0000000001200011 R15: ffff8800006383c0
Oct 3 23:27:52 tuek FS: 00002afecc63ae60(0000) GS:ffffffff804bf180(0000) knlGS:0000000000000000
Oct 3 23:27:52 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:52 tuek Process bash (pid: 4617, threadinfo ffff880038ed8000, task ffff88003f9e0180)
Oct 3 23:27:52 tuek Stack: 0000000000000003 00000000001b3aa7 0000000001200011 ffff880002d39008
Oct 3 23:27:52 tuek ffff880038ed9d98 ffffffff80117543 0000000000000000 ffff88003ca4ea28
Oct 3 23:27:52 tuek ffff880038ed9da8 ffffffff801175f2
Oct 3 23:27:52 tuek Call Trace: <ffffffff80117543>{mm_pin+387} <ffffffff801175f2>{_arch_dup_mmap+18}
Oct 3 23:27:52 tuek <ffffffff80127cf6>{copy_process+3142} <ffffffff80128c8e>{do_fork+238}
Oct 3 23:27:52 tuek <ffffffff801710d6>{fd_install+54} <ffffffff80134e8c>{sigprocmask+220}
Oct 3 23:27:52 tuek <ffffffff8010afbe>{system_call+134} <ffffffff801094b3>{sys_clone+35}
Oct 3 23:27:52 tuek <ffffffff8010b3e9>{ptregscall_common+61}
Oct 3 23:27:52 tuek
Oct 3 23:27:52 tuek Code: 0f 0b 68 38 d7 3f 80 c2 c6 00 90 c9 c3 0f 1f 80 00 00 00 00
Oct 3 23:27:52 tuek RIP <ffffffff80117cb5>{xen_pgd_pin+85} RSP <ffff880038ed9d58>

Oct 3 23:27:55 tuek <3>BUG: soft lockup detected on CPU#0!
Oct 3 23:27:55 tuek CPU 0:
Oct 3 23:27:55 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:55 tuek Pid: 3998, comm: gmetad Tainted: G B 2.6.16.29-xen-xenU #2
Oct 3 23:27:55 tuek RIP: e030:[<ffffffff8010734a>] <ffffffff8010734a>{hypercall_page+842}
Oct 3 23:27:55 tuek RSP: e02b:ffff88003db47900 EFLAGS: 00000246
Oct 3 23:27:55 tuek RAX: 0000000000000000 RBX: ffff88000007cb78 RCX: ffffffff8010734a
Oct 3 23:27:55 tuek RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88003db47918
Oct 3 23:27:55 tuek RBP: ffff88003db47938 R08: 0000000000000001 R09: 0000000000000000
Oct 3 23:27:55 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 80000001c8bac167
Oct 3 23:27:55 tuek R13: ffff88003a80ab68 R14: 00002aaaaf16d000 R15: 0000000000000b68
Oct 3 23:27:55 tuek FS: 0000000000000000(0063) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:27:55 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:55 tuek
Oct 3 23:27:55 tuek Call Trace: <ffffffff80117c4a>{xen_invlpg_mask+58} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:55 tuek <ffffffff801123a3>{flush_tlb_page+19} <ffffffff8015c479>{__handle_mm_fault+3753}
Oct 3 23:27:55 tuek <ffffffff8010722a>{hypercall_page+554} <ffffffff803d57a7>{do_page_fault+3527}
Oct 3 23:27:55 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:55 tuek <ffffffff8014f50e>{file_read_actor+62} <ffffffff8014f57c>{file_read_actor+172}
Oct 3 23:27:55 tuek <ffffffff8014d19c>{do_generic_mapping_read+412} <ffffffff8014f4d0>{file_read_actor+0}
Oct 3 23:27:55 tuek <ffffffff8014dce8>{__generic_file_aio_read+424} <ffffffff8014dd98>{generic_file_aio_read+56}
Oct 3 23:27:55 tuek <ffffffff801f8f51>{nfs_file_read+129} <ffffffff80172dd0>{do_sync_read+240}
Oct 3 23:27:55 tuek <ffffffff80161981>{vma_link+129} <ffffffff80140500>{autoremove_wake_function+0}
Oct 3 23:27:55 tuek <ffffffff80162b02>{do_mmap_pgoff+1458} <ffffffff8017381b>{vfs_read+187}
Oct 3 23:27:55 tuek <ffffffff80173ce0>{sys_read+80} <ffffffff8010afbe>{system_call+134}
Oct 3 23:27:55 tuek <ffffffff8010af38>{system_call+0}

Information about the system:

[Xen "3.0-unstable" ASCII boot banner]
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory

Xen version 3.0-unstable (chtephan@intern) (gcc-Version 4.1.1 (Gentoo Hardened 4.1.1-r1, pie-8.7.8)) Wed Oct 4 00:07:24 CEST 2006
Latest ChangeSet: Wed Sep 27 14:30:36 2006 +0100 11633:000aa9510e55

(XEN) Command line: xen-3.0.3.gz dom0_mem=1024M console=com2,vga com2=19200,8n1
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009fc00 (usable)
(XEN) 000000000009fc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e8000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000bfff0000 (usable)
(XEN) 00000000bfff0000 - 00000000bffff000 (ACPI data)
(XEN) 00000000bffff000 - 00000000c0000000 (ACPI NVS)
(XEN) 00000000ff780000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000030e000000 (usable)
(XEN) System RAM: 11487MB (11763260kB)
(XEN) Xen heap: 13MB (14108kB)
(XEN) found SMP MP-table at 000ff780
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f9870
(XEN) ACPI: XSDT (v001 A M I OEMXSDT 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0100
(XEN) ACPI: FADT (v003 A M I OEMFACP 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0281
(XEN) ACPI: MADT (v001 A M I OEMAPIC 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0380
(XEN) ACPI: SRAT (v001 A M I OEMSRAT 0x05000622 MSFT 0x00000097) @ 0x00000000bfff38d0
(XEN) ACPI: DSDT (v001 H8DA8 H8DA8010 0x00000000 INTL 0x02002026) @ 0x0000000000000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) Processor #1 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
(XEN) Processor #2 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 15:1 APIC version 16
(XEN) ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x05] address[0xfebfe000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 5, version 17, address 0xfebfe000, GSI 24-27
(XEN) ACPI: IOAPIC (id[0x06] address[0xfebff000] gsi_base[28])
(XEN) IOAPIC[2]: apic_id 6, version 17, address 0xfebff000, GSI 28-31
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) Enabling APIC mode: Flat. Using 3 I/O APICs
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 1994.337 MHz processor.
(XEN) CPU0: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 0(2) -> Core 0
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU1: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 1(2) -> Core 1
(XEN) AMD: Disabling C1 Clock Ramping Node #0
(XEN) AMD: Disabling C1 Clock Ramping Node #1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 2/2 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU2: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 2(2) -> Core 0
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU3: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 3(2) -> Core 1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 000000000b000000->000000000c000000 (258048 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff80200000->ffffffff807cb448
(XEN) Init. ramdisk: ffffffff807cc000->ffffffff807cc000
(XEN) Phys-Mach map: ffffffff807cc000->ffffffff809cc000
(XEN) Start info: ffffffff809cc000->ffffffff809cc49c
(XEN) Page tables: ffffffff809cd000->ffffffff809d6000
(XEN) Boot stack: ffffffff809d6000->ffffffff809d7000
(XEN) TOTAL: ffffffff80000000->ffffffff80c00000
(XEN) ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: ..............................................................................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
Christophe Saout
2006-Oct-04 12:39 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Wednesday, 2006-10-04 at 02:19 +0200, Christophe Saout wrote:

Update: when running on 4GB of total memory instead of 12GB, everything is just fine (the three virtual machines, Dom0 + 2 x DomU, are assigned 1GB of memory each in both test runs).

Does that help? If you have any ideas where I should do more debugging, please tell me. We would really like to get this machine going.

> Oct 3 23:27:28 tuek BUG: soft lockup detected on CPU#0!
> [...]
> Oct 3 23:27:52 tuek Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:198
> [...]
> Oct 3 23:27:52 tuek RIP <ffffffff80117cb5>{xen_pgd_pin+85} RSP <ffff880038ed9d58>
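One way to run the 4GB-vs-12GB comparison described above without pulling DIMMs is to cap the physical RAM the hypervisor will use at boot via Xen's mem= command-line option. A sketch of a GRUB legacy boot entry, modelled on the command line in the boot log; the kernel/initrd module lines and root device are hypothetical and must match the actual Dom0 install:

```shell
# Hypothetical /boot/grub/grub.conf entry (GRUB legacy).
# mem=4096M caps the RAM Xen sees, so the 12GB box behaves like a 4GB one.
title Xen 3.0.3 (4GB memory-cap test)
    root (hd0,0)
    kernel /xen-3.0.3.gz mem=4096M dom0_mem=1024M console=com2,vga com2=19200,8n1
    module /vmlinuz-2.6.18-xen0 root=/dev/sda2 ro console=ttyS1
    module /initrd-2.6.18-xen0.img
```

If the crashes disappear under the cap and return without it, that points at memory above the 4GB boundary (or its handling) rather than at the workload itself.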