Christophe Saout
2006-Oct-01 20:01 UTC
[Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
Hello list,

I just got this ominous bug on my machine, one that has already been seen several times:

http://lists.xensource.com/archives/html/xen-devel/2006-01/msg00180.html

The machine is very similar: it's a box with two dual-core Opterons, running one of the latest xen-3.0.3-unstable builds (20060926 hypervisor, plus a vanilla 2.6.18 with the Xen patch from Fedora, dated 20060915). The machine had been running since yesterday and I had just done some compiling.

1 20:53:48 waff ----------- [cut here ] --------- [please bite here ] ---------
1 20:53:48 waff Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
1 20:53:48 waff invalid opcode: 0000 [1] SMP
1 20:53:48 waff CPU 3
1 20:53:48 waff Modules linked in: xt_NOTRACK iptable_raw iptable_mangle xt_MARK ipt_MASQUERADE iptable_nat xt_physdev ipt
1 20:53:48 waff Pid: 31297, comm: ebuild.sh Not tainted 2.6.18-cs1-xen0 #1
1 20:53:48 waff RIP: e030:[<ffffffff80285c75>] [<ffffffff80285c75>] xen_pgd_pin+0x55/0x70
1 20:53:48 waff RSP: e02b:ffff8800210e9d88 EFLAGS: 00010282
1 20:53:48 waff RAX: 00000000ffffffea RBX: ffff88003ea1e1c0 RCX: 0000000000034d18
1 20:53:48 waff RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800210e9d88
1 20:53:48 waff RBP: ffff8800210e9da8 R08: ffff88001c05cff8 R09: ffff88001c05cff8
1 20:53:48 waff R10: 0000000000007ff0 R11: ffff88000ecbcff8 R12: 0000000000000000
1 20:53:48 waff R13: ffff8800000ca840 R14: 0000000001200011 R15: ffff8800000ca840
1 20:53:48 waff FS: 00002b3887c7ce30(0000) GS:ffffffff80773180(0000) knlGS:0000000000000000
1 20:53:48 waff CS: e033 DS: 0000 ES: 0000
1 20:53:48 waff Process ebuild.sh (pid: 31297, threadinfo ffff8800210e8000, task ffff88001f331080)
1 20:53:48 waff Stack: 0000000000000003 0000000000098718 0000000001200011 ffff88000ecbcff8
1 20:53:48 waff ffff8800210e9dc8 ffffffff80285573 0000000000000000 ffff88000c1263d8
1 20:53:48 waff ffff8800210e9dd8 ffffffff80285622
1 20:53:48 waff Call Trace:
1 20:53:48 waff [<ffffffff80285573>] mm_pin+0x183/0x220
1 20:53:48 waff [<ffffffff80285622>] _arch_dup_mmap+0x12/0x20
1 20:53:48 waff [<ffffffff802220b0>] copy_process+0xc50/0x1870
1 20:53:48 waff [<ffffffff8023680f>] do_fork+0xef/0x210
1 20:53:48 waff [<ffffffff8029c652>] recalc_sigpending+0x12/0x20
1 20:53:48 waff [<ffffffff8022005d>] sigprocmask+0xfd/0x110
1 20:53:48 waff [<ffffffff80269662>] system_call+0x86/0x8b
1 20:53:48 waff [<ffffffff802767c3>] sys_clone+0x23/0x30
1 20:53:48 waff [<ffffffff80269a71>] ptregscall_common+0x3d/0x64
1 20:53:48 waff
1 20:53:48 waff
1 20:53:48 waff Code: 0f 0b 68 b8 76 5a 80 c2 c5 00 90 c9 c3 00 00 00 00 00 00 00
1 20:53:48 waff RIP [<ffffffff80285c75>] xen_pgd_pin+0x55/0x70
1 20:53:48 waff RSP <ffff8800210e9d88>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Oct-01 20:09 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
> I just got this ominous bug on my machine, that has already been seen
> several times:
>
> http://lists.xensource.com/archives/html/xen-devel/2006-01/msg00180.html

That's an old issue, not relevant on 3.0.3.

> The machine is very similar, it's a machine with two dual-core opterons,
> running one of the latest xen-3.0.3-unstable (20060926 hypervisor, and a
> vanilla 2.6.18 + xen patch from Fedora from 20060915).

Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than the Fedora one? I suspect not.

Ian
Christophe Saout
2006-Oct-03 21:39 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Sunday, 2006-10-01 at 21:09 +0100, Ian Pratt wrote:

> That's an old issue, not relevant on 3.0.3.

Well, it turns out that was only one way it could crash. I was able to reproduce this several times. Most of the time I got a "bad page state" followed by hitting a BUG in rmap.c, or something like that. Then, most of the time, one or two CPUs would lock up, and somewhat later the whole system.

> > The machine is very similar, it's a machine with two dual-core opterons,
> > running one of the latest xen-3.0.3-unstable (20060926 hypervisor, and a
> > vanilla 2.6.18 + xen patch from Fedora from 20060915).
>
> Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than
> the Fedora one? I suspect not.

Well, I cannot reproduce these exact bugs, but the same test case is able to kill the whole machine as well: CPU lockups on Dom0 or any DomU (depending on where the load was) that spread to the other domains until everything locks up. At some point the Dom0 even stops answering pings. The only thing that still works is pressing Ctrl-A three times to get the message that the serial console was switched, but even 'h' no longer prints the help text.

I would like to think that this is a memory problem, but the machine is brand new and survived memtest86. And as long as I wasn't running anything except Dom0, I was able to compile a whole Gentoo system for hours; once I started adding some DomUs, the problems showed up within minutes.

The best way to reproduce this was to run an rsync on lots of files from one DomU to another (via a bridge in Dom0, filesystems on exported physical block devices) and start a compile job in any of the machines. After 5-10 minutes: boom.
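The reproduction recipe above can be sketched as a small script. Everything here is hypothetical glue (the host names domu-a/domu-b, the /usr/portage tree, and the kernel build standing in for "a compile job" are assumptions, not from the report); the point is only the shape of the load: many small files streamed across the Dom0 bridge plus concurrent memory pressure.

```shell
#!/bin/sh
# Hypothetical DomU hostnames and file tree; adjust for the actual setup.
SRC=domu-a
DST=domu-b
TREE=/usr/portage

# Build the rsync invocation that streams lots of small files from one
# DomU to the other across the Dom0 bridge.
repro_cmd() {
    printf 'rsync -a %s:%s/ %s:%s-copy/\n' "$SRC" "$TREE" "$DST" "$TREE"
}

# Run the copy in the background while a parallel compile job adds
# memory pressure in one of the domains, e.g.:
#   $(repro_cmd) &
#   make -C /usr/src/linux -j4
repro_cmd
```

With the reported setup, the crash typically followed within 5-10 minutes of starting both loads.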
Christophe Saout
2006-Oct-04 00:19 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Tuesday, 2006-10-03 at 23:39 +0200, Christophe Saout wrote:

> > Can you repro using the 2.6.16 kernel that came with 3.0.3 rather than
> > the Fedora one? I suspect not.
>
> Well, I cannot reproduce these bugs, but the same test case is able to
> kill the whole machine as well. CPU lockups on Dom0 or any DomU
> (depending on where the load was) that spread to the other domains until
> everything locks up. At some point the Dom0 even stops answering pings.

Well, with the 2.6.16.29-based Xen kernel I just noticed that I was getting similar "bad page state" messages and other weird occurrences as well, just a bit further down in the list of kernel messages. Notice that there are soft lockups that seem to go away, turn up again, and then there's a bad page state followed by other nasty stuff. I'm not sure how it works exactly, but the CPUs always seem to be stuck in some memory-management-related hypervisor calls.

As stated before, without running any DomUs everything is just fine; Dom0 can run compile jobs for hours. But running a DomU with network and disk I/O causes the same types of kernel BUGs/lockups, not only in a DomU but also in the Dom0 when some load is generated there.

Here's an excerpt of a DomU log from when it happens. I always see some combination of these before the system goes down entirely (assuming it doesn't go down so fast that I don't see anything before everything locks up):

Oct 3 23:24:26 tuek BUG: soft lockup detected on CPU#0!
Oct 3 23:24:26 tuek CPU 0:
Oct 3 23:24:26 tuek Modules linked in: nfsd exportfs
Oct 3 23:24:26 tuek Pid: 3998, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:24:26 tuek RIP: e030:[<ffffffff80107348>] <ffffffff80107348>{hypercall_page+840}
Oct 3 23:24:26 tuek RSP: e02b:ffff88003db47900 EFLAGS: 00000246
Oct 3 23:24:26 tuek RAX: 000000000000001a RBX: ffff88000007cb78 RCX: ffffffff8010734a
Oct 3 23:24:26 tuek RDX: 0000000000000000 RSI: 0000000080000001 RDI: ffff88003db47918
Oct 3 23:24:26 tuek RBP: ffff88003db47938 R08: 0000000000000001 R09: ffffffff80453440
Oct 3 23:24:26 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 80000001aa074167
Oct 3 23:24:26 tuek R13: ffff880003b59ce0 R14: 00002aaaaef9c000 R15: 0000000000000ce0
Oct 3 23:24:26 tuek FS: 00002aea16c0ffe0(0063) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:24:26 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:24:26 tuek
Oct 3 23:24:26 tuek Call Trace: <ffffffff80117c4a>{xen_invlpg_mask+58} <ffffffff801123a3>{flush_tlb_page+19}
Oct 3 23:24:26 tuek <ffffffff8015c479>{__handle_mm_fault+3753} <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:24:26 tuek <ffffffff803d57a7>{do_page_fault+3527} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:26 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8014f50e>{file_read_actor+62}
Oct 3 23:24:26 tuek <ffffffff8014f57c>{file_read_actor+172} <ffffffff8014d19c>{do_generic_mapping_read+412}
Oct 3 23:24:26 tuek <ffffffff8014f4d0>{file_read_actor+0} <ffffffff8014dce8>{__generic_file_aio_read+424}
Oct 3 23:24:26 tuek <ffffffff8014dd98>{generic_file_aio_read+56} <ffffffff801f8f51>{nfs_file_read+129}
Oct 3 23:24:26 tuek <ffffffff80172dd0>{do_sync_read+240} <ffffffff80161981>{vma_link+129}
Oct 3 23:24:26 tuek <ffffffff80140500>{autoremove_wake_function+0} <ffffffff80162b02>{do_mmap_pgoff+1458}
Oct 3 23:24:26 tuek <ffffffff8017381b>{vfs_read+187} <ffffffff80173ce0>{sys_read+80}
Oct 3 23:24:26 tuek <ffffffff8010afbe>{system_call+134} <ffffffff8010af38>{system_call+0}

Oct 3 23:24:49 tuek BUG: soft lockup detected on CPU#1!
Oct 3 23:24:49 tuek CPU 1:
Oct 3 23:24:49 tuek Modules linked in: nfsd exportfs
Oct 3 23:24:49 tuek Pid: 3998, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:24:49 tuek RIP: e030:[<ffffffff8010722a>] <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:24:49 tuek RSP: e02b:ffff88003db479e0 EFLAGS: 00000246
Oct 3 23:24:49 tuek RAX: 0000000000030000 RBX: ffff880001df9b98 RCX: ffffffff8010722a
Oct 3 23:24:49 tuek RDX: ffffffffff5fd000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 3 23:24:49 tuek RBP: ffff88003db479f8 R08: 0000000000000000 R09: 0000000000000001
Oct 3 23:24:49 tuek R10: 0000000000000060 R11: 0000000000000246 R12: 0000000000000f18
Oct 3 23:24:49 tuek R13: ffff88003db47d38 R14: 0000000000006000 R15: 0000000000000002
Oct 3 23:24:49 tuek FS: 00002aea16c0ffe0(0063) GS:ffffffff804bf080(0000) knlGS:0000000000000000
Oct 3 23:24:49 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:24:49 tuek Call Trace: <ffffffff802dc47e>{force_evtchn_callback+14}
Oct 3 23:24:49 tuek <ffffffff803d4ab6>{do_page_fault+214} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:49 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:24:49 tuek <ffffffff8014f50e>{file_read_actor+62} <ffffffff8014f57c>{file_read_actor+172}
Oct 3 23:24:49 tuek <ffffffff8014d19c>{do_generic_mapping_read+412} <ffffffff8014f4d0>{file_read_actor+0}
Oct 3 23:24:49 tuek <ffffffff8014dce8>{__generic_file_aio_read+424} <ffffffff8014dd98>{generic_file_aio_read+56}
Oct 3 23:24:49 tuek <ffffffff801f8f51>{nfs_file_read+129} <ffffffff80172dd0>{do_sync_read+240}
Oct 3 23:24:49 tuek <ffffffff80161981>{vma_link+129} <ffffffff80140500>{autoremove_wake_function+0}
Oct 3 23:24:49 tuek <ffffffff80162b02>{do_mmap_pgoff+1458} <ffffffff8017381b>{vfs_read+187}
Oct 3 23:24:49 tuek <ffffffff80173ce0>{sys_read+80} <ffffffff8010afbe>{system_call+134}
Oct 3 23:24:49 tuek <ffffffff8010af38>{system_call+0}

Oct 3 23:27:28 tuek BUG: soft lockup detected on CPU#0!
Oct 3 23:27:28 tuek CPU 0:
Oct 3 23:27:28 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:28 tuek Pid: 3988, comm: gmetad Not tainted 2.6.16.29-xen-xenU #2
Oct 3 23:27:28 tuek RIP: e030:[<ffffffff8010722a>] <ffffffff8010722a>{hypercall_page+554}
Oct 3 23:27:28 tuek RSP: e02b:ffff88003e32f9e0 EFLAGS: 00000246
Oct 3 23:27:28 tuek RAX: 0000000000030000 RBX: ffff8800017ea448 RCX: ffffffff8010722a
Oct 3 23:27:28 tuek RDX: ffffffffff5fd000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 3 23:27:28 tuek RBP: ffff88003e32f9f8 R08: 0000000000000000 R09: 0000000000000000
Oct 3 23:27:28 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000001000
Oct 3 23:27:28 tuek R13: ffff88003e32fd38 R14: 0000000000005000 R15: 0000000000000002
Oct 3 23:27:28 tuek FS: 00002aeaaa684b00(0000) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:27:28 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:28 tuek
Oct 3 23:27:28 tuek Call Trace: <ffffffff802dc47e>{force_evtchn_callback+14}
Oct 3 23:27:28 tuek <ffffffff803d4ab6>{do_page_fault+214} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:28 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8014f50e>{file_read_actor+62}
Oct 3 23:27:28 tuek <ffffffff8014f57c>{file_read_actor+172} <ffffffff8014d19c>{do_generic_mapping_read+412}
Oct 3 23:27:28 tuek <ffffffff8014f4d0>{file_read_actor+0} <ffffffff8014dce8>{__generic_file_aio_read+424}
Oct 3 23:27:28 tuek <ffffffff8014dd98>{generic_file_aio_read+56} <ffffffff801f8f51>{nfs_file_read+129}
Oct 3 23:27:28 tuek <ffffffff80172dd0>{do_sync_read+240} <ffffffff80161981>{vma_link+129}
Oct 3 23:27:28 tuek <ffffffff80140500>{autoremove_wake_function+0} <ffffffff80162b02>{do_mmap_pgoff+1458}
Oct 3 23:27:28 tuek <ffffffff8017381b>{vfs_read+187} <ffffffff80173ce0>{sys_read+80}
Oct 3 23:27:28 tuek <ffffffff8010afbe>{system_call+134} <ffffffff8010af38>{system_call+0}

Oct 3 23:27:52 tuek Bad page state in process 'bash'
Oct 3 23:27:52 tuek page:ffff880001c72bc8 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:1
Oct 3 23:27:52 tuek Trying to fix it up, but a reboot is needed
Oct 3 23:27:52 tuek Backtrace:
Oct 3 23:27:52 tuek
Oct 3 23:27:52 tuek Call Trace: <ffffffff801512ad>{bad_page+93} <ffffffff80151d57>{get_page_from_freelist+775}
Oct 3 23:27:52 tuek <ffffffff80151f1d>{__alloc_pages+157} <ffffffff80152249>{get_zeroed_page+73}
Oct 3 23:27:52 tuek <ffffffff80158cf4>{__pmd_alloc+36} <ffffffff8015e55e>{copy_page_range+1262}
Oct 3 23:27:52 tuek <ffffffff802a6bea>{rb_insert_color+250} <ffffffff80127cb7>{copy_process+3079}
Oct 3 23:27:52 tuek <ffffffff80128c8e>{do_fork+238} <ffffffff801710d6>{fd_install+54}
Oct 3 23:27:52 tuek <ffffffff80134e8c>{sigprocmask+220} <ffffffff8010afbe>{system_call+134}
Oct 3 23:27:52 tuek <ffffffff801094b3>{sys_clone+35} <ffffffff8010b3e9>{ptregscall_common+61}
Oct 3 23:27:52 tuek ----------- [cut here ] --------- [please bite here ] ---------
Oct 3 23:27:52 tuek Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:198
Oct 3 23:27:52 tuek invalid opcode: 0000 [1] SMP
Oct 3 23:27:52 tuek CPU 3
Oct 3 23:27:52 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:52 tuek Pid: 4617, comm: bash Tainted: G B 2.6.16.29-xen-xenU #2
Oct 3 23:27:52 tuek RIP: e030:[<ffffffff80117cb5>] <ffffffff80117cb5>{xen_pgd_pin+85}
Oct 3 23:27:52 tuek RSP: e02b:ffff880038ed9d58 EFLAGS: 00010282
Oct 3 23:27:52 tuek RAX: 00000000ffffffea RBX: ffff880000e098c0 RCX: 000000000001dc48
Oct 3 23:27:52 tuek RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880038ed9d58
Oct 3 23:27:52 tuek RBP: ffff880038ed9d78 R08: ffff880038e7fff8 R09: ffff880038e7fff8
Oct 3 23:27:52 tuek R10: 0000000000007ff0 R11: ffff880002d39008 R12: 0000000000000000
Oct 3 23:27:52 tuek R13: ffff8800006383c0 R14: 0000000001200011 R15: ffff8800006383c0
Oct 3 23:27:52 tuek FS: 00002afecc63ae60(0000) GS:ffffffff804bf180(0000) knlGS:0000000000000000
Oct 3 23:27:52 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:52 tuek Process bash (pid: 4617, threadinfo ffff880038ed8000, task ffff88003f9e0180)
Oct 3 23:27:52 tuek Stack: 0000000000000003 00000000001b3aa7 0000000001200011 ffff880002d39008
Oct 3 23:27:52 tuek ffff880038ed9d98 ffffffff80117543 0000000000000000 ffff88003ca4ea28
Oct 3 23:27:52 tuek ffff880038ed9da8 ffffffff801175f2
Oct 3 23:27:52 tuek Call Trace: <ffffffff80117543>{mm_pin+387} <ffffffff801175f2>{_arch_dup_mmap+18}
Oct 3 23:27:52 tuek <ffffffff80127cf6>{copy_process+3142} <ffffffff80128c8e>{do_fork+238}
Oct 3 23:27:52 tuek <ffffffff801710d6>{fd_install+54} <ffffffff80134e8c>{sigprocmask+220}
Oct 3 23:27:52 tuek <ffffffff8010afbe>{system_call+134} <ffffffff801094b3>{sys_clone+35}
Oct 3 23:27:52 tuek <ffffffff8010b3e9>{ptregscall_common+61}
Oct 3 23:27:52 tuek
Oct 3 23:27:52 tuek Code: 0f 0b 68 38 d7 3f 80 c2 c6 00 90 c9 c3 0f 1f 80 00 00 00 00
Oct 3 23:27:52 tuek RIP <ffffffff80117cb5>{xen_pgd_pin+85} RSP <ffff880038ed9d58>

Oct 3 23:27:55 tuek <3>BUG: soft lockup detected on CPU#0!
Oct 3 23:27:55 tuek CPU 0:
Oct 3 23:27:55 tuek Modules linked in: nfsd exportfs
Oct 3 23:27:55 tuek Pid: 3998, comm: gmetad Tainted: G B 2.6.16.29-xen-xenU #2
Oct 3 23:27:55 tuek RIP: e030:[<ffffffff8010734a>] <ffffffff8010734a>{hypercall_page+842}
Oct 3 23:27:55 tuek RSP: e02b:ffff88003db47900 EFLAGS: 00000246
Oct 3 23:27:55 tuek RAX: 0000000000000000 RBX: ffff88000007cb78 RCX: ffffffff8010734a
Oct 3 23:27:55 tuek RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88003db47918
Oct 3 23:27:55 tuek RBP: ffff88003db47938 R08: 0000000000000001 R09: 0000000000000000
Oct 3 23:27:55 tuek R10: 0000000000007ff0 R11: 0000000000000246 R12: 80000001c8bac167
Oct 3 23:27:55 tuek R13: ffff88003a80ab68 R14: 00002aaaaf16d000 R15: 0000000000000b68
Oct 3 23:27:55 tuek FS: 0000000000000000(0063) GS:ffffffff804bf000(0000) knlGS:0000000000000000
Oct 3 23:27:55 tuek CS: e033 DS: 0000 ES: 0000
Oct 3 23:27:55 tuek
Oct 3 23:27:55 tuek Call Trace: <ffffffff80117c4a>{xen_invlpg_mask+58} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:55 tuek <ffffffff801123a3>{flush_tlb_page+19} <ffffffff8015c479>{__handle_mm_fault+3753}
Oct 3 23:27:55 tuek <ffffffff8010722a>{hypercall_page+554} <ffffffff803d57a7>{do_page_fault+3527}
Oct 3 23:27:55 tuek <ffffffff8010b6fb>{error_exit+0} <ffffffff8010b6fb>{error_exit+0}
Oct 3 23:27:55 tuek <ffffffff8014f50e>{file_read_actor+62} <ffffffff8014f57c>{file_read_actor+172}
Oct 3 23:27:55 tuek <ffffffff8014d19c>{do_generic_mapping_read+412} <ffffffff8014f4d0>{file_read_actor+0}
Oct 3 23:27:55 tuek <ffffffff8014dce8>{__generic_file_aio_read+424} <ffffffff8014dd98>{generic_file_aio_read+56}
Oct 3 23:27:55 tuek <ffffffff801f8f51>{nfs_file_read+129} <ffffffff80172dd0>{do_sync_read+240}
Oct 3 23:27:55 tuek <ffffffff80161981>{vma_link+129} <ffffffff80140500>{autoremove_wake_function+0}
Oct 3 23:27:55 tuek <ffffffff80162b02>{do_mmap_pgoff+1458} <ffffffff8017381b>{vfs_read+187}
Oct 3 23:27:55 tuek <ffffffff80173ce0>{sys_read+80} <ffffffff8010afbe>{system_call+134}
Oct 3 23:27:55 tuek <ffffffff8010af38>{system_call+0}

Information about the system:

[Xen "3.0-unstable" ASCII boot banner]
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory

Xen version 3.0-unstable (chtephan@intern) (gcc-Version 4.1.1 (Gentoo Hardened 4.1.1-r1, pie-8.7.8)) Wed Oct 4 00:07:24 CEST 2006
Latest ChangeSet: Wed Sep 27 14:30:36 2006 +0100 11633:000aa9510e55

(XEN) Command line: xen-3.0.3.gz dom0_mem=1024M console=com2,vga com2=19200,8n1
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009fc00 (usable)
(XEN) 000000000009fc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e8000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000bfff0000 (usable)
(XEN) 00000000bfff0000 - 00000000bffff000 (ACPI data)
(XEN) 00000000bffff000 - 00000000c0000000 (ACPI NVS)
(XEN) 00000000ff780000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000030e000000 (usable)
(XEN) System RAM: 11487MB (11763260kB)
(XEN) Xen heap: 13MB (14108kB)
(XEN) found SMP MP-table at 000ff780
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f9870
(XEN) ACPI: XSDT (v001 A M I OEMXSDT 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0100
(XEN) ACPI: FADT (v003 A M I OEMFACP 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0281
(XEN) ACPI: MADT (v001 A M I OEMAPIC 0x05000622 MSFT 0x00000097) @ 0x00000000bfff0380
(XEN) ACPI: SRAT (v001 A M I OEMSRAT 0x05000622 MSFT 0x00000097) @ 0x00000000bfff38d0
(XEN) ACPI: DSDT (v001 H8DA8 H8DA8010 0x00000000 INTL 0x02002026) @ 0x0000000000000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) Processor #1 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
(XEN) Processor #2 15:1 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 15:1 APIC version 16
(XEN) ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x05] address[0xfebfe000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 5, version 17, address 0xfebfe000, GSI 24-27
(XEN) ACPI: IOAPIC (id[0x06] address[0xfebff000] gsi_base[28])
(XEN) IOAPIC[2]: apic_id 6, version 17, address 0xfebff000, GSI 28-31
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) Enabling APIC mode: Flat. Using 3 I/O APICs
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 1994.337 MHz processor.
(XEN) CPU0: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 0(2) -> Core 0
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU1: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 1(2) -> Core 1
(XEN) AMD: Disabling C1 Clock Ramping Node #0
(XEN) AMD: Disabling C1 Clock Ramping Node #1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 2/2 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU2: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 2(2) -> Core 0
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU3: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) CPU 3(2) -> Core 1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 000000000b000000->000000000c000000 (258048 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff80200000->ffffffff807cb448
(XEN) Init. ramdisk: ffffffff807cc000->ffffffff807cc000
(XEN) Phys-Mach map: ffffffff807cc000->ffffffff809cc000
(XEN) Start info: ffffffff809cc000->ffffffff809cc49c
(XEN) Page tables: ffffffff809cd000->ffffffff809d6000
(XEN) Boot stack: ffffffff809d6000->ffffffff809d7000
(XEN) TOTAL: ffffffff80000000->ffffffff80c00000
(XEN) ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: ..............................................................................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
Christophe Saout
2006-Oct-04 12:39 UTC
RE: [Xen-devel] Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:197
On Wednesday, 2006-10-04 at 02:19 +0200, Christophe Saout wrote:

Update: when running on 4GB of total memory instead of 12GB, everything is just fine (the three virtual machines, Dom0 + 2 x DomU, are assigned 1GB of memory each in both test runs).

Does that help? If you have any ideas where I should do more debugging, please tell me. We would really like to get this machine going.

> Oct 3 23:27:28 tuek BUG: soft lockup detected on CPU#0!
> [...]
> Oct 3 23:27:52 tuek Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:198
> [...]
> Oct 3 23:27:52 tuek RIP <ffffffff80117cb5>{xen_pgd_pin+85} RSP <ffff880038ed9d58>
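One way to run the 4GB-vs-12GB comparison described above without pulling DIMMs is to cap the physical RAM the hypervisor will use at boot via Xen's mem= command-line option. A sketch of a GRUB legacy boot entry, modelled on the command line in the boot log; the kernel/initrd module lines and root device are hypothetical and must match the actual Dom0 install:

```shell
# Hypothetical /boot/grub/grub.conf entry (GRUB legacy).
# mem=4096M caps the RAM Xen sees, so the 12GB box behaves like a 4GB one.
title Xen 3.0.3 (4GB memory-cap test)
    root (hd0,0)
    kernel /xen-3.0.3.gz mem=4096M dom0_mem=1024M console=com2,vga com2=19200,8n1
    module /vmlinuz-2.6.18-xen0 root=/dev/sda2 ro console=ttyS1
    module /initrd-2.6.18-xen0.img
```

If the crashes disappear under the cap and return without it, that points at memory above the 4GB boundary (or its handling) rather than at the workload itself.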