thr3ads.net - Xen users - Xen 4.3.1 / Linux 3.12 panic [Nov 2013]


Wouter de Geus

2013-Nov-05 13:19 UTC

Xen 4.3.1 / Linux 3.12 panic

Hi folks,

I've been trying for a while to get a new Slackware64-current machine up and
running with the latest Xen.
After installing Xen from source and building a new kernel with all Xen options
enabled, I haven't been able to get the machine to behave.
The machine is a brand new dual Opteron 6212 on a Supermicro H8DGi board with
64 GB ECC memory.

Running a stock Slackware kernel without Xen works like a charm; I
haven't seen anything weird.
However, as soon as I boot Xen with my custom kernel, the machine panics within
the hour.
When doing something intensive like building a kernel, it'll often crash
within a few minutes.
I've tried both Xen 4.3.0 and 4.3.1; no difference there.
The kernels I've tried were 3.11.4, 3.11.6 and the brand new 3.12.

The kernel panics are a bit different every time, but the most common seem to
be 'Bad page state in process X', 'unable to handle kernel paging request
at X' and, of course, 'general protection fault'.
Here's the most recent one:
-------------------
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034665] general protection fault: 0000 [#1] SMP
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034686] Modules linked in:
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034697] CPU: 0 PID: 262 Comm: jbd2/md0-8 Not tainted 3.12.0-Desman #1
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034707] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0        09/10/2012
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034717] task: ffff8800d7162b20 ti: ffff8800d68b2000 task.ti: ffff8800d68b2000
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034726] RIP: e030:[<ffffffff8114119b>]  [<ffffffff8114119b>] __rmqueue+0x6b/0x3a0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034745] RSP: e02b:ffff8800d68b38b0  EFLAGS: 00010012
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034752] RAX: ffff8801281d9e08 RBX: 0000000000000000 RCX: 0000000000000003
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034797] RDX: 0000000000000001 RSI: ffff8801281d9f22 RDI: 9f30ffff8801281d
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034805] RBP: ffff8801281d9f02 R08: 0000000000000010 R09: 9f20ffff8801281d
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034814] R10: ffff8801281d9f10 R11: 0000000000000058 R12: ffff8801281d9d80
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034822] R13: 0000000000000001 R14: ffff8801281d9e00 R15: ffffea00046534e0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034837] FS:  00007ff41d295740(0000) GS:ffff880122a00000(0000) knlGS:0000000000000000
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034847] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034854] CR2: 00007f4d5f308fb5 CR3: 000000011d045000 CR4: 0000000000040660
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034864] Stack:
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034869]  0000000000000000 ffff88011e4046c0 0000000000000001 0000000000001000
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034883]  0000000000000000 ffff8801281d9d80 0000000000000001 0000000000000000
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034897]  000000000000001f 0000000000000009 ffffea00046534e0 ffffffff81142f89
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034911] Call Trace:
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034922]  [<ffffffff81142f89>] ? get_page_from_freelist+0x329/0x900
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034935]  [<ffffffff811436b4>] ? __alloc_pages_nodemask+0x154/0xa90
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034948]  [<ffffffff811539bd>] ? zone_statistics+0x9d/0xa0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034961]  [<ffffffff8117ddd3>] ? __kmalloc+0xe3/0x120
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034975]  [<ffffffff81263d9a>] ? ext4_ext_find_extent+0x26a/0x300
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034987]  [<ffffffff811749a5>] ? alloc_pages_current+0xb5/0x180
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.034999]  [<ffffffff8117c3d5>] ? new_slab+0x255/0x2e0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035011]  [<ffffffff81d42807>] ? __slab_alloc+0x2a1/0x436
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035025]  [<ffffffff811b8e18>] ? alloc_buffer_head+0x18/0x60
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035037]  [<ffffffff8117db3b>] ? kmem_cache_alloc+0xab/0xd0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035049]  [<ffffffff811b8e18>] ? alloc_buffer_head+0x18/0x60
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035061]  [<ffffffff811b8227>] ? generic_block_bmap+0x37/0x50
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035075]  [<ffffffff81290676>] ? jbd2_journal_write_metadata_buffer+0x56/0x3c0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035088]  [<ffffffff8128aa41>] ? jbd2_journal_commit_transaction+0x721/0x16d0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035103]  [<ffffffff81007cbc>] ? xen_clocksource_read+0x1c/0x20
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035116]  [<ffffffff81d4eed1>] ? _raw_spin_lock_irqsave+0x11/0x50
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035128]  [<ffffffff8128e5df>] ? kjournald2+0xaf/0x240
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035140]  [<ffffffff810cd2d0>] ? wake_up_atomic_t+0x30/0x30
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035152]  [<ffffffff8128e530>] ? commit_timeout+0x10/0x10
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035163]  [<ffffffff810cc57f>] ? kthread+0xaf/0xc0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035174]  [<ffffffff81007cbc>] ? xen_clocksource_read+0x1c/0x20
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035186]  [<ffffffff810cc4d0>] ? kthread_create_on_node+0x120/0x120
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035197]  [<ffffffff81d4facc>] ? ret_from_fork+0x7c/0xb0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035208]  [<ffffffff810cc4d0>] ? kthread_create_on_node+0x120/0x120
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035215] Code: 89 c2 89 d9 4c 89 c7 48 c1 e7 04 48 8d 34 38 48 3b 36 0f 84 b8 00 00 00 49 c1 e0 04 4b 8b 34 02 48 8b 7e 08 4c 8b 0e 48 8d 6e e0 <49> 89 79 08 4c 89 0f 48 bf 00 01 10 00 00 00 ad de 48 89 3e 48
Nov  5 13:44:20 192.168.1.6 kernel 01 [kern.alert] kernel: [ 6868.035326] RIP  [<ffffffff8114119b>] __rmqueue+0x6b/0x3a0
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.035336]  RSP <ffff8800d68b38b0>
Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: [ 6868.049110] ---[ end trace 65f94d10957f59d0 ]---
-----------------------
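For readers following this trace: the bytes around the marked instruction in the `Code:` line can be decoded by hand (the decoding is an editorial addition, not part of the report). `4b 8b 34 02` is `mov (%r10,%r8,1),%rsi`, `48 8b 7e 08` is `mov 0x8(%rsi),%rdi`, `4c 8b 0e` is `mov (%rsi),%r9`, `48 8d 6e e0` is `lea -0x20(%rsi),%rbp`, and the faulting `<49> 89 79 08` is `mov %rdi,0x8(%r9)`. Together with the following `4c 89 0f` (`mov %r9,(%rdi)`) and the constant 0xdead000000100100 (LIST_POISON1 on x86-64), this looks like an inlined `list_del()`. The Python sketch below only checks the arithmetic. It assumes, hypothetically, that memory at 0xffff8801281d9f20 holds two adjacent empty list heads whose first `next` pointer ends in 0x22 instead of 0x20, and shows that this single assumption reproduces the dumped RSI, R09, RDI and RBP. It also shows that R09 is a non-canonical address, which would make the store fault with a general protection fault rather than a page fault. It says nothing about what changed the pointer.

```python
import struct

# Hypothetical reconstruction of the memory near the faulting access:
# two adjacent empty list_heads (next, prev) that point to themselves,
# except that the first 'next' ends in 0x22 instead of 0x20.
BASE = 0xFFFF8801281D9F20
MEMORY = struct.pack(
    "<4Q",
    0xFFFF8801281D9F22,  # @ ...9f20: next (the suspect value)
    0xFFFF8801281D9F20,  # @ ...9f28: prev
    0xFFFF8801281D9F30,  # @ ...9f30: next
    0xFFFF8801281D9F30,  # @ ...9f38: prev
)


def load_qword(addr):
    """Little-endian 8-byte load from the reconstructed memory image."""
    return struct.unpack_from("<Q", MEMORY, addr - BASE)[0]


def is_canonical(addr):
    """x86-64 with 48-bit virtual addresses: bits 63..47 must all be equal."""
    return (addr >> 47) in (0, 0x1FFFF)


R10, R08 = 0xFFFF8801281D9F10, 0x10  # register values from the dump above
rsi = load_qword(R10 + R08)          # mov (%r10,%r8,1),%rsi
r9 = load_qword(rsi)                 # mov (%rsi),%r9   (misaligned read)
rdi = load_qword(rsi + 8)            # mov 0x8(%rsi),%rdi (misaligned read)
rbp = rsi - 0x20                     # lea -0x20(%rsi),%rbp

for name, value in [("RSI", rsi), ("R09", r9), ("RDI", rdi), ("RBP", rbp)]:
    print(f"{name}: {value:016x}")
print("store target 0x8(%r9) canonical?", is_canonical(r9 + 8))
print("bits differing from the aligned self-pointer:", bin(rsi ^ BASE).count("1"))
```

Running it prints the four register values exactly as they appear in the dump, reports that the store target is non-canonical, and shows that the assumed `next` value differs from an aligned self-pointer in one bit.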

I've attached my kernel configuration to this email for those who care
:)

Does anyone have any idea what I'm facing here?
If it weren't for the stock kernel (without Xen) running stable,
I'd guess bad memory, but so far a memory test has found 0 errors (not that
that's a real indication).
It feels like a bug / config problem somehow.

Thanks for reading :)

Regards,

Wouter.


_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users

Wouter de Geus

2013-Nov-06 09:12 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

I've been experimenting some more.
For the last 24 hours I've been compiling constantly (in a while loop) using my
(non-Xen) stock slackware kernel 3.10.7, stable as a rock.
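A compile loop like this is the whole reproduction, so it helps if it leaves a durable trace. A minimal Python sketch, assuming the command to repeat is passed in as a list; each iteration appends a timestamped line to a log file and calls `os.fsync`, so that after a hard crash the last line on disk shows roughly how long the system survived. The function name, command and log path are illustrative, not what Wouter used.

```python
import os
import subprocess
import time


def stress_loop(cmd, log_path, max_iterations=None):
    """Run `cmd` repeatedly, writing one durable log line per iteration.

    Returns the number of completed iterations. Stops early if the command
    exits non-zero; a kernel panic will of course stop it without returning.
    """
    iteration = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_iterations is None or iteration < max_iterations:
            started = time.time()
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            iteration += 1
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
                      f"iteration={iteration} rc={result.returncode} "
                      f"seconds={time.time() - started:.1f}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before the next run
            if result.returncode != 0:
                break
    return iteration
```

For example, a hypothetical `stress_loop(["sh", "-c", "make clean && make -j16"], "/root/stress.log")` run from a kernel source tree rebuilds the tree until something fails.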

I just booted Xen 4.3.1 with my custom 3.11 kernel, and it crashed as soon as I
ran rm -rf on some old sources.
Here's the console output:
------
(XEN) ----[ Xen-4.3.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    4
(XEN) RIP:    e008:[<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 000000003b9d8704   rcx: 000000000000001d
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
(XEN) rbp: ffff830834fd6380   rsp: ffff830834fffe30   r8:  00000012d91afd3e
(XEN) r9:  ffff830834ff7128   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffff830977948860   r14: 8000000000000380
(XEN) r15: 000000000000001d   cr0: 000000008005003b   cr4: 00000000000406f0
(XEN) cr3: 00000000d7c5f000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830834fffe30:
(XEN)    0000000000000286 ffff82c4c02ea940 ffff82c4c0300980 0027ac4021424b00
(XEN)    000000fb00000000 ffff831021424d00 ffff831021424d50 00000012d91b237a
(XEN)    0000000000000004 0000000000000000 0000000000000000 ffff82c4c019bbf1
(XEN)    00000000ffffffff ffff82c4c02c7800 0014e1920000200d 0000000000000000
(XEN)    0000000000000000 00000000ffffffff ffff82c4c02c7800 ffff82c4c01245f4
(XEN)    000000000000e008 ffff830834ff8000 ffff830834ff8000 0000000000000004
(XEN)    0000000000000004 ffff82c4c01584ce 0000000000000000 0000000000000000
(XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000004 ffff8300d7afc000
(XEN)    0000004374cd5a00 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
(XEN)    [<ffff82c4c019bbf1>] acpi_processor_idle+0x201/0x550
(XEN)    [<ffff82c4c01245f4>] __do_softirq+0x74/0xa0
(XEN)    [<ffff82c4c01584ce>] idle_loop+0x1e/0x50
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 4:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
------

Suggestions anyone? :)

Regards,

Wouter.

Ian Campbell

2013-Nov-06 09:38 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

On Tue, 2013-11-05 at 14:19 +0100, Wouter de Geus wrote:
> I've been trying to get a new machine up and running with the latest
> Xen for a while on a Slackware64 (current) machine.
> After installing Xen from source and building a new kernel with all
> xen options enabled I haven't been able to get the machine to behave.
> The machine is a brand new dual opteron 6212 on a Supermicro H8DGi
> board with 64G ECC memory.
> 
> Running a stock slackware kernel without xen works like a charm,
> haven't seen anything weird.
> However, as soon as I boot Xen with my custom kernel the machine
> panics within the hour.
> When doing something intensive like building a kernel it'll often
> crash in a few minutes.
> I've tried both Xen 4.3.0 and 4.3.1, no difference there.
> The kernels I've tried were 3.11.4 and 3.11.6 and the brand new 3.12.
> 
> The kernel panics are a bit different every time, but the most common
> seems to be 'Bad page state in process X' or 'unable to handle kernel
> paging request at X' and of course 'general protection fault'.
> Here's the most recent one:
(trimmed the long common prefix so it's not wrapped and therefore
readable)
> [ 6868.034665] general protection fault: 0000 [#1] SMP
> [ 6868.034686] Modules linked in:
> [ 6868.034697] CPU: 0 PID: 262 Comm: jbd2/md0-8 Not tainted 3.12.0-Desman #1
> [ 6868.034707] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
> [ 6868.034717] task: ffff8800d7162b20 ti: ffff8800d68b2000 task.ti: ffff8800d68b2000
> [ 6868.034726] RIP: e030:[<ffffffff8114119b>]  [<ffffffff8114119b>] __rmqueue+0x6b/0x3a0
> [ 6868.034745] RSP: e02b:ffff8800d68b38b0  EFLAGS: 00010012
> [ 6868.034752] RAX: ffff8801281d9e08 RBX: 0000000000000000 RCX: 0000000000000003
> [ 6868.034797] RDX: 0000000000000001 RSI: ffff8801281d9f22 RDI: 9f30ffff8801281d
> [ 6868.034805] RBP: ffff8801281d9f02 R08: 0000000000000010 R09: 9f20ffff8801281d
> [ 6868.034814] R10: ffff8801281d9f10 R11: 0000000000000058 R12: ffff8801281d9d80
> [ 6868.034822] R13: 0000000000000001 R14: ffff8801281d9e00 R15: ffffea00046534e0
> [ 6868.034837] FS:  00007ff41d295740(0000) GS:ffff880122a00000(0000) knlGS:0000000000000000
> [ 6868.034847] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6868.034854] CR2: 00007f4d5f308fb5 CR3: 000000011d045000 CR4: 0000000000040660
> [ 6868.034864] Stack:
> [ 6868.034869]  0000000000000000 ffff88011e4046c0 0000000000000001 0000000000001000
> [ 6868.034883]  0000000000000000 ffff8801281d9d80 0000000000000001 0000000000000000
> [ 6868.034897]  000000000000001f 0000000000000009 ffffea00046534e0 ffffffff81142f89
> [ 6868.034911] Call Trace:
> [ 6868.034922]  [<ffffffff81142f89>] ? get_page_from_freelist+0x329/0x900
> [ 6868.034935]  [<ffffffff811436b4>] ? __alloc_pages_nodemask+0x154/0xa90
> [ 6868.034948]  [<ffffffff811539bd>] ? zone_statistics+0x9d/0xa0
> [ 6868.034961]  [<ffffffff8117ddd3>] ? __kmalloc+0xe3/0x120
> [ 6868.034975]  [<ffffffff81263d9a>] ? ext4_ext_find_extent+0x26a/0x300
> [ 6868.034987]  [<ffffffff811749a5>] ? alloc_pages_current+0xb5/0x180
> [ 6868.034999]  [<ffffffff8117c3d5>] ? new_slab+0x255/0x2e0
> [ 6868.035011]  [<ffffffff81d42807>] ? __slab_alloc+0x2a1/0x436
> [ 6868.035025]  [<ffffffff811b8e18>] ? alloc_buffer_head+0x18/0x60
> [ 6868.035037]  [<ffffffff8117db3b>] ? kmem_cache_alloc+0xab/0xd0
> [ 6868.035049]  [<ffffffff811b8e18>] ? alloc_buffer_head+0x18/0x60
> [ 6868.035061]  [<ffffffff811b8227>] ? generic_block_bmap+0x37/0x50
> [ 6868.035075]  [<ffffffff81290676>] ? jbd2_journal_write_metadata_buffer+0x56/0x3c0
> [ 6868.035088]  [<ffffffff8128aa41>] ? jbd2_journal_commit_transaction+0x721/0x16d0
> [ 6868.035103]  [<ffffffff81007cbc>] ? xen_clocksource_read+0x1c/0x20
> [ 6868.035116]  [<ffffffff81d4eed1>] ? _raw_spin_lock_irqsave+0x11/0x50
> [ 6868.035128]  [<ffffffff8128e5df>] ? kjournald2+0xaf/0x240
> [ 6868.035140]  [<ffffffff810cd2d0>] ? wake_up_atomic_t+0x30/0x30
> [ 6868.035152]  [<ffffffff8128e530>] ? commit_timeout+0x10/0x10
> [ 6868.035163]  [<ffffffff810cc57f>] ? kthread+0xaf/0xc0
> [ 6868.035174]  [<ffffffff81007cbc>] ? xen_clocksource_read+0x1c/0x20
> [ 6868.035186]  [<ffffffff810cc4d0>] ? kthread_create_on_node+0x120/0x120
> [ 6868.035197]  [<ffffffff81d4facc>] ? ret_from_fork+0x7c/0xb0
> [ 6868.035208]  [<ffffffff810cc4d0>] ? kthread_create_on_node+0x120/0x120
> [ 6868.035215] Code: 89 c2 89 d9 4c 89 c7 48 c1 e7 04 48 8d 34 38 48 3b 36 0f 84 b8 00 00 00 49 c1 e0 04 4b 8b 34 02 48 8b 7e 08 4c 8b 0e 48 8d 6e e0 <49> 89 79 08 4c 89 0f 48 bf 00 01 10 00 00 00 ad de 48 89 3e 48
> [ 6868.035326] RIP  [<ffffffff8114119b>] __rmqueue+0x6b/0x3a0
> [ 6868.035336]  RSP <ffff8800d68b38b0>
> [ 6868.049110] ---[ end trace 65f94d10957f59d0 ]---
> If it weren't for the stock kernel (without Xen) running stable
Have you run your own kernel (3.12.0-Desman, the one which crashes with
Xen) without Xen underneath?
> I'd guess bad memory, but so far a memory test gave 0 errors (not
> that that's a real indication).
> Feels like a bug / config problem somehow.
Yes, I agree. I'm afraid I've not seen anything like this; CCing the Xen
pvops maintainers for input.


Ian.
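The trimming Ian did above (dropping the remote-syslog prefix such as `Nov  5 13:44:20 192.168.1.6 kernel 04 [kern.warning] kernel: `) and the rejoining of lines the mail client wrapped can be automated. A small Python sketch, assuming every message line has the shape seen in Wouter's log: a prefix ending in `kernel: ` followed by the kernel's own `[ seconds.micros]` timestamp. The regular expression is written for that one format only and is not a general syslog parser; lines that do not start with the prefix are treated as wrapped continuations of the previous message.

```python
import re

# Syslog prefix as seen in the report, up to and including "kernel: ",
# provided it is followed by the kernel's own "[ 6868.034665]" timestamp.
PREFIX = re.compile(r"^.*?\bkernel: (?=\[\s*\d+\.\d+\])")


def trim_syslog(lines):
    """Strip the syslog prefix and rejoin mail-wrapped continuation lines."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        match = PREFIX.match(line)
        if match:
            messages.append(line[match.end():])
        elif messages and line.strip():
            # A wrapped continuation: glue it back onto the previous message.
            messages[-1] = messages[-1].rstrip() + " " + line.strip()
    return messages
```

Feeding it an open log file, e.g. `trim_syslog(open("kern.log"))`, returns one string per kernel message.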

Ian Campbell

2013-Nov-06 09:41 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

(CCing Linux guys, not because this involves Linux but because I CCed
them on the previous mail)

On Wed, 2013-11-06 at 10:12 +0100, Wouter de Geus wrote:
> I've been experimenting some more.
> Last 24 hours I've been constantly compiling (in a while loop)
> using my (non-Xen) stock slackware kernel 3.10.7, stable as a rock.
> 
> Just booted Xen 4.3.1 with my custom 3.11 kernel, crashed as soon as I did
> a rm -rf on some old sources.
> Here's the console output:
> ------
> (XEN) ----[ Xen-4.3.1  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    4
> (XEN) RIP:    e008:[<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
This is a cpufreq thing from the looks of it.

cpufreq differences between native Linux and Xen could cause weird
memory corruption, manifesting as a variety of page faults, GPFs etc., I
guess.

Perhaps investigate disabling cpufreq stuff under Xen? I'm not sure how
one does this exactly, but Google turned up
http://wiki.xen.org/wiki/Xen_power_management and I saw some references
in http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

Ian.

> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 000000003b9d8704   rcx: 000000000000001d
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: ffff830834fd6380   rsp: ffff830834fffe30   r8:  00000012d91afd3e
> (XEN) r9:  ffff830834ff7128   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: ffff830977948860   r14: 8000000000000380
> (XEN) r15: 000000000000001d   cr0: 000000008005003b   cr4: 00000000000406f0
> (XEN) cr3: 00000000d7c5f000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff830834fffe30:
> (XEN)    0000000000000286 ffff82c4c02ea940 ffff82c4c0300980 0027ac4021424b00
> (XEN)    000000fb00000000 ffff831021424d00 ffff831021424d50 00000012d91b237a
> (XEN)    0000000000000004 0000000000000000 0000000000000000 ffff82c4c019bbf1
> (XEN)    00000000ffffffff ffff82c4c02c7800 0014e1920000200d 0000000000000000
> (XEN)    0000000000000000 00000000ffffffff ffff82c4c02c7800 ffff82c4c01245f4
> (XEN)    000000000000e008 ffff830834ff8000 ffff830834ff8000 0000000000000004
> (XEN)    0000000000000004 ffff82c4c01584ce 0000000000000000 0000000000000000
> (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000004 ffff8300d7afc000
> (XEN)    0000004374cd5a00 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
> (XEN)    [<ffff82c4c019bbf1>] acpi_processor_idle+0x201/0x550
> (XEN)    [<ffff82c4c01245f4>] __do_softirq+0x74/0xa0
> (XEN)    [<ffff82c4c01584ce>] idle_loop+0x1e/0x50
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 4:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
> ------
> 
> Suggestions anyone? :)
> 
> Regards,
> 
> Wouter.

Wouter de Geus

2013-Nov-06 10:20 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

* Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 09:41:39 +0000]:
> This is a cpufreq thing from the looks of it.
> 
> cpufreq differences between native Linux and Xen could cause weird
> memory corruption, manifesting as a variety of page faults, GPFs etc., I
> guess.
> 
> Perhaps investigate disabling cpufreq stuff under Xen? I'm not sure how
> one does this exactly, but Google turned up
> http://wiki.xen.org/wiki/Xen_power_management and I saw some references
> in http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> 
> Ian.
Thanks a lot for the insight!

I've booted my 3.12-Desman kernel under Xen with 'cpufreq=none' on the Xen
command line. So far so good (I'm running some kernel compiles to see if it's
stable; the system has been up for 20 minutes now).
If this turns out to be stable, I'll try again with cpufreq=dom0 to see if
that's also stable. I'll report my findings if you care.

If there's anything you guys want me to test, please let me know.

Thanks again!

Wouter.
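The A/B test described here (cpufreq=none, then cpufreq=dom0) comes down to editing the hypervisor's command line in the boot loader between runs. A minimal Python sketch, assuming a GRUB 2 `multiboot` line where everything after the hypervisor image path is Xen's command line, as in the grub.cfg line quoted later in this thread. `set_xen_option` is a hypothetical helper, not a GRUB or Xen tool, and it does not check that the value is one Xen accepts; the exact spelling of the `cpufreq` values should be checked against the xen-command-line document Ian linked.

```python
def set_xen_option(multiboot_line: str, key: str, value: str) -> str:
    """Return `multiboot_line` with `key=value` set on the Xen command line.

    Any existing occurrence of `key` (with or without a value) is removed and
    the new setting is appended. Runs of whitespace are collapsed to one space.
    """
    tokens = multiboot_line.split()
    if len(tokens) < 2 or tokens[0] != "multiboot":
        raise ValueError("expected: multiboot <xen image> [options...]")
    head, options = tokens[:2], tokens[2:]
    kept = [opt for opt in options if opt.partition("=")[0] != key]
    return " ".join(head + kept + [f"{key}={value}"])


line = "multiboot /boot/xen.gz dom0_mem=4096M cpufreq=xen"  # hypothetical line
print(set_xen_option(line, "cpufreq", "none"))
```

The design choice of appending rather than editing in place keeps the helper simple; Xen options are position-independent, so the order should not matter for a single setting like this.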

Ian Campbell

2013-Nov-06 10:51 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

On Wed, 2013-11-06 at 11:20 +0100, Wouter de Geus wrote:
> * Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 09:41:39 +0000]:
> 
> > This is a cpufreq thing from the looks of it.
> > 
> > cpufreq differences between native Linux and Xen could cause weird
> > memory corruption, manifesting as a variety of page faults, GPFs etc., I
> > guess.
> > 
> > Perhaps investigate disabling cpufreq stuff under Xen? I'm not sure how
> > one does this exactly, but Google turned up
> > http://wiki.xen.org/wiki/Xen_power_management and I saw some references
> > in http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> > 
> > Ian.
> 
> Thanks a lot for the insight!
> 
> I've booted my 3.12-Desman kernel under xen with 'cpufreq=none' on the xen
> commandline.  So far so good (trying some kernel compiles to see if it's
> stable, system has been up for 20 minutes now).
> If this turns out to be stable I'll try again with cpufreq=dom0 to see if
> that's also stable. I'll report my findings if you care.
Please do.

I suspect it shouldn't be necessary to use command lines to override
these things, but I've no idea how to diagnose this further.

Once you have the findings, if you could post a summary to xen-devel and
CC jbeulich@suse.com & insong.liu@intel.com (cpufreq/power mgmt
maintainers), perhaps they can advise.

Ian.

Wouter de Geus

2013-Nov-06 13:25 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

* Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 10:51:07 +0000]:
> > If this turns out to be stable I'll try again with cpufreq=dom0 to see if
> > that's also stable. I'll report my findings if you care.
> 
> Please do.
With cpufreq=none I've been able to run through a Windows 2008 installation
and some kernel compiles without problems. After that I rebooted with
cpufreq=dom0, and within 5 minutes ran into the first oops again:

[  428.105061] BUG: unable to handle kernel paging request at ffffea0000dd8a48
[  428.105103] IP: [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105115] PGD 1281d6067 PUD 1281d5067 PMD 1281ce067 PTE 801000097bf53068
[  428.105123] Oops: 0000 [#1] SMP
[  428.105127] Modules linked in:
[  428.105133] CPU: 3 PID: 1786 Comm: sh Not tainted 3.12.0-Desman #32
[  428.105138] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0      09/10/2012
[  428.105142] task: ffff88011dbb1590 ti: ffff8800d5088000 task.ti: ffff8800d5088000
[  428.105147] RIP: e030:[<ffffffff8115c126>]  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105154] RSP: e02b:ffff8800d5089d30  EFLAGS: 00010246
[  428.105157] RAX: 80000008002db165 RBX: ffff8800d2ad0d60 RCX: 0000000000dd8a40
[  428.105161] RDX: 80000008002db165 RSI: 0000000001fac000 RDI: 80000008002db165
[  428.105165] RBP: ffffea0000dd8a40 R08: ffff8800d2b52cf0 R09: 00000000fffffffa
[  428.105169] R10: 0000000000000a6f R11: 00000063ad0a7abc R12: 0000000001fe5000
[  428.105173] R13: ffffc00000000fff R14: 0000000001fac000 R15: ffff8800d5089e40
[  428.105181] FS:  00002b839c48c600(0000) GS:ffff880122a60000(0000) knlGS:0000000000000000
[  428.105186] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  428.105215] CR2: ffffea0000dd8a48 CR3: 00000000021de000 CR4: 0000000000040660
[  428.105220] Stack:
[  428.105222]  ffff8800d6961c00 0000000000000000 ffff8800d2b52cf0 0000000000000000
[  428.105229]  ffffea00034ab430 80000008002db165 ffff8800c331c078 0000000001fe5000
[  428.105236]  ffff880000000000 00003ffffffff000 ffff88011dbb1590 0000000001fe4fff
[  428.105242] Call Trace:
[  428.105248]  [<ffffffff8115d4c1>] ? unmap_vmas+0x41/0x90
[  428.105254]  [<ffffffff81165e1a>] ? exit_mmap+0x8a/0x150
[  428.105261]  [<ffffffff810abc19>] ? mmput+0x49/0x100
[  428.105267]  [<ffffffff810afb53>] ? do_exit+0x273/0xa30
[  428.105273]  [<ffffffff810dc045>] ? vtime_account_user+0x45/0x60
[  428.105278]  [<ffffffff810b10d4>] ? do_group_exit+0x34/0xa0
[  428.105284]  [<ffffffff810b114b>] ? SyS_exit_group+0xb/0x10
[  428.105290]  [<ffffffff81d4fd8f>] ? tracesys+0xe1/0xe6
[  428.105294] Code: 48 8b 3c 24 4c 89 f6 48 89 da 66 66 66 90 66 66 90 41 80 4f 18 01 48 85 ed 0f 84 7a ff ff ff 48 83 7c 24 18 00 0f 85 02 03 00 00 <f6> 45 08 01 0f 84 70 01 00 00 48 89 ef ff 8c 24 98 00 00 00 e8
[  428.105347] RIP  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105353]  RSP <ffff8800d5089d30>
[  428.105356] CR2: ffffea0000dd8a48
[  428.105360] ---[ end trace 81935aa1c6524ae3 ]---
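In this second oops the faulting address (CR2) is exactly RBP + 8, and the marked instruction `<f6> 45 08 01` decodes to `testb $0x1,0x8(%rbp)` (decoded by hand for this note). RBP points into the x86-64 `struct page` array (vmemmap). A short Python sketch of the arithmetic, assuming the 3.12 x86-64 vmemmap base of 0xffffea0000000000 and a 64-byte `struct page` (the usual configuration, not confirmed against Wouter's .config): it recovers the page frame number whose `struct page` was being read, shows that RCX holds the same offset, and checks that the leaf PTE printed for CR2 has its present bit clear, which is consistent with the oops error code 0000 (a read of a not-present page in kernel mode).

```python
VMEMMAP_START = 0xFFFFEA0000000000  # assumed x86-64 vmemmap base in 3.12
STRUCT_PAGE_SIZE = 64               # assumed sizeof(struct page)

# Values copied from the oops above.
RBP = 0xFFFFEA0000DD8A40
RCX = 0x0000000000DD8A40
CR2 = 0xFFFFEA0000DD8A48
LEAF_PTE = 0x801000097BF53068

offset = RBP - VMEMMAP_START        # byte offset into the struct page array
pfn = offset // STRUCT_PAGE_SIZE    # page frame number that struct describes

print(f"CR2 - RBP           = {CR2 - RBP:#x}")  # displacement of testb 0x8(%rbp)
print(f"RBP - VMEMMAP_START = {offset:#x} (RCX = {RCX:#x})")
print(f"page frame number   = {pfn:#x}")
print(f"PTE present bit     = {LEAF_PTE & 1}")
```

It prints a displacement of 0x8, an offset equal to RCX, a page frame number of 0x37629 and a present bit of 0. This only locates the failing access; it does not say why that part of the page array was not mapped.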
> I suspect it shouldn't be necessary to use command lines to override
> these things, but I've no idea how to diagnose this further.
Removing the entire cpufreq part from my dom0 kernel might help :)
But then again, if that's a problem, I would like the hypervisor to detect
and avoid it, if that's possible.
> Once you have the findings if you could post a summary to xen-devel and
> CC jbeulich@suse.com & insong.liu@intel.com (cpufreq/power mgmt
> maintainers) perhaps they can advise.
Summary:
--------
The issue: Xen 4.3.1 with my Linux 3.12 build (with cpufreq) panics (paging
requests, GPF, bad page state), usually within a few minutes.
When Xen is booted with cpufreq=none the problem seems to disappear; with
cpufreq=dom0 the problem is still there.
The machine I run this on is a dual Opteron 6212 with 64 GB ECC RAM on a
Supermicro H8DGi board.

Regards,

Wouter.
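The summary groups the failures into three recurring signatures (paging requests, GPFs, bad page state). When comparing boots with different `cpufreq=` settings, counting those signatures per saved log gives a quick side-by-side. A Python sketch, assuming the kernel and console output have already been saved as text; the patterns are only the messages quoted in this thread, not an exhaustive list of oops types, and matching is case-insensitive so the hypervisor's `(XEN) GENERAL PROTECTION FAULT` is counted as well.

```python
import re
from collections import Counter

# Crash headlines quoted in this thread; not an exhaustive list of oops types.
SIGNATURES = {
    "bad page state": re.compile(r"bad page state in process", re.IGNORECASE),
    "paging request": re.compile(r"unable to handle kernel paging request",
                                 re.IGNORECASE),
    "general protection fault": re.compile(r"general protection fault",
                                           re.IGNORECASE),
}


def tally_crashes(lines):
    """Count how many log lines match each crash signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, `tally_crashes(open("console-cpufreq-dom0.log"))` (a hypothetical file name) returns a `Counter` that can be compared with the one from a `cpufreq=none` boot.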

Jean-Paul Pozzi

2013-Nov-06 13:49 UTC


Re: Xen 4.3.1 / Linux 3.12 panic

Hello,

I currently use (in grub.cfg):

multiboot       /xen-4.2-amd64.gz placeholder  dom0_mem=6144M cpufreq=xen cpuidle vtd=1 iommu=1 loop.max_loop=64

with an AMD processor and it works without any kind of kernel bugs.
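JP's `multiboot` line mixes bare flags (`cpuidle`) with `key=value` options. To compare a working configuration with a failing one option by option (here JP's `cpufreq=xen` against Wouter's `cpufreq=none` or `cpufreq=dom0`), a Python sketch can split such a line into a dictionary and diff two of them. It only tokenizes on whitespace and `=`; it assumes, as in GRUB 2, that the first two fields are the `multiboot` keyword and the hypervisor image, and it does not know which tokens Xen recognizes (JP's `placeholder` token, for example, is simply reported as a flag). The function names are hypothetical.

```python
def parse_multiboot(line):
    """Split a GRUB 2 'multiboot' line into (image, {option: value}).

    Bare flags such as 'cpuidle' map to True; 'key=value' tokens map to the
    string after the first '='.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "multiboot":
        raise ValueError("expected: multiboot <xen image> [options...]")
    options = {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return tokens[1], options


def diff_options(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


jp_line = ("multiboot       /xen-4.2-amd64.gz placeholder  dom0_mem=6144M "
           "cpufreq=xen cpuidle vtd=1 iommu=1 loop.max_loop=64")
_, jp_opts = parse_multiboot(jp_line)
# Hypothetical variant differing only in the cpufreq setting.
_, test_opts = parse_multiboot(jp_line.replace("cpufreq=xen", "cpufreq=none"))
print(diff_options(jp_opts, test_opts))
```

With these two lines the diff contains only the `cpufreq` entry, which is the point of the comparison: one variable changed per boot.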

Regards

JP P

----- Original Message -----
From: "Wouter de Geus" <benv-xensource.com@junerules.com>
To: xen-users@lists.xen.org
Cc: "insong liu" <insong.liu@intel.com>, jbeulich@suse.com,
xen-devel@lists.xen.org
Sent: Wednesday, 6 November 2013 14:25:28
Subject: Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

* Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 10:51:07 +0000]:
> > If this turns out to be stable I'll try again with cpufreq=dom0 to
see if
> > that's also stable. I'll report my findings if you care.
> 
> Please do.
With cpufreq=none I've been able to run through a windows 2008 installation
and some kernel compiles without problems.  After that I rebooted with
cpufreq=dom0, and within 5 minutes ran into the first oops again:

[  428.105061] BUG: unable to handle kernel paging request at ffffea0000dd8a48
[  428.105103] IP: [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105115] PGD 1281d6067 PUD 1281d5067 PMD 1281ce067 PTE 801000097bf53068
[  428.105123] Oops: 0000 [#1] SMP 
[  428.105127] Modules linked in:
[  428.105133] CPU: 3 PID: 1786 Comm: sh Not tainted 3.12.0-Desman #32
[  428.105138] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
[  428.105142] task: ffff88011dbb1590 ti: ffff8800d5088000 task.ti: ffff8800d5088000
[  428.105147] RIP: e030:[<ffffffff8115c126>]  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105154] RSP: e02b:ffff8800d5089d30  EFLAGS: 00010246
[  428.105157] RAX: 80000008002db165 RBX: ffff8800d2ad0d60 RCX: 0000000000dd8a40
[  428.105161] RDX: 80000008002db165 RSI: 0000000001fac000 RDI: 80000008002db165
[  428.105165] RBP: ffffea0000dd8a40 R08: ffff8800d2b52cf0 R09: 00000000fffffffa
[  428.105169] R10: 0000000000000a6f R11: 00000063ad0a7abc R12: 0000000001fe5000
[  428.105173] R13: ffffc00000000fff R14: 0000000001fac000 R15: ffff8800d5089e40
[  428.105181] FS:  00002b839c48c600(0000) GS:ffff880122a60000(0000) knlGS:0000000000000000
[  428.105186] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  428.105215] CR2: ffffea0000dd8a48 CR3: 00000000021de000 CR4: 0000000000040660
[  428.105220] Stack:
[  428.105222]  ffff8800d6961c00 0000000000000000 ffff8800d2b52cf0 0000000000000000
[  428.105229]  ffffea00034ab430 80000008002db165 ffff8800c331c078 0000000001fe5000
[  428.105236]  ffff880000000000 00003ffffffff000 ffff88011dbb1590 0000000001fe4fff
[  428.105242] Call Trace:
[  428.105248]  [<ffffffff8115d4c1>] ? unmap_vmas+0x41/0x90
[  428.105254]  [<ffffffff81165e1a>] ? exit_mmap+0x8a/0x150
[  428.105261]  [<ffffffff810abc19>] ? mmput+0x49/0x100
[  428.105267]  [<ffffffff810afb53>] ? do_exit+0x273/0xa30
[  428.105273]  [<ffffffff810dc045>] ? vtime_account_user+0x45/0x60
[  428.105278]  [<ffffffff810b10d4>] ? do_group_exit+0x34/0xa0
[  428.105284]  [<ffffffff810b114b>] ? SyS_exit_group+0xb/0x10
[  428.105290]  [<ffffffff81d4fd8f>] ? tracesys+0xe1/0xe6
[  428.105294] Code: 48 8b 3c 24 4c 89 f6 48 89 da 66 66 66 90 66 66 90 41 80 4f 18 01 48 85 ed 0f 84 7a ff ff ff 48 83 7c 24 18 00 0f 85 02 03 00 00 <f6> 45 08 01 0f 84 70 01 00 00 48 89 ef ff 8c 24 98 00 00 00 e8
[  428.105347] RIP  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
[  428.105353]  RSP <ffff8800d5089d30>
[  428.105356] CR2: ffffea0000dd8a48
[  428.105360] ---[ end trace 81935aa1c6524ae3 ]---
> I suspect it shouldn't be necessary to use command lines to override
> these things, but I've no idea how to diagnose this further.
Removing the entire cpufreq part from my dom0 kernel might help :)
But then again, if that's a problem I would like the hypervisor to detect
and avoid this problem if that's possible.
> Once you have the findings if you could post a summary to xen-devel and
> CC jbeulich@suse.com & insong.liu@intel.com (cpufreq/power mgmt
> maintainers) perhaps they can advise.
Summary:
--------
The issue: Xen 4.3.1 and my Linux 3.12 build (with cpufreq) panics (page
requests, GPF, bad page state) usually within a few minutes.
When Xen is booted with cpufreq=none the problem seems to disappear, with
cpufreq=dom0 the problem is still there.
The machine I run this on is a dual opteron 6212 with 64GB ECC RAM on a
Supermicro H8DGi board.

Regards,

Wouter.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users

Wouter de Geus

2013-Nov-06 14:02 UTC

Re: Xen 4.3.1 / Linux 3.12 panic

* Jean-Paul Pozzi <jpp@jppozzi.dyndns.org> [2013-11-06 14:49:52 +0100]:
> Hello,
Hello,
> I use currently (in grub.cfg) :
> multiboot       /xen-4.2-amd64.gz placeholder  dom0_mem=6144M cpufreq=xen cpuidle vtd=1 iommu=1 loop.max_loop=64
> with an AMD processor and it works without any kind of kernel bugs.
Yeah, well, I have 4 more machines in the datacenter running Xen 4.1 and
Xen 4.2 with AMD processors, even one that has an Opteron processor (but
not the same model).  However, this new machine I have has the problem I
mentioned before...

New hardware, new problems :)
And I would blame this on a hardware problem, were it not that the kernel
without Xen works flawlessly (and memtest shows no errors either).

Regards,

Wouter.
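Both Jean-Paul's grub.cfg line and the cpufreq=none workaround come down to editing the options on the Xen `multiboot` line. A minimal Python sketch of that edit, assuming the options are whitespace-separated `key=value` tokens after the hypervisor image, as in the line above; `set_xen_option` is a hypothetical helper, not part of GRUB or Xen:

```python
def set_xen_option(multiboot_line: str, key: str, value: str) -> str:
    """Return a copy of a GRUB 'multiboot' line with key=value set.

    Hypothetical helper. Assumes whitespace-separated tokens; an existing
    key=... token is replaced in place, otherwise key=value is appended.
    Runs of whitespace are collapsed to single spaces in the result.
    """
    tokens = multiboot_line.split()
    new_token = f"{key}={value}"
    for i, tok in enumerate(tokens):
        name, sep, _ = tok.partition("=")
        # Match only exact "key=..." tokens, not keys that merely share a
        # prefix (e.g. "cpufreq" must not match "cpufreq_x=1").
        if sep and name == key:
            tokens[i] = new_token
            return " ".join(tokens)
    tokens.append(new_token)
    return " ".join(tokens)


line = ("multiboot /xen-4.2-amd64.gz placeholder dom0_mem=6144M "
        "cpufreq=xen cpuidle vtd=1 iommu=1 loop.max_loop=64")
print(set_xen_option(line, "cpufreq", "none"))
```

With Jean-Paul's line this swaps `cpufreq=xen` for `cpufreq=none` and leaves every other token where it was; on a line without any `cpufreq=` token the option is appended at the end.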

Konrad Rzeszutek Wilk

2013-Nov-06 20:44 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

On Wed, Nov 06, 2013 at 02:25:28PM +0100, Wouter de Geus wrote:
> * Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 10:51:07 +0000]:
> 
> > > If this turns out to be stable I'll try again with cpufreq=dom0 to see if
> > > that's also stable. I'll report my findings if you care.
> > 
> > Please do.
> 
> With cpufreq=none I've been able to run through a windows 2008 installation
> and some kernel compiles without problems.  After that I rebooted with
> cpufreq=dom0, and within 5 minutes ran into the first oops again:
Is there a particular reason you had tried 'cpufreq'? Sorry if that
was answered earlier?
> 
> [  428.105061] BUG: unable to handle kernel paging request at ffffea0000dd8a48
> [  428.105103] IP: [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
> [  428.105115] PGD 1281d6067 PUD 1281d5067 PMD 1281ce067 PTE 801000097bf53068
> [  428.105123] Oops: 0000 [#1] SMP 
> [  428.105127] Modules linked in:
> [  428.105133] CPU: 3 PID: 1786 Comm: sh Not tainted 3.12.0-Desman #32
> [  428.105138] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
> [  428.105142] task: ffff88011dbb1590 ti: ffff8800d5088000 task.ti: ffff8800d5088000
> [  428.105147] RIP: e030:[<ffffffff8115c126>]  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
> [  428.105154] RSP: e02b:ffff8800d5089d30  EFLAGS: 00010246
> [  428.105157] RAX: 80000008002db165 RBX: ffff8800d2ad0d60 RCX: 0000000000dd8a40
> [  428.105161] RDX: 80000008002db165 RSI: 0000000001fac000 RDI: 80000008002db165
> [  428.105165] RBP: ffffea0000dd8a40 R08: ffff8800d2b52cf0 R09: 00000000fffffffa
> [  428.105169] R10: 0000000000000a6f R11: 00000063ad0a7abc R12: 0000000001fe5000
> [  428.105173] R13: ffffc00000000fff R14: 0000000001fac000 R15: ffff8800d5089e40
> [  428.105181] FS:  00002b839c48c600(0000) GS:ffff880122a60000(0000) knlGS:0000000000000000
> [  428.105186] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  428.105215] CR2: ffffea0000dd8a48 CR3: 00000000021de000 CR4: 0000000000040660
> [  428.105220] Stack:
> [  428.105222]  ffff8800d6961c00 0000000000000000 ffff8800d2b52cf0 0000000000000000
> [  428.105229]  ffffea00034ab430 80000008002db165 ffff8800c331c078 0000000001fe5000
> [  428.105236]  ffff880000000000 00003ffffffff000 ffff88011dbb1590 0000000001fe4fff
> [  428.105242] Call Trace:
> [  428.105248]  [<ffffffff8115d4c1>] ? unmap_vmas+0x41/0x90
> [  428.105254]  [<ffffffff81165e1a>] ? exit_mmap+0x8a/0x150
> [  428.105261]  [<ffffffff810abc19>] ? mmput+0x49/0x100
> [  428.105267]  [<ffffffff810afb53>] ? do_exit+0x273/0xa30
> [  428.105273]  [<ffffffff810dc045>] ? vtime_account_user+0x45/0x60
> [  428.105278]  [<ffffffff810b10d4>] ? do_group_exit+0x34/0xa0
> [  428.105284]  [<ffffffff810b114b>] ? SyS_exit_group+0xb/0x10
> [  428.105290]  [<ffffffff81d4fd8f>] ? tracesys+0xe1/0xe6
> [  428.105294] Code: 48 8b 3c 24 4c 89 f6 48 89 da 66 66 66 90 66 66 90 41 80 4f 18 01 48 85 ed 0f 84 7a ff ff ff 48 83 7c 24 18 00 0f 85 02 03 00 00 <f6> 45 08 01 0f 84 70 01 00 00 48 89 ef ff 8c 24 98 00 00 00 e8
> [  428.105347] RIP  [<ffffffff8115c126>] unmap_single_vma+0x426/0x820
> [  428.105353]  RSP <ffff8800d5089d30>
> [  428.105356] CR2: ffffea0000dd8a48
> [  428.105360] ---[ end trace 81935aa1c6524ae3 ]---
> 
> > I suspect it shouldn't be necessary to use command lines to override
> > these things, but I've no idea how to diagnose this further.
> 
> Removing the entire cpufreq part from my dom0 kernel might help :)
> But then again, if that's a problem I would like the hypervisor to detect
> and avoid this problem if that's possible.
So cpufreq=dom0 is kind of a no-op, as the Linux kernel will disable
the native CPUfreq machinery. This is done b/c it does not make sense
for Linux dom0 to control the CPU freq when it has no idea of the
workloads (the hypervisor has it).

But with 'cpufreq=dom0' you are getting faults.

So the other question is - does anything happen if you disable ACPI power
states in the BIOS?
> 
> > Once you have the findings if you could post a summary to xen-devel and
> > CC jbeulich@suse.com & insong.liu@intel.com (cpufreq/power mgmt
> > maintainers) perhaps they can advise.
> 
> Summary:
> --------
> The issue: Xen 4.3.1 and my Linux 3.12 build (with cpufreq) panics (page
> requests, GPF, bad page state) usually within a few minutes.
> When Xen is booted with cpufreq=none the problem seems to disappear, with
> cpufreq=dom0 the problem is still there.
> The machine I run this on is a dual opteron 6212 with 64GB ECC RAM on a
> Supermicro H8DGi board.
> 
> Regards,
> 
> Wouter.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
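Konrad's point is that when Xen owns frequency scaling, the dom0 kernel should not be driving P-states itself. One way to sanity-check that from inside dom0 is to look at sysfs. A minimal sketch, assuming the kernel exposes `/sys/hypervisor/type` (which reads `xen` when that interface is built in) and the usual per-CPU `cpufreq/scaling_driver` files; `dom0_cpufreq_status` is a hypothetical helper name:

```python
from pathlib import Path


def dom0_cpufreq_status(sysfs_root: str = "/sys") -> dict:
    """Report the hypervisor type and any native cpufreq drivers in use.

    Hypothetical helper. Assumes /sys/hypervisor/type exists when the kernel
    exposes it, and that a bound cpufreq driver shows up per CPU as
    devices/system/cpu/cpuN/cpufreq/scaling_driver.
    """
    root = Path(sysfs_root)
    hv_file = root / "hypervisor" / "type"
    hypervisor = hv_file.read_text().strip() if hv_file.is_file() else None

    cpu_dir = root / "devices" / "system" / "cpu"
    drivers = set()
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the cpufreq/ or cpuidle/ dirs.
    for drv_file in cpu_dir.glob("cpu[0-9]*/cpufreq/scaling_driver"):
        drivers.add(drv_file.read_text().strip())
    return {"hypervisor": hypervisor, "cpufreq_drivers": sorted(drivers)}
```

Calling `dom0_cpufreq_status()` on the real `/sys` returns something like `{"hypervisor": "xen", "cpufreq_drivers": [...]}`; if Konrad's description holds for a given setup, a dom0 running under Xen would be expected to show an empty driver list, while the same kernel booted natively would list something like `acpi-cpufreq`.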

Konrad Rzeszutek Wilk

2013-Nov-06 20:59 UTC

Re: Xen 4.3.1 / Linux 3.12 panic

On Wed, Nov 06, 2013 at 09:41:39AM +0000, Ian Campbell wrote:
> (CCing Linux guys, not because this involves Linux but because I CCed
> them on the previous mail)
> 
> On Wed, 2013-11-06 at 10:12 +0100, Wouter de Geus wrote:
> > I've been experimenting some more.
> > Last 24 hours I've been constantly compiling (in a while loop) using my
> > (non-Xen) stock slackware kernel 3.10.7, stable as a rock.
> > 
> > Just booted Xen 4.3.1 with my custom 3.11 kernel, crashed as soon as I
> > did a rm -rf on some old sources.
> > Here's the console output:
> > ------
> > (XEN) ----[ Xen-4.3.1  x86_64  debug=n  Not tainted ]----
> > (XEN) CPU:    4
> > (XEN) RIP:    e008:[<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
> 
> This is a cpufreq thing from the looks of it.
> 
> cpufreq differences between native Linux and Xen could cause weird
> memory corruption, manifesting as a variety of page faults, GPFs etc, I
> guess.
> 
> Perhaps investigate disabling cpufreq stuff under Xen? I'm not sure how
> one does this exactly but google threw up
> http://wiki.xen.org/wiki/Xen_power_management and I saw some references
> in http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> 
> Ian.
> 
> 
> > (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: 000000003b9d8704   rcx: 000000000000001d
> > (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> > (XEN) rbp: ffff830834fd6380   rsp: ffff830834fffe30   r8:  00000012d91afd3e
> > (XEN) r9:  ffff830834ff7128   r10: 0000000000000000   r11: 0000000000000000
> > (XEN) r12: 0000000000000000   r13: ffff830977948860   r14: 8000000000000380
> > (XEN) r15: 000000000000001d   cr0: 000000008005003b   cr4: 00000000000406f0
> > (XEN) cr3: 00000000d7c5f000   cr2: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff830834fffe30:
> > (XEN)    0000000000000286 ffff82c4c02ea940 ffff82c4c0300980 0027ac4021424b00
> > (XEN)    000000fb00000000 ffff831021424d00 ffff831021424d50 00000012d91b237a
> > (XEN)    0000000000000004 0000000000000000 0000000000000000 ffff82c4c019bbf1
> > (XEN)    00000000ffffffff ffff82c4c02c7800 0014e1920000200d 0000000000000000
> > (XEN)    0000000000000000 00000000ffffffff ffff82c4c02c7800 ffff82c4c01245f4
> > (XEN)    000000000000e008 ffff830834ff8000 ffff830834ff8000 0000000000000004
> > (XEN)    0000000000000004 ffff82c4c01584ce 0000000000000000 0000000000000000
> > (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000004 ffff8300d7afc000
> > (XEN)    0000004374cd5a00 0000000000000000
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4c013f47c>] do_dbs_timer+0x11c/0x240
> > (XEN)    [<ffff82c4c019bbf1>] acpi_processor_idle+0x201/0x550
> > (XEN)    [<ffff82c4c01245f4>] __do_softirq+0x74/0xa0
> > (XEN)    [<ffff82c4c01584ce>] idle_loop+0x1e/0x50
That is just impressive. I see a bunch of computations that it might be doing.

But I can't reproduce it with Xen 4.4 on an AMD box.

Could you pass in the full serial log? I am curious what your config
options are. And when does it happen? Is there a specific workload
you are doing?

Thanks!

> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 4:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> > ------
> > 
> > Suggestions anyone? :)
> > 
> > Regards,
> > 
> > Wouter.

Wouter de Geus

2013-Nov-07 11:20 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

* Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [2013-11-06 15:44:58 -0500]:
> Is there a particular reason you had tried 'cpufreq'? Sorry if that
> was answered earlier?
Ian suggested that the problem might be cpufreq-related (see
http://lists.xenproject.org/archives/html/xen-users/2013-11/msg00053.html).
> So cpufreq=dom0 is kind of a no-op, as the Linux kernel will disable
> the native CPUfreq machinery. This is done b/c it does not make sense
> for Linux dom0 to control the CPU freq when it has no idea of the
> workloads (the hypervisor has it).
Aha. Not exactly what I understood from the Xen documentation
(http://wiki.xen.org/wiki/Xen_power_management), but I was just testing it
to see if it would be stable.
> But with 'cpufreq=dom0' you are getting faults.
With both cpufreq=dom0 and no cpufreq option at all (which, according to the
docs, defaults to cpufreq=xen), the system crashes within the hour.
With cpufreq=none the system has now been stable for over a day without any
kernel warnings etc whatsoever (and I tried compiling a kernel for some load).
> So the other question is - does anything happen if you disable ACPI power
> states in the BIOS?
Let's try ;)
The BIOS has the following options that I consider relevant:
  Name         [Current] (Options)
- PowerNow     [Enabled]
- C State Mode [C6] (Disabled, C6)
- PowerCap     [P-state 0] (P-state 0 through 4)
- HPC Mode     [Enabled] (Disabled, Enabled)
- CPB Mode     [Auto] (Disabled, Auto)
- C1E Support  [Enabled] (Enabled, Disabled)

When I disable PowerNow, the C State Mode, PowerCap and HPC options also
disappear.
After booting with PowerNow disabled (without the cpufreq option), I ran a
kernel compile twice plus some heavy I/O, and the system stayed stable.
So that seems to have the same effect as cpufreq=none.

- With PowerNow enabled (C State Mode and HPC / CPB Mode disabled, no cpufreq
  cmdline), the system also seems stable.
- C State Mode set to C6 (HPC/CPB disabled, no cpufreq cmdline), seems stable.
- HPC enabled (CPB disabled, no cpufreq cmdline), crashed (serial console log
  attached as xen-crash-hpc-enabled).

These tests are inconclusive, since I can't reliably trigger a panic, but it
usually happened within a few minutes when compiling a kernel or under other
load. Mind you, the system would crash even without load: just 'idling', it
managed to crash as well, though that often took longer.

Anyhow, please let me know if there's anything else I can test for you
guys.
And thanks for the help :)

Regards,

Wouter.
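Since the crashes track which cpufreq mode Xen actually booted with, it can help to confirm the setting from dom0 rather than from memory. `xl info` reports the hypervisor's boot options on its `xen_commandline` line; the sketch below pulls the `cpufreq=` value out of that text and falls back to `xen`, the default Wouter cites from the Xen command-line docs. The function name is hypothetical, and the "last occurrence wins" rule is an assumption to check against the Xen version in use:

```python
def cpufreq_setting(xl_info_output: str) -> str:
    """Return the cpufreq= value from the xen_commandline line of `xl info`.

    Hypothetical helper. Returns "xen" (the documented default cited in this
    thread) when no cpufreq= option is present. If the option appears more
    than once, the last occurrence is returned, on the assumption that later
    options override earlier ones.
    """
    setting = "xen"
    for line in xl_info_output.splitlines():
        # partition() splits on the first ':' only, so values that themselves
        # contain ':' (e.g. "cpufreq=xen:performance") survive intact.
        field, sep, value = line.partition(":")
        if sep and field.strip() == "xen_commandline":
            for token in value.split():
                if token.startswith("cpufreq="):
                    setting = token.split("=", 1)[1]
    return setting
```

In dom0 the input could come from `subprocess.run(["xl", "info"], capture_output=True, text=True, check=True).stdout`; on this machine the interesting contrast is `none` (stable) versus `xen` or `dom0` (crashing).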


Ian Campbell

2013-Nov-07 11:26 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

On Wed, 2013-11-06 at 15:44 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 06, 2013 at 02:25:28PM +0100, Wouter de Geus wrote:
> > * Ian Campbell <Ian.Campbell@citrix.com> [2013-11-06 10:51:07 +0000]:
> > 
> > > > If this turns out to be stable I'll try again with cpufreq=dom0 to see if
> > > > that's also stable. I'll report my findings if you care.
> > > 
> > > Please do.
> > 
> > With cpufreq=none I've been able to run through a windows 2008 installation
> > and some kernel compiles without problems.  After that I rebooted with
> > cpufreq=dom0, and within 5 minutes ran into the first oops again:
> 
> Is there a particular reason you had tried 'cpufreq'? Sorry if that
> was answered earlier?
I suggested it in <1383730899.26213.16.camel@kazak.uk.xensource.com> (on
xen-users only, you were CCd though) because one of the crashes was on
the hypervisor side and involved do_dbs_timer which looked (from the
file comments) to be cpufreq related.

Ian.

Jan Beulich

2013-Nov-07 11:57 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

>>> On 07.11.13 at 12:20, Wouter de Geus <benv-xensource.com@junerules.com> wrote:
> The BIOS has the following options that I consider relevant:
>   Name         [Current] (Options)
> - PowerNow     [Enabled]
> - C State Mode [C6] (Disabled, C6)
> - PowerCap     [P-state 0] (P-state 0 through 4)
> - HPC Mode     [Enabled] (Disabled, Enabled)
> - CPB Mode     [Auto] (Disabled, Auto)
> - C1E Support  [Enabled] (Enabled, Disabled)
> 
> When I set PowerNow to disabled the C State Mode, PowerCap and HPC options 
> also disappear.
> After booting with PowerNow disabled (without the cpufreq option) I tried a
> kernel compile
> twice and some heavy I/O under which the system was stable.
> So that seems to have the same effect as cpufreq=none.
> 
> - With PowerNow enabled (C State Mode and HPC / CPB Mode disabled ,no cpufreq
> cmdline)
> the system also seems stable.
> the system also seems stable.
> - C State Mode set to C6 (HPC/CPB disabled, no cpufreq cmdline), seems 
> stable.
> - HPC enabled, (CPB disabled, no cpufreq cmdline), crashed (serial console 
> log attached
> as xen-crash-hpc-enabled).
Now we'd need to know what HPC actually means (it means nothing
to me in this context) - I'd have expected the PowerCap (as referring
to P-states) to be the interesting one.

In any event - with cpufreq=dom0 and no cpufreq drivers loaded
in dom0 (which as Konrad says should be the default), there
shouldn't be any P-state management.

But you being able to suppress the problem with cpufreq=none
also suggests that quite likely there's either a problem with the
silicon, or the PowerNow driver in Xen went sufficiently much out
of date wrt newer CPUs that it's not usable anymore (it certainly
hasn't been touched in meaningful ways for quite a while). You
may have said so before, but can you confirm that under native
Linux with acpi-cpufreq (or the powernow driver) loaded, you
don't have this kind of problem? If so, could you please provide
contents of the respective sysfs nodes?

Jan

Wouter de Geus

2013-Nov-07 13:10 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

* Jan Beulich <JBeulich@suse.com> [2013-11-07 11:57:17 +0000]:
> > - PowerCap     [P-state 0] (P-state 0 through 4)
> 
> Now we'd need to know what HPC actually means (it means nothing
> to me in this context) - I'd have expected the PowerCap (as referring
> to P-states) to be the interesting one.
Would you like me to test the PowerCap setting? If so, in combination with
the other settings set to what? Note that the PowerCap setting can't be
disabled by itself (unless P-state 4 counts as disabled?).

According to a faq on supermicro.com (this is a Supermicro board after all)
http://www.supermicro.com/Aplus/support/faqs/faq.cfm?faq=13400
---
Q: I noticed that the newer BIOS supporting AMD 6200 series CPUs have a
   P-state HPC Mode option. Can you provide some info on this mode?
A: HPC mode only keeps maximum and minimum states. In system idle mode CPU
   will stay at P4 state for power saving. Once CPU detects higher
   activities, CPU will jump up to P0 or boost state to reduce clock ramp
   up latency.
---
> In any event - with cpufreq=dom0 and no cpufreq drivers loaded
> in dom0 (which as Konrad says should be the default), there
> shouldn't be any P-state management.
I thought cpufreq=xen was the default - at least according to
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> But you being able to suppress the problem with cpufreq=none
> also suggests that quite likely there's either a problem with the
> silicon, or the PowerNow driver in Xen went sufficiently much out
> of date wrt newer CPUs that it's not usable anymore (it certainly
> hasn't been touched in meaningful ways for quite a while). You
> may have said so before, but can you confirm that under native
> Linux with acpi-cpufreq (or the powernow driver) loaded, you
> don't have this kind of problem? If so, could you please provide
> contents of the respective sysfs nodes?
I started tinkering on this new machine with Slackware's Linux 3.10.17
kernel and had no problems whatsoever. The problems only started after
booting Xen with my new custom 3.12 kernel.

I just booted the machine with the 3.12-dom0 kernel without Xen, and then
rebooted again, since I guess you're interested in the contents with HPC Mode
enabled ;)
I've attached the output of dmesg to this email.

Not sure which sysfs nodes you're interested in, though. There's:
/sys/module/acpi_cpufreq/parameters/acpi_pstate_strict (contents: 0)

Then per CPU we have /sys/devices/system/cpu/cpu0/cpufreq
with (for CPU 0):
affected_cpus -> 0
bios_limit -> 2600000
cpb -> 1
cpuinfo_cur_freq -> 2600000
cpuinfo_max_freq -> 2600000
cpuinfo_min_freq -> 1400000
cpuinfo_transition_latency -> 5000
freqdomain_cpus -> 0 1
related_cpus -> 0
scaling_available_frequencies -> 2600000 1400000
scaling_available_governors -> conservative ondemand userspace powersave performance
scaling_cur_freq -> 2600000
scaling_driver -> acpi-cpufreq
scaling_governor -> performance
scaling_max_freq -> 2600000
scaling_min_freq -> 1400000
scaling_setspeed -> <unsupported>

If there's any other entry you would like to see, please let me know :)
Meanwhile the machine is still stable after going through several kernel
compilations and some heavy I/O (just for testing).

Regards,

Wouter.
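Jan asked for the cpufreq sysfs nodes, and they were collected above by hand. A small Python sketch that gathers every readable attribute under one CPU's `cpufreq` directory, plus a helper that converts the kHz values sysfs reports into MHz (so `2600000 1400000` becomes 2600 and 1400 MHz). Both function names are hypothetical; unreadable entries, such as files restricted to root, are simply skipped:

```python
from pathlib import Path


def read_cpufreq_attrs(cpu_dir: str) -> dict:
    """Collect readable attributes under <cpu_dir>/cpufreq into a dict.

    cpu_dir is e.g. "/sys/devices/system/cpu/cpu0". Subdirectories (such as
    stats/) and files that cannot be read are skipped.
    """
    attrs = {}
    for entry in sorted((Path(cpu_dir) / "cpufreq").iterdir()):
        if not entry.is_file():
            continue
        try:
            attrs[entry.name] = entry.read_text().strip()
        except OSError:  # e.g. PermissionError on root-only attributes
            continue
    return attrs


def khz_to_mhz(value: str) -> list:
    """Convert a space-separated list of kHz values, as sysfs prints them, to MHz."""
    return [int(v) // 1000 for v in value.split()]
```

Looping `read_cpufreq_attrs` over each `/sys/devices/system/cpu/cpuN` directory gives a per-CPU dump that can be pasted into a reply or diffed between the native and the Xen boots.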


Jan Beulich

2013-Nov-07 13:15 UTC

Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

>>> On 07.11.13 at 14:10, Wouter de Geus <benv-xensource.com@junerules.com> wrote:
> * Jan Beulich <JBeulich@suse.com> [2013-11-07 11:57:17 +0000]:
> 
>> > - PowerCap     [P-state 0] (P-state 0 through 4)
>> 
>> Now we'd need to know what HPC actually means (it means nothing
>> to me in this context) - I'd have expected the PowerCap (as referring
>> to P-states) to be the interesting one.
> 
> Would you like me to test the PowerCap setting? If so, in combination with
> the other settings set to what? Note that the PowerCap setting can't be
> disabled by itself. (unless P-state 4 counts as disabled?)
> 
> According to a faq on supermicro.com (this is a Supermicro board after all)
> http://www.supermicro.com/Aplus/support/faqs/faq.cfm?faq=13400 
> ---
> Q: I noticed that the newer BIOS supporting AMD 6200 series CPUs have a
>    P-state HPC Mode option. Can you provide some info on this mode?
> A: HPC mode only keeps maximum and minimum states. In system idle mode CPU
>    will stay at P4 state for power saving. Once CPU detects higher
>    activities, CPU will jump up to P0 or boost state to reduce clock ramp
>    up latency.

That suggests that P4 is the lowest power state (and the only low
power one in HPC mode), i.e. not meaning disabled. P0 alone would
then mean disabled afaict. And in non-HPC mode I would conclude
intermediate states are also allowed, in which case limiting the number
of states might be interesting for you to try out.

Jan

Wouter de Geus

2013-Nov-07 13:46 UTC


Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic

* Jan Beulich <JBeulich@suse.com> [2013-11-07 13:15:26 +0000]:
> That suggests that P4 is the lowest power state (and the only low
> power one in HPC mode), i.e. not meaning disabled. P0 alone would
> then mean disabled afaict. And in non-HPC mode I would conclude
> intermediate states are also allowed, in which case limiting the number
> of states might be interesting for you to try out.
Well, non-HPC mode works for me, and I don't really see the advantage
of HPC mode anyway, so as far as I'm concerned I'll leave it off.

So if there's any mode you want me to try, please be specific about what
to test and what to report, and I'll try it out :)

Regards,

Wouter.
