Chris Webb

2014-May-29 18:03 UTC

Divide error in kvm_unlock_kick()

Paolo Bonzini <pbonzini at redhat.com> wrote:
> Il 29/05/2014 19:45, Chris Webb ha scritto:
>> Chris Webb <chris at arachsys.com> wrote:
>> 
>>> My CPU flags inside the crashing guest look like this:
>>> 
>>> fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
>>> mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl
>>> extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave
>>> avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse
>>> 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1
>>> 
>>> whereas in a (working) -cpu qemu64 guest, they look like this:
>>> 
>>> fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
>>> fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm
>>> cmp_legacy svm abm sse4a
>> 
>> I thought I'd try to bisect on processor flags to see which was/were
>> implicated.
> 
> Can you dump the full /proc/cpuinfo?

On the host, it looks like this:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron(tm) Processor 6328
stepping	: 0
microcode	: 0x600081c
cpu MHz		: 3200.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 32
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm
cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat
cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold bmi1
bogomips	: 6399.89
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

[ x8 for processor 0 -> 7; full dump at
http://cdw.me.uk/tmp/host-cpuinfo.txt ]

and on the guest it looks like:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron(tm) Processor 6328
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3199.946
cache size	: 2048 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl
extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave
avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1
bogomips	: 6399.89
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

[ x4 for processor 0 -> 3; full dump at
http://cdw.me.uk/tmp/guest-cpuinfo.txt ]
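
As a starting point for bisecting on processor flags, the two guest flag lists quoted in this thread can be compared mechanically. The following is only a minimal sketch: the flag strings are copied from the crashing guest and the working -cpu qemu64 guest above, and the names `CRASHING_GUEST`, `QEMU64_GUEST` and `parse_flags` are made up for illustration. It lists candidates to test; it does not say which flag, if any, is responsible.

```python
# Flags copied from the two guest listings quoted in this thread.
CRASHING_GUEST = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good
nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt
aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust
bmi1
"""

QEMU64_GUEST = """
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor
lahf_lm cmp_legacy svm abm sse4a
"""


def parse_flags(text):
    """Split a whitespace-separated flags string into a set of flag names."""
    return set(text.split())


crashing_flags = parse_flags(CRASHING_GUEST)
qemu64_flags = parse_flags(QEMU64_GUEST)

# Flags present only in the crashing guest are the candidates to bisect over.
extra = sorted(crashing_flags - qemu64_flags)
# Flags present only in the working guest (expected to be empty here).
missing = sorted(qemu64_flags - crashing_flags)

print(len(extra), "flags only in the crashing guest:")
print(" ".join(extra))
print("flags only in the qemu64 guest:", " ".join(missing) or "(none)")
```

Note that the names shown in /proc/cpuinfo are the kernel's names and will not necessarily match the feature names the emulator accepts, so each candidate may need translating before it can be toggled.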

Many thanks in advance for any pointers.

Best wishes,

Chris.

Chris Webb

2014-Jun-01 12:36 UTC

Divide error in kvm_unlock_kick()

I realised my original bug report was for a guest kernel compiled without
frame pointers, which might be unhelpful, so I enabled CONFIG_DEBUG_INFO and
CONFIG_FRAME_POINTER, but I don't think this has made the backtrace any more
detailed.

Is there anything more I can do to pinpoint what might be going on here?

Cheers,

Chris.


divide error: 0000 [#1] PREEMPT SMP 
Modules linked in:
CPU: 1 PID: 1013 Comm: mkdir Not tainted 3.14.4-guest #21
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
task: ffff88007c8cf400 ti: ffff88007c7c6000 task.ti: ffff88007c7c6000
RIP: 0010:[<ffffffff8102ea86>]  [<ffffffff8102ea86>] kvm_unlock_kick+0x69/0x73
RSP: 0000:ffff88007fc83ca8  EFLAGS: 00010046
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000002 RSI: ffff88007fd11d40 RDI: ffffffff8198f840
RBP: ffff88007fc83cc0 R08: 0000000000000000 R09: ffffffff8198f840
R10: 000000000000b5e0 R11: 0000000000000005 R12: ffff88007fd11d40
R13: 000000000000cec0 R14: ffff88007d382b80 R15: 0000000000000002
FS:  00007f4c6e265700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4c6dc9a080 CR3: 000000007c62e000 CR4: 00000000000406e0
Stack:
 0000000000011d40 ffff88007fd11d40 0000000000000002 ffff88007fc83cd0
 ffffffff815852d0 ffff88007fc83d20 ffffffff810dd694 ffff88007fd00000
 0000000000000046 ffff88007d383172 ffff88007d3abe68 0000000000000003
Call Trace:
 <IRQ> 
 [<ffffffff815852d0>] _raw_spin_unlock+0x36/0x5b
 [<ffffffff810dd694>] try_to_wake_up+0x1f4/0x217
 [<ffffffff810dd6f6>] default_wake_function+0xd/0xf
 [<ffffffff810e99f0>] autoremove_wake_function+0xd/0x2f
 [<ffffffff810e944f>] __wake_up_common+0x50/0x7c
 [<ffffffff810e962f>] __wake_up+0x34/0x46
 [<ffffffff810f3b45>] rsp_wakeup+0x1c/0x1e
 [<ffffffff81112e31>] irq_work_run+0x77/0x9b
 [<ffffffff810063e2>] smp_irq_work_interrupt+0x2a/0x31
 [<ffffffff8158739d>] irq_work_interrupt+0x6d/0x80
 [<ffffffff81585336>] ? _raw_spin_unlock_irqrestore+0x41/0x6a
 [<ffffffff810f5402>] rcu_process_callbacks+0x162/0x486
 [<ffffffff810c4140>] ? run_timer_softirq+0x19f/0x1c0
 [<ffffffff810be612>] __do_softirq+0xe1/0x1e9
 [<ffffffff810be8b7>] irq_exit+0x40/0x87
 [<ffffffff810283f1>] smp_apic_timer_interrupt+0x3f/0x4b
 [<ffffffff81586e9d>] apic_timer_interrupt+0x6d/0x80
 <EOI> 
Code: c5 40 50 87 81 49 8d 44 0d 00 48 8b 30 4c 39 e6 75 c9 8a 40 08 38 d8 75 c2
48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 5b 41 5c
41 5d 5d c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a
RIP  [<ffffffff8102ea86>] kvm_unlock_kick+0x69/0x73
 RSP <ffff88007fc83ca8>
---[ end trace ed563ea2dedc59b5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffff9fffffff)
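
One more thing that can be read out of the oops is the faulting instruction itself: in the Code: line, the byte wrapped in angle brackets is the one at RIP. The sketch below is an illustration only. The Code: bytes are copied from the trace above and joined into a single string, `split_at_fault` is a made-up helper, and taking three bytes from RIP and five bytes before it is an assumption about the instruction lengths rather than the result of a real disassembler.

```python
# "Code:" line from the 3.14.4 oops above, joined into a single string.
CODE_LINE = (
    "Code: c5 40 50 87 81 49 8d 44 0d 00 48 8b 30 4c 39 e6 75 c9 8a 40 08 "
    "38 d8 75 c2 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 "
    "<0f> 01 c1 5b 41 5c 41 5d 5d c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a"
)


def split_at_fault(code_line):
    """Return (bytes before RIP, bytes from RIP onward) as lists of ints."""
    tokens = code_line.split()[1:]  # drop the leading "Code:" label
    # The kernel marks the byte at RIP by wrapping it in <...>.
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    as_ints = [int(tok.strip("<>"), 16) for tok in tokens]
    return as_ints[:fault_index], as_ints[fault_index:]


before, after = split_at_fault(CODE_LINE)

# Assumed lengths: a 3-byte instruction at RIP, preceded by a 5-byte one.
faulting = bytes(after[:3])
preceding = bytes(before[-5:])

print("at RIP:    ", faulting.hex(" "))
print("before RIP:", preceding.hex(" "))
```

Under those assumptions, the bytes at RIP are 0f 01 c1, which is the encoding of VMCALL, and the five bytes before it, b8 05 00 00 00, encode mov $0x5,%eax, which agrees with RAX being 5 in the register dump. The 3.15 trace further down ends with the same byte sequence around RIP.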

Chris Webb

2014-Jun-17 10:27 UTC

Divide error in kvm_unlock_kick()

I see kernel 3.15 is now out, so I retested with 3.15 guest and host. I'm
still getting exactly the same guest kernel panic: a divide error in
kvm_unlock_kick with -cpu host, but not with -cpu qemu64:

divide error: 0000 [#1] PREEMPT SMP 
Modules linked in:
CPU: 1 PID: 781 Comm: mkdir Not tainted 3.15.0-guest #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
task: ffff88007cbf6180 ti: ffff880000088000 task.ti: ffff880000088000
RIP: 0010:[<ffffffff8102d1e0>]  [<ffffffff8102d1e0>] kvm_unlock_kick+0x63/0x6b
RSP: 0000:ffff88007fc83d38  EFLAGS: 00010046
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000002 RSI: ffff88007fd11d80 RDI: ffffffff81994840
RBP: ffff88007fd11d80 R08: 0000000000000000 R09: ffffffff81994840
R10: ffff88007c480c88 R11: 0000000000000005 R12: 000000000000cec0
R13: ffff88007d38332a R14: 0000000000000002 R15: ffff88007d382d00
FS:  00007fdabf7fd700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0643f6509 CR3: 000000007c028000 CR4: 00000000000406e0
Stack:
 0000000000011d80 0000000000000002 ffff88007fd11d80 ffffffff8156f83f
 ffffffff810dba53 0000000000000046 ffff88007fd00000 ffff88007d3bbe70
 ffffffff81845da8 0000000000000003 0000000000000000 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff8156f83f>] ? _raw_spin_unlock+0x32/0x55
 [<ffffffff810dba53>] ? try_to_wake_up+0x1ed/0x20f
 [<ffffffff810e78b8>] ? autoremove_wake_function+0x9/0x2a
 [<ffffffff810e739a>] ? __wake_up_common+0x47/0x73
 [<ffffffff810e7547>] ? __wake_up+0x33/0x44
 [<ffffffff8110f10b>] ? irq_work_run+0x72/0x8f
 [<ffffffff81006079>] ? smp_irq_work_interrupt+0x26/0x2b
 [<ffffffff8157185d>] ? irq_work_interrupt+0x6d/0x80
 [<ffffffff810dba64>] ? try_to_wake_up+0x1fe/0x20f
 [<ffffffff8102ad01>] ? native_apic_msr_read+0x6/0x4e
 [<ffffffff8156f89f>] ? _raw_spin_unlock_irqrestore+0x3d/0x65
 [<ffffffff810f2de3>] ? rcu_process_callbacks+0x15e/0x47d
 [<ffffffff810cccf3>] ? execute_in_process_context+0x55/0x55
 [<ffffffff810bdb98>] ? __do_softirq+0xe0/0x1e6
 [<ffffffff810bde23>] ? irq_exit+0x3c/0x81
 [<ffffffff810270e4>] ? smp_apic_timer_interrupt+0x3b/0x46
 [<ffffffff8157135d>] ? apic_timer_interrupt+0x6d/0x80
 <EOI> 
Code: 0c c5 c0 b8 87 81 49 8d 04 0c 48 8b 30 48 39 ee 75 ca 8a 40 08 38 d8 75 c3
48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 5b 5d 41
5c c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a 00 00
RIP  [<ffffffff8102d1e0>] kvm_unlock_kick+0x63/0x6b
 RSP <ffff88007fc83d38>
---[ end trace 949b1bf47cc57d09 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffff9fffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt

I'm at a complete loss as to what to do next to debug this. Any help would be
extremely gratefully received!

I've put 3.15 host and guest configs here:

  http://cdw.me.uk/tmp/3.15-guest-config.txt
  http://cdw.me.uk/tmp/3.15-host-config.txt

dmesg just after boot here:

  http://cdw.me.uk/tmp/3.15-guest-dmesg.txt
  http://cdw.me.uk/tmp/3.15-host-dmesg.txt

and /proc/cpuinfo from both host and guest here:

  http://cdw.me.uk/tmp/3.15-guest-cpuinfo.txt
  http://cdw.me.uk/tmp/3.15-host-cpuinfo.txt

The qemu command line was

  qemu-system-x86 -enable-kvm -cpu host -machine q35 -m 2048 -name omega \
    -smp sockets=1,cores=4 -pidfile /run/omega.pid -runas nobody \
    -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
    -append "console=ttyS0 root=/dev/vda" \
    -drive file=/dev/guest/omega,cache=none,format=raw,if=virtio \
    -device virtio-rng-pci \
    -device virtio-net-pci,netdev=nic,mac=02:14:72:3c:69:54 \
    -netdev tap,id=nic,fd=3,vhost=on 3<>/dev/tapNNN

but removing the -machine q35 and -device virtio-rng-pci doesn't affect the
crash.

Dropping to -smp 1, running with -cpu qemu64, or compiling the guest kernel
without paravirtualised spinlock support does remove the panic, albeit at the
cost of performance.

Best wishes,

Chris.
