thr3ads.net - Xen users - [Xen-users] DomU crashing in CPU hotplug after migration [Nov 2011]

Tim Evers

2011-Nov-13 09:42 UTC

[Xen-users] DomU crashing in CPU hotplug after migration

I have the following setup:

Two dom0s running Debian Lenny with kernel 2.6.32.21 (64-bit) on top of the
Xen 4.0.1 hypervisor. Storage is iSCSI (EqualLogic). The dom0 hardware is
Dell M610 with 5640 CPUs.

I'm trying to implement migration and CPU/memory hotplug with kernel
2.6.32.48 (32-bit PAE) running in a Debian 6 domU. Right after the domU is
created I can hotplug and remove CPUs and RAM without problems (after
adding udev rules that bring hotplugged CPUs online). But if I migrate the
domU from one dom0 to the other and then hotplug CPUs via xm vcpu-set, the
domU crashes with an error similar to this:

[49525.372432] installing Xen timer for CPU 1
[49525.372469] SMP alternatives: switching to SMP code
[49451.900582] Initializing CPU#1
[2575388.916907] CPU: L1 I cache: 32K, L1 D cache: 32K
[2575388.916907] CPU: L2 cache: 256K
[2575388.916907] CPU: L3 cache: 12288K
[2575388.916907] CPU: Unsupported number of siblings 32
[2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
[2575388.944900] Modules linked in: iptable_filter ip_tables x_tables
nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc loop evdev
snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache
raid1 md_mod xen_netfront xen_blkfront
[2575388.944978]
[2575388.944985] Pid: 7130, comm: bash Not tainted (2.6.32.48 #5)
[2575388.944993] EIP: 0061:[<c1002227>] EFLAGS: 00000246 CPU: 0
[2575388.945004] EIP is at hypercall_page+0x227/0x1001
[2575388.945011] EAX: 00040000 EBX: 00000000 ECX: 00000000 EDX: cf235500
[2575388.945020] ESI: 7fffffff EDI: cf235500 EBP: d5d1be6c ESP: d5d1be00
[2575388.945028]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[2575388.945036] CR0: 8005003b CR2: 083e27d4 CR3: 1fbb9000 CR4: 00002660
[2575388.945046] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[2575388.945054] DR6: ffff0ff0 DR7: 00000400
[2575388.945060] Call Trace:
[2575388.945071]  [<c100570c>] ? xen_force_evtchn_callback+0xc/0x10
[2575388.945082]  [<c1005d58>] ? check_events+0x8/0xc
[2575388.945092]  [<c1005d17>] ? xen_irq_enable_direct_end+0x0/0x1
[2575388.945104]  [<c127666d>] ? wait_for_common+0xac/0x113
[2575388.945116]  [<c10324fe>] ? default_wake_function+0x0/0x8
[2575388.945127]  [<c1047b16>] ? synchronize_sched+0x3e/0x43
[2575388.945137]  [<c1047b1b>] ? wakeme_after_rcu+0x0/0x9
[2575388.945146]  [<c102f02f>] ? free_rootdomain+0x8/0x18
[2575388.945155]  [<c102f5ee>] ? cpu_attach_domain+0x11f/0x159
[2575388.945165]  [<c102ec07>] ? sd_free_ctl_entry+0x35/0x3e
[2575388.945176]  [<c10b4175>] ? kfree+0xa5/0xaa
[2575388.945185]  [<c102fe17>] ? partition_sched_domains+0xed/0x257
[2575388.945195]  [<c10324f4>] ? try_to_wake_up+0x282/0x28c
[2575388.945206]  [<c1068cf4>] ? cpuset_track_online_cpus+0x6b/0x77
[2575388.945217]  [<c127919c>] ? notifier_call_chain+0x23/0x46
[2575388.945227]  [<c104cb7c>] ? raw_notifier_call_chain+0x9/0xc
[2575388.945237]  [<c1273efb>] ? _cpu_up+0xba/0xf8
[2575388.945246]  [<c1273f7d>] ? cpu_up+0x44/0x52
[2575388.945256]  [<c1267d32>] ? store_online+0x37/0x54
[2575388.945265]  [<c1267cfb>] ? store_online+0x0/0x54
[2575388.945275]  [<c11bc195>] ? sysdev_store+0x19/0x1d
[2575388.945285]  [<c10f8774>] ? sysfs_write_file+0xb8/0xe5
[2575388.945295]  [<c10f86bc>] ? sysfs_write_file+0x0/0xe5
[2575388.945305]  [<c10b9d0c>] ? vfs_write+0x7f/0xda
[2575388.945314]  [<c10b9dfa>] ? sys_write+0x3c/0x60
[2575388.945324]  [<c100801b>] ? sysenter_do_call+0x12/0x28

This happens after I bring cpu1 online by issuing

echo 1 > /sys/devices/system/cpu/cpu1/online

either by udev script or by hand.
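
For readers who want to reproduce the step, here is a minimal Python sketch of what the manual echo (or a udev rule) does through sysfs. The function names are made up for illustration, and the sysfs root is a parameter only so the logic can be tried against a scratch directory. Writing to the real control files requires root, and on an affected guest it is the step that triggers the crash.

```python
import glob
import os


def offline_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the 'online' control files of CPUs that are currently offline."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "online")
    paths = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            if handle.read().strip() == "0":
                paths.append(path)
    return paths


def bring_online(paths):
    """Write '1' to each control file, the same action as the echo command."""
    for path in paths:
        with open(path, "w") as handle:
            handle.write("1")
```

Calling offline_cpus() with no argument lists the hotplugged but still offline CPUs of the running system; passing that list to bring_online() is equivalent to repeating the echo for each of them.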

I've searched the net up and down and tried various ACPI and timer
settings, but found nothing that has any impact on this error. The error
also appears with the stock Debian 6 kernel 2.6.32-5-*.

Any idea anyone?

regards

tim

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

George Shuklin

2011-Nov-13 17:26 UTC

Re: [Xen-users] DomU crashing in CPU hotplug after migration

I can't be sure, but I have had issues when vcpus-max > vcpus-at-startup
(i.e. not all CPUs are online at the moment of migration). I set them
equal and the crashes stopped. That was with 2.6.34-xen (from SUSE 11.3),
so I'm not sure it is really related to your question.
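
As a rough illustration of that workaround, here is a fragment in the Python syntax used by xm domain configuration files. It assumes that the xm/xl options `vcpus` and `maxvcpus` correspond to the vcpus-at-startup and vcpus-max settings named above. That mapping and the value 8 are assumptions, not something stated in this thread.

```python
# Hypothetical domU config fragment (xm config files are parsed as Python).
vcpus = 8      # vCPUs online when the guest boots
maxvcpus = 8   # upper limit for hotplug; kept equal to vcpus for the workaround

# Sanity check: with equal values no vCPU is offline at migration time.
starts_with_offline_vcpus = maxvcpus > vcpus
print("guest starts with offline vCPUs:", starts_with_offline_vcpus)
```

Note that with both values equal there is nothing left to hotplug later, so this avoids the crash rather than fixing it.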

On 13.11.2011 13:42, Tim Evers wrote:
> I have the following setup:
[SNIP]

Adi Kriegisch

2011-Nov-14 07:07 UTC

Re: [Xen-users] DomU crashing in CPU hotplug after migration

Dear Tim,

> I have the following setup:
[SNIP]
> migration from one dom0 to the other and hotplug cpus afterwards via xm
> vcpu-set the domU crashes with an error similar to this:
>
> [49525.372432] installing Xen timer for CPU 1
[SNIP]
> [2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
[SNIP]
> Any idea anyone?

Hmm, are you possibly experiencing unstable clock sources? 2352393 seconds
is approximately a month (about 27 days). You probably did not wait that
long, so bash probably wasn't stuck for a month.
There are several pages out there that explain in detail how to get
clock sources stable.
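
The arithmetic can be checked with a short Python sketch. It converts the reported stall into days and scans three lines copied from the log in the original post for a jump in the kernel timestamps. The one-hour threshold and the helper names are arbitrary choices for illustration.

```python
import re

# Three lines copied from the log in the original post.
LOG = """\
[49525.372469] SMP alternatives: switching to SMP code
[49451.900582] Initializing CPU#1
[2575388.916907] CPU: L1 I cache: 32K, L1 D cache: 32K
"""

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamps(text):
    """Return the leading kernel timestamp (in seconds) of each log line."""
    found = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            found.append(float(match.group(1)))
    return found


def jumps(stamps, threshold=3600.0):
    """Return (previous, current) pairs whose gap exceeds threshold seconds."""
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if abs(b - a) > threshold]


stall_days = 2352393 / 86400  # seconds reported by the soft lockup message
print(f"reported stall: {stall_days:.1f} days")
for before, after in jumps(timestamps(LOG)):
    print(f"timestamp jump: {before} -> {after}")
```

The only jump it reports is between "Initializing CPU#1" and the first cache line, where the guest's notion of time leaps forward by several weeks within a single CPU bring-up.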

Hope it helps,
    Adi

Tim Evers

2011-Nov-14 09:42 UTC

Re: [Xen-users] DomU crashing in CPU hotplug after migration

On 14.11.11 08:07, Adi Kriegisch wrote:
> Dear Tim,
>
>> I have the following setup:
> [SNIP]
>> migration from one dom0 to the other and hotplug cpus afterwards via xm
>> vcpu-set the domU crashes with an error similar to this:
>>
>> [49525.372432] installing Xen timer for CPU 1
> [SNIP]
>> [2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
> [SNIP]
>> Any idea anyone?
> Hmm, you are possibly experiencing unstable clock sources? 2352393 seconds
> are approximately a month -- you probably did not wait that long; so bash
> probably wasn't stuck for a month.
> There are several pages out there explaining the details about getting
> clock sources stable.

Hi,

thanks for the hint. I noticed that too, but it seems I have no good
options regarding the clocksource:

- tsc gives me the infamous "clock went backwards" error with a lockup
- jiffies simply runs at about twice the speed it should
- xen gives a stable time

All three crash in the above scenario.
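
To see which clocksource a guest is actually using, the kernel exposes two sysfs files. The following Python sketch reads them. The function name is made up, and the directory is a parameter only so the code can be tried against a scratch directory.

```python
import os

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    with open(os.path.join(base, "current_clocksource")) as handle:
        current = handle.read().strip()
    with open(os.path.join(base, "available_clocksource")) as handle:
        available = handle.read().split()
    return current, available
```

Calling read_clocksources() inside the domU before and after a migration shows whether the active source changed. Switching is done by writing one of the available names to current_clocksource as root, or with the clocksource= kernel boot parameter.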

As I understand the docs, every pv_ops domU runs with an independent clock,
but this should only affect NTP, which I think is not relevant to this
problem.

I'm really confused, since I'm not able to get a fully working xen+pv_ops
kernel running that lets me:

- hotplug cpus
- hotplug memory
- migrate

I've tried:

- Stock Debian 6 kernel 2.6.32
- Stock Ubuntu 10.04 kernel 2.6.32
- Ubuntu Backports kernel 2.6.35
- Ubuntu Backports kernel 2.6.38
- kernel.org kernel 2.6.32.48

all without success.

I can't believe that I'm the only one testing this, so I assume I have
some error in my setup. Unfortunately, after one complete week of testing
I have no clue left as to what it could be...

regards

Tim

Adi Kriegisch

2011-Nov-14 10:24 UTC

Re: [Xen-users] DomU crashing in CPU hotplug after migration

Dear Tim,

> thanks for your hint. I've noticed that too but it seems I have no
> options regarding clocksource:
>
> tsc gives me the infamous "clock went backwards"-error with lockup
> jiffies simply runs at about twice the speed it should
> xen gives a stable time

I see; do your Dom0s have synchronized clocks?

> I've tried:
>
> Stock Debian 6 kernel 2.6.32

I cannot say anything about the other kernels, but there is a related bug
in the recent Debian Squeeze kernel package, version 2.6.32-38:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644604

This is fixed in 2.6.32-39, which is pending in stable-proposed-updates. So
if you're running a kernel from 2.6.32-38 in your DomU, you should
consider upgrading to the kernel in s-p-u and trying again.

-- Adi

Tim Evers

2011-Nov-14 19:01 UTC

Re: [Xen-users] DomU crashing in CPU hotplug after migration

On 14.11.11 11:24, Adi Kriegisch wrote:
> Dear Tim,
>
>> thanks for your hint. I've noticed that too but it seems I have no
>> options regarding clocksource:
>>
>> tsc gives me the infamous "clock went backwards"-error with lockup
>> jiffies simply runs at about twice the speed it should
>> xen gives a stable time
> I see; do your Dom0's have a synchronized clock?
>
>> I've tried:
>>
>> Stock Debian 6 kernel 2.6.32
> I cannot say anything about the other kernels, but there is a related bug
> in the recent version of the Debian Squeeze kernel package version
> 2.6.32-38: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644604
>
> This is fixed in 2.6.32-39 which is pending in stable-proposed-updates. So
> if you're running any kernel from 2.6.32-38 in your DomU, you should
> consider upgrading to the kernel in s-p-o and try again.

I've tried that. No success :(
I've gathered some more test cases:

1. start VM with 8 vcpus, migrate -> runs
2. start VM with 8 vcpus, reduce to 3 vcpus, migrate, raise to 8 vcpus
-> crash
3. start VM with 3 vcpus, raise to 8 vcpus, migrate, reduce to 3 vcpus
-> runs
4. start VM with 3 vcpus, raise to 8 vcpus, migrate, reduce to 3 vcpus,
raise to 8 vcpus -> crash

It seems that whatever I try, raising the number of CPUs (or, to be
precise, bringing them online) after a migration has occurred leads to a
crash.

BTW: save/restore shows the same behaviour, so it's most likely not
related to migration itself but to suspending and resuming the CPUs.
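
The four test cases are consistent with one simple rule, sketched below in Python. The rule is only a hypothesis drawn from these reports, not a confirmed root cause: bringing a vCPU online after the domain has been migrated or restored crashes the guest. The operation names and the predict function are invented for illustration.

```python
def predict(steps):
    """Predict 'crash' or 'runs' for a sequence of (operation, value) steps.

    Hypothesis only: onlining a vCPU after a migrate or restore crashes.
    """
    online = None
    resumed = False  # True once the domain has been migrated or restored
    for op, value in steps:
        if op == "start":
            online = value
        elif op in ("migrate", "restore"):
            resumed = True
        elif op == "vcpu-set":
            if resumed and value > online:
                return "crash"
            online = value
    return "runs"


# The four test cases from the message above, in the same order.
CASES = {
    1: [("start", 8), ("migrate", None)],
    2: [("start", 8), ("vcpu-set", 3), ("migrate", None), ("vcpu-set", 8)],
    3: [("start", 3), ("vcpu-set", 8), ("migrate", None), ("vcpu-set", 3)],
    4: [("start", 3), ("vcpu-set", 8), ("migrate", None), ("vcpu-set", 3),
        ("vcpu-set", 8)],
}

for number, steps in CASES.items():
    print(number, predict(steps))
```

The model reproduces all four reported outcomes. A useful further test would be any case it gets wrong, for example a crash when lowering the vCPU count after a migration.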

regards

Tim
