I have the following setup:

2 Dom0 running Debian Lenny, kernel 2.6.32.21/64bit, on top of the Xen 4.0.1 hypervisor. Storage is iSCSI (Equallogic). Dom0 hardware is Dell M610 with 5640 CPUs.

I am trying to implement migration and CPU/memory hotplug in kernel 2.6.32.48/32bit PAE running in a Debian 6 domU. I can hotplug and remove CPU and RAM without problems (after adding udev rules for taking hotplugged CPUs online) after creation of the domU, but if I do a migration from one dom0 to the other and hotplug CPUs afterwards via xm vcpu-set, the domU crashes with an error similar to this:

[49525.372432] installing Xen timer for CPU 1
[49525.372469] SMP alternatives: switching to SMP code
[49451.900582] Initializing CPU#1
[2575388.916907] CPU: L1 I cache: 32K, L1 D cache: 32K
[2575388.916907] CPU: L2 cache: 256K
[2575388.916907] CPU: L3 cache: 12288K
[2575388.916907] CPU: Unsupported number of siblings 32
[2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
[2575388.944900] Modules linked in: iptable_filter ip_tables x_tables nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc loop evdev snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache raid1 md_mod xen_netfront xen_blkfront
[2575388.944978]
[2575388.944985] Pid: 7130, comm: bash Not tainted (2.6.32.48 #5)
[2575388.944993] EIP: 0061:[<c1002227>] EFLAGS: 00000246 CPU: 0
[2575388.945004] EIP is at hypercall_page+0x227/0x1001
[2575388.945011] EAX: 00040000 EBX: 00000000 ECX: 00000000 EDX: cf235500
[2575388.945020] ESI: 7fffffff EDI: cf235500 EBP: d5d1be6c ESP: d5d1be00
[2575388.945028] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[2575388.945036] CR0: 8005003b CR2: 083e27d4 CR3: 1fbb9000 CR4: 00002660
[2575388.945046] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[2575388.945054] DR6: ffff0ff0 DR7: 00000400
[2575388.945060] Call Trace:
[2575388.945071] [<c100570c>] ? xen_force_evtchn_callback+0xc/0x10
[2575388.945082] [<c1005d58>] ? check_events+0x8/0xc
[2575388.945092] [<c1005d17>] ? xen_irq_enable_direct_end+0x0/0x1
[2575388.945104] [<c127666d>] ? wait_for_common+0xac/0x113
[2575388.945116] [<c10324fe>] ? default_wake_function+0x0/0x8
[2575388.945127] [<c1047b16>] ? synchronize_sched+0x3e/0x43
[2575388.945137] [<c1047b1b>] ? wakeme_after_rcu+0x0/0x9
[2575388.945146] [<c102f02f>] ? free_rootdomain+0x8/0x18
[2575388.945155] [<c102f5ee>] ? cpu_attach_domain+0x11f/0x159
[2575388.945165] [<c102ec07>] ? sd_free_ctl_entry+0x35/0x3e
[2575388.945176] [<c10b4175>] ? kfree+0xa5/0xaa
[2575388.945185] [<c102fe17>] ? partition_sched_domains+0xed/0x257
[2575388.945195] [<c10324f4>] ? try_to_wake_up+0x282/0x28c
[2575388.945206] [<c1068cf4>] ? cpuset_track_online_cpus+0x6b/0x77
[2575388.945217] [<c127919c>] ? notifier_call_chain+0x23/0x46
[2575388.945227] [<c104cb7c>] ? raw_notifier_call_chain+0x9/0xc
[2575388.945237] [<c1273efb>] ? _cpu_up+0xba/0xf8
[2575388.945246] [<c1273f7d>] ? cpu_up+0x44/0x52
[2575388.945256] [<c1267d32>] ? store_online+0x37/0x54
[2575388.945265] [<c1267cfb>] ? store_online+0x0/0x54
[2575388.945275] [<c11bc195>] ? sysdev_store+0x19/0x1d
[2575388.945285] [<c10f8774>] ? sysfs_write_file+0xb8/0xe5
[2575388.945295] [<c10f86bc>] ? sysfs_write_file+0x0/0xe5
[2575388.945305] [<c10b9d0c>] ? vfs_write+0x7f/0xda
[2575388.945314] [<c10b9dfa>] ? sys_write+0x3c/0x60
[2575388.945324] [<c100801b>] ? sysenter_do_call+0x12/0x28

after I set cpu1 online by issuing

echo 1 > /sys/devices/system/cpu/cpu1/online

either by udev script or by hand.

I've searched the net up and down and tried various ACPI and timer settings, but found nothing that has an impact on this error. The error also appears with the stock Debian 6 kernel 2.6.32-5-*.

Any idea anyone?

regards
tim

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
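For readers who want to script the manual onlining step from the message above (the `echo 1 > /sys/devices/system/cpu/cpuN/online` part), here is a minimal Python sketch. It is not the poster's udev rule, which is not shown in the thread; the function names and the `make_fake_sysfs` helper are made up for illustration. The helper builds a throwaway directory so the logic can be tried without touching a real system.

```python
import os
import re
import tempfile


def offline_cpus(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return the sorted numbers of CPUs whose 'online' file reads 0."""
    found = []
    for name in os.listdir(sysfs_cpu_dir):
        match = re.fullmatch(r"cpu(\d+)", name)
        if not match:
            continue  # skips entries such as 'cpufreq' or 'cpuidle'
        online_file = os.path.join(sysfs_cpu_dir, name, "online")
        # A CPU that cannot be unplugged (often cpu0) has no 'online' file.
        if not os.path.isfile(online_file):
            continue
        with open(online_file) as handle:
            if handle.read().strip() == "0":
                found.append(int(match.group(1)))
    return sorted(found)


def bring_online(cpu, sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Equivalent of: echo 1 > /sys/devices/system/cpu/cpuN/online"""
    path = os.path.join(sysfs_cpu_dir, "cpu%d" % cpu, "online")
    with open(path, "w") as handle:
        handle.write("1")


def make_fake_sysfs(states):
    """Hypothetical test helper: mimic the sysfs CPU layout in a temp dir.

    `states` maps a CPU number to "0" or "1", or to None for no 'online' file.
    """
    root = tempfile.mkdtemp()
    for cpu, state in states.items():
        cpu_dir = os.path.join(root, "cpu%d" % cpu)
        os.makedirs(cpu_dir)
        if state is not None:
            with open(os.path.join(cpu_dir, "online"), "w") as handle:
                handle.write(state + "\n")
    return root
```

Against the real sysfs tree this needs root, and in the scenario described in this thread (after a migration) the write is exactly the step that triggers the crash, so it is best tried on a test domU first.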
George Shuklin
2011-Nov-13 17:26 UTC
Re: [Xen-users] DomU crashing in CPU hotplug after migration
I can't be sure, but I have had some issues with vcpus-max > vcpus-at-startup (e.g. not all CPUs online at the moment of migration). I set them equal and the crashes stopped. That was with 2.6.34-xen (from SUSE 11.3), so I'm not sure this is really related to the question.

On 13.11.2011 13:42, Tim Evers wrote:
> I have the following setup:
[SNIP]
> migration from one dom0 to the other and hotplug cpus afterwards via xm
> vcpu-set the domU crashes with an error similar to this:
[SNIP]
Adi Kriegisch
2011-Nov-14 07:07 UTC
Re: [Xen-users] DomU crashing in CPU hotplug after migration
Dear Tim,

> I have the following setup:
[SNIP]
> migration from one dom0 to the other and hotplug cpus afterwards via xm
> vcpu-set the domU crashes with an error similar to this:
>
> [49525.372432] installing Xen timer for CPU 1
[SNIP]
> [2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
[SNIP]
> Any idea anyone?

Hmm, you are possibly experiencing unstable clock sources? 2352393 seconds are approximately a month -- you probably did not wait that long; so bash probably wasn't stuck for a month.
There are several pages out there explaining the details about getting clock sources stable.

Hope it helps,
Adi
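The arithmetic behind this observation can be checked with a short Python sketch. It only uses a few lines copied from the log in the original post; the function names and the 60-second threshold are arbitrary choices for illustration. It extracts the printk timestamps, flags places where time runs backwards or leaps forward, and converts the reported "stuck" duration to days.

```python
import re

# Matches the leading "[seconds.microseconds]" printk timestamp.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

# A few lines copied from the kernel log in the original post.
LOG_EXCERPT = """\
[49525.372432] installing Xen timer for CPU 1
[49525.372469] SMP alternatives: switching to SMP code
[49451.900582] Initializing CPU#1
[2575388.916907] CPU: L1 I cache: 32K, L1 D cache: 32K
[2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
"""


def timestamps(log_text):
    """Extract the leading timestamp (in seconds) of each log line."""
    result = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if match:
            result.append(float(match.group(1)))
    return result


def clock_jumps(stamps, threshold=60.0):
    """Return (previous, current, delta) for every pair of consecutive
    timestamps where time went backwards or advanced by more than threshold."""
    jumps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < 0 or delta > threshold:
            jumps.append((prev, cur, delta))
    return jumps


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / 86400.0


if __name__ == "__main__":
    for prev, cur, delta in clock_jumps(timestamps(LOG_EXCERPT)):
        print("jump of %.1f s between %.6f and %.6f" % (delta, prev, cur))
    print("stuck for %.1f days" % seconds_to_days(2352393))
```

On this excerpt the timestamps first step backwards by roughly 73 seconds and then leap forward by about 2.5 million seconds within a handful of log lines, which supports reading the "stuck for 2352393s" figure as a clock artifact rather than a real month-long stall.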
Tim Evers
2011-Nov-14 09:42 UTC
Re: [Xen-users] DomU crashing in CPU hotplug after migration
On 14.11.11 08:07, Adi Kriegisch wrote:
> Dear Tim,
>
>> I have the following setup:
> [SNIP]
>> migration from one dom0 to the other and hotplug cpus afterwards via xm
>> vcpu-set the domU crashes with an error similar to this:
>>
>> [49525.372432] installing Xen timer for CPU 1
> [SNIP]
>> [2575388.944891] BUG: soft lockup - CPU#0 stuck for 2352393s! [bash:7130]
> [SNIP]
>> Any idea anyone?
> Hmm, you are possibly experiencing unstable clock sources? 2352393 seconds
> are approximately a month -- you probably did not wait that long; so bash
> probably wasn't stuck for a month.
> There are several pages out there explaining the details about getting
> clock sources stable.

Hi,

thanks for your hint. I've noticed that too, but it seems I have no options regarding the clocksource:

tsc gives me the infamous "clock went backwards" error with lockup
jiffies simply runs at about twice the speed it should
xen gives a stable time

All crash in the above scenario. As I understand the docs, every pv_ops domU runs with an independent clock, but this should only affect NTP, which I think is not relevant to this problem.

I'm really confused, since I'm not able to get a fully working Xen + pv_ops kernel to run which lets me:

- hotplug cpus
- hotplug memory
- migrate

I've tried:

Stock Debian 6 kernel 2.6.32
Stock Ubuntu 10.04 kernel 2.6.32
Ubuntu Backports kernel 2.6.35
Ubuntu Backports kernel 2.6.38
kernel.org kernel 2.6.32.48

without success. I can't believe that I'm the only one testing this, so I assume I have some error in my setup. Unfortunately, after one complete week of testing I have no clue left what it could be...

regards
Tim
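To see which of the clocksources mentioned above (tsc, jiffies, xen) a domU is actually using, the kernel exposes two files under sysfs. The sketch below is an illustration, not something posted in the thread: it reads those files, and the `make_fake_clocksource_dir` helper is a made-up stand-in so the function can be tried outside a Xen guest.

```python
import os
import tempfile

# Standard Linux sysfs location of the clocksource controls.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources as reported by the kernel."""
    with open(os.path.join(base, "current_clocksource")) as handle:
        current = handle.read().strip()
    with open(os.path.join(base, "available_clocksource")) as handle:
        available = handle.read().split()
    return current, available


def make_fake_clocksource_dir(current, available):
    """Hypothetical test helper: write the two files into a temp dir."""
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "current_clocksource"), "w") as handle:
        handle.write(current + "\n")
    with open(os.path.join(root, "available_clocksource"), "w") as handle:
        handle.write(" ".join(available) + " \n")
    return root


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:   %s" % current)
    print("available: %s" % ", ".join(available))
```

Running it before and after a migration or save/restore would show whether the guest silently changed clocksource, which is one way to narrow down the timestamp jumps seen in the log.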
Adi Kriegisch
2011-Nov-14 10:24 UTC
Re: [Xen-users] DomU crashing in CPU hotplug after migration
Dear Tim,

> thanks for your hint. I've noticed that too but it seems I have no
> options regarding clocksource:
>
> tsc gives me the infamous "clock went backwards"-error with lockup
> jiffies simply runs at about twice the speed it should
> xen gives a stable time

I see; do your Dom0s have a synchronized clock?

> I've tried:
>
> Stock Debian 6 kernel 2.6.32

I cannot say anything about the other kernels, but there is a related bug in the recent Debian Squeeze kernel package version 2.6.32-38: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644604

This is fixed in 2.6.32-39, which is pending in stable-proposed-updates. So if you're running any kernel from 2.6.32-38 in your DomU, you should consider upgrading to the kernel in stable-proposed-updates and trying again.

-- Adi
Tim Evers
2011-Nov-14 19:01 UTC
Re: [Xen-users] DomU crashing in CPU hotplug after migration
On 14.11.11 11:24, Adi Kriegisch wrote:
> Dear Tim,
>
>> thanks for your hint. I've noticed that too but it seems I have no
>> options regarding clocksource:
>>
>> tsc gives me the infamous "clock went backwards"-error with lockup
>> jiffies simply runs at about twice the speed it should
>> xen gives a stable time
> I see; do your Dom0's have a synchronized clock?
>
>> I've tried:
>>
>> Stock Debian 6 kernel 2.6.32
> I cannot say anything about the other kernels, but there is a related bug
> in the recent version of the Debian Squeeze kernel package version
> 2.6.32-38: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644604
>
> This is fixed in 2.6.32-39 which is pending in stable-proposed-updates. So
> if you're running any kernel from 2.6.32-38 in your DomU, you should
> consider upgrading to the kernel in s-p-o and try again.

I've tried. No success :(

I've gathered some more test cases:

1. start vm with 8 vcpus, migrate -> runs
2. start vm with 8 vcpus, reduce to 3 vcpus, migrate, raise to 8 vcpus -> crash
3. start vm with 3 vcpus, raise to 8 vcpus, migrate, reduce to 3 vcpus -> runs
4. start vm with 3 vcpus, raise to 8 vcpus, migrate, reduce to 3 vcpus -> raise to 8 vcpus -> crash

It seems that whatever I try, raising CPUs (or, to be precise, taking them online) after a migration has occurred leads to a crash.

BTW: save/restore shows the same behaviour, so it's most likely not related to migration but to sleep/wakeup of CPUs.

regards
Tim
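The conclusion drawn from the four test cases can be written down as a small consistency check. The Python sketch below encodes each case as a list of steps plus the reported outcome, and tests the hypothesis stated above: a crash happens exactly when a CPU is taken online after a migration. The step strings and function names are illustrative only, and the check says nothing about the underlying cause.

```python
# The four test cases from the message above: operations in order, then the
# reported outcome.
TEST_CASES = [
    (["start 8", "migrate"], "runs"),
    (["start 8", "reduce 3", "migrate", "raise 8"], "crash"),
    (["start 3", "raise 8", "migrate", "reduce 3"], "runs"),
    (["start 3", "raise 8", "migrate", "reduce 3", "raise 8"], "crash"),
]


def predicts_crash(steps):
    """Hypothesis: the domU crashes iff a CPU is taken online after a migration."""
    migrated = False
    for step in steps:
        if step == "migrate":
            migrated = True
        elif step.startswith("raise") and migrated:
            return True
    return False


def hypothesis_holds(cases):
    """True if the hypothesis matches every observed outcome."""
    return all(
        predicts_crash(steps) == (outcome == "crash")
        for steps, outcome in cases
    )


if __name__ == "__main__":
    for steps, outcome in TEST_CASES:
        print("%-55s observed=%-5s predicted=%s" % (
            ", ".join(steps), outcome,
            "crash" if predicts_crash(steps) else "runs"))
    print("hypothesis consistent with all cases:", hypothesis_holds(TEST_CASES))
```

Since save/restore reportedly behaves the same way, the "migrate" step could be replaced by a save/restore step in further test runs to confirm that the same rule applies.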