Konrad Rzeszutek Wilk
2013-May-30 18:29 UTC
WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() on v3.10-rc3
Hello,

I have not yet done a full git bisection run, but this appears to be new in v3.10; I did not see it in v3.9. I think I saw this in v3.10-rc1 but never got to look at it in depth.

Either way, on a PV guest, if I do:

    echo 0 > /sys/devices/system/cpu/cpu1/online
    echo 1 > /sys/devices/system/cpu/cpu1/online

I get this fat warning:

[ 39.946760] Broke affinity for irq 16
[ 40.071242] installing Xen timer for CPU 1
[ 40.076109] cpu 1 spinlock event irq 48
[ 40.081207] ------------[ cut here ]------------
[ 40.085841] WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
[ 40.095970] Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c i915 radeon sg sd_mod mperf skge e1000 sata_nv nouveau ata_generic libata fbcon tileblit font bitblit scsi_mod ttm softcursor drm_kms_helper mxm_wmi video wmi xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd
[ 40.130893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
[ 40.139212] Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
[ 40.147098] ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
[ 40.154550] ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
[ 40.162004] 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
[ 40.169458] Call Trace:
[ 40.171977] [<ffffffff816707c8>] dump_stack+0x19/0x1b
[ 40.177175] [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
[ 40.183243] [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
[ 40.189134] [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
[ 40.195287] [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
[ 40.201268] [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
[ 40.207332] ---[ end trace 915c8c486004dda1 ]---

which I presume is b/c the code does not expect to be run _after_ it has offlined.
However, under the PV code, the mechanism is that a CPU that has been offlined can resume (if it is onlined). If you look at:

static void __cpuinit xen_play_dead(void) /* used only with HOTPLUG_CPU */
{
	play_dead_common();
	HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
	cpu_bringup();
}

That is called right after the CPU is put to sleep, and the hypercall VCPUOP_down blocks until the CPU is brought back up. At that point we end up calling cpu_bringup, which sets up the clockevents, timers, etc.

I am wondering if part of this is that ts->inidle gets reset b/c we end up resetting all the timers, but then when xen_play_dead exits, it ends up right back in the cpu_idle_loop() loop and we call tick_nohz_idle_exit().

Thoughts?
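To make the suspected sequence concrete, here is a minimal user-space sketch (not kernel code). It assumes, as the message above speculates, that the warning fires because the per-CPU inidle flag is found cleared when the idle loop resumes after xen_play_dead() returns. Every name starting with model_ is a hypothetical stand-in invented for this illustration, not a kernel API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU tick state; only the flag matters. */
struct model_tick_sched {
	bool inidle;
};

/* Counts how often the modeled warning condition was hit. */
static int model_warnings;

/* Models the idle loop marking the CPU as idle on entry. */
static void model_idle_enter(struct model_tick_sched *ts)
{
	ts->inidle = true;
}

/* Models idle exit: complain if the CPU was not marked as idle. */
static void model_idle_exit(struct model_tick_sched *ts)
{
	if (!ts->inidle) {
		model_warnings++;
		fprintf(stderr, "WARNING: idle exit while not marked idle\n");
	}
	ts->inidle = false;
}

/* Assumption: taking the CPU down wipes the per-CPU tick state. */
static void model_cpu_down_cleanup(struct model_tick_sched *ts)
{
	ts->inidle = false;
}

/* Models a play_dead() that returns once the CPU is onlined again. */
static void model_play_dead_returns(struct model_tick_sched *ts)
{
	model_cpu_down_cleanup(ts);
	/* The real hypercall would block here until the CPU comes back. */
}

/* Runs one offline/online cycle and returns the number of warnings. */
int model_offline_online_cycle(void)
{
	struct model_tick_sched ts = { .inidle = false };

	model_warnings = 0;
	model_idle_enter(&ts);        /* idle loop is entered            */
	model_play_dead_returns(&ts); /* CPU offlined, later onlined     */
	model_idle_exit(&ts);         /* idle loop carries on and exits  */
	return model_warnings;
}
```

Calling model_offline_online_cycle() from a main() returns 1 in this model: the flag set on idle entry is gone by the time idle exit checks it.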
Thomas Gleixner
2013-May-30 20:05 UTC
Re: WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() on v3.10-rc3
On Thu, 30 May 2013, Konrad Rzeszutek Wilk wrote:
> [ 40.085841] WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
>
> which I presume is b/c the code does not expect to be run _after_ it has
> offlined. However, under the PV code, the mechanism is that a CPU
> that has been offlined can resume (if it is onlined). If you look at:
>
> static void __cpuinit xen_play_dead(void) /* used only with HOTPLUG_CPU */
> {
> 	play_dead_common();
> 	HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
> 	cpu_bringup();
> }
>
> That is called right after the CPU is put to sleep and the hypercall
> VCPUOP_down blocks - until the CPU is brought back up. At which point
> we end up calling cpu_bringup - which sets up the clockevents, timers, etc.
>
> I am wondering if part of this is that the ts->inidle gets reset
> b/c we end up resetting all the timers but then when xen_play_dead
> exits, it ends up right back in the cpu_idle_loop() loop - and we
> call tick_nohz_idle_exit().
>
> Thoughts?

cpu_dead() is definitely not expected to return after the cpu has been
declared dead. I should have put a big fat warning into the generic
idle loop for this :)

The reason why you get that warning only now is commit 4b0c0f294
(tick: Cleanup NOHZ per cpu data on cpu down), which is btw. targeted
for stable as well.

We can't revert the above commit as it fixes a long standing
nastiness, so for now until I come around to make the idle loop return
on cpu down you probably need to call tick_nohz_idle_enter() before
returning from play_dead().

Thanks,

	tglx
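A rough sketch of what the interim workaround suggested above might look like. It is written as a user-space model so that it compiles on its own: play_dead_common(), smp_processor_id(), HYPERVISOR_vcpu_op(), cpu_bringup() and tick_nohz_idle_enter() are empty or trivial stubs here, not the kernel implementations, and the stub behaviour (the hypercall clearing the flag, idle enter setting it) is an assumption made purely for illustration. Only the position of the tick_nohz_idle_enter() call reflects the suggestion.

```c
#include <stddef.h>

/* Modeled "CPU is in idle" flag; stands in for ts->inidle. */
static int inidle = 1;

/* Stand-in constant; its value is irrelevant to this sketch. */
#define VCPUOP_down 2

/* --- user-space stubs, NOT the kernel implementations --- */
static void play_dead_common(void) { }
static int smp_processor_id(void) { return 1; }
static void cpu_bringup(void) { }

static int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra)
{
	(void)cmd;
	(void)vcpuid;
	(void)extra;
	/* Assumption: while the vCPU is down, the NOHZ per-CPU data is wiped. */
	inidle = 0;
	return 0;
}

static void tick_nohz_idle_enter(void)
{
	inidle = 1;
}

/* Sketch of xen_play_dead() with the suggested extra call at the end. */
void xen_play_dead_sketch(void)
{
	play_dead_common();
	HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
	cpu_bringup();
	/*
	 * We are about to return into the idle loop, which expects the
	 * CPU to still be marked as idle, so restore that state first.
	 */
	tick_nohz_idle_enter();
}

/* Runs the sketch once and reports the flag the idle loop would see. */
int sketch_inidle_after_replug(void)
{
	inidle = 1; /* the idle loop had marked the CPU idle before play_dead */
	xen_play_dead_sketch();
	return inidle;
}
```

In this model sketch_inidle_after_replug() returns 1, so a subsequent idle-exit check on the flag would pass; without the final call it would see 0.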
Konrad Rzeszutek Wilk
2013-Jun-03 13:42 UTC
Re: WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() on v3.10-rc3
On Thu, May 30, 2013 at 10:05:46PM +0200, Thomas Gleixner wrote:
> On Thu, 30 May 2013, Konrad Rzeszutek Wilk wrote:
> > [ 40.085841] WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
> >
> > which I presume is b/c the code does not expect to be run _after_ it has
> > offlined. However, under the PV code, the mechanism is that a CPU
> > that has been offlined can resume (if it is onlined). If you look at:
> >
> > static void __cpuinit xen_play_dead(void) /* used only with HOTPLUG_CPU */
> > {
> > 	play_dead_common();
> > 	HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
> > 	cpu_bringup();
> > }
> >
> > That is called right after the CPU is put to sleep and the hypercall
> > VCPUOP_down blocks - until the CPU is brought back up. At which point
> > we end up calling cpu_bringup - which sets up the clockevents, timers, etc.
> >
> > I am wondering if part of this is that the ts->inidle gets reset
> > b/c we end up resetting all the timers but then when xen_play_dead
> > exits, it ends up right back in the cpu_idle_loop() loop - and we
> > call tick_nohz_idle_exit().
> >
> > Thoughts?
>
> cpu_dead() is definitely not expected to return after the cpu has been
> declared dead. I should have put a big fat warning into the generic
> idle loop for this :)
>
> The reason why you get that warning only now is commit 4b0c0f294
> (tick: Cleanup NOHZ per cpu data on cpu down), which is btw. targeted
> for stable as well.

Ah, that would explain it. Thanks!

> We can't revert the above commit as it fixes a long standing
> nastiness, so for now until I come around to make the idle loop return
> on cpu down you probably need to call tick_nohz_idle_enter() before
> returning from play_dead().

OK. Could you keep me in mind when you do that cleanup and CC me? Thank you.