BUG: soft lockup detected on CPU#0!
Pid: 17178, comm: xvd 275 fd:86
EIP: 0061:[<c0133b65>] CPU: 1
EIP is at kthread_should_stop+0x15/0x20
EFLAGS: 00000246 Not tainted (2.6.16-xen0 #1)
EAX: 00000001 EBX: 00000000 ECX: c05431b4 EDX: 00000000
ESI: 00000000 EDI: cc7b143c EBP: c5f11f98 DS: 007b ES: 007b
CR0: 8005003b CR2: 090a8928 CR3: 0635f000 CR4: 00000660
[<c0387bd5>] blkif_schedule+0x25/0x4a0
[<c0133e90>] autoremove_wake_function+0x0/0x60
[<c0133c6f>] kthread+0xff/0x110
[<c0387bb0>] blkif_schedule+0x0/0x4a0
[<c0133b70>] kthread+0x0/0x110
[<c0102ca5>] kernel_thread_helper+0x5/0x10

These are happening every few minutes, for four domains, and rising.
Machine is under somewhat high disk load (lots of scp processes copying
large file systems, cfq disk sched).

xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0

Those domains have now zombified and are unkillable. I'll grab the latest
updates and reboot the box.

-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
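Since the report mentions four affected domains, it can help to pull the domain ID and backing device out of each soft-lockup report. The sketch below is only an illustration: it assumes the blkback thread name in the "comm:" field has the form "xvd <domid> <major>:<minor>" with the device numbers in hex, which matches the trace above but is not confirmed anywhere in this thread. The function name parse_lockup_comm is hypothetical.

```python
import re

# Assumption: the blkback thread is named "xvd <domid> <major>:<minor>",
# with major/minor printed in hexadecimal.
COMM_RE = re.compile(r"comm:\s+xvd\s+(\d+)\s+([0-9a-f]+):([0-9a-f]+)")


def parse_lockup_comm(report):
    """Return (domid, major, minor) from a soft-lockup report, or None."""
    match = COMM_RE.search(report)
    if match is None:
        return None
    domid, major, minor = match.groups()
    return int(domid), int(major, 16), int(minor, 16)


if __name__ == "__main__":
    # Line taken from the report above.
    print(parse_lockup_comm("Pid: 17178, comm: xvd 275 fd:86"))
```

Under that assumption, the report above would point at domain 275 and the device with major 253, minor 134.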
Itamar Reis Peixoto
2006-Apr-05 02:45 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0!
I have the same problem.

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543

xen_changeset: Sat Apr 1 14:59:12 2006 +0100 9511:60071beccf18

If you find a solution first, please send it to me.

> BUG: soft lockup detected on CPU#0!
> Pid: 17178, comm: xvd 275 fd:86
> EIP: 0061:[<c0133b65>] CPU: 1
> EIP is at kthread_should_stop+0x15/0x20
> EFLAGS: 00000246 Not tainted (2.6.16-xen0 #1)
> EAX: 00000001 EBX: 00000000 ECX: c05431b4 EDX: 00000000
> ESI: 00000000 EDI: cc7b143c EBP: c5f11f98 DS: 007b ES: 007b
> CR0: 8005003b CR2: 090a8928 CR3: 0635f000 CR4: 00000660
> [<c0387bd5>] blkif_schedule+0x25/0x4a0
> [<c0133e90>] autoremove_wake_function+0x0/0x60
> [<c0133c6f>] kthread+0xff/0x110
> [<c0387bb0>] blkif_schedule+0x0/0x4a0
> [<c0133b70>] kthread+0x0/0x110
> [<c0102ca5>] kernel_thread_helper+0x5/0x10
>
> These are happening every few minutes, for four domains, and rising.
> Machine is under somewhat high disk load (lots of scp processes copying
> large file systems, cfq disk sched).
>
> xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0
>
> Those domains have now zombified and are unkillable. I'll grab the latest
> updates and reboot the box.
>
> -Chris
On 5 Apr 2006, at 03:39, Christopher S. Aker wrote:

> These are happening every few minutes, for four domains, and rising.
> Machine is under somewhat high disk load (lots of scp processes
> copying large file systems, cfq disk sched).
>
> xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0
>
> Those domains have now zombified and are unkillable. I'll grab the
> latest updates and reboot the box.

Since it looks like a problem with the blkback kernel thread, it's worth doing:

    echo 1 >/sys/module/blkback/parameters/debug_lvl

That may get some kernel tracing (at level KERN_DEBUG) from that thread and we can see if it's got into a bad looping state.

-- Keir
Christopher S. Aker
2006-Apr-05 18:32 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0!
Keir Fraser wrote:

> Since it looks like a problem with the blkback kernel thread, it's worth
> doing:
> echo 1 >/sys/module/blkback/parameters/debug_lvl
>
> That may get some kernel tracing (at level KERN_DEBUG) from that thread
> and we can see if it's got into a bad looping state.

After an update and a reboot, and turning off soft lockup detection, I'm still getting zombie domains. It also appears that after this happens, no new block devices can be attached.

Here's a summary of the different debug outputs:

(after restarting Xend)
==> /var/log/xend.log <==
[2006-04-05 14:29:09 xend] DEBUG (XendDomain:197) Cannot recreate information for dying domain 54. Xend will ignore this domain from now on.
[2006-04-05 14:29:09 xend] DEBUG (XendDomain:197) Cannot recreate information for dying domain 73. Xend will ignore this domain from now on.

Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit
Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: not connected (13 pending)
Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit
Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: not connected (13 pending)
^-- these flood syslog

Apr 5 14:28:40 host56 kernel: ined (13 pe, delayed (13 pe, delayined (13 , delayed (13 , delayied (13 , delayined (13 , delayed (13 pend, delayed (13 , delayined (13 pe, delayined (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 p, delayined (13 pen, delayed (13 pe, delayined (13 , delayied (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 pe, delayined (13 pendin, delayined (13 pe, delaying ed (13 pe, delayined (13 pe, delayined (13 pe, delayed (13 pe, delayed (13 , delayin, delayined (13 pending, delayined (13 , delaying ed (13 pe, delayed (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 pe, delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 , delayied (13 pendin, delayined (13 , delayined (13 pe, delayined (13 pe, delayed (13 pe, delayed (13
Apr 5 14:28:40 host56 kernel: elayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 pe, delayed (13 pe, delayed (13 p, delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , delayed (13 p, delayed (13 pe, delayined (13 pe, delayined (13 pend, delayined (13 , delaying ed (13 peed (13 , delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 p, delayined (13 pend, delayined (13 , delayined (13 pe, delayined (13 pe, de, delayined (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pen, delayed (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayined (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , delayined (13 pe, delayed (13 , delayined (13 p, delayed (13 pend, delayed (13 , delayined (13 pe, dela
^-- these are flooding, but not quite as often.

This leaves Xen/Xend in an unstable condition, I'm thinking the only way out is a reboot...

-Chris
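When syslog is flooded like this, a quick tally of the intact messages per blkback thread can show which domain and device are stuck and how many requests remain pending. The following is a rough sketch under stated assumptions: it expects lines in the syslog format quoted above, silently skips the garbled ones, and the helper name tally_blkback_messages is hypothetical.

```python
import re
from collections import Counter

# Matches intact lines such as:
#   "Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit"
# Assumption: "xvd <domid> <major>:<minor>" identifies the blkback thread.
LINE_RE = re.compile(r"kernel: xvd (\d+) ([0-9a-f]+:[0-9a-f]+): (.+)$")


def tally_blkback_messages(lines):
    """Count intact blkback messages per (domid, device, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            domid, device, message = match.groups()
            counts[(int(domid), device, message)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above.
    sample = [
        "Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit",
        "Apr 5 14:28:40 host56 kernel: xvd 73 fd:85: not connected (13 pending)",
        "Apr 5 14:28:40 host56 kernel: ined (13 pe, delayed (13 pe, delayined (13",
    ]
    for key, count in sorted(tally_blkback_messages(sample).items()):
        print(key, count)
```

In practice the lines would come from the syslog file rather than a hard-coded list; the garbled entries cannot be attributed to a thread and are left out of the count.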
On 5 Apr 2006, at 19:32, Christopher S. Aker wrote:

> pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 ,
> delayined (13 pe, delayined (13 pe, delayined (13 , delayed (13 pe,
> delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 ,
> delayined (13 pe, delayed (13 pe, delayined (13 , delayined (13 pe,
> delayed (13 , delayined (13 p, delayed (13 pend, delayed (13 ,
> delayined (13 pe, dela
> ^-- these are flooding, but not quite as often.
>
> This leaves Xen/Xend in an unstable condition, I'm thinking the only
> way out is a reboot...

Thanks, I'll create a fix for xen-unstable which can later be backported to 3.0.2.

-- Keir
On 5 Apr 2006, at 19:32, Christopher S. Aker wrote:

> After an update and a reboot, and turning off soft lockup detection,
> I'm still getting zombie domains. It also appears that after this
> happens, no new block devices can be attached.

Okay, this issue should be fixed in both the -unstable and -3.0-testing trees. I pushed directly into the 3.0.2 release tree as the original blkback kernel-thread loop was very broken indeed. I'm sure the fix is a strict improvement. :-)

Look for changeset comment "Fix the blkif_schedule() kthread loop." when you pull -- that's the changeset that contains the fix.

-- Keir