Luke Crawford
2006-Sep-01 22:02 UTC
[Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
BUG: soft lockup detected on CPU#0!

Pid: 2213, comm: smbiod
EIP: 0061:[<f4990f2e>] CPU: 0
EIP is at smbiod+0x116/0x16d [smbfs]
EFLAGS: 00000246 Tainted: GF (2.6.16-xen-automount #1)
EAX: 00000000 EBX: f4996400 ECX: f2c99f68 EDX: f2c98000
ESI: f2c98000 EDI: c06f5780 EBP: f2c99fb8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f77000 CR3: 326e2000 CR4: 00000640
[<c0131bb3>] autoremove_wake_function+0x0/0x4b
[<c0104b5e>] ret_from_fork+0x6/0x10
[<c0131bb3>] autoremove_wake_function+0x0/0x4b
[<f4990e18>] smbiod+0x0/0x16d [smbfs]
[<c0102e6d>] kernel_thread_helper+0x5/0xb
smb_add_request: request [f26c6e80, mid=6567] timed out!
smb_lookup: find java/com failed, error=-5
smb_add_request: request [f26c6b80, mid=6566] timed out!
smb_add_request: request [f26c6080, mid=6568] timed out!

I get the same problem on -unstable. I can reliably reproduce the problem whenever I start a particular set of java programs that read files on the samba mount (the smbmount grabs the files off a windows PC)

this is a 3.0.2-2 2.6.16 kernel with a RHEL3 userland (with updated module-init-tools)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Sep-04 23:04 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Luke Crawford
> Sent: 01 September 2006 23:03
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
>
> BUG: soft lockup detected on CPU#0!
> Pid: 2213, comm: smbiod
> EIP: 0061:[<f4990f2e>] CPU: 0
> EIP is at smbiod+0x116/0x16d [smbfs]
> EFLAGS: 00000246 Tainted: GF (2.6.16-xen-automount #1)
> EAX: 00000000 EBX: f4996400 ECX: f2c99f68 EDX: f2c98000
> ESI: f2c98000 EDI: c06f5780 EBP: f2c99fb8 DS: 007b ES: 007b
> CR0: 8005003b CR2: b7f77000 CR3: 326e2000 CR4: 00000640
> [<c0131bb3>] autoremove_wake_function+0x0/0x4b
> [<c0104b5e>] ret_from_fork+0x6/0x10
> [<c0131bb3>] autoremove_wake_function+0x0/0x4b
> [<f4990e18>] smbiod+0x0/0x16d [smbfs]
> [<c0102e6d>] kernel_thread_helper+0x5/0xb
> smb_add_request: request [f26c6e80, mid=6567] timed out!
> smb_lookup: find java/com failed, error=-5
> smb_add_request: request [f26c6b80, mid=6566] timed out!
> smb_add_request: request [f26c6080, mid=6568] timed out!
>
> I get the same problem on -unstable. I can reliably reproduce the problem
> whenever I start a particular set of java programs that read files on the
> samba mount (the smbmount grabs the files off a windows PC)
>
> this is a 3.0.2-2 2.6.16 kernel with a RHEL3 userland (with updated
> module-init-tools)

I presume this is in a guest? As an experiment, try running it in dom0 and see what happens.

Are these SMP guests?

Are you sure the problem doesn't happen with native 2.6.16?

Thanks,
Ian
Luke Crawford
2006-Sep-04 23:07 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Tue, 5 Sep 2006, Ian Pratt wrote:

> I presume this is in a guest? As an experiment, try running it in dom0

I will try that on Tuesday. (The box is at a client's location, and I don't have off-site access.)

> Are these SMP guests?

Yes, this is an SMP guest.

> Are you sure the problem doesn't happen with native 2.6.16?

No. I am sure the problem doesn't happen in native 2.4 with the RHEL3 patches, though.
Ian Pratt
2006-Sep-04 23:16 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
> > Are you sure the problem doesn't happen with native 2.6.16?
>
> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
> patches, though.

I'll wager this is a native problem. Smbfs is deprecated these days, so you should probably be using cifs on modern kernels -- see /sbin/mount.cifs

Ian
Keir Fraser
2006-Sep-05 15:46 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
On 5/9/06 12:16 am, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:

>>> Are you sure the problem doesn't happen with native 2.6.16?
>>
>> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
>> patches, though.
>
> I'll wager this is a native problem. Smbfs is deprecated these days, so
> you should probably be using cifs on modern kernels -- see
> /sbin/mount.cifs

3.0.2-2 doesn't include the fix to the SEDF scheduler that prevents domain0 from taking all CPU time. Without that fix, domUs can be starved and hence trigger the softlockup warning. The tip of the 3.0.2 repository is a much better prospect, having lots of other bug fixes too.

 -- Keir
Luke Crawford
2006-Sep-05 16:37 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Tue, 5 Sep 2006, Keir Fraser wrote:

> 3.0.2-2 doesn't include the fix to SEDF scheduler to prevent domain0 from
> taking all CPU time. Without that, domU's can be starved and hence trigger
> the softlockup warning. The tip of 3.0.2 repository is a much better
> prospect, having lots of other bug fixes too.

I installed 3-unstable, and was able to reproduce the problem.
Luke Crawford
2006-Sep-06 22:11 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
It appears that you are correct; simply changing the mount from smbfs to cifs made the problem go away. (That caused some permissions issues, but those were easy enough to hack around.)

Thanks! Two guys have been working on this for a week, and the problem is now solved and explained.

On Tue, 5 Sep 2006, Ian Pratt wrote:

> Date: Tue, 5 Sep 2006 00:16:50 +0100
> From: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
> To: Luke Crawford <lsc@prgmr.com>
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
>
>>> Are you sure the problem doesn't happen with native 2.6.16?
>>
>> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
>> patches, though.
>
> I'll wager this is a native problem. Smbfs is deprecated these days, so
> you should probably be using cifs on modern kernels -- see
> /sbin/mount.cifs
>
> Ian
Luke Crawford
2006-Sep-16 01:16 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
Upgrading from SMB to CIFS looked like it fixed the problem for around a week. It's back, but nobody can reproduce it yet.

I had them on the 3.0.3 hg clone for a while, but as soon as they figured out that the problem appeared to be fixed by the SMB->CIFS fix, they moved back to 3.0.2-2. This error occurred on 3.0.2-2.

If I can get them to reproduce the error, I will test with 3.0-testing

Now, this is a bug, right? this isn't just one of the VMs being heavily used? because the VM in question is running a massive Java app that would stress the server even if I ran it native.

Pid: 23136, comm: csh
EIP: 0061:[<c0168c05>] CPU: 1
EIP is at generic_fillattr+0x6d/0xa4
EFLAGS: 00000202 Tainted: GF (2.6.16-xen-automount #1)
EAX: 0000448b EBX: 00000000 ECX: 0000448b EDX: 00000001
ESI: 00000000 EDI: d113e678 EBP: cc1a3f64 DS: 007b ES: 007b
CR0: 8005003b CR2: 08127000 CR3: 30abb000 CR4: 00000640
[<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
[<c0168e17>] vfs_fstat+0x33/0x44
[<c016949b>] sys_fstat64+0x18/0x36
[<c01541ff>] get_swap_page+0xbf/0x270
[<c015e900>] sys_open+0x27/0x2b
[<c0104be1>] syscall_call+0x7/0xb
CIFS VFS: No response for cmd 50 mid 46953

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs
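For anyone reading the error values in these logs: the CIFS message prints the return code as an unsigned 32-bit hex number, while the earlier smbfs message prints it as a signed decimal (error=-5). Below is a minimal sketch of the conversion, assuming the value is an ordinary negative kernel errno. The helper names to_signed32 and describe are made up for illustration. The symbolic name comes from the errno table of whatever machine runs the script, so it only corresponds to the guest's meaning when run on Linux.

```python
import errno


def to_signed32(value):
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe(raw):
    """Return (signed_code, errno_name) for a raw value taken from a kernel log.

    The name lookup uses the local platform's errno table (an assumption:
    it only matches the guest kernel when this runs on Linux).
    """
    code = to_signed32(raw)
    if code >= 0:
        return code, "no error"
    return code, errno.errorcode.get(-code, "unknown")


if __name__ == "__main__":
    for raw in (0xFFFFFF90, -5):
        code, name = describe(raw)
        print(f"raw={raw & 0xFFFFFFFF:#010x} signed={code} name={name}")
```

0xffffff90 works out to -112. On Linux, errno 112 is EHOSTDOWN and errno 5 is EIO, which is consistent with the server not answering ("No response for cmd 50") rather than with a problem in the file itself.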
Ian Pratt
2006-Sep-16 01:23 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
> Upgrading from SMB to CIFS looked like it fixed the problem for around a
> week. It's back, but nobody can reproduce it yet.
>
> I had them on the 3.0.3 hg clone for a while, but as soon as they figured
> out that the problem appeared to be fixed by the SMB->CIFS fix, they moved
> back to 3.0.2-2. This error occurred on 3.0.2-2.
>
> If I can get them to reproduce the error, I will test with 3.0-testing
>
> Now, this is a bug, right? this isn't just one of the VMs being heavily
> used? because the VM in question is running a massive Java app that
> would stress the server even if I ran it native.

It's hard to tell for sure, but this doesn't look to me like a xen issue. It may be possible to repro on an equivalent native kernel.

Was the guest still pingable or did it crash?

Ian

> Pid: 23136, comm: csh
> EIP: 0061:[<c0168c05>] CPU: 1
> EIP is at generic_fillattr+0x6d/0xa4
> EFLAGS: 00000202 Tainted: GF (2.6.16-xen-automount #1)
> EAX: 0000448b EBX: 00000000 ECX: 0000448b EDX: 00000001
> ESI: 00000000 EDI: d113e678 EBP: cc1a3f64 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08127000 CR3: 30abb000 CR4: 00000640
> [<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
> [<c0168e17>] vfs_fstat+0x33/0x44
> [<c016949b>] sys_fstat64+0x18/0x36
> [<c01541ff>] get_swap_page+0xbf/0x270
> [<c015e900>] sys_open+0x27/0x2b
> [<c0104be1>] syscall_call+0x7/0xb
> CIFS VFS: No response for cmd 50 mid 46953
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
Luke Crawford
2006-Sep-16 01:34 UTC
RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Sat, 16 Sep 2006, Ian Pratt wrote:

>> Now, this is a bug, right? this isn't just one of the VMs being heavily
>> used? because the VM in question is running a massive Java app that
>> would stress the server even if I ran it native.
>
> It's hard to tell for sure, but this doesn't look to me like a xen
> issue. It may be possible to repro on an equivalent native kernel.
> Was the guest still pingable or did it crash?
>
> Ian

Completely unpingable. The console was also dead; nobody tried the xen console. (I just set up a better reboot procedure for my hosting company; I need to set up something similar here so that we don't lose the data we need to figure this out.)

Where should I start looking to find out exactly what "bug: soft lockup on cpu0" means? linux source/docs? or Xen source/docs?
Keir Fraser
2006-Sep-16 16:23 UTC
Re: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
On 16/9/06 2:34 am, "Luke Crawford" <lsc@prgmr.com> wrote:

> completely unpingable. console was also dead, nobody tried the xen
> console. (I just setup a better reboot procedure for my hosting company;
> I need to setup something similar here so that we don't lose the data we
> need to figure this out.)
>
> Where should I start looking to find out exactly what "bug: soft lockup on
> cpu0" means? linux source/docs? or Xen source/docs?

The watchdog code runs a kernel thread on every CPU. This is supposed to wake up every second and update a per-CPU counter. A hook from the timer interrupt checks the per-CPU counter and prints a softlockup warning if the counter is not updated for 10 seconds.

3.0.2-2 is known to be susceptible to softlockups because the Xen scheduler will starve domains to run domain0. It's not clear if that's what is happening here, but you need to repro on tip of xen-3.0-testing to find out one way or the other. Because of the number of bug fixes since 3.0.2-3 we don't recommend running any old releases of 3.0.2.

 -- Keir
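The watchdog mechanism described above can be illustrated with a small user-space simulation. This is only a sketch of the idea, not the kernel's code: the class SoftLockupDetector, its methods, and the simulate helper are invented for illustration, time is a simple integer of seconds, and the model assumes the timer hook keeps running while the watchdog thread is starved.

```python
SOFTLOCKUP_THRESHOLD = 10  # seconds without an update before warning (from the description)


class SoftLockupDetector:
    """Toy model: one timestamp per CPU, updated by a 'watchdog thread'."""

    def __init__(self, ncpus):
        self.last_touch = [0] * ncpus   # per-CPU counter (time of last update)
        self.warned = [False] * ncpus   # avoid repeating the warning every tick

    def watchdog_thread_ran(self, cpu, now):
        # What the per-CPU kernel thread does each time it gets scheduled.
        self.last_touch[cpu] = now
        self.warned[cpu] = False

    def timer_tick(self, cpu, now):
        # What the hook from the timer interrupt does: compare and maybe warn.
        stale = now - self.last_touch[cpu] > SOFTLOCKUP_THRESHOLD
        if stale and not self.warned[cpu]:
            self.warned[cpu] = True
            return f"BUG: soft lockup detected on CPU#{cpu}!"
        return None


def simulate(starved_from, starved_until, total_seconds=30):
    """Run one CPU for total_seconds; the watchdog thread cannot run while starved."""
    detector = SoftLockupDetector(ncpus=1)
    warnings = []
    for now in range(total_seconds + 1):
        if not (starved_from <= now < starved_until):
            detector.watchdog_thread_ran(0, now)
        message = detector.timer_tick(0, now)
        if message:
            warnings.append((now, message))
    return warnings


if __name__ == "__main__":
    print(simulate(5, 20))  # thread starved for 15 seconds
    print(simulate(5, 10))  # thread starved for 5 seconds
```

The point of the model is that the warning says only that the watchdog thread did not get to run for a while. It does not say why, so a starved guest vCPU and a kernel thread spinning inside a filesystem driver would both produce the same message.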
john maclean
2006-Nov-02 14:50 UTC
[Xen-devel] Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
Keir Fraser <Keir.Fraser <at> cl.cam.ac.uk> writes:

>
> On 16/9/06 2:34 am, "Luke Crawford" <lsc <at> prgmr.com> wrote:
>
> > completely unpingable. console was also dead, nobody tried the xen
> > console. (I just setup a better reboot procedure for my hosting company;
> > I need to setup something similar here so that we don't lose the data we
> > need to figure this out.)
> >
> > Where should I start looking to find out exactly what "bug: soft lockup on
> > cpu0" means? linux source/docs? or Xen source/docs?
>
> The watchdog code runs a kernel thread on every CPU. This is supposed to
> wake up every second and update a per-CPU counter. A hook from the timer
> interrupt checks the per-CPU counter and prints a softlockup warning if the
> counter is not updated for 10 seconds.
>
> 3.0.2-2 is known to be susceptible to softlockups because the Xen scheduler
> will starve domains to run domain0. It's not clear if that's what is
> happening here, but you need to repro on tip of xen-3.0-testing to find out
> one way or the other. Because of the number of bug fixes since 3.0.2-3 we
> don't recommend running any old releases of 3.0.2.
>
> -- Keir
>

I also get soft lockup warnings in my Xen domU. I'd really love to be able to determine the source of the error(s) and perhaps fix them myself. I'm not a kernel hacker and my C is rather flaky, but can anyone point me to some docs on how to interpret data from:

Pausing... 5<3>BUG: soft lockup detected on CPU#0!

Pid: 1, comm: init
EIP: 0061:[<c0107c64>] CPU: 0
EIP is at delay_tsc+0x14/0x20
EFLAGS: 00000287 Not tainted (2.6.16-xen #1)
EAX: 79d31a46 EBX: 000c74e4 ECX: 79c788c9 EDX: 00004616
ESI: 00000005 EDI: c0112520 EBP: bff6c010 DS: 007b ES: 007b
CR0: 8005003b CR2: 431ea00c CR3: 003e3000 CR4: 00000640
[<c011264d>] do_fixup_4gb_segment+0x12d/0x160
[<c0113fa0>] do_page_fault+0x4a0/0x7ac
[<c03de17c>] icmp_init+0xdc/0x110
[<c03de284>] inet_init+0x74/0x380
[<c03de17c>] icmp_init+0xdc/0x110
[<c0105243>] error_code+0x2b/0x30
Continuing...

Are there any local docs, tools or dirs (on my machine) or URLs that anyone can point me to?

- jm
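As a starting point for reading such a trace: each bracketed entry is a code address, followed by the symbol it falls in, the offset into that symbol, and the symbol's total size, with the owning module in square brackets when the code is not built into the kernel. The sketch below pulls those fields apart. The names FRAME_RE and parse_frames are made up for illustration, and the regular expression is an assumption that covers only the frame format seen in this thread, not every kernel version.

```python
import re

# [<address>] symbol+0xoffset/0xsize, optionally followed by [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in a pasted trace."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr = int(match["addr"], 16)
        offset = int(match["offset"], 16)
        frames.append({
            "addr": addr,
            "symbol": match["symbol"],
            "offset": offset,
            "size": int(match["size"], 16),
            "module": match["module"],   # None for built-in kernel code
            "symbol_start": addr - offset,  # where the function begins
        })
    return frames


if __name__ == "__main__":
    sample = """
    [<c011264d>] do_fixup_4gb_segment+0x12d/0x160
    [<c0113fa0>] do_page_fault+0x4a0/0x7ac
    [<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
    """
    for frame in parse_frames(sample):
        where = frame["module"] or "kernel"
        print(f"{frame['symbol']} ({where}): "
              f"{frame['offset']:#x} bytes into a {frame['size']:#x}-byte function")
```

The "EIP is at" line uses the same symbol+offset/size notation and names the instruction that was executing when the warning fired; the bracketed entries below the registers are addresses found on the stack, so some of them can be stale rather than true callers.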