thr3ads.net - freebsd stable - Suspected libkvm infinite loop [Mar 2015]

John Baldwin

2015-Mar-10 18:10 UTC

Suspected libkvm infinite loop

On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> Hi,
> 
> For the past several months, we have had an intermittent problem where a
> process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> stuck in an infinite loop and goes to 100% cpu. We have just observed
> "fstat -m" do the same thing and suspect it may be the same problem.
> 
> Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> ufs root and zfs /home.
> 
> Has anyone else experienced this? Is there anything we can do to investigate
> the problem further?

Loops in programs using libkvm are often caused by reading kernel data
structures while they are changing.  However, if you use sysctls to fetch
this data instead, you should be able to get a stable snapshot of the system
state without getting stuck in a possible loop.  I believe for libkvm to use
sysctl instead of /dev/kmem you have to pass NULL for the kernel and
"/dev/null" for the core image.  fstat -m should be doing that by default,
however, so if it is not that, can you ktrace fstat when it is spinning to see
if it is spinning in userland or in the kernel?  If you see no activity via
ktrace, then it is spinning in one of the two places without making any system
calls, etc.  You can attach to it with gdb to pause it, then see where gdb
thinks it is.  If gdb hangs attaching to it, then it is stuck in the kernel.
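
As a rough illustration of the NULL plus "/dev/null" combination mentioned
above, here is a minimal C sketch.  The function name count_procs_via_sysctl
is hypothetical, the choice of KERN_PROC_PROC is an assumption, and the
non-FreeBSD stub exists only so the file compiles elsewhere.  On FreeBSD it
would need to be linked with -lkvm.

```c
#include <stdio.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>

/* Hypothetical helper: count processes through libkvm without touching
 * /dev/kmem.  Returns the process count, or -1 on error. */
int count_procs_via_sysctl(void)
{
    char errbuf[_POSIX2_LINE_MAX];
    int cnt = 0;

    /* NULL kernel image and "/dev/null" as the core file, as suggested
     * in the message above. */
    kvm_t *kd = kvm_openfiles(NULL, "/dev/null", NULL, O_RDONLY, errbuf);
    if (kd == NULL) {
        fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
        return -1;
    }

    /* KERN_PROC_PROC (one entry per process) is an assumption here. */
    if (kvm_getprocs(kd, KERN_PROC_PROC, 0, &cnt) == NULL) {
        fprintf(stderr, "kvm_getprocs: %s\n", kvm_geterr(kd));
        cnt = -1;
    }

    kvm_close(kd);
    return cnt;
}
#else
/* libkvm as used here is FreeBSD-specific; this stub only keeps the
 * sketch compilable on other systems. */
int count_procs_via_sysctl(void)
{
    fprintf(stderr, "libkvm is not available on this platform\n");
    return -1;
}
#endif
```

A caller would simply check the return value: a positive count means the
handle was opened and the process list was fetched.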

If gdb attaches to it ok, then it is spinning in userland.  Unfortunately, for
gdb to be useful, you really need debug symbols.  We don't currently provide
those for release binaries or binaries provided via freebsd-update (though
that is being worked on for 11.0).  If you build from source, then the
simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
rebuild your world without NO_CLEAN.  If you are building from source and are
able to reproduce with those binaries, then after attaching to the process
with gdb, use 'bt' to see where it is hung and reply with that.

If it is hanging in the kernel, then you will need to use the kernel debugger
to see where it is hanging.  The simplest way to do this is probably to force
a crash via the debug.kdb.panic sysctl (set it to a non-zero value).  You will
then need to fire up kgdb on the crash dump after it reboots, switch to the
fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.

-- 
John Baldwin

Mark Johnston

2015-Mar-10 21:59 UTC

Suspected libkvm infinite loop

On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> > Hi,
> > 
> > For the past several months, we have had an intermittent problem where a
> > process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> > stuck in an infinite loop and goes to 100% cpu. We have just observed
> > "fstat -m" do the same thing and suspect it may be the same problem.
> > 
> > Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox,
> > with ufs root and zfs /home.
> > 
> > Has anyone else experienced this? Is there anything we can do to
> > investigate the problem further?
> 
> Often loops using libkvm are due to programs using libkvm are trying to read
> kernel data structures while they are changing.  However, if you use sysctls
> to fetch this data instead, you should be able to get a stable snapshot of
> the system state without getting stuck in a possible loop.  I believe for
> libkvm to use sysctl instead of /dev/kmem you have to pass a NULL for the
> kernel and "/dev/null" for the core image.  fstat -m should be doing that by
> default however, so if it is not that, can you ktrace fstat when it is
> spinning to see if it is spinning userland or in the kernel?  If you see no
> activity via ktrace, then it is spinning in one of the two places without
> making any system calls, etc.  You can attach to it with gdb to pause it,
> then see where gdb thinks it is.  If gdb hangs attaching to it, then it is
> stuck in the kernel.
> 
> If gdb attaches to it ok, then it is spinning in userland.  Unfortunately,
> for gdb to be useful, you really need debug symbols.  We don't currently
> provide those for release binaries or binaries provided via freebsd-update
> (though that is being worked on for 11.0).  If you build from source, then
> the simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to
> /etc/src.conf and rebuild your world without NO_CLEAN.  If you are building
> from source and are able to reproduce with those binaries, then after
> attaching to the process with gdb, use 'bt' to see where it is hung and
> reply with that.
> 
> If it is hanging in the kernel, then you will need to use the kernel
> debugger to see where it is hanging.  The simplest way to do this is
> probably to force a crash via the debug.kdb.panic sysctl (set it to a
> non-zero value).  You will then need to fire up kgdb on the crash dump after
> it reboots, switch to the fstat process via the 'proc <pid>' command and get
> a backtrace via 'bt'.

It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
sbuf error return value could bubble up and be treated as ERESTART,
resulting in a loop.
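
The mechanism described above can be modelled in a few lines of userland C.
This is only a toy sketch, not the kernel code: the function names, the
attempt cap, and the "fixed" branch that maps the failure to ENOMEM are all
assumptions made for illustration, and the actual change in r272566 may look
different.  It relies on ERESTART being defined as -1 in FreeBSD's kernel
headers, and assumes the buffer helper reports failure as a bare -1.

```c
#include <stddef.h>

#define MODEL_ERESTART (-1)  /* FreeBSD's kernel defines ERESTART as -1 */
#define MODEL_ENOMEM   12    /* conventional ENOMEM value */

/* Hypothetical stand-in for a sysctl handler whose buffer helper fails
 * with -1 when the caller's buffer is too small. */
static int model_handler(size_t buflen, size_t needed, int fixed)
{
    if (buflen >= needed)
        return 0;
    /* Buggy path lets the raw -1 escape; the assumed fix converts it
     * into a real errno value instead. */
    return fixed ? MODEL_ENOMEM : -1;
}

/* Hypothetical syscall layer: re-issues the call whenever the handler's
 * return value equals ERESTART.  Returns the number of attempts made,
 * capped at max_attempts so the model itself terminates. */
int model_syscall_attempts(size_t buflen, size_t needed, int fixed,
                           int max_attempts)
{
    int attempts = 0;
    int error;

    do {
        attempts++;
        error = model_handler(buflen, needed, fixed);
    } while (error == MODEL_ERESTART && attempts < max_attempts);

    return attempts;
}
```

With a too-small buffer the buggy path keeps retrying until it hits the cap,
while the assumed fixed path, or a large enough buffer, returns after a
single attempt.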

This can be confirmed with something like

  dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>

If the output consists solely of __sysctl, this bug is likely the culprit.

-Mark
