On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> Hi,
>
> For the past several months, we have had an intermittent problem where a
> process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> stuck in an infinite loop and goes to 100% cpu. We have just observed
> "fstat -m" do the same thing and suspect it may be the same problem.
>
> Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> ufs root and zfs /home.
>
> Has anyone else experienced this? Is there anything we can do to investigate
> the problem further?

Loops in libkvm consumers are often caused by the program reading kernel data
structures while they are changing. However, if you use sysctls to fetch this
data instead, you should be able to get a stable snapshot of the system state
without getting stuck in a possible loop. I believe for libkvm to use sysctl
instead of /dev/kmem you have to pass NULL for the kernel and "/dev/null" for
the core image.

fstat -m should be doing that by default, however, so if that is not the
cause, can you ktrace fstat while it is spinning to see whether it is spinning
in userland or in the kernel? If you see no activity via ktrace, then it is
spinning in one of the two places without making any system calls, etc. You
can attach to it with gdb to pause it, then see where gdb thinks it is. If gdb
hangs attaching to it, then it is stuck in the kernel. If gdb attaches to it
ok, then it is spinning in userland. Unfortunately, for gdb to be useful, you
really need debug symbols. We don't currently provide those for release
binaries or binaries provided via freebsd-update (though that is being worked
on for 11.0). If you build from source, then the
simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
rebuild your world without NO_CLEAN. If you are building from source and are
able to reproduce with those binaries, then after attaching to the process
with gdb, use 'bt' to see where it is hung and reply with that.
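
For reference, the sysctl-backed libkvm mode mentioned above (NULL for the
kernel, "/dev/null" for the core image) might look roughly like the sketch
below. The function name list_procs_via_sysctl and its FILE * parameter are
made up for illustration, the choice of KERN_PROC_PROC (one entry per process)
is an assumption about what the caller wants, and the __FreeBSD__ guard is
only there so the file still compiles on other systems. Treat it as a
starting point, not as what fstat itself does.

```c
/* Sketch: list processes through libkvm's sysctl backend (FreeBSD). */
#include <assert.h>
#include <stdio.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#endif

/*
 * Hypothetical helper: writes "pid<TAB>command" for every process to 'out'
 * and returns the process count, or -1 on error or on a non-FreeBSD system.
 */
int
list_procs_via_sysctl(FILE *out)
{
	assert(out != NULL);
#ifdef __FreeBSD__
	char errbuf[_POSIX2_LINE_MAX];
	struct kinfo_proc *kp;
	kvm_t *kd;
	int i, cnt;

	/* NULL kernel image + "/dev/null" core: use sysctl, not /dev/kmem. */
	kd = kvm_openfiles(NULL, "/dev/null", NULL, O_RDONLY, errbuf);
	if (kd == NULL) {
		fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
		return (-1);
	}

	/* The returned array is owned by kd and released by kvm_close(). */
	kp = kvm_getprocs(kd, KERN_PROC_PROC, 0, &cnt);
	if (kp == NULL) {
		fprintf(stderr, "kvm_getprocs: %s\n", kvm_geterr(kd));
		kvm_close(kd);
		return (-1);
	}

	for (i = 0; i < cnt; i++)
		fprintf(out, "%d\t%s\n", (int)kp[i].ki_pid, kp[i].ki_comm);

	kvm_close(kd);
	return (cnt);
#else
	fprintf(stderr, "libkvm sysctl backend is FreeBSD-specific\n");
	return (-1);
#endif
}
```

On FreeBSD this needs to be linked with -lkvm and called from a small main(),
e.g. list_procs_via_sysctl(stdout). If a program built this way never spins
while one using /dev/kmem does, that would point at the changing-data-structure
problem described above.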

If it is hanging in the kernel, then you will need to use the kernel debugger
to see where it is hanging. The simplest way to do this is probably to force
a crash via the debug.kdb.panic sysctl (set it to a non-zero value). You will
then need to fire up kgdb on the crash dump after it reboots, switch to the
fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.

--
John Baldwin