On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
> Hello List,
>
> I am running into a strange problem that points to a resource leak. The
> problem manifests itself after one of our remote systems has been up
> around 100 days.
> The symptom is that it appears no new processes can be spawned. If I try to
> ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
> Examining log files, like cron, etc show that when this happens no more
> entries are written into the cron log. The unit is acting as a firewall, router
> and vpn appliance these functions continue to work. We have a C
> application that is periodically started out of a shell script that
> reports various information about the system, it stops reporting, while
> vpns, ospf routing, and ipfilter firewalling continue to work and write
> into their logfiles.
>
> My question is how do I monitor the various resources in the system that
> could prevent the spawning of a new process?
Periodically logging "ps -auxw" output to a file would be useful; if
processes are leaking, you'd see the list get gradually longer over
time. It's possible you have many zombie processes as a result of a
parent which is not reaping its children (calling waitpid(2) or its
friends). Other things that might come in useful are "fstat" and
"vmstat -s".
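To make the logging idea concrete, here is a rough sketch in Python of
what a periodic snapshot could look like. It is only an illustration:
the function names and the log path argument are made up, and it
assumes the "ps -auxw" header contains a STAT column in which zombie
processes show a state beginning with "Z".

```python
import subprocess
import time


def count_zombies(ps_output):
    """Count rows whose STAT field starts with 'Z' (zombie).

    Assumes the first line is the ps header and that it has a STAT column.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return 0
    header = lines[0].split()
    stat_col = header.index("STAT")
    zombies = 0
    for line in lines[1:]:
        # Split into at most len(header) fields so COMMAND keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) > stat_col and fields[stat_col].startswith("Z"):
            zombies += 1
    return zombies


def append_snapshot(log_path):
    """Append a timestamped 'ps -auxw' listing to log_path (hypothetical path)."""
    output = subprocess.run(
        ["ps", "-auxw"], capture_output=True, text=True, check=True
    ).stdout
    # The header line is not a process, hence the "- 1".
    total = max(len(output.strip().splitlines()) - 1, 0)
    with open(log_path, "a") as log:
        log.write("=== %s procs=%d zombies=%d ===\n" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), total, count_zombies(output)))
        log.write(output)
```

Calling append_snapshot() from cron every hour or so would give a
history to compare once the box locks up. The same approach could be
applied to "fstat" and "vmstat -s" output.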
It sounds like your C program relies heavily on system() or execl() and
fork(), which is why it's affected, while the other programs are likely
kernel-level or long-running, so they never need to spawn a new process.
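For reference, the reaping pattern mentioned above looks roughly like
this. The sketch is in Python for brevity; os.fork() and os.waitpid()
wrap the fork(2) and waitpid(2) calls a C program would use. The
function names are invented for illustration, and this is not a claim
about how your application is actually structured.

```python
import os


def spawn_short_lived():
    """Fork a child that exits immediately; return its PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: exit at once. Until the parent waits, it stays a zombie.
        os._exit(0)
    return pid


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none are ready.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped
```

A parent that forks repeatedly and never does something like
reap_children() (typically from its main loop or a SIGCHLD handler)
will accumulate zombies until the process table or a per-user process
limit is exhausted.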
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |