-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

After 155 days of problem free uptime, I upgraded my 6-STABLE system the other day to the latest cvsup ... 3 days later, the whole thing hung solid with:

Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see tuning(7) and login.conf(5).

Stupid question: why isn't there some mechanism that prevents new processes from starting up, instead of locking up the whole server? I'm not asking for the evilness of Linux, where it arbitrarily kills off existing processes, but if maxproc is hit, why continue to try and start up new ones?

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy           Skype: hub.org           ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF5Csz4QvfyHIvDvMRAvriAJ48K+5X/YdY7YW13Ro8z/nVuca3cQCeIlYk
L8cLOgpzH4W4+tz6V8GVVqc=x/Ok
-----END PGP SIGNATURE-----

----- "Marc G. Fournier" <scrappy@freebsd.org> wrote:
> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see tuning(7) and login.conf(5).
>
> Stupid question: why isn't there some mechanism that prevents new processes
> from starting up, instead of locking up the whole server? I'm not asking for...

Isn't that what is happening? When maxproc is hit, new processes can't be created. It is harmless, except for the uid that exceeded its process limit.

I think the hang is a side effect. Either init can't fork a process, so there is nothing to log in to, or it is a deadlock. The latter would be a bug.

Did you try pinging the system remotely to see whether it really was a "solid" hang? Or did you just pound on the keyboard?

Tom

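To make the point above concrete, here is a minimal sketch of how "new processes can't be created" looks from inside a program: fork() fails with EAGAIN and nothing that is already running gets killed. The function name can_fork is made up for illustration; it is an assumption of this sketch, not something from the system discussed in the thread.

```python
import errno
import os


def can_fork() -> bool:
    """Try to create a short-lived child; return False if fork fails with EAGAIN."""
    try:
        pid = os.fork()
    except OSError as exc:
        if exc.errno == errno.EAGAIN:
            # The process table or the per-uid limit is full. Existing
            # processes are untouched; the new one was just not created.
            return False
        raise
    if pid == 0:
        # Child: exit at once, skipping any cleanup handlers.
        os._exit(0)
    # Parent: reap the child so the probe leaves no zombie behind.
    os.waitpid(pid, 0)
    return True
```

A login attempt in that state fails for the same reason: whatever has to fork to serve it (getty, sshd) gets the same error back.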
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Based on the suggestion by someone on this list, I set up a screen session with top running, to watch things ... again, after 3 days, the server goes 'out of process' ... this time, of course, I could get in to look around and kill off processes ...

From what I can tell, a process that all it does is:

ping -c 1 <host>

with a 300 sec timeout, run once a minute out of cron, started to 'run over top of' each other ... the host that it is pinging is on the same switch and has been running fine for 20 days now, and it wasn't until I did the last upgrade on the server causing the problems that these problems started ...

Coincidence? :)

I'm going to fix the script so that it doesn't try to run over itself ... anyone know of a problem with the fxp driver in 6-STABLE that might cause the ping to hang?

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer <fbsd-stable@mawer.org> wrote:

> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem free uptime, I upgraded my 6-STABLE system the
>> other day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server? I'm not asking
>> for the evilness of Linux, where it arbitrarily kills off existing
>> processes, but if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy           Skype: hub.org           ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Ofd4QvfyHIvDvMRAmoqAJ9ka8ZQxq0Ciidyy4R60bTmYfxeggCeLz7i
/De9C0Hmdqb22nErxhyUaZA=Seo0
-----END PGP SIGNATURE-----

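One common way to stop a once-a-minute cron job from running over itself is a non-blocking exclusive lock: if the previous run still holds the lock, the new run exits at once instead of piling up. The sketch below is only an illustration of that idea, not the script from this thread; LOCK_PATH, acquire_lock and run_once are hypothetical names, and the 300 sec timeout is left out.

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/pingcheck.lock"  # hypothetical location for the lock file


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_once(host, lock_path=LOCK_PATH):
    """Ping the host once unless an earlier run is still in flight."""
    lock = acquire_lock(lock_path)
    if lock is None:
        # Previous run has not finished; skip this one.
        return 0
    try:
        return subprocess.call(["ping", "-c", "1", host])
    finally:
        # Closing the file releases the lock.
        lock.close()
```

Called from cron as run_once("<host>"), a hung ping then costs one stuck process rather than one more every minute. It does not explain why the ping hangs in the first place.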
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't know how critical this is, but I just thought about it ... this is my only system running gmirror ... everything seems fine according to gmirror status, but maybe something is wrong there that I'm not seeing:

Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 removed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 created (id=2282154470).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 created (id=3089402334).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm created (id=2175292049).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 detected.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 created (id=1094782536).
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Force device vm start due to timeout.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm launched.

mirror/md1  COMPLETE  da1 da2
mirror/md2  COMPLETE  da3 da4
mirror/vm   DEGRADED  da5

I'm not using da5 right now, it's just in there ... went with a RAID1+0 vs RAID5 configuration ...

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy           Skype: hub.org           ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Qhk4QvfyHIvDvMRAhJ0AKDVibziN1W1TagIapB5GWN3+mbCGACdHd4w
dgT0Xi40Ie/pBeUMB8Pj1go=bSuI
-----END PGP SIGNATURE-----
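
A status listing like the one in this message can also be checked mechanically rather than by eye. The sketch below assumes the text has a mirror name in the first column and its status in the second, as pasted above; degraded_mirrors is a hypothetical helper, and the exact column layout of real gmirror status output may differ.

```python
def degraded_mirrors(status_text):
    """Return the names of mirrors whose status column is not COMPLETE."""
    bad = []
    for line in status_text.splitlines():
        fields = line.split()
        # Only lines that start with a mirror name carry a status; this
        # skips blank lines, a header row, and component-only lines.
        if len(fields) < 2 or not fields[0].startswith("mirror/"):
            continue
        name, status = fields[0], fields[1]
        if status != "COMPLETE":
            bad.append(name)
    return bad
```

Fed the three lines above, it reports only mirror/vm, which matches the single-provider mirror on da5.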