thr3ads.net - freebsd stable - Some days, it doesn't pay to upgrade ... [Feb 2007]

Marc G. Fournier

2007-Feb-27 13:25 UTC

Some days, it doesn't pay to upgrade ...

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


After 155 days of problem free uptime, I upgraded my 6-STABLE system the other 
day to the latest cvsup ... 3 days later, the whole thing hung solid with:


Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see 
tuning(7) and login.conf(5).
Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see 
tuning(7) and login.conf(5).

Stupid question: why isn't there some mechanism that prevents new processes
from starting up, instead of locking up the whole server?  I'm not asking for
the evilness of Linux, where it arbitrarily kills off existing processes, but
if maxproc is hit, why continue to try and start up new ones?

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF5Csz4QvfyHIvDvMRAvriAJ48K+5X/YdY7YW13Ro8z/nVuca3cQCeIlYk
L8cLOgpzH4W4+tz6V8GVVqc=x/Ok
-----END PGP SIGNATURE-----

Tom Samplonius

2007-Feb-28 04:16 UTC

----- "Marc G. Fournier" <scrappy@freebsd.org> wrote:
> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
> tuning(7) and login.conf(5).
> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
> tuning(7) and login.conf(5).
>
> Stupid question: why isn't there some mechanism that prevents new processes
> from starting up, instead of locking up the whole server?  I'm not
> asking for ...

  Isn't that what is happening?  When maxproc is hit, new processes
can't be created.  It is harmless, except for the uid that exceeded its
process limit.

  I think the hang is some side-effect.  Perhaps init can't fork a process,
so there is nothing to log in to.  Did you try pinging the system from a
remote host to see whether it was really a "solid" hang?  Or did you
just pound on the keyboard?

  Or it is just a deadlock.  That would be a bug.

Tom

Marc G. Fournier

2007-Mar-03 03:13 UTC

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Based on the suggestion by someone on this list, I set up a screen session with
top running, to watch things ... again, after 3 days, the server goes 'out of
process' ... this time, of course, I could get in to look around and kill off
processes ...

From what I can tell, a cron job that does nothing but:

ping -c 1 <host>

with a 300 sec timeout, run once a minute, started to 'run over top of' itself
... the host that it is pinging is on the same switch and has been running
fine for 20 days now, and it wasn't until I did the last upgrade on the server
causing the problems that these problems started ...

Coincidence? :)

I'm going to fix the script so that it doesn't try to run over itself ...
anyone know of a problem with the fxp driver in 6-STABLE that might cause the
ping to hang?
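
[Editorial sketch, not part of the original message.] One common way to keep a once-a-minute cron job from piling up on itself is to take a non-blocking exclusive lock before doing any work, and to give the command a timeout shorter than the cron interval. The Python below is a minimal sketch under those assumptions; the function name `run_exclusive`, the lock path, and the timeout value are all hypothetical and are not taken from the script discussed in the thread.

```python
import fcntl
import os
import subprocess


def run_exclusive(lock_path, command, timeout_s):
    """Run command unless a previous run still holds lock_path.

    Returns the command's exit status, "busy" if the lock is already
    held, or "timeout" if the command exceeded timeout_s seconds.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "busy"  # previous run still active; skip this round
        try:
            return subprocess.run(command, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before raising.
            return "timeout"
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

A cron entry could then call something like `run_exclusive("/var/run/pingcheck.lock", ["ping", "-c", "1", host], 50)`, where the path and the 50-second limit are placeholders. With the lock in place, a hung ping costs at most one stuck process rather than one per minute.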

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer 
<fbsd-stable@mawer.org> wrote:
> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem free uptime, I upgraded my 6-STABLE system the
>> other day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server?  I'm not asking
>> for the evilness of Linux, where it arbitrarily kills off existing
>> processes, but if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!
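
[Editorial sketch, not part of the original message.] The fork()-probe idea quoted above could look roughly like the Python below. It assumes that fork(2) fails with EAGAIN (process limit reached) or ENOMEM when no new process can be created. The names `can_fork` and `watch`, the interval, and the `on_failure` callback are hypothetical; what to dump and whether to reboot are left to the caller.

```python
import errno
import os
import time


def can_fork():
    """Return True if a short-lived child could be created, else False."""
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN: process limit reached; ENOMEM: no memory/swap for a child.
        if exc.errno in (errno.EAGAIN, errno.ENOMEM):
            return False
        raise
    if pid == 0:
        os._exit(0)  # child: exit at once without running cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    return True


def watch(interval_s, on_failure, probe=can_fork, rounds=None):
    """Call probe every interval_s seconds; call on_failure when it fails.

    rounds=None loops forever; an integer limits the number of probes.
    """
    done = 0
    while rounds is None or done < rounds:
        if not probe():
            on_failure()
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Note that `on_failure` has to work without creating a process itself (for example, by appending to an already open log file), since by definition no slot is available when it fires.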


- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Ofd4QvfyHIvDvMRAmoqAJ9ka8ZQxq0Ciidyy4R60bTmYfxeggCeLz7i
/De9C0Hmdqb22nErxhyUaZA=Seo0
-----END PGP SIGNATURE-----

Marc G. Fournier

2007-Mar-03 05:32 UTC

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't know how critical this is, but I just thought about it ... this is my
only system running gmirror ... everything seems fine according to gmirror
status, but maybe something is wrong there I'm not seeing:

Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm 
destroyed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm destroyed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 
destroyed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 destroyed.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 removed from md0.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 removed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 
destroyed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 destroyed.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 removed from md0.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 destroyed.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 created (id=2282154470).
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 detected.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 detected.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 
launched.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 created (id=3089402334).
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 detected.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 detected.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 
launched.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm created (id=2175292049).
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 detected.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 created (id=1094782536).
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 attached to md0.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 attached to md0.
Mar  3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Force device vm start due to timeout.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 activated.
Mar  3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm 
launched.

mirror/md1  COMPLETE  da1
                      da2
mirror/md2  COMPLETE  da3
                      da4
 mirror/vm  DEGRADED  da5

I'm not using da5 right now, it's just in there ... went with a RAID1+0 vs
RAID5 configuration ...


- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Qhk4QvfyHIvDvMRAhJ0AKDVibziN1W1TagIapB5GWN3+mbCGACdHd4w
dgT0Xi40Ie/pBeUMB8Pj1go=bSuI
-----END PGP SIGNATURE-----
