thr3ads.net - CentOS - [CentOS] Server locking up everyday around 3:30 AM [Mar 2011]

m.roth at 5-cent.us

2011-Mar-11 18:06 UTC

[CentOS] Server locking up everyday around 3:30 AM

PJ wrote:
> This may or may not be CentOS related, but am out of ideas at this point
> and wanted to bounce this off the list.
>
> I'm running a CentOS 5.5 server, running the latest kernel
> 2.6.18-194.32.1.el5.
>
> Almost everyday around 3:30 AM the server completely locks up and has to
> be power cycled before it will come back online.
> (this means someone has to wake up and reboot the server, oh how I love
> being an internet janitor! :)

Please log off the Internet. We are cleaning it. You may log back on later.

<snip>
> I was able to pull this from /var/log/messages, this happens just
> seconds before locking up completely...
>
> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
> Mar  8 03:33:19 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar  8 03:33:19 web1 kernel: wget          D ffff810001004420     0 13608  13607                     (NOTLB)
> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bc78 0000000000000086 ffff81007bc7bd88 ffff81000100d3f8
> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bbf0 0000000000000007 ffff8100849db0c0 ffffffff80308b60
> Mar  8 03:33:19 web1 kernel:  00013a2964cdf439 0000000000003237 ffff8100849db2a8 0000000064c82eae
> Mar  8 03:33:19 web1 kernel: Call Trace:
> Mar  8 03:33:20 web1 kernel:  [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
<snip>
Anyone else smell an OOM killer? But it's clearly whatever the wget's
after that's killing the system.

           mark

2011-Mar-11 18:20 UTC

On Fri, Mar 11, 2011 at 10:06 AM, <m.roth at 5-cent.us> wrote:
> PJ wrote:
>> This may or may not be CentOS related, but am out of ideas at this point
>> and wanted to bounce this off the list.
>>
>> I'm running a CentOS 5.5 server, running the latest kernel
>> 2.6.18-194.32.1.el5.
>>
>> Almost everyday around 3:30 AM the server completely locks up and has to
>> be power cycled before it will come back online.
>> (this means someone has to wake up and reboot the server, oh how I love
>> being an internet janitor! :)
>
> Please log off the Internet. We are cleaning it. You may log back on later.
>
> <snip>
>> I was able to pull this from /var/log/messages, this happens just
>> seconds before locking up completely...
>>
>> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
>> Mar  8 03:33:19 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar  8 03:33:19 web1 kernel: wget          D ffff810001004420     0 13608  13607                     (NOTLB)
>> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bc78 0000000000000086 ffff81007bc7bd88 ffff81000100d3f8
>> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bbf0 0000000000000007 ffff8100849db0c0 ffffffff80308b60
>> Mar  8 03:33:19 web1 kernel:  00013a2964cdf439 0000000000003237 ffff8100849db2a8 0000000064c82eae
>> Mar  8 03:33:19 web1 kernel: Call Trace:
>> Mar  8 03:33:20 web1 kernel:  [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
> <snip>
> Anyone else smell an OOM killer? But it's clearly whatever the wget's
> after that's killing the system.
>
>           mark
>
What makes no sense to me is this runs every 5 minutes all day, but
only around 3:30 AM does it lock up.

There is nothing in the log that suggests the kernel is having to kill
processes because it is out of resources.

No "httpd invoked oom-killer" etc... which I have seen before in other
situations.

http://bugs.centos.org/view.php?id=4515 sounds like what I have going
on, but not with kjournald of course...

Thanks,

--
PJ
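
One way to check both observations above (the hangs cluster around 3:30 AM, and no oom-killer lines appear) is to tally the hung-task warnings in /var/log/messages by hour. The following is only a rough Python sketch and not something posted in the thread: the name `summarize` and the regular expression are assumptions modelled on the log lines quoted above, and the script assumes each syslog entry sits on a single line.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
# Group 1 is the hour, group 2 the task name.
HUNG_TASK = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel: INFO: task "
    r"(\S+):\d+ blocked for more than \d+ seconds"
)


def summarize(lines):
    """Count hung-task warnings per (hour, task) and collect oom-killer lines."""
    hung_by_hour = Counter()
    oom_lines = []
    for line in lines:
        match = HUNG_TASK.match(line)
        if match:
            hung_by_hour[(match.group(1), match.group(2))] += 1
        elif "invoked oom-killer" in line:
            oom_lines.append(line.rstrip("\n"))
    return hung_by_hour, oom_lines


def report(path="/var/log/messages"):
    """Print the tally for a syslog file (usually needs root to read)."""
    with open(path) as handle:
        hung_by_hour, oom_lines = summarize(handle)
    for (hour, task), count in sorted(hung_by_hour.items()):
        print("%s:00  %-12s %d hung-task warnings" % (hour, task, count))
    print("oom-killer lines found: %d" % len(oom_lines))
```

Calling `report()` on the server prints one row per hour and task. If every row falls in the 03:00 hour and the oom-killer count is zero, that supports looking at whatever else is scheduled at that time rather than at memory exhaustion.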

Steve Thompson

2011-Mar-11 19:05 UTC

> PJ wrote:
>> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.

Check the number of dirty pages:

 	grep Dirty /proc/meminfo

relative to the dirty_ratio setting:

 	cat /proc/sys/vm/dirty_ratio

to see if the system is going into synchronous flush mode around that time
(especially if dirty_ratio is large and you have a lot of physical
memory). This is what I usually see as the cause of the "blocked for more
than" message. I've also found that it can be several minutes, and up to
20 minutes, before the system recovers (but recover it always does).

-Steve
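
The comparison Steve describes can be scripted so that it can be logged every minute in the run-up to 3:30 AM. This is a hedged Python sketch, not part of his message: the function names and the 0.8 warning margin are made up for illustration, and measuring Dirty against MemTotal is only an approximation, since the kernel applies dirty_ratio to its own notion of available memory.

```python
import re


def meminfo_kb(text, field):
    """Return one /proc/meminfo field (e.g. 'Dirty') as an integer number of kB."""
    match = re.search(r"^%s:\s+(\d+)\s+kB" % re.escape(field), text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def dirty_percent(meminfo_text):
    """Dirty pages as a percentage of MemTotal (approximation, see lead-in)."""
    dirty = meminfo_kb(meminfo_text, "Dirty")
    total = meminfo_kb(meminfo_text, "MemTotal")
    return 100.0 * dirty / total


def near_sync_flush(meminfo_text, dirty_ratio, margin=0.8):
    """True when dirty memory is within `margin` of the dirty_ratio threshold.

    `margin` is an arbitrary illustrative choice, not a kernel parameter.
    """
    return dirty_percent(meminfo_text) >= margin * dirty_ratio


def report():
    """Read the live values from /proc and print a one-line summary."""
    with open("/proc/meminfo") as handle:
        meminfo_text = handle.read()
    with open("/proc/sys/vm/dirty_ratio") as handle:
        dirty_ratio = int(handle.read().strip())
    print("dirty=%.1f%% of MemTotal, dirty_ratio=%d%%, near threshold: %s" % (
        dirty_percent(meminfo_text), dirty_ratio,
        near_sync_flush(meminfo_text, dirty_ratio)))
```

Running `report()` from a per-minute cron job between 3:00 and 4:00 AM would show whether the dirty percentage climbs toward dirty_ratio just before the hang.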

m.roth at 5-cent.us

2011-Mar-11 22:33 UTC

PJ wrote:
> On Fri, Mar 11, 2011 at 11:05 AM, Steve Thompson <smt at vgersoft.com> wrote:
>>
>>> PJ wrote:
>>>> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
<snip>
> Great replies from everyone, I really appreciate the feedback.
>
> Interesting entries in /var/log/cron:
<snip>
> Mar 11 03:01:01 web1 crond[13613]: (root) CMD (run-parts /etc/cron.hourly)
> Mar 11 03:07:20 web1 crond[13727]: (webuser) error: Job execution of per-minute job scheduled for 03:05 delayed into subsequent minute 03:07. Skipping job run.
> Mar 11 03:07:20 web1 crond[13727]: CRON (webuser) ERROR: cannot set security context
<snip>
SELINUX! Look at /var/log/messages for an selinux error: if you don't have
sealert, install the package, then use it on /var/log/audit.

Or put selinux in permissive mode.

        mark
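
If sealert is not at hand, a quick first pass is to list SELinux denials from the audit log and see whether any involve crond around 03:07. The Python sketch below is an illustration only and not from the thread: it assumes the audit log lives at /var/log/audit/audit.log and that denial records follow the usual `type=AVC msg=audit(EPOCH.MS:SERIAL): avc:  denied  { ... } ... comm="..."` layout; the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Captures: epoch seconds, the denied permissions, and the command name.
AVC_DENIED = re.compile(
    r'^type=AVC msg=audit\((\d+)\.\d+:\d+\): avc:\s+denied\s+'
    r'\{([^}]*)\}.*?comm="([^"]*)"'
)


def avc_denials(lines):
    """Yield (utc_time, command, permissions) for each AVC denial line."""
    for line in lines:
        match = AVC_DENIED.match(line)
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            yield when, match.group(3), match.group(2).split()


def report(path="/var/log/audit/audit.log"):
    """Print every denial found in the audit log (usually needs root)."""
    with open(path) as handle:
        for when, command, permissions in avc_denials(handle):
            print(when.isoformat(), command, " ".join(permissions))
```

The timestamps are printed in UTC, while /var/log/cron uses the server's local time, so allow for the offset when lining the two logs up.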
