Dennis Jacobfeuerborn
2010-Dec-12 14:40 UTC
[CentOS-virt] VMs died due to hanging httpd processes
Hi,
about an hour ago two web-serving VMs died at the same time with the
following error on the console:

INFO: task httpd:4304 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 00af1f714d1112e2 0 4304 22471 4305 4303 (NOTLB)
 ffff88006574bdc8 0000000000000282 00000000000041f8 ffff88006574bea8
 000000000000000a ffff88009747b820 ffffffff804f4b00 00000000001a5eee
 ffff88009747ba08 ffff880095be5015
Call Trace:
 [<ffffffff8022d03c>] mntput_no_expire+0x19/0x89
 [<ffffffff8020eeae>] link_path_walk+0xa6/0xb2
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80223f33>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff8021b52d>] open_namei+0xea/0x6d5
 [<ffffffff8029cb30>] set_process_cpu_timer+0xc7/0xd2
 [<ffffffff80227caa>] do_filp_open+0x1c/0x38
 [<ffffffff8021a364>] do_sys_open+0x44/0xbe
 [<ffffffff802602f9>] tracesys+0xab/0xb6

Monitoring shows that within about 3 minutes the load on the systems
shot up to over 400 before they died. Since MaxClients is set to 512, I
suspect that the processes had a mass-lockup, with each blocked process
constantly contributing a load of 1 (similar to what happens when a
process hangs on an NFS mount point). One of the two VMs acts as an NFS
server and exports directories to the other VM (but doesn't mount any
external NFS sources itself).

What is strange is that both systems locked up at the same time even
though they are running on two different physical hosts. The hosts run
CentOS 5.3 while the VMs run CentOS 5.5 as PV Xen guests.

Since the call trace looks identical in both cases, I wonder if anyone
has an idea what exactly went wrong here?

Regards,
dennis
On 12/12/2010 06:40 AM, Dennis Jacobfeuerborn wrote:
> Monitoring shows that in a timeframe of about 3 minutes the load on the
> systems shot up to over 400 before they died. Since MaxClients is set to
> 512 I suspect that the processes had a mass-lockup with each process
> constantly causing a load of 1 (similar to what happens when a process
> hangs on an NFS mount point). One of the two VMs acts as an NFS server and
> exports directories to the other VM (but doesn't mount any external NFS
> sources itself).
>
> What is strange is that both systems locked up at the same time since they
> are running on two different physical hosts. The hosts run CentOS 5.3 while
> the VMs run CentOS 5.5 as PV Xen guests.
>
> Since the call trace looks identical in both cases I wonder if anyone has
> an idea what exactly went wrong here?

That sounds like it might be a 'slow HTTP' DoS attack.

http://ha.ckers.org/slowloris/
http://blog.spiderlabs.com/2010/11/advanced-topic-of-the-week-mitigating-slow-http-dos-attacks.html

--
Benjamin Franz
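
The suspicion in the original post, that each blocked httpd process contributes a constant load of 1, can be checked on a live system. On Linux, tasks in uninterruptible sleep (state "D", as shown for httpd in the console output) count toward the load average just like runnable ones. The following is only a rough sketch of how one might count them by reading /proc/<pid>/stat; the function names parse_state and count_states are made up for this illustration and are not part of any tool mentioned in the thread.

```python
import os


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')' before reading the state.
    rest = stat_line.rsplit(")", 1)[1]
    return rest.split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, ...) under proc_root."""
    counts = {}
    if not os.path.isdir(proc_root):
        return counts  # not a Linux-style /proc; nothing to count
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("uninterruptible (D):", counts.get("D", 0))
    print("runnable (R):", counts.get("R", 0))
```

If the D count climbs toward MaxClients while the R count stays small, that would be consistent with a mass-lockup on a blocked resource rather than with CPU-bound work. It would not by itself distinguish between the possible causes discussed in the thread.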