Stephan Hendl
2006-Nov-13 05:03 UTC
[Ocfs2-users] frozen ocfs2 filesystem under heavy webserver load
Hi,

I run a cluster of 4 nodes with ocfs2 as a webserver cluster. During a stress test, after a couple of minutes the webserver processes sit idle while the system load is extremely high (about 150 to 200) and the waits are very low. After that I can no longer interrupt the webserver processes, and in some directories "ls -ls" does not return, so it seems that the file system has a problem. Only a reset of the server solves the problem ;-((

In debug mode I find lines like the following:

Lockres: D00000000000000000ccda0fcc1c07e  Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Exclusive  Blocking Mode: Invalid

Lockres: M00000000000000000ccda0fcc1c07e  Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Exclusive  Blocking Mode: Invalid

Lockres: D00000000000000000ccd9ffcc1c07d  Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Exclusive  Blocking Mode: Invalid

Lockres: M00000000000000000ccd9ffcc1c07d  Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Exclusive  Blocking Mode: Invalid

Could it be that two servers want to write to the same file and that under heavy load the clusterd processes cannot handle this?

Regards and thanks,
Stephan

--
Dr. Stephan Hendl
Systemmanagement
-----------------------------------
Landesbetrieb für Datenverarbeitung und Statistik Land Brandenburg
Adresse: 14467 Potsdam, Dortustr. 46
Telefon: +49-(0)331 39-471
Fax: +49-(0)331 27548-1187
Mobil: +49-(0)160 90 645 893
EMail: Stephan.Hendl@lds.brandenburg.de
Internet: http://www.lds-bb.de
Sunil Mushran
2006-Nov-13 18:27 UTC
[Ocfs2-users] frozen ocfs2 filesystem under heavy webserver load
None of these locks are busy, so they should not be the cause of the problem.

Start with the version of ocfs2. Also, which kernel? What does top say? Is some process spinning?

Also, what does this stresstest entail?
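For readers who want to scan a longer lock dump the same way, here is a minimal Python sketch. It is not part of ocfs2 or its tools. It assumes the dump uses the field labels shown in this thread, and it assumes (this thread does not confirm the exact criteria) that a lock is worth a closer look when it has holders, a pending action other than "None", or a "Busy" flag. The function names parse_lockres and is_busy are made up for this example.

```python
import re

# First two records from the dump posted in this thread.
SAMPLE_DUMP = """
Lockres: D00000000000000000ccda0fcc1c07e Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0 EX Holders: 0
Pending Action: None Pending Unlock Action: None
Requested Mode: Exclusive Blocking Mode: Invalid

Lockres: M00000000000000000ccda0fcc1c07e Mode: Exclusive
Flags: Initialized Attached
RO Holders: 0 EX Holders: 0
Pending Action: None Pending Unlock Action: None
Requested Mode: Exclusive Blocking Mode: Invalid
"""


def _field(chunk, pattern, default=""):
    """Return the first capture group of pattern in chunk, or default."""
    match = re.search(pattern, chunk, re.S)
    return match.group(1).strip() if match else default


def parse_lockres(dump):
    """Split a lock dump into one dict per 'Lockres:' record."""
    records = []
    # Everything before the first 'Lockres:' is skipped; line layout is ignored.
    for chunk in dump.split("Lockres:")[1:]:
        tokens = chunk.split()
        if not tokens:
            continue
        records.append({
            "name": tokens[0],
            "flags": _field(chunk, r"Flags:\s*(.*?)\s*RO Holders:").split(),
            "ro_holders": int(_field(chunk, r"RO Holders:\s*(\d+)", "0")),
            "ex_holders": int(_field(chunk, r"EX Holders:\s*(\d+)", "0")),
            "pending_action": _field(chunk, r"Pending Action:\s*(\S+)", "None"),
            "pending_unlock": _field(chunk, r"Pending Unlock Action:\s*(\S+)", "None"),
        })
    return records


def is_busy(record):
    """Assumed heuristic: holders, a pending action, or a 'Busy' flag."""
    return (
        record["ro_holders"] > 0
        or record["ex_holders"] > 0
        or record["pending_action"] != "None"
        or record["pending_unlock"] != "None"
        or "Busy" in record["flags"]
    )


if __name__ == "__main__":
    for rec in parse_lockres(SAMPLE_DUMP):
        print(rec["name"], "busy" if is_busy(rec) else "idle")
```

Run against the records above, every lock is reported as idle, which matches the reading that these locks are not the cause.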
Stephan Hendl
2006-Nov-15 22:34 UTC
Re: [Ocfs2-users] frozen ocfs2 filesystem under heavy webserver load
ocfs2: version 1.2.3
Linux: RedHat EL 4
Kernel: 2.6.9-42.0.2.ELsmp

top shows a system load of about 150 to 200 while no active processes are running; the 150 webserver processes are waiting for something... The waits in top are below 2%.

The purpose of the webserver stress test is to find out how many requests this webserver cluster can handle at the same time. The stress program is "http_load", which sends 200 parallel requests across about 25 web links in our CMS.

-- Stephan
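A high load with no runnable processes is consistent with many tasks sitting in uninterruptible sleep (state D), because Linux counts those tasks in the load average. The Python sketch below is one way to list them; it is not something posted in this thread. It assumes the input is the text output of `ps -eo pid,stat,wchan,comm`, and both the function name blocked_processes and the sample rows are made up for illustration.

```python
# Hypothetical sample in the layout of `ps -eo pid,stat,wchan,comm`.
# The PIDs and commands are invented; they are not from the reported system.
SAMPLE_PS = """\
  PID STAT WCHAN  COMMAND
    1 Ss   -      init
 4101 D    -      httpd
 4102 D    -      httpd
 4103 S    -      httpd
 4200 D+   -      ls
"""


def blocked_processes(ps_output):
    """Return (pid, wchan, command) for each process whose state starts with D."""
    blocked = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed or truncated rows
        pid, stat, wchan, command = parts
        # The first character of STAT is the process state; D means
        # uninterruptible sleep, which cannot be interrupted by signals.
        if stat.startswith("D"):
            blocked.append((int(pid), wchan, command.strip()))
    return blocked


if __name__ == "__main__":
    for pid, wchan, command in blocked_processes(SAMPLE_PS):
        print(pid, wchan, command)
```

On a real node, the WCHAN column of the blocked webserver processes would show where in the kernel they are sleeping, which is usually the next thing worth posting to the list.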