Michael S. Moody
2008-Jun-06 21:17 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
I had an instance today on several servers where the load average soared, and all of my apache processes were in uninterruptible sleep state.

I did run scanlocks, and the ps command as requested:

On one server some apache processes looked like this:

3345 D apache2 dlm_wait_for_recovery

On another, the output was completely different.

Nonetheless, the following threads on one server were using 100% cpu:

[o2net]
[o2hb-BC778ACE98]
[dlm_thread]

That box needed to be rebooted, and things recovered.

The output of scanlocks:

forum3 ~ # ./scanlocks
/dev/sdb1 M000000000000000000002b7976e45d
/dev/sdb1 O000000000000000242b4bb00000000
/dev/sdb1 O0000000000000000dd506300000000
/dev/sdb1 O0000000000000000f898b400000000
/dev/sdb1 O0000000000000001b23aa000000000
/dev/sdb1 O000000000000000133e0c200000000
/dev/sdb1 O00000000000000011c6ea100000000
/dev/sdb1 N00000000013fffb30140013c
/dev/sdb1 N000000000039292b016d640d
/dev/sdb1 O0000000000000000f8999e00000000
/dev/sdb1 N000000000042fb6002ff2763
/dev/sdb1 O0000000000000002d5d64300000000
/dev/sdb1 O0000000000000001e639b100000000
/dev/sdb1 O0000000000000000398aa700000000
/dev/sdb1 O0000000000000000c139fd00000000
/dev/sdb1 O00000000000000014df4ab00000000
/dev/sdb1 D0000000000000000815e6ed5b0763f

forum ~ # ./scanlocks
/dev/sdd1 O0000000000000000dd506300000000
/dev/sdd1 M0000000000000000c102c700000000
/dev/sdd1 M000000000000000039d65600000000
/dev/sdd1 S000000000000000000000200000000
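For anyone collecting the same data, the ps check described above can be sketched in Python. This is only an illustration: the thread does not show the exact ps command that was requested, so the format string `pid,stat,comm,wchan=WIDE-WCHAN-COLUMN` is an assumption, and the function names `find_blocked` and `list_blocked_processes` are hypothetical. It filters ps output down to processes in uninterruptible sleep (state D) and reports the kernel wait channel each one is blocked in.

```python
import subprocess


def find_blocked(ps_output):
    """Return (pid, command, wchan) for every process whose state starts with D.

    Expects whitespace-separated columns in the order: PID STAT COMMAND WCHAN.
    Assumption: command names contain no spaces (true for 'apache2'); a name
    with spaces would spill into the wchan column.
    """
    blocked = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line, blank lines, and anything malformed.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, stat, comm = fields[0], fields[1], fields[2]
        wchan = fields[3].strip() if len(fields) > 3 else "-"
        if stat.startswith("D"):  # D = uninterruptible sleep
            blocked.append((int(pid), comm, wchan))
    return blocked


def list_blocked_processes():
    """Run ps (procps syntax, assumed) and return the blocked processes."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,stat,comm,wchan=WIDE-WCHAN-COLUMN"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_blocked(result.stdout)
```

Calling `list_blocked_processes()` on an affected node and grouping the results by wait channel would show whether the stuck processes are all waiting in the same place, such as `dlm_wait_for_recovery` in the report above.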
Sunil Mushran
2008-Jun-07 21:09 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
What version/kernel are you running?

We are about to release 1.2.9 that addresses one known case of o2net consuming 100% cpu. Due out next week.
Michael Moody
2008-Jun-09 22:16 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
Ocfs2-tools version 1.3.9
Kernel 2.6.24-gentoo-r8

Michael

--
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Direct: (650) 265-4154
Web: http://www.GlobalSystemsConsulting.com
Engineering Support: support at gsc.cc
Billing Support: billing at gsc.cc
Customer Support Portal: http://my.gsc.cc

NOTICE - This message contains privileged and confidential information intended only for the use of the addressee named above. If you are not the intended recipient of this message, you are hereby notified that you must not disseminate, copy or take any action in reliance on it. If you have received this message in error, please immediately notify Global Systems Consulting, its subsidiaries or associates. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the view of Global Systems Consulting, its subsidiaries and associates.
Sunil Mushran
2008-Jun-09 22:27 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c824c3c723f2e37a00b3b739a55b28de595fd72e
Michael Moody
2008-Jun-09 22:31 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
What ramifications could this bug have?

Random filesystem lockups, file locks not being released?

Michael
Sunil Mushran
2008-Jun-09 22:43 UTC
[Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen
The one I am aware of is o2net spinning at 100% before the node is fenced due to cluster timeout.