Hi experts,
We just upgraded OCFS from 1.0.9-9 to 1.0.13-1. Everything went
smoothly, but every two months or so we hit a problem where the CPU
goes to 100% on any activity on the OCFS mountpoints (mv, cp, gzip, rm,
etc.). The only fix I know of right now is to reboot the stack, but that
costs us downtime and is a band-aid, not a solution.
I found some messages in /var/log/messages from before this happened
that may or may not be related:
Dec 4 20:13:39 x335-215 kernel: ocfs: Removing x335-235-HB (node 5) from clustered device (8,36)
Dec 4 20:14:21 x335-215 kernel: ocfs: Adding x335-235-HB (node 5) to clustered device (8,36)
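To check whether node removals and additions like these line up with the hangs, the matching entries could be pulled out of the log with something like the sketch below. It assumes the log is /var/log/messages and that each message sits on a single line in the file; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list OCFS node add/remove messages from a syslog file.
# Assumptions: the default log path and the function name are illustrative;
# the pattern is based only on the two messages quoted above.
ocfs_membership_events() {
    log="${1:-/var/log/messages}"
    grep -E 'kernel: ocfs: (Removing|Adding) .* clustered device' "$log"
}
```

Calling `ocfs_membership_events` with no argument reads the default log, and the timestamps can then be compared with the times the CPU spikes were noticed.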
My current version is:
/e2open/home/oracle: 1004>rpm -qa | grep ocfs
ocfs-support-1.0.10-1
ocfs-tools-1.0.10-1
ocfs-2.4.9-e-smp-1.0.13-1
Output from top:
  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME COMMAND
 6299 oracle    25   0   412   412   352 R    99.9  0.0  14:28 cp
25751 root      25   0  3584  3584   828 R    97.5  0.0 260:46 bpbkar
 3628 oracle    15   0 18856   17M 13092 D     7.7  0.4   3:21 oracle
 8158 oracle    15   0  1200  1200   772 R     0.3  0.0   0:01 top
    8 root      34  19     0     0     0 RWN   0.0  0.0   1:21 ksoftirqd_CPU2
Any advice on what the problem might be?
Thanks / regards,
Ivan Wong
Database Administrator
e2Open Inc. (www.e2open.com <http://www.e2open.com/> )
Suite 34.03, Level 34,
Menara Citibank,
156, Jalan Ampang,
50450 Kuala Lumpur, Malaysia
DID: +603 2776 6392
Tel: +603 2776 6300
Fax: +603 2712 9112
Do:

echo t > /proc/sysrq-trigger

Have a netdump server set up to capture the stack traces. That should
show where it is spinning.
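The suggestion above could be wrapped in a small helper, sketched here under some assumptions: it runs as root, the kernel has magic SysRq support, and /proc/sys/kernel/sysrq is the switch that enables it. The function name and the SYSRQ_ENABLE / SYSRQ_TRIGGER override variables are hypothetical, there only so the paths can be pointed at scratch files for a dry run.

```shell
#!/bin/sh
# Sketch: ask the kernel to dump the stack trace of every task via SysRq 't'.
# Assumptions: run as root; magic SysRq is compiled into the kernel.
# SYSRQ_ENABLE and SYSRQ_TRIGGER are illustrative overrides, not standard variables.
trigger_task_dump() {
    enable="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (not root, or SysRq not available?)" >&2
        return 1
    fi

    # Turn SysRq on if the control file is present and writable.
    if [ -w "$enable" ]; then
        echo 1 > "$enable"
    fi

    # 't' requests a dump of all tasks and their stacks to the kernel log.
    echo t > "$trigger"
}
```

Run `trigger_task_dump` while cp or bpbkar is pinned at 100% CPU. The traces are written to the kernel log and console, which is presumably where the netdump server comes in; taking two or three dumps a few seconds apart may make it easier to see whether a process is stuck in the same place.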