Hi experts,

We just upgraded from 1.0.9-9 to 1.0.13-1. All went smoothly, but about every two months we hit a problem where CPU usage goes to 100% on any activity on the OCFS mountpoints (mv, cp, gzip, rm, etc.). The only fix I know of is to reboot the stack, but that costs us downtime and is only a band-aid, not a solution.

I found some messages in /var/log/messages from before this happened that may or may not be related:

    Dec 4 20:13:39 x335-215 kernel: ocfs: Removing x335-235-HB (node 5) from clustered device (8,36)
    Dec 4 20:14:21 x335-215 kernel: ocfs: Adding x335-235-HB (node 5) to clustered device (8,36)

My current version is:

    /e2open/home/oracle: 1004>rpm -qa | grep ocfs
    ocfs-support-1.0.10-1
    ocfs-tools-1.0.10-1
    ocfs-2.4.9-e-smp-1.0.13-1

Output from top:

      PID USER    PRI  NI   SIZE   RSS SHARE STAT %CPU %MEM    TIME COMMAND
     6299 oracle   25   0    412   412   352 R    99.9  0.0   14:28 cp
    25751 root     25   0   3584  3584   828 R    97.5  0.0  260:46 bpbkar
     3628 oracle   15   0  18856   17M 13092 D     7.7  0.4    3:21 oracle
     8158 oracle   15   0   1200  1200   772 R     0.3  0.0    0:01 top
        8 root     34  19      0     0     0 RWN   0.0  0.0    1:21 ksoftirqd_CPU2

Any advice on what the problem might be?

Thanks / regards,

Ivan Wong
Database Administrator
e2Open Inc. (www.e2open.com <http://www.e2open.com/>)
Suite 34.03, Level 34, Menara Citibank,
156, Jalan Ampang,
50450 Kuala Lumpur, Malaysia
DID: +603 2776 6392
Tel: +603 2776 6300
Fax: +603 2712 9112
Do:

    echo t >/proc/sysrq-trigger

Have a netdump server set up to capture the stack traces. That should show where it is spinning.
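For anyone following along, the SysRq step suggested in the reply can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `dump_task_stacks` and the two path variables are made up for illustration, and it assumes the kernel was built with magic SysRq support and that you are running as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of OCFS or its tools).
# The paths are variables so the sketch can be dry-run against ordinary files.
SYSRQ_ENABLE=${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

dump_task_stacks() {
    # Make sure the magic SysRq interface is enabled (needs root).
    echo 1 > "$SYSRQ_ENABLE" || return 1
    # 't' asks the kernel to write the stack trace of every task
    # to the kernel log.
    echo t > "$SYSRQ_TRIGGER" || return 1
}
```

Run `dump_task_stacks` as root while the cp or bpbkar process is spinning. The task dump is usually large, so the local kernel log buffer may not hold all of it, which is presumably why a netdump server is recommended for capturing the output.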