Ivan Wong
2005-Dec-16 15:50 UTC
[Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c
Hi Experts,

We have a 4-node RAC running, and recently one node went down due to a hardware (fibre optics card) failure. Since we started running as a 3-node RAC, the surviving servers keep crashing. We cannot figure out why this is happening, but checking /var/log/messages we found these errors (note the messages before the crash at 8:32):

Dec 12 08:30:45 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 08:30:45 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 08:32:28 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 08:32:28 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 08:46:35 x335-142 sshd(pam_unix)[9468]: session opened for user oracle by (uid=0)
Dec 12 08:55:30 x335-142 xinetd[11044]: warning: can't get client address: Connection reset by peer
Dec 12 09:59:11 x335-142 sshd(pam_unix)[9468]: session closed for user oracle
Dec 12 15:15:48 x335-142 kernel: (4) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 15:15:48 x335-142 kernel: (4) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 16:16:10 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 16:16:10 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 16:16:20 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 16:16:20 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 16:16:30 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
Dec 12 16:16:30 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
Dec 12 16:16:35 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97

Power cycling the box allows us to restart the database and continue, but this is the 4th time in two weeks. Since the only errors we found are from OCFS, we are wondering whether anyone has seen this before, or whether it is OCFS related at all.

Our environment is:

x335-142:slr142:/e2open/home/oracle: 1000>rpm -qa | grep ocfs
ocfs-2.4.9-e-smp-1.0.9-9
ocfs-support-1.0.9-9
ocfs-tools-1.0.9-9
x335-142:slr142:/e2open/home/oracle: 1001>uname -a
Linux x335-142 2.4.9-e.25smp #1 SMP Fri Jun 6 18:11:40 EDT 2003 i686 unknown

Appreciate any feedback.

Thanks / regards,

Ivan Wong
Database Administrator

e2Open Inc. (www.e2open.com)
Suite 34.03, Level 34, Menara Citibank
156, Jalan Ampang,
50450 Kuala Lumpur, Malaysia
DID: +603 2776 6397 Tel: +603 2776 6300 Fax: +603 2712 9112
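To see which OCFS error sites fire most often, the kernel lines quoted above can be tallied by source file and line number. The following is a minimal Python sketch and not part of the original post: the regular expression is written against the line format shown in this message only, the function name `summarize` is hypothetical, and the number in parentheses is captured without assuming what it means.

```python
import re
from collections import Counter

# Matches kernel lines in the format quoted in this thread, for example:
# "Dec 12 08:30:45 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121"
# The "num" group is the parenthesised number; its meaning is not stated in the thread.
OCFS_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\((?P<num>\d+)\)\sERROR:\s(?P<msg>.*),\s(?P<src>\S+\.c),\s(?P<line>\d+)$"
)


def summarize(lines):
    """Count OCFS kernel errors by (source file, line number, message)."""
    counts = Counter()
    for raw in lines:
        match = OCFS_ERROR.match(raw.strip())
        if match:
            key = (match["src"], int(match["line"]), match["msg"])
            counts[key] += 1
    return counts


# A few lines copied from the log excerpt above, plus one non-OCFS line.
sample = [
    "Dec 12 08:30:45 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97",
    "Dec 12 08:30:45 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121",
    "Dec 12 08:32:28 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97",
    "Dec 12 08:46:35 x335-142 sshd(pam_unix)[9468]: session opened for user oracle by (uid=0)",
]

for (src, line, msg), count in summarize(sample).most_common():
    print(f"{count:3d}  {src}:{line}  {msg}")
```

On a live node the same function could be fed the log file directly, for example `summarize(open("/var/log/messages"))`, assuming the file is readable by the current user.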
Ivan Wong
2005-Dec-20 09:05 UTC
[Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c
Sorry for the hard-to-read email. This week we had another crash; the errors are below:

Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17, Common/ocfsgencreate.c, 1671
Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17, Common/ocfsgencreate.c, 1827
Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17, Linux/ocfsmain.c, 2090
Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17, Linux/ocfsmain.c, 2409
Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2, Common/ocfsgendlm.c, 986
Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2, Common/ocfsgendlm.c, 1163
Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2, Common/ocfsgencreate.c, 479
Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2, Linux/ocfsmain.c, 2030
Dec 18 16:33:10 x335-149 kernel: (10) ERROR: lockres=null, Linux/ocfsmain.c, 3541
Dec 18 18:20:38 x335-149 kernel: (10) ERROR: lockres=null, Linux/ocfsmain.c, 3541
Dec 19 00:02:30 x335-149 last message repeated 3 times
Dec 19 00:26:24 x335-149 sshd(pam_unix)[17895]: session opened for user oracle by (uid=0)
Dec 19 00:26:49 x335-149 sshd(pam_unix)[17895]: session closed for user oracle
Dec 19 04:02:05 x335-149 syslogd 1.4.1: restart.
Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17, Common/ocfsgendirnode.c, 1507
Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17, Common/ocfsgencreate.c, 1671
Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17, Common/ocfsgencreate.c, 1827
Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17, Linux/ocfsmain.c, 2090
Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17, Linux/ocfsmain.c, 2409
Dec 19 06:48:08 x335-149 kernel: (10) ERROR: lockres=null, Linux/ocfsmain.c, 3541
Dec 19 07:15:06 x335-149 kernel: Unable to handle kernel NULL pointer dereference<3>(3) ERROR: oin has no matching inode!!!!, Common/ocfsgencreate.c, 81
Dec 19 09:00:03 x335-149 syslogd 1.4.1: restart.
Dec 19 09:00:03 x335-149 syslog: syslogd startup succeeded
Dec 19 09:00:04 x335-149 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 19 09:00:04 x335-149 syslog: klogd startup succeeded

Note that the message "oin has no matching inode" is specifically related to OCFS. Wim? Sunil? Can anyone please advise?

Thanks / regards,

Ivan Wong
Database Administrator
e2Open Inc. (www.e2open.com)

_______________________________________________
Ocfs-users mailing list
Ocfs-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs-users
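The `status = -17` and `status = -2` values in these logs look like negated Linux errno numbers, which is a common kernel convention. That reading is an assumption here; nothing in the thread confirms how OCFS defines its status codes. Under that assumption, a short Python sketch (the helper name `describe_status` is hypothetical) decodes them:

```python
import errno
import os


def describe_status(status):
    """Map a negative status to an errno name and description.

    Assumes status == -errno, which the thread does not confirm for OCFS.
    """
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


for status in (-17, -2):
    name, text = describe_status(status)
    print(f"status = {status}: {name} ({text})")
```

If the assumption holds, -17 would be EEXIST (a create found an entry that already exists) and -2 would be ENOENT (a lookup found nothing), which would at least be consistent with the "file entry name did not match inode" messages from the earlier crash.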
Ivan Wong
2005-Dec-21 03:51 UTC
[Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c
Hi Sunil,

Thanks for responding. We would really love to upgrade OCFS. However, to get to 1.0.13/14, the kernel will have to be upgraded as well. We are running a 24x7 system, so considering the downtime and risk, we would upgrade only as a last resort. So far we have been able to live with this version, until this happened. Before proceeding with the upgrade, we want to confirm whether this is an OCFS bug. Will the upgrade (with its risk and downtime) definitely fix the problem?

Thanks / regards,

Ivan Wong
Database Administrator
e2Open Inc. (www.e2open.com)

-----Original Message-----
From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
Sent: Wednesday, December 21, 2005 1:15 AM
To: Ivan Wong
Cc: ocfs-users at oss.oracle.com
Subject: Re: [Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c

1.0.9-9? You are running a very, very old version of ocfs. Please upgrade to at least 1.0.13, if not 1.0.14. The README has the list of bugs fixed. The disk format has not changed, which means you can just upgrade the rpm and start; however, you will have to umount the volumes on all nodes before installing. The README lists this requirement when upgrading from 1.0.9-x to 1.0.10 or later.
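Checking every node against the recommended minimum of 1.0.13 is easy to get wrong by eye, because a plain string comparison ranks "1.0.9" above "1.0.13". The following Python sketch compares versions numerically. It is an illustration only: the package names are the ones reported by `rpm -qa | grep ocfs` earlier in the thread, the helper names `ocfs_version` and `needs_upgrade` are hypothetical, and the trailing "-N" is assumed to be the package release number.

```python
import re

MINIMUM = (1, 0, 13)  # lowest release recommended in the thread


def ocfs_version(package):
    """Extract (major, minor, patch) from a name such as 'ocfs-tools-1.0.9-9'."""
    match = re.search(r"-(\d+)\.(\d+)\.(\d+)-\d+$", package)
    if match is None:
        raise ValueError(f"no version found in {package!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(package, minimum=MINIMUM):
    """Return True when the package version is numerically below the minimum."""
    return ocfs_version(package) < minimum


# Package names as reported earlier in the thread.
installed = [
    "ocfs-2.4.9-e-smp-1.0.9-9",
    "ocfs-support-1.0.9-9",
    "ocfs-tools-1.0.9-9",
]

for package in installed:
    print(package, "needs upgrade:", needs_upgrade(package))
```

Comparing integer tuples rather than strings is the design point: `(1, 0, 9) < (1, 0, 13)` is true, whereas `"1.0.9" < "1.0.13"` is false.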
Sunil Mushran
2006-Feb-16 00:48 UTC
[Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c
Sorry for the late response. The mailserver on oss.oracle.com was misbehaving. :)

I believe I have corresponded with you before and indicated that 1.0.9-x is a very old release (2+ years old), and that I would seriously recommend an upgrade to at least 1.0.13, if not 1.0.14. We have made quite a few fixes in the area where you are encountering errors.