Hi!

Is this the place to report/discuss ocfs2-related kernel bugs?

We just had some file system corruption on a two-node SLES9SP3 cluster with a shared ocfs2 filesystem.

/var/log/messages told me:

Aug 17 15:56:17 tux100p012 kernel: (8504,3):ocfs2_meta_lock_update:1423 ERROR: bug expression: le64_to_cpu(fe->i_dtime) || !(fe->i_flags & cpu_to_le32(OCFS2_VALID_FL))
Aug 17 15:56:17 tux100p012 kernel: (8504,3):ocfs2_meta_lock_update:1423 ERROR: Stale dinode 17079 dtime: 1155822895 flags: 0x0
Aug 17 15:56:17 tux100p012 kernel: ------------[ cut here ]------------
Aug 17 15:56:17 tux100p012 kernel: kernel BUG at fs/ocfs2/dlmglue.c:1423!
Aug 17 15:56:17 tux100p012 kernel: invalid operand: 0000 [#1]
Aug 17 15:56:17 tux100p012 kernel: SMP
Aug 17 15:56:17 tux100p012 kernel: CPU: 3
Aug 17 15:56:17 tux100p012 kernel: EIP: 0060:[<f9fe72ff>] Tainted: PF U
Aug 17 15:56:17 tux100p012 kernel: EFLAGS: 00010246 (2.6.5-7.244-smp SLES9_SP3_BRANCH-200512121832250000)
Aug 17 15:56:17 tux100p012 kernel: EIP is at ocfs2_meta_lock_full+0x46f/0x10d0 [ocfs2]
Aug 17 15:56:17 tux100p012 kernel: eax: 0000005f ebx: f5573990 ecx: c03aab74 edx: 0001a8d3
Aug 17 15:56:17 tux100p012 kernel: esi: d68be000 edi: 00000000 ebp: fa017f08 esp: f3ef1dc4
Aug 17 15:56:17 tux100p012 kernel: ds: 007b es: 007b ss: 0068
Aug 17 15:56:17 tux100p012 kernel: Process nfsd (pid: 8504, threadinfo=f3ef0000 task=f5573990)
Aug 17 15:56:17 tux100p012 kernel: Stack: fa01c23c 00002138 00000003 fa017f08 0000058f 000042b7 00000000 44e4752f
Aug 17 15:56:17 tux100p012 kernel: 00000000 00000000 f4abb35c d68be000 d1892788 f569c06c f569c06c d1892788
Aug 17 15:56:17 tux100p012 kernel: 00000001 00000000 00000000 d1892978 dc1efc10 00000000 d7bf2200 f4abb200
Aug 17 15:56:17 tux100p012 kernel: Call Trace:
Aug 17 15:56:17 tux100p012 kernel: [<c0339acc>] svc_sock_enqueue+0x13c/0x2a0
Aug 17 15:56:17 tux100p012 kernel: [<c033b068>] svc_tcp_recvfrom+0x428/0x870
Aug 17 15:56:17 tux100p012 kernel: [<f9feff19>] ocfs2_inode_revalidate+0x139/0x2f0 [ocfs2]
Aug 17 15:56:17 tux100p012 kernel: [<f9fe87fc>] ocfs2_decode_fh+0xbc/0x2a0 [ocfs2]
Aug 17 15:56:17 tux100p012 kernel: [<f98e30b0>] nfsd_acceptable+0x0/0xf7 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<f9feafa9>] ocfs2_getattr+0x49/0x1e0 [ocfs2]
Aug 17 15:56:17 tux100p012 kernel: [<c01395cd>] set_current_groups+0x19d/0x1d0
Aug 17 15:56:17 tux100p012 kernel: [<f9feaf60>] ocfs2_getattr+0x0/0x1e0 [ocfs2]
Aug 17 15:56:17 tux100p012 kernel: [<c017ec14>] vfs_getattr_it+0x54/0x120
Aug 17 15:56:17 tux100p012 kernel: [<c017eced>] vfs_getattr+0xd/0x10
Aug 17 15:56:17 tux100p012 kernel: [<f98ee02b>] nfs3svc_encode_attrstat+0x6b/0x250 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<f98eb943>] nfsd3_proc_getattr+0x73/0xb0 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<f98edfc0>] nfs3svc_encode_attrstat+0x0/0x250 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<f98e018d>] nfsd_dispatch+0x16d/0x1e0 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<c033c2ad>] svc_authenticate+0x4d/0x8d
Aug 17 15:56:17 tux100p012 kernel: [<c0338a22>] svc_process+0x272/0x670
Aug 17 15:56:17 tux100p012 kernel: [<c010a1da>] apic_timer_interrupt+0x1a/0x20
Aug 17 15:56:17 tux100p012 kernel: [<f98e0624>] nfsd+0x1c4/0x369 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<f98e0460>] nfsd+0x0/0x369 [nfsd]
Aug 17 15:56:17 tux100p012 kernel: [<c0107005>] kernel_thread_helper+0x5/0x10
Aug 17 15:56:17 tux100p012 kernel:
Aug 17 15:56:17 tux100p012 kernel: Code: 0f 0b 8f 05 29 7a 01 fa 8b 54 24 2c 8b 44 24 4c e8 bc 82 00

Does this say anything about what happened? We don't want to run into this problem again, of course.

Regards,

Kai.
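For anyone reading the first two log lines: the "bug expression" fires when the on-disk inode has a nonzero deletion time or does not carry the valid flag. Below is a minimal sketch in Python that re-evaluates that expression with the logged values and decodes the dtime. It rests on assumptions that are not stated in the log: that OCFS2_VALID_FL is 0x1 and that dtime is in seconds since the Unix epoch. The helper names is_stale and dtime_utc are made up for illustration and are not part of ocfs2.

```python
from datetime import datetime, timezone

# Assumption: value of the "inode is valid" flag; not shown in the log itself.
OCFS2_VALID_FL = 0x00000001


def is_stale(i_dtime: int, i_flags: int) -> bool:
    """Re-evaluate the logged bug expression on host-order integers.

    True when the inode has a deletion time set or lacks the valid flag.
    """
    return bool(i_dtime) or not (i_flags & OCFS2_VALID_FL)


def dtime_utc(i_dtime: int) -> datetime:
    """Decode dtime, assuming it counts seconds since the Unix epoch."""
    return datetime.fromtimestamp(i_dtime, tz=timezone.utc)


if __name__ == "__main__":
    # Values taken from the "Stale dinode 17079" log line.
    print(is_stale(1155822895, 0x0))
    print(dtime_utc(1155822895).isoformat())
```

Under those assumptions the expression is true on both counts (dtime is set and the valid flag is clear), and the decoded dtime falls on 2006-08-17, the same date as the log entries. That would be consistent with nfsd looking up an inode that had already been deleted, but the log alone does not prove it.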
Which kernel are you using? If it is the native SP3 kernel, you had better upgrade to the latest one (#276), or at least to #255. The native SP3 kernel had an old ocfs2 with numerous bugs.

----- Original Message -----
From: "Kai Nielsen" <kn@moonage.net>
To: <ocfs2-users@oss.oracle.com>
Sent: Sunday, August 20, 2006 11:25 PM
Subject: [Ocfs2-users]

> We just had some file system corruption on a two node SLES9SP3 cluster
> with shared ocfs2 filesystem.
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
You are running 2.6.5-7.244-smp. Upgrade to at least 2.6.5-7.257.

Kai Nielsen wrote:
> Aug 17 15:56:17 tux100p012 kernel: EFLAGS: 00010246 (2.6.5-7.244-smp SLES9_SP3_BRANCH-200512121832250000)
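The advice above boils down to comparing the build number in the kernel release string against a minimum. A rough sketch of that check in Python follows. The helper names build_number and needs_upgrade are hypothetical, the pattern only covers release strings shaped like the one in the oops (2.6.5-7.NNN), and the default minimum of 257 is the figure from this reply (the other reply mentions #255).

```python
import re

# Matches SLES9 release strings such as "2.6.5-7.244-smp" and captures
# the build number (244). Other kernel series are deliberately rejected.
_RELEASE_RE = re.compile(r"^2\.6\.5-7\.(\d+)")


def build_number(release: str) -> int:
    """Extract the build number from a 2.6.5-7.NNN release string."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return int(match.group(1))


def needs_upgrade(release: str, minimum: int = 257) -> bool:
    """True if the build is older than the suggested minimum."""
    return build_number(release) < minimum


if __name__ == "__main__":
    # Release string as it appears in the EFLAGS line of the oops.
    print(needs_upgrade("2.6.5-7.244-smp"))
```

On a live node the release string would normally come from `uname -r`; run this against every node in the cluster, since both nodes mount the shared filesystem.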