Hello,

We think we've seen this issue a few times now, and I wonder if anyone
out there has any insight as to what's going on.

We have a large (40+ node) cluster accessing an OCFS2 v1.2.1 filesystem.
All nodes are Fedora Core 4 (kernel 2.6.13-1.1532_FC4smp). One node
is x86_64, the rest are i686.

On the affected node (the x86_64 host), we tried to start the Sun Grid
Engine dbwriter service. All the related processes entered a
permanent "D" state. All root logins received a syslog message saying:

kernel: Kernel BUG at "/root/dnld/ocfs2/ocfs2-1.2.1/fs/ocfs2/file.c":787
kernel: invalid operand: 0000 [1] SMP

I've attached a syslog fragment with the full stack trace.

We were able to log into the box and reboot it. The dbwriter then
started normally.

Does anyone have suggestions on how to avoid this problem going forward?

--
Eric Jones
ejones at jimmy dot harvard dot edu
System Administrator
Department of Biostatistics & Computational Biology
617-632-2447
Dana-Farber Cancer Institute

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ocsf2-error-syslog
Url: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060720/25fd7f5f/ocsf2-error-syslog.ksh
This looks like the lvb issue fixed in ocfs2 1.2.2. The inode->i_size
gets out of sync. Upgrade to 1.2.2.

Eric Jones wrote:
> Hello,
>
> We think we've seen this issue a few times now, and I wonder if anyone
> out there has any insight as to what's going on.
>
> We have a large (40+ node) cluster accessing an OCFS2 v1.2.1 filesystem.
> All nodes are Fedora Core 4 (kernel 2.6.13-1.1532_FC4smp). One node
> is x86_64, the rest are i686.
>
> On the affected node (the x86_64 host), we tried to start the Sun Grid
> Engine dbwriter service. All the related processes entered a
> permanent "D" state. All root logins received a syslog message saying:
>
> kernel: Kernel BUG at "/root/dnld/ocfs2/ocfs2-1.2.1/fs/ocfs2/file.c":787
> kernel: invalid operand: 0000 [1] SMP
>
> I've attached a syslog fragment with the full stack trace.
>
> We were able to log into the box and reboot it. The dbwriter then
> started normally.
>
> Does anyone have suggestions on how to avoid this problem going forward?