Hi,

Please provide detailed ocfs2 version information, which may help with the diagnosis.

Peter Selzner wrote:
> Hi,
>
> we had these entries in /var/log/messages a few days ago:
>
> Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
> Jul 28 23:30:47 xxx kernel: kernel BUG at fs/ocfs2/file.c:790!
> [...]
>
> It was impossible to do "ls -al" in a certain directory (each process that
> "touched" files in this directory ended up in uninterruptible sleep).
> Any suggestions? Thanks.

How did this happen? Could you please explain it in more detail? For example, how many nodes are in your cluster? The hang occurred on one node; what about the other nodes, and what were you doing on them?

Regards,
Tao
Hi,

we had these entries in /var/log/messages a few days ago:

Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: Inode 8323098 i_size = 1572864, dinode i_size = 1568768, bytes_extended = 0, new_i_size = 1576960
Jul 28 23:30:47 xxx kernel: klogd 1.4.1, ---------- state change ----------
Jul 28 23:30:47 xxx kernel: ------------[ cut here ]------------
Jul 28 23:30:47 xxx kernel: kernel BUG at fs/ocfs2/file.c:790!
Jul 28 23:30:47 xxx kernel: invalid opcode: 0000 [#1]
Jul 28 23:30:47 xxx kernel: SMP
Jul 28 23:30:47 xxx kernel: last sysfs file: /class/infiniband/mthca1/board_id
Jul 28 23:30:47 xxx kernel: Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs cpqci mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler rdma_ucm rds ib_ucm ib_sdp rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad bonding ib_mthca ib_mad ib_core button battery ac raw loop dm_round_robin dm_multipath dm_mod usbhid hw_random ide_cd uhci_hcd e1000 cdrom ehci_hcd bnx2 usbcore ext3 jbd ata_piix ahci libata edd fan thermal processor cciss sg qla2400 qla2300 qla2xxx firmware_class qla2xxx_conf intermodule piix sd_mod scsi_mod ide_disk ide_core
Jul 28 23:30:47 xxx kernel: CPU: 2
Jul 28 23:30:47 xxx kernel: EIP: 0060:[<f9de8173>] Tainted: P U VLI
Jul 28 23:30:47 xxx kernel: EFLAGS: 00210292 (2.6.16.46-0.12-bigsmp #1)
Jul 28 23:30:47 xxx kernel: EIP is at ocfs2_extend_file+0x3cd/0xf9b [ocfs2]
Jul 28 23:30:47 xxx kernel: eax: 0000008c ebx: 00000000 ecx: ffffff00 edx: 00200286
Jul 28 23:30:47 xxx kernel: esi: 00000000 edi: 00000000 ebp: df05f000 esp: e398de70
Jul 28 23:30:47 xxx kernel: ds: 007b es: 007b ss: 0068
Jul 28 23:30:47 xxx kernel: Process mv (pid: 12268, threadinfo=e398c000 task=f7f80660)
Jul 28 23:30:47 xxx kernel: Stack: <0>00000000 dd4f9d88 ce48c000 00000000 00000000 00000001 cf253280 dd4f9b80
Jul 28 23:30:47 xxx kernel: dd4f9ee4 0017f000 00000000 00000000 f9ddf432 e398dea8 dd4f9b80 00000000
Jul 28 23:30:47 xxx kernel: 00000001 e398deb4 e398deb4 ce48c000 00000000 00000000 ece0bc00 00000000
Jul 28 23:30:47 xxx kernel: Call Trace:
Jul 28 23:30:47 xxx kernel: [<f9ddf432>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
Jul 28 23:30:47 xxx kernel: [<f9df72f2>] ocfs2_write_lock_maybe_extend+0xb2f/0xde3 [ocfs2]
Jul 28 23:30:47 xxx kernel: [<f9dea85d>] ocfs2_file_write+0x125/0x24d [ocfs2]
Jul 28 23:30:47 xxx kernel: [<f9dea738>] ocfs2_file_write+0x0/0x24d [ocfs2]
Jul 28 23:30:47 xxx kernel: [<c0164714>] vfs_write+0xaa/0x152
Jul 28 23:30:47 xxx kernel: [<c0164d1f>] sys_write+0x3c/0x63
Jul 28 23:30:47 xxx kernel: [<c0103cab>] sysenter_past_esp+0x54/0x79
Jul 28 23:30:47 xxx kernel: Code: 8b 4c 24 3c ff 71 04 ff 31 68 16 03 00 00 68 2b b5 e0 f9 ff 70 10 8b 00 ff b0 c0 00 00 00 68 b1 fd e0 f9 e8 ca a8 33 c6 83 c4 3c <0f> 0b 16 03 db fb e0 f9 8b 5c 24 20 8b 03 0f ae e8 89 f6 8b 74

It was impossible to do "ls -al" in a certain directory: each process that "touched" files in this directory ended up in the D state (uninterruptible sleep).

Any suggestions? Thanks.

Best regards,
Peter Selzner
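For reference, the comparison named in the "bug expression" line can be replayed with the numbers from the second ERROR line. This is only a small Python sketch of that arithmetic, not the kernel code; the function name `size_mismatch` is made up for illustration, and reading `i_size` as the in-memory inode size and `dinode i_size` as the on-disk value is an interpretation of the log labels.

```python
# Values copied from the "ERROR: Inode 8323098 ..." log line.
inode_i_size = 1572864    # logged as "i_size" (i_size_read(inode))
dinode_i_size = 1568768   # logged as "dinode i_size" (fe->i_size)
bytes_extended = 0
new_i_size = 1576960


def size_mismatch(inode_size: int, dinode_size: int, extended: int) -> bool:
    """Mirror the logged bug expression: inode size != dinode size - extended."""
    return inode_size != dinode_size - extended


print(size_mismatch(inode_i_size, dinode_i_size, bytes_extended))  # True
print(inode_i_size - dinode_i_size)  # 4096
print(new_i_size - inode_i_size)     # 4096
```

With these values the two sizes differ by 4096 bytes, so the expression is true and the BUG at file.c:790 is triggered.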
* Tao Ma <tao.ma at oracle.com> [01.08.08 10:58]

Hi,

thanks for your quick reply.

Here are the details:

xxx:/ # SPident

CONCLUSION: System is up-to-date!
  found SLE-10-i386-SP1 + "online updates"

xxx:/ # uname -r
2.6.16.46-0.12-bigsmp

xxx:/ # cat /proc/fs/ocfs2/version
OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)

xxx:/ # debugfs.ocfs2 -V
debugfs.ocfs2 1.2.3

We have 6 nodes in the cluster, and the described behavior (processes freezing in a certain directory) was observed on all 6 nodes. Thanks.

Best regards,
Peter Selzner

--
| Peter Selzner                    mail: p.selzner at krz.de  |
| Kommunales Rechenzentrum (KRZ)   tel: +49 (0)5261-252-273  |
| Minden-Ravensberg / Lippe        fax: +49 (0)5261-932-273  |
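The same details could be gathered on each of the nodes with a small script. The following is a minimal Python sketch, assuming the paths and commands shown above (`/proc/fs/ocfs2/version`, `debugfs.ocfs2 -V`) are present; the function name `collect_ocfs2_info` and the dictionary keys are hypothetical, and missing pieces are reported as `None` rather than raising.

```python
import platform
import shutil
import subprocess
from pathlib import Path


def collect_ocfs2_info() -> dict:
    """Collect the version details requested in this thread, where available."""
    info = {"kernel": platform.release()}  # same value as `uname -r`

    try:
        info["ocfs2_module"] = Path("/proc/fs/ocfs2/version").read_text().strip()
    except OSError:
        info["ocfs2_module"] = None  # module not loaded or path absent

    if shutil.which("debugfs.ocfs2"):
        result = subprocess.run(
            ["debugfs.ocfs2", "-V"], capture_output=True, text=True, check=False
        )
        # Keep whichever stream carries the version text.
        info["tools"] = (result.stdout or result.stderr).strip()
    else:
        info["tools"] = None  # ocfs2-tools not installed on this host
    return info


if __name__ == "__main__":
    for key, value in collect_ocfs2_info().items():
        print(f"{key}: {value}")
```

Running it on every node and comparing the output would show whether all 6 nodes use the same kernel, module, and tools versions.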
No, the kernel is not up to date; it is more than a year old. Refer to the announcement below.

http://oss.oracle.com/pipermail/ocfs2-announce/2008-July/000026.html

From the stack, it looks like you are encountering the rename/extend race that was fixed a long time ago.

http://oss.oracle.com/projects/ocfs2/news/article_14.html
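As a rough cross-check of the "more than a year old" remark, the build timestamp in the reported `/proc/fs/ocfs2/version` line can be compared with the date of the report. This Python sketch uses the ocfs2 module build date as a stand-in for the kernel's age, which is an assumption; it also assumes the `[01.08.08 10:58]` attribution means 1 August 2008. The function name `build_date` is hypothetical.

```python
import re
from datetime import datetime

# Version line as posted earlier in the thread.
version_line = "OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)"


def build_date(line: str) -> datetime:
    """Extract the build timestamp, ignoring the timezone abbreviation."""
    match = re.search(r"(\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4})", line)
    if match is None:
        raise ValueError("no build timestamp found")
    month, day, clock, year = match.groups()
    # %b expects English month abbreviations (C/English locale).
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


report_date = datetime(2008, 8, 1)  # assumed from "[01.08.08 10:58]"
age_days = (report_date - build_date(version_line)).days
print(build_date(version_line).date())  # 2007-03-27
print(age_days > 365)                   # True
```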