Hi,

I have been running an OCFS2 cluster in a production environment for almost a year. A few months ago I had to run fsck.ocfs2 because of some errors, but they were fixed.

Now I have a big problem: I am not able to mount the volume on any of the nodes. I stopped all nodes except one. Some output below:

mount /mnt/ocfs2
mount.ocfs2: I/O error on channel while trying to determine heartbeat information

fsck.ocfs2 /dev/mapper/volgr1-lvol0
fsck.ocfs2 1.6.3
fsck.ocfs2: I/O error on channel while initializing the DLM

fsck.ocfs2 -n /dev/mapper/volgr1-lvol0
fsck.ocfs2 1.6.3
Checking OCFS2 filesystem in /dev/mapper/volgr1-lvol0:
  Label:              SAN
  UUID:               B4CF8D4667AF43118F3324567B90A987
  Number of blocks:   2901788672
  Block size:         4096
  Number of clusters: 45340448
  Cluster size:       262144
  Number of slots:    10

journal recovery: I/O error on channel while looking up the journal inode for slot 0
fsck encountered unrecoverable errors while replaying the journals and will not continue

Can you give me some hints on how to debug the problem?

Thank you,
Laurentiu.
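As a quick sanity check on the geometry that fsck.ocfs2 reports, the block and cluster counts can be multiplied out and compared. This is only a sketch: the four constants are copied from the output above, and the function name `volume_bytes` is made up for illustration. If the two products agree, the geometry fields are at least internally consistent.

```python
# Values copied from the "fsck.ocfs2 -n" output in the message above.
BLOCKS = 2901788672
BLOCK_SIZE = 4096
CLUSTERS = 45340448
CLUSTER_SIZE = 262144


def volume_bytes(count, unit_size):
    """Total size in bytes covered by `count` units of `unit_size` bytes."""
    return count * unit_size


by_blocks = volume_bytes(BLOCKS, BLOCK_SIZE)
by_clusters = volume_bytes(CLUSTERS, CLUSTER_SIZE)
consistent = by_blocks == by_clusters

print("size by blocks:   %d bytes (%.2f TiB)" % (by_blocks, by_blocks / 2**40))
print("size by clusters: %d bytes (%.2f TiB)" % (by_clusters, by_clusters / 2**40))
print("consistent:", consistent)
```

Both products come to the same number of bytes, roughly 10.8 TiB, so the block and cluster figures in the header agree with each other.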
"I/O error on channel" means the system cannot talk to the block device. The problem is in the block layer. It could be a loose cable or a setup problem. dmesg should show errors.

On Fri, Nov 9, 2012 at 10:46 AM, Laurentiu Gosu <lg at easic.ro> wrote:
> Now i have a big problem: I'm not able to mount the volume on any of the
> nodes. I stopped all nodes except one.
> [...]
> Can you give me some hints on how to debug the problem?
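One way to test the block-layer theory is to read small samples spread across the whole device and note any offsets where the read fails. The following is a rough sketch, not a tested diagnostic tool: `probe_reads` and its `chunk` and `stride` parameters are hypothetical names, the device path in the usage note is the one from the original message, and reads may be served from the page cache rather than from the storage itself.

```python
import errno
import os


def probe_reads(path, chunk=1 << 20, stride=1 << 30):
    """Read `chunk` bytes at every `stride` offset of `path`.

    Returns (size, failed), where `failed` is a list of
    (offset, errno_name) pairs for reads that raised OSError.
    """
    failed = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, stride):
            try:
                os.pread(fd, min(chunk, size - offset), offset)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failed.append((offset, name))
    finally:
        os.close(fd)
    return size, failed


# Example (run as root, read-only):
#   size, failed = probe_reads("/dev/mapper/volgr1-lvol0")
#   print(size, failed)
```

An empty `failed` list only says that the sampled offsets were readable. Any entry with `EIO` would point at a region worth cross-checking against the kernel log.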
Hi Sunil,

Thank you for answering. Unfortunately, it does not look like a hardware problem. A cable cannot be loose here, because this is an iSCSI over 1G Ethernet (copper wires) environment. I also ran "dd if=/dev/.... of=/dev/null", and the first 16 GB or so read fine. dmesg shows no errors.

I also tried debugfs.ocfs2:

[root@ro02xsrv003 ~]# debugfs.ocfs2 /dev/mapper/volgr1-lvol0
debugfs.ocfs2 1.6.3
debugfs: ls
ls: Bad magic number in inode '.'
debugfs: slotmap
slotmap: Bad magic number in inode while reading slotmap system file
debugfs: stats
        Revision: 0.90
        Mount Count: 0   Max Mount Count: 20
        State: 0   Errors: 0
        Check Interval: 0   Last Check: Fri Nov 9 14:35:53 2012
        Creator OS: 0
        Feature Compat: 3 backup-super strict-journal-super
        Feature Incompat: 16208 sparse extended-slotmap inline-data metaecc xattr indexed-dirs refcount discontig-bg
        Tunefs Incomplete: 0
        Feature RO compat: 7 unwritten usrquota grpquota
        Root Blknum: 129   System Dir Blknum: 130
        First Cluster Group Blknum: 64
        Block Size Bits: 12   Cluster Size Bits: 18
        Max Node Slots: 10
        Extended Attributes Inline Size: 256
        Label: SAN
        UUID: B4CF8D4667AF43118F3324567B90A987
        Hash: 3698209293 (0xdc6e320d)
        DX Seed[0]: 0x9f4a2bb7
        DX Seed[1]: 0x501ddac0
        DX Seed[2]: 0x6034bfe8
        Cluster stack: classic o2cb
        Inode: 2   Mode: 00   Generation: 1093568923 (0x412e899b)
        FS Generation: 1093568923 (0x412e899b)
        CRC32: 46f2d360   ECC: 04d4
        Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
        Dynamic Features: (0x0)
        User: 0 (root)   Group: 0 (root)   Size: 0
        Links: 0   Clusters: 45340448
        ctime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
        atime: 0x0 -- Thu Jan 1 02:00:00 1970
        mtime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
        dtime: 0x0 -- Thu Jan 1 02:00:00 1970
        ctime_nsec: 0x00000000 -- 0
        atime_nsec: 0x00000000 -- 0
        mtime_nsec: 0x00000000 -- 0
        Refcount Block: 0
        Last Extblk: 0   Orphan Slot: 0
        Sub Alloc Slot: Global   Sub Alloc Bit: 65535

Marian
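Since debugfs.ocfs2 reports "Bad magic number in inode", it may help to look directly at what is stored in the root and system directory blocks. The sketch below reads the first bytes of a given block and compares them with the inode signature. The block numbers (129 and 130) and the block size (2^12 = 4096) come from the stats output above. The signature string "INODE01" is an assumption taken from the OCFS2 on-disk format header (OCFS2_INODE_SIGNATURE) and should be checked against the installed ocfs2-tools sources. The helper names are hypothetical.

```python
import os

# Assumption: OCFS2_INODE_SIGNATURE as defined in the OCFS2 on-disk format header.
INODE_SIGNATURE = b"INODE01"


def block_signature(path, blkno, block_size=4096, length=8):
    """Return the first `length` bytes of block `blkno` on `path` (read-only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, blkno * block_size)
    finally:
        os.close(fd)


def looks_like_inode(signature):
    """True if the bytes match the assumed inode signature, ignoring NUL padding."""
    return signature.rstrip(b"\0") == INODE_SIGNATURE


# Example (run as root, read-only), using the block numbers from "stats":
#   for blkno in (129, 130):
#       sig = block_signature("/dev/mapper/volgr1-lvol0", blkno)
#       print(blkno, sig, looks_like_inode(sig))
```

If those blocks hold zeros or unrelated data instead of the expected signature, that would suggest the metadata near the start of the volume was overwritten or is being read from the wrong place, which is a different situation from a device that cannot be read at all.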