Hi,

I have been running an OCFS2 cluster in a production environment for almost a year. A few months ago I had to run fsck.ocfs2 because of some errors, and they were fixed.

Now I have a big problem: I am not able to mount the volume on any of the nodes. I stopped all nodes except one. Some output below:

mount /mnt/ocfs2
mount.ocfs2: I/O error on channel while trying to determine heartbeat information

fsck.ocfs2 /dev/mapper/volgr1-lvol0
fsck.ocfs2 1.6.3
fsck.ocfs2: I/O error on channel while initializing the DLM

fsck.ocfs2 -n /dev/mapper/volgr1-lvol0
fsck.ocfs2 1.6.3
Checking OCFS2 filesystem in /dev/mapper/volgr1-lvol0:
  Label: SAN
  UUID: B4CF8D4667AF43118F3324567B90A987
  Number of blocks: 2901788672
  Block size: 4096
  Number of clusters: 45340448
  Cluster size: 262144
  Number of slots: 10

journal recovery: I/O error on channel while looking up the journal inode for slot 0
fsck encountered unrecoverable errors while replaying the journals and will not continue

Can you give me some hints on how to debug the problem?

Thank you,
Laurentiu.
"I/O error on channel" means the system cannot talk to the block device, so the problem is in the block layer: maybe a loose cable or a setup problem. dmesg should show the errors.

On Fri, Nov 9, 2012 at 10:46 AM, Laurentiu Gosu <lg at easic.ro> wrote:
> Hi,
> I'm using ocfs2 cluster in a production environment since almost 1 year.
> During this time I had to run a fsck.ocfs2 few months ago due to some
> errors but they were fixed.
> Now I have a big problem: I'm not able to mount the volume on any of the
> nodes. I stopped all nodes except one. Some output below:
>
> mount /mnt/ocfs2
> mount.ocfs2: I/O error on channel while trying to determine heartbeat
> information
>
> fsck.ocfs2 /dev/mapper/volgr1-lvol0
> fsck.ocfs2 1.6.3
> fsck.ocfs2: I/O error on channel while initializing the DLM
>
> fsck.ocfs2 -n /dev/mapper/volgr1-lvol0
> fsck.ocfs2 1.6.3
> Checking OCFS2 filesystem in /dev/mapper/volgr1-lvol0:
>   Label: SAN
>   UUID: B4CF8D4667AF43118F3324567B90A987
>   Number of blocks: 2901788672
>   Block size: 4096
>   Number of clusters: 45340448
>   Cluster size: 262144
>   Number of slots: 10
>
> journal recovery: I/O error on channel while looking up the journal
> inode for slot 0
> fsck encountered unrecoverable errors while replaying the journals and
> will not continue
>
> Can you give me some hints on how to debug the problem?
>
> Thank you,
> Laurentiu.
Hi Sunil,

Thank you for answering. Unfortunately, it doesn't seem to be a
hardware problem. A cable cannot be loose here, because the storage is
iSCSI over 1G Ethernet (copper wires). I also ran
"dd if=/dev/.... of=/dev/null", and the first 16GB or so read fine.
dmesg shows no errors.
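
The dd check above only read the start of the device. As a rough illustration of how the same sequential read test could be extended to the whole device while recording where the first failure occurs, here is a minimal Python sketch. The function name `scan_readable` and the 1 MiB chunk size are assumptions made for this example; nothing like it appears in the thread, and it is not a replacement for fsck.ocfs2.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` sequentially from offset 0.

    Returns (offset, error). `error` is None if the end of the file or
    device was reached; otherwise it is the OSError raised by the read
    that started at `offset`.
    """
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                # pread reads at an explicit offset without moving the fd position
                data = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                return offset, exc
            if not data:
                # zero bytes returned: end of file or device
                return offset, None
            offset += len(data)
    finally:
        os.close(fd)
```

Called as `scan_readable("/dev/mapper/volgr1-lvol0")` (the device path from the thread, needing read permission on it), it would return either the total number of bytes read and `None`, or the byte offset at which the block layer first returned an error.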
I also tried debugfs.ocfs2:
[root@ro02xsrv003 ~]# debugfs.ocfs2 /dev/mapper/volgr1-lvol0
debugfs.ocfs2 1.6.3
debugfs: ls
ls: Bad magic number in inode '.'
debugfs: slotmap
slotmap: Bad magic number in inode while reading slotmap system file
debugfs: stats
Revision: 0.90
Mount Count: 0 Max Mount Count: 20
State: 0 Errors: 0
Check Interval: 0 Last Check: Fri Nov 9 14:35:53 2012
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 16208 sparse extended-slotmap inline-data metaecc xattr indexed-dirs refcount discontig-bg
Tunefs Incomplete: 0
Feature RO compat: 7 unwritten usrquota grpquota
Root Blknum: 129 System Dir Blknum: 130
First Cluster Group Blknum: 64
Block Size Bits: 12 Cluster Size Bits: 18
Max Node Slots: 10
Extended Attributes Inline Size: 256
Label: SAN
UUID: B4CF8D4667AF43118F3324567B90A987
Hash: 3698209293 (0xdc6e320d)
DX Seed[0]: 0x9f4a2bb7
DX Seed[1]: 0x501ddac0
DX Seed[2]: 0x6034bfe8
Cluster stack: classic o2cb
Inode: 2 Mode: 00 Generation: 1093568923 (0x412e899b)
FS Generation: 1093568923 (0x412e899b)
CRC32: 46f2d360 ECC: 04d4
Type: Unknown Attr: 0x0 Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root) Group: 0 (root) Size: 0
Links: 0 Clusters: 45340448
ctime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
atime: 0x0 -- Thu Jan 1 02:00:00 1970
mtime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
dtime: 0x0 -- Thu Jan 1 02:00:00 1970
ctime_nsec: 0x00000000 -- 0
atime_nsec: 0x00000000 -- 0
mtime_nsec: 0x00000000 -- 0
Refcount Block: 0
Last Extblk: 0 Orphan Slot: 0
Sub Alloc Slot: Global Sub Alloc Bit: 65535
Marian
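
For context on how much of the volume the dd test covered, the geometry reported by fsck.ocfs2 and debugfs.ocfs2 can be turned into a total size. The sketch below only does arithmetic on values quoted in this thread; treating "16GB or so" as exactly 16 GiB is an assumption made for the estimate.

```python
# Values copied from the fsck.ocfs2 -n and debugfs.ocfs2 "stats" output above.
BLOCK_SIZE_BITS = 12
CLUSTER_SIZE_BITS = 18
NUM_BLOCKS = 2901788672
NUM_CLUSTERS = 45340448

block_size = 1 << BLOCK_SIZE_BITS      # 4096, matches "Block size"
cluster_size = 1 << CLUSTER_SIZE_BITS  # 262144, matches "Cluster size"

size_from_blocks = NUM_BLOCKS * block_size
size_from_clusters = NUM_CLUSTERS * cluster_size

# Assumption: "16GB or so" is taken as 16 GiB for a rough estimate.
dd_bytes_read = 16 * 1024**3
fraction_checked = dd_bytes_read / size_from_clusters

print(f"volume size: {size_from_clusters} bytes "
      f"({size_from_clusters / 1024**4:.2f} TiB)")
print(f"blocks and clusters agree: {size_from_blocks == size_from_clusters}")
print(f"fraction covered by dd: {fraction_checked:.4%}")
```

The block count and the cluster count describe the same size, about 10.8 TiB, so a read of the first 16GB exercises well under 1% of the device and says little about the rest of it.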