Hello List,

I ran into a problem with ocfs2 in a 2-node cluster (but I suspect it is my own fault, due to a wrong configuration... maybe...).

I am using:
=======
SLES9 on IBM xSeries 235 with shared storage in a SAN, with the following configuration:

kernel 2.6.5-7.108-bigsmp
ocfs2-tools-0.99.0-1
ocfs2-support-0.99.0-1
ocfs2-2.6.5-7.108-bigsmp-0.99.2-1

Node1: /etc/ocfs.conf:
ip_address = 192.168.9.45
ip_port_v2 = 63000
node_name = node1
comm_voting = 1
guid = 0513A9FA0665DA21BE130002B3C76992

Node2: /etc/ocfs.conf:
ip_address = 192.168.9.43
ip_port_v2 = 63000
node_name = node2
comm_voting = 1
guid = 0513A9FA0665DA21BE13001018038207

What I did:
=======
/etc/init.d/ocfs2 start; mount -t ocfs /dev/sdd1 /ocfs

worked fine on both machines. I saw the filesystem on both machines and could copy files to and from it.

I tried to stress the filesystem a little bit. By accident I tried to write to the same file from both nodes; this may have caused the problems. While node2 hung completely, /var/log/messages on node1 showed the following:

kernel: (15943) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) may be ejected from cluster on device (8.48)... 20 misses so far
kernel: (11) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 504: bad message: vote_state=0 type=1 lockid=10612113408 expected=10612113408
kernel: (11) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 575: status = -22
kernel: (16841) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 893: inode 2590848, vote_status=0, vote_state=1, lockid=10612113408, flags = 0x5, asked type = 5 master = 0, state = 0x0, type = 5
kernel: (16841) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/dlm.c, 384: Timed out acquiring lock for inode 2590848, (lockid = 10612113408) retrying...
kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/alloc.c, 3983: status = -28
kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/journal.c, 727: block 2590849 was modified but never dirtied!
kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/journal.c, 727: block 9 was modified but never dirtied!
kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) may be ejected from cluster on device (8.48)... 20 misses so far
kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) WILL BE EJECTED from cluster on device (8.48)... 40 misses so far
kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: Removing rzsdb5 (node 1) from clustered device (8,48) after 60 misses
kernel: (16881) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 893: inode 27, vote_status=0, vote_state=1, lockid=110592, flags = 0x101, asked type = 5 master = 1, state = 0x0, type = 5
kernel: (16881) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/dlm.c, 384: Timed out acquiring lock for inode 27, (lockid = 110592) retrying...

After rebooting/resetting the nodes I am unable to mount /ocfs on both nodes. I fiddled around with this problem, but can't figure out what's wrong. Node1 works fine.
But on Node2

  mount -t ocfs2 /dev/sdd1 /ocfs

complains with

mount: Unknown error 999

kernel: max_nodes for this device: 2
kernel: clusterbits=12
kernel: vol_label:
kernel: uuid: b3 c4 dc df 55 f6 87 b9 b2 1d 93 39 ae ea 75 bd
kernel: root_blkno=3, system_dir_blkno=4
kernel: autoconfig: blkno=330, blocks=4 newblkno=334 newblocks=4
kernel: publish: blkno=338, blocks=2
kernel: vote: blkno=340, blocks=2
kernel: bitmap_blkno=433, bitmap_blocks=150, num_clusters=4882676
kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 799: Re-mount volume with the reclaimid option to reclaim the node number
kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 836: status = -999
kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 1654: status = -999
kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 979: status = -999

If I do an /etc/init.d/ocfs stop|start, I get a kernel panic:

---------
kernel: ocfs2: unsupported module, tainting kernel.
kernel: Oracle Cluster FileSystem 2 Mon Sep 13 17:26:09 PDT 2004 (build f96fe36998cba3997ad37e17bcb49b04)
kernel: ocfs2: hostname is rzsdb5
kernel: max_nodes for this device: 2
kernel: clusterbits=12
kernel: vol_label:
kernel: uuid: d6 bb 8c c9 d4 3e 29 d8 db 42 2a e9 d3 c5 a2 49
kernel: root_blkno=3, system_dir_blkno=4
kernel: autoconfig: blkno=330, blocks=4 newblkno=334 newblocks=4
kernel: publish: blkno=338, blocks=2
kernel: vote: blkno=340, blocks=2
kernel: bitmap_blkno=433, bitmap_blocks=150, num_clusters=4882676
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 799: Re-mount volume with the reclaimid option to reclaim the node number
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 836: status = -999
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 1654: status = -999
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 979: status = -999
-- MARK --
kernel: slab error in kmem_cache_destroy(): cache `ocfs2_inode': Can't free all objects
kernel: Call Trace:
kernel: [<c01521b4>] kmem_cache_destroy+0xe4/0x130
kernel: [<f936e3fa>] ocfs_free_mem_lists+0xa/0x26 [ocfs2]
kernel: [<f93751fa>] ocfs_driver_exit+0x12a/0x15f [ocfs2]
kernel: [<c013bc94>] kthread_stop+0x64/0x90
kernel: [<c0143b0d>] sys_delete_module+0x18d/0x220
kernel: [<c0109199>] sysenter_past_esp+0x52/0x71
kernel:
kernel: slab error in kmem_cache_destroy(): cache `ocfs2_extent': Can't free all objects
kernel: Call Trace:
kernel: [<c01521b4>] kmem_cache_destroy+0xe4/0x130
kernel: [<f936e404>] ocfs_free_mem_lists+0x14/0x26 [ocfs2]
kernel: [<f93751fa>] ocfs_driver_exit+0x12a/0x15f [ocfs2]
kernel: [<c013bc94>] kthread_stop+0x64/0x90
kernel: [<c0143b0d>] sys_delete_module+0x18d/0x220
kernel: [<c0109199>] sysenter_past_esp+0x52/0x71
kernel:
kernel: Unloaded OCFS Driver module
kernel: ocfs2: unsupported module, tainting kernel.
kernel: Oracle Cluster FileSystem 2 Mon Sep 13 17:26:09 PDT 2004 (build f96fe36998cba3997ad37e17bcb49b04)
kernel: ocfs2: hostname is rzsdb5
kernel: kmem_cache_create: duplicate cache ocfs2_inode
kernel: ------------[ cut here ]------------
kernel: kernel BUG at mm/slab.c:1348!
kernel: invalid operand: 0000 [#1]
kernel: SMP
kernel: CPU: 3
kernel: EIP: 0060:[<c0151f97>] Tainted: G U
kernel: EFLAGS: 00010202 (2.6.5-7.108-bigsmp)
kernel: EIP is at kmem_cache_create+0x467/0x5a0
kernel: eax: 0000002f ebx: f5ff1a48 ecx: c04b2610 edx: 00007098
kernel: esi: f9378286 edi: f9378286 ebp: f7131a50 esp: f4651f44
kernel: ds: 007b es: 007b ss: 0068
kernel: Process modprobe (pid: 5434, threadinfo=f4650000 task=f61e06b0)
kernel: Stack: c034a48c f937827a 0000001a 0000000a c0000000 ffffff80 00000080 00000180
kernel: f937827a 00000080 00000000 00000000 00000000 c0391510 f900f437 00003000
kernel: 00000000 00000000 0805a110 c039153c f9384180 4013b008 c0143502 00000013
kernel: Call Trace:
kernel: [<f900f437>] ocfs_driver_entry+0x437/0x82f [ocfs2]
kernel: [<c0143502>] sys_init_module+0x152/0x290
kernel: [<c0109199>] sysenter_past_esp+0x52/0x71
kernel:
kernel: Code: 0f 0b 44 05 f8 9c 34 c0 eb cc b8 00 e0 ff ff 21 e0 8b 58 10
------

If I mount the device with

  mount -t ocfs2 -o reclaimid /dev/sdd1 /ocfs

it gets mounted, but an attempt to unmount it leads to a kernel panic on the other machine...

Is it so easy to crash ocfs2 at the moment, or am I doing something completely wrong?

Sincerely
Jan Pilawa

-- 
+ Contact ----------------------------------------------------+
+ Systembetreuung Rechenzentrum TU Braunschweig
+ Hans-Sommer-Str. 65, D-38106 Braunschweig
+ Tel: +49 531 391-5548 E-Mail: j.pilawa@tu-bs.de