Laurentiu Gosu
2011-Dec-11 16:14 UTC
[Ocfs2-users] Unable to stop cluster as heartbeat region still active
Hi Sunil,

Maybe you remember the thread below. In short, the problem was that the heartbeat region was still active after unmounting the OCFS2 volume (I use the latest UEK + ocfs2-tools).

Based on this link http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results I manually created a /dev/dm-2 symlink pointing to my SAN device [/dev/mapper/volgr1-lvol0], and the heartbeat then stopped normally. Maybe this helps you find the real issue. As I understand it, that symlink should be created automatically, but it seems the problem is still there in ocfs2-tools-1.6.3-2.el5.

br,
laurentiu.

On 10/24/2011 23:54, Sunil Mushran wrote:
> Well, I wouldn't advise you to go into prod with this problem.
> To figure out the issue, we'll need to provide a debug version of
> ocfs2_hb_ctl.
>
> If you have support, ping oracle support and ask for assistance.
>
> If not, download the source and run ocfs2_hb_ctl in gdb. The problem
> is in the code path that begins in the function lookup_dev().
>
> On 10/23/2011 01:30 PM, Laurentiu Gosu wrote:
>> #rpm -qa |grep ocfs2
>> ocfs2console-1.6.3-2.el5
>> ocfs2-tools-1.6.3-2.el5
>>
>> Just let me know if I can give more details to find the problem. I
>> will move ocfs2 into production in the next weeks.
>>
>> On 10/23/2011 22:49, Sunil Mushran wrote:
>>> Are you sure you have ocfs2-tools-1.6.3? I remember we had an
>>> issue with this with an earlier release... 1.6.1/.2.
>>>
>>> On 10/23/2011 10:43 AM, Laurentiu Gosu wrote:
>>>> hmm..
>>>> #ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>> *BUT:*
>>>> #ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>> I can still kill the ref using device name (-d).
>>>>
>>>> On 10/23/2011 17:57, Sunil Mushran wrote:
>>>>> I think it stops by uuid. So try doing this the next time.
>>>>> You are encountering some issue that we have not seen before.
>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>
>>>>> On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:
>>>>>> Hi Sunil,
>>>>>> Sorry for my late reply, i just had time today to start from
>>>>>> scratch and test.
>>>>>> I rebuilt my environment(2 nodes connected to a SAN via
>>>>>> iSCSI+multipath). I still have the issue that the heartbeat is
>>>>>> active after I umount my ocfs2 volume.
>>>>>> /etc/init.d/o2cb stop
>>>>>> Stopping O2CB cluster CLUST: Failed
>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>
>>>>>> ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>
>>>>>> After i manually kill the ref (ocfs2_hb_ctl -K -d
>>>>>> /dev/mapper/volgr1-lvol0 ocfs2 ) i can stop successfully o2cb. I
>>>>>> can live with that but why doesn't it stop automatically? As i
>>>>>> understand, heartbeat should be started and stopped once the
>>>>>> volume gets mounted/umounted.
>>>>>>
>>>>>> br,
>>>>>> Laurentiu.
>>>>>>
>>>>>> On 10/19/2011 02:28, Sunil Mushran wrote:
>>>>>>> Manual delete will only work if there are no references. In your case
>>>>>>> there are references.
>>>>>>>
>>>>>>> You may want to start both nodes from scratch. Do not start/stop
>>>>>>> heartbeat manually. Also, do not force-format.
>>>>>>>
>>>>>>> On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
>>>>>>>> OK, i rebooted one of the nodes(both had similar issues). But
>>>>>>>> something is still fishy.
>>>>>>>> - i mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
>>>>>>>> - i unmount it: umount /mnt/tmp/
>>>>>>>> - tried to stop o2cb: /etc/init.d/o2cb stop
>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>> - ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>>> - ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>>>> - ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/:
>>>>>>>> total 0
>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold
>>>>>>>>
>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
>>>>>>>> total 0
>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 01:50 pid
>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block
>>>>>>>>
>>>>>>>> - i cannot manually delete
>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/
>>>>>>>>
>>>>>>>> PS: i'm going to sleep now, i have to be up in a few hours. We
>>>>>>>> can continue tomorrow if it's ok with you.
>>>>>>>> Thank you for your help.
>>>>>>>>
>>>>>>>> Laurentiu.
>>>>>>>>
>>>>>>>> On 10/19/2011 01:33, Sunil Mushran wrote:
>>>>>>>>> One way this can happen is if one starts the hb manually and then force
>>>>>>>>> formats on that volume. The format will generate a new uuid. Once that
>>>>>>>>> happens, the hb tool cannot map the region to the device and thus fail
>>>>>>>>> to stop it. Right now the easiest option on this box is resetting it.
>>>>>>>>>
>>>>>>>>> On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
>>>>>>>>>> Yes, i did reformat it(even more than once i think, last
>>>>>>>>>> week). This is a pre-production system and i'm trying various
>>>>>>>>>> options before moving into real life.
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 01:19, Sunil Mushran wrote:
>>>>>>>>>>> Did you reformat the volume recently? or, when did you
>>>>>>>>>>> format last?
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> well..this is weird
>>>>>>>>>>>> ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>>>>>> *918673F06F8F4ED188DDCE14F39945F6* dead_threshold
>>>>>>>>>>>>
>>>>>>>>>>>> looks like we have different UUIDs. Where is this coming from??
>>>>>>>>>>>>
>>>>>>>>>>>> ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>> 918673F06F8F4ED188DDCE14F39945F6: 1 refs
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2011 01:04, Sunil Mushran wrote:
>>>>>>>>>>>>> Let's do it by hand.
>>>>>>>>>>>>> rm -rf /sys/kernel/config/cluster/.../heartbeat/*0C4AB55FE9314FA5A9F81652FDB9B22D*
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while
>>>>>>>>>>>>>> stopping heartbeat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No improvement :(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>>>>>>>>>>>>>> See if this cleans it up.
>>>>>>>>>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>> Device FS Stack UUID Label
>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2 o2cb 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> mounted.ocfs2 -f
>>>>>>>>>>>>>>>>>> Device FS Nodes
>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2 ro02xsrv001
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> By the way, there is no /dev/dm-2
>>>>>>>>>>>>>>>>>> ls /dev/dm-*
>>>>>>>>>>>>>>>>>> /dev/dm-0 /dev/dm-1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>> So it is not mounted. But we still have a hb thread because
>>>>>>>>>>>>>>>>>>> hb could not be stopped during umount. The reason for that
>>>>>>>>>>>>>>>>>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do:
>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Then list that dir.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -l -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Be careful before killing. We want to be sure that
>>>>>>>>>>>>>>>>>>>>> dev is not mounted.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>> Again the outputs:
>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>> dm-2
>>>>>>>>>>>>>>>>>>>>>> --->here should be volgr1-lvol0 i guess?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think i have to enable debug first somehow..?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 3 root root 0 Oct 19 00:12 heartbeat
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 11 20:23 node
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/*918673F06F8F4ED188DDCE14F39945F6*:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>>>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>> I have a 2 nodes ocfs2 cluster running UEK
>>>>>>>>>>>>>>>>>>>>>>>>>> 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5,
>>>>>>>>>>>>>>>>>>>>>>>>>> ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>>>>>>>>>>>>>> My problem is that all the time when i try to
>>>>>>>>>>>>>>>>>>>>>>>>>> run /etc/init.d/o2cb stop
>>>>>>>>>>>>>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>>>>>>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>>>>>>>>>>>>> Unable to stop cluster as heartbeat
>>>>>>>>>>>>>>>>>>>>>>>>>> region still active
>>>>>>>>>>>>>>>>>>>>>>>>>> There is no active mount point. I tried to
>>>>>>>>>>>>>>>>>>>>>>>>>> manually stop the heartbeat
>>>>>>>>>>>>>>>>>>>>>>>>>> with "ocfs2_hb_ctl -K -d
>>>>>>>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>>>>>>>>>>>>>>>>>>>>>>>>> the refs number with "ocfs2_hb_ctl -I -d
>>>>>>>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0 ").
>>>>>>>>>>>>>>>>>>>>>>>>>> But even if refs number is set to zero the
>>>>>>>>>>>>>>>>>>>>>>>>>> "heartbeat region still
>>>>>>>>>>>>>>>>>>>>>>>>>> active" occurs.
>>>>>>>>>>>>>>>>>>>>>>>>>> How can i fix this?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>>>>>>>>>>>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
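The workaround described at the top of this message (creating the missing /dev/dm-2 node by hand) could be scripted roughly as follows. This is only a sketch under assumptions: `ensure_hb_dev_link` is a hypothetical helper, not part of ocfs2-tools, and it assumes the heartbeat region's configfs `dev` file holds the name the tool expects to find under /dev, as the `dm-2` output quoted above suggests. The optional third argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools).
# usage: ensure_hb_dev_link REGION_DIR TARGET_DEVICE [DEV_DIR]
#   REGION_DIR    e.g. /sys/kernel/config/cluster/CLUSTER/heartbeat/<UUID>
#   TARGET_DEVICE e.g. /dev/mapper/volgr1-lvol0
#   DEV_DIR       defaults to /dev
ensure_hb_dev_link() {
    region_dir=$1
    target=$2
    dev_dir=${3:-/dev}

    # The "dev" attribute holds the kernel's name for the device, e.g. dm-2.
    name=$(cat "$region_dir/dev") || return 1
    [ -n "$name" ] || return 1

    # Leave any existing node (or symlink, even a dangling one) alone.
    if [ -e "$dev_dir/$name" ] || [ -L "$dev_dir/$name" ]; then
        echo "$dev_dir/$name already exists"
        return 0
    fi

    # Create the missing name as a symlink to the real device.
    ln -s "$target" "$dev_dir/$name" &&
        echo "created $dev_dir/$name -> $target"
}
```

On the box from this thread that would mean running, as root and only after confirming the volume is not mounted, `ensure_hb_dev_link /sys/kernel/config/cluster/CLUSTER/heartbeat/<UUID> /dev/mapper/volgr1-lvol0`, followed by the usual `ocfs2_hb_ctl -K -u <UUID> ocfs2`.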
Sunil Mushran
2011-Dec-12 19:02 UTC
[Ocfs2-users] Unable to stop cluster as heartbeat region still active
Thanks. Yes, stop hb looks up for the device in /proc/partitions. I guess the utility is expecting the partitions there because that's how udev works normally. Having said that, I think we have made a change in 1.8 whereby stop hb does not scan the devices but just looks up configfs.
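A quick way to spot the mismatch Sunil describes is to list the names in /proc/partitions that have no matching node under /dev. The following is a minimal sketch, not what ocfs2_hb_ctl itself does (its exact lookup is not shown in this thread); `missing_dev_nodes` is a hypothetical helper, and the two optional arguments exist only so it can be pointed at test files.

```shell
#!/bin/sh
# Hypothetical helper: print every device name listed in the partitions
# file that has no corresponding entry in the device directory.
# usage: missing_dev_nodes [PARTITIONS_FILE] [DEV_DIR]
missing_dev_nodes() {
    partitions=${1:-/proc/partitions}
    dev_dir=${2:-/dev}

    # /proc/partitions starts with a header line and a blank line;
    # on the data rows the fourth column is the device name.
    awk 'NR > 2 && NF == 4 { print $4 }' "$partitions" |
    while read -r name; do
        [ -e "$dev_dir/$name" ] || echo "$name"
    done
}
```

On the setup from this thread one would expect `missing_dev_nodes` to print `dm-2` before the symlink is created, assuming dm-2 is listed in /proc/partitions there.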