Thanks! Srinivas was helping me troubleshoot that last night with a series of
strace(es) that started to point to the Kernel issues - and then he realized
that it was the wrong version! I updated the version to
2.6.39-400.215.10.el6uek.x86_64 on both of my ocfs boxes and service o2cb enable
brought the global HB online with no issues!
Warm regards and thanks!
Jon
> On Nov 13, 2014, at 12:00 PM, ocfs2-users-request at oss.oracle.com wrote:
>
> Send Ocfs2-users mailing list submissions to
> ocfs2-users at oss.oracle.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
> or, via email, send a message with subject or body 'help' to
> ocfs2-users-request at oss.oracle.com
>
> You can reach the person managing the list at
> ocfs2-users-owner at oss.oracle.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Ocfs2-users digest..."
>
>
> Today's Topics:
>
> 1. Re: Ocfs2-users Digest, Vol 130, Issue 1 (Richard Sibthorp)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Nov 2014 14:16:05 +0000
> From: Richard Sibthorp <richard.sibthorp at oracle.com>
> Subject: Re: [Ocfs2-users] Ocfs2-users Digest, Vol 130, Issue 1
> To: ocfs2-users at oss.oracle.com
> Message-ID: <5464BD25.8010003 at oracle.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Jon,
>
> The kernel you are using includes the ocfs2 kernel modules at version
> 1.6.3. The global heartbeat feature was introduced in ocfs2 1.8.
>
> I haven't checked whether any of the 2.6.32-based uek's include
ocfs2
> 1.8, but certainly the 2.6.39 and later (aka uek2, uek3) ones do.
>
> I assume from below you have an Oracle support license - at least for
> rdbms if not Oracle Linux. When using ocfs2 for rdbms resources, your
> rdbms license entitles you to ocfs2 support via MOS, though for
> general-purpose ocfs2 issues an Oracle Linux Support contract needs to
> be in place. This would have a separate CSI from that of licensed
> products - obviously open-source products are not licensed, but if you
> require support you need a support contract.
>
> You may also want to review MOS documents 1552519.1 and 1553162.1
>
> Best regards,
> Richard.
>
> On 13/11/2014 02:27, ocfs2-users-request at oss.oracle.com wrote:
>> Send Ocfs2-users mailing list submissions to
>> ocfs2-users at oss.oracle.com
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>> or, via email, send a message with subject or body 'help' to
>> ocfs2-users-request at oss.oracle.com
>>
>> You can reach the person managing the list at
>> ocfs2-users-owner at oss.oracle.com
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Ocfs2-users digest..."
>>
>>
>> Today's Topics:
>>
>> 1. OCFS2 v1.8 on VMware VMs global heartbeat woes (Jon Norris)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 12 Nov 2014 18:26:51 -0800
>> From: Jon Norris <jon_norris at apple.com>
>> Subject: [Ocfs2-users] OCFS2 v1.8 on VMware VMs global heartbeat woes
>> To: ocfs2-users at oss.oracle.com
>> Message-ID: <FFF084CA-12F0-4FA1-BB38-B70F5A4A6695 at apple.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Running two VMs on ESXi 5.1.0 and trying to get global heart beat (HB)
working with no luck (on about my 20th rebuild and redo)
>>
>> Environment:
>>
>> Two VMware based VMs running
>>
>> # cat /etc/oracle-release
>>
>> Oracle Linux Server release 6.5
>>
>> # uname -r
>>
>> 2.6.32-400.36.8.el6uek.x86_64
>>
>> # yum list installed | grep ocfs
>>
>> ocfs2-tools.x86_64 1.8.0-11.el6 @oel-latest
>>
>> # yum list installed | grep uek
>>
>> kernel-uek.x86_64 2.6.32-400.36.8.el6uek @oel-latest
>> kernel-uek-firmware.noarch 2.6.32-400.36.8.el6uek @oel-latest
>> kernel-uek-headers.x86_64 2.6.32-400.36.8.el6uek @oel-latest
>>
>> Configuration:
>>
>> The shared data stores (HB and mounted OCFS) are setup in a similar way
as described by VMWare and Oracle for shared RAC VMWare based data stores. All
blogs, wikis and VMWare KB docs show similar setup, VM shared SCSI settings
[multi-writer], shared disk [independant + persistent] etc. such as:
>>
>>
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1034165
<http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1034165>
)
>>
>> The devices can be seen by both VMs after in the OS. I have used the
same configuration to run an OCFS2 setup with local heartbeat, and that works
fine (cluster starts up and the OCFS2 file system mounts with no issues)
>>
>> I followed similar procedures as show in an Oracle blog + docs:
https://docs.oracle.com/cd/E37670_01/E37355/html/ol_instcfg_ocfs2.html
<https://docs.oracle.com/cd/E37670_01/E37355/html/ol_instcfg_ocfs2.html>
and https://blogs.oracle.com/wim/entry/ocfs2_global_heartbeat
<https://blogs.oracle.com/wim/entry/ocfs2_global_heartbeat> with no luck.
>>
>> The shared SCSI controllers are VMware paravirtual and set to ?shared
none? as suggested by the VMware RAC shared disk KB (previously mentioned)
>>
>> After the shared Linux devices have been added to both VMs and are seen
by both VMs in the OS (ls /dev/sd* shows the devices on each) I format the
global HB devices in a way similar to the following from one VM:
>>
>> # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol1
--cluster-name=test --cluster-stack=o2cb --global-heartbeat /dev/sdc
>> # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol2
--cluster-name=test --cluster-stack=o2cb --global-heartbeat /dev/sdd
>>
>>> From both VMs you can run the following and see:
>>
>> # mounted.ocfs2 -d
>>
>> Device Stack Cluster F UUID Label
>> /dev/sdc o2cb test G 5620F19D43D840C7A46523019AE15A96
ocfs2vol1
>> /dev/sdd o2cb test G 9B9182279ABD4FD99F695F91488C94C1
ocfs2vol2
>>
>> I then add the global HB devices to the ocfs config file with similar
commands:
>>
>> # o2cb add-heartbeat test 5620F19D43D840C7A46523019AE15A96
>> # o2cb add-heartbeat test 9B9182279ABD4FD99F695F91488C94C1
>>
>> Thus far looking good (heh, but then all we?ve done is format ocfs2
with options and updated a text file) - then I do the following:
>>
>> # o2cb heartbeat-mode test global
>>
>> All this being done on one node in the cluster I copy the following to
the other node (with hostnames changed here, though the actual hostname = output
of the hostname command on each node):
>>
>> # cat /etc/ocfs2/cluster.conf
>>
>> node:
>> name = clusterhost1.mydomain.com
>> cluster = test
>> number = 0
>> ip_address = 10.143.144.12
>> ip_port = 7777
>>
>> node:
>> name = clusterhost2.mydomain.com
>> cluster = test
>> number = 1
>> ip_address = 10.143.144.13
>> ip_port = 7777
>>
>> cluster:
>> name = test
>> heartbeat_mode = global
>> node_count = 2
>>
>> heartbeat:
>> cluster = test
>> region = 5620F19D43D840C7A46523019AE15A96
>>
>> heartbeat:
>> cluster = test
>> region = 9B9182279ABD4FD99F695F91488C94C1
>>
>> The same config works fine with heartbeat_mode set to local and the
global heartbeat devices removed, and I can mount a shared FS - the local HB
interfaces are IPv4 on a private L2 non routed VLAN, are up and each node can
ping each other.
>>
>> Once the config is copied to each node and have already run:
>>
>> # service o2cb configure
>>
>> Which completes in local heartbeat mode fine, so the cluster will start
on boot and the params are default for timeouts etc.
>>
>> I check that the service on both nodes unloads and loads modules with
no issues:
>>
>> # service o2cb unload
>>
>> Clean userdlm domains: OK
>> Unmounting ocfs2_dlmfs filesystem: OK
>> Unloading module "ocfs2_dlmfs": OK
>> Unloading module "ocfs2_stack_o2cb": OK
>> Unmounting configfs filesystem: OK
>> Unloading module "configfs": OK
>>
>> # service o2cb load
>>
>> Loading filesystem "configfs": OK
>> Mounting configfs filesystem at /sys/kernel/config: OK
>> Loading stack plugin "o2cb": OK
>> Loading filesystem "ocfs2_dlmfs": OK
>> Mounting ocfs2_dlmfs filesystem at /dlm: OK
>>
>> # mount -v
>> ?
>> ?.
>> debugfs on /sys/kernel/debug type debugfs (rw)
>> ?.
>> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
>>
>> # lsmod | grep ocfs
>>
>> ocfs2_dlmfs 18026 1
>> ocfs2_stack_o2cb 3606 0
>> ocfs2_dlm 196778 1 ocfs2_stack_o2cb
>> ocfs2_nodemanager 202856 3 ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
>> ocfs2_stackglue 11283 2 ocfs2_dlmfs,ocfs2_stack_o2cb
>> configfs 25853 2 ocfs2_nodemanager
>>
>> Looks good on both nodes?. then (sigh)
>>
>> # service o2cb enable
>>
>> Writing O2CB configuration: OK
>> Setting cluster stack "o2cb": OK
>> Registering O2CB cluster "test": Failed
>> o2cb: Unable to access cluster service while registering heartbeat mode
'global'
>> Unregistering O2CB cluster "test": OK
>>
>> I have searched for the error string and have come up with a huge ZERO
on help - and the local OS log messages are equally unhelpful:
>>
>> # tail /var/log/messages
>>
>> Nov 12 21:54:53 clusterhost1 o2cb.init: online test
>> Nov 13 00:58:38 clusterhost1 o2cb.init: online test
>> Nov 13 01:00:06 clusterhost1 o2cb.init: offline test 0
>> Nov 13 01:00:06 clusterhost1 kernel: ocfs2: Unregistered cluster
interface o2cb
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 Node Manager 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 DLM 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: ocfs2: Registered cluster
interface o2cb
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 DLMFS 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 User DLM kernel interface
loaded
>> Nov 13 01:03:32 clusterhost1 o2cb.init: online test
>>
>> Dmesg shows the same:
>>
>> # dmesg
>>
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>> Slow work thread pool: Starting up
>> Slow work thread pool: Ready
>> FS-Cache: Loaded
>> FS-Cache: Netfs 'nfs' registered for caching
>> eth0: no IPv6 routers present
>> eth1: no IPv6 routers present
>> ocfs2: Unregistered cluster interface o2cb
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>> ocfs2: Unregistered cluster interface o2cb
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>>
>> The filesystem looks fine and this can be run from both hosts in the
cluster:
>>
>> # fsck.ocfs2 -n /dev/sdc
>>
>> fsck.ocfs2 1.8.0
>> Checking OCFS2 filesystem in /dev/sdc:
>> Label: ocfs2vol1
>> UUID: 5620F19D43D840C7A46523019AE15A96
>> Number of blocks: 524288
>> Block size: 4096
>> Number of clusters: 524288
>> Cluster size: 4096
>> Number of slots: 4
>>
>> # fsck.ocfs2 -n /dev/sdd
>>
>> fsck.ocfs2 1.8.0
>> Checking OCFS2 filesystem in /dev/sdd:
>> Label: ocfs2vol2
>> UUID: 9B9182279ABD4FD99F695F91488C94C1
>> Number of blocks: 524288
>> Block size: 4096
>> Number of clusters: 524288
>> Cluster size: 4096
>> Number of slots: 4
>>
>> What am I missing? I?ve re-done this, re-created the devices a few too
many times (thinking I may have missed something) but I am mystified. From all
outer appearances I have two VMs that can see and in local heartbeat mode mount
a shared OCFS2 filesystem and access it (have it running in local heartbeat mode
for a cluster of rsyslog servers that are being load balanced by an F5 LTM VS
with no issues) I am stumped on how to get global HB devices setup, though I
have read and re-read the user guides, troubleshooting guides and wikis/blogs on
how to make that work until my eyes hurt.
>>
>> Mounted the debugfs and ran the debugfs.ocfs2 utility but am unfamiliar
of what I should be looking for there (or if this is where I would look for
cluster not coming online errors)
>>
>> As the oc2b/ocfs modules are all kernel based I am not 100% sure how to
increase debug information without digging into the source code and mucking
around there.
>>
>> Any guidance or lessons learned (or things to check) would be super :)
and if works warrant a happy scream of joy from my frustrated cube!
>>
>>
>> Warm regards,
>>
>> Jon
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:
http://oss.oracle.com/pipermail/ocfs2-users/attachments/20141112/68b903fe/attachment.html
>>
>> ------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>> End of Ocfs2-users Digest, Vol 130, Issue 1
>> *******************************************
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>
> End of Ocfs2-users Digest, Vol 130, Issue 2
> *******************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://oss.oracle.com/pipermail/ocfs2-users/attachments/20141113/c0a1422b/attachment-0001.html