Hi Joseph,
yes, that's what I thought. But why doesn't the other node have the same
problem?
On the other host, too, the port is only opened as the last step during boot,
after the ocfs2 and o2cb init scripts have started. Yet it succeeded in mounting
the volume!?!
If that is the cause, it would be a design error on SUSE's part.
Bernd
----- On Oct 25, 2016, at 11:44 AM, Joseph Qi joseph.qi at huawei.com wrote:
> Hi Bernd,
> The error message shows that the connection between nodes 1 and 2 cannot be
> set up. So you should make sure the network is up before mounting and that
> the nodes can reach each other on port 7777.
>
> Thanks,
> Joseph
>
> On 2016/10/25 3:36, Lentes, Bernd wrote:
>> Hi,
>>
>> I have two nodes (SLES 11 SP4, 64-bit), which are connected via FC to a SAN.
>> On the SAN I created an OCFS2 volume.
>> One host (let's call it 20) mounts the OCFS2 volume automatically while
>> booting. The other (let's call it 10) doesn't.
>> Here is my /etc/ocfs2/cluster.conf:
>>
>> cluster:
>> node_count = 2
>> name = idg
>>
>> node:
>> ip_port = 7777
>> ip_address = 192.168.100.10
>> number = 1
>> name = sunhb65277
>> cluster = idg
>>
>> node:
>> ip_port = 7777
>> ip_address = 192.168.100.20
>> number = 2
>> name = sunhb58820
>> cluster = idg
>>
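To cross-check that both nodes really see the same node numbers, addresses and ports, the stanzas above can be read programmatically. This is only a minimal sketch: `parse_cluster_conf` and `nodes_by_number` are hypothetical helper names, not part of ocfs2-tools, and the parser assumes nothing beyond the "stanza header, then key = value lines" layout shown in the file above.

```python
# Sketch only: parses the "header:" / "key = value" layout shown above.
# The helper names are made up for this example; they are not an ocfs2-tools API.

SAMPLE = """\
cluster:
node_count = 2
name = idg

node:
ip_port = 7777
ip_address = 192.168.100.10
number = 1
name = sunhb65277
cluster = idg

node:
ip_port = 7777
ip_address = 192.168.100.20
number = 2
name = sunhb58820
cluster = idg
"""


def parse_cluster_conf(text):
    """Return a list of stanzas, each as {"type": ..., "params": {...}}."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "cluster:" or "node:" starts a new block.
            current = {"type": line[:-1], "params": {}}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current["params"][key.strip()] = value.strip()
    return stanzas


def nodes_by_number(stanzas):
    """Map node number -> (name, ip_address, ip_port) for every node stanza."""
    return {
        int(s["params"]["number"]): (
            s["params"]["name"],
            s["params"]["ip_address"],
            int(s["params"]["ip_port"]),
        )
        for s in stanzas
        if s["type"] == "node"
    }


nodes = nodes_by_number(parse_cluster_conf(SAMPLE))
for number in sorted(nodes):
    print(number, nodes[number])
```

Running the same helper against the file on each host and comparing the results would confirm that the two copies agree.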
>>
>> 192.168.100.10 is host 10, 192.168.100.20 is host 20. The file is identical
>> on both nodes.
>>
>> /etc/fstab:
>> /dev/disk/by-id/dm-uuid-mpath-3600c0ff00012824b04af7a5201000000 /images ocfs2 _netdev,defaults 0 0
>>
>>
>> This is the error message on 10:
>>
>> Oct 24 19:16:27 sunhb65277 kernel: [ 46.302046] OCFS2 1.5.0
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296137] (kworker/u:0,5,3):o2net_connect_expired:1724 ERROR: no connection established with node 2 after 60.0 seconds, giving up and returning errors.
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296182] (mount.ocfs2,6555,0):dlm_request_join:1472 ERROR: Error -107 when sending message 510 (key 0x666c6172) to node 2
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296188] (mount.ocfs2,6555,0):dlm_try_to_join_domain:1648 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296193] (mount.ocfs2,6555,0):dlm_join_domain:1948 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296311] (mount.ocfs2,6555,0):dlm_register_domain:2214 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296330] (mount.ocfs2,6555,0):o2cb_cluster_connect:313 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296334] (mount.ocfs2,6555,0):ocfs2_dlm_init:2995 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296350] (mount.ocfs2,6555,0):ocfs2_mount_volume:1881 ERROR: status = -107
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296387] ocfs2: Unmounting device (252,5) on (node 0)
>> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296395] (mount.ocfs2,6555,0):ocfs2_fill_super:1236 ERROR: status = -107
>>
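The status -107 that repeats through the log is a negated errno value; on Linux, 107 is ENOTCONN. A small sketch for decoding such values follows. `describe_kernel_status` is a hypothetical helper name, and the lookup uses the errno table of the machine it runs on, so it should be run on a Linux host to match the kernel's numbering.

```python
import errno
import os


def describe_kernel_status(status):
    """Turn a negative kernel status such as -107 into 'NAME: message'.

    Assumption: the value is a negated errno, as is usual for kernel status codes.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe_kernel_status(-107))
```

Read this way, every "status = -107" line reports the same condition bubbling up the call chain: there is no established connection to node 2.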
>> The error is logical. In SLES, the firewall init script is the last one
>> executed. I don't know why, but that seems to be normal for SUSE.
>> So port 7777 is not yet open on host 10 when host 20 tries to connect to it,
>> and by the time the port is opened, host 20 has already stopped trying.
>> The host stays stuck in the ocfs2 init script until the "Network idle
>> timeout: 60000" has run out.
>> The other way round (host 20 booting, regardless of whether host 10 is online
>> or not), the ocfs2 init script starts the mount, waits a few seconds,
>> and the host continues to boot (with the ocfs2 volume mounted).
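The timing problem described above can be observed from the peer with a simple reachability probe. The following is a minimal sketch, assuming a plain TCP connect is an acceptable test: it only shows whether the port accepts connections (firewall open and a listener present) and does not perform any o2net handshake. `wait_for_port` is a hypothetical helper, and the timeout and interval values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Return True once a TCP connect to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds, so a firewall
            # that silently drops packets cannot block the loop indefinitely.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, timed out, or unreachable: retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example call, e.g. run on host 20 while host 10 is booting:
# wait_for_port("192.168.100.10", 7777)
```

Logging the moment this returns True next to the boot messages of host 10 would show how long after the ocfs2 init script the port actually becomes reachable.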
>>
>> What I have found out already is that the node with the higher number
>> (number 2, host 20) tries to connect to the node with the lower number
>> (number 1, host 10)
>> (https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003626.html),
>> although I would expect that it is always the booting host that tries to
>> connect to the other one(s).
>> Host 20 also mounts automatically when host 10 is offline.
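The connection rule from the linked post can be written down as a tiny sketch. `connection_initiator` is a hypothetical name used only for illustration; it encodes nothing more than "the higher-numbered node opens the connection to the lower-numbered one".

```python
def connection_initiator(node_a, node_b):
    """Return (initiator, listener) under the rule that the higher-numbered node connects."""
    if node_a == node_b:
        raise ValueError("a node does not connect to itself")
    return (max(node_a, node_b), min(node_a, node_b))


# For the two nodes in cluster.conf: node 2 (host 20) initiates, node 1 (host 10) listens.
initiator, listener = connection_initiator(1, 2)
print(f"node {initiator} connects to node {listener}")
```

If that rule holds, it may account for the asymmetry: host 10 always depends on an inbound connection to its own port 7777, which is still blocked while its firewall script has not run, whereas host 20 only needs an outbound connection to a host 10 that is already fully booted.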
>>
>> Questions:
>>
>> Why does host 20 mount automatically while host 10 does not?
>> How does host 20 know that host 10 is trying to mount the ocfs2 volume? At
>> exactly that moment, host 20 tries to connect to host 10 on port 7777,
>> yet no packet from host 10 is visible beforehand!?!
>>
>> Of course I could fiddle with the init scripts or change their order, but I
>> would prefer a solution that works out of the box.
>> And I would like to understand it.
>>
>>
>> Bernd
>>