We are testing clustering and I am having issues getting all of my nodes to mount. I have 4 nodes. I am using iSCSI to share 1 target with 2 LUNs. All 4 nodes can access the target; I can run fdisk -l against the block devices. Initially I had all 4 nodes mounting the share, but I brought the cluster down to add an additional NIC. Presently nodes 2 and 3 can mount the shares; nodes 1 and 4 cannot. Previously I had node 1 mounted and nodes 2, 3 and 4 could not.

Any help is appreciated!

Nodes 2 & 3:

# service o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active

Nodes 1 & 4:

# service o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

All nodes:

# mounted.ocfs2 -d
Device     FS     UUID                                  Label
/dev/sda1  ocfs2  fea0a398-a696-414f-bd9f-d7aa84bd6b77  ocu01
/dev/sdb1  ocfs2  26e82fa7-ec91-4a81-a965-571ed4223ab0  oracluster

# mounted.ocfs2 -f
Device     FS     Nodes
/dev/sda1  ocfs2  ocnode2, ocnode3
/dev/sdb1  ocfs2  ocnode2, ocnode3

dmesg snippet from node 4:

o2net: connected to node ocnode2 (num 2) at 192.168.1.2:7777
(4145,0):o2net_connect_expired:1664 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(4176,0):dlm_request_join:1036 ERROR: status = -107
(4176,0):dlm_try_to_join_domain:1210 ERROR: status = -107
(4176,0):dlm_join_domain:1488 ERROR: status = -107
(4176,0):dlm_register_domain:1754 ERROR: status = -107
(4176,0):ocfs2_dlm_init:2723 ERROR: status = -107
(4176,0):ocfs2_mount_volume:1437 ERROR: status = -107
ocfs2: Unmounting device (8,17) on (node 4)
o2net: no longer connected to node ocnode2 (num 2) at 192.168.1.2:7777
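Since the cluster was taken down specifically to add a NIC, a stale or inconsistent /etc/ocfs2/cluster.conf is worth ruling out first: a split where two nodes mount and two do not is what a config mismatch typically looks like. A minimal consistency check, assuming the default config path, root ssh between nodes, and that the two hosts not shown above are named ocnode1 and ocnode4 (only ocnode2 and ocnode3 appear in the output):

# Compare the o2cb config across all four nodes; the checksums must
# match, and every ip_address must point at the interconnect NIC.
for h in ocnode1 ocnode2 ocnode3 ocnode4; do
    echo "== $h =="
    ssh root@$h "md5sum /etc/ocfs2/cluster.conf"
    ssh root@$h "grep -E 'ip_address|name' /etc/ocfs2/cluster.conf"
done

If the checksums differ, propagate one copy to all nodes and restart o2cb before retrying the mounts.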
The network connect is failing. That could be a firewall, a bad IP address, or a switch issue.

Mount the volume on node 2. Then enable tracing and tail the messages file:

# debugfs.ocfs2 -l TCP allow
# tail -f /var/log/messages

Then from node 4, ping node 2 using netcat:

# nc -z 192.168.1.2 7777

If it succeeds, you should see:

Connection to 192.168.1.2 7777 port [tcp/cbt] succeeded!

Additionally, you will see a message on node 2: "attempt to connect from node...".

If not, look at your network setup.

Remember to disable tracing on node 2:

# debugfs.ocfs2 -l TCP off

Sunil

Chris Clonch wrote:
> We are testing clustering and I am having issues getting all of my
> nodes to mount. [...] Presently nodes 2 and 3 can mount the shares;
> nodes 1 and 4 cannot. [...]
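Since the dmesg snippet shows node 4 reaching node 2 but timing out against node 3, it is also worth running the same netcat probe over every node pair rather than a single one; any pair that fails isolates the broken link. A sketch, assuming root ssh between nodes and interconnect addresses 192.168.1.1 through 192.168.1.4 (only .2 is confirmed above):

# Probe o2net (port 7777) from every node to every other node.
for src in 1 2 3 4; do
    for dst in 1 2 3 4; do
        [ "$src" = "$dst" ] && continue
        if ssh root@192.168.1.$src "nc -z -w 5 192.168.1.$dst 7777" >/dev/null 2>&1; then
            echo "node $src -> node $dst: ok"
        else
            echo "node $src -> node $dst: FAILED"
        fi
    done
done

A full matrix matters here because o2net uses a single TCP connection per node pair, initiated by only one side based on node number, so a one-way block can pass a single spot check and still break the DLM join.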
Thanks for the reply Sunil! I thought it might be a network issue, but when I enabled debugging and then ran netcat against it, I could connect. Firewall and SELinux are both disabled on all nodes.

I went ahead and re-ran the Ethernet cabling for both sets of NICs to make sure the old hub I had been using was not a factor; everything now runs through an enterprise-class switch. Stats from netstat and ethtool look normal, though I did not view them prior to the changes.

I also pulled down the latest updates from RHEL (did I mention these are RHEL 5.4?), which included kernel-2.6.18-164.15.1.el5. Now the ocfs2_stackglue and ocfs2_dlmfs modules are failing to load.

-Chris

On Mon, Mar 22, 2010 at 4:52 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> The network connect is failing. That could be a firewall,
> a bad IP address, or a switch issue. [...]
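The timing points at the kernel update: the OCFS2 modules for RHEL 5 ship in per-kernel RPMs, so a package built for the old kernel will not load under 2.6.18-164.15.1.el5. A quick check, assuming the usual oss.oracle.com packaging where the kernel version is embedded in the package name:

# The running kernel and the installed ocfs2 module packages
# must agree; then try loading the failing module by hand.
# uname -r
# rpm -qa | grep -i ocfs2
# modprobe ocfs2_stackglue && echo "stackglue loaded"
# dmesg | tail

If no ocfs2 package matching the running kernel shows up, installing the module RPM built for 2.6.18-164.15.1.el5 (plus the matching ocfs2-tools) and re-running the modprobe should get the stack loading again.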