Rumen Telbizov
2015-Mar-30 23:11 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
Hello everyone,

I have a problem that I am trying to resolve and I'm not sure which way to go, so I am asking for your advice. What it comes down to is that upon initial boot of all my GlusterFS machines the shared volume doesn't get mounted. Nevertheless, the volume is successfully created and started, and further attempts to mount it manually succeed. I suspect what's happening is that the gluster processes/bricks/etc. haven't fully started by the time the /etc/fstab entry is read and the initial mount attempt is made. By the time I log in and run mount -a, the volume mounts without any issues.

Details from the logs:

[2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
[2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
[2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
[2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
[2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
Final graph:
....
[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.

Relevant /etc/fstab entries are:

/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0

Volume configuration is:

Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60

I run Debian 7 with GlusterFS 3.6.2-2.

While I could put together some rc.local-type script that retries mounting the volume for a while until it succeeds or times out, I was wondering if there's a better way to solve this problem?
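For what it's worth, the rc.local-style workaround I have in mind would be a small sh loop along these lines (an untested sketch; the retry count and sleep interval are arbitrary, and the mount point is the one from the fstab entry above):

#!/bin/sh
# Retry the fstab mount of the gluster volume until it appears or we give up.
# mount(8) re-reads the options from /etc/fstab when given only the mount point.
MOUNTPOINT=/opt/shared
RETRIES=30   # arbitrary: 30 attempts
SLEEP=5      # arbitrary: 5 seconds between attempts

i=0
while [ "$i" -lt "$RETRIES" ]; do
    if mountpoint -q "$MOUNTPOINT"; then
        exit 0                        # already mounted, nothing left to do
    fi
    mount "$MOUNTPOINT" 2>/dev/null   # fails harmlessly while glusterd is still starting
    i=$((i + 1))
    sleep "$SLEEP"
done

echo "giving up on $MOUNTPOINT after $((RETRIES * SLEEP)) seconds" >&2
exit 1

It would probably work, but it feels like a workaround rather than a fix, hence the question.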
Thank you for your help.

Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
Pranith Kumar Karampuri
2015-Mar-31 06:57 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
Atin,
Could it be because bricks are started with PROC_START_NO_WAIT?

Pranith

On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
> Hello everyone,
>
> I have a problem that I am trying to resolve and I'm not sure which way
> to go, so I am asking for your advice. What it comes down to is that
> upon initial boot of all my GlusterFS machines the shared volume
> doesn't get mounted. Nevertheless, the volume is successfully created
> and started, and further attempts to mount it manually succeed. I
> suspect what's happening is that the gluster processes/bricks/etc.
> haven't fully started by the time the /etc/fstab entry is read and the
> initial mount attempt is made. By the time I log in and run mount -a,
> the volume mounts without any issues.
>
> [...]