Pranith Kumar Karampuri
2015-Mar-31 06:57 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
Atin,
     Could it be because bricks are started with PROC_START_NO_WAIT?

Pranith

On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
> Hello everyone,
>
> I have a problem that I am trying to resolve and am not sure which way
> to go, so here I am asking for your advice.
>
> What it comes down to is that upon initial boot of all my GlusterFS
> machines the shared volume doesn't get mounted. Nevertheless, the
> volume is successfully created and started, and further attempts to
> mount it manually succeed. I suspect what's happening is that the
> gluster processes/bricks/etc. haven't fully started at the time the
> /etc/fstab entry is read and the initial mount attempt is made. Again,
> by the time I log in and run a mount -a, the volume mounts without any
> issues.
>
> _Details from the logs:_
>
> [2015-03-30 22:29:04.381918] I [MSGID: 100030]
> [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running
> /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs
> --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0
> --entry-timeout=0 --volfile-server=localhost
> --volfile-server=10.12.130.21 --volfile-server=10.12.130.22
> --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
> [2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2015-03-30 22:29:04.394950] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: localhost (Transport endpoint is not connected)
> [2015-03-30 22:29:04.394964] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.21
> [2015-03-30 22:29:08.390687] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.21 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:08.390720] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.22
> [2015-03-30 22:29:11.392015] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.22 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:11.392050] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.23
> [2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex]
> 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-03-30 22:29:14.408964] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409183] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409388] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0:
> parent translators are ready, attempting connect on transport
> [2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1:
> parent translators are ready, attempting connect on transport
> [2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2:
> parent translators are ready, attempting connect on transport
> Final graph:
>
> ....
>
> [2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify]
> 0-host-client-2: disconnected from host-client-2. Client process will
> keep trying to connect to glusterd until brick's port is available
> *[2015-03-30 22:29:14.411063] E [MSGID: 108006]
> [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes
> are down. Going offline until atleast one of them comes back up.*
> [2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup]
> 0-fuse: switched to graph 0
> [2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
> kernel 7.17
> [2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk]
> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc]
> 0-fuse: unmounting /opt/shared
> *[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit]
> (--> 0-: received signum (15), shutting down*
> [2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse:
> Unmounting '/opt/shared'.
>
>
> _Relevant /etc/fstab entries are:_
>
> /dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
>
> localhost:/myvolume /opt/shared glusterfs
> defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23
> 0 0
>
>
> _Volume configuration is:_
>
> Volume Name: myvolume
> Type: Replicate
> Volume ID: xxxx
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1:/opt/local/brick
> Brick2: host2:/opt/local/brick
> Brick3: host3:/opt/local/brick
> Options Reconfigured:
> storage.health-check-interval: 5
> network.ping-timeout: 5
> nfs.disable: on
> auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
> cluster.quorum-type: auto
> network.frame-timeout: 60
>
>
> I run Debian 7 with GlusterFS version 3.6.2-2.
>
> While I could put together some rc.local-type script which retries
> mounting the volume until it succeeds or times out, I was wondering if
> there's a better way to solve this problem.
>
> Thank you for your help.
>
> Regards,
> --
> Rumen Telbizov
> Unix Systems Administrator <http://telbizov.com>
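For reference, the rc.local-style retry Rumen mentions at the end of his mail could be as small as the sketch below. It is untested and only illustrative: the retry count, the sleep interval, and the use of mountpoint(1) are my own choices; the mount point /opt/shared comes from the fstab entry quoted above.

#!/bin/sh
# Retry the fstab mount of the gluster volume until it succeeds or we give up.
for i in $(seq 1 30); do
    if mountpoint -q /opt/shared; then
        exit 0                 # already mounted, nothing left to do
    fi
    mount /opt/shared          # picks up the options from /etc/fstab
    sleep 2
done
echo "timed out waiting to mount /opt/shared" >&2
exit 1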
Atin Mukherjee
2015-Mar-31 07:23 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
> Atin,
>      Could it be because bricks are started with PROC_START_NO_WAIT?

That's the correct analysis, Pranith. The mount was attempted before the
bricks were started. If we can have a time lag of a few seconds between
the volume start and the mount, the problem will go away.

--
~Atin
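One rough way to get that time lag on the client side, rather than sleeping blindly, is to hold the mount back until glusterd answers and reports the volume, and only then run the fstab mount. The following is an untested sketch; it assumes the gluster CLI is installed on the mounting host and reuses the volume name and mount point from this thread.

#!/bin/sh
# Wait up to ~60 seconds for glusterd to respond and report myvolume,
# then mount the volume via the existing /etc/fstab entry.
for i in $(seq 1 30); do
    if gluster volume status myvolume >/dev/null 2>&1; then
        mount /opt/shared && exit 0   # retry on failure until the loop ends
    fi
    sleep 2
done
echo "glusterd/myvolume not up in time; /opt/shared not mounted" >&2
exit 1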