Pranith Kumar Karampuri
2015-Mar-31 06:57 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
Atin,
     Could it be because bricks are started with PROC_START_NO_WAIT?

Pranith

On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
> Hello everyone,
>
> I have a problem that I am trying to resolve and am not sure which way
> to go, so here I am asking for your advice.
>
> What it comes down to is that upon initial boot of all my GlusterFS
> machines the shared volume doesn't get mounted. Nevertheless, the
> volume is successfully created and started, and further attempts to
> mount it manually succeed. I suspect what's happening is that the
> gluster processes/bricks/etc. haven't fully started at the time the
> /etc/fstab entry is read and the initial mount attempt is made. Again,
> by the time I log in and run a mount -a, the volume mounts without any
> issues.
>
> _Details from the logs:_
>
> [2015-03-30 22:29:04.381918] I [MSGID: 100030]
> [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running
> /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs
> --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0
> --entry-timeout=0 --volfile-server=localhost
> --volfile-server=10.12.130.21 --volfile-server=10.12.130.22
> --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
> [2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2015-03-30 22:29:04.394950] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: localhost (Transport endpoint is not connected)
> [2015-03-30 22:29:04.394964] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.21
> [2015-03-30 22:29:08.390687] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.21 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:08.390720] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.22
> [2015-03-30 22:29:11.392015] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.22 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:11.392050] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.23
> [2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex]
> 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-03-30 22:29:14.408964] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409183] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409388] I
> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0:
> parent translators are ready, attempting connect on transport
> [2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1:
> parent translators are ready, attempting connect on transport
> [2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2:
> parent translators are ready, attempting connect on transport
> Final graph:
>
> ....
>
> [2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify]
> 0-host-client-2: disconnected from host-client-2. Client process will
> keep trying to connect to glusterd until brick's port is available
> *[2015-03-30 22:29:14.411063] E [MSGID: 108006]
> [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes
> are down. Going offline until atleast one of them comes back up.*
> [2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup]
> 0-fuse: switched to graph 0
> [2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
> kernel 7.17
> [2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk]
> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not
> connected)
> [2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc]
> 0-fuse: unmounting /opt/shared
> *[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit]
> (--> 0-: received signum (15), shutting down*
> [2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse:
> Unmounting '/opt/shared'.
>
>
> _Relevant /etc/fstab entries are:_
>
> /dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
>
> localhost:/myvolume /opt/shared glusterfs
> defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23
> 0 0
>
>
> _Volume configuration is:_
>
> Volume Name: myvolume
> Type: Replicate
> Volume ID: xxxx
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1:/opt/local/brick
> Brick2: host2:/opt/local/brick
> Brick3: host3:/opt/local/brick
> Options Reconfigured:
> storage.health-check-interval: 5
> network.ping-timeout: 5
> nfs.disable: on
> auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
> cluster.quorum-type: auto
> network.frame-timeout: 60
>
>
> I run Debian 7 with GlusterFS version 3.6.2-2.
>
> While I could put together some rc.local-type script which retries
> mounting the volume until it succeeds or times out, I was wondering if
> there's a better way to solve this problem.
>
> Thank you for your help.
>
> Regards,
> --
> Rumen Telbizov
> Unix Systems Administrator <http://telbizov.com>
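For reference, the rc.local-style retry Rumen mentions at the end of his mail could be as small as the sketch below. It is untested and only illustrative: the retry count, the sleep interval, and the use of mountpoint(1) are my own choices; the mount point /opt/shared comes from the fstab entry quoted above.

#!/bin/sh
# Retry the fstab mount of the gluster volume until it succeeds or we give up.
for i in $(seq 1 30); do
    if mountpoint -q /opt/shared; then
        exit 0                 # already mounted, nothing left to do
    fi
    mount /opt/shared          # picks up the options from /etc/fstab
    sleep 2
done
echo "timed out waiting to mount /opt/shared" >&2
exit 1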
Atin Mukherjee
2015-Mar-31 07:23 UTC
[Gluster-users] Initial mount problem - all subvolumes are down
On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
> Atin,
>      Could it be because bricks are started with PROC_START_NO_WAIT?

That's the correct analysis, Pranith. The mount was attempted before the
bricks were started. If we can have a time lag of a few seconds between
the volume start and the mount, the problem will go away.

--
~Atin
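One rough way to get that time lag on the client side, rather than sleeping blindly, is to hold the mount back until glusterd answers and reports the volume, and only then run the fstab mount. The following is an untested sketch; it assumes the gluster CLI is installed on the mounting host and reuses the volume name and mount point from this thread.

#!/bin/sh
# Wait up to ~60 seconds for glusterd to respond and report myvolume,
# then mount the volume via the existing /etc/fstab entry.
for i in $(seq 1 30); do
    if gluster volume status myvolume >/dev/null 2>&1; then
        mount /opt/shared && exit 0   # retry on failure until the loop ends
    fi
    sleep 2
done
echo "glusterd/myvolume not up in time; /opt/shared not mounted" >&2
exit 1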