thr3ads.net - Gluster users - [Gluster-users] Initial mount problem

Pranith Kumar Karampuri

2015-Mar-31 07:33 UTC

[Gluster-users] Initial mount problem - all subvolumes are down

On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
>
> On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
>> Atin,
>>         Could it be because bricks are started with PROC_START_NO_WAIT?
> That's the correct analysis Pranith. Mount was attempted before the
> bricks were started. If we can have a time lag in some seconds between
> mount and volume start the problem will go away.

Atin,
        I think one way to solve this issue is to start the bricks with
NO_WAIT so that we can handle pmap-signin but wait for the pmap-signins
to complete before responding to cli/completing 'init'?

Pranith

>
>
>> Pranith
>> On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
>>> Hello everyone,
>>>
>>> I have a problem that I am trying to resolve and I am not sure which
>>> way to go, so here I am asking for your advice.
>>>
>>> What it comes down to is that upon initial boot of all my GlusterFS
>>> machines the shared volume doesn't get mounted. Nevertheless, the
>>> volume is successfully created and started, and further attempts to
>>> mount it manually succeed. I suspect what's happening is that gluster
>>> processes/bricks/etc. haven't fully started at the time the /etc/fstab
>>> entry is read and the initial mount attempt is made. Again, by
>>> the time I log in and run a mount -a, the volume mounts without any
>>> issues.
>>>
>>> _Details from the logs:_
>>>
>>> [2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
>>> [2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>>> [2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>>> [2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
>>> [2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
>>> [2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
>>> [2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
>>> [2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
>>> [2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
>>> [2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
>>> [2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
>>> [2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
>>> [2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
>>> [2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
>>> [2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
>>> Final graph:
>>>
>>> ....
>>>
>>> [2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
>>> *[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.*
>>> [2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
>>> [2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
>>> [2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
>>> [2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
>>> [2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
>>> [2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
>>> *[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down*
>>> [2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
>>>
>>>
>>> _Relevant /etc/fstab entries are:_
>>>
>>> /dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
>>>
>>> localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
>>>
>>>
>>> _Volume configuration is:_
>>>
>>> Volume Name: myvolume
>>> Type: Replicate
>>> Volume ID: xxxx
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: host1:/opt/local/brick
>>> Brick2: host2:/opt/local/brick
>>> Brick3: host3:/opt/local/brick
>>> Options Reconfigured:
>>> storage.health-check-interval: 5
>>> network.ping-timeout: 5
>>> nfs.disable: on
>>> auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
>>> cluster.quorum-type: auto
>>> network.frame-timeout: 60
>>>
>>>
>>> I run Debian 7 and GlusterFS version 3.6.2-2.
>>>
>>> While I could put together some rc.local type of script which retries
>>> mounting the volume for a while until it succeeds or times out, I was
>>> wondering if there's a better way to solve this problem?
>>>
>>> Thank you for your help.
>>>
>>> Regards,
>>> -- 
>>> Rumen Telbizov
>>> Unix Systems Administrator <http://telbizov.com>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
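
For the rc.local-style workaround Rumen mentions, a minimal Python sketch
could look like the following. It assumes the /etc/fstab entry for
/opt/shared is in place (so `mount /opt/shared` picks up its options), that
mount exits non-zero when the volume cannot be mounted, and that the
`mountpoint` utility is installed. The function name, retry count and delay
are illustrative choices, not anything GlusterFS ships.

```python
import subprocess
import time


def mount_with_retry(mountpoint, attempts=30, delay=2.0,
                     run=subprocess.call, sleep=time.sleep):
    """Retry `mount <mountpoint>` until it succeeds or attempts run out.

    `run` and `sleep` are injectable so the loop can be exercised
    without touching the real system (an assumption of this sketch).
    Returns True once the mountpoint is mounted, False otherwise.
    """
    for _ in range(attempts):
        # `mountpoint -q` exits 0 when the path is already a mountpoint.
        if run(["mountpoint", "-q", mountpoint]) == 0:
            return True
        # `mount <dir>` looks up the device and options in /etc/fstab.
        if run(["mount", mountpoint]) == 0:
            return True
        # Bricks may not have started yet; wait and try again.
        sleep(delay)
    return False
```

Called from a boot-time script as `mount_with_retry("/opt/shared")`, this
keeps trying for roughly a minute with the defaults above. It only works
around the race; it does not change when glusterd reports the bricks as
started.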

Atin Mukherjee

2015-Mar-31 08:25 UTC

[Gluster-users] Initial mount problem - all subvolumes are down

On 03/31/2015 01:03 PM, Pranith Kumar Karampuri wrote:
>
> On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
>>
>> On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
>>> Atin,
>>>         Could it be because bricks are started with PROC_START_NO_WAIT?
>> That's the correct analysis Pranith. Mount was attempted before the
>> bricks were started. If we can have a time lag in some seconds between
>> mount and volume start the problem will go away.
> Atin,
>        I think one way to solve this issue is to start the bricks with
> NO_WAIT so that we can handle pmap-signin but wait for the pmap-signins
> to complete before responding to cli/completing 'init'?

Logically it should solve the problem. We need to think around it more
from the existing design perspective.

~Atin

>
> Pranith
-- 
~Atin
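
The idea discussed in this thread (start the bricks without blocking, but do
not report completion until every brick has signed in) can be modelled with
a small Python sketch. This is not glusterd code: SignInBarrier,
start_bricks and the start_brick callback are hypothetical names, threads
stand in for brick processes, and the timeout value is an arbitrary choice.

```python
import threading


class SignInBarrier:
    """Lets a caller block until `expected` workers have signed in."""

    def __init__(self, expected):
        self._remaining = expected
        self._cond = threading.Condition()

    def sign_in(self):
        # Called by each worker once it is ready to serve requests.
        with self._cond:
            self._remaining -= 1
            self._cond.notify_all()

    def wait_all(self, timeout):
        # True if every worker signed in before the timeout expired.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._remaining <= 0, timeout=timeout)


def start_bricks(brick_names, start_brick, timeout=10.0):
    """Launch every brick without waiting, then wait for all sign-ins.

    `start_brick(name, sign_in)` is a hypothetical callback that brings
    one brick up and calls `sign_in()` when it is ready.
    """
    barrier = SignInBarrier(len(brick_names))
    for name in brick_names:
        # Analogue of starting with NO_WAIT: do not block on each brick.
        threading.Thread(target=start_brick,
                         args=(name, barrier.sign_in),
                         daemon=True).start()
    # Only now would 'init' complete or the CLI get its response.
    return barrier.wait_all(timeout)
```

In this model a mount attempted after start_bricks returns True would find
every brick already signed in, which is the ordering the proposal is after.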
