Atin Mukherjee
2015-Aug-31 17:12 UTC
[Gluster-users] Why is it not possible to mount a replicated gluster volume with one Gluster server?
-Atin
Sent from one plus one

On Aug 31, 2015 10:34 PM, "Merlin Morgenstern" <merlin.morgenstern at gmail.com> wrote:
>
> Thank you all for your help.
>
> To explain the setup better, here is the goal I am trying to achieve:
>
> - 3 servers running in a cluster, each with a webserver uploading and serving files to visitors from a common glusterfs share.
> - Server1 and Server2 have gluster-server installed.
> - One brick replicated between Server1 and Server2 with the goal of achieving High Availability.
> - Server1, Server2 and Server3 mount the brick through fuse.
> - Server1 mounts Gluster-Server1 with Server2 as backup, and vice versa for Server2.
>
> Now the following scenario:
>
> 1. Server2 dies.
>
> In this case Server1 serves as a failover and serves the files for Server1, Server2 and Server3 until Server2 comes back up again. This works.
>
> 2. Server2 dies. Server1 has to reboot.
>
> In this case the service stays down. It is impossible to remount the share without Server2. This is not acceptable for a High Availability system, and I believe it is also not intended, but a misconfiguration or bug.

This is exactly what I gave as an example earlier in the thread (please read it again). GlusterD is not supposed to start the brick process if its counterpart has not come up yet in a 2-node setup. It has been designed this way to stop GlusterD from operating on a volume which could be stale, since the node was down while the rest of the cluster was operational.

> Thank you again for looking into this.
>
> 2015-08-31 14:10 GMT+02:00 Yiping Peng <barius.cn at gmail.com>:
>>
>>> One more thing, when I do this on server1, which has been in the pool for a long time:
>>> server1:~$ mount server1:/vol1 mountpoint
>>> It also fails.
>>> The log gave me:
>>
>> My fault, I used localhost as endpoint.
>>
>> I re-issued "mount -t glusterfs server01:/speech0 qqq"
>> and the log shows a lot of things like:
>>
>> [2015-08-31 12:08:44.801169] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
>> [2015-08-31 12:08:44.801187] E [socket.c:3019:socket_connect] 0-speech0-client-43: Failed to set keep-alive: Protocol not available
>> [2015-08-31 12:08:44.801305] W [socket.c:642:__socket_rwv] 0-speech0-client-43: readv on 10.88.153.25:24007 failed (Connection reset by peer)
>> [2015-08-31 12:08:44.801404] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-43: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.801294 (xid=0x17)
>> [2015-08-31 12:08:44.801423] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-43: received RPC status error [Transport endpoint is not connected]
>> [2015-08-31 12:08:44.801440] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-43: disconnected from speech0-client-43. Client process will keep trying to connect to glusterd until brick's port is available
>> [2015-08-31 12:08:44.804488] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
>> [2015-08-31 12:08:44.804505] E [socket.c:3019:socket_connect] 0-speech0-client-51: Failed to set keep-alive: Protocol not available
>> [2015-08-31 12:08:44.804775] W [socket.c:642:__socket_rwv] 0-speech0-client-51: readv on 10.88.146.19:24007 failed (Connection reset by peer)
>> [2015-08-31 12:08:44.804878] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-51: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.804693 (xid=0x18)
>> [2015-08-31 12:08:44.804898] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-51: received RPC status error [Transport endpoint is not connected]
>> [2015-08-31 12:08:44.804917] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-51: disconnected from speech0-client-51. Client process will keep trying to connect to glusterd until brick's port is available
>>
>>
>> 2015-08-31 20:06 GMT+08:00 Yiping Peng <barius.cn at gmail.com>:
>>>
>>>> I believe the following events have happened in the cluster resulting
>>>> into this situation:
>>>> 1. GlusterD & brick process on node 2 was brought down
>>>> 2. Node 1 was rebooted.
>>>
>>> Strangely enough, glusterfs, glusterd and glusterfsd are running on my server. Is glusterfsd the brick process? Also, server01 has not been rebooted during the whole process.
>>>
>>> glusterfsd has the following arguments:
>>> /usr/sbin/glusterfsd -s server01.local.net --volfile-id speech0.server01.local.net.home-glusterfs-speech0-brick0 -p /var/lib/glusterd/vols/speech0/run/server01.local.net-home-glusterfs-speech0-brick0.pid -S /var/run/gluster/6bf40a98deade9dde8b615226bc57567.socket --brick-name /home/glusterfs/speech0/brick0 -l /var/log/glusterfs/bricks/home-glusterfs-speech0-brick0.log --xlator-option *-posix.glusterd-uuid=1c33ff18-2a6a-44cf-9a04-727fc96e92be --brick-port 49159 --xlator-option speech0-server.listen-port=49159
>>>
>>> One more thing, when I do this on server1, which has been in the pool for a long time:
>>> server1:~$ mount server1:/vol1 mountpoint
>>> It also fails.
>>> The log gave me:
>>>
>>> [2015-08-31 11:56:57.123307] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/speech0 qqq)
>>> [2015-08-31 11:56:57.134642] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 9, Protocol not available
>>> [2015-08-31 11:56:57.134688] E [socket.c:3019:socket_connect] 0-glusterfs: Failed to set keep-alive: Protocol not available
>>> [2015-08-31 11:56:57.135063] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> [2015-08-31 11:56:57.135113] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection reset by peer)
>>> [2015-08-31 11:56:57.135149] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>>> [2015-08-31 11:56:57.135158] I [glusterfsd-mgmt.c:1825:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
>>> [2015-08-31 11:56:57.135333] W [glusterfsd.c:1219:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3) [0x7fb5e1be39a3] -->/usr/sbin/glusterfs() [0x4099c8] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (1), shutting down
>>> [2015-08-31 11:56:57.135371] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/home/speech/pengyiping/qqq'.
>>> [2015-08-31 11:56:57.140640] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x318b207851] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>>>
>>> Any help is much appreciated.
>>>
>>> 2015-08-31 19:15 GMT+08:00 Atin Mukherjee <amukherj at redhat.com>:
>>>>
>>>> I believe the following events have happened in the cluster resulting
>>>> into this situation:
>>>> 1. GlusterD & brick process on node 2 was brought down
>>>> 2. Node 1 was rebooted.
>>>>
>>>> In the above case the mount will definitely fail, since the brick process
>>>> was not started: in a 2-node setup glusterd waits for its peers to come
>>>> up before it starts the bricks. Could you check whether the brick
>>>> process is running or not?
>>>>
>>>> Thanks,
>>>> Atin
>>>>
>>>> On 08/31/2015 04:17 PM, Yiping Peng wrote:
>>>> > I've tried both: assuming server1 is already in the pool and server2 is
>>>> > undergoing peer-probing,
>>>> >
>>>> > server2:~$ mount server1:/vol1 mountpoint, fail;
>>>> > server2:~$ mount server2:/vol1 mountpoint, fail.
>>>> >
>>>> > Strangely enough, I *should* be able to mount server1:/vol1 on server2. But
>>>> > this is not the case :(
>>>> > Maybe something is broken in the server pool, as I'm seeing disconnected
>>>> > nodes?
>>>> >
>>>> > 2015-08-31 18:02 GMT+08:00 Ravishankar N <ravishankar at redhat.com>:
>>>> >
>>>> >> On 08/31/2015 12:53 PM, Merlin Morgenstern wrote:
>>>> >>
>>>> >> Trying to mount the brick on the same physical server, with the daemon running
>>>> >> on this server but not on the other server:
>>>> >>
>>>> >> @node2:~$ sudo mount -t glusterfs gs2:/volume1 /data/nfs
>>>> >> Mount failed. Please check the log file for more details.
>>>> >>
>>>> >> For the mount to succeed, glusterd must be up on the node that you specify
>>>> >> as the volfile-server; gs2 in this case. You can use -o
>>>> >> backupvolfile-server=gs1 as a fallback.
>>>> >> -Ravi
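A rough sketch of the fallback mount Ravi describes, reusing the gs1/gs2 hostnames, the volume1 volume and the /data/nfs mount point from the quoted messages (all of these are placeholders for your own names):

    # mount from gs2, falling back to gs1 if gs2's glusterd cannot be reached
    sudo mount -t glusterfs -o backupvolfile-server=gs1 gs2:/volume1 /data/nfs

    # the same fallback made persistent in /etc/fstab (single line)
    gs2:/volume1  /data/nfs  glusterfs  defaults,_netdev,backupvolfile-server=gs1  0 0

This only helps while at least one reachable glusterd can hand out the volfile; it does not change the two-node startup behaviour Atin describes above.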
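To check what Atin asks about (whether glusterd and the brick processes are actually running), something along these lines can be run on each server; the speech0 volume name is taken from the quoted logs and should be replaced with your own:

    sudo systemctl status glusterd        # management daemon (or: service glusterd status)
    sudo gluster volume status speech0    # per-brick port/PID; N/A means the brick is not running
    ps aux | grep '[g]lusterfsd'          # brick processes run as glusterfsd

If glusterd is up but the bricks stay offline on a freshly rebooted node, that matches the two-node behaviour described in this thread: glusterd holds the bricks back until it can see its peer again.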
Vijay Bellur
2015-Aug-31 17:41 UTC
[Gluster-users] Why is it not possible to mount a replicated gluster volume with one Gluster server?
On Monday 31 August 2015 10:42 PM, Atin Mukherjee wrote:
> > 2. Server2 dies. Server1 has to reboot.
> >
> > In this case the service stays down. It is impossible to remount the
> > share without Server2. This is not acceptable for a High Availability
> > system, and I believe it is also not intended, but a misconfiguration or bug.
>
> This is exactly what I gave as an example earlier in the thread (please read
> it again). GlusterD is not supposed to start the brick process if its
> counterpart has not come up yet in a 2-node setup. It has been designed this
> way to stop GlusterD from operating on a volume which could be stale, since
> the node was down while the rest of the cluster was operational.

For two-node deployments, a third dummy node is recommended to ensure that quorum is maintained when one of the nodes is down.

Regards,
Vijay
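As a rough sketch, adding such a dummy third node could look like the following, assuming an extra host called gs3 and the volume1 name used earlier in the thread (both are placeholders):

    # from one of the existing nodes, add the third node to the trusted pool
    sudo gluster peer probe gs3
    sudo gluster peer status

    # optionally enforce server-side quorum, so bricks are only taken down
    # when a majority of the pool is unreachable
    sudo gluster volume set volume1 cluster.server-quorum-type server

With three peers in the pool, any single node can go down while the remaining two still form a majority, which avoids the two-node situation discussed above.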