Merlin Morgenstern
2015-Aug-31 17:04 UTC
[Gluster-users] Why is it not possible to mount a replicated gluster volume with one Gluster server?
Thank you all for your help.

To explain the setup better, here is the goal I am trying to achieve:

- 3 servers running in a cluster, each with a webserver uploading and serving files to visitors from a common GlusterFS share.
- Server1 and Server2 have gluster-server installed.
- One brick replicated between Server1 and Server2 with the goal of achieving High Availability.
- Server1, Server2 and Server3 mount the volume through FUSE.
- Server1 mounts Gluster-Server1 with Server2 as backup; vice versa for Server2.

Now the following scenarios:

1. Server2 dies.

In this case Server1 serves as a failover and serves the files for Server1, 2 and 3 until Server2 comes back up again. This works.

2. Server2 dies. Server1 has to reboot.

In this case the service stays down. It is impossible to remount the share with only Server1 available. This is not acceptable for a High Availability system, and I believe it is also not intended, but a misconfiguration or a bug.

Thank you again for looking into this.


2015-08-31 14:10 GMT+02:00 Yiping Peng <barius.cn at gmail.com>:

>> One more thing, when I do this on server1, which has been in the pool for
>> a long time:
>> server1:~$ mount server1:/vol1 mountpoint
>> It also fails.
>> The log gave me:
>
> My fault, I used localhost as endpoint.
>
> I re-issued "mount -t glusterfs server01:/speech0 qqq"
> and the log shows a lot of things like:
>
> [2015-08-31 12:08:44.801169] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
> [2015-08-31 12:08:44.801187] E [socket.c:3019:socket_connect] 0-speech0-client-43: Failed to set keep-alive: Protocol not available
> [2015-08-31 12:08:44.801305] W [socket.c:642:__socket_rwv] 0-speech0-client-43: readv on 10.88.153.25:24007 failed (Connection reset by peer)
> [2015-08-31 12:08:44.801404] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-43: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.801294 (xid=0x17)
> [2015-08-31 12:08:44.801423] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-43: received RPC status error [Transport endpoint is not connected]
> [2015-08-31 12:08:44.801440] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-43: disconnected from speech0-client-43. Client process will keep trying to connect to glusterd until brick's port is available
> [2015-08-31 12:08:44.804488] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
> [2015-08-31 12:08:44.804505] E [socket.c:3019:socket_connect] 0-speech0-client-51: Failed to set keep-alive: Protocol not available
> [2015-08-31 12:08:44.804775] W [socket.c:642:__socket_rwv] 0-speech0-client-51: readv on 10.88.146.19:24007 failed (Connection reset by peer)
> [2015-08-31 12:08:44.804878] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-51: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.804693 (xid=0x18)
> [2015-08-31 12:08:44.804898] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-51: received RPC status error [Transport endpoint is not connected]
> [2015-08-31 12:08:44.804917] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-51: disconnected from speech0-client-51. Client process will keep trying to connect to glusterd until brick's port is available
>
>
> 2015-08-31 20:06 GMT+08:00 Yiping Peng <barius.cn at gmail.com>:
>
>>> I believe the following events have happened in the cluster resulting
>>> into this situation:
>>> 1. GlusterD & brick process on node 2 was brought down
>>> 2. Node 1 was rebooted.
>>
>> Strangely enough, glusterfs, glusterd and glusterfsd are running on my
>> server. Is glusterfsd the brick process? Also server01 has not been
>> rebooted during the whole process.
>>
>> glusterfsd has the following arguments:
>> /usr/sbin/glusterfsd -s server01.local.net --volfile-id speech0.server01.local.net.home-glusterfs-speech0-brick0 -p /var/lib/glusterd/vols/speech0/run/server01.local.net-home-glusterfs-speech0-brick0.pid -S /var/run/gluster/6bf40a98deade9dde8b615226bc57567.socket --brick-name /home/glusterfs/speech0/brick0 -l /var/log/glusterfs/bricks/home-glusterfs-speech0-brick0.log --xlator-option *-posix.glusterd-uuid=1c33ff18-2a6a-44cf-9a04-727fc96e92be --brick-port 49159 --xlator-option speech0-server.listen-port=49159
>>
>> One more thing, when I do this on server1, which has been in the pool for
>> a long time:
>> server1:~$ mount server1:/vol1 mountpoint
>> It also fails.
>> The log gave me:
>>
>> [2015-08-31 11:56:57.123307] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/speech0 qqq)
>> [2015-08-31 11:56:57.134642] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 9, Protocol not available
>> [2015-08-31 11:56:57.134688] E [socket.c:3019:socket_connect] 0-glusterfs: Failed to set keep-alive: Protocol not available
>> [2015-08-31 11:56:57.135063] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2015-08-31 11:56:57.135113] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection reset by peer)
>> [2015-08-31 11:56:57.135149] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>> [2015-08-31 11:56:57.135158] I [glusterfsd-mgmt.c:1825:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
>> [2015-08-31 11:56:57.135333] W [glusterfsd.c:1219:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3) [0x7fb5e1be39a3] -->/usr/sbin/glusterfs() [0x4099c8] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (1), shutting down
>> [2015-08-31 11:56:57.135371] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/home/speech/pengyiping/qqq'.
>> [2015-08-31 11:56:57.140640] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x318b207851] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>>
>> Any help is much appreciated.
>>
>> 2015-08-31 19:15 GMT+08:00 Atin Mukherjee <amukherj at redhat.com>:
>>
>>> I believe the following events have happened in the cluster resulting
>>> into this situation:
>>> 1. GlusterD & brick process on node 2 was brought down
>>> 2. Node 1 was rebooted.
>>>
>>> In the above case the mount will definitely fail since the brick process
>>> was not started, as in a 2-node setup glusterd waits for its peers to come
>>> up before it starts the bricks. Could you check whether the brick
>>> process is running or not?
>>>
>>> Thanks,
>>> Atin
>>>
>>> On 08/31/2015 04:17 PM, Yiping Peng wrote:
>>> > I've tried both: assuming server1 is already in pool, server2 is undergoing
>>> > peer-probing
>>> >
>>> > server2:~$ mount server1:/vol1 mountpoint, fail;
>>> > server2:~$ mount server2:/vol1 mountpoint, fail.
>>> >
>>> > Strange enough. I *should* be able to mount server1:/vol1 on server2. But
>>> > this is not the case :(
>>> > Maybe something is broken in the server pool, as I'm seeing disconnected
>>> > nodes?
>>> >
>>> > 2015-08-31 18:02 GMT+08:00 Ravishankar N <ravishankar at redhat.com>:
>>> >
>>> >> On 08/31/2015 12:53 PM, Merlin Morgenstern wrote:
>>> >>
>>> >> Trying to mount the brick on the same physical server with the daemon running
>>> >> on this server but not on the other server:
>>> >>
>>> >> @node2:~$ sudo mount -t glusterfs gs2:/volume1 /data/nfs
>>> >> Mount failed. Please check the log file for more details.
>>> >>
>>> >> For mount to succeed the glusterd must be up on the node that you specify
>>> >> as the volfile-server; gs2 in this case. You can use -o
>>> >> backupvolfile-server=gs1 as a fallback.
>>> >> -Ravi
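
For reference, here is a minimal sketch of the mount Ravi describes, using the gs1/gs2 host names and the /data/nfs mount point from the quoted commands. The client host name, the brick paths, the fstab line and the newer backup-volfile-servers spelling are assumptions for illustration, not details taken from this thread:

node3:~$ sudo mount -t glusterfs -o backupvolfile-server=gs2 gs1:/volume1 /data/nfs
# /etc/fstab equivalent; recent releases also accept a comma-separated list via backup-volfile-servers=
gs1:/volume1  /data/nfs  glusterfs  defaults,_netdev,backupvolfile-server=gs2  0 0

The replicated volume behind such a mount would typically have been created along these lines (brick paths are hypothetical):

gs1:~$ sudo gluster volume create volume1 replica 2 gs1:/bricks/brick1 gs2:/bricks/brick1
gs1:~$ sudo gluster volume start volume1

Note that the backup volfile server only helps the client fetch the volume description when the primary is unreachable; it does not change which bricks must be up for I/O to succeed.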
Atin Mukherjee
2015-Aug-31 17:12 UTC
[Gluster-users] Why is it not possible to mount a replicated gluster volume with one Gluster server?
-Atin
Sent from one plus one

On Aug 31, 2015 10:34 PM, "Merlin Morgenstern" <merlin.morgenstern at gmail.com> wrote:
>
> Thank you all for your help.
>
> To explain the setup better, here is the goal I am trying to achieve:
>
> - 3 servers running in a cluster, each with a webserver uploading and serving files to visitors from a common GlusterFS share.
> - Server1 and Server2 have gluster-server installed.
> - One brick replicated between Server1 and Server2 with the goal of achieving High Availability.
> - Server1, Server2 and Server3 mount the volume through FUSE.
> - Server1 mounts Gluster-Server1 with Server2 as backup; vice versa for Server2.
>
> Now the following scenarios:
>
> 1. Server2 dies.
>
> In this case Server1 serves as a failover and serves the files for Server1, 2 and 3 until Server2 comes back up again. This works.
>
> 2. Server2 dies. Server1 has to reboot.
>
> In this case the service stays down. It is impossible to remount the share with only Server1 available. This is not acceptable for a High Availability system, and I believe it is also not intended, but a misconfiguration or a bug.

This is exactly the case I gave as an example earlier in the thread (please read it again). GlusterD is not supposed to start the brick processes if its counterpart hasn't come up yet in a 2-node setup. It has been designed this way to prevent GlusterD from operating on a volume that could be stale, because the node was down while the rest of the cluster was operational.

> Thank you again for looking into this.
>
> [...]
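
Following Atin's suggestion to check whether the brick process is actually running, here is a minimal sketch of the checks one might run on the surviving node. The host prompt and the volume name speech0 are taken from the quoted messages; the "force" start is an assumption about an acceptable recovery step, since it deliberately bypasses glusterd's wait for the peer and accepts the risk of serving a stale copy of the data:

server01:~$ ps aux | grep '[g]lusterfsd'             # is a brick process running at all?
server01:~$ sudo gluster volume status speech0       # which bricks does glusterd report online, and on which ports?
server01:~$ sudo gluster volume start speech0 force  # explicitly start the bricks without waiting for the peer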