Atin Mukherjee
2017-May-22 16:12 UTC
[Gluster-users] Failure while upgrading gluster to 3.10.1
On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi <pawan at platform.sh> wrote:

> On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> Sorry Pawan, I did miss the other part of the attachments. Looking at
>>> the glusterd.info file from all the hosts, it looks like host2 and
>>> host3 do not have the correct op-version. Can you please set
>>> "operating-version=30702" in host2 and host3 and restart the glusterd
>>> instance one by one on all the nodes?
>>
>> Please ensure that all the hosts are upgraded to the same bits before
>> doing this change.
>
> Having to upgrade all 3 hosts to the newer version before gluster can
> work on any of them means application downtime. The applications running
> on these hosts are expected to be highly available. So, with the way
> things are right now, is an online upgrade possible? My upgrade steps
> are: (1) stop the applications, (2) unmount the gluster volume, and then
> (3) upgrade gluster one host at a time.

One of the ways to mitigate this is to first do an online upgrade to
glusterfs-3.7.9 (op-version 30707), given this bug was introduced in
3.7.10, and then move to 3.11.

> Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this
> an online upgrade we are okay taking two steps: 3.6.9 -> 3.7 and then
> 3.7 -> 3.11.
>
>>> Apparently you have uncovered a bug here: during peer handshaking, if
>>> one of the glusterd instances is running with old bits, the uuid
>>> received while validating the handshake request may be blank. That
>>> used to be ignored, but patch http://review.gluster.org/13519 added
>>> changes that always inspect this field and do some extra checks, which
>>> causes the handshake to fail.
For now, the above workaround should >>> suffice. I'll be sending a patch pretty soon. >>> >> >> Posted a patch https://review.gluster.org/#/c/17358 . >> >> >>> >>> >>> >>> On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <pawan at platform.sh> >>> wrote: >>> >>>> Hello Atin, >>>> >>>> The tar's have the content of `/var/lib/glusterd` too for all 3 nodes, >>>> please check again. >>>> >>>> Thanks >>>> >>>> On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <amukherj at redhat.com> >>>> wrote: >>>> >>>>> Pawan, >>>>> >>>>> I see you have provided the log files from the nodes, however it'd be >>>>> really helpful if you can provide me the content of /var/lib/glusterd from >>>>> all the nodes to get to the root cause of this issue. >>>>> >>>>> On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <pawan at platform.sh> >>>>> wrote: >>>>> >>>>>> Hello Atin, >>>>>> >>>>>> Thanks for continued support. I've attached requested files from all >>>>>> 3 nodes. >>>>>> >>>>>> (I think we already verified the UUIDs to be correct, anyway let us >>>>>> know if you find any more info in the logs) >>>>>> >>>>>> Pawan >>>>>> >>>>>> On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <amukherj at redhat.com >>>>>> > wrote: >>>>>> >>>>>>> >>>>>>> On Thu, 18 May 2017 at 23:40, Atin Mukherjee <amukherj at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi <pawan at platform.sh> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello Atin, >>>>>>>>> >>>>>>>>> I realized that these http://gluster.readthedocs.io/ >>>>>>>>> en/latest/Upgrade-Guide/upgrade_to_3.10/ instructions only work >>>>>>>>> for upgrades from 3.7, while we are running 3.6.2. Are there >>>>>>>>> instructions/suggestion you have for us to upgrade from 3.6 version? >>>>>>>>> >>>>>>>>> I believe upgrade from 3.6 to 3.7 and then to 3.10 would work, but >>>>>>>>> I see similar errors reported when I upgraded to 3.7 too. 
>>>>>>>>> >>>>>>>>> For what its worth, I was able to set the op-version (gluster v >>>>>>>>> set all cluster.op-version 30702) but that doesn't seem to help. >>>>>>>>> >>>>>>>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030] >>>>>>>>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>> /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p >>>>>>>>> /var/run/glusterd.pid) >>>>>>>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478] >>>>>>>>> [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors >>>>>>>>> set to 65536 >>>>>>>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479] >>>>>>>>> [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>> directory >>>>>>>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071] >>>>>>>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm >>>>>>>>> event channel creation failed [No such device] >>>>>>>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init] >>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>> [2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load] >>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>> [2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener] >>>>>>>>> 0-rpc-service: cannot create listener, initing the transport failed >>>>>>>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243] >>>>>>>>> [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, >>>>>>>>> continuing with succeeded transport >>>>>>>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513] >>>>>>>>> [glusterd-store.c:2068:glusterd_restore_op_version] 0-glusterd: >>>>>>>>> retrieved op-version: 30600 >>>>>>>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498] >>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>> 0-management: connect returned 0 >>>>>>>>> [2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>> 0-management: setting 
frame-timeout to 600 >>>>>>>>> [2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>> 0-management: setting frame-timeout to 600 >>>>>>>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498] >>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>> 0-management: connect returned 0 >>>>>>>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544] >>>>>>>>> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>>> 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>> >>>>>>>> >>>>>>>>> Final graph: >>>>>>>>> +----------------------------------------------------------- >>>>>>>>> -------------------+ >>>>>>>>> 1: volume management >>>>>>>>> 2: type mgmt/glusterd >>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>> 8: option event-threads 1 >>>>>>>>> 9: option ping-timeout 0 >>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>> 13: option transport-type rdma >>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>> 15: end-volume >>>>>>>>> 16: >>>>>>>>> +----------------------------------------------------------- >>>>>>>>> -------------------+ >>>>>>>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started >>>>>>>>> thread with index 1 >>>>>>>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] >>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>> available) >>>>>>>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>> called at 2017-05-17 06:48:35.609965 (xid=0x1) >>>>>>>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167] >>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>> [2017-05-17 06:48:35.611944] I [MSGID: 106004] >>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. >>>>>>>>> [2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>> [2017-05-17 06:48:35.612039] W [MSGID: 106118] >>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Lock not released for shared >>>>>>>>> [2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv] >>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>> available) >>>>>>>>> [2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>> called at 2017-05-17 06:48:35.610007 (xid=0x1) >>>>>>>>> [2017-05-17 06:48:35.612197] E [MSGID: 106167] >>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>> [2017-05-17 06:48:35.612211] I [MSGID: 106004] >>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. 
>>>>>>>>> [2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>> [2017-05-17 06:48:35.613432] W [MSGID: 106118] >>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Lock not released for shared >>>>>>>>> [2017-05-17 06:48:35.614317] E [MSGID: 106170] >>>>>>>>> [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] >>>>>>>>> 0-management: Request from peer 192.168.0.6:991 has an entry in >>>>>>>>> peerinfo, but uuid does not match >>>>>>>>> >>>>>>>> >>>>>>>> Apologies for delay. My initial suspect was correct. You have an >>>>>>>> incorrect UUID in the peer file which is causing this. Can you please >>>>>>>> provide me the >>>>>>>> >>>>>>> >>>>>>> Clicked the send button accidentally! >>>>>>> >>>>>>> Can you please send me the content of /var/lib/glusterd & glusterd >>>>>>> log from all the nodes? >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee < >>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 15 May 2017 at 11:58, Pawan Alwandi <pawan at platform.sh> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Atin, >>>>>>>>>>> >>>>>>>>>>> I see below error. Do I require gluster to be upgraded on all 3 >>>>>>>>>>> hosts for this to work? 
Right now I have host 1 running 3.10.1 and host 2 >>>>>>>>>>> & 3 running 3.6.2 >>>>>>>>>>> >>>>>>>>>>> # gluster v set all cluster.op-version 31001 >>>>>>>>>>> volume set: failed: Required op_version (31001) is not supported >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes you should given 3.6 version is EOLed. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee < >>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Sun, 14 May 2017 at 21:43, Atin Mukherjee < >>>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Allright, I see that you haven't bumped up the op-version. Can >>>>>>>>>>>>> you please execute: >>>>>>>>>>>>> >>>>>>>>>>>>> gluster v set all cluster.op-version 30101 and then restart >>>>>>>>>>>>> glusterd on all the nodes and check the brick status? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> s/30101/31001 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi < >>>>>>>>>>>>> pawan at platform.sh> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello Atin, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for looking at this. Below is the output you >>>>>>>>>>>>>> requested for. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Again, I'm seeing those errors after upgrading gluster on >>>>>>>>>>>>>> host 1. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.10.1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 2 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 3 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee < >>>>>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have already 
asked for the following earlier: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can you please provide output of following from all the >>>>>>>>>>>>>>> nodes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, 13 May 2017 at 12:22, Pawan Alwandi >>>>>>>>>>>>>>> <pawan at platform.sh> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello folks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does anyone have any idea whats going on here? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Pawan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi < >>>>>>>>>>>>>>>> pawan at platform.sh> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm trying to upgrade gluster from 3.6.2 to 3.10.1 but >>>>>>>>>>>>>>>>> don't see the glusterfsd and glusterfs processes coming up. >>>>>>>>>>>>>>>>> http://gluster.readthedocs.io/ >>>>>>>>>>>>>>>>> en/latest/Upgrade-Guide/upgrade_to_3.10/ is the process >>>>>>>>>>>>>>>>> that I'm trying to follow. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is a 3 node server setup with a replicated volume >>>>>>>>>>>>>>>>> having replica count of 3. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Logs below: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.507959] I [MSGID: 100030] >>>>>>>>>>>>>>>>> [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p >>>>>>>>>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512827] I [MSGID: 106478] >>>>>>>>>>>>>>>>> [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>>>>>>>>> set to 65536 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512855] I [MSGID: 106479] >>>>>>>>>>>>>>>>> [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>>>>>>>>> directory >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520426] W [MSGID: 103071] >>>>>>>>>>>>>>>>> [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: >>>>>>>>>>>>>>>>> rdma_cm event channel creation failed [No such device] >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520452] W [MSGID: 103055] >>>>>>>>>>>>>>>>> [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520465] W >>>>>>>>>>>>>>>>> [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: >>>>>>>>>>>>>>>>> 'rdma' initialization failed >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520518] W >>>>>>>>>>>>>>>>> [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: >>>>>>>>>>>>>>>>> cannot create listener, initing the transport failed >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520534] E [MSGID: 106243] >>>>>>>>>>>>>>>>> [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, >>>>>>>>>>>>>>>>> continuing with succeeded transport >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.931764] I [MSGID: 106513] >>>>>>>>>>>>>>>>> [glusterd-store.c:2197:glusterd_restore_op_version] >>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.964354] I [MSGID: 106544] >>>>>>>>>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: >>>>>>>>>>>>>>>>> retrieved 
UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.993944] I [MSGID: 106498] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995864] I [MSGID: 106498] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995879] W [MSGID: 106062] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995903] I >>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996325] I >>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>> Final graph: >>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>> 1: volume management >>>>>>>>>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>>>>>>>>> 8: option event-threads 1 >>>>>>>>>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>>>>>>>>> 13: option transport-type rdma >>>>>>>>>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>>>>>>>>> 15: end-volume >>>>>>>>>>>>>>>>> 16: >>>>>>>>>>>>>>>>> 
+----------------------------- >>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996310] W [MSGID: 106062] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.000461] I [MSGID: 101190] >>>>>>>>>>>>>>>>> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: >>>>>>>>>>>>>>>>> Started thread with index 1 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001493] W [socket.c:593:__socket_rwv] >>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>>>>>>>>>> available) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001513] I [MSGID: 106004] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>> as disconnected from glusterd. 
>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001677] W >>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001696] W [MSGID: 106118] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003099] E >>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>> at 2017-05-10 09:0 >>>>>>>>>>>>>>>>> 7:05.000627 (xid=0x1) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003129] E [MSGID: 106167] >>>>>>>>>>>>>>>>> [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk] >>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003251] W 
[socket.c:593:__socket_rwv] >>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>>>>>>>>>> available) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003267] I [MSGID: 106004] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>> as disconnected from glusterd. >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003318] W >>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003329] W [MSGID: 106118] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003457] E >>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) 
>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-10 09:07:05.001407 (xid=0x1)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are a bunch of errors reported but I'm not sure which is signal and which ones are noise. Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> - Atin (atinm)
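The manual workaround Atin describes above (pin `operating-version` in `/var/lib/glusterd/glusterd.info`, then restart glusterd one node at a time) can be sketched as a small shell snippet. This is a hedged sketch, not an official procedure: to keep it runnable anywhere it edits a sample copy of the file; on a real node you would point `INFO` at `/var/lib/glusterd/glusterd.info`, and only after all hosts run the same bits.

```shell
#!/bin/sh
# Sketch of the op-version workaround from the thread. Edits a sample
# copy of glusterd.info; on a real node, point INFO at
# /var/lib/glusterd/glusterd.info instead (and back it up first).
set -eu

INFO="$(mktemp)"
cat > "$INFO" <<'EOF'
UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95
operating-version=30600
EOF

# Pin the op-version the 3.7.x peers expect (30702, as suggested above),
# keeping the UUID line and anything else in the file untouched.
sed -i 's/^operating-version=.*/operating-version=30702/' "$INFO"
grep '^operating-version=' "$INFO"

# On the real node, follow the edit with a glusterd restart, e.g.:
#   systemctl restart glusterd    # or: service glusterd restart
rm -f "$INFO"
```

Repeating this host by host (edit, restart, wait for `gluster peer status` to settle, then move on) is what keeps the change rolling rather than a full-cluster outage.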
Pawan Alwandi
2017-May-24 10:50 UTC
[Gluster-users] Failure while upgrading gluster to 3.10.1
Thanks Atin,

So I got gluster downgraded to 3.7.9 on host 1, and the glusterfs and
glusterfsd processes now come up. But I see the volume is mounted
read-only, and these messages are logged every 3s:

[2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 17, Invalid argument
[2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
[2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.440945 (xid=0xbf)
[2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
[2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.441086 (xid=0xbf)
[2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held

The heal info says this:

# gluster volume heal shared info
Brick 192.168.0.5:/data/exports/shared
Number of entries: 0

Brick 192.168.0.6:/data/exports/shared
Status: Transport endpoint is not connected

Brick 192.168.0.7:/data/exports/shared
Status: Transport endpoint is not connected

Any idea what's up here?

Pawan
>>>> >>> >>> Please ensure that all the hosts are upgraded to the same bits before >>> doing this change. >>> >> >> Having to upgrade all 3 hosts to newer version before gluster could work >> successfully on any of them means application downtime. The applications >> running on these hosts are expected to be highly available. So with the >> way the things are right now, is an online upgrade possible? My upgrade >> steps are: (1) stop the applications (2) umount the gluster volume, and >> then (3) upgrade gluster one host at a time. >> > > One of the way to mitigate this is to first do an online upgrade to > glusterfs-3.7.9 (op-version:30707) given this bug was introduced in 3.7.10 > and then come to 3.11. > > >> Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this >> an online upgrade we are okay to take two steps 3.6.9 -> 3.7 and then 3.7 >> to 3.11. >> >> >>> >>> >>>> >>>> Apparently it looks like there is a bug which you have uncovered, >>>> during peer handshaking if one of the glusterd instance is running with old >>>> bits then during validating the handshake request there is a possibility >>>> that uuid received will be blank and the same was ignored however there was >>>> a patch http://review.gluster.org/13519 which had some additional >>>> changes which was always looking at this field and doing some extra checks >>>> which was causing the handshake to fail. For now, the above workaround >>>> should suffice. I'll be sending a patch pretty soon. >>>> >>> >>> Posted a patch https://review.gluster.org/#/c/17358 . >>> >>> >>>> >>>> >>>> >>>> On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <pawan at platform.sh> >>>> wrote: >>>> >>>>> Hello Atin, >>>>> >>>>> The tar's have the content of `/var/lib/glusterd` too for all 3 nodes, >>>>> please check again. 
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>> Pawan,
>>>>>>
>>>>>> I see you have provided the log files from the nodes; however, it'd be really helpful if you could provide me the content of /var/lib/glusterd from all the nodes to get to the root cause of this issue.
>>>>>>
>>>>>> On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>> Hello Atin,
>>>>>>>
>>>>>>> Thanks for the continued support. I've attached the requested files from all 3 nodes.
>>>>>>>
>>>>>>> (I think we already verified the UUIDs to be correct; anyway, let us know if you find any more info in the logs.)
>>>>>>>
>>>>>>> Pawan
>>>>>>>
>>>>>>> On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>> On Thu, 18 May 2017 at 23:40, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>> Hello Atin,
>>>>>>>>>>
>>>>>>>>>> I realized that these http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ instructions only work for upgrades from 3.7, while we are running 3.6.2. Are there instructions/suggestions you have for us to upgrade from the 3.6 version?
>>>>>>>>>>
>>>>>>>>>> I believe an upgrade from 3.6 to 3.7 and then to 3.10 would work, but I see similar errors reported when I upgraded to 3.7 too.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, I was able to set the op-version (gluster v set all cluster.op-version 30702) but that doesn't seem to help.
>>>>>>>>>> >>>>>>>>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>> /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p >>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478] >>>>>>>>>> [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>> set to 65536 >>>>>>>>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>> directory >>>>>>>>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071] >>>>>>>>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm >>>>>>>>>> event channel creation failed [No such device] >>>>>>>>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init] >>>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>>> [2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load] >>>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>>> [2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener] >>>>>>>>>> 0-rpc-service: cannot create listener, initing the transport failed >>>>>>>>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243] >>>>>>>>>> [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, >>>>>>>>>> continuing with succeeded transport >>>>>>>>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513] >>>>>>>>>> [glusterd-store.c:2068:glusterd_restore_op_version] 0-glusterd: >>>>>>>>>> retrieved op-version: 30600 >>>>>>>>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498] >>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>> [2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>>> 0-management: setting frame-timeout to 600 >>>>>>>>>> [2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>>> 0-management: setting 
frame-timeout to 600 >>>>>>>>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498] >>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544] >>>>>>>>>> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved >>>>>>>>>> UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>> >>>>>>>>> >>>>>>>>>> Final graph: >>>>>>>>>> +----------------------------------------------------------- >>>>>>>>>> -------------------+ >>>>>>>>>> 1: volume management >>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>> 8: option event-threads 1 >>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>> 13: option transport-type rdma >>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>> 15: end-volume >>>>>>>>>> 16: >>>>>>>>>> +----------------------------------------------------------- >>>>>>>>>> -------------------+ >>>>>>>>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started >>>>>>>>>> thread with index 1 >>>>>>>>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] >>>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>>> available) >>>>>>>>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>>> called at 2017-05-17 06:48:35.609965 (xid=0x1) >>>>>>>>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167] >>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>> [2017-05-17 06:48:35.611944] I [MSGID: 106004] >>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. >>>>>>>>>> [2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>>> [2017-05-17 06:48:35.612039] W [MSGID: 106118] >>>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>> [2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv] >>>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>>> available) >>>>>>>>>> [2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] 
>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>>> called at 2017-05-17 06:48:35.610007 (xid=0x1) >>>>>>>>>> [2017-05-17 06:48:35.612197] E [MSGID: 106167] >>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>> [2017-05-17 06:48:35.612211] I [MSGID: 106004] >>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. 
>>>>>>>>>> [2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) 0-management: Lock for vol shared not held
>>>>>>>>>> [2017-05-17 06:48:35.613432] W [MSGID: 106118] [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
>>>>>>>>>> [2017-05-17 06:48:35.614317] E [MSGID: 106170] [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management: Request from peer 192.168.0.6:991 has an entry in peerinfo, but uuid does not match
>>>>>>>>>>
>>>>>>>>> Apologies for the delay. My initial suspicion was correct. You have an incorrect UUID in the peer file, which is causing this. Can you please provide me the
>>>>>>>>>
>>>>>>>> Clicked the send button accidentally!
>>>>>>>>
>>>>>>>> Can you please send me the content of /var/lib/glusterd & the glusterd log from all the nodes?
>>>>>>>>
>>>>>>>>>> On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>> On Mon, 15 May 2017 at 11:58, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>> Hi Atin,
>>>>>>>>>>>>
>>>>>>>>>>>> I see the below error. Do I require gluster to be upgraded on all 3 hosts for this to work?
>>>>>>>>>>>> Right now I have host 1 running 3.10.1 and hosts 2 & 3 running 3.6.2.
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster v set all cluster.op-version 31001
>>>>>>>>>>>> volume set: failed: Required op_version (31001) is not supported
>>>>>>>>>>>>
>>>>>>>>>>> Yes, you should, given that the 3.6 version is EOLed.
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>>>> On Sun, 14 May 2017 at 21:43, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>>>>> Alright, I see that you haven't bumped up the op-version. Can you please execute:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gluster v set all cluster.op-version 30101 and then restart glusterd on all the nodes and check the brick status?
>>>>>>>>>>>>>
>>>>>>>>>>>>> s/30101/31001
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for looking at this. Below is the output you requested.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Again, I'm seeing those errors after upgrading gluster on host 1.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.10.1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 2 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 3 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee < >>>>>>>>>>>>>>> amukherj at redhat.com> 
wrote:
>>>>>>>>>>>>>>>> I have already asked for the following earlier:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please provide the output of the following from all the nodes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>> cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 12:22, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm trying to upgrade gluster from 3.6.2 to 3.10.1 but don't see the glusterfsd and glusterfs processes coming up. http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ is the process that I'm trying to follow.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a 3-node server setup with a replicated volume having a replica count of 3.
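As a side note, the glusterd.info and peers/* outputs requested above can be cross-checked mechanically: every UUID a node lists as a peer should match some other node's own UUID, and any entry that doesn't explains the "uuid does not match" handshake error. A minimal illustrative sketch (not a gluster tool; the sample data mirrors the three hosts' values posted in this thread):

```shell
# Illustrative cross-check of glusterd peer metadata.
# own_uuids: each node's UUID from its /var/lib/glusterd/glusterd.info.
own_uuids="7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
83e9a0b9-6bd5-483b-8516-d8928805ed95
5ec54b4f-f60c-48c6-9e55-95f2bb58f633"

# host1's peer entries; on a real node these come from:
#   grep '^uuid=' /var/lib/glusterd/peers/* | cut -d= -f2
host1_peers="5ec54b4f-f60c-48c6-9e55-95f2bb58f633
83e9a0b9-6bd5-483b-8516-d8928805ed95"

# Flag any peer UUID not found among the nodes' own UUIDs (stale metadata).
bad=0
for u in $host1_peers; do
    echo "$own_uuids" | grep -qx "$u" || { echo "unknown peer uuid: $u"; bad=1; }
done
[ "$bad" -eq 0 ] && echo "host1 peer entries OK"
```

Repeating the loop for each host quickly shows whether the three nodes' views of each other agree.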
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Logs below: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.507959] I [MSGID: 100030] >>>>>>>>>>>>>>>>>> [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p >>>>>>>>>>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512827] I [MSGID: 106478] >>>>>>>>>>>>>>>>>> [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>>>>>>>>>> set to 65536 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512855] I [MSGID: 106479] >>>>>>>>>>>>>>>>>> [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>>>>>>>>>> directory >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520426] W [MSGID: 103071] >>>>>>>>>>>>>>>>>> [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: >>>>>>>>>>>>>>>>>> rdma_cm event channel creation failed [No such device] >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520452] W [MSGID: 103055] >>>>>>>>>>>>>>>>>> [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520465] W >>>>>>>>>>>>>>>>>> [rpc-transport.c:350:rpc_transport_load] >>>>>>>>>>>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520518] W >>>>>>>>>>>>>>>>>> [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: >>>>>>>>>>>>>>>>>> cannot create listener, initing the transport failed >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520534] E [MSGID: 106243] >>>>>>>>>>>>>>>>>> [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, >>>>>>>>>>>>>>>>>> continuing with succeeded transport >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.931764] I [MSGID: 106513] >>>>>>>>>>>>>>>>>> [glusterd-store.c:2197:glusterd_restore_op_version] >>>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.964354] I [MSGID: 106544] >>>>>>>>>>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: 
>>>>>>>>>>>>>>>>>> retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.993944] I [MSGID: 106498] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995864] I [MSGID: 106498] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995879] W [MSGID: 106062] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995903] I >>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996325] I >>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>>> Final graph: >>>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>>> 1: volume management >>>>>>>>>>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>>>>>>>>>> 8: option event-threads 1 >>>>>>>>>>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>>>>>>>>>> 13: option transport-type rdma >>>>>>>>>>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>>>>>>>>>> 15: end-volume 
>>>>>>>>>>>>>>>>>> 16: >>>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996310] W [MSGID: 106062] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.000461] I [MSGID: 101190] >>>>>>>>>>>>>>>>>> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: >>>>>>>>>>>>>>>>>> Started thread with index 1 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001493] W >>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on >>>>>>>>>>>>>>>>>> 192.168.0.7:24007 failed (No data available) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001513] I [MSGID: 106004] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>>> as disconnected from glusterd. 
>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001677] W >>>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001696] W [MSGID: 106118] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003099] E >>>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>>> at 2017-05-10 09:0 >>>>>>>>>>>>>>>>>> 7:05.000627 (xid=0x1) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003129] E [MSGID: 106167] >>>>>>>>>>>>>>>>>> [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk] >>>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>>>>>>>>>> [2017-05-10 
09:07:05.003251] W >>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on >>>>>>>>>>>>>>>>>> 192.168.0.6:24007 failed (No data available) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003267] I [MSGID: 106004] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>>> as disconnected from glusterd. >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003318] W >>>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003329] W [MSGID: 106118] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003457] E >>>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> 
frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>>> at 2017-05-10 09:07:05.001407 (xid=0x1)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are a bunch of errors reported but I'm not sure which is signal and which ones are noise. Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> - Atin (atinm)