Good morning,

We have a 6-node cluster; 3 of the nodes participate in a replica 3 volume (ovirtprod_vol).

Naming convention:
  xx01 - the 3 nodes participating in ovirtprod_vol
  xx02 - the 3 nodes NOT participating in ovirtprod_vol

Last week we restarted glusterd on each node in the cluster, one at a time, to apply an update. Since then, the three xx01 nodes all show the following in glusterd.log:

[2018-02-26 14:31:47.330670] E [socket.c:2020:__socket_read_frag] 0-rpc: wrong MSG-TYPE (29386) received from 172.26.30.9:24007
[2018-02-26 14:31:47.330879] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a) [0x7f46020e922a] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198) [0x7f46020f3198] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755) [0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.331066] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2018-02-26 14:31:47.330496 (xid=0x72e0)
[2018-02-26 14:31:47.333993] E [socket.c:2020:__socket_read_frag] 0-rpc: wrong MSG-TYPE (84253) received from 172.26.30.8:24007
[2018-02-26 14:31:47.334148] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a) [0x7f46020e922a] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198) [0x7f46020f3198] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755) [0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.334317] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2018-02-26 14:31:47.333824 (xid=0x1494b)
[2018-02-26 14:31:48.511390] E [socket.c:2632:socket_poller] 0-socket.management: poll error on socket

Additionally, each of the three xx01 nodes shows connectivity to only 2 of the 3 volume hosts (itself and one other), and no two of them are missing the same host: xx01 shows itself and yy01, yy01 shows itself and zz01, and zz01 shows itself and xx01.

However, the xx02 hosts (same cluster, not participating in the volume) show the volume info as fine, with all three xx01 hosts participating in the volume.

In our dev environment we had to stop the volume and restart glusterd on all hosts to recover. For prod, that would mean a system-wide outage and downtime, which we need to avoid.

Any suggestions?

Thanks.
vk

--------------------------------
Vineet Khandpur
UNIX System Administrator
Information Technology Services
University of Alberta Libraries
+1-780-492-4718
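
PS - for clarity, below is roughly what I mean by the per-node connectivity check and by the dev-environment recovery. It is only a sketch using the standard gluster CLI and systemd; exact commands and ordering may differ in your environment:

  # Run on each of the three xx01 nodes to compare what each one sees:
  gluster peer status                      # which peers this node considers connected
  gluster volume status ovirtprod_vol      # brick/daemon status as seen from this node
  gluster volume info ovirtprod_vol        # static volume definition for comparison

  # Dev-environment recovery (full outage - this is what we want to avoid in prod):
  gluster volume stop ovirtprod_vol        # take the volume offline
  systemctl restart glusterd               # run on every node, one after another
  gluster volume start ovirtprod_vol       # bring the volume back
  gluster volume heal ovirtprod_vol info   # check afterwards that nothing is pending heal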