Danny Lee
2016-Sep-15 21:24 UTC
[Gluster-users] RPC program not available (req 1298437 330)
Hi,

Environment:
Gluster Version: 3.8.3
Operating System: CentOS Linux 7 (Core)
Kernel: Linux 3.10.0-327.28.3.el7.x86_64
Architecture: x86-64
Replicated 3-Node Volume
~400GB of around a million files

Description of Problem:
One of the bricks dies. The only suspicious log entries I see are in etc-glusterfs-glusterd.vol.log (shown below). I'm trying to get an idea of why the brick died and how it could be prevented in the future.

During this time, I was forcing replication (find . | xargs stat on the mount). Some services that use the gluster mount were starting up as well.

[2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management: readv on /var/run/gluster/cfc57a83cf77779864900aa08380be93.socket failed (No data available)
[2016-09-13 20:01:50.033830] I [MSGID: 106005] [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected from glusterd.
[2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 172.17.32.28:49146
[2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 172.17.32.28:49146
[2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
The message "I [MSGID: 106005] [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected from glusterd." repeated 34 times between [2016-09-13 20:01:50.033830] and [2016-09-13 20:03:40.010862]
Atin Mukherjee
2016-Sep-16 17:49 UTC
[Gluster-users] RPC program not available (req 1298437 330)
On Friday 16 September 2016, Danny Lee <dannyl at vt.edu> wrote:
> [...]
> [2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 172.17.32.28:49146
> [2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
> [2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 172.17.32.28:49146
> [2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

I haven't checked the code yet, but my guess is that a brick op that was in transit failed here when the brick went down.

> [...]

--Atin
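
For anyone who wants to check the timing behind this guess, here is a minimal Python sketch (not part of Gluster) that parses the glusterd log lines quoted in this thread and measures how long after the first brick-disconnect entry the first "RPC program not available" warning was logged. The regular expression is an assumption inferred only from the excerpt above, and `parse_line` and `gap_after_disconnect` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional

# Assumed line layout, inferred only from the excerpt quoted in this thread:
# [timestamp] LEVEL [MSGID: n (optional)] [file:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S+): "
    r"(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    msgid: Optional[str]
    source: str
    domain: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return LogEntry(ts, m.group("level"), m.group("msgid"),
                    m.group("src"), m.group("domain"), m.group("msg"))


def gap_after_disconnect(text: str) -> Optional[timedelta]:
    """Time from the first brick-disconnect entry to the first later RPC warning."""
    entries = [e for e in map(parse_line, text.splitlines()) if e is not None]
    disconnect = next(
        (e for e in entries if "has disconnected from glusterd" in e.message), None)
    if disconnect is None:
        return None
    rpc = next(
        (e for e in entries
         if e.timestamp >= disconnect.timestamp
         and "RPC program not available" in e.message),
        None)
    return None if rpc is None else rpc.timestamp - disconnect.timestamp


# First four entries of the log excerpt posted in this thread.
SAMPLE = """\
[2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management: readv on /var/run/gluster/cfc57a83cf77779864900aa08380be93.socket failed (No data available)
[2016-09-13 20:01:50.033830] I [MSGID: 106005] [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected from glusterd.
[2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 172.17.32.28:49146
[2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
"""

if __name__ == "__main__":
    print(gap_after_disconnect(SAMPLE))  # 0:00:00.087486
```

On the posted excerpt this reports a gap of about 87 ms between the disconnect and the first RPC warning. That is consistent with a request still being in flight when the brick went away, but it does not prove it.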
Vijay Bellur
2016-Sep-16 20:37 UTC
[Gluster-users] RPC program not available (req 1298437 330)
Have you checked the brick logs to see if there's anything unusual there?

Regards,
Vijay
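
One way to follow up on this suggestion is sketched below in Python: scan the brick logs for warning and error entries logged within a couple of minutes of the disconnect at 2016-09-13 20:01:50. This is only a sketch under stated assumptions. It assumes the brick logs live under /var/log/glusterfs/bricks (a common default; adjust for your install) and that they use the same line prefix as the glusterd log quoted in this thread. `lines_near` and `scan_brick_logs` are hypothetical helper names, and filtering on the level letters W, E and C is an assumption about which severities are worth reading first.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Assumed prefix: "[YYYY-MM-DD HH:MM:SS.ffffff] L " where L is a one-letter level.
HEAD_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) ")


def lines_near(lines: Iterable[str],
               center: datetime,
               window: timedelta = timedelta(minutes=2),
               levels: Tuple[str, ...] = ("W", "E", "C")) -> Iterator[str]:
    """Yield log lines of the given levels within `window` of `center`."""
    for line in lines:
        m = HEAD_RE.match(line)
        if m is None:
            continue  # continuation lines or unknown formats are skipped
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if m.group(2) in levels and abs(ts - center) <= window:
            yield line.rstrip("\n")


def scan_brick_logs(log_dir: str, center: datetime,
                    **kwargs) -> Iterator[Tuple[str, str]]:
    """Yield (file name, line) pairs from every *.log file in log_dir."""
    root = Path(log_dir)
    if not root.is_dir():
        return  # nothing to scan on machines without that directory
    for path in sorted(root.glob("*.log")):
        with path.open(errors="replace") as fh:
            for hit in lines_near(fh, center, **kwargs):
                yield path.name, hit


if __name__ == "__main__":
    # Disconnect time taken from the glusterd log posted in this thread.
    disconnect = datetime(2016, 9, 13, 20, 1, 50)
    # Assumed default brick log directory; change it to match your setup.
    for name, line in scan_brick_logs("/var/log/glusterfs/bricks", disconnect):
        print(f"{name}: {line}")
```

Run this on the node whose brick went down (172.17.32.28 in the log above). Widening `window` or adding "I" to `levels` gives more context if nothing shows up.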