Przemysław Mroczek
2015-Mar-07 23:20 UTC
[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]
Hi guys,

We have a Rails app that uses Gluster as its distributed file system. The Gluster servers are hosted independently by another company as part of a deal; we have no control over them, and we connect to them using the Gluster native client.

We tried to resolve this issue with help from the admins of the company that hosts our Gluster servers, but they say it is a client issue. We have run out of ideas as to how that is possible, since we are not doing anything special here.

Information about the independent Gluster servers:
- Version: 3.6.0.42.1
- They are running Red Hat
- They are an enterprise, so they always use older versions

Our servers:
- System version: Ubuntu 14.04
- Gluster client version: 3.6.2

The exact problem is that errors in Gluster often (a couple of times a week) cause processes to become zombies. It happens with our application server (unicorn), nginx, and our crawling script, which runs as a daemon.

Our fstab file:

10.10.11.17:/drslk-prod /mnt/storage glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup /mnt/backup glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0

Logs from gluster:

[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request] 0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)
[2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-drslk-prod-client-10: changing port to 49349 (from 0)
[2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs] 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
[2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10: Server and Client lk-version numbers are not same, reopening the fds

Additional logs are in the attachments.

Has anyone encountered a similar issue with Gluster? Do you have any ideas how to solve the problem?

Best regards,
Przemek

Attachments:
mnt-storage-241.log (text/x-log, 12006 bytes): http://www.gluster.org/pipermail/gluster-users/attachments/20150308/57e762a7/attachment.bin
mnt-storage-242-old.log (text/x-log, 24202 bytes): http://www.gluster.org/pipermail/gluster-users/attachments/20150308/57e762a7/attachment-0001.bin
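Editorial note: the excerpt above shows the client losing brick drslk-prod-client-10 at 12:36:12 and re-attaching at 13:11:02. To see how often and for how long this happens across the full attached logs, a small script along the following lines may help. It is only a sketch: the function name outage_windows is made up for illustration, the regular expression assumes every entry has the "[timestamp] LEVEL [file:line:function] message" shape seen in this post, and it assumes one log entry per line.

```python
import re
from datetime import datetime

# Header of a GlusterFS client log entry as it appears in this thread:
# [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] <message>
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def outage_windows(lines):
    """Return (start, end) datetime pairs, one per disconnect/reconnect cycle.

    A window opens at the first "disconnected from" entry and closes at the
    next "Connected to" entry; repeated disconnect notices in between are
    ignored. Lines that do not look like log entries are skipped.
    """
    windows, start = [], None
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if "disconnected from" in m["msg"] and start is None:
            start = ts
        elif "Connected to" in m["msg"] and start is not None:
            windows.append((start, ts))
            start = None
    return windows


# Three entries copied from the log excerpt in the post.
SAMPLE = [
    "[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] "
    "0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to "
    "remote volume '/GLUSTERFS/drslk-prod'.",
]

for begin, end in outage_windows(SAMPLE):
    seconds = (end - begin).total_seconds()
    print(f"brick unreachable from {begin} to {end} ({seconds:.0f} s)")
```

On the three sample entries this reports a single window of roughly 35 minutes. To scan a real log, pass an open file handle for one of the attached files instead of SAMPLE.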
Vijay Bellur
2015-Mar-08 00:29 UTC
[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]
On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
> [...]

Can you provide the gluster volume configuration details?

It does look like frame-timeout for the volume has been set to 60. Is there any specific reason? Normally altering the frame-timeout is not recommended.

-Vijay
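Editorial note: the volume configuration requested above is normally read from the output of 'gluster volume info' on a server, where changed settings appear under an "Options Reconfigured:" heading. Once the hosting admins supply that output, a sketch like the one below could pull out the frame-timeout value. The helper names and SAMPLE_INFO are hypothetical: the real configuration was not posted in this thread, and the value 60 is only the one inferred in the reply. The option key network.frame-timeout and its default of 1800 seconds are taken from the GlusterFS documentation.

```python
# Documented GlusterFS default for network.frame-timeout, in seconds.
DEFAULT_FRAME_TIMEOUT = 1800


def reconfigured_options(volume_info_text):
    """Collect the key/value pairs listed under 'Options Reconfigured:'."""
    options, in_section = {}, False
    for raw in volume_info_text.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_section = True
        elif in_section and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
        elif in_section and not line:
            # A blank line ends the section (e.g. before the next volume).
            in_section = False
    return options


def frame_timeout_seconds(options):
    """Return the effective frame timeout, falling back to the default."""
    return int(options.get("network.frame-timeout", DEFAULT_FRAME_TIMEOUT))


# HYPOTHETICAL output, invented for illustration only.
SAMPLE_INFO = """\
Volume Name: drslk-prod
Status: Started
Options Reconfigured:
network.frame-timeout: 60
"""

timeout = frame_timeout_seconds(reconfigured_options(SAMPLE_INFO))
if timeout != DEFAULT_FRAME_TIMEOUT:
    print(f"network.frame-timeout is {timeout} s (default {DEFAULT_FRAME_TIMEOUT} s)")
```

Parsing the text rather than invoking the gluster CLI is deliberate here, since the poster has no access to the servers and would be working from output pasted by the hosting company.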