Mauro Tridici
2019-Feb-04 10:25 UTC
[Gluster-users] RSYNC files renaming issue and timeout errors
Hi All, our users are experiencing the following two problems using our gluster storage based on a 12x(4+2) distributed dispersed volume: 1) sometimes, during RSYNC copies executions, they noticed this error message: rsync: rename "/tier2/CSP/sp1/ftp/lweprsn/.cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256.rD38kX" -> "cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256": Permission denied (13) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6] The files seem to be correctly copied but rsync returns this kind of error and stop the sync process. 2) if the number of the total rsync sessions against the gluster volume grows, it seems that gluster volume starts suffering some particular workload, directories tree navigation becomes very slow and a lot of timeout errors appear on 192.168.0.54 gluster server (named ?s04") I just checked switch, cables, ports and so on, but everything seems to be ok. I also executed a lot of checks using iperf tool and the network seems to be ok. I dont? understand why I see only timeout errors related to s04 host (192.168.0.54). It is up and running? It seems that rsync copy processes started by the gluster volume users cause some noise (on gluster client log file) during temporary files creation. [2019-01-27 02:36:53.488942] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Timeout della connessione); disconnecting socke t [2019-01-27 02:53:48.607832] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved _frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d 410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-57: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-2 7 02:53:10.568717 (xid=0x4aeefc8) [2019-01-27 02:55:01.631912] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke t [2019-01-27 02:56:05.643880] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke t [2019-01-27 02:57:09.653961] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke t [2019-01-27 03:50:56.951088] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved _frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d 410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2019-01 -27 03:39:48.938568 (xid=0x49e54c1) [2019-01-27 03:50:56.951111] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-54: remote operation failed [Transport endpoint is not connected] [2019-01-27 03:50:56.951471] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved _frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d 410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-2 7 03:35:31.328387 (xid=0x49e54bf) [2019-01-27 03:50:56.958357] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved _frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d 410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2019-01-27 03:37:49.652707 (xid=0x49e54c0) [2019-01-27 03:50:56.966845] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste] [2019-01-27 03:50:56.967086] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4) [2019-01-27 03:50:56.967224] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4) [2019-01-27 03:50:56.992129] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste] [2019-01-27 03:50:56.995954] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste] [2019-01-27 03:50:56.996126] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4) [2019-01-27 03:50:56.996751] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4) [2019-01-27 03:50:57.001301] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste] [2019-01-27 03:50:57.006375] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4) [2019-01-27 04:02:09.269929] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Timeout della connessione); disconnecting socke t I focused my attention on the errors occurred on gluster client during 25th of January 2019. So, I collected the following log files: - tier2.old.log.gz, (the client log file covering the period) - tier2.today.log.gz (the updated client log file reporting similar error related to ?renaming? and ?timeout? issues) - memory_usage.pdf, (weekly statistics for memory usage on 192.168.0.54 host from NAGIOS) - network_usage.pdf, (weekly statistics for network usage on 192.168.0.54 host from NAGIOS) - cpu_usage.pdf, (weekly statistics for network usage on 192.168.0.54 host from NAGIOS) - gluster-mnt6-brick.log-20190127, brick 6 (on host 192.168.0.54) log file - gluster-mnt7-brick.log-20190127, brick 7 (on host 192.168.0.54) log file I detected the involved bricks greping on ?gluster volume status? command output: [root at s04 bricks]# gluster volume status|grep 49158 Brick s02-stg:/gluster/mnt6/brick 49158 0 Y 4147 Brick s03-stg:/gluster/mnt6/brick 49158 0 Y 4272 Brick s01-stg:/gluster/mnt7/brick 49158 0 Y 3324 Brick s04-stg:/gluster/mnt7/brick 49158 0 Y 3787 Brick s05-stg:/gluster/mnt7/brick 49158 0 Y 3131 Brick s06-stg:/gluster/mnt7/brick 49158 0 Y 3254 On gluster server s04 (192.168.0.54), in messages log file, I can see a lot of errors like the following one and nothig more Jan 25 14:16:22 s04 kernel: traps: check_vol_utili[142140] general protection ip:7f8fcec7866d sp:7ffee3a86a30 error:0 in libglusterfs.so.0.0.1[7f8fcec25000+f7000] Jan 25 14:16:22 s04 abrt-hook-ccpp: Process 142140 (python2.7) of user 0 killed by SIGSEGV - dumping core Jan 25 14:16:22 s04 abrt-server: Generating core_backtrace Jan 25 14:16:22 s04 abrt-server: Error: Unable to open './coredump': No such file or directory Jan 25 14:17:00 s04 abrt-server: Duplicate: UUID Jan 25 14:17:00 s04 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471 Jan 25 14:17:00 s04 abrt-server: Deleting problem directory ccpp-2019-01-25-14:16:22-142140 (dup of ccpp-2018-09-25-11:18:20-4471) Jan 25 14:17:00 s04 dbus[1877]: [system] Activating service name='org.freedesktop.problems' (using servicehelper) Jan 25 14:17:00 s04 dbus[1877]: [system] Successfully activated service 'org.freedesktop.problems' Jan 25 14:17:00 s04 abrt-server: Generating core_backtrace Jan 25 14:17:00 s04 abrt-server: Error: Unable to open './coredump': No such file or directory Jan 25 14:17:00 s04 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471' via uReport: Event 'report_uReport' exited with 1 Jan 25 14:17:01 s04 systemd: Created slice User Slice of root. Jan 25 14:17:01 s04 systemd: Starting User Slice of root. Jan 25 14:17:01 s04 systemd: Started Session 140059 of user root. Jan 25 14:17:01 s04 systemd: Starting Session 140059 of user root. Jan 25 14:17:01 s04 systemd: Removed slice User Slice of root. Jan 25 14:17:01 s04 systemd: Stopping User Slice of root. Jan 25 14:18:01 s04 systemd: Created slice User Slice of root. Jan 25 14:18:01 s04 systemd: Starting User Slice of root. Jan 25 14:18:01 s04 systemd: Started Session 140060 of user root. Jan 25 14:18:01 s04 systemd: Starting Session 140060 of user root. Jan 25 14:18:01 s04 systemd: Removed slice User Slice of root. Jan 25 14:18:01 s04 systemd: Stopping User Slice of root. Jan 25 14:18:35 s04 kernel: traps: check_vol_utili[142501] general protection ip:7f119fd7066d sp:7ffc399aa6f0 error:0 in libglusterfs.so.0.0.1[7f119fd1d000+f7000] Jan 25 14:18:35 s04 abrt-hook-ccpp: Process 142501 (python2.7) of user 0 killed by SIGSEGV - dumping core Jan 25 14:18:35 s04 abrt-server: Generating core_backtrace Jan 25 14:18:35 s04 abrt-server: Error: Unable to open './coredump': No such file or directory No errors in /var/log/messages log file related to the client machine. Anyone of us could help me to solve this issue? Thank you very much in advance, Mauro -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0008.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: tier2.old.log.gz Type: application/x-gzip Size: 636711 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0004.gz> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0009.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: tier2.today.log.gz Type: application/x-gzip Size: 325016 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0005.gz> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0010.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: cpu_usage.pdf Type: application/pdf Size: 1011067 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0003.pdf> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0011.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: memory_usage.pdf Type: application/pdf Size: 65668 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0004.pdf> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0012.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: network_usage.pdf Type: application/pdf Size: 67425 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0005.pdf> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0013.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: gluster-mnt6-brick.log-20190127.gz Type: application/x-gzip Size: 497687 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0006.gz> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0014.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: gluster-mnt7-brick.log-20190127.gz Type: application/x-gzip Size: 526789 bytes Desc: not available URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0007.gz> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0015.html>