Strahil Nikolov
2022-Jun-17 06:18 UTC
[Gluster-users] Odd "Transport endpoint is not connected" when trying to gunzip a file
Check with top & iotop the load. Especially check the wait for I/O in top.

Did you check dmesg for any clues?

Best Regards,
Strahil Nikolov

On Thu, Jun 16, 2022 at 22:59, Pat Haley <phaley at mit.edu> wrote:

Hi Strahil,

I poked around our logs, and found this on the front-end (from the day & time of the last time we had the issue)

Jun 15 10:51:17 mseas gdata[155485]: [2022-06-15 14:51:17.263858] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.

This would indicate that the problem is related. For us, however, I believe we can reproduce this issue at will (i.e. simply try to gunzip the same file). Unfortunately I have to go to a meeting now, but if you have some specific tests you'd like me to try, I can try them when I get back.

Thanks

Pat

On 6/16/22 3:07 PM, Strahil Nikolov wrote:

Pat,

Can you check the cpu and disk performance when the volume reports the issue?

It seems that a similar issue was reported in https://lists.gluster.org/pipermail/gluster-users/2019-March/035944.html but I don't see a clear solution. Take a look in the thread and check if it matches your symptoms.

Best Regards,
Strahil Nikolov

On Thu, Jun 16, 2022 at 18:14, Pat Haley <phaley at mit.edu> wrote:

Hi Strahil,

I poked around again and for brick 3 (where the file we were testing resides) I only found the same log file as was at the bottom of my first Email:

---------------------------------------------------
mseas-data3: bricks/export-sda-brick3.log
-----------------------------------------
[2022-06-15 14:50:42.588143] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
[2022-06-15 14:50:42.588220] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC
[2022-06-15 14:50:42.588259] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-15 14:50:42.588288] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
[2022-06-15 14:50:53.605215] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-29 (version: 3.7.11)
[2022-06-15 14:50:42.588247] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC

Thanks

Pat

On 6/15/22 6:47 PM, Strahil Nikolov wrote:

I agree. It will be very hard to debug.

Anything in the brick logs?

I think it's pointless to mention that EL6 is dead and Gluster v3 is so old that it's worth considering a migration to a newer setup.
Best Regards,
Strahil Nikolov

On Wed, Jun 15, 2022 at 22:51, Yaniv Kaul <ykaul at redhat.com> wrote:
________

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email: phaley at mit.edu
Center for Ocean Engineering       Phone: (617) 253-6824
Dept. of Mechanical Engineering    Fax:   (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
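A minimal sketch of the checks suggested above, for anyone reproducing this: the commands below are illustrative rather than taken from the thread; the volume name data-volume comes from the log messages, and availability of iotop/sysstat on these EL6 hosts is an assumption. Run them on both the client (mseas) and the brick server (mseas-data3) while triggering the gunzip failure.

# Load and I/O wait (the "wa" field) sampled every 2 seconds:
top -b -d 2 | grep 'Cpu(s)'

# Per-disk utilisation and await times (needs the sysstat package):
iostat -x 2

# Kernel messages mentioning gluster or allocation problems:
dmesg | grep -iE 'gluster|allocation failure'

# Confirm the brick process behind 172.16.1.113:49153 is still up, and
# see whether network.ping-timeout (default 42 s, matching the log
# message) has been reconfigured on the volume:
gluster volume status data-volume
gluster volume info data-volume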
Pat Haley
2022-Jun-21 21:49 UTC
[Gluster-users] Odd "Transport endpoint is not connected" when trying to gunzip a file
Hi Strahil,

I have tried a couple of tests of trying to gunzip the file with top running on the client (mseas) and on the brick server (mseas-data3), and with iotop running on the client (mseas). I was not able to install iotop on the brick server yet (the external line is down). I'll repeat when I fix that problem.

I now can get one of two error messages when gunzip fails:

  * gzip: /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz: File descriptor in bad state
      o a new error message
  * gzip: /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz: Transport endpoint is not connected
      o the original error message

What I observed while waiting for gunzip to fail:

  * top
      o no significant load (usually less than 0.1) on both machines
      o zero IO-wait on both machines
  * iotop (only running on the client)
      o nothing related to gluster showing up in the display at all

I include below what I found in the log files again corresponding to these tests (and what I see in dmesg on the brick-server related to gluster; nothing showed up on the client). Please let me know what I should try next.

Thanks

Pat

------------------------------------------
mseas-data3: dmesg | grep glust
------------------------------------------
many repeats of the following pairs of lines:

glusterfsd: page allocation failure. order:1, mode:0x20
Pid: 14245, comm: glusterfsd Not tainted 2.6.32-754.2.1.el6.x86_64 #1

------------------------------------------
mseas:messages
------------------------------------------
Jun 21 17:04:35 mseas gdata[155485]: [2022-06-21 21:04:35.638810] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.
Jun 21 17:21:04 mseas gdata[155485]: [2022-06-21 21:21:04.786083] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.

------------------------------------------
mseas:gdata.log
------------------------------------------
[2022-06-21 21:04:35.638810] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.
[2022-06-21 21:04:35.639261] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21 21:03:29.735807 (xid=0xc05d54)
[2022-06-21 21:04:35.639494] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:03:53.633472 (xid=0xc05d55)
[2022-06-21 21:21:04.786083] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.
[2022-06-21 21:21:04.786732] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21 21:19:52.634383 (xid=0xc05e31)
[2022-06-21 21:21:04.787172] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:20:22.780023 (xid=0xc05e32)

------------------------------------------
mseas-data3: bricks/export-sda-brick3.log
------------------------------------------
[2022-06-21 21:03:54.489638] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:03:54.489752] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:03:54.489817] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:04:04.506544] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32 (version: 3.7.11)
[2022-06-21 21:20:23.625096] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.625189] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:20:23.625255] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.641462] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-33 (version: 3.7.11)

On 6/17/22 2:18 AM, Strahil Nikolov wrote:
> Check with top & iotop the load.
> Especially check the wait for I/O in top.
>
> Did you check dmesg for any clues ?
>
> Best Regards,
> Strahil Nikolov
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email: phaley at mit.edu
Center for Ocean Engineering       Phone: (617) 253-6824
Dept. of Mechanical Engineering    Fax:   (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
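One hypothetical follow-up on the brick server, prompted by the repeated "glusterfsd: page allocation failure. order:1, mode:0x20" messages above rather than by anything else in the thread: order-1 atomic allocation failures generally point to memory fragmentation, which can make the kernel drop packets under load and in turn produce the 42-second ping timeouts logged on the client. A rough way to check and mitigate (the 262144 value is only an example, not a recommendation):

# Fragmentation overview: small numbers in the higher-order columns
# (order 1 and up) mean few contiguous blocks are left:
cat /proc/buddyinfo

# Reserve the kernel keeps free for atomic allocations:
sysctl vm.min_free_kbytes

# Temporarily raise the reserve (example value; persist it in
# /etc/sysctl.conf only if it actually helps):
sysctl -w vm.min_free_kbytes=262144

# If this kernel exposes compaction, it can defragment without a reboot:
[ -w /proc/sys/vm/compact_memory ] && echo 1 > /proc/sys/vm/compact_memory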