Hi all,

I have a problem pinpointing an error that makes processes crash for the users of my system. The thing that has changed since the crashes started is that I added a Gluster cluster, and of course the users immediately started hammering it.

I started looking at logs, beginning on the client side, and I just need help understanding how to read them the right way. I can see that every ten minutes the client changes port and attaches to the remote volume. About five minutes later the client unmounts the volume. I guess that this is the "old" mount and that the "new" mount is already responding to user interaction?

As this repeats every ten minutes I see it as normal behavior and just want to get a better understanding of how the client interacts with the cluster.

Have you experienced that this switch malfunctions and the mount becomes unreachable for a while?

Many thanks in advance!

Best regards
Marcus Pedersén

An example of the output:

[2017-11-09 10:10:39.776403] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-interbull-interbull-client-1: changing port to 49160 (from 0)
[2017-11-09 10:10:39.776830] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-interbull-interbull-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-11-09 10:10:39.777642] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-0: Connected to interbull-interbull-client-0, attached to remote volume '/interbullfs/interbull'.
[2017-11-09 10:10:39.777663] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.777724] I [MSGID: 108005] [afr-common.c:4756:afr_notify] 0-interbull-interbull-replicate-0: Subvolume 'interbull-interbull-client-0' came back up; going online.
[2017-11-09 10:10:39.777954] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-interbull-interbull-client-0: Server lk version = 1
[2017-11-09 10:10:39.779909] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-interbull-interbull-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-11-09 10:10:39.780481] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-1: Connected to interbull-interbull-client-1, attached to remote volume '/interbullfs/interbull'.
[2017-11-09 10:10:39.780509] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.781544] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-interbull-interbull-client-1: Server lk version = 1
[2017-11-09 10:10:39.781608] I [fuse-bridge.c:4146:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2017-11-09 10:10:39.781632] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse: switched to graph 0
[2017-11-09 10:16:10.609922] I [fuse-bridge.c:5089:fuse_thread_proc] 0-fuse: unmounting /interbull
[2017-11-09 10:16:10.610258] W [glusterfsd.c:1329:cleanup_and_exit] (-->/usr/lib/libpthread.so.0(+0x72e7) [0x7f98c02282e7] -->/usr/bin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40890d] -->/usr/bin/glusterfs(cleanup_and_exit+0x4b) [0x40878b] ) 0-: received signum (15), shutting down
[2017-11-09 10:16:10.610290] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/interbull'.
[2017-11-09 10:20:39.752079] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/bin/glusterfs: Started running /usr/bin/glusterfs version 3.10.1 (args: /usr/bin/glusterfs --negative-timeout=60 --volfile-server=192.168.67.31 --volfile-id=/interbull-interbull /interbull)
[2017-11-09 10:20:39.763902] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-11-09 10:20:39.768738] I [afr.c:94:fix_quorum_options] 0-interbull-interbull-replicate-0: reindeer: incoming qtype = none
[2017-11-09 10:20:39.768756] I [afr.c:116:fix_quorum_options] 0-interbull-interbull-replicate-0: reindeer: quorum_count = 0
[2017-11-09 10:20:39.768856] W [MSGID: 108040] [afr.c:315:afr_pending_xattrs_init] 0-interbull-interbull-replicate-0: Unable to fetch afr-pending-xattr option from volfile. Falling back to using client translator names.
[2017-11-09 10:20:39.769832] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-11-09 10:20:39.770193] I [MSGID: 114020] [client.c:2352:notify] 0-interbull-interbull-client-0: parent translators are ready, attempting connect on transport
[2017-11-09 10:20:39.773109] I [MSGID: 114020] [client.c:2352:notify] 0-interbull-interbull-client-1: parent translators are ready, attempting connect on transport
[2017-11-09 10:20:39.773712] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-interbull-interbull-client-0: changing port to 49177 (from 0)

--
**************************************************
* Marcus Pedersén *
* System administrator *
**************************************************
* Interbull Centre *
* ================ *
* Department of Animal Breeding & Genetics - SLU *
* Box 7023, SE-750 07 *
* Uppsala, Sweden *
**************************************************
* Visiting address: *
* Room 55614, Ulls väg 26, Ultuna *
* Uppsala *
* Sweden *
* *
* Tel: +46-(0)18-67 1962 *
* *
**************************************************
* ISO 9001 Bureau Veritas No SE004561-1 *
**************************************************
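P.S. For reference, this is roughly how I pull the relevant events out of the client log (assuming the default log location, which is named after the mount point; adjust the path if yours differs):

    # show handshakes, graph switches and unmount events in order
    grep -E 'fuse_init|fuse_graph_sync|unmounting|received signum' /var/log/glusterfs/interbull.log

As far as I understand, signum (15) is SIGTERM, so the shutdown at 10:16 was requested from outside the glusterfs process (e.g. by an umount). The args logged at 10:20 should correspond to a mount command roughly like:

    mount -t glusterfs -o negative-timeout=60 192.168.67.31:/interbull-interbull /interbull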
Marcus,

Please paste the name-version-release of the primary glusterfs package on your system. If possible, also describe the typical workload that the user application generates at the mount.
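Something like this should do; package names differ per distribution, so treat the first command as a sketch:

    rpm -q glusterfs glusterfs-fuse    # on RPM-based systems
    glusterfs --version                # works regardless of packaging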
On Tue, Jan 23, 2018 at 7:43 PM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> [snip]

--
Milind
Hi,

Yes, of course... I should have included it from the start. And yes, I know it is an old version, but I will rebuild a new cluster later on; that is another story.

Client side:
Arch Linux
glusterfs 1:3.10.1-1

Server side:
Replicated cluster on two physical machines, both running:
CentOS 7, kernel 3.10.0-514.16.1.el7.x86_64
glusterfs 3.8.11 from centos-gluster38

Typical use case (the one we have problems with now): Our users run genomic evaluations, where loads of calculations are done and intermediate results are saved to files (MB-GB size, up to a hundred files). These are read back for the next calculation step, recalculated, written to file again, and so on, a couple of times. These processes usually run for about 8-12 hours, and some run for up to about 72-96 hours. For this run we had 12 clients (all connected to Gluster, with all file reads and writes done against Gluster). On each client we had assigned 3 cores to run the processes, and most of the time all 3 cores were in use on all 12 clients.

Regards
Marcus
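P.S. If the volume layout is relevant I can pull it from one of the servers as well; the volume name below is my best guess from the volfile-id in the log, so double-check it:

    gluster volume info interbull-interbull             # layout, options, brick list
    gluster volume status interbull-interbull clients   # connected clients per brick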
________________________________
From: Milind Changire <mchangir at redhat.com>
Sent: 23 January 2018 15:46
To: Marcus Pedersén
Cc: Gluster Users
Subject: Re: [Gluster-users] Understanding client logs

[snip]