Geoffrey Letessier
2015-Jul-22 14:59 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
I can confirm your words: everything looks OK with the TCP protocol and more or less unstable with the RDMA one. But TCP is slower than RDMA. In the attachments you can find my volume mount log, all brick logs and some information concerning my vol_shared volume.

Thanks in advance,
Geoffrey

PS: sorry in advance for my slow replies, but I will be on vacation (from this evening) very far from any internet access.

------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

On 22 Jul 2015, at 10:45, Mohammed Rafi K C <rkavunga at redhat.com> wrote:

> On 07/22/2015 01:36 PM, Geoffrey Letessier wrote:
>> Oops, I forgot to add all people in CC.
>> 
>> Yes, I guessed.
>> 
>> With the TCP protocol, all my volumes seem OK and I do not notice, for the moment, any hang.
> 
> So if I understand correctly, everything is fine with tcp (no hang, no "transport endpoint is not connected" error), and both happen with rdma. Please correct me if that is not so.
> 
>> mount command:
>> - with RDMA: mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
>> - with TCP: mount -t glusterfs -o transport=tcp,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
>> 
>> volume status:
>> # gluster volume status all
>> Status of volume: vol_home
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_home/brick1/data      49159     49165      Y       6547
>> Brick ib-storage2:/export/brick_home/brick1/data      49161     49173      Y       24348
>> Brick ib-storage3:/export/brick_home/brick1/data      49152     49156      Y       5616
>> Brick ib-storage4:/export/brick_home/brick1/data      49152     49162      Y       5424
>> Brick ib-storage1:/export/brick_home/brick2/data      49160     49166      Y       6548
>> Brick ib-storage2:/export/brick_home/brick2/data      49162     49174      Y       24355
>> Brick ib-storage3:/export/brick_home/brick2/data      49153     49157      Y       5635
>> Brick ib-storage4:/export/brick_home/brick2/data      49153     49163      Y       5443
>> Self-heal Daemon on localhost                         N/A       N/A        Y       6534
>> Self-heal Daemon on ib-storage3                       N/A       N/A        Y       7656
>> Self-heal Daemon on ib-storage2                       N/A       N/A        Y       24519
>> Self-heal Daemon on ib-storage4                       N/A       N/A        Y       7288
>> 
>> Task Status of Volume vol_home
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_shared
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_shared/data           49152     49164      Y       6554
>> Brick ib-storage2:/export/brick_shared/data           49152     49172      Y       24362
>> Self-heal Daemon on localhost                         N/A       N/A        Y       6534
>> Self-heal Daemon on ib-storage3                       N/A       N/A        Y       7656
>> Self-heal Daemon on ib-storage2                       N/A       N/A        Y       24519
>> Self-heal Daemon on ib-storage4                       N/A       N/A        Y       7288
>> 
>> Task Status of Volume vol_shared
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_workdir_amd
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_workdir/brick1/data   49191     49192      Y       6555
>> Brick ib-storage3:/export/brick_workdir/brick1/data   49164     49165      Y       6368
>> Brick ib-storage1:/export/brick_workdir/brick2/data   49193     49194      Y       6576
>> Brick ib-storage3:/export/brick_workdir/brick2/data   49166     49167      Y       6387
>> 
>> Task Status of Volume vol_workdir_amd
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_workdir_intel
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage2:/export/brick_workdir/brick1/data   49175     49176      Y       24371
>> Brick ib-storage2:/export/brick_workdir/brick2/data   49177     49178      Y       24372
>> Brick ib-storage4:/export/brick_workdir/brick1/data   49164     49165      Y       5571
>> Brick ib-storage4:/export/brick_workdir/brick2/data   49166     49167      Y       5590
>> 
>> Task Status of Volume vol_workdir_intel
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Concerning the brick logs, do you want all brick logs from every server?
> 
> Any errors from the client log and brick logs, and any log entries with a message id between 102000 and 104000 from the same.
> 
> Rafi KC
> 
>> Geoffrey
>> 
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>> 
>> On 22 Jul 2015, at 10:00, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>> 
>>> On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
>>>> Concerning the hang, I have only seen it once with the TCP protocol; actually, RDMA seems to be the cause.
>>> 
>>> If you are mounting a tcp,rdma volume using the tcp protocol, all the communication will go through the tcp connection and rdma won't come in between client and server.
>>> 
>>>> … And, after a moment (a few minutes after having restarted my back-transfer of around 40TB), my volume goes down (and all my rsync processes too):
>>>> [root at atlas ~]# df -h /mnt
>>>> df: « /mnt »: Noeud final de transport n'est pas connecté
>>>> df: aucun système de fichiers traité
>>>> aka "transport endpoint is not connected"
>>> 
>>> Can you send me the following details, if possible?
>>> 1) mount command used, 2) volume status, 3) client and brick logs
>>> 
>>> Regards
>>> Rafi KC
>>> 
>>>> Geoffrey
>>>> 
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> Responsable informatique & ingénieur système
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>> 
>>>> On 22 Jul 2015, at 09:17, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>> 
>>>>> Hi Rafi,
>>>>> 
>>>>> That is what I do. But I notice this kind of trouble particularly when I mount my volumes manually.
>>>>> 
>>>>> In addition, when I changed my transport-type from tcp or rdma to tcp,rdma, I had to restart my volumes for the change to take effect.
>>>>> 
>>>>> I wonder if these troubles are not due to the RDMA protocol, because everything looks more stable with the TCP one.
>>>>> 
>>>>> Any other idea?
>>>>> Thanks for your reply, and thanks in advance,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>> 
>>>>> On 22 Jul 2015, at 07:33, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>>>> 
>>>>>> On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
>>>>>>> Hi Niels,
>>>>>>> 
>>>>>>> Thanks for replying.
>>>>>>> 
>>>>>>> In fact, after checking the logs, I discovered that GlusterFS tried to connect to a brick on a TCP (or RDMA) port allocated to another volume… (bug?)
>>>>>>> For example, here is an extract of my workdir.log file:
>>>>>>> [2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connexion refusée)
>>>>>>> [2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connexion refusée)
>>>>>>> 
>>>>>>> But those 2 ports (49161 and 49162) belong only to my vol_home volume, not to vol_workdir_amd.
>>>>>>> 
>>>>>>> Now, after restarting all glusterd daemons synchronously (pdsh -w cl-storage[1-4] service glusterd restart), everything seems to be back to a normal situation (size, write permission, etc.)
>>>>>>> 
>>>>>>> But, a few minutes later, I noticed a strange thing that I have been seeing since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try to mount some volumes (particularly my vol_shared volume, a replicated volume) my system can hang… And, because I use it in my bashrc file for my environment modules, I then need to restart my node. Same thing if I run a df on the mounted volume (when it does not hang during the mount).
>>>>>>> 
>>>>>>> With the TCP transport-type, the situation seems to be more stable.
>>>>>>> 
>>>>>>> In addition: if I restart a storage node, I cannot use the Gluster CLI (it also hangs).
>>>>>>> 
>>>>>>> Do you have an idea?
>>>>>> 
>>>>>> Are you using a bash script to start/mount the volume? If so, add a sleep between volume start and mount, to allow all the processes to start properly, because the RDMA protocol takes some time to initialize its resources.
>>>>>> 
>>>>>> Regards
>>>>>> Rafi KC
>>>>>> 
>>>>>>> One more time, thanks a lot for your help,
>>>>>>> Geoffrey
>>>>>>> 
>>>>>>> ------------------------------------------------------
>>>>>>> Geoffrey Letessier
>>>>>>> Responsable informatique & ingénieur système
>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>> 
>>>>>>> On 21 Jul 2015, at 23:49, Niels de Vos <ndevos at redhat.com> wrote:
>>>>>>> 
>>>>>>>> On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
>>>>>>>>> Hello Soumya, hello everybody,
>>>>>>>>> 
>>>>>>>>> network.ping-timeout was set to 42 seconds. I set it to 0 but it made no difference. The problem was, after setting the transport-type back to rdma,tcp, some bricks went down after a few minutes. Despite restarting the volumes, after a few minutes some [other/different] bricks went down again.
>>>>>>>> I'm not sure if the ping-timeout is handled differently when RDMA is used. Adding two of the guys that know RDMA well on CC.
>>>>>>>> 
>>>>>>>>> Now, after re-creating my volume, the bricks stay alive but, oddly, I am not able to write to my volume. In addition, I defined a distributed volume with 2 servers and 4 bricks of 250GB each, and my final volume seems to be sized at only 500GB… It is astonishing.
>>>>>>>> 
>>>>>>>> As seen further below, the 500GB volume is caused by two unreachable bricks. When the bricks are not reachable, the size of the bricks cannot be detected by the client and therefore 2x 250GB is missing.
>>>>>>>> 
>>>>>>>> It is unclear to me why writing to a pure distributed volume fails. When a brick is not reachable, and the file should be created there, it would normally get created on another brick. When the brick that should have the file comes online, and a new lookup for the file is done, a so-called "link file" is created, which points to the file on the other brick. I guess the failure has to do with the connection issues, and I would suggest to get those solved first.
>>>>>>>> 
>>>>>>>> HTH,
>>>>>>>> Niels
>>>>>>>> 
>>>>>>>>> Here you can find some information:
>>>>>>>>> # gluster volume status vol_workdir_amd
>>>>>>>>> Status of volume: vol_workdir_amd
>>>>>>>>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Brick ib-storage1:/export/brick_workdir/brick1/data   49185     49186      Y       23098
>>>>>>>>> Brick ib-storage3:/export/brick_workdir/brick1/data   49158     49159      Y       3886
>>>>>>>>> Brick ib-storage1:/export/brick_workdir/brick2/data   49187     49188      Y       23117
>>>>>>>>> Brick ib-storage3:/export/brick_workdir/brick2/data   49160     49161      Y       3905
>>>>>>>>> 
>>>>>>>>> # gluster volume info vol_workdir_amd
>>>>>>>>> 
>>>>>>>>> Volume Name: vol_workdir_amd
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 4
>>>>>>>>> Transport-type: tcp,rdma
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: ib-storage1:/export/brick_workdir/brick1/data
>>>>>>>>> Brick2: ib-storage3:/export/brick_workdir/brick1/data
>>>>>>>>> Brick3: ib-storage1:/export/brick_workdir/brick2/data
>>>>>>>>> Brick4: ib-storage3:/export/brick_workdir/brick2/data
>>>>>>>>> Options Reconfigured:
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> 
>>>>>>>>> # pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
>>>>>>>>> storage3: Filesystem                            Size  Used  Avail  Use%  Mounted on
>>>>>>>>> storage3: /dev/mapper/st--block1-blk1--workdir  250G  34M   250G   1%    /export/brick_workdir/brick1
>>>>>>>>> storage3: /dev/mapper/st--block2-blk2--workdir  250G  34M   250G   1%    /export/brick_workdir/brick2
>>>>>>>>> storage1: Filesystem                            Size  Used  Avail  Use%  Mounted on
>>>>>>>>> storage1: /dev/mapper/st--block1-blk1--workdir  250G  33M   250G   1%    /export/brick_workdir/brick1
>>>>>>>>> storage1: /dev/mapper/st--block2-blk2--workdir  250G  33M   250G   1%    /export/brick_workdir/brick2
>>>>>>>>> 
>>>>>>>>> # df -h /workdir/
>>>>>>>>> Filesystem                      Size  Used  Avail  Use%  Mounted on
>>>>>>>>> localhost:vol_workdir_amd.rdma  500G  67M   500G   1%    /workdir
>>>>>>>>> 
>>>>>>>>> # touch /workdir/test
>>>>>>>>> touch: impossible de faire un touch « /workdir/test »: Aucun fichier ou dossier de ce type
>>>>>>>>> 
>>>>>>>>> # tail -30l /var/log/glusterfs/workdir.log
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
>>>>>>>>> [2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
>>>>>>>>> [2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (Aucun fichier ou dossier de ce type)
>>>>>>>>> [2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> 
>>>>>>>>> I have been using GlusterFS in production for around 3 years without any blocking problem, but the situation has been a mess for more than 3 weeks… Indeed, our production has been down for roughly 3.5 weeks (with many different problems, first with GlusterFS v3.5.3 and now with 3.7.2-3) and I need to restart it…
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> Geoffrey
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> Geoffrey Letessier
>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>> 
>>>>>>>>> On 21 Jul 2015, at 19:36, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>>> From the following errors,
>>>>>>>>>> 
>>>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
>>>>>>>>>> 
>>>>>>>>>> it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket failed with the error (IIUC) "Protocol not available".
>>>>>>>>>> Could you check if 'network.ping-timeout' is set to zero for that volume using 'gluster volume info'? Anyway, from the code it looks like 'TCP_USER_TIMEOUT' can take the value zero. Not sure why it has failed.
>>>>>>>>>> 
>>>>>>>>>> Niels, any thoughts?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>> 
>>>>>>>>>> On 07/21/2015 08:15 PM, Geoffrey Letessier wrote:
>>>>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
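For readers following this thread later: the workflow that emerges from the discussion above - switch the volume to the tcp,rdma transport (which only takes effect after a volume restart) and give the RDMA layer a moment to initialize before mounting, as Rafi recommends - can be sketched roughly as follows. This is only an illustrative sketch built from the commands quoted in this thread; the volume name, server, mount options and the 10-second pause are taken from the thread or assumed, not validated values.

#!/bin/bash
# Sketch: change an existing volume's transport to tcp,rdma, restart it,
# then mount it over RDMA. Names and paths are the ones used in this thread.
VOL=vol_home
SERVER=ib-storage1
MNT=/mnt

# The transport of an existing volume can only be changed while the volume is
# stopped (and unmounted on all clients). --mode=script skips the interactive
# confirmation prompt of "volume stop".
gluster --mode=script volume stop $VOL
gluster volume set $VOL config.transport tcp,rdma
gluster volume start $VOL

# Rafi's advice: give the brick processes and the RDMA transport time to
# initialize before the first mount, otherwise the mount (or a later df) may
# hang. The 10 seconds here is an assumed, conservative value.
sleep 10

# Mount over RDMA, with the same options used earlier in the thread;
# replace transport=rdma with transport=tcp to force TCP over IPoIB.
mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ${SERVER}:${VOL} ${MNT}

# Quick sanity checks that the client is really connected to the bricks.
df -h ${MNT}
gluster volume status $VOL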
Geoffrey Letessier
2015-Jul-27 20:57 UTC
[Gluster-users] heaps of split-brains during back-transfer
Dear all,

For a couple of weeks (more than one month, in fact), our computing production has been stopped due to several surprising problems with GlusterFS. After noticing a big problem with incorrect quota sizes accounted for many, many files, I decided, under the guidance of the Gluster support team, to upgrade my storage cluster from version 3.5.3 to the latest release (3.7.2-3), because these bugs are theoretically fixed in that branch. Now, since I did this upgrade, it is a complete mess and I cannot restart production. Indeed:

1 - the RDMA protocol is not working and hangs my system / shell commands; only the TCP protocol (over InfiniBand) is more or less operational - not a blocking point, but…
2 - read/write performance is relatively low
3 - thousands of split-brains have appeared.

So, for the moment, I believe GlusterFS 3.7 is not really production ready.

Concerning the third point: after destroying all my volumes (RAID re-init, new partitions, GlusterFS volumes, etc.) and recreating the main one, I tried to transfer my data back from the archive/backup server into this new volume, and I see a lot of errors in my mount log file, as you can read in this extract:

[2015-07-26 22:35:16.962815] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:16.975206] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:28.719935] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.764891] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.775037] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:29.776857] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.800535] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0

And when I try to browse some folders (still in the mount log file):

[2015-07-27 09:00:19.005763] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 09:00:22.322316] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-27 09:00:23.008771] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 08:59:50.359187] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
[2015-07-27 09:00:02.500419] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
[2015-07-27 09:00:02.506925] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
[2015-07-27 09:00:23.001121] W [MSGID: 108008] [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2015-07-27 09:00:26.231262] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.

And, above all, when browsing folders I get a lot of input/output errors.

Currently I have 6.2M inodes and roughly 30TB in my "new" volume. For the moment, quota is disabled to improve the IO performance during the back-transfer…

You can also find in the attachments:
- an "ls" result
- a split-brain research result
- the volume information and status
- a complete volume heal info

Hoping this can help you to help me fix all my problems and reopen the computing production.

Thanks in advance,
Geoffrey

PS: « Erreur d'Entrée/Sortie » = "Input / Output Error"
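The GFID mismatches reported above can be cross-checked directly on the bricks before deciding how to repair them. A minimal sketch, assuming the vol_home brick paths shown earlier in this thread and assuming vol_home-client-0/-client-1 correspond to the first replica pair on ib-storage1/ib-storage2 (the real mapping should be read from the volume info and status attached to this mail):

# 1) Ask gluster which entries it currently flags as split-brain.
gluster volume heal vol_home info split-brain

# 2) On the storage nodes, compare the trusted.gfid xattr of the same file on
#    both replicas (against the brick backend, not the FUSE mount); a GFID
#    split-brain shows up as two different hex values.
FILE="some/dir/job.pbs"   # hypothetical relative path inside the volume, for illustration
ssh ib-storage1 "getfattr -d -m . -e hex /export/brick_home/brick1/data/$FILE"
ssh ib-storage2 "getfattr -d -m . -e hex /export/brick_home/brick1/data/$FILE"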
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
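For the record, GFID split-brain entries like the job.pbs and md*.gro files in the log above could not, to my knowledge, be resolved through the heal CLI in the 3.7 series; the approach commonly documented at the time was a manual cleanup on the brick holding the copy you choose to discard, followed by a lookup to trigger self-heal. A rough sketch only, using the GFID taken from the log above and a hypothetical file path; verify everything (and the documentation for your exact version) before deleting anything:

# Run on the storage node hosting the copy to discard. Paths are illustrative.
BRICK=/export/brick_home/brick1/data              # brick backend path (from the thread)
FILE="some/dir/job.pbs"                           # hypothetical path inside the volume
BADGFID=820f9bcc-a0f6-40e0-bcec-28a76b4195ea      # GFID of the copy to discard (from the log)

# Remove the file and its .glusterfs hard link on this one brick only.
rm -f "$BRICK/$FILE"
rm -f "$BRICK/.glusterfs/${BADGFID:0:2}/${BADGFID:2:2}/$BADGFID"

# From a client mount, look the file up again so self-heal recreates it from
# the surviving copy, then check that it is no longer listed as split-brain.
stat /home/"$FILE"        # assuming vol_home is mounted on /home on the client
gluster volume heal vol_home info split-brain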