Mohammed Rafi K C
2015-Jul-22 08:00 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
> Concerning the hang, I have seen it only once with the TCP protocol; RDMA really seems to be the cause.

If you are mounting a tcp,rdma volume using the tcp protocol, all communication goes through the TCP connection and RDMA is not involved between client and server.

> ... And, after a moment (a few minutes after restarting my back-transfer of around 40 TB), my volume went down (and all my rsync jobs with it):
> [root at atlas ~]# df -h /mnt
> df: « /mnt »: Noeud final de transport n'est pas connecté
> df: aucun système de fichiers traité
> i.e. "transport endpoint is not connected"

Can you send me the following details, if possible?
1) the mount command used, 2) the volume status, 3) client and brick logs

Regards,
Rafi KC

> Geoffrey
>
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 22 Jul 2015 at 09:17, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>
>> Hi Rafi,
>>
>> That is what I do. But I notice this kind of trouble particularly when I
>> mount my volumes manually.
>>
>> In addition, when I changed the transport-type from tcp or rdma to
>> tcp,rdma, I had to restart my volumes for the change to take effect.
>>
>> I wonder whether these troubles are due to the RDMA protocol, because
>> everything looks more stable with the TCP one.
>>
>> Any other idea?
>> Thanks for replying, and thanks in advance,
>> Geoffrey
>>
>> On 22 Jul 2015 at 07:33, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>
>>> On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
>>>> Hi Niels,
>>>>
>>>> Thanks for replying.
>>>>
>>>> In fact, after checking the logs, I discovered that GlusterFS
>>>> tried to connect to a brick on a TCP (or RDMA) port allocated to
>>>> another volume... (bug?)
>>>> For example, here is an extract of my workdir.log file:
>>>> [2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connexion refusée)
>>>> [2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connexion refusée)
>>>>
>>>> But those two ports (49161 and 49162) belong to my vol_home
>>>> volume, not to vol_workdir_amd.
>>>>
>>>> Now, after restarting all glusterd daemons synchronously (pdsh -w
>>>> cl-storage[1-4] service glusterd restart), everything seems to be back
>>>> to a normal situation (size, write permission, etc.)
>>>>
>>>> But, a few minutes later, I noticed a strange thing I have been seeing
>>>> since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try
>>>> to mount some volumes (particularly my vol_shared replicated
>>>> volume), my system can hang... And, because I use it in my bashrc
>>>> file for my environment modules, I need to restart the node. The same
>>>> happens if I run a df on the mounted volume (when it does not hang
>>>> during the mount).
>>>>
>>>> With the TCP transport-type, the situation seems more stable.
>>>>
>>>> In addition: if I restart a storage node, I cannot use the Gluster CLI
>>>> (it also hangs).
>>>>
>>>> Do you have an idea?
>>>
>>> Are you using a bash script to start/mount the volume? If so, add a
>>> sleep between volume start and mount, to allow all the processes to
>>> start properly, because the RDMA protocol takes some time to initialize
>>> its resources.
>>>
>>> Regards,
>>> Rafi KC
>>>
>>>> One more time, thanks a lot for your help,
>>>> Geoffrey
>>>>
>>>> On 21 Jul 2015 at 23:49, Niels de Vos <ndevos at redhat.com> wrote:
>>>>
>>>>> On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
>>>>>> Hello Soumya, hello everybody,
>>>>>>
>>>>>> network.ping-timeout was set to 42 seconds. I set it to 0 but it made no
>>>>>> difference. The problem was that, after re-setting the transport-type to
>>>>>> rdma,tcp, some bricks went down after a few minutes. Despite restarting the
>>>>>> volumes, after a few minutes some [other/different] bricks went down
>>>>>> again.
>>>>>
>>>>> I'm not sure if the ping-timeout is handled differently when RDMA is
>>>>> used. Adding two of the guys that know RDMA well on CC.
>>>>>
>>>>>> Now, after re-creating my volume, the bricks stay alive but, oddly, I am
>>>>>> not able to write to my volume. In addition, I defined a distributed
>>>>>> volume with 2 servers and 4 bricks of 250GB each, and my final volume
>>>>>> seems to be sized at only 500GB... It's astonishing.
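Rafi's suggestion above (pause between "gluster volume start" and the mount so RDMA resources can finish initializing) can be sketched as a small retry wrapper. This is a minimal sketch: the volume name, mount options, and delays in the usage comment are illustrative, not taken from the thread.

```shell
#!/bin/sh
# retry N DELAY CMD...: run CMD until it succeeds, at most N times,
# sleeping DELAY seconds between attempts. Useful because brick
# processes and RDMA resources may still be initializing right after
# "gluster volume start".
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    until "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Illustrative usage (hypothetical names, not executed here):
#   gluster volume start vol_shared
#   retry 5 3 mount -t glusterfs -o transport=rdma ib-storage1:vol_shared /shared
```

Retrying the mount instead of sleeping a fixed time avoids both waiting too long and mounting too early.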
>>>>>
>>>>> As seen further below, the 500GB volume size is caused by two unreachable
>>>>> bricks. When bricks are not reachable, their size cannot
>>>>> be detected by the client, and therefore 2x 250GB is missing.
>>>>>
>>>>> It is unclear to me why writing to a pure distributed volume fails. When
>>>>> a brick is not reachable and a file should be created there, it
>>>>> would normally get created on another brick. When the brick that should
>>>>> have the file comes online, and a new lookup for the file is done, a
>>>>> so-called "link file" is created, which points to the file on the other
>>>>> brick. I guess the failure has to do with the connection issues, and I
>>>>> would suggest getting that solved first.
>>>>>
>>>>> HTH,
>>>>> Niels
>>>>>
>>>>>> Here you can find some information:
>>>>>> # gluster volume status vol_workdir_amd
>>>>>> Status of volume: vol_workdir_amd
>>>>>> Gluster process                                      TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick ib-storage1:/export/brick_workdir/brick1/data  49185     49186      Y       23098
>>>>>> Brick ib-storage3:/export/brick_workdir/brick1/data  49158     49159      Y       3886
>>>>>> Brick ib-storage1:/export/brick_workdir/brick2/data  49187     49188      Y       23117
>>>>>> Brick ib-storage3:/export/brick_workdir/brick2/data  49160     49161      Y       3905
>>>>>>
>>>>>> # gluster volume info vol_workdir_amd
>>>>>>
>>>>>> Volume Name: vol_workdir_amd
>>>>>> Type: Distribute
>>>>>> Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
>>>>>> Status: Started
>>>>>> Number of Bricks: 4
>>>>>> Transport-type: tcp,rdma
>>>>>> Bricks:
>>>>>> Brick1: ib-storage1:/export/brick_workdir/brick1/data
>>>>>> Brick2: ib-storage3:/export/brick_workdir/brick1/data
>>>>>> Brick3: ib-storage1:/export/brick_workdir/brick2/data
>>>>>> Brick4: ib-storage3:/export/brick_workdir/brick2/data
>>>>>> Options Reconfigured:
>>>>>> performance.readdir-ahead: on
>>>>>>
>>>>>> # pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
>>>>>> storage3: Filesystem                            Size  Used Avail Use% Mounted on
>>>>>> storage3: /dev/mapper/st--block1-blk1--workdir  250G   34M  250G   1% /export/brick_workdir/brick1
>>>>>> storage3: /dev/mapper/st--block2-blk2--workdir  250G   34M  250G   1% /export/brick_workdir/brick2
>>>>>> storage1: Filesystem                            Size  Used Avail Use% Mounted on
>>>>>> storage1: /dev/mapper/st--block1-blk1--workdir  250G   33M  250G   1% /export/brick_workdir/brick1
>>>>>> storage1: /dev/mapper/st--block2-blk2--workdir  250G   33M  250G   1% /export/brick_workdir/brick2
>>>>>>
>>>>>> # df -h /workdir/
>>>>>> Filesystem                      Size  Used Avail Use% Mounted on
>>>>>> localhost:vol_workdir_amd.rdma  500G   67M  500G   1% /workdir
>>>>>>
>>>>>> # touch /workdir/test
>>>>>> touch: impossible de faire un touch « /workdir/test »: Aucun fichier ou dossier de ce type
>>>>>> (i.e. "No such file or directory")
>>>>>>
>>>>>> # tail -30l /var/log/glusterfs/workdir.log
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
>>>>>> [2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
>>>>>> [2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (Aucun fichier ou dossier de ce type)
>>>>>> [2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>
>>>>>> I have used GlusterFS in production for around 3 years without any blocking
>>>>>> problem, but the situation has been terrible for more than 3 weeks now.
>>>>>> Indeed, our production has been down for roughly 3.5 weeks (with many
>>>>>> different problems, first with GlusterFS v3.5.3 and now with 3.7.2-3), and
>>>>>> I need to restart it.
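A log excerpt like the one above can be triaged quickly by counting the RDMA_CM_EVENT_REJECTED warnings per client translator. This is a sketch that assumes the log line format shown in the thread; the log path in the usage comment is the one quoted above.

```shell
#!/bin/sh
# count_rejects FILE: count RDMA_CM_EVENT_REJECTED warnings per client
# translator (e.g. 0-vol_workdir_amd-client-2) in a GlusterFS client log,
# to see which bricks a client keeps failing to reach over RDMA.
count_rejects() {
    grep 'RDMA_CM_EVENT_REJECTED' "$1" |
        sed -n 's/.*\(0-[A-Za-z0-9_]*-client-[0-9]*\):.*/\1/p' |
        sort | uniq -c
}

# Example: count_rejects /var/log/glusterfs/workdir.log
```

A translator that dominates the counts points at the brick (or its RDMA port mapping) to inspect first.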
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Geoffrey
>>>>>>
>>>>>> On 21 Jul 2015 at 19:36, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>
>>>>>>> From the following errors,
>>>>>>>
>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
>>>>>>>
>>>>>>> it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket
>>>>>>> failed with (IIUC) the error "Protocol not available".
>>>>>>> Could you check whether 'network.ping-timeout' is set to zero for
>>>>>>> that volume, using 'gluster volume info'? Anyway, from the code it
>>>>>>> looks like 'TCP_USER_TIMEOUT' can take the value zero; I am not sure
>>>>>>> why it has failed.
>>>>>>>
>>>>>>> Niels, any thoughts?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>>
>>>>>>> On 07/21/2015 08:15 PM, Geoffrey Letessier wrote:
>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
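Niels's size arithmetic in this thread is easy to sanity-check: a pure distribute volume reports the sum of the bricks the client can reach, so losing two of four 250G bricks yields exactly the 500G figure seen in the df output. A minimal sketch of that arithmetic (per-brick sizes in GiB, one per line):

```shell
#!/bin/sh
# sum_gib: add up per-brick sizes (whole GiB, one per line on stdin)
# to get the capacity a distribute volume should report to clients.
sum_gib() {
    awk '{ total += $1 } END { print total "G" }'
}

printf '250\n250\n250\n250\n' | sum_gib   # all 4 bricks reachable -> 1000G
printf '250\n250\n' | sum_gib             # only 2 reachable -> 500G
```

If df on the mount shows less than the sum of all bricks, some bricks are unreachable from that client.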
Geoffrey Letessier
2015-Jul-22 08:06 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
Oops, I forgot to add everybody in CC.

Yes, that is what I guessed. With the TCP protocol, all my volumes seem OK and, for the moment, I have not noticed any hang.

Mount command:
- with RDMA: mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
- with TCP: mount -t glusterfs -o transport=tcp,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt

Volume status:
# gluster volume status all
Status of volume: vol_home
Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_home/brick1/data  49159     49165      Y       6547
Brick ib-storage2:/export/brick_home/brick1/data  49161     49173      Y       24348
Brick ib-storage3:/export/brick_home/brick1/data  49152     49156      Y       5616
Brick ib-storage4:/export/brick_home/brick1/data  49152     49162      Y       5424
Brick ib-storage1:/export/brick_home/brick2/data  49160     49166      Y       6548
Brick ib-storage2:/export/brick_home/brick2/data  49162     49174      Y       24355
Brick ib-storage3:/export/brick_home/brick2/data  49153     49157      Y       5635
Brick ib-storage4:/export/brick_home/brick2/data  49153     49163      Y       5443
Self-heal Daemon on localhost                     N/A       N/A        Y       6534
Self-heal Daemon on ib-storage3                   N/A       N/A        Y       7656
Self-heal Daemon on ib-storage2                   N/A       N/A        Y       24519
Self-heal Daemon on ib-storage4                   N/A       N/A        Y       7288

Task Status of Volume vol_home
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_shared
Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_shared/data       49152     49164      Y       6554
Brick ib-storage2:/export/brick_shared/data       49152     49172      Y       24362
Self-heal Daemon on localhost                     N/A       N/A        Y       6534
Self-heal Daemon on ib-storage3                   N/A       N/A        Y       7656
Self-heal Daemon on ib-storage2                   N/A       N/A        Y       24519
Self-heal Daemon on ib-storage4                   N/A       N/A        Y       7288

Task Status of Volume vol_shared
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_workdir_amd
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_workdir/brick1/data  49191     49192      Y       6555
Brick ib-storage3:/export/brick_workdir/brick1/data  49164     49165      Y       6368
Brick ib-storage1:/export/brick_workdir/brick2/data  49193     49194      Y       6576
Brick ib-storage3:/export/brick_workdir/brick2/data  49166     49167      Y       6387

Task Status of Volume vol_workdir_amd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_workdir_intel
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage2:/export/brick_workdir/brick1/data  49175     49176      Y       24371
Brick ib-storage2:/export/brick_workdir/brick2/data  49177     49178      Y       24372
Brick ib-storage4:/export/brick_workdir/brick1/data  49164     49165      Y       5571
Brick ib-storage4:/export/brick_workdir/brick2/data  49166     49167      Y       5590

Task Status of Volume vol_workdir_intel
------------------------------------------------------------------------------
There are no active volume tasks

Concerning the brick logs, do you want all brick logs from every server?

Geoffrey

On 22 Jul 2015 at 10:00, Mohammed Rafi K C <rkavunga at redhat.com> wrote:

> On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
>> Concerning the hang, I have seen it only once with the TCP protocol; RDMA really seems to be the cause.
>
> If you are mounting a tcp,rdma volume using the tcp protocol, all communication goes through the TCP connection and RDMA is not involved between client and server.
>
>> ... And, after a moment (a few minutes after restarting my back-transfer of around 40 TB), my volume went down (and all my rsync jobs with it):
>> [root at atlas ~]# df -h /mnt
>> df: « /mnt »: Noeud final de transport n'est pas connecté
>> df: aucun système de fichiers traité
>> i.e. "transport endpoint is not connected"
>
> Can you send me the following details, if possible?
> 1) the mount command used, 2) the volume status, 3) client and brick logs
>
> Regards,
> Rafi KC
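As a footnote to the volume-status tables in this thread: "gluster volume status" output can also be scanned mechanically for bricks whose Online column shows N. This sketch assumes each Brick entry sits on a single unwrapped line (the real CLI wraps long brick paths across two lines, so join them first if needed):

```shell
#!/bin/sh
# offline_bricks: read "gluster volume status" output on stdin and print
# the path of every brick whose Online column (next-to-last field) is N.
# Assumes one unwrapped line per brick entry.
offline_bricks() {
    awk '/^Brick / { if ($(NF-1) == "N") print $2 }'
}

# Example: gluster volume status all | offline_bricks
```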