Mohammed Rafi K C
2015-Jul-22 08:00 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
> Concerning the hang, I have seen it only once with the TCP protocol; RDMA really seems to be the cause.

If you are mounting a tcp,rdma volume using the tcp protocol, all communication goes through the TCP connection and RDMA is not involved between client and server.

> ... And, after a moment (a few minutes after restarting my back-transfer of around 40 TB), my volume went down (and all my rsync jobs with it):
> [root at atlas ~]# df -h /mnt
> df: « /mnt »: Noeud final de transport n'est pas connecté
> df: aucun système de fichiers traité
> i.e. "transport endpoint is not connected"

Can you send me the following details, if possible?
1) the mount command used, 2) the volume status, 3) client and brick logs

Regards,
Rafi KC

> Geoffrey
>
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 22 Jul 2015 at 09:17, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>
>> Hi Rafi,
>>
>> That is what I do. But I notice this kind of trouble particularly when I
>> mount my volumes manually.
>>
>> In addition, when I changed the transport-type from tcp or rdma to
>> tcp,rdma, I had to restart my volumes for the change to take effect.
>>
>> I wonder whether these troubles are due to the RDMA protocol, because
>> everything looks more stable with the TCP one.
>>
>> Any other idea?
>> Thanks for replying, and thanks in advance,
>> Geoffrey
>>
>> On 22 Jul 2015 at 07:33, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>
>>> On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
>>>> Hi Niels,
>>>>
>>>> Thanks for replying.
>>>>
>>>> In fact, after checking the logs, I discovered that GlusterFS
>>>> tried to connect to a brick on a TCP (or RDMA) port allocated to
>>>> another volume... (bug?)
>>>> For example, here is an extract of my workdir.log file:
>>>> [2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connexion refusée)
>>>> [2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connexion refusée)
>>>>
>>>> But those two ports (49161 and 49162) belong to my vol_home
>>>> volume, not to vol_workdir_amd.
>>>>
>>>> Now, after restarting all glusterd daemons synchronously (pdsh -w
>>>> cl-storage[1-4] service glusterd restart), everything seems to be back
>>>> to a normal situation (size, write permission, etc.)
>>>>
>>>> But, a few minutes later, I noticed a strange thing I have been seeing
>>>> since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try
>>>> to mount some volumes (particularly my vol_shared replicated
>>>> volume), my system can hang... And, because I use it in my bashrc
>>>> file for my environment modules, I need to restart the node. The same
>>>> happens if I run a df on the mounted volume (when it does not hang
>>>> during the mount).
>>>>
>>>> With the TCP transport-type, the situation seems more stable.
>>>>
>>>> In addition: if I restart a storage node, I cannot use the Gluster CLI
>>>> (it also hangs).
>>>>
>>>> Do you have an idea?
>>>
>>> Are you using a bash script to start/mount the volume? If so, add a
>>> sleep between volume start and mount, to allow all the processes to
>>> start properly, because the RDMA protocol takes some time to initialize
>>> its resources.
>>>
>>> Regards,
>>> Rafi KC
>>>
>>>> One more time, thanks a lot for your help,
>>>> Geoffrey
>>>>
>>>> On 21 Jul 2015 at 23:49, Niels de Vos <ndevos at redhat.com> wrote:
>>>>
>>>>> On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
>>>>>> Hello Soumya, hello everybody,
>>>>>>
>>>>>> network.ping-timeout was set to 42 seconds. I set it to 0 but it made no
>>>>>> difference. The problem was that, after re-setting the transport-type to
>>>>>> rdma,tcp, some bricks went down after a few minutes. Despite restarting the
>>>>>> volumes, after a few minutes some [other/different] bricks went down
>>>>>> again.
>>>>>
>>>>> I'm not sure if the ping-timeout is handled differently when RDMA is
>>>>> used. Adding two of the guys that know RDMA well on CC.
>>>>>
>>>>>> Now, after re-creating my volume, the bricks stay alive but, oddly, I am
>>>>>> not able to write to my volume. In addition, I defined a distributed
>>>>>> volume with 2 servers and 4 bricks of 250GB each, and my final volume
>>>>>> seems to be sized at only 500GB... It's astonishing.
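Rafi's suggestion above (pause between "gluster volume start" and the mount so RDMA resources can finish initializing) can be sketched as a small retry wrapper. This is a minimal sketch: the volume name, mount options, and delays in the usage comment are illustrative, not taken from the thread.

```shell
#!/bin/sh
# retry N DELAY CMD...: run CMD until it succeeds, at most N times,
# sleeping DELAY seconds between attempts. Useful because brick
# processes and RDMA resources may still be initializing right after
# "gluster volume start".
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    until "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Illustrative usage (hypothetical names, not executed here):
#   gluster volume start vol_shared
#   retry 5 3 mount -t glusterfs -o transport=rdma ib-storage1:vol_shared /shared
```

Retrying the mount instead of sleeping a fixed time avoids both waiting too long and mounting too early.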
>>>>>
>>>>> As seen further below, the 500GB volume size is caused by two unreachable
>>>>> bricks. When bricks are not reachable, their size cannot
>>>>> be detected by the client, and therefore 2x 250GB is missing.
>>>>>
>>>>> It is unclear to me why writing to a pure distributed volume fails. When
>>>>> a brick is not reachable and a file should be created there, it
>>>>> would normally get created on another brick. When the brick that should
>>>>> have the file comes online, and a new lookup for the file is done, a
>>>>> so-called "link file" is created, which points to the file on the other
>>>>> brick. I guess the failure has to do with the connection issues, and I
>>>>> would suggest getting that solved first.
>>>>>
>>>>> HTH,
>>>>> Niels
>>>>>
>>>>>> Here you can find some information:
>>>>>> # gluster volume status vol_workdir_amd
>>>>>> Status of volume: vol_workdir_amd
>>>>>> Gluster process                                      TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick ib-storage1:/export/brick_workdir/brick1/data  49185     49186      Y       23098
>>>>>> Brick ib-storage3:/export/brick_workdir/brick1/data  49158     49159      Y       3886
>>>>>> Brick ib-storage1:/export/brick_workdir/brick2/data  49187     49188      Y       23117
>>>>>> Brick ib-storage3:/export/brick_workdir/brick2/data  49160     49161      Y       3905
>>>>>>
>>>>>> # gluster volume info vol_workdir_amd
>>>>>>
>>>>>> Volume Name: vol_workdir_amd
>>>>>> Type: Distribute
>>>>>> Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
>>>>>> Status: Started
>>>>>> Number of Bricks: 4
>>>>>> Transport-type: tcp,rdma
>>>>>> Bricks:
>>>>>> Brick1: ib-storage1:/export/brick_workdir/brick1/data
>>>>>> Brick2: ib-storage3:/export/brick_workdir/brick1/data
>>>>>> Brick3: ib-storage1:/export/brick_workdir/brick2/data
>>>>>> Brick4: ib-storage3:/export/brick_workdir/brick2/data
>>>>>> Options Reconfigured:
>>>>>> performance.readdir-ahead: on
>>>>>>
>>>>>> # pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
>>>>>> storage3: Filesystem                            Size  Used Avail Use% Mounted on
>>>>>> storage3: /dev/mapper/st--block1-blk1--workdir  250G   34M  250G   1% /export/brick_workdir/brick1
>>>>>> storage3: /dev/mapper/st--block2-blk2--workdir  250G   34M  250G   1% /export/brick_workdir/brick2
>>>>>> storage1: Filesystem                            Size  Used Avail Use% Mounted on
>>>>>> storage1: /dev/mapper/st--block1-blk1--workdir  250G   33M  250G   1% /export/brick_workdir/brick1
>>>>>> storage1: /dev/mapper/st--block2-blk2--workdir  250G   33M  250G   1% /export/brick_workdir/brick2
>>>>>>
>>>>>> # df -h /workdir/
>>>>>> Filesystem                      Size  Used Avail Use% Mounted on
>>>>>> localhost:vol_workdir_amd.rdma  500G   67M  500G   1% /workdir
>>>>>>
>>>>>> # touch /workdir/test
>>>>>> touch: impossible de faire un touch « /workdir/test »: Aucun fichier ou dossier de ce type
>>>>>> (i.e. "No such file or directory")
>>>>>>
>>>>>> # tail -30l /var/log/glusterfs/workdir.log
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
>>>>>> [2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
>>>>>> [2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (Aucun fichier ou dossier de ce type)
>>>>>> [2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>> [2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>> [2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>> [2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>> [2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>
>>>>>> I have used GlusterFS in production for around 3 years without any blocking
>>>>>> problem, but the situation has been terrible for more than 3 weeks now.
>>>>>> Indeed, our production has been down for roughly 3.5 weeks (with many
>>>>>> different problems, first with GlusterFS v3.5.3 and now with 3.7.2-3), and
>>>>>> I need to restart it.
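A log excerpt like the one above can be triaged quickly by counting the RDMA_CM_EVENT_REJECTED warnings per client translator. This is a sketch that assumes the log line format shown in the thread; the log path in the usage comment is the one quoted above.

```shell
#!/bin/sh
# count_rejects FILE: count RDMA_CM_EVENT_REJECTED warnings per client
# translator (e.g. 0-vol_workdir_amd-client-2) in a GlusterFS client log,
# to see which bricks a client keeps failing to reach over RDMA.
count_rejects() {
    grep 'RDMA_CM_EVENT_REJECTED' "$1" |
        sed -n 's/.*\(0-[A-Za-z0-9_]*-client-[0-9]*\):.*/\1/p' |
        sort | uniq -c
}

# Example: count_rejects /var/log/glusterfs/workdir.log
```

A translator that dominates the counts points at the brick (or its RDMA port mapping) to inspect first.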
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Geoffrey
>>>>>>
>>>>>> On 21 Jul 2015 at 19:36, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>
>>>>>>> From the following errors,
>>>>>>>
>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
>>>>>>>
>>>>>>> it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket
>>>>>>> failed with (IIUC) the error "Protocol not available".
>>>>>>> Could you check whether 'network.ping-timeout' is set to zero for
>>>>>>> that volume, using 'gluster volume info'? Anyway, from the code it
>>>>>>> looks like 'TCP_USER_TIMEOUT' can take the value zero; I am not sure
>>>>>>> why it has failed.
>>>>>>>
>>>>>>> Niels, any thoughts?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>>
>>>>>>> On 07/21/2015 08:15 PM, Geoffrey Letessier wrote:
>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
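Niels's size arithmetic in this thread is easy to sanity-check: a pure distribute volume reports the sum of the bricks the client can reach, so losing two of four 250G bricks yields exactly the 500G figure seen in the df output. A minimal sketch of that arithmetic (per-brick sizes in GiB, one per line):

```shell
#!/bin/sh
# sum_gib: add up per-brick sizes (whole GiB, one per line on stdin)
# to get the capacity a distribute volume should report to clients.
sum_gib() {
    awk '{ total += $1 } END { print total "G" }'
}

printf '250\n250\n250\n250\n' | sum_gib   # all 4 bricks reachable -> 1000G
printf '250\n250\n' | sum_gib             # only 2 reachable -> 500G
```

If df on the mount shows less than the sum of all bricks, some bricks are unreachable from that client.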
Geoffrey Letessier
2015-Jul-22 08:06 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
Oops, I forgot to add everybody in CC.

Yes, that is what I guessed. With the TCP protocol, all my volumes seem OK and, for the moment, I have not noticed any hang.

Mount command:
- with RDMA: mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
- with TCP: mount -t glusterfs -o transport=tcp,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt

Volume status:
# gluster volume status all
Status of volume: vol_home
Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_home/brick1/data  49159     49165      Y       6547
Brick ib-storage2:/export/brick_home/brick1/data  49161     49173      Y       24348
Brick ib-storage3:/export/brick_home/brick1/data  49152     49156      Y       5616
Brick ib-storage4:/export/brick_home/brick1/data  49152     49162      Y       5424
Brick ib-storage1:/export/brick_home/brick2/data  49160     49166      Y       6548
Brick ib-storage2:/export/brick_home/brick2/data  49162     49174      Y       24355
Brick ib-storage3:/export/brick_home/brick2/data  49153     49157      Y       5635
Brick ib-storage4:/export/brick_home/brick2/data  49153     49163      Y       5443
Self-heal Daemon on localhost                     N/A       N/A        Y       6534
Self-heal Daemon on ib-storage3                   N/A       N/A        Y       7656
Self-heal Daemon on ib-storage2                   N/A       N/A        Y       24519
Self-heal Daemon on ib-storage4                   N/A       N/A        Y       7288

Task Status of Volume vol_home
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_shared
Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_shared/data       49152     49164      Y       6554
Brick ib-storage2:/export/brick_shared/data       49152     49172      Y       24362
Self-heal Daemon on localhost                     N/A       N/A        Y       6534
Self-heal Daemon on ib-storage3                   N/A       N/A        Y       7656
Self-heal Daemon on ib-storage2                   N/A       N/A        Y       24519
Self-heal Daemon on ib-storage4                   N/A       N/A        Y       7288

Task Status of Volume vol_shared
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_workdir_amd
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_workdir/brick1/data  49191     49192      Y       6555
Brick ib-storage3:/export/brick_workdir/brick1/data  49164     49165      Y       6368
Brick ib-storage1:/export/brick_workdir/brick2/data  49193     49194      Y       6576
Brick ib-storage3:/export/brick_workdir/brick2/data  49166     49167      Y       6387

Task Status of Volume vol_workdir_amd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_workdir_intel
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage2:/export/brick_workdir/brick1/data  49175     49176      Y       24371
Brick ib-storage2:/export/brick_workdir/brick2/data  49177     49178      Y       24372
Brick ib-storage4:/export/brick_workdir/brick1/data  49164     49165      Y       5571
Brick ib-storage4:/export/brick_workdir/brick2/data  49166     49167      Y       5590

Task Status of Volume vol_workdir_intel
------------------------------------------------------------------------------
There are no active volume tasks

Concerning the brick logs, do you want all brick logs from every server?

Geoffrey

On 22 Jul 2015 at 10:00, Mohammed Rafi K C <rkavunga at redhat.com> wrote:

> On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
>> Concerning the hang, I have seen it only once with the TCP protocol; RDMA really seems to be the cause.
>
> If you are mounting a tcp,rdma volume using the tcp protocol, all communication goes through the TCP connection and RDMA is not involved between client and server.
>
>> ... And, after a moment (a few minutes after restarting my back-transfer of around 40 TB), my volume went down (and all my rsync jobs with it):
>> [root at atlas ~]# df -h /mnt
>> df: « /mnt »: Noeud final de transport n'est pas connecté
>> df: aucun système de fichiers traité
>> i.e. "transport endpoint is not connected"
>
> Can you send me the following details, if possible?
> 1) the mount command used, 2) the volume status, 3) client and brick logs
>
> Regards,
> Rafi KC
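As a footnote to the volume-status tables in this thread: "gluster volume status" output can also be scanned mechanically for bricks whose Online column shows N. This sketch assumes each Brick entry sits on a single unwrapped line (the real CLI wraps long brick paths across two lines, so join them first if needed):

```shell
#!/bin/sh
# offline_bricks: read "gluster volume status" output on stdin and print
# the path of every brick whose Online column (next-to-last field) is N.
# Assumes one unwrapped line per brick entry.
offline_bricks() {
    awk '/^Brick / { if ($(NF-1) == "N") print $2 }'
}

# Example: gluster volume status all | offline_bricks
```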