Geoffrey Letessier
2015-Jul-22 14:59 UTC
[Gluster-users] Change transport-type on volume from tcp to rdma, tcp
I can confirm your words: everything looks OK with the TCP protocol and more or less unstable with the RDMA one. But TCP is slower than RDMA. In the attachments you can find my volume mount log, all brick logs and some information concerning my vol_shared volume.

Thanks in advance,
Geoffrey

PS: sorry in advance for my slow replies, but I will be on vacation (from this evening) very far from any internet access.

------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

On 22 Jul 2015, at 10:45, Mohammed Rafi K C <rkavunga at redhat.com> wrote:

> On 07/22/2015 01:36 PM, Geoffrey Letessier wrote:
>> Oops, I forgot to add all people in CC.
>> 
>> Yes, I guessed.
>> 
>> With the TCP protocol, all my volumes seem OK and I do not notice, for the moment, any hang.
> 
> So if I understand correctly, everything is fine with tcp (no hang, no "transport endpoint is not connected" error), and both happen with rdma. Please correct me if that is not so.
> 
>> mount command:
>> - with RDMA: mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
>> - with TCP: mount -t glusterfs -o transport=tcp,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /mnt
>> 
>> volume status:
>> # gluster volume status all
>> Status of volume: vol_home
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_home/brick1/data      49159     49165      Y       6547
>> Brick ib-storage2:/export/brick_home/brick1/data      49161     49173      Y       24348
>> Brick ib-storage3:/export/brick_home/brick1/data      49152     49156      Y       5616
>> Brick ib-storage4:/export/brick_home/brick1/data      49152     49162      Y       5424
>> Brick ib-storage1:/export/brick_home/brick2/data      49160     49166      Y       6548
>> Brick ib-storage2:/export/brick_home/brick2/data      49162     49174      Y       24355
>> Brick ib-storage3:/export/brick_home/brick2/data      49153     49157      Y       5635
>> Brick ib-storage4:/export/brick_home/brick2/data      49153     49163      Y       5443
>> Self-heal Daemon on localhost                         N/A       N/A        Y       6534
>> Self-heal Daemon on ib-storage3                       N/A       N/A        Y       7656
>> Self-heal Daemon on ib-storage2                       N/A       N/A        Y       24519
>> Self-heal Daemon on ib-storage4                       N/A       N/A        Y       7288
>> 
>> Task Status of Volume vol_home
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_shared
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_shared/data           49152     49164      Y       6554
>> Brick ib-storage2:/export/brick_shared/data           49152     49172      Y       24362
>> Self-heal Daemon on localhost                         N/A       N/A        Y       6534
>> Self-heal Daemon on ib-storage3                       N/A       N/A        Y       7656
>> Self-heal Daemon on ib-storage2                       N/A       N/A        Y       24519
>> Self-heal Daemon on ib-storage4                       N/A       N/A        Y       7288
>> 
>> Task Status of Volume vol_shared
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_workdir_amd
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage1:/export/brick_workdir/brick1/data   49191     49192      Y       6555
>> Brick ib-storage3:/export/brick_workdir/brick1/data   49164     49165      Y       6368
>> Brick ib-storage1:/export/brick_workdir/brick2/data   49193     49194      Y       6576
>> Brick ib-storage3:/export/brick_workdir/brick2/data   49166     49167      Y       6387
>> 
>> Task Status of Volume vol_workdir_amd
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Status of volume: vol_workdir_intel
>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick ib-storage2:/export/brick_workdir/brick1/data   49175     49176      Y       24371
>> Brick ib-storage2:/export/brick_workdir/brick2/data   49177     49178      Y       24372
>> Brick ib-storage4:/export/brick_workdir/brick1/data   49164     49165      Y       5571
>> Brick ib-storage4:/export/brick_workdir/brick2/data   49166     49167      Y       5590
>> 
>> Task Status of Volume vol_workdir_intel
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>> 
>> Concerning the brick logs, do you want all brick logs from every server?
> 
> Any errors from the client log and brick logs, and any log entries with a message id between 102000 and 104000 from the same.
> 
> Rafi KC
> 
>> Geoffrey
>> 
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>> 
>> On 22 Jul 2015, at 10:00, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>> 
>>> On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
>>>> Concerning the hang, I have only seen it once with the TCP protocol; actually, RDMA seems to be the cause.
>>> 
>>> If you are mounting a tcp,rdma volume using the tcp protocol, all the communication will go through the tcp connection and rdma won't come in between client and server.
>>> 
>>>> … And, after a moment (a few minutes after having restarted my back-transfer of around 40TB), my volume goes down (and all my rsync processes too):
>>>> [root at atlas ~]# df -h /mnt
>>>> df: « /mnt »: Noeud final de transport n'est pas connecté
>>>> df: aucun système de fichiers traité
>>>> aka "transport endpoint is not connected"
>>> 
>>> Can you send me the following details, if possible?
>>> 1) mount command used, 2) volume status, 3) client and brick logs
>>> 
>>> Regards
>>> Rafi KC
>>> 
>>>> Geoffrey
>>>> 
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> Responsable informatique & ingénieur système
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>> 
>>>> On 22 Jul 2015, at 09:17, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>> 
>>>>> Hi Rafi,
>>>>> 
>>>>> That is what I do. But I notice this kind of trouble particularly when I mount my volumes manually.
>>>>> 
>>>>> In addition, when I changed my transport-type from tcp or rdma to tcp,rdma, I had to restart my volumes for the change to take effect.
>>>>> 
>>>>> I wonder if these troubles are not due to the RDMA protocol, because everything looks more stable with the TCP one.
>>>>> 
>>>>> Any other idea?
>>>>> Thanks for your reply, and thanks in advance,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>> 
>>>>> On 22 Jul 2015, at 07:33, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>>>> 
>>>>>> On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
>>>>>>> Hi Niels,
>>>>>>> 
>>>>>>> Thanks for replying.
>>>>>>> 
>>>>>>> In fact, after checking the logs, I discovered that GlusterFS tried to connect to a brick on a TCP (or RDMA) port allocated to another volume… (bug?)
>>>>>>> For example, here is an extract of my workdir.log file:
>>>>>>> [2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connexion refusée)
>>>>>>> [2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connexion refusée)
>>>>>>> 
>>>>>>> But those 2 ports (49161 and 49162) belong only to my vol_home volume, not to vol_workdir_amd.
>>>>>>> 
>>>>>>> Now, after restarting all glusterd daemons synchronously (pdsh -w cl-storage[1-4] service glusterd restart), everything seems to be back to a normal situation (size, write permission, etc.)
>>>>>>> 
>>>>>>> But, a few minutes later, I noticed a strange thing that I have been seeing since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try to mount some volumes (particularly my vol_shared volume, a replicated volume) my system can hang… And, because I use it in my bashrc file for my environment modules, I then need to restart my node. Same thing if I run a df on the mounted volume (when it does not hang during the mount).
>>>>>>> 
>>>>>>> With the TCP transport-type, the situation seems to be more stable.
>>>>>>> 
>>>>>>> In addition: if I restart a storage node, I cannot use the Gluster CLI (it also hangs).
>>>>>>> 
>>>>>>> Do you have an idea?
>>>>>> 
>>>>>> Are you using a bash script to start/mount the volume? If so, add a sleep between volume start and mount, to allow all the processes to start properly, because the RDMA protocol takes some time to initialize its resources.
>>>>>> 
>>>>>> Regards
>>>>>> Rafi KC
>>>>>> 
>>>>>>> One more time, thanks a lot for your help,
>>>>>>> Geoffrey
>>>>>>> 
>>>>>>> ------------------------------------------------------
>>>>>>> Geoffrey Letessier
>>>>>>> Responsable informatique & ingénieur système
>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>> 
>>>>>>> On 21 Jul 2015, at 23:49, Niels de Vos <ndevos at redhat.com> wrote:
>>>>>>> 
>>>>>>>> On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
>>>>>>>>> Hello Soumya, hello everybody,
>>>>>>>>> 
>>>>>>>>> network.ping-timeout was set to 42 seconds. I set it to 0 but it made no difference. The problem was, after setting the transport-type back to rdma,tcp, some bricks went down after a few minutes. Despite restarting the volumes, after a few minutes some [other/different] bricks went down again.
>>>>>>>> I'm not sure if the ping-timeout is handled differently when RDMA is used. Adding two of the guys that know RDMA well on CC.
>>>>>>>> 
>>>>>>>>> Now, after re-creating my volume, the bricks stay alive but, oddly, I am not able to write to my volume. In addition, I defined a distributed volume with 2 servers and 4 bricks of 250GB each, and my final volume seems to be sized at only 500GB… It is astonishing.
>>>>>>>> 
>>>>>>>> As seen further below, the 500GB volume is caused by two unreachable bricks. When the bricks are not reachable, the size of the bricks cannot be detected by the client and therefore 2x 250GB is missing.
>>>>>>>> 
>>>>>>>> It is unclear to me why writing to a pure distributed volume fails. When a brick is not reachable, and the file should be created there, it would normally get created on another brick. When the brick that should have the file comes online, and a new lookup for the file is done, a so-called "link file" is created, which points to the file on the other brick. I guess the failure has to do with the connection issues, and I would suggest to get those solved first.
>>>>>>>> 
>>>>>>>> HTH,
>>>>>>>> Niels
>>>>>>>> 
>>>>>>>>> Here you can find some information:
>>>>>>>>> # gluster volume status vol_workdir_amd
>>>>>>>>> Status of volume: vol_workdir_amd
>>>>>>>>> Gluster process                                       TCP Port  RDMA Port  Online  Pid
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Brick ib-storage1:/export/brick_workdir/brick1/data   49185     49186      Y       23098
>>>>>>>>> Brick ib-storage3:/export/brick_workdir/brick1/data   49158     49159      Y       3886
>>>>>>>>> Brick ib-storage1:/export/brick_workdir/brick2/data   49187     49188      Y       23117
>>>>>>>>> Brick ib-storage3:/export/brick_workdir/brick2/data   49160     49161      Y       3905
>>>>>>>>> 
>>>>>>>>> # gluster volume info vol_workdir_amd
>>>>>>>>> 
>>>>>>>>> Volume Name: vol_workdir_amd
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 4
>>>>>>>>> Transport-type: tcp,rdma
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: ib-storage1:/export/brick_workdir/brick1/data
>>>>>>>>> Brick2: ib-storage3:/export/brick_workdir/brick1/data
>>>>>>>>> Brick3: ib-storage1:/export/brick_workdir/brick2/data
>>>>>>>>> Brick4: ib-storage3:/export/brick_workdir/brick2/data
>>>>>>>>> Options Reconfigured:
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> 
>>>>>>>>> # pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
>>>>>>>>> storage3: Filesystem                            Size  Used  Avail  Use%  Mounted on
>>>>>>>>> storage3: /dev/mapper/st--block1-blk1--workdir  250G  34M   250G   1%    /export/brick_workdir/brick1
>>>>>>>>> storage3: /dev/mapper/st--block2-blk2--workdir  250G  34M   250G   1%    /export/brick_workdir/brick2
>>>>>>>>> storage1: Filesystem                            Size  Used  Avail  Use%  Mounted on
>>>>>>>>> storage1: /dev/mapper/st--block1-blk1--workdir  250G  33M   250G   1%    /export/brick_workdir/brick1
>>>>>>>>> storage1: /dev/mapper/st--block2-blk2--workdir  250G  33M   250G   1%    /export/brick_workdir/brick2
>>>>>>>>> 
>>>>>>>>> # df -h /workdir/
>>>>>>>>> Filesystem                      Size  Used  Avail  Use%  Mounted on
>>>>>>>>> localhost:vol_workdir_amd.rdma  500G  67M   500G   1%    /workdir
>>>>>>>>> 
>>>>>>>>> # touch /workdir/test
>>>>>>>>> touch: impossible de faire un touch « /workdir/test »: Aucun fichier ou dossier de ce type
>>>>>>>>> 
>>>>>>>>> # tail -30l /var/log/glusterfs/workdir.log
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
>>>>>>>>> [2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
>>>>>>>>> [2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (Aucun fichier ou dossier de ce type)
>>>>>>>>> [2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
>>>>>>>>> [2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
>>>>>>>>> [2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
>>>>>>>>> [2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> [2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
>>>>>>>>> Host Unreachable, Check your connection with IPoIB
>>>>>>>>> 
>>>>>>>>> I have been using GlusterFS in production for around 3 years without any blocking problem, but the situation has been a mess for more than 3 weeks… Indeed, our production has been down for roughly 3.5 weeks (with many different problems, first with GlusterFS v3.5.3 and now with 3.7.2-3) and I need to restart it…
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> Geoffrey
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> Geoffrey Letessier
>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>> 
>>>>>>>>> On 21 Jul 2015, at 19:36, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>>> From the following errors,
>>>>>>>>>> 
>>>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
>>>>>>>>>> 
>>>>>>>>>> it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket failed with the error (IIUC) "Protocol not available".
>>>>>>>>>> Could you check if 'network.ping-timeout' is set to zero for that volume using 'gluster volume info'? Anyway, from the code it looks like 'TCP_USER_TIMEOUT' can take the value zero. Not sure why it has failed.
>>>>>>>>>> 
>>>>>>>>>> Niels, any thoughts?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>> 
>>>>>>>>>> On 07/21/2015 08:15 PM, Geoffrey Letessier wrote:
>>>>>>>>>>> [2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
>>>>>>>>>>> [2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
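For readers following this thread later: the workflow that emerges from the discussion above - switch the volume to the tcp,rdma transport (which only takes effect after a volume restart) and give the RDMA layer a moment to initialize before mounting, as Rafi recommends - can be sketched roughly as follows. This is only an illustrative sketch built from the commands quoted in this thread; the volume name, server, mount options and the 10-second pause are taken from the thread or assumed, not validated values.

#!/bin/bash
# Sketch: change an existing volume's transport to tcp,rdma, restart it,
# then mount it over RDMA. Names and paths are the ones used in this thread.
VOL=vol_home
SERVER=ib-storage1
MNT=/mnt

# The transport of an existing volume can only be changed while the volume is
# stopped (and unmounted on all clients). --mode=script skips the interactive
# confirmation prompt of "volume stop".
gluster --mode=script volume stop $VOL
gluster volume set $VOL config.transport tcp,rdma
gluster volume start $VOL

# Rafi's advice: give the brick processes and the RDMA transport time to
# initialize before the first mount, otherwise the mount (or a later df) may
# hang. The 10 seconds here is an assumed, conservative value.
sleep 10

# Mount over RDMA, with the same options used earlier in the thread;
# replace transport=rdma with transport=tcp to force TCP over IPoIB.
mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ${SERVER}:${VOL} ${MNT}

# Quick sanity checks that the client is really connected to the bricks.
df -h ${MNT}
gluster volume status $VOL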
Geoffrey Letessier
2015-Jul-27 20:57 UTC
[Gluster-users] heaps of split-brains during back-transfer
Dear all,

For a couple of weeks (more than one month, in fact), our computing production has been stopped due to several surprising problems with GlusterFS. After noticing a big problem with incorrect quota sizes accounted for many, many files, I decided, under the guidance of the Gluster support team, to upgrade my storage cluster from version 3.5.3 to the latest release (3.7.2-3), because these bugs are theoretically fixed in that branch. Now, since I did this upgrade, it is a complete mess and I cannot restart production. Indeed:

1 - the RDMA protocol is not working and hangs my system / shell commands; only the TCP protocol (over InfiniBand) is more or less operational - not a blocking point, but…
2 - read/write performance is relatively low
3 - thousands of split-brains have appeared.

So, for the moment, I believe GlusterFS 3.7 is not really production ready.

Concerning the third point: after destroying all my volumes (RAID re-init, new partitions, GlusterFS volumes, etc.) and recreating the main one, I tried to transfer my data back from the archive/backup server into this new volume, and I see a lot of errors in my mount log file, as you can read in this extract:

[2015-07-26 22:35:16.962815] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:16.975206] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:28.719935] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.764891] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.775037] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:29.776857] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.800535] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0

And when I try to browse some folders (still in the mount log file):

[2015-07-27 09:00:19.005763] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 09:00:22.322316] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-27 09:00:23.008771] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 08:59:50.359187] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
[2015-07-27 09:00:02.500419] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
[2015-07-27 09:00:02.506925] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
[2015-07-27 09:00:23.001121] W [MSGID: 108008] [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2015-07-27 09:00:26.231262] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.

And, above all, when browsing folders I get a lot of input/output errors.

Currently I have 6.2M inodes and roughly 30TB in my "new" volume. For the moment, quota is disabled to improve the IO performance during the back-transfer…

You can also find in the attachments:
- an "ls" result
- a split-brain research result
- the volume information and status
- a complete volume heal info

Hoping this can help you to help me fix all my problems and reopen the computing production.

Thanks in advance,
Geoffrey

PS: « Erreur d'Entrée/Sortie » = "Input / Output Error"
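The GFID mismatches reported above can be cross-checked directly on the bricks before deciding how to repair them. A minimal sketch, assuming the vol_home brick paths shown earlier in this thread and assuming vol_home-client-0/-client-1 correspond to the first replica pair on ib-storage1/ib-storage2 (the real mapping should be read from the volume info and status attached to this mail):

# 1) Ask gluster which entries it currently flags as split-brain.
gluster volume heal vol_home info split-brain

# 2) On the storage nodes, compare the trusted.gfid xattr of the same file on
#    both replicas (against the brick backend, not the FUSE mount); a GFID
#    split-brain shows up as two different hex values.
FILE="some/dir/job.pbs"   # hypothetical relative path inside the volume, for illustration
ssh ib-storage1 "getfattr -d -m . -e hex /export/brick_home/brick1/data/$FILE"
ssh ib-storage2 "getfattr -d -m . -e hex /export/brick_home/brick1/data/$FILE"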
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
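For the record, GFID split-brain entries like the job.pbs and md*.gro files in the log above could not, to my knowledge, be resolved through the heal CLI in the 3.7 series; the approach commonly documented at the time was a manual cleanup on the brick holding the copy you choose to discard, followed by a lookup to trigger self-heal. A rough sketch only, using the GFID taken from the log above and a hypothetical file path; verify everything (and the documentation for your exact version) before deleting anything:

# Run on the storage node hosting the copy to discard. Paths are illustrative.
BRICK=/export/brick_home/brick1/data              # brick backend path (from the thread)
FILE="some/dir/job.pbs"                           # hypothetical path inside the volume
BADGFID=820f9bcc-a0f6-40e0-bcec-28a76b4195ea      # GFID of the copy to discard (from the log)

# Remove the file and its .glusterfs hard link on this one brick only.
rm -f "$BRICK/$FILE"
rm -f "$BRICK/.glusterfs/${BADGFID:0:2}/${BADGFID:2:2}/$BADGFID"

# From a client mount, look the file up again so self-heal recreates it from
# the surviving copy, then check that it is no longer listed as split-brain.
stat /home/"$FILE"        # assuming vol_home is mounted on /home on the client
gluster volume heal vol_home info split-brain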