Ashish Pandey
2018-Sep-27 11:14 UTC
[Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Yes, you can.
If not me, others may also reply.

---
Ashish

----- Original Message -----
From: "Mauro Tridici" <mauro.tridici at cmcc.it>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear Ashish,

I cannot thank you enough! Your procedure and description are very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced rebalance errors). After the fix I will set network.ping-timeout back to its default value.

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro

On 27 Sep 2018, at 12:38, Ashish Pandey <aspandey at redhat.com> wrote:

Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from Brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time. I have named them V1 to V6.

Problem:
Take the case of V1. The best configuration/setup would be to have the 6 bricks of V1 on 6 different nodes. However, in your case you have added only 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, since all of their bricks will be down. We want to avoid and correct that.

Now, we have two approaches to correct/modify this setup.

Approach 1
Remove all the newly added bricks in sets of 6. This will trigger a rebalance and move the whole data to the other subvolumes. Repeat the step until all the bricks are removed, then add those bricks again in sets of 6, this time taking 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.

Approach 2
In this approach we use the heal process. We have to deal with all the subvolumes (V1 to V6) one by one. The following are the steps for V1:

Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node, one by one (heal should be completed after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force

Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.

Then use the replace-brick command to move the following bricks to the s06-stg node, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>

Step 2 -
After every replace-brick command, you have to wait for heal to be completed. Check "gluster v heal <volname> info"; if it shows any entries, you have to wait until they are gone.

After a successful step 1 and step 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; only the target nodes differ (a command-level sketch for V1 follows the brick lists below).

V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick

V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick

V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick

V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick

V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
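For illustration only, a minimal sketch of the Approach 2 sequence for V1. The target brick paths /gluster/mnt13/brick and /gluster/mnt14/brick are hypothetical placeholders; substitute whichever bricks are actually free on s05-stg and s06-stg.

    # Move Brick39 and Brick40 to s05-stg, waiting for heal after each move
    gluster volume replace-brick tier2 s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt13/brick commit force
    gluster volume heal tier2 info    # repeat until no entries are listed
    gluster volume replace-brick tier2 s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt14/brick commit force
    gluster volume heal tier2 info    # wait for heal to finish before the next move

    # Then move Brick41 and Brick42 to s06-stg in the same way
    gluster volume replace-brick tier2 s04-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt13/brick commit force
    gluster volume heal tier2 info
    gluster volume replace-brick tier2 s04-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt14/brick commit force
    gluster volume heal tier2 info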
Just a note that these steps involve data movement. Be careful while performing them: do one replace-brick at a time, and move to the next one only after heal has completed.
Let me know if you have any issues.

---
Ashish

----- Original Message -----
From: "Mauro Tridici" <mauro.tridici at cmcc.it>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear Ashish,

I hope I'm not disturbing you too much, but I would like to ask you if you have had some time to dedicate to our problem. Please forgive my insistence.

Thank you in advance,
Mauro

On 26 Sep 2018, at 19:56, Mauro Tridici <mauro.tridici at cmcc.it> wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro

On 26 Sep 2018, at 19:33, Ashish Pandey <aspandey at redhat.com> wrote:

Hi Mauro,

Yes, I can provide you with a step-by-step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish

----- Original Message -----
From: "Mauro Tridici" <mauro.tridici at cmcc.it>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Hi Ashish,

in attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I have just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I may ask this of you but, if it is possible, could you please describe, step by step, the right procedure to remove the newly added bricks without losing the data that have already been rebalanced?

The following outputs show the result of the "df -h" command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.

[root at s06 bricks]# df -h
Filesystem                            Size   Used  Avail  Use%  Mounted on
/dev/mapper/cl_s06-root               100G   2,1G    98G    3%  /
devtmpfs                               32G      0    32G    0%  /dev
tmpfs                                  32G   4,0K    32G    1%  /dev/shm
tmpfs                                  32G    26M    32G    1%  /run
tmpfs                                  32G      0    32G    0%  /sys/fs/cgroup
/dev/mapper/cl_s06-var                100G   2,0G    99G    2%  /var
/dev/mapper/cl_s06-gluster            100G    33M   100G    1%  /gluster
/dev/sda1                            1014M   152M   863M   15%  /boot
/dev/mapper/gluster_vgd-gluster_lvd   9,0T   807G   8,3T    9%  /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg   9,0T   807G   8,3T    9%  /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc   9,0T   807G   8,3T    9%  /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve   9,0T   807G   8,3T    9%  /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj   9,0T   887G   8,2T   10%  /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb   9,0T   807G   8,3T    9%  /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh   9,0T   887G   8,2T   10%  /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf   9,0T   807G   8,3T    9%  /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi   9,0T   887G   8,2T   10%  /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl   9,0T   887G   8,2T   10%  /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk   9,0T   887G   8,2T   10%  /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm   9,0T   887G   8,2T   10%  /gluster/mnt12
tmpfs                                 6,3G      0   6,3G    0%  /run/user/0

[root at s01 ~]# df -h
Filesystem                            Size   Used  Avail  Use%  Mounted on
/dev/mapper/cl_s01-root               100G   5,3G    95G    6%  /
devtmpfs                               32G      0    32G    0%  /dev
tmpfs                                  32G    39M    32G    1%  /dev/shm
tmpfs                                  32G    26M    32G    1%  /run
tmpfs                                  32G      0    32G    0%  /sys/fs/cgroup
/dev/mapper/cl_s01-var                100G    11G    90G   11%  /var
/dev/md127                           1015M   151M   865M   15%  /boot
/dev/mapper/cl_s01-gluster            100G    33M   100G    1%  /gluster
/dev/mapper/gluster_vgi-gluster_lvi   9,0T   5,5T   3,6T   61%  /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm   9,0T   5,4T   3,6T   61%  /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf   9,0T   5,7T   3,4T   63%  /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl   9,0T   5,8T   3,3T   64%  /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj   9,0T   5,5T   3,6T   61%  /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn   9,0T   5,4T   3,6T   61%  /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk   9,0T   5,8T   3,3T   64%  /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh   9,0T   5,6T   3,5T   63%  /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg   9,0T   5,6T   3,5T   63%  /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve   9,0T   5,7T   3,4T   63%  /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc   9,0T   5,6T   3,5T   62%  /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd   9,0T   5,6T   3,5T   62%  /gluster/mnt2
tmpfs                                 6,3G      0   6,3G    0%  /run/user/0
s01-stg:tier2                         420T   159T   262T   38%  /tier2

As you can see, the used space on each brick of the new servers is about 800 GB.

Thank you,
Mauro

On 26 Sep 2018, at 14:51, Ashish Pandey <aspandey at redhat.com> wrote:

Hi Mauro,

The rebalance and brick logs should be the first thing we go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow. You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration, that is, 2 bricks on each node for one subvolume. The procedure will require a lot of replace-brick operations, which will in turn need healing, and in addition to that we have to wait for the rebalance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, then you should remove them. After that, add the bricks back so that you have 2 bricks of each subvolume on each of the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
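As a rough sketch of that undo/redo, assuming the same brick paths are reused (the paths are illustrative, and removed bricks must be emptied/cleaned before they can be added back):

    SET1="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

    # remove one newly added set of 6 bricks; commit only after status reports "completed"
    gluster volume remove-brick tier2 $SET1 start
    gluster volume remove-brick tier2 $SET1 status
    gluster volume remove-brick tier2 $SET1 commit

    # once all 36 new bricks are out, add them back in sets of 6 with 2 bricks per new node
    gluster volume add-brick tier2 \
      s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
      s05-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt2/brick \
      s06-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt2/brick
    gluster volume rebalance tier2 start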
---
Ashish

----- Original Message -----
From: "Mauro Tridici" <mauro.tridici at cmcc.it>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear Ashish,

thank you for your answer.
I can provide you with the entire log files for glusterd, glusterfsd and the rebalance. Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro

On 26 Sep 2018, at 14:13, Ashish Pandey <aspandey at redhat.com> wrote:

I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, so the subvolumes made up of these bricks are hosted entirely on one node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a connection disruption affecting these bricks is higher in this case.

---
Ashish

----- Original Message -----
From: "Mauro Tridici" <mauro.tridici at cmcc.it>
To: "gluster-users" <gluster-users at gluster.org>
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong. Rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.

[root at s01 ~]# gluster volume rebalance tier2 status
     Node   Rebalanced-files       size   scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   -------   --------   -------   -----------   -----------------
localhost                 19    161.6GB       537          2         2   in progress             0:32:23
  s02-stg                 25    212.7GB       526          5         2   in progress             0:32:25
  s03-stg                  4     69.1GB       511          0         0   in progress             0:32:25
  s04-stg                  4   484Bytes     12283          0         3   in progress             0:32:25
  s05-stg                 23   484Bytes     11049          0        10   in progress             0:32:25
  s06-stg                  3      1.2GB      8032         11         3        failed             0:17:57
Estimated time left for rebalance to complete :     3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kinds of errors in /var/log/glusterfs/tier2-rebalance.log:

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776). Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you please help me to understand the cause of the problems above?
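(For reference, a few read-only checks that can show whether bricks are dropping off during the rebalance; the log path is the one mentioned above.)

    gluster peer status                              # all 6 peers should be in "Connected" state
    gluster volume status tier2                      # every brick process should be online
    gluster volume get tier2 network.ping-timeout    # current timeout value
    grep -cE 'disconnected|has not responded' /var/log/glusterfs/tier2-rebalance.log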
You can find below our volume info (the volume is implemented on 6 servers; each server has 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, and 12 x 10 TB HDDs):

[root at s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

If it can help, I paste here the output of the "free -m" command; the result is almost the same on every cluster node. In your opinion, is the available RAM enough to support the data movement?

[root at s06 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          64309       10409         464          15       53434       52998
Swap:         65535         103       65432

Thank you in advance. Sorry for the long message, but I am trying to give you all the available information.

Regards,
Mauro

-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: mauro.tridici at cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-Sep-28 10:51 UTC
[Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0). This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1: bricks 1, 2, 3, 4, 5 and 6 on server s04).
Rebalance started moving the data across the other bricks but, after about 3 TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB still has to be moved in order to complete this step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).

Right now the rebalance is not moving data, it is only scanning files (please take a look at the following output):

[root at s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
     Node   Rebalanced-files       size   scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   -------   --------   -------   -----------   -----------------
  s04-stg                  0     0Bytes    182008          0         0   in progress             3:08:09
Estimated time left for rebalance to complete :      442:45:06

If I'm not wrong, remove-brick rebalances the entire cluster each time it starts. Is there a way to speed up this procedure?
Do you have any other suggestions that, in this particular case, could be useful to reduce the errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?

Thank you in advance,
Mauro
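For reference, a sketch of the commands involved in the workaround described above; the reset restores network.ping-timeout to its default once the maintenance is done (the bricks listed are the six V1 bricks on s04-stg):

    V1_BRICKS="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

    gluster volume set tier2 network.ping-timeout 0        # temporary, as described above
    gluster volume remove-brick tier2 $V1_BRICKS start
    gluster volume remove-brick tier2 $V1_BRICKS status    # poll until it reports "completed"
    gluster volume remove-brick tier2 $V1_BRICKS commit    # commit only after completion
    gluster volume reset tier2 network.ping-timeout        # restore the default value afterwards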
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: mauro.tridici at cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b