Mauro Tridici
2018-Sep-12 13:54 UTC
[Gluster-users] Failures during rebalance on gluster distributed disperse volume
Dear All,

I recently added 3 servers (each with 12 bricks) to an existing Gluster distributed disperse volume. The volume extension completed without errors, and I have already run the rebalance procedure with the fix-layout option with no problems.

I have just launched the rebalance procedure without the fix-layout option but, as you can see in the output below, some failures have been detected.

[root@s01 glusterfs]# gluster v rebalance tier2 status
Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost  71176             3.2MB   2137557  1530391   8128     in progress  13:59:05
s02-stg    0                 0Bytes  0        0         0        completed    11:53:28
s03-stg    0                 0Bytes  0        0         0        completed    11:53:32
s04-stg    0                 0Bytes  0        0         0        completed    0:00:06
s05-stg    15                0Bytes  17055    0         18       completed    10:48:01
s06-stg    0                 0Bytes  0        0         0        completed    0:00:06
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success

In the volume rebalance log file, I found many error messages similar to the following:

[2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
[2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
[2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
[2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
[2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
[2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
[2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
[2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc

Could you please help me understand what is happening and how to solve it?

Our Gluster deployment is based on Gluster v3.10.5.

Thank you in advance,
Mauro
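The per-node counters in the status output above can be read by a script instead of by eye. The following is a minimal sketch, assuming the whitespace-separated column layout shown in this message; `NodeStatus` and `parse_rebalance_status` are hypothetical names chosen for illustration and are not part of Gluster, and the sample text is copied from the output above.

```python
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    run_time: str


def parse_rebalance_status(text: str) -> List[NodeStatus]:
    """Parse per-node rows from `gluster v rebalance <vol> status` output.

    Assumes the eight-column layout shown in this thread. Header, separator
    and summary lines are skipped because their counter fields are not digits.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 8:
            continue
        counters = (tokens[1], tokens[3], tokens[4], tokens[5])
        if not all(t.isdigit() for t in counters):
            continue
        rows.append(
            NodeStatus(
                node=tokens[0],
                rebalanced_files=int(tokens[1]),
                size=tokens[2],
                scanned=int(tokens[3]),
                failures=int(tokens[4]),
                skipped=int(tokens[5]),
                # The status may contain a space ("in progress"),
                # so take everything between the counters and the run time.
                status=" ".join(tokens[6:-1]),
                run_time=tokens[-1],
            )
        )
    return rows


# Sample copied from the status output in this message.
SAMPLE = """\
Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost  71176             3.2MB   2137557  1530391   8128     in progress  13:59:05
s02-stg    0                 0Bytes  0        0         0        completed    11:53:28
s05-stg    15                0Bytes  17055    0         18       completed    10:48:01
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success
"""

if __name__ == "__main__":
    for row in parse_rebalance_status(SAMPLE):
        if row.failures:
            print(f"{row.node}: {row.failures} failures out of {row.scanned} scanned")
```

In this sample only `localhost` reports failures, which matches the node whose rebalance log contains the errors quoted above.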
Nithya Balachandran
2018-Sep-13 11:38 UTC
[Gluster-users] Failures during rebalance on gluster distributed disperse volume
This looks like an issue caused by rebalance switching to fallocate, which EC (the disperse translator) had not implemented at that point.

@Pranith, @Ashish, which version of Gluster added support for fallocate in EC?

Regards,
Nithya
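One way to check this diagnosis against a full rebalance log is to count the "fallocate failed" errors per subvolume and reason, and to see whether every "migrate-data failed" entry is preceded by one. The following is a rough sketch, assuming the message formats quoted in this thread and file paths without spaces; `summarize` and the two regular expressions are hypothetical helpers, not Gluster tooling, and the sample lines are copied from the original report.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (an assumption:
# other Gluster versions may word these messages differently).
FALLOCATE_RE = re.compile(
    r"fallocate failed for (?P<path>\S+) on (?P<subvol>\S+) \((?P<reason>[^)]*)\)"
)
MIGRATE_RE = re.compile(r"migrate-data failed for (?P<path>\S+)")


def summarize(lines):
    """Return (fallocate counts, migrate-failed paths, unexplained paths).

    "Unexplained" means a migrate-data failure with no matching
    fallocate failure in the lines that were given.
    """
    fallocate_counts = Counter()
    fallocate_paths = set()
    migrate_failed = set()
    for line in lines:
        m = FALLOCATE_RE.search(line)
        if m:
            fallocate_counts[(m["subvol"], m["reason"])] += 1
            fallocate_paths.add(m["path"])
            continue
        m = MIGRATE_RE.search(line)
        if m:
            migrate_failed.add(m["path"])
    return fallocate_counts, migrate_failed, migrate_failed - fallocate_paths


# Excerpt copied from the log lines in the original report.
SAMPLE_LOG = """\
[2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
[2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
[2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
[2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
[2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
"""

if __name__ == "__main__":
    counts, failed, unexplained = summarize(SAMPLE_LOG.splitlines())
    for (subvol, reason), n in sorted(counts.items()):
        print(f"{subvol}: {n} fallocate failure(s) ({reason})")
    print(f"{len(failed)} migrate-data failures, {len(unexplained)} without a fallocate error")
```

In the excerpt, the 2005-12 file shows up as unexplained only because its fallocate line falls before the quoted portion of the log; on a complete log, an empty unexplained set would be consistent with fallocate being the sole cause.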