Sunil Kumar Heggodu Gopala Acharya
2018-Sep-15 09:57 UTC
[Gluster-users] Failures during rebalance on gluster distributed disperse volume
Hi Mauro,

As Nithya highlighted, FALLOCATE support for EC volumes went into 3.11 as part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence, upgrading to 3.12 as suggested before would be the right move.

Here is the documentation for upgrading to 3.12: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/

Regards,
Sunil Kumar Acharya
Senior Software Engineer
Red Hat

On Sat, Sep 15, 2018 at 3:42 AM, Mauro Tridici <mauro.tridici at cmcc.it> wrote:

> Hi Nithya,
>
> thank you very much for your answer.
> I will wait for @Sunil's opinion too before starting the upgrade procedure.
>
> Since it will be the first upgrade of our Gluster cluster, I would like to
> know if it could be a "virtually dangerous" procedure and if there is a
> risk of losing data :-)
> Unfortunately, I can't make a preventive copy of the volume data in another
> location.
> If possible, could you please illustrate the right steps needed to
> complete the upgrade procedure from 3.10.5 to 3.12?
>
> Thank you again, Nithya.
> Thank you to all of you for the help!
>
> Regards,
> Mauro
>
> On 14 Sep 2018, at 16:59, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
> Hi Mauro,
>
> The rebalance code started using fallocate in 3.10.5
> (https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
> replicated volumes. However, we neglected to test this with EC volumes on
> 3.10. Once we discovered the issue, the EC fallocate implementation was
> made available in 3.11.
>
> At this point, I'm afraid the only option I see is to upgrade to at least
> 3.12.
>
> @Sunil, do you have anything to add?
>
> Regards,
> Nithya
>
> On 13 September 2018 at 18:34, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
>> Hi Nithya,
>>
>> thank you for involving the EC group.
>> I will wait for your suggestions.
>>
>> Regards,
>> Mauro
>>
>> On 13 Sep 2018, at 13:38, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>
>> This looks like an issue because rebalance switched to using fallocate,
>> which EC did not have implemented at that point.
>>
>> @Pranith, @Ashish, which version of gluster had support for fallocate in
>> EC?
>>
>> Regards,
>> Nithya
>>
>> On 12 September 2018 at 19:24, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>>> Dear All,
>>>
>>> I recently added 3 servers (each one with 12 bricks) to an existing
>>> Gluster Distributed Disperse Volume.
>>> The volume extension completed without error, and I already ran
>>> the rebalance procedure with the fix-layout option with no problem.
>>> I have just launched the rebalance procedure without the fix-layout option,
>>> but, as you can see in the output below, some failures have been
>>> detected.
>>>
>>> [root@s01 glusterfs]# gluster v rebalance tier2 status
>>> Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
>>> ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
>>> localhost             71176  3.2MB   2137557   1530391     8128  in progress  13:59:05
>>> s02-stg                   0  0Bytes        0         0        0  completed    11:53:28
>>> s03-stg                   0  0Bytes        0         0        0  completed    11:53:32
>>> s04-stg                   0  0Bytes        0         0        0  completed    0:00:06
>>> s05-stg                  15  0Bytes    17055         0       18  completed    10:48:01
>>> s06-stg                   0  0Bytes        0         0        0  completed    0:00:06
>>> Estimated time left for rebalance to complete : 0:46:53
>>> volume rebalance: tier2: success
>>>
>>> In the volume rebalance log file, I found a lot of error messages
>>> similar to the following ones:
>>>
>>> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
>>> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
>>> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>
>>> Could you please help me to understand what is happening and how to
>>> solve it?
>>>
>>> Our Gluster installation is based on Gluster v3.10.5.
>>>
>>> Thank you in advance,
>>> Mauro
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> -------------------------
>> Mauro Tridici
>>
>> Fondazione CMCC
>> CMCC Supercomputing Center
>> presso Complesso Ecotekne - Università del Salento -
>> Strada Prov.le Lecce - Monteroni sn
>> 73100 Lecce IT
>> http://www.cmcc.it
>>
>> mobile: (+39) 327 5630841
>> email: mauro.tridici at cmcc.it
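The "fallocate failed ... (Operation not supported)" entries quoted in this thread are the ones that point at the missing EC fallocate support. As a rough sketch only, the following Python counts such entries per destination subvolume. The regular expression is inferred from the few log lines quoted above, and the function name and sample list are hypothetical; other Gluster versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official log format):
#   ... fallocate failed for <path> on <subvolume> (<reason>)
FALLOCATE_RE = re.compile(
    r"fallocate failed for (?P<path>.+?) on (?P<subvol>\S+) \((?P<reason>[^)]*)\)"
)


def count_fallocate_failures(lines):
    """Count 'Operation not supported' fallocate failures per subvolume."""
    counts = Counter()
    for line in lines:
        match = FALLOCATE_RE.search(line)
        if match and match.group("reason") == "Operation not supported":
            counts[match.group("subvol")] += 1
    return counts


# Three entries copied from the thread: two fallocate failures and one
# follow-on "Create dst failed" message that must not be counted.
SAMPLE = [
    "[2018-09-12 13:15:50.759183] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 "
    "(Operation not supported)",
    "[2018-09-12 13:15:50.759206] E [MSGID: 0] "
    "[dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed "
    "on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/"
    "atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc",
    "[2018-09-12 13:15:50.777219] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 "
    "(Operation not supported)",
]

if __name__ == "__main__":
    for subvol, count in sorted(count_fallocate_failures(SAMPLE).items()):
        print(subvol, count)
```

To run it against a real log, pass an open file handle instead of SAMPLE; the location of the rebalance log depends on the installation and is not given in this thread.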
Mauro Tridici
2018-Sep-15 17:25 UTC
[Gluster-users] Failures during rebalance on gluster distributed disperse volume
Hi Sunil,

many thanks to you too.
I will follow your suggestions and the guide for upgrading to 3.12.

Crossing fingers :-)

Regards,
Mauro

> On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya <sheggodu at redhat.com> wrote:
>
> Hi Mauro,
>
> As Nithya highlighted, FALLOCATE support for EC volumes went into 3.11 as part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence, upgrading to 3.12 as suggested before would be the right move.
>
> Here is the documentation for upgrading to 3.12: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
>
> Regards,
> Sunil Kumar Acharya
> Senior Software Engineer
> Red Hat
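The thread places EC fallocate support in 3.11 and recommends upgrading to at least 3.12. A minimal sketch for checking each node against that recommendation follows. It assumes the version text contains a string of the form "glusterfs X.Y.Z" (as printed by `glusterfs --version`); the function names and the MIN_VERSION constant are hypothetical.

```python
import re

# Minimum version recommended in this thread for rebalance on EC volumes.
MIN_VERSION = (3, 12)


def parse_gluster_version(text):
    """Extract (major, minor, patch) from text such as 'glusterfs 3.10.5'.

    The 'glusterfs X.Y.Z' shape is an assumption about the version output.
    """
    match = re.search(r"glusterfs\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no glusterfs version found in: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_recommended_version(text):
    """Return True if the parsed version is at least MIN_VERSION."""
    return parse_gluster_version(text)[:2] >= MIN_VERSION


if __name__ == "__main__":
    for sample in ("glusterfs 3.10.5", "glusterfs 3.12.0"):
        print(sample, meets_recommended_version(sample))
```

Comparing tuples of integers avoids the string-comparison pitfall where "3.9" would sort after "3.12".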