Alessandro Ipe
2018-Feb-02 10:27 UTC
[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
Hi,

I simplified the config in my first email, but I actually have 2x4 servers in a distributed-replicated setup, with 4 bricks each on 6 of them and 2 bricks each on the remaining 2. Full healing will just take ages... for just a single brick to resync!

> gluster v status home
volume status home
Status of volume: home
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server1:/data/glusterfs/home/brick1   49157     0          Y       5003
Brick server1:/data/glusterfs/home/brick2   49153     0          Y       5023
Brick server1:/data/glusterfs/home/brick3   49154     0          Y       5004
Brick server1:/data/glusterfs/home/brick4   49155     0          Y       5011
Brick server3:/data/glusterfs/home/brick1   49152     0          Y       5422
Brick server4:/data/glusterfs/home/brick1   49152     0          Y       5019
Brick server3:/data/glusterfs/home/brick2   49153     0          Y       5429
Brick server4:/data/glusterfs/home/brick2   49153     0          Y       5033
Brick server3:/data/glusterfs/home/brick3   49154     0          Y       5437
Brick server4:/data/glusterfs/home/brick3   49154     0          Y       5026
Brick server3:/data/glusterfs/home/brick4   49155     0          Y       5444
Brick server4:/data/glusterfs/home/brick4   N/A       N/A        N       N/A
Brick server5:/data/glusterfs/home/brick1   49152     0          Y       5275
Brick server6:/data/glusterfs/home/brick1   49152     0          Y       5786
Brick server5:/data/glusterfs/home/brick2   49153     0          Y       5276
Brick server6:/data/glusterfs/home/brick2   49153     0          Y       5792
Brick server5:/data/glusterfs/home/brick3   49154     0          Y       5282
Brick server6:/data/glusterfs/home/brick3   49154     0          Y       5794
Brick server5:/data/glusterfs/home/brick4   49155     0          Y       5293
Brick server6:/data/glusterfs/home/brick4   49155     0          Y       5806
Brick server7:/data/glusterfs/home/brick1   49156     0          Y       22339
Brick server8:/data/glusterfs/home/brick1   49153     0          Y       17992
Brick server7:/data/glusterfs/home/brick2   49157     0          Y       22347
Brick server8:/data/glusterfs/home/brick2   49154     0          Y       18546
NFS Server on localhost                     2049      0          Y       683
Self-heal Daemon on localhost               N/A       N/A        Y       693
NFS Server on server8                       2049      0          Y       18553
Self-heal Daemon on server8                 N/A       N/A        Y       18566
NFS Server on server5                       2049      0          Y       23115
Self-heal Daemon on server5                 N/A       N/A        Y       23121
NFS Server on server7                       2049      0          Y       4201
Self-heal Daemon on server7                 N/A       N/A        Y       4210
NFS Server on server3                       2049      0          Y       5460
Self-heal Daemon on server3                 N/A       N/A        Y       5469
NFS Server on server6                       2049      0          Y       22709
Self-heal Daemon on server6                 N/A       N/A        Y       22718
NFS Server on server4                       2049      0          Y       6044
Self-heal Daemon on server4                 N/A       N/A        Y       6243

Server 2 is currently powered off, as we are waiting for a replacement RAID controller; the same goes for server4:/data/glusterfs/home/brick4.

And as I said, there is a rebalance in progress:
> gluster rebalance home status
Node       Rebalanced-files      size    scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  --------  ---------  --------  -------  -----------  -----------------
localhost             42083    23.3GB    1568065      1359   303734  in progress           16:49:31
server5               35698    23.8GB    1027934         0   240748  in progress           16:49:23
server4               35096    23.4GB     899491         0   229064  in progress           16:49:18
server3               27031    18.0GB     701759         8   182592  in progress           16:49:27
server8                   0    0Bytes     327602         0      805  in progress           16:49:18
server6               35672    23.9GB    1028469         0   240810  in progress           16:49:17
server7                   1   45Bytes         53         0        0    completed            0:03:53
Estimated time left for rebalance to complete : 359739:51:24
volume rebalance: home: success

Thanks,

A.

On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
> What is server4? You just mentioned server1 and server2 previously.
> Can you post the output of gluster v status volname
>
> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> > Hi,
> >
> > Thanks. However "gluster v heal volname full" returned the following error message:
> > Commit failed on server4. Please check log file for details.
> >
> > I have checked the log files in /var/log/glusterfs on server4 (by grepping for heal), but did not get any match. What should I be looking for, and in which log file, please?
> >
> > Note that there is currently a rebalance process running on the volume.
> >
> > Many thanks,
> >
> > A.
> >
> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> >> You do not need to reset the brick if the brick path does not change. Replace
> >> the brick, format and mount it, then gluster v start volname force.
> >> To start self-heal, just run gluster v heal volname full.
> >>
> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> >> > Hi,
> >> >
> >> > My volume home is configured in replicate mode (version 3.12.4) with the bricks
> >> > server1:/data/gluster/brick1
> >> > server2:/data/gluster/brick1
> >> >
> >> > server2:/data/gluster/brick1 was corrupted, so I killed the gluster daemon for
> >> > that brick on server2, unmounted it, reformatted it, remounted it and did a
> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> >> server2:/data/gluster/brick1 commit force
> >> >
> >> > I was expecting that the self-heal daemon would start copying data from
> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> >> > server2:/data/gluster/brick1, which it only did for directories, but not
> >> > for files.
> >> >
> >> > For the moment, I launched on the FUSE mount point
> >> >> find . | xargs stat
> >> > but crawling the whole volume (100 TB) to trigger self-healing of a single
> >> > brick of 7.4 TB is inefficient.
> >> >
> >> > Is there any trick to self-heal only a single brick, for example by setting
> >> > some attributes on its top directory?
> >> >
> >> > Many thanks,
> >> >
> >> > Alessandro
> >> >
> >> > _______________________________________________
> >> > Gluster-users mailing list
> >> > Gluster-users at gluster.org
> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> >
> > Dr. Ir. Alessandro Ipe
> > Department of Observations      Tel. +32 2 373 06 31
> > Remote Sensing from Space
> > Royal Meteorological Institute
> > Avenue Circulaire 3             Email:
> > B-1180 Brussels   Belgium       Alessandro.Ipe at meteo.be
> > Web: http://gerb.oma.be

--

Dr. Ir. Alessandro Ipe
Department of Observations      Tel. +32 2 373 06 31
Remote Sensing from Space
Royal Meteorological Institute
Avenue Circulaire 3             Email:
B-1180 Brussels   Belgium       Alessandro.Ipe at meteo.be
Web: http://gerb.oma.be
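For the procedure discussed in the quoted messages above, a minimal sketch of the daemon-driven alternative to crawling the FUSE mount with find | xargs stat. It assumes the volume is named home and the rebuilt brick kept its original path (server2:/data/gluster/brick1, as in the quoted thread); adapt the names before running anything.

# on server2, after the brick filesystem has been reformatted and remounted:
# restart the brick process without changing the volume layout
gluster volume start home force

# let the self-heal daemon crawl and repair from the healthy replica
gluster volume heal home full

# monitor progress instead of stat-ing the whole 100 TB volume
gluster volume heal home info
gluster volume heal home statistics heal-count

The heal commands operate per volume, but the self-heal daemon only copies entries that are missing or stale on the rebuilt brick, so in effect only the 7.4 TB brick gets rewritten.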
Serkan Çoban
2018-Feb-02 10:38 UTC
[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
If I were you, I would follow these steps. Stop the rebalance and fix the cluster health first. Bring up the down server, replace server4:brick4 with a new disk, format it and make sure the brick is started, then start a full heal. Without all bricks up, a full heal will not start. Then you can continue with the rebalance.

On Fri, Feb 2, 2018 at 1:27 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> Hi,
>
> I simplified the config in my first email, but I actually have 2x4 servers in a distributed-replicated setup, with 4 bricks each on 6 of them and 2 bricks each on the remaining 2. Full healing will just take ages... for just a single brick to resync!
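A minimal sketch of the order of operations suggested above, assuming the volume name home and the brick paths from the status output; the disk replacement, formatting and mounting on server2 and server4 happen outside gluster and depend on the local setup.

# pause the running rebalance before touching bricks
gluster volume rebalance home stop

# bring server2 back online and replace/format/remount the disk behind
# server4:/data/glusterfs/home/brick4, then restart the brick processes
gluster volume start home force
gluster volume status home      # every brick should now show Online = Y

# with all bricks up, launch the full heal and wait for it to drain
gluster volume heal home full
gluster volume heal home info

# once pending heals are gone, resume the rebalance
gluster volume rebalance home start

If the rebalance is started again later, it begins a new scan of the volume, so the counters shown earlier will reset.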