Alessandro Ipe
2018-Feb-02 10:27 UTC
[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
Hi,

I simplified the config in my first email, but I actually have 2x4 servers in a distributed-replicated setup, with 4 bricks each on 6 of them and 2 bricks each on the remaining 2. Full healing will just take ages... for just a single brick to resync!

> gluster v status home
volume status home
Status of volume: home
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server1:/data/glusterfs/home/brick1   49157     0          Y       5003
Brick server1:/data/glusterfs/home/brick2   49153     0          Y       5023
Brick server1:/data/glusterfs/home/brick3   49154     0          Y       5004
Brick server1:/data/glusterfs/home/brick4   49155     0          Y       5011
Brick server3:/data/glusterfs/home/brick1   49152     0          Y       5422
Brick server4:/data/glusterfs/home/brick1   49152     0          Y       5019
Brick server3:/data/glusterfs/home/brick2   49153     0          Y       5429
Brick server4:/data/glusterfs/home/brick2   49153     0          Y       5033
Brick server3:/data/glusterfs/home/brick3   49154     0          Y       5437
Brick server4:/data/glusterfs/home/brick3   49154     0          Y       5026
Brick server3:/data/glusterfs/home/brick4   49155     0          Y       5444
Brick server4:/data/glusterfs/home/brick4   N/A       N/A        N       N/A
Brick server5:/data/glusterfs/home/brick1   49152     0          Y       5275
Brick server6:/data/glusterfs/home/brick1   49152     0          Y       5786
Brick server5:/data/glusterfs/home/brick2   49153     0          Y       5276
Brick server6:/data/glusterfs/home/brick2   49153     0          Y       5792
Brick server5:/data/glusterfs/home/brick3   49154     0          Y       5282
Brick server6:/data/glusterfs/home/brick3   49154     0          Y       5794
Brick server5:/data/glusterfs/home/brick4   49155     0          Y       5293
Brick server6:/data/glusterfs/home/brick4   49155     0          Y       5806
Brick server7:/data/glusterfs/home/brick1   49156     0          Y       22339
Brick server8:/data/glusterfs/home/brick1   49153     0          Y       17992
Brick server7:/data/glusterfs/home/brick2   49157     0          Y       22347
Brick server8:/data/glusterfs/home/brick2   49154     0          Y       18546
NFS Server on localhost                     2049      0          Y       683
Self-heal Daemon on localhost               N/A       N/A        Y       693
NFS Server on server8                       2049      0          Y       18553
Self-heal Daemon on server8                 N/A       N/A        Y       18566
NFS Server on server5                       2049      0          Y       23115
Self-heal Daemon on server5                 N/A       N/A        Y       23121
NFS Server on server7                       2049      0          Y       4201
Self-heal Daemon on server7                 N/A       N/A        Y       4210
NFS Server on server3                       2049      0          Y       5460
Self-heal Daemon on server3                 N/A       N/A        Y       5469
NFS Server on server6                       2049      0          Y       22709
Self-heal Daemon on server6                 N/A       N/A        Y       22718
NFS Server on server4                       2049      0          Y       6044
Self-heal Daemon on server4                 N/A       N/A        Y       6243

Server 2 is currently powered off, as we are waiting for a replacement RAID controller; the same goes for server4:/data/glusterfs/home/brick4.

And as I said, there is a rebalance in progress:
> gluster rebalance home status
Node       Rebalanced-files      size    scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  --------  ---------  --------  -------  -----------  -----------------
localhost             42083    23.3GB    1568065      1359   303734  in progress           16:49:31
server5               35698    23.8GB    1027934         0   240748  in progress           16:49:23
server4               35096    23.4GB     899491         0   229064  in progress           16:49:18
server3               27031    18.0GB     701759         8   182592  in progress           16:49:27
server8                   0    0Bytes     327602         0      805  in progress           16:49:18
server6               35672    23.9GB    1028469         0   240810  in progress           16:49:17
server7                   1   45Bytes         53         0        0    completed            0:03:53
Estimated time left for rebalance to complete : 359739:51:24
volume rebalance: home: success

Thanks,

A.

On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
> What is server4? You just mentioned server1 and server2 previously.
> Can you post the output of gluster v status volname
>
> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> > Hi,
> >
> > Thanks. However "gluster v heal volname full" returned the following error message:
> > Commit failed on server4. Please check log file for details.
> >
> > I have checked the log files in /var/log/glusterfs on server4 (by grepping for heal), but did not get any match. What should I be looking for, and in which log file, please?
> >
> > Note that there is currently a rebalance process running on the volume.
> >
> > Many thanks,
> >
> > A.
> >
> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> >> You do not need to reset the brick if the brick path does not change. Replace
> >> the brick, format and mount it, then gluster v start volname force.
> >> To start self-heal, just run gluster v heal volname full.
> >>
> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> >> > Hi,
> >> >
> >> > My volume home is configured in replicate mode (version 3.12.4) with the bricks
> >> > server1:/data/gluster/brick1
> >> > server2:/data/gluster/brick1
> >> >
> >> > server2:/data/gluster/brick1 was corrupted, so I killed the gluster daemon for
> >> > that brick on server2, unmounted it, reformatted it, remounted it and did a
> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> >> server2:/data/gluster/brick1 commit force
> >> >
> >> > I was expecting that the self-heal daemon would start copying data from
> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> >> > server2:/data/gluster/brick1, which it only did for directories, but not
> >> > for files.
> >> >
> >> > For the moment, I launched on the FUSE mount point
> >> >> find . | xargs stat
> >> > but crawling the whole volume (100 TB) to trigger self-healing of a single
> >> > brick of 7.4 TB is inefficient.
> >> >
> >> > Is there any trick to self-heal only a single brick, for example by setting
> >> > some attributes on its top directory?
> >> >
> >> > Many thanks,
> >> >
> >> > Alessandro
> >> >
> >> > _______________________________________________
> >> > Gluster-users mailing list
> >> > Gluster-users at gluster.org
> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> >
> > Dr. Ir. Alessandro Ipe
> > Department of Observations      Tel. +32 2 373 06 31
> > Remote Sensing from Space
> > Royal Meteorological Institute
> > Avenue Circulaire 3             Email:
> > B-1180 Brussels   Belgium       Alessandro.Ipe at meteo.be
> > Web: http://gerb.oma.be

--

Dr. Ir. Alessandro Ipe
Department of Observations      Tel. +32 2 373 06 31
Remote Sensing from Space
Royal Meteorological Institute
Avenue Circulaire 3             Email:
B-1180 Brussels   Belgium       Alessandro.Ipe at meteo.be
Web: http://gerb.oma.be
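For the procedure discussed in the quoted messages above, a minimal sketch of the daemon-driven alternative to crawling the FUSE mount with find | xargs stat. It assumes the volume is named home and the rebuilt brick kept its original path (server2:/data/gluster/brick1, as in the quoted thread); adapt the names before running anything.

# on server2, after the brick filesystem has been reformatted and remounted:
# restart the brick process without changing the volume layout
gluster volume start home force

# let the self-heal daemon crawl and repair from the healthy replica
gluster volume heal home full

# monitor progress instead of stat-ing the whole 100 TB volume
gluster volume heal home info
gluster volume heal home statistics heal-count

The heal commands operate per volume, but the self-heal daemon only copies entries that are missing or stale on the rebuilt brick, so in effect only the 7.4 TB brick gets rewritten.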
Serkan Çoban
2018-Feb-02 10:38 UTC
[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
If I were you, I would follow these steps. Stop the rebalance and fix the cluster health first. Bring up the down server, replace server4:brick4 with a new disk, format it and make sure the brick is started, then start a full heal. Without all bricks up, a full heal will not start. Then you can continue with the rebalance.

On Fri, Feb 2, 2018 at 1:27 PM, Alessandro Ipe <Alessandro.Ipe at meteo.be> wrote:
> Hi,
>
> I simplified the config in my first email, but I actually have 2x4 servers in a distributed-replicated setup, with 4 bricks each on 6 of them and 2 bricks each on the remaining 2. Full healing will just take ages... for just a single brick to resync!
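A minimal sketch of the order of operations suggested above, assuming the volume name home and the brick paths from the status output; the disk replacement, formatting and mounting on server2 and server4 happen outside gluster and depend on the local setup.

# pause the running rebalance before touching bricks
gluster volume rebalance home stop

# bring server2 back online and replace/format/remount the disk behind
# server4:/data/glusterfs/home/brick4, then restart the brick processes
gluster volume start home force
gluster volume status home      # every brick should now show Online = Y

# with all bricks up, launch the full heal and wait for it to drain
gluster volume heal home full
gluster volume heal home info

# once pending heals are gone, resume the rebalance
gluster volume rebalance home start

If the rebalance is started again later, it begins a new scan of the volume, so the counters shown earlier will reset.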