Shanon Swafford
2020-Jul-06 20:32 UTC
[Gluster-users] Restore a replica after failed hardware
Hi guys,

I lost a brick in a 2x replicated system. The volume is 17TB with 9TB used
(small files). 3 drives failed in 2 hours in a RAID-5 array.

Gluster version: 3.8.15

So "reset-brick" isn't available on this version.

I've googled all weekend and I'm overwhelmed, so I'd like to verify before I
muck everything up.

Is this the correct procedure to restore the failed brick?

# Replace drive
# Use parted to create /dev/sdb1
# Make xfs filesystem on /dev/sdb1
# Mount /var/glusterfs/sdb1
# gluster volume replace-brick myvol d-es2-nfs-a:/var/glusterfs/sdb1/myvol d-es2-nfs-a:/var/glusterfs/sdb1/myvol commit force

I read about using different brick names, but again, I'm overwhelmed with all
the info on Google.

I also saw something as simple as removing the failed brick and re-adding it
as new, but...

Now I just read about xfsdump | xfsrestore to preload, but how would that
work with healing?

Thanks a ton in advance.

Shanon


[root@es2-nfs-a ~]# parted /dev/sdb print
Error: /dev/sdb: unrecognised disk label
Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 18.2TB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

[root@es2-nfs-a ~]# grep sdb /etc/fstab
/dev/sdb1    /var/glusterfs/sdb1    xfs    inode64    0 0

[root@es2-nfs-a ~]# gluster volume info

Volume Name: myvol
Type: Replicate
Volume ID: 49fd2a63-f887-4478-9242-69030a7a565d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: d-es2-nfs-a:/var/glusterfs/sdb1/myvol
Brick2: d-es2-nfs-b:/var/glusterfs/sdb1/myvol
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.cache-size: 1GB

[root@es2-nfs-a ~]# gluster volume status
Status of volume: myvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick d-es2-nfs-a:/var/glusterfs/sdb1/myvol
Brick d-es2-nfs-b:/var/glusterfs/sdb1/myvol
Self-heal Daemon on localhost               N/A       N/A        Y       2475
Self-heal Daemon on d-es2-nfs-b             N/A       N/A        Y       8663

Task Status of Volume myvol
------------------------------------------------------------------------------
There are no active volume tasks
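A minimal sketch of the rebuild steps listed above, as commands. It assumes the replace-brick-to-the-same-path approach the thread is discussing works on 3.8; the device, mount point, and volume name are taken from the post, while the GPT label, the 512-byte inode size (a common recommendation for Gluster bricks), and the final heal commands are assumptions, not verified against this setup:

    # Partition the replaced drive and put XFS on it (GPT assumed since the
    # disk is larger than 2TB; inode64 already comes from the fstab entry)
    parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
    mkfs.xfs -i size=512 /dev/sdb1
    mount /var/glusterfs/sdb1

    # Recreate the brick directory and re-seed it into the volume,
    # reusing the same brick path with "commit force"
    mkdir -p /var/glusterfs/sdb1/myvol
    gluster volume replace-brick myvol \
        d-es2-nfs-a:/var/glusterfs/sdb1/myvol \
        d-es2-nfs-a:/var/glusterfs/sdb1/myvol \
        commit force

    # Trigger and then monitor the self-heal from the surviving replica
    gluster volume heal myvol full
    gluster volume heal myvol info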
Strahil Nikolov
2020-Jul-07 04:02 UTC
[Gluster-users] Restore a replica after failed hardware
It looks OK. Usually the docs describe 'replace-brick source destination start', which stops the brick process, and that is how I do it.

Also, you should consider:
- adding 'noatime' to the mount options
- checking your stripe width (stride multiplied by the number of data disks) and then creating the XFS filesystem with the options needed to align it properly

I have never used xfsdump to recover a brick. Just ensure the gluster brick process is not running on the node during the restore.

Best Regards,
Strahil Nikolov
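To illustrate the alignment suggestion above, a minimal sketch assuming a hypothetical RAID-5 layout of 8 data disks with a 256 KiB stride; the real values must be read from the PERC H700 virtual-disk configuration before running anything like this. mkfs.xfs takes the stripe unit as su (the stride) and the stripe width as sw (the number of data disks), and noatime is added to the existing fstab options:

    # Hypothetical geometry: 256k stride, 8 data disks -- replace with the
    # actual values from the controller
    mkfs.xfs -i size=512 -d su=256k,sw=8 /dev/sdb1

    # Mount and confirm the reported sunit/swidth match the array geometry
    mount /var/glusterfs/sdb1
    xfs_info /var/glusterfs/sdb1

    # fstab entry with noatime added, as suggested:
    # /dev/sdb1    /var/glusterfs/sdb1    xfs    inode64,noatime    0 0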