Shanon Swafford
2020-Jul-06 20:32 UTC
[Gluster-users] Restore a replica after failed hardware
Hi guys,

I lost a brick in a 2x replicated system. The volume is 17TB with 9TB used
(small files). 3 drives failed in 2 hours in a RAID-5 array.

Gluster version: 3.8.15

So "reset-brick" isn't available on this version.

I've googled all weekend and I'm overwhelmed, so I'd like to verify before I
muck everything up.

Is this the correct procedure to restore the failed brick?

# Replace drive
# Use parted to create /dev/sdb1
# Make xfs filesystem on /dev/sdb1
# Mount /var/glusterfs/sdb1
# gluster volume replace-brick myvol d-es2-nfs-a:/var/glusterfs/sdb1/myvol d-es2-nfs-a:/var/glusterfs/sdb1/myvol commit force

I read about using different brick names, but again, I'm overwhelmed with all
the info on Google.

I also saw something as simple as removing the failed brick and re-adding it
as new, but...

Now I just read about xfsdump | xfsrestore to preload, but how would that
work with healing?

Thanks a ton in advance.

Shanon


[root@es2-nfs-a ~]# parted /dev/sdb print
Error: /dev/sdb: unrecognised disk label
Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 18.2TB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

[root@es2-nfs-a ~]# grep sdb /etc/fstab
/dev/sdb1    /var/glusterfs/sdb1    xfs    inode64    0 0

[root@es2-nfs-a ~]# gluster volume info

Volume Name: myvol
Type: Replicate
Volume ID: 49fd2a63-f887-4478-9242-69030a7a565d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: d-es2-nfs-a:/var/glusterfs/sdb1/myvol
Brick2: d-es2-nfs-b:/var/glusterfs/sdb1/myvol
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.cache-size: 1GB

[root@es2-nfs-a ~]# gluster volume status
Status of volume: myvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick d-es2-nfs-a:/var/glusterfs/sdb1/myvol
Brick d-es2-nfs-b:/var/glusterfs/sdb1/myvol
Self-heal Daemon on localhost               N/A       N/A        Y       2475
Self-heal Daemon on d-es2-nfs-b             N/A       N/A        Y       8663

Task Status of Volume myvol
------------------------------------------------------------------------------
There are no active volume tasks
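A minimal sketch of the rebuild steps listed above, as commands. It assumes the replace-brick-to-the-same-path approach the thread is discussing works on 3.8; the device, mount point, and volume name are taken from the post, while the GPT label, the 512-byte inode size (a common recommendation for Gluster bricks), and the final heal commands are assumptions, not verified against this setup:

    # Partition the replaced drive and put XFS on it (GPT assumed since the
    # disk is larger than 2TB; inode64 already comes from the fstab entry)
    parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
    mkfs.xfs -i size=512 /dev/sdb1
    mount /var/glusterfs/sdb1

    # Recreate the brick directory and re-seed it into the volume,
    # reusing the same brick path with "commit force"
    mkdir -p /var/glusterfs/sdb1/myvol
    gluster volume replace-brick myvol \
        d-es2-nfs-a:/var/glusterfs/sdb1/myvol \
        d-es2-nfs-a:/var/glusterfs/sdb1/myvol \
        commit force

    # Trigger and then monitor the self-heal from the surviving replica
    gluster volume heal myvol full
    gluster volume heal myvol info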
Strahil Nikolov
2020-Jul-07 04:02 UTC
[Gluster-users] Restore a replica after failed hardware
It looks OK. Usually the docs describe 'replace-brick source destination start', which stops the brick process, and that is how I do it.

Also, you should consider:
- adding 'noatime' to the mount options
- checking your stripe width (stride multiplied by the number of data disks) and then creating the XFS filesystem with the options needed to align it properly

I have never used xfsdump to recover a brick. Just ensure the gluster brick process is not running on the node during the restore.

Best Regards,
Strahil Nikolov
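To illustrate the alignment suggestion above, a minimal sketch assuming a hypothetical RAID-5 layout of 8 data disks with a 256 KiB stride; the real values must be read from the PERC H700 virtual-disk configuration before running anything like this. mkfs.xfs takes the stripe unit as su (the stride) and the stripe width as sw (the number of data disks), and noatime is added to the existing fstab options:

    # Hypothetical geometry: 256k stride, 8 data disks -- replace with the
    # actual values from the controller
    mkfs.xfs -i size=512 -d su=256k,sw=8 /dev/sdb1

    # Mount and confirm the reported sunit/swidth match the array geometry
    mount /var/glusterfs/sdb1
    xfs_info /var/glusterfs/sdb1

    # fstab entry with noatime added, as suggested:
    # /dev/sdb1    /var/glusterfs/sdb1    xfs    inode64,noatime    0 0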