Hi everyone,

I've been using Gluster for a few months now, on a simple two-peer replicated setup, 22 TB each.

One of the peers was offline for 10 hours last week (RAID resync after a disk crash), and while my Gluster server was healing bricks, I could see write errors on my Gluster clients.

I couldn't find a way to isolate the healing peer, in the documentation or anywhere else.

Is there a way to avoid that? Detach the peer while healing? Some tuning on the client side, maybe?

I'm using Gluster 3.9 on Debian 8.

Thank you for your help.

Quentin
Hi,

There is no way to isolate the healing peer. Healing happens from the good brick to the bad brick, and I assume your replica bricks are on different peers, so isolating the healing peer would stop the healing process itself.

What error are you getting while writing? To debug the issue, it would help if you could provide the output of the following commands:

    gluster volume info <vol_name>
    gluster volume heal <vol_name> info

And also the client and heal logs.

Thanks & Regards,
Karthik
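For reference, here is roughly how that information could be gathered (a sketch; "myvol" stands in for the actual volume name, and the log locations assume a default package install):

    # volume layout and current heal backlog, run on any server
    gluster volume info myvol
    gluster volume heal myvol info

    # client log: on each client, the log file is named after the
    # mount point, e.g. a mount at /mnt/gluster logs to
    #   /var/log/glusterfs/mnt-gluster.log
    # heal log: on each server, the self-heal daemon logs to
    #   /var/log/glusterfs/glustershd.log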
That makes sense ^_^

Unfortunately I haven't kept the interesting data you need. Basically, I had write errors on my Gluster clients when my monitoring tool tested mkdir and file creation.

The server's load was huge during the healing (CPU at 100%), and the disk latency increased a lot. That may be the source of my write errors; we'll know for sure next time. I'll keep and post all the data you asked for.

Is there no way on the client side to force the Gluster mount onto one peer?

Thanks for your help, Karthik!

Quentin
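On the mount question: with the native FUSE client, the mount cannot be pinned to a single peer, because the client opens connections to every brick directly; the server named at mount time is only used to fetch the volume file. What might help instead is turning off client-side healing, so that clients are not drawn into the heal work while the self-heal daemon on the servers keeps healing. A sketch, assuming a volume named "myvol"; the effect should be verified on 3.9:

    # stop clients from performing heals themselves; glustershd still heals
    gluster volume set myvol cluster.data-self-heal off
    gluster volume set myvol cluster.metadata-self-heal off
    gluster volume set myvol cluster.entry-self-heal off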
You have replica 2, so you can't really take 50% of your cluster down without turning off quorum (and risking split brain). So detaching the rebuilding peer is really not an option.

If you had replica 3 or an arbiter, you CAN detach or isolate the problem peer. I've done things like change the Gluster network IP on the 'bad' peer to help speed up a RAID6 rebuild that wasn't happy with the Gluster heal process going on at the same time. Your data will still be available and fully functional on the remaining peers (though you lose redundancy). Then, once the RAID rebuild has caught up, you can return the peer to the cluster and do a final 'heal'.

-bill
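For completeness, the return-and-heal step might look something like this (a rough sketch rather than Bill's exact procedure; "myvol" is a placeholder):

    gluster peer status               # confirm the rebuilt peer is connected again
    gluster volume heal myvol full    # crawl the whole volume, not just tracked entries
    gluster volume heal myvol info    # watch the pending-heal count drain to zero

    # and the quorum knob mentioned above, on replica 2 (risky, per the
    # split-brain caveat):
    #   gluster volume set myvol cluster.quorum-type none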