On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:

> 2017-05-01 20:43 GMT+02:00 Shyam <srangana at redhat.com>:
> > I do agree that for the duration a brick is replaced its replication count
> > is down by 1, is that your concern? In which case I do note that without (a)
> > above, availability is at risk during the operation. Which needs other
> > strategies/changes to ensure tolerance to errors/faults.
>
> Oh, yes, I had forgotten this too.
>
> I don't know Ceph, but Lizard, when moving chunks across the cluster,
> does a copy, not a move.
> During the whole operation you'll end up with some files/chunks
> replicated more than the requirement.

Replace-brick as a command is implemented with the goal of replacing a
disk that went bad, so availability was already reduced. In 2013-2014 I
proposed that we do it by adding a brick to just that replica set and
increasing its replica count for that set alone; once heal is complete we
could remove the old brick. At that point I didn't see any benefit to the
approach, because availability was already down by 1. But with all of this
discussion it seems like a good time to revive this idea. I saw that Shyam
suggested the same in the PR he mentioned before.

> If you have replica 3, during the move some files get replica 4.
> In Gluster the same operation will bring you to replica 2.
>
> IMHO, this isn't a viable/reliable solution.
>
> Any chance to change "replace-brick" to increase the replica count
> during the operation?

It can be done. We just need to find time to do this.

--
Pranith
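For illustration, the temporary replica increase Pranith describes can be
approximated today with existing commands, though only volume-wide, i.e. on a
volume with a single replica set; the per-set variant he proposes is what
would make this workable on distributed-replicate volumes. A minimal sketch,
assuming a hypothetical replica 3 volume named "testvol" and made-up
server/brick names:

    # Add the replacement brick first, temporarily raising the replica count to 4.
    gluster volume add-brick testvol replica 4 newserver:/bricks/b1

    # Wait until self-heal reports no pending entries for the new brick.
    gluster volume heal testvol info

    # Drop back to replica 3 by removing the brick being retired.
    gluster volume remove-brick testvol replica 3 oldserver:/bricks/b1 force

At no point in this sequence does the file drop below three good copies,
which is the property the thread is asking replace-brick itself to provide.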
On 05/01/2017 02:55 PM, Pranith Kumar Karampuri wrote:

> On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
> > 2017-05-01 20:43 GMT+02:00 Shyam <srangana at redhat.com>:
> > > I do agree that for the duration a brick is replaced its replication count
> > > is down by 1, is that your concern? In which case I do note that without (a)
> > > above, availability is at risk during the operation. Which needs other
> > > strategies/changes to ensure tolerance to errors/faults.
> >
> > Oh, yes, I had forgotten this too.
> >
> > I don't know Ceph, but Lizard, when moving chunks across the cluster,
> > does a copy, not a move.
> > During the whole operation you'll end up with some files/chunks
> > replicated more than the requirement.
>
> Replace-brick as a command is implemented with the goal of replacing a
> disk that went bad, so availability was already reduced. In 2013-2014 I
> proposed that we do it by adding a brick to just that replica set and
> increasing its replica count for that set alone; once heal is complete we
> could remove the old brick. At that point I didn't see any benefit to the
> approach, because availability was already down by 1. But with all of this
> discussion it seems like a good time to revive this idea. I saw that Shyam
> suggested the same in the PR he mentioned before.

Ah! I did not know this, thanks. Yes, in essence this is what I suggest,
but at that time (13-14) I guess we did not have EC, so in the current
proposal I include EC and also ways to deal with pure-distribute-only
environments, using the same/similar scheme.

> > If you have replica 3, during the move some files get replica 4.
> > In Gluster the same operation will bring you to replica 2.
> >
> > IMHO, this isn't a viable/reliable solution.
> >
> > Any chance to change "replace-brick" to increase the replica count
> > during the operation?
>
> It can be done. We just need to find time to do this.

Agreed, and to add to this point and reiterate: we are looking at "+1
scaling", and this discussion helps in attempting to converge on a lot of
the why's, if not necessarily the how's. So, Gandalf, it will be part of
the roadmap; just when we may be able to pick this up and deliver it is not
clear yet (as Pranith puts it as well).

> --
> Pranith
On Mon, May 1, 2017 at 2:55 PM, Pranith Kumar Karampuri
<pkarampu at redhat.com> wrote:

> On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
> > 2017-05-01 20:43 GMT+02:00 Shyam <srangana at redhat.com>:
> > > I do agree that for the duration a brick is replaced its replication count
> > > is down by 1, is that your concern? In which case I do note that without (a)
> > > above, availability is at risk during the operation. Which needs other
> > > strategies/changes to ensure tolerance to errors/faults.
> >
> > Oh, yes, I had forgotten this too.
> >
> > I don't know Ceph, but Lizard, when moving chunks across the cluster,
> > does a copy, not a move.
> > During the whole operation you'll end up with some files/chunks
> > replicated more than the requirement.
>
> Replace-brick as a command is implemented with the goal of replacing a
> disk that went bad, so availability was already reduced. In 2013-2014 I
> proposed that we do it by adding a brick to just that replica set and
> increasing its replica count for that set alone; once heal is complete we
> could remove the old brick. At that point I didn't see any benefit to the
> approach, because availability was already down by 1. But with all of this
> discussion it seems like a good time to revive this idea. I saw that Shyam
> suggested the same in the PR he mentioned before.

The ability to increase and decrease the replication count within a
replica set would be pretty cool. In addition to replace-brick, workloads
that need elasticity to serve reads can benefit from more replicas to
provide load balancing. Once the load is back to normal, we can cull the
temporary brick.

We might also want to start thinking about spare bricks that can be
brought into a volume based on some policy. For example, if the posix
health checker determines that the underlying storage stack has problems,
we can bring a spare brick into the volume to replace the failing brick.
More policies can be evolved to trigger the action of bringing a spare
brick into a volume.

-Vijay
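The spare-brick policy Vijay floats does not exist in Gluster today. As a
purely hypothetical sketch of what such a policy hook might look like, the
script below swaps in a pre-provisioned spare when a brick is reported
offline. Volume, brick, and spare names are made up, and the parsing assumes
the usual plain-text layout of "gluster volume status"; a real implementation
would more likely consume the --xml output or the posix health-checker state
directly.

    #!/bin/sh
    # Hypothetical spare-brick policy check; not an existing Gluster feature.
    VOL=testvol
    BAD=server1:/bricks/b1        # brick being watched
    SPARE=server1:/bricks/spare1  # pre-provisioned spare brick

    # The "Online" column is the second-to-last field of the brick's status line.
    ONLINE=$(gluster volume status "$VOL" | awk -v b="$BAD" '$0 ~ b {print $(NF-1)}')

    if [ "$ONLINE" = "N" ]; then
        # Swap the failing brick for the spare; self-heal then repopulates it.
        gluster volume replace-brick "$VOL" "$BAD" "$SPARE" commit force
    fi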
2017-05-01 20:55 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:

> Replace-brick as a command is implemented with the goal of replacing a
> disk that went bad, so availability was already reduced. In 2013-2014 I
> proposed that we do it by adding a brick to just that replica set and
> increasing its replica count for that set alone; once heal is complete we
> could remove the old brick. At that point I didn't see any benefit to the
> approach, because availability was already down by 1. But with all of this
> discussion it seems like a good time to revive this idea. I saw that Shyam
> suggested the same in the PR he mentioned before.

Why is availability already less? replace-brick is useful for adding new
disks (as we are discussing here) or if you have to preventively
replace/decommission a disk. If you have disks that are getting older and
older, you can safely replace them one by one with replace-brick.

Doing it that way keeps you at the desired redundancy for the whole phase.
If you just remove the older disk and let Gluster heal, you lose one
replica, and during the heal another disk could fail, and so on.

The same applies to any RAID. If possible, adding the new disk and then
removing the older one is better than brutally replacing disks. mdadm,
with its replace operation (and I think ZFS as well), adds the new disk
while keeping full redundancy, and only after the replacement is done is
the older disk decommissioned.

I don't see any drawback in doing this even with Gluster, only advantages.
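For reference, the md RAID flow Gandalf alludes to looks roughly like the
following (array and device names are just examples). The array keeps full
redundancy throughout because the new disk is added and populated before the
old one is failed out:

    # Add the new disk as a spare, then rebuild onto it in place.
    mdadm /dev/md0 --add /dev/sdc1
    mdadm /dev/md0 --replace /dev/sda1 --with /dev/sdc1

    # When the copy completes, md marks the old member faulty; it can then be removed.
    mdadm /dev/md0 --remove /dev/sda1

The "replace-brick keeps the old brick until the new one is healed" proposal
in this thread is the Gluster analogue of that sequence.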
On 05/01/2017 11:55 AM, Pranith Kumar Karampuri wrote:

> On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
> > 2017-05-01 20:43 GMT+02:00 Shyam <srangana at redhat.com>:
> > > I do agree that for the duration a brick is replaced its replication count
> > > is down by 1, is that your concern? In which case I do note that without (a)
> > > above, availability is at risk during the operation. Which needs other
> > > strategies/changes to ensure tolerance to errors/faults.
> >
> > Oh, yes, I had forgotten this too.
> >
> > I don't know Ceph, but Lizard, when moving chunks across the cluster,
> > does a copy, not a move.
> > During the whole operation you'll end up with some files/chunks
> > replicated more than the requirement.
>
> Replace-brick as a command is implemented with the goal of replacing a
> disk that went bad, so availability was already reduced. In 2013-2014 I
> proposed that we do it by adding a brick to just that replica set and
> increasing its replica count for that set alone; once heal is complete we
> could remove the old brick. At that point I didn't see any benefit to the
> approach, because availability was already down by 1. But with all of this
> discussion it seems like a good time to revive this idea. I saw that Shyam
> suggested the same in the PR he mentioned before.

I've always been against the idea of running a replica down based on that
supposition. I've never had to replace-brick because a brick failed; it has
always been for reconfiguration reasons. Good monitoring and analysis can
predict drive failures in plenty of time to replace a still-functioning
brick.

> > If you have replica 3, during the move some files get replica 4.
> > In Gluster the same operation will bring you to replica 2.
> >
> > IMHO, this isn't a viable/reliable solution.
> >
> > Any chance to change "replace-brick" to increase the replica count
> > during the operation?
>
> It can be done. We just need to find time to do this.
>
> --
> Pranith