thr3ads.net - Gluster users - [Gluster-users] rebalance and volume commit hash [Jan 2017]

Piotr Misiak

2017-Jan-17 13:42 UTC

[Gluster-users] rebalance and volume commit hash

On 17 Jan 2017 14:28, Jeff Darcy <jdarcy at redhat.com> wrote:
>
> > Can you tell me please why every volume rebalance generates a new
> > value for the volume commit hash?
> > 
> > If I have a fully rebalanced cluster (or almost) with millions of
> > directories, then rebalance has to change the DHT xattr for every directory
> > only because there is a new volume commit hash value. It is pointless in
> > my opinion. Is there any reason behind this? As I observed, the volume
> > commit hash is set at the beginning of the rebalance, which totally destroys
> > the benefit of the lookup optimization algorithm for directories not
> > yet scanned/fixed by this rebalance run.
>
> It disables the optimization because the optimization would no longer
> lead to correct results. There are plenty of distributed filesystems
> that seem to have "fast but wrong" as a primary design goal; we're
> not one of them.
>
> The best way to think of the volume-commit-hash update is as a kind of
> cache invalidation. Lookup optimization is only valid as long as we
> know that the actual distribution of files within a directory is
> consistent with the current volume topology. That ceases to be the
> case as soon as we add or remove a brick, leaving us with three choices.
>
> (1) Don't do lookup optimization at all. *Every* time we fail to find
> a file on the brick where hashing says it should be, look *everywhere*
> else. That's how things used to work, and still work if lookup
> optimization is disabled. The drawback is that every add/remove brick
> operation causes a permanent and irreversible degradation of lookup
> performance. Even on a freshly created volume, lookups for files that
> don't exist anywhere will cause every brick to be queried.
>
> (2) Mark every directory as "unoptimized" at the very beginning of
> rebalance. Besides being almost as slow as fix-layout itself, this
> would require blocking all lookups and other directory operations
> *anywhere in the volume* while it completes.
>
> (3) Change the volume commit hash, effectively marking every 
> directory as unoptimized without actually having to touch every one. 
> The root-directory operation is cheap and almost instantaneous. 
> Checking each directory commit hash isn't free, but it's still a 
> lot better than (1) above. With upcalls we can enhance this even
> further. 
>
> Now that you know a bit more about the tradeoffs, do "pointless" 
> and "destroys the benefit" still seem accurate? 
>
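The difference between choices (1) and (3) above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not GlusterFS's DHT code: the `Volume` and `Directory` classes, the `hashed_brick` placement function and the `lookup` helper are all hypothetical names invented for illustration. It assumes that a miss on the hashed brick can be trusted only when the directory's stored commit hash equals the volume's current one.

```python
from dataclasses import dataclass, field
import zlib


@dataclass
class Directory:
    # Commit hash stamped on the directory when its layout was last fixed.
    commit_hash: int
    # Toy stand-in for brick contents: brick index -> set of file names.
    files: dict = field(default_factory=dict)


@dataclass
class Volume:
    commit_hash: int
    brick_count: int


def hashed_brick(name: str, brick_count: int) -> int:
    """Hypothetical placement function: the brick that *should* hold name."""
    return zlib.crc32(name.encode()) % brick_count


def lookup(vol: Volume, d: Directory, name: str, optimize: bool = True):
    """Return (brick index or None, number of bricks queried)."""
    target = hashed_brick(name, vol.brick_count)
    if name in d.files.get(target, set()):
        return target, 1
    # Miss on the hashed brick. Trust the miss only if this directory's
    # layout is known to match the current state of the volume.
    if optimize and d.commit_hash == vol.commit_hash:
        return None, 1
    # Otherwise fall back to querying every other brick.
    queried = 1
    for brick in range(vol.brick_count):
        if brick == target:
            continue
        queried += 1
        if name in d.files.get(brick, set()):
            return brick, queried
    return None, queried
```

In this model, a lookup for a nonexistent name costs one query while the hashes match, and costs one query per brick once the volume's hash has been changed or when `optimize` is off.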
Thank you Jeff for your response. I understand this optimisation clearly, but I
don't understand why a new commit hash is generated for the volume during the
rebalance process. I think it should be generated only during add/remove brick
events, not during rebalance.

Thanks

Jeff Darcy

2017-Jan-17 14:10 UTC

[Gluster-users] rebalance and volume commit hash

> I don't understand why a new commit hash is generated for the volume
> during the rebalance process. I think it should be generated only during
> add/remove brick events, not during rebalance.
The mismatch only becomes important during rebalance.  Prior to that, even
if we've added or removed a brick, the layouts haven't changed and the
optimization is still as valid as it was before.  If there are multiple
add/remove operations, we don't need or want to change the hash between
them.  Conversely, there are cases besides add/remove brick where we might
want to do a rebalance - e.g. after replace-brick with a brick of a
different size, or to change between total-space vs. free-space weighting.
Changing the hash in add/remove brick doesn't handle these cases, but
changing it at the start of rebalance does.
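The ordering described here can be sketched as a toy state machine. Everything in it is an assumption made for illustration (the `ToyVolume` class and its method names are hypothetical and do not correspond to real GlusterFS code or CLI commands). It only models the idea that add-brick leaves the hash alone, starting a rebalance changes it once, and each directory becomes "optimized" again when rebalance reaches it.

```python
class ToyVolume:
    """Hypothetical model of when the volume commit hash changes."""

    def __init__(self, bricks, dirs):
        self.bricks = list(bricks)
        self.commit_hash = 1
        # Every directory starts out consistent with the volume.
        self.dir_hash = {d: self.commit_hash for d in dirs}

    def add_brick(self, brick):
        # The topology changes, but no directory layout is rewritten yet,
        # so the commit hash is deliberately left untouched.
        self.bricks.append(brick)

    def start_rebalance(self):
        # One cheap update on the volume invalidates every directory at
        # once, whatever the reason for the rebalance was.
        self.commit_hash += 1

    def fix_layout(self, d):
        # Rebalance has reached this directory: stamp the current hash.
        self.dir_hash[d] = self.commit_hash

    def is_optimized(self, d):
        return self.dir_hash[d] == self.commit_hash


vol = ToyVolume(["b1", "b2"], ["/a", "/b"])
vol.add_brick("b3")
vol.add_brick("b4")                 # several add-bricks, still one hash
still_valid = vol.is_optimized("/a")
vol.start_rebalance()               # every directory is now invalidated
vol.fix_layout("/a")                # "/a" is fixed, "/b" is still pending
```

In this model `still_valid` is true after both add-brick calls, and after the partial rebalance only `/a` counts as optimized.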