On 17 Jan 2017 14:28, Jeff Darcy <jdarcy at redhat.com> wrote:
> > Can you tell me please why every volume rebalance generates a new
> > value for the volume commit hash?
> >
> > If I have a fully (or almost fully) rebalanced cluster with millions
> > of directories, then rebalance has to change the DHT xattr for every
> > directory only because there is a new volume commit hash value. It is
> > pointless in my opinion. Is there any reason behind this? As I
> > observed, the volume commit hash is set at the beginning of rebalance,
> > which totally destroys the benefit of the lookup optimization
> > algorithm for directories not yet scanned/fixed by this rebalance run.
>
> It disables the optimization because the optimization would no longer
> lead to correct results.  There are plenty of distributed filesystems
> that seem to have "fast but wrong" as a primary design goal; we're
> not one of them.
>
> The best way to think of the volume-commit-hash update is as a kind of
> cache invalidation.  Lookup optimization is only valid as long as we
> know that the actual distribution of files within a directory is
> consistent with the current volume topology.  That ceases to be the
> case as soon as we add or remove a brick, leaving us with three choices.
>
> (1) Don't do lookup optimization at all.  *Every* time we fail to find
> a file on the brick where hashing says it should be, look *everywhere*
> else.  That's how things used to work, and still work if lookup
> optimization is disabled.  The drawback is that every add/remove brick
> operation causes a permanent and irreversible degradation of lookup
> performance.  Even on a freshly created volume, lookups for files that
> don't exist anywhere will cause every brick to be queried.
>
> (2) Mark every directory as "unoptimized" at the very beginning of
> rebalance.  Besides being almost as slow as fix-layout itself, this
> would require blocking all lookups and other directory operations
> *anywhere in the volume* while it completes.
>
> (3) Change the volume commit hash, effectively marking every
> directory as unoptimized without actually having to touch every one.
> The root-directory operation is cheap and almost instantaneous.
> Checking each directory commit hash isn't free, but it's still a
> lot better than (1) above.  With upcalls we can enhance this even
> further.
>
> Now that you know a bit more about the tradeoffs, do "pointless"
> and "destroys the benefit" still seem accurate?
>
Thank you Jeff for your response. I understand this optimisation clearly,
but I don't understand why a new commit hash is generated for the volume
during the rebalance process. I think it should be generated only on
add/remove brick events, not during rebalance.
Thanks
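
For readers following the thread, option (3) above can be pictured with a
small toy model. This is only a sketch of the cache-invalidation idea as Jeff
describes it, not GlusterFS code: the class ToyVolume, its method names, and
the plain modulo placement over the brick count are all assumptions made for
illustration (real DHT placement uses per-directory layouts stored in xattrs).
The sketch also assumes start_rebalance() is called right after add_brick()
and does not model the window between the two.

```python
import zlib


class ToyVolume:
    """Toy model of commit-hash lookup optimization; not GlusterFS code."""

    def __init__(self, brick_count):
        # Each brick is a set of (directory, name) entries.
        self.bricks = [set() for _ in range(brick_count)]
        self.commit_hash = 1
        # Commit hash each directory carried when it was last known consistent.
        self.dir_commit_hash = {}

    def hashed_brick(self, name):
        # Assumption: plain modulo placement over the current brick count.
        return zlib.crc32(name.encode()) % len(self.bricks)

    def mkdir(self, directory):
        self.dir_commit_hash[directory] = self.commit_hash

    def create(self, directory, name):
        self.bricks[self.hashed_brick(name)].add((directory, name))

    def add_brick(self):
        self.bricks.append(set())

    def start_rebalance(self):
        # Option (3): one cheap update makes every directory's stored
        # commit hash stale, without touching any directory.
        self.commit_hash += 1

    def fix_directory(self, directory):
        # Move each entry to the brick hashing now selects, then stamp the
        # directory with the current volume commit hash.
        for index, brick in enumerate(self.bricks):
            for entry in [e for e in brick if e[0] == directory]:
                target = self.hashed_brick(entry[1])
                if target != index:
                    brick.remove(entry)
                    self.bricks[target].add(entry)
        self.dir_commit_hash[directory] = self.commit_hash

    def lookup(self, directory, name):
        """Return (brick index or None, number of bricks queried)."""
        entry = (directory, name)
        hashed = self.hashed_brick(name)
        if entry in self.bricks[hashed]:
            return hashed, 1
        if self.dir_commit_hash.get(directory) == self.commit_hash:
            # Directory is known consistent: a miss on the hashed brick
            # is a definitive miss.
            return None, 1
        # Stale directory: fall back to querying every other brick.
        queried = 1
        for index, brick in enumerate(self.bricks):
            if index == hashed:
                continue
            queried += 1
            if entry in brick:
                return index, queried
        return None, queried
```

In this model, a lookup for a missing file costs one brick query while the
directory's stored hash matches the volume's, costs one query per brick once
start_rebalance() has changed the volume hash, and drops back to one query
after fix_directory() has processed that directory.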