----- Original Message -----
> From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
> To: "Krutika Dhananjay" <kdhananj at redhat.com>
> Cc: "gluster-users" <gluster-users at gluster.org>
> Sent: Tuesday, October 27, 2015 5:17:09 PM
> Subject: Re: [Gluster-users] Shard Volume testing (3.7.5)
> On 26 October 2015 at 14:54, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> > Hi Lindsay,
>
> > Thank you for trying out sharding and for your feedback. :) Please find my
> > comments inline.
>
> Hi Krutika, thanks for the feedback.
> > With a block size as low as 4MB, these individual shards appear to the
> > replicate module as a large number of small(er) files, effectively turning
> > it into some form of small-file workload.
>
> > There is an enhancement being worked on in AFR by Pranith, which attempts to
> > improve write performance and will be especially useful when used with
> > sharding. That should make this problem go away.
>
> Cool, also for my purposes (VM image hosting), block sizes of 512MB are just
> as good and improve things considerably.
> > > One Bug:
> >
>
> > > After heals completed I shut down the VMs and ran an md5sum on the VM
> > > image (via glusterfs) on each node. They all matched except for one time
> > > on gn3. Once I unmounted/remounted the datastore on gn3 the md5sum matched.
> >
>
> > This could possibly be the effect of a caching bug reported at
> > https://bugzilla.redhat.com/show_bug.cgi?id=1272986 . The fix is out for
> > review and I'm confident that it will make it into 3.7.6.
>
> Cool, I can replicate it fairly reliably at the moment.
> Would it occur when using qemu/gfapi direct?
> > > One Oddity:
> >
>
> > > gluster volume heal datastore info *always* shows a split brain on the
> > > directory, but it always heals without intervention. Dunno if this is
> > > normal or not.
> >
>
> > Which directory would this be?
>
> Oddly it was the .shard directory
> > Do you have the glustershd logs?
>
> Sorry no, and I haven't managed to replicate it again. Will keep trying.
> > Here is some documentation on sharding:
> > https://gluster.readthedocs.org/en/release-3.7.0/Features/shard/ . Let me
> > know if you have more questions, and I will be happy to answer them.
>
> > The problems we foresaw with too many 4MB shards are that
>
> > i. entry self-heal under /.shard could result in a complete crawl of the
> > /.shard directory during heal, or
>
> > ii. a disk replacement could involve a great many files needing to be
> > created and healed to the sink brick,
>
> > both of which would result in slower "entry" heal and rather high resource
> > consumption from the self-heal daemon.
>
> Thanks, most interesting reading.
> > Fortunately, with the introduction of more granular changelogs in the
> > replicate module to identify exactly which files under a given directory
> > need to be healed to the sink brick, these problems should go away.
>
> > In fact this enhancement is being worked upon as we speak and is targeted to
> > be out by 3.8. Here is some doc:
> > http://review.gluster.org/#/c/12257/1/in_progress/afr-self-heal-improvements.md
> > (read section "Granular entry self-heals").
>
> That looks very interesting - in fact, from my point of view, it replaces the
> need for sharding altogether, that need being the speed of heals.
Sharding also helps with better disk utilization in distributed-replicated
volumes for large files (like VM images).
Say you have a 2x3 volume with each brick having 10G of space. Even though the
aggregated size of the volume (due to the presence of distribute) is 20G,
without sharding you cannot create an image whose size is between 11G and 20G
on the volume, because an unsharded file has to fit entirely within a single
replica set.
With sharding, breaking large files into smaller pieces will ensure better
utilization of available space.
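
To make the arithmetic above concrete, here is a rough sketch of the largest
single file such a volume could hold with and without sharding. It is an
idealized model only: `max_file_size_gb` is a hypothetical helper, and it
assumes shards spread evenly across replica sets and ignores metadata overhead.

```python
def max_file_size_gb(replica_sets, brick_size_gb, sharded):
    """Largest single file (in GB) an idealized distributed-replicated volume can hold.

    Assumption: every brick in the volume has the same size, and shards are
    spread evenly across all replica sets.
    """
    if sharded:
        # Each shard is placed independently, so one file can span every replica set.
        return replica_sets * brick_size_gb
    # An unsharded file lives wholly on one replica set, so one brick's size is the cap.
    return brick_size_gb


# 2x3 volume: 2 replica sets of 3 bricks each, 10G per brick.
print(max_file_size_gb(2, 10, sharded=False))  # 10
print(max_file_size_gb(2, 10, sharded=True))   # 20
```

In practice the sharded figure is an upper bound, since distribution across
replica sets is rarely perfectly even.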
There are other long-term benefits one could reap from using sharding: for
instance, for someone who might want to use tiering in the VM store use case,
sharding will be beneficial because only the affected shards need to migrate
between hot and cold tiers, as opposed to moving large files in full even if
only a small portion of the file is changed/accessed. :)
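
As a rough illustration of the tiering point, the sketch below works out which
shards a given write overlaps. `shards_touched` is a hypothetical helper, the
40GB image and 1MB write are made-up numbers, and the code only does the
offset arithmetic; it does not reflect how the tiering translator itself
decides what to migrate.

```python
MB = 1024 * 1024
GB = 1024 * MB


def shards_touched(offset, length, shard_size):
    """Indices of the shards overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // shard_size
    last = (offset + length - 1) // shard_size
    return list(range(first, last + 1))


shard_size = 512 * MB
image_size = 40 * GB  # illustrative image size, not from any real setup

# A 1MB write at offset 3GB lands entirely inside a single 512MB shard.
touched = shards_touched(3 * GB, 1 * MB, shard_size)
print(touched)  # [6]
print(len(touched) * shard_size // MB, "MB in touched shards, versus",
      image_size // MB, "MB for the whole image")
```

A write that straddles a shard boundary would touch two shards, which is still
far less than the full image.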
> > Yes. So Paul Cuzner and Satheesaran, who have been testing sharding here,
> > have reported better write performance with 512M shards. I'd be interested
> > to know what you feel about performance with relatively larger shards (think
> > 512M).
>
> Seq Read speeds basically tripled, and seq writes improved to the limit of
> the network connection.
OK. And what about the data heal performance with 512M shards? Satisfactory?
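
To put the 4MB versus 512M comparison in numbers, here is a rough sketch of how
many shards an image breaks into at each block size. The 64GB image size is an
assumption picked purely for illustration, and `shard_count` is a hypothetical
helper, not anything from the gluster code base.

```python
MB = 1024 * 1024
GB = 1024 * MB


def shard_count(file_size, shard_size):
    """Number of shard-sized pieces needed to hold file_size bytes (rounded up)."""
    return -(-file_size // shard_size)  # integer ceiling division


image_size = 64 * GB  # illustrative VM image size

for shard_size in (4 * MB, 512 * MB):
    print(shard_size // MB, "MB shards ->", shard_count(image_size, shard_size), "pieces")
# 4 MB shards -> 16384 pieces
# 512 MB shards -> 128 pieces
```

Fewer, larger pieces mean fewer entries for the replicate module to track, at
the cost of more data per piece when any one of them needs a data heal.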
-Krutika
> Cheers,
> --
> Lindsay