thr3ads.net - Gluster users - [Gluster-users] Synchronous replication, or no? [Apr 2015]


Jeff Darcy

2015-Apr-09 15:45 UTC

[Gluster-users] Synchronous replication, or no?

> Jeff: I don't really understand how a write-behind translator could keep data
> in memory before flushing to the replication module if the replication is
> synchronous. Or put another way, from whose perspective is the replication
> synchronous? The gluster daemon or the creating client?

That's actually a more complicated question than many would think.  When we
say "synchronous replication" we're talking about *durability* (i.e. does
the disk see it) from the perspective of the replication module.  It does
none of its own caching or buffering.  When it is asked to do a write, it
does not report that write as complete until all copies have been updated.
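To make the durability claim concrete, here is a minimal Python sketch of a replication layer that acknowledges a write only after every copy has applied it. The names `Replica` and `SyncReplicator` are hypothetical, and bricks are modeled as in-memory dictionaries; this is not Gluster's actual replication (AFR) code, which also has to track and repair partial failures.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical stand-in for one brick: an in-memory map of path -> bytes."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, offset, data):
        buf = bytearray(self.files.get(path, b""))
        end = offset + len(data)
        if len(buf) < end:
            # Extend with zero bytes, like writing past end-of-file.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        self.files[path] = bytes(buf)
        return len(data)


class SyncReplicator:
    """Acknowledges a write only after every replica has applied it."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pool = ThreadPoolExecutor(max_workers=len(replicas))

    def write(self, path, offset, data):
        # Fan the write out to all replicas in parallel...
        futures = [self.pool.submit(r.write, path, offset, data)
                   for r in self.replicas]
        # ...then block until each one finishes. result() re-raises a
        # replica's exception, so a failed copy is never reported as success.
        results = [f.result() for f in futures]
        return results[0]
```

If one replica raises, the caller never sees success, but the replicas that did succeed are left ahead of the one that failed; a real implementation needs some way to detect and heal that divergence.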

However, durability is not the same as consistency (i.e. do *other clients*
see it), and the replication component does not exist in a vacuum.  There
are other components both before and after that can affect durability and
consistency.  We've already touched on the "after" part.  There might be
caches at many levels that become stale as the result of a file being
created and written.  Of particular interest here are "negative directory
entries," which indicate that a file is *not* present.  Until those expire,
it is possible to see a file as "not there" even though it does actually
exist on disk.  We can control some of this caching, but not all.
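The negative-entry effect can be modeled with a small lookup cache that remembers misses for a fixed time. This is a hypothetical sketch (`NegativeEntryCache`, its `ttl` and its injectable `clock` are illustrative assumptions), not the kernel's dentry cache or any Gluster component; it only shows why a file that already exists on the servers can still look absent to a client until the cached miss expires.

```python
import time


class NegativeEntryCache:
    """Hypothetical client-side lookup cache that also remembers misses.

    `backend` is any object with an exists(path) method; `ttl` is how long a
    "not found" answer is trusted; `clock` is injectable for testing.
    """

    def __init__(self, backend, ttl=1.0, clock=time.monotonic):
        self.backend = backend
        self.ttl = ttl
        self.clock = clock
        self.negative = {}  # path -> time the miss was recorded

    def exists(self, path):
        recorded = self.negative.get(path)
        if recorded is not None:
            if self.clock() - recorded < self.ttl:
                # Still inside the TTL: answer "not there" without asking
                # the backend, even if the file has since been created.
                return False
            del self.negative[path]  # expired: fall through to a real lookup
        found = self.backend.exists(path)
        if not found:
            self.negative[path] = self.clock()
        return found
```

With `ttl=1.0`, a lookup that missed at time 0 keeps returning `False` for the rest of that second even if the file is created at time 0.5; the first lookup after the entry expires goes to the backend and finds the file.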

The other side is *before* the replication module, and that's where
write-behind comes in.  POSIX does not require that a write be immediately
durable in the absence of O_SYNC/fsync and so on.  We do honor those
requirements where applicable.  However, the most common user expectation
is that we will defer/batch/coalesce writes, because making every write
individually immediate and synchronous has a very large performance impact.
Therefore we implement write-behind, as a layer above replication.  Absent
any specific request to perform a write immediately, data might sit there
for an indeterminate (but usually short) time before the replication code
even gets to see it.
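A simplified model of write-behind sitting above replication: writes are acknowledged at once and queued, then handed to the layer below only on `fsync()` or when the queued total reaches a size limit, with contiguous pieces coalesced into one larger write. `WriteBehind`, `limit` and the `below` interface are assumptions for illustration; Gluster's real write-behind translator is considerably more involved.

```python
class WriteBehind:
    """Hypothetical write-behind layer above a lower layer (`below`).

    `below` is any object with write(path, offset, data). Writes are queued
    and only passed down on fsync() or once `limit` bytes are queued.
    """

    def __init__(self, below, limit=128 * 1024):
        self.below = below
        self.limit = limit
        self.pending = {}  # path -> list of (offset, data) in arrival order
        self.buffered = 0  # total bytes currently queued

    def write(self, path, offset, data):
        # Report success right away; the layer below has not seen this yet.
        self.pending.setdefault(path, []).append((offset, data))
        self.buffered += len(data)
        if self.buffered >= self.limit:
            self.flush_all()
        return len(data)

    def fsync(self, path):
        # An explicit durability request: push this file's data down now.
        self._flush(path)

    def flush_all(self):
        for path in list(self.pending):
            self._flush(path)

    def _flush(self, path):
        chunks = self.pending.pop(path, [])
        self.buffered -= sum(len(data) for _, data in chunks)
        # Coalesce a chunk into the previous one only when it starts exactly
        # where the previous one ends, so the original write order is kept.
        merged = []
        for offset, data in chunks:
            if merged and merged[-1][0] + len(merged[-1][1]) == offset:
                merged[-1] = (merged[-1][0], merged[-1][1] + data)
            else:
                merged.append((offset, data))
        for offset, data in merged:
            self.below.write(path, offset, data)
```

Two 2-byte writes at offsets 0 and 2 reach the layer below as a single 4-byte write, and only after `fsync()`; until then the replication layer has nothing to replicate, which is exactly the gap between "acknowledged" and "durable" described above.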

I don't think write-behind is likely to be the issue here, because it
only applies to data within a file.  It will pass create(2) calls through
immediately, so all servers should become aware of the file's existence
right away.  On the other hand, various forms of caching on the *client*
side (even if they're the same physical machines) could still prevent a
new file from being seen immediately.
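On the application side, one way to tolerate the window in which client-side caches still report a just-created file as missing is to poll for it with a deadline instead of failing on the first miss. `wait_for_file` and its `timeout`/`interval` defaults are hypothetical and would need tuning for a given setup; polling only helps once the cached entry expires, so it cannot make visibility immediate.

```python
import os
import time


def wait_for_file(path, timeout=5.0, interval=0.1):
    """Poll until `path` is visible or `timeout` seconds pass.

    Hypothetical helper: tolerates a short window in which client-side
    caches still report a just-created file as missing.
    Returns True if the file appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```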

Eric Mortensen

2015-Apr-09 18:15 UTC


[Gluster-users] Synchronous replication, or no?

Ok, that made a lot of sense. I guess what I was expecting was that the
writes were (close to) immediately consistent, but Gluster is instead
designed to be eventually consistent.

Thanks for explaining all that.

Eric


