Ok, that made a lot of sense. I guess what I was expecting was that the
writes were (close to) immediately consistent, but Gluster is rather
designed to be eventually consistent.
Thanks for explaining all that.
Eric
On Thu, Apr 9, 2015 at 5:45 PM, Jeff Darcy <jdarcy at redhat.com> wrote:
> > Jeff: I don't really understand how a write-behind translator could keep
> > data in memory before flushing to the replication module if the
> > replication is synchronous. Or put another way, from whose perspective
> > is the replication synchronous? The gluster daemon or the creating client?
>
> That's actually a more complicated question than many would think. When we
> say "synchronous replication" we're talking about *durability* (i.e. does
> the disk see it) from the perspective of the replication module. It does
> none of its own caching or buffering. When it is asked to do a write, it
> does not report that write as complete until all copies have been updated.
>
> However, durability is not the same as consistency (i.e. do *other clients*
> see it) and the replication component does not exist in a vacuum. There
> are other components both before and after that can affect durability and
> consistency. We've already touched on the "after" part. There might be
> caches at many levels that become stale as the result of a file being
> created and written. Of particular interest here are "negative directory
> entries" which indicate that a file is *not* present. Until those expire,
> it is possible to see a file as "not there" even though it does actually
> exist on disk. We can control some of this caching, but not all.
>
> The other side is *before* the replication module, and that's where
> write-behind comes in. POSIX does not require that a write be immediately
> durable in the absence of O_SYNC/fsync and so on. We do honor those
> requirements where applicable. However, the most common user expectation
> is that we will defer/batch/coalesce writes, because making every write
> individually immediate and synchronous has a very large performance impact.
> Therefore we implement write-behind, as a layer above replication. Absent
> any specific request to perform a write immediately, data might sit there
> for an indeterminate (but usually short) time before the replication code
> even gets to see it.
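
For reference, here is a minimal sketch of what "a specific request to
perform a write immediately" can look like from an application, assuming a
plain POSIX-style client writing to a mounted path. The helper name
write_durably is made up for illustration; os.open, os.write and os.fsync
are standard Python calls. This shows only the application side, not how
Gluster handles the request internally.

```python
import os


def write_durably(path, data):
    """Write bytes to path and fsync before returning (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Explicit request that buffered data be pushed to stable storage.
        # Without this (or O_SYNC), POSIX allows the write to be deferred.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

Opening the file with os.O_SYNC instead would request the same thing on
every write, at the performance cost described above.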
>
> I don't think write-behind is likely to be the issue here, because it
> only applies to data within a file. It will pass create(2) calls through
> immediately, so all servers should become aware of the file's existence
> right away. On the other hand, various forms of caching on the *client*
> side (even if they're the same physical machines) could still prevent a
> new file from being seen immediately.
>
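
Given the above, one way a reader on a different client could cope with a
briefly stale "not there" result is to retry for a bounded time. This is
only a sketch under my own assumptions: wait_for_file is a hypothetical
helper, and the timeout and interval values are arbitrary, not
Gluster-recommended settings.

```python
import os
import time


def wait_for_file(path, timeout=5.0, interval=0.1):
    """Poll until path is visible or timeout seconds pass (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        # A cached negative entry can make this return False for a while
        # even though the file already exists on the servers.
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

This does not make the file visible any sooner; it only keeps the reader
from treating a stale negative entry as a permanent failure.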