thr3ads.net - Gluster users - [Gluster-users] Atomic file updates [Feb 2014]

Tom Munro Glass

2014-Feb-12 21:02 UTC

[Gluster-users] Atomic file updates

I'm not currently a Gluster user but I'm hoping it's the answer to a
problem I'm working on.

I manage a private web site that is basically a reporting tool for
equipment located at several hundred sites. Each site regularly uploads
zipped XML files to a cloud based server and this also provides a web
interface to the data using apache/PHP. The problem I need to solve is
that with a single server disk I/O has become a bottleneck.

The plan is to use a load balancer and multiple web servers with a
4-node Gluster volume behind to store the data. Data would be replicated
over 2 nodes.

The uploaded files are stored and then unzipped ready for reading by the
web interface code. Each file is unzipped into a temporary file and then
renamed, e.g.

file1.xml.zip --unzip--> uniquename.tmp --rename--> file1.xml

Use of the rename function makes these updates atomic.
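
For illustration, the unzip-then-rename step could look roughly like the shell sketch below. The real site does this in PHP; the helper name atomic_write, the dot-prefixed temporary name, the 0644 mode and the example paths are all assumptions made for the sketch.

```shell
#!/bin/sh
# atomic_write DEST CMD [ARGS...]   (hypothetical helper, not part of the site)
# Runs CMD, captures its stdout in a temporary file next to DEST, and renames
# the temporary file over DEST only if CMD succeeded.
atomic_write() {
    dest=$1
    shift
    dir=$(dirname "$dest")
    base=$(basename "$dest")
    # Create the temp file in the destination directory, because a rename is
    # only atomic within a single filesystem.
    tmp=$(mktemp "$dir/.$base.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # mktemp creates the file with mode 0600; 0644 is an assumption so
        # that the web server can read the result.
        chmod 0644 "$tmp"
        # Readers see either the old file or the complete new one.
        mv -f "$tmp" "$dest"
    else
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}

# Example call (paths are placeholders):
#   atomic_write /data/site42/file1.xml unzip -p /uploads/file1.xml.zip
```

If the command fails, the existing file1.xml is left untouched and the temporary file is removed.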

How can I achieve atomic updates in this way using a Gluster volume? My
understanding is that renaming a file on a Gluster volume causes a link
file to be created and that clearly wouldn't be appropriate where there
are frequent updates.

I could use flock, exclusive for writing and shared for reading, but too
many reading processes could potentially block writing.

Any advice will be much appreciated.

Tom

Jay Vyas

2014-Feb-12 21:24 UTC

For vanilla apps that are doing stuff in gluster, you normally do it
through a fuse mount.

mount -t glusterfs localhost:HadoopVol /mnt/glusterfs

But in your case, you might want to set the FUSE entry and attribute cache
timeouts to zero, so that every client sees a rename right away:

mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 localhost:HadoopVol /mnt/glusterfs

This will make sure that everything is refreshed when you look up files.
This strategy has solved our eventual consistency requirements for the
hadoop plugin.

Jeff Darcy

2014-Feb-12 22:19 UTC

> I'm not currently a Gluster user but I'm hoping it's the answer to a
> problem I'm working on.
> 
> I manage a private web site that is basically a reporting tool for
> equipment located at several hundred sites. Each site regularly uploads
> zipped XML files to a cloud based server and this also provides a web
> interface to the data using apache/PHP. The problem I need to solve is
> that with a single server disk I/O has become a bottleneck.
> 
> The plan is to use a load balancer and multiple web servers with a
> 4-node Gluster volume behind to store the data. Data would be replicated
> over 2 nodes.
> 
> The uploaded files are stored and then unzipped ready for reading by the
> web interface code. Each file is unzipped into a temporary file and then
> renamed, e.g.
> 
> file1.xml.zip --unzip--> uniquename.tmp --rename--> file1.xml
> 
> Use of the rename function makes these updates atomic.
> 
> How can I achieve atomic updates in this way using a Gluster volume? My
> understanding is that renaming a file on a Gluster volume causes a link
> file to be created and that clearly wouldn't be appropriate where there
> are frequent updates.

Creating a file with one name and then renaming it to another *might*
cause creation of linkfiles, but I think concerns about linkfiles are
often overblown.  The one extra call to create a linkfile isn't much
compared to those for creating the file, writing into it, and then
renaming it even if the rename is local to one brick.  What really
matters is the performance of the entire sequence, with or without the
linkfile.

That said, there's also a trick you can use to avoid creation of a
linkfile.  Other tools, such as rsync and our own object interface,
use the same write-then-rename idiom.  To serve them, there's an
option called extra-hash-regex that can be used to place files on the
"right" brick according to their final name even though they're created
with another.  Unfortunately, specifying that option via the command line
doesn't seem to work (it creates a malformed volfile) so you have to
mount a bit differently.  For example:

   glusterfs --volfile-server=a_server --volfile-id=a_volume \
   --xlator-option a_volume-dht.extra-hash-regex='(.*+)tmp' \
   /a/mountpoint

The important part is that second line.  That causes any file with a
"tmp" suffix to be hashed and placed as though only the part in the
first parenthesized part of the regex (i.e. without the "tmp") was
there.  Therefore, creating "xxxtmp" and then renaming it to
"xxx" is
the same as just creating "xxx" in the first place as far as linkfiles
etc. are concerned.  Note that the excluded part can be anything that
a regex can match, including a unique random number.  If I recall,
rsync uses temp files something like this:

   fubar = .fubar.NNNNNN (where NNNNNN is a random number)

I know this probably seems a little voodoo-ish, but with a little bit
of experimentation to find the right regex you should be able to avoid
those dreaded linkfiles altogether.
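
One way to do that experimentation before touching a mount is to check locally what a candidate regex captures. This is only a rough sketch: it assumes the option takes a POSIX extended regex and that only the first parenthesized group is hashed, as described above. hashed_name is a hypothetical helper, and the candidate patterns are anchored with ^ and $ so that sed's substitution behaves like a whole-name match. It says nothing about which brick a name actually lands on.

```shell
#!/bin/sh
# hashed_name REGEX NAME   (hypothetical helper for local experiments only)
# Prints the first parenthesized group if the anchored REGEX matches NAME,
# otherwise prints NAME unchanged. REGEX must not contain a "/".
hashed_name() {
    printf '%s\n' "$2" | sed -E "s/$1/\\1/"
}

# A "tmp" suffix, as in the mount example above
hashed_name '^(.*)tmp$' 'xxxtmp'

# An rsync-style temp name: leading dot, final name, dot, random suffix
hashed_name '^\.(.+)\.[^.]+$' '.fubar.a1B2c3'

# A name that does not match is left as it is
hashed_name '^(.*)tmp$' 'file1.xml'
```

If the captured part equals the final file name for every temp name your upload code can generate, the temp file and the renamed file should hash to the same place.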
