thr3ads.net - Gluster users - [Gluster-users] trying to avoid a penalty for renaming every file [Oct 2016]


Joe Julian

2016-Oct-27 20:23 UTC

[Gluster-users] trying to avoid a penalty for renaming every file

The best slide of all from the Gluster Developer Summit

https://twitter.com/JoeCyberGuru/status/784310321980674048


On 10/27/2016 11:52 AM, Christian Rice wrote:
> We have a gluster pool where our developers are writing large files
> with the suffix .COPYING, then removing that suffix when the copy is
> done (renaming the file), so consumers of the data know the file is
> safe to read.
>
> Back in 2014, Jeff Darcy wrote the below in a thread.  I guess I am
> basically asking if this is still the supported/encouraged approach,
> and how do I implement this when I'm using gluster's NFS?
>
> "That said, there's also a trick you can use to avoid creation of a
> linkfile.  Other tools, such as rsync and our own object interface,
> use the same write-then-rename idiom.  To serve them, there's an
> option called extra-hash-regex that can be used to place files on the
> "right" brick according to their final name even though they're created
> with another.  Unfortunately, specifying that option via the command line
> doesn't seem to work (it creates a malformed volfile) so you have to
> mount a bit differently.  For example:
>
>    glusterfs --volfile-server=a_server --volfile-id=a_volume \
>    --xlator-option a_volume-dht.extra_hash_regex='(.*+)tmp' \
>    /a/mountpoint
>
> The important part is that second line.  That causes any file with a
> "tmp" suffix to be hashed and placed as though only the part in the
> first parenthesized part of the regex (i.e. without the "tmp") was
> there.  Therefore, creating "xxxtmp" and then renaming it to "xxx" is
> the same as just creating "xxx" in the first place as far as linkfiles
> etc. are concerned.  Note that the excluded part can be anything that
> a regex can match, including a unique random number.  If I recall,
> rsync uses temp files something like this:
>
>    fubar = .fubar.NNNNNN (where NNNNNN is a random number)
>
> I know this probably seems a little voodoo-ish, but with a little bit
> of experimentation to find the right regex you should be able to avoid
> those dreaded linkfiles altogether."
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
[Attached image: Screenshot from 2016-10-27 13-22-04.png]
<http://www.gluster.org/pipermail/gluster-users/attachments/20161027/f36be574/attachment.png>
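The rule Jeff describes in the quoted message (hash the first parenthesized group when the regex matches, otherwise hash the whole name) can be tried out locally before touching a volume. What follows is a minimal bash sketch under that assumption; `hashed_name` is a hypothetical helper, not a Gluster command, and bash's `=~` uses POSIX extended regexes, which may not match Gluster's own regex handling in every corner case. The rsync-style regex in the last example is likewise an illustrative assumption shaped like the `.fubar.NNNNNN` temp names Jeff mentions.

```shell
#!/usr/bin/env bash
# hashed_name REGEX NAME  (hypothetical helper, not part of Gluster)
# Prints the name that would be fed to the DHT hash under the rule Jeff
# describes: the first parenthesized group if REGEX matches NAME, or
# NAME unchanged if it does not.
hashed_name() {
    local regex=$1 name=$2
    # An unquoted $regex on the right of =~ is treated as an ERE.
    if [[ $name =~ $regex ]] && (( ${#BASH_REMATCH[@]} > 1 )); then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$name"
    fi
}

# Christian's regex: the temporary name and the final name hash the same.
hashed_name '(.*)\.COPYING' foo.1.COPYING    # foo.1
hashed_name '(.*)\.COPYING' foo.1            # foo.1
# A name the regex does not match is hashed as-is, hence the linkfiles.
hashed_name '(.*)\.COPYING' foo.1.BAD        # foo.1.BAD
# An rsync-style temp name with a random tail (illustrative regex).
hashed_name '^\.(.+)\.[^.]+$' .fubar.a1B2c3  # fubar
```

If the temporary and final names print the same thing, the rename should not need a linkfile; if they differ, as with the `.BAD` suffix, it will.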

Christian Rice

2016-Oct-28 05:02 UTC


[Gluster-users] trying to avoid a penalty for renaming every file

That is great.  I appreciate the encouragement to continue in the right
direction.  What a relief!

I'm including my test procedure, in case it helps someone to visualize in ways a
slide cannot.  Using Debian 7/Wheezy, gluster bits 3.7.15-1, two hosts: burr and van

from burr:
gluster peer probe van
vol create YAY transport tcp burr:/archive/gluster van:/archive/gluster
vol set YAY cluster.extra-hash-regex "(.*)\.COPYING"

check to make sure it looks right in
/var/lib/glusterd/vols/YAY/trusted-REGEXED.tcp-fuse.vol

vol start YAY
mount -t glusterfs 127.0.0.1:REGEXED ~crice/tiempo

vol info on both hosts; it should look like this:

crice@burr:~$ sudo gluster vol info

Volume Name: REGEXED
Type: Distribute
Volume ID: e751a668-97dc-4638-81cd-40b68c2438f2
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: burr:/archive/gluster
Brick2: van:/archive/gluster
Options Reconfigured:
performance.readdir-ahead: on
cluster.extra-hash-regex: "(.*)\.COPYING"
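The "check to make sure it looks right" step against the volfile can be scripted. This sketch assumes the generated volfile uses the usual `volume` / `type` / `option` / `end-volume` layout and that the distribute xlator carries the setting as a line of the form `option extra-hash-regex ...`; both the option spelling in the volfile and the sample volfile below are assumptions for illustration, not copied from the system in this post. `volfile_regex` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# volfile_regex FILE  (hypothetical helper)
# Prints the value of "option extra-hash-regex" found inside the
# cluster/distribute section of a Gluster volfile, if present.
volfile_regex() {
    awk '
        $1 == "volume"                             { in_dht = 0 }
        $1 == "type" && $2 == "cluster/distribute" { in_dht = 1 }
        in_dht && $1 == "option" && $2 == "extra-hash-regex" {
            # Strip the "option extra-hash-regex" prefix, keep the value.
            sub(/^[ \t]*option[ \t]+extra-hash-regex[ \t]+/, "")
            print
        }
        $1 == "end-volume"                         { in_dht = 0 }
    ' "$1"
}

# Hypothetical sample volfile fragment, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
volume REGEXED-dht
    type cluster/distribute
    option extra-hash-regex (.*)\.COPYING
    subvolumes REGEXED-client-0 REGEXED-client-1
end-volume
EOF
volfile_regex "$sample"    # (.*)\.COPYING
rm -f "$sample"
```

Empty output on a real volfile would suggest the option did not land in the distribute section, which is the kind of malformed result Jeff warned about.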


create two dirs, testA and testB

testA will have ten files named foo.1 through foo.10, but they were created as
foo.1.COPYING through foo.10.COPYING and then moved (renamed)
testB will have ten files with the same names, foo.1 through foo.10, but they were
created as foo.1.BAD through foo.10.BAD

cd testA
for i in $(seq 1 10); do
    dd if=/dev/zero of=foo.${i}.COPYING bs=1M count=10
    mv foo.${i}.COPYING foo.$i
done
cd ../testB
for i in $(seq 1 10); do
    dd if=/dev/zero of=foo.${i}.BAD bs=1M count=10
    mv foo.${i}.BAD foo.$i
done
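The loop body above is the same write-then-rename idiom Christian's developers use. A hedged sketch of it as a reusable bash function follows; `publish` is a hypothetical name, and the `.COPYING` suffix only avoids linkfiles if it matches the volume's `cluster.extra-hash-regex`. `cp` does not fsync, so this sketch only ensures readers never see the final name before the copy has finished, not that the data has reached disk.

```shell
#!/usr/bin/env bash
# publish SRC DESTDIR  (hypothetical helper)
# Copies SRC into DESTDIR under a temporary NAME.COPYING name, then
# renames it to NAME, so a reader watching for NAME never sees a
# partially written file.
publish() {
    local src=$1 destdir=$2
    local name tmp
    name=$(basename -- "$src")
    tmp="$destdir/$name.COPYING"
    if ! cp -- "$src" "$tmp"; then
        rm -f -- "$tmp"    # do not leave a half-written temp file behind
        return 1
    fi
    # Same-directory rename: readers see either no NAME or the full file.
    mv -f -- "$tmp" "$destdir/$name"
}

# Example (hypothetical paths):
#   publish /tmp/report.dat ~crice/tiempo/incoming
```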

then check /archive/gluster/test{A,B} on both burr and van for link files (the
zero-length entries with mode ---------T).  The testA dir should have no link files.
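The link-file check can be scripted per brick. This sketch treats a zero-length regular file whose mode is exactly 1000 (shown by `ls` as `---------T`) as a DHT link file; that is a heuristic matching the listings in this post, while the authoritative marker is the `trusted.glusterfs.dht.linkto` extended attribute, which needs root to read and which the sketch does not check. `list_linkfiles` is a hypothetical name, it relies on GNU find's `-printf`, and it should be pointed at the brick directory (e.g. /archive/gluster/testB), not the client mount.

```shell
#!/usr/bin/env bash
# list_linkfiles DIR  (hypothetical helper)
# Prints, one per line and sorted, the names of entries in DIR that look
# like DHT link files: regular files, zero bytes, mode exactly 1000.
list_linkfiles() {
    find "$1" -maxdepth 1 -type f -perm 1000 -size 0 -printf '%f\n' | sort
}

# Example, run on each brick:
#   list_linkfiles /archive/gluster/testA | wc -l   # expected 0
#   list_linkfiles /archive/gluster/testB
```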

results:

crice@burr:/archive/gluster$ ls -lR testA testB
testA:
total 61440
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.10
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.3
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.5
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.6
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.7

testB:
total 61440
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.1
---------T 2 crice crice        0 Oct 27 21:17 foo.10
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.2
---------T 2 crice crice        0 Oct 27 21:17 foo.3
---------T 2 crice crice        0 Oct 27 21:17 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.5
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.6
---------T 2 crice crice        0 Oct 27 21:17 foo.7
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.8
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.9


crice@van:/archive/gluster$ ls -lR testA testB
testA:
total 40960
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.1
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.2
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.8
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:16 foo.9

testB:
total 40960
---------T 2 crice crice        0 Oct 27 21:17 foo.1
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.10
---------T 2 crice crice        0 Oct 27 21:17 foo.2
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.3
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:17 foo.7
---------T 2 crice crice        0 Oct 27 21:17 foo.8
---------T 2 crice crice        0 Oct 27 21:17 foo.9


That totally effin' worked!  I validated it by repeating the procedure from my
notes.


But since that had only been tested using FUSE, to be absolutely sure (because I
don't wanna read code), off I ran:

gluster vol set YAY nfs.disable false

glusterd immediately died, so I restarted glusterfs-server, created testC and testD
dirs, and ran one more set over NFS:

cd /mnt/net/burr/YAY/
mkdir testC testD
cd testC
for i in $(seq 1 10); do
    dd if=/dev/zero of=foo.${i}.COPYING bs=1M count=10
    mv foo.${i}.COPYING foo.$i
done
cd ../testD
for i in $(seq 1 10); do
    dd if=/dev/zero of=foo.${i}.BAD bs=1M count=10
    mv foo.${i}.BAD foo.$i
done

and it was also good:

crice@burr:/archive/gluster$ ls -lR testC testD
testC:
total 98304
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.10
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.3
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.5
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.6
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.7

testD:
total 65536
---------T 2 crice crice        0 Oct 27 21:42 foo.1
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.10
---------T 2 crice crice        0 Oct 27 21:42 foo.2
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.3
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.7
---------T 2 crice crice        0 Oct 27 21:42 foo.8
---------T 2 crice crice        0 Oct 27 21:42 foo.9


crice@van:/archive/gluster$ ls -lR testC testD
testC:
total 65536
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.1
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.2
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.8
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:41 foo.9

testD:
total 98304
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.1
---------T 2 crice crice        0 Oct 27 21:42 foo.10
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.2
---------T 2 crice crice        0 Oct 27 21:42 foo.3
---------T 2 crice crice        0 Oct 27 21:42 foo.4
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.5
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.6
---------T 2 crice crice        0 Oct 27 21:42 foo.7
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.8
-rw-r--r-- 2 crice crice 10485760 Oct 27 21:42 foo.9

happy happy!  Thanks for sharing!

