frank.gruellich at here.com
2013-Jul-25 07:28 UTC
[Gluster-users] Concurrent writes to same file
Hi,
we're trying to set up a web caching application on top of GlusterFS, which is
supposed to provide shared storage for a bunch of servers (two right now). We
have a similar setup running on NFS, which works okay. GlusterFS, however, is
giving us some headaches.
We use two servers ($GLUFS001 and $GLUFS002) in AWS, each with its brick
filesystem mounted at /mnt/brick01/ and a vol01/ subdirectory used as the
brick. We tried both XFS and EXT4 as the underlying filesystem. The GlusterFS
volume is set up with:
sudo gluster volume create tst_vol_aws01 replica 2 transport tcp
"$GLUFS001:/mnt/brick01/vol01/"
"$GLUFS002:/mnt/brick01/vol01/"
This is the resulting volume info:
Volume Name: tst_vol_aws01
Type: Replicate
Volume ID: f4b060fa-d6a8-45d2-89c5-1d99637dcc8a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: $GLUFS001:/mnt/brick01/vol01
Brick2: $GLUFS002:/mnt/brick01/vol01
Options Reconfigured:
performance.cache-size: 256MB
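That performance.cache-size entry is the only option we changed from the
defaults; it was set along these lines:
sudo gluster volume set tst_vol_aws01 performance.cache-size 256MB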
The clients run on the very same machines and mount the volume with the native
glusterfs client:
sudo mount -t glusterfs localhost:/tst_vol_aws01 /mnt/tst_vol_aws
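Peers and bricks look healthy as far as we can tell; we check with the usual
status commands:
sudo gluster peer status
sudo gluster volume status tst_vol_aws01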
This mount is used as storage for the web cache application. On a cache miss
the application fetches the object from an origin server and places it into
the storage, so it can be served from there next time and is also available to
the other web cache. It does so by writing the file to a temporary, per-client
unique location (e.g. /mnt/tst_vol_aws/tmp/$IPADDR/) and then moving it into
its final location (below /mnt/tst_vol_aws/sat_p_cache/).
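In shell terms, the write pattern is essentially this (just a sketch; $ORIGIN,
$IPADDR and $name are placeholders, not the application's actual code):
# sketch of the cache-fill pattern: fetch into a per-client temp file,
# then rename into the shared cache location
tmp="/mnt/tst_vol_aws/tmp/$IPADDR/$name.$$"
dst="/mnt/tst_vol_aws/sat_p_cache/$name"
curl -s -o "$tmp" "$ORIGIN/$name"
mv "$tmp" "$dst"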
As long as I send requests to only one of the instances everything is fine. The
trouble comes when I send requests to both of them and cause concurrent writes
to the storage. Though unlikely, it can happen that two clients try to put the
very same file into their cache. Needless to say, they fetch the same content
from the origin, so if one overwrites the other's version that is not a
problem. The error our application receives is:
pread() "/mnt/tst_vol_aws/sat_p_cache/9_158_309/18/169/80930_158240"
failed (22: Invalid argument)
Around that time I find various messages in /var/log/glusterfs/mnt-tst_vol_aws.log:
[2013-07-25 07:08:36.859467] W [client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-tst_vol_aws01-client-1: remote operation failed: Stale file handle. Path:
/sat_p_cache/9_158_309/19/682/161796_316418
(c19f0726-2c01-420f-b10f-926f8f55de43)
[2013-07-25 07:08:36.860412] W [client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-tst_vol_aws01-client-0: remote operation failed: Stale file handle. Path:
/sat_p_cache/9_158_309/19/682/161796_316418
(c19f0726-2c01-420f-b10f-926f8f55de43)
[2013-07-25 07:09:44.035227] W [client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-tst_vol_aws01-client-1: remote operation failed: Stale file handle. Path:
/sat_p_cache/9_158_309 (3628f18c-48c8-4b07-91cf-ff1922ab45f8)
[2013-07-25 07:09:44.035669] W [client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-tst_vol_aws01-client-0: remote operation failed: Stale file handle. Path:
/sat_p_cache/9_158_309 (3628f18c-48c8-4b07-91cf-ff1922ab45f8)
[2013-07-25 07:09:44.885802] I
[afr-self-heal-common.c:1213:afr_sh_missing_entry_call_impunge_recreate]
0-tst_vol_aws01-replicate-0: no missing files -
/sat_p_cache/9_158_309/18/169/80930_158240. proceeding to metadata check
[2013-07-25 07:09:44.892560] W [client-rpc-fops.c:259:client3_3_mknod_cbk]
0-tst_vol_aws01-client-1: remote operation failed: File exists. Path:
/sat_p_cache/9_158_309/18/169/80930_158240
(00000000-0000-0000-0000-000000000000)
[2013-07-25 07:09:44.892994] W [client-rpc-fops.c:259:client3_3_mknod_cbk]
0-tst_vol_aws01-client-0: remote operation failed: File exists. Path:
/sat_p_cache/9_158_309/18/169/80930_158240
(00000000-0000-0000-0000-000000000000)
[2013-07-25 07:09:44.893073] E [dht-helper.c:1052:dht_inode_ctx_get]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
[0x7f541cd61025]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
[0x7f541cd4801e]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34)
[0x7f541cd49364]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:09:44.893106] E [dht-helper.c:1071:dht_inode_ctx_set]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
[0x7f541cd61025]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
[0x7f541cd4801e]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52)
[0x7f541cd49382]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:09:44.893476] W [page.c:991:__ioc_page_error]
0-tst_vol_aws01-io-cache: page error for page = 0x7f54100556c0 & waitq =
0x7f5410048b50
[2013-07-25 07:09:44.893527] W [fuse-bridge.c:2049:fuse_readv_cbk]
0-glusterfs-fuse: 5663: READ => -1 (Invalid argument)
[2013-07-25 07:09:44.896395] W [fuse-bridge.c:875:fuse_fd_cbk] 0-glusterfs-fuse:
5665: OPEN() /sat_p_cache/9_158_309/18/169/80930_158240 => -1 (Invalid
argument)
[2013-07-25 07:09:44.896524] W [defaults.c:1291:default_release]
(-->/usr/lib64/glusterfs/3.4.0/xlator/performance/open-behind.so(open_and_resume+0xa8)
[0x7f541c30c288] (-->/usr/lib64/libglusterfs.so.0(call_stub_destroy+0x142)
[0x7f54231cbbb2] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x144)
[0x7f54231d8604]))) 0-fuse: xlator does not implement release_cbk
[2013-07-25 07:09:45.381747] I
[afr-self-heal-common.c:1213:afr_sh_missing_entry_call_impunge_recreate]
0-tst_vol_aws01-replicate-0: no missing files -
/sat_p_cache/9_158_309/19/674/161796_316486. proceeding to metadata check
I think the one causing the error we see is this "invalid argument:
inode", but the others are not exactly encouraging either. I can easily
reproduce the problem without our service using this simple shell script:
#!/bin/sh
# determine this client's IP address; used for the per-client temp directory
ip="`/sbin/ip a s eth0 |sed -n '/inet /{s%^.*inet %%;s%/.*$%%;p}'`"
tmpdir="/mnt/tst_vol_aws/tmp/$ip/foo/"
dstdir="/mnt/tst_vol_aws/sat_p_cache/foo/"
# endlessly write a small file to the temp dir and rename it into the shared dir
while true
do
    date >"$tmpdir/foo.$$"
    mv -v "$tmpdir/foo.$$" "$dstdir/foo.`date +%s`"
done
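We start the script on both instances at roughly the same time, something like
this (/tmp/repro.sh is just a placeholder for wherever the script above is
saved):
# run the reproducer concurrently on both nodes
ssh "$GLUFS001" 'sh /tmp/repro.sh' &
ssh "$GLUFS002" 'sh /tmp/repro.sh' &
wait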
The two clients alternately get errors like:
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure needs cleaning
and I get the same garbage in the GlusterFS log files:
[2013-07-25 07:19:00.018670] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
0-tst_vol_aws01-replicate-0: background meta-data data entry missing-entry gfid
self-heal failed on /sat_p_cache/foo/foo.1374736740
[2013-07-25 07:19:00.019713] W [client-rpc-fops.c:259:client3_3_mknod_cbk]
0-tst_vol_aws01-client-0: remote operation failed: File exists. Path:
/sat_p_cache/foo/foo.1374736740 (00000000-0000-0000-0000-000000000000)
[2013-07-25 07:19:00.020364] W [client-rpc-fops.c:259:client3_3_mknod_cbk]
0-tst_vol_aws01-client-1: remote operation failed: File exists. Path:
/sat_p_cache/foo/foo.1374736740 (00000000-0000-0000-0000-000000000000)
[2013-07-25 07:19:00.020440] E [dht-helper.c:1052:dht_inode_ctx_get]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
[0x7f73242c0025]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
[0x7f73242a701e]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34)
[0x7f73242a8364]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:19:00.020472] E [dht-helper.c:1071:dht_inode_ctx_set]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
[0x7f73242c0025]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
[0x7f73242a701e]
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52)
[0x7f73242a8382]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:19:00.172323] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix]
0-tst_vol_aws01-replicate-0: /sat_p_cache/foo: Performing conservative merge
[2013-07-25 07:19:01.068257] W [client-rpc-fops.c:526:client3_3_stat_cbk]
0-tst_vol_aws01-client-0: remote operation failed: No such file or directory
Any advice on this? What am I doing wrong? If required I can provide a simple
shell script showing how to set this up in AWS to reproduce the problem.
Thanks a lot in advance.
Kind regards,
Frank.