thr3ads.net - Gluster users - [Gluster-users] Input/output error on gfid mismatch (with test case) [REPOST] [Nov 2011]

Jason Stubbs

2011-Nov-19 20:47 UTC

[Gluster-users] Input/output error on gfid mismatch (with test case) [REPOST]

(Sorry if this comes through twice, but I sent the original almost 12 hours ago
and it hasn't appeared in the archives even though another mail sent after mine
has.)

Hi,

I've only been using glusterfs for a couple of weeks, but I've been having a
few issues with it. For one of the issues, I've managed to put together steps
to reproduce, so I guess this is a bug report. The log file on the client that
experiences the error shows:

[2011-11-19 18:05:23.619352] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.619391] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.619413] W [afr-common.c:882:afr_detect_self_heal_by_iatt] 0-testvol-replicate-0: /testfile: gfid different on subvolume
[2011-11-19 18:05:23.619452] I [afr-common.c:1038:afr_launch_self_heal] 0-testvol-replicate-0: background  missing-entry self-heal triggered. path: /testfile
[2011-11-19 18:05:23.624027] I [afr-self-heal-common.c:1858:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-testvol-replicate-0: Non blocking entrylks failed.
[2011-11-19 18:05:23.624062] I [afr-self-heal-common.c:963:afr_sh_missing_entries_done] 0-testvol-replicate-0: split brain found, aborting selfheal of /testfile
[2011-11-19 18:05:23.624084] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-testvol-replicate-0: background  missing-entry self-heal failed on /testfile
[2011-11-19 18:05:23.624108] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.624133] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 9142: LOOKUP() /testfile => -1 (Input/output error)
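
For anyone wanting to pull the conflicting gfid pairs out of a client log like the one above, here is a rough sketch. The function name extract_gfid_mismatches is made up for illustration, and it assumes each log entry sits on a single line in the real log file (the entries may appear wrapped in mail archives). It only matches the afr_conflicting_iattrs message format shown in this thread; other glusterfs versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of glusterfs): read client log text on stdin
# and print one "path gfid1 gfid2" line per distinct mismatch reported by
# afr_conflicting_iattrs. Assumes one log entry per line.
extract_gfid_mismatches() {
    sed -n 's/.*afr_conflicting_iattrs\] [^ ]*: \(.*\): gfid differs on subvolume [0-9]* (\([0-9a-f-]*\), \([0-9a-f-]*\)).*/\1 \2 \3/p' |
        sort -u
}
```

Usage would be something like `extract_gfid_mismatches < client.log`, where client.log stands in for wherever the client mount log lives on your system.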

To reproduce, use two glusterfs (v3.2.5) servers with the following volume
definition:

Volume Name: testvol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.104.123.145:/gluster/testvol
Brick2: 10.82.37.136:/gluster/testvol

Run this on one client:

# while true; do touch testfile.tmp; mv testfile.tmp testfile; done

And this script on another client:

# while true; do x=$(<testfile); done

I couldn't get the error to occur when both scripts were run on a single
client, or when they were run on the glusterfs servers themselves instead of on
separate clients. Also, it didn't matter whether both clients were mounted from
the same glusterfs server or one from each of the servers.

My assumption is that the second client's read is being interleaved with the
first client's move operation, giving a differing gfid. If any further
information is needed, please don't hesitate to let me know.
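
A bounded variant of the reader loop can make it easier to see how often the error shows up. The following is only a sketch: the function name count_read_failures is made up, and it counts any failed read, not just Input/output errors.

```shell
#!/bin/sh
# Hypothetical helper: try to read FILE COUNT times and print the number of
# attempts that failed. Any non-zero exit from cat counts as a failure, so
# this does not distinguish EIO from other errors such as a missing file.
count_read_failures() {
    file=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! cat "$file" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

Run on the second client as, for example, `count_read_failures testfile 10000` while the touch/mv loop is running on the first client.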

--
Jason Stubbs

Pranith Kumar K

2011-Nov-20 04:18 UTC

[Gluster-users] Input/output error on gfid mismatch (with test case) [REPOST]

hi Jason,
       Could you raise a bug on bugs.gluster.com?
When a file has a mismatching gfid, the client tries to find out which file is
correct based on the extended attributes of the parent directory. To
inspect those attributes it needs to take an entry lock on that parent
directory. The lock is failing because of the touch in a loop, so the client
gives up and returns the error EIO.
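
If the EIO really is transient in this way (the entry lock only fails while the other client is busy renaming), a reader could in principle retry instead of failing on the first error. The sketch below assumes that; it is not a fix for the underlying problem, and the function name read_with_retry and its default values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical workaround sketch: print the contents of FILE, retrying up to
# ATTEMPTS times with DELAY seconds between tries. Returns 0 on the first
# successful read and 1 if every attempt fails. Retries on any read error,
# not only EIO.
read_with_retry() {
    file=$1
    attempts=${2:-5}
    delay=${3:-1}
    n=1
    while [ "$n" -le "$attempts" ]; do
        if content=$(cat "$file" 2>/dev/null); then
            printf '%s\n' "$content"
            return 0
        fi
        # Only wait if another attempt is still to come.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

In the reader loop this would replace the plain read, for example `x=$(read_with_retry testfile 5 1)`.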

Thanks a lot for taking the time to provide the steps to re-create the 
problem.

Pranith.

