Raghavendra Gowdappa
2018-Mar-26 07:37 UTC
[Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
Ian,

Do you have a reproducer for this bug? If not a specific one, a general
outline of what operations were done on the file will help.

regards,
Raghavendra

On Mon, Mar 26, 2018 at 12:55 PM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

> On Mon, Mar 26, 2018 at 12:40 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>> The gfid mismatch here is between the shard and its "link-to" file, the
>> creation of which happens at a layer below that of shard translator on the
>> stack.
>>
>> Adding DHT devs to take a look.
>
> Thanks Krutika. I assume shard doesn't do any dentry operations like
> rename, link, unlink on the path of file (not the gfid handle based path)
> internally while managing shards. Can you confirm? If it does these
> operations, what fops does it do?
>
> @Ian,
>
> I can suggest the following way to fix the problem:
>
> 1. Since one of the files listed is a DHT linkto file, I am assuming there
>    is only one shard of the file. If not, please list out the gfids of the
>    other shards and don't proceed with the healing procedure.
> 2. If the gfids of all shards happen to be the same and only the linkto
>    file has a different gfid, please proceed to step 3. Otherwise abort the
>    healing procedure.
> 3. If cluster.lookup-optimize is set to true, abort the healing procedure.
> 4. Delete the linkto file (the file with permissions ---------T and xattr
>    trusted.glusterfs.dht.linkto) and do a lookup on the file from the mount
>    point after turning off readdirplus [1].
>
> As to how we ended up in this situation: can you describe the I/O pattern
> on this file? For example, are there lots of entry operations like rename,
> link, unlink etc. on the file? There have been known races in
> rename/lookup-heal-creating-linkto where the linkto and data file have
> different gfids. [2] fixes some of these cases.
>
> [1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
> [2] https://review.gluster.org/#/c/19547/
>
> regards,
> Raghavendra
>
>> -Krutika
>>
>> On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalliday at ndevix.com> wrote:
>>
>>> Hello all,
>>>
>>> We are having a rather interesting problem with one of our VM storage
>>> systems. The GlusterFS client is throwing errors relating to GFID
>>> mismatches. We traced this down to multiple shards being present on the
>>> gluster nodes, with different gfids.
>>>
>>> Hypervisor gluster mount log:
>>>
>>> [2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
>>> The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local 00000000-0000-0000-0000-000000000000, gfid node 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
>>> [2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>>>
>>> On the storage nodes, we found this:
>>>
>>> [root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>
>>> [root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> [root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> -rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>
>>> [root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> # file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>> trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
>>> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>>>
>>> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> # file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.bit-rot.version=0x020000000000000059914190000ce672
>>> trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
>>>
>>> I'm wondering how they got created in the first place, and if anyone has
>>> any insight on how to fix it?
>>>
>>> Storage nodes:
>>> [root@n1 gluster]# gluster --version
>>> glusterfs 4.0.0
>>>
>>> [root@n1 gluster]# gluster volume info
>>>
>>> Volume Name: ovirt-350-zone1
>>> Type: Distributed-Replicate
>>> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 7 x (2 + 1) = 21
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.6.100:/gluster/brick1/brick
>>> Brick2: 10.0.6.101:/gluster/brick1/brick
>>> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
>>> Brick4: 10.0.6.100:/gluster/brick2/brick
>>> Brick5: 10.0.6.101:/gluster/brick2/brick
>>> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
>>> Brick7: 10.0.6.100:/gluster/brick3/brick
>>> Brick8: 10.0.6.101:/gluster/brick3/brick
>>> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
>>> Brick10: 10.0.6.100:/gluster/brick4/brick
>>> Brick11: 10.0.6.101:/gluster/brick4/brick
>>> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
>>> Brick13: 10.0.6.100:/gluster/brick5/brick
>>> Brick14: 10.0.6.101:/gluster/brick5/brick
>>> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
>>> Brick16: 10.0.6.100:/gluster/brick6/brick
>>> Brick17: 10.0.6.101:/gluster/brick6/brick
>>> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
>>> Brick19: 10.0.6.100:/gluster/brick7/brick
>>> Brick20: 10.0.6.101:/gluster/brick7/brick
>>> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
>>> Options Reconfigured:
>>> cluster.min-free-disk: 50GB
>>> performance.strict-write-ordering: off
>>> performance.strict-o-direct: off
>>> nfs.disable: off
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.cache-size: 1GB
>>> features.shard: on
>>> features.shard-block-size: 5GB
>>> server.event-threads: 8
>>> server.outstanding-rpc-limit: 128
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: on
>>> cluster.eager-lock: enable
>>> network.remote-dio: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> performance.flush-behind: off
>>> performance.write-behind-window-size: 8MB
>>> client.event-threads: 8
>>> server.allow-insecure: on
>>>
>>> Client version:
>>> [root@kvm573 ~]# gluster --version
>>> glusterfs 3.12.5
>>>
>>> Thanks!
>>>
>>> - Ian
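The check the healing procedure depends on (does the linkto file carry a different gfid than the data file?) can be sketched in Python from the `getfattr -d -m . -e hex` output quoted above. This is only an illustrative sketch, not a Gluster tool: the helper names `parse_getfattr_hex`, `gfid_of` and `linkto_target` are made up for this example, and it assumes the dumps are pasted in as text with each xattr on a single line.

```python
import uuid

def parse_getfattr_hex(text):
    """Parse one file's `getfattr -d -m . -e hex` dump into {xattr name: bytes}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if value.startswith("0x"):
            attrs[name] = bytes.fromhex(value[2:])
    return attrs

def gfid_of(attrs):
    """trusted.gfid is a 16-byte UUID; return it in the form the logs use."""
    return str(uuid.UUID(bytes=attrs["trusted.gfid"]))

def linkto_target(attrs):
    """Subvolume named by the DHT linkto xattr, or None for a data file."""
    raw = attrs.get("trusted.glusterfs.dht.linkto")
    return None if raw is None else raw.rstrip(b"\x00").decode("ascii")

# Values copied from the brick2 (linkto) and brick4 (data) dumps in the thread.
LINKTO_DUMP = """\
# file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
"""
DATA_DUMP = """\
# file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
"""

linkto = parse_getfattr_hex(LINKTO_DUMP)
data = parse_getfattr_hex(DATA_DUMP)

print("linkto points at:", linkto_target(linkto))
print("linkto gfid:", gfid_of(linkto))
print("data gfid:  ", gfid_of(data))
print("mismatch:", gfid_of(linkto) != gfid_of(data))
```

The two decoded gfids are the same values that appear as "gfid local" and "gfid node" in the dht_lookup_everywhere_cbk warning from the mount log.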
Raghavendra Gowdappa
2018-Apr-06 03:39 UTC
[Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
Sorry for the delay, Ian :).

This looks to be a genuine issue which will require some effort to fix.
Can you file a bug? I need the following information attached to the bug:

* Client and brick logs. If you can reproduce the issue, please set
  diagnostics.client-log-level and diagnostics.brick-log-level to TRACE.
  If you cannot reproduce the issue, or if you cannot accommodate such big
  logs, please set the log-level to DEBUG.
* If possible, a simple reproducer. A simple script or steps are appreciated.
* strace of the VM (to find out the I/O pattern). If possible, a dump of the
  traffic between kernel and glusterfs. This can be captured by mounting
  glusterfs using the --dump-fuse option.

Note that the logs you've posted here capture the scenario _after_ the
shard file has gone into a bad state. But I need information on what led to
that situation. So, please start collecting this diagnostic information as
early as you can.

regards,
Raghavendra

On Tue, Apr 3, 2018 at 7:52 AM, Ian Halliday <ihalliday at ndevix.com> wrote:

> Raghavendra,
>
> Sorry for the late follow up. I have some more data on the issue.
>
> The issue tends to happen when the shards are created. The easiest time to
> reproduce this is during an initial VM disk format. This is a log from a
> test VM that was launched, and then partitioned and formatted with LVM /
> XFS:
>
> [2018-04-03 02:05:00.838440] W [MSGID: 109048] [dht-common.c:9732:dht_rmdir_cached_lookup_cbk] 0-ovirt-350-zone1-dht: /489c6fb7-fe61-4407-8160-35c0aac40c85/images/_remove_me_9a0660e1-bd86-47ea-8e09-865c14f11f26/e2645bd1-a7f3-4cbd-9036-3d3cbc7204cd.meta found on cached subvol ovirt-350-zone1-replicate-5
> [2018-04-03 02:07:57.967489] I [MSGID: 109070] [dht-common.c:2796:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: Lookup of /.shard/927c6620-848b-4064-8c88-68a332b645c2.7 on ovirt-350-zone1-replicate-3 (following linkfile) failed ,gfid 00000000-0000-0000-0000-000000000000 [No such file or directory]
> [2018-04-03 02:07:57.974815] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-8c88-68a332b645c2.3
> [2018-04-03 02:07:57.979851] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.980716] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = b1e3f299-32ff-497e-918b-090e957090f6, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.980763] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. Base file gfid 927c6620-848b-4064-8c88-68a332b645c2 [Stale file handle]
> [2018-04-03 02:07:57.983016] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-8c88-68a332b645c2.7
> [2018-04-03 02:07:57.988761] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = b1e3f299-32ff-497e-918b-090e957090f6, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.988844] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.7: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 955a5e78-ab4c-499a-89f8-511e041167fb
> [2018-04-03 02:07:57.989748] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.7: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = efbb9be5-0744-4883-8f3e-e8f7ce8d7741, gfid node = 955a5e78-ab4c-499a-89f8-511e041167fb
> [2018-04-03 02:07:57.989827] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret -1 and op_errno 2 for /.shard/927c6620-848b-4064-8c88-68a332b645c2.3
> [2018-04-03 02:07:57.989832] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 7 failed. Base file gfid 927c6620-848b-4064-8c88-68a332b645c2 [Stale file handle]
> The message "W [MSGID: 109009] [dht-common.c:2831:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local 00000000-0000-0000-0000-000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb " repeated 2 times between [2018-04-03 02:07:57.979851] and [2018-04-03 02:07:57.995739]
> [2018-04-03 02:07:57.996644] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = 0a701104-e9a2-44c0-8181-4a9a6edecf9f, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.996761] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. Base file gfid 927c6620-848b-4064-8c88-68a332b645c2 [Stale file handle]
> [2018-04-03 02:07:57.998986] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.999857] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = 0a701104-e9a2-44c0-8181-4a9a6edecf9f, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
> [2018-04-03 02:07:57.999899] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. Base file gfid 927c6620-848b-4064-8c88-68a332b645c2 [Stale file handle]
> [2018-04-03 02:07:57.999942] W [fuse-bridge.c:896:fuse_attr_cbk] 0-glusterfs-fuse: 22338: FSTAT() /489c6fb7-fe61-4407-8160-35c0aac40c85/images/a717e25c-f108-4367-9d28-9235bd432bb7/5a8e541e-8883-4dec-8afd-aa29f38ef502 => -1 (Stale file handle)
> [2018-04-03 02:07:57.987941] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-8c88-68a332b645c2.3
>
> Duplicate shards are created. Output from one of the gluster nodes:
>
> # find -name 927c6620-848b-4064-8c88-68a332b645c2.*
> ./brick1/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> ./brick1/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.9
> ./brick1/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.7
> ./brick3/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.5
> ./brick3/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.3
> ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.9
> ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.5
> ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.3
> ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.7
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick1/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> # file: brick1/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x46083184a0e5468e89e6cc1db0bfc63b
> trusted.gfid2path.77528eefc6a11c45=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f39323763363632302d383438622d343036342d386338382d3638613333326236343563322e3139
> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> # file: brick4/brick/.shard/927c6620-848b-4064-8c88-68a332b645c2.19
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.gfid=0x46083184a0e5468e89e6cc1db0bfc63b
> trusted.gfid2path.77528eefc6a11c45=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f39323763363632302d383438622d343036342d386338382d3638613333326236343563322e3139
>
> In the above example, the shard on Brick 1 is the bad one.
>
> At this point, the VM will pause with an unknown storage error and will
> not boot until the offending shards are removed.
>
> # gluster volume info
> Volume Name: ovirt-350-zone1
> Type: Distributed-Replicate
> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7 x (2 + 1) = 21
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.6.100:/gluster/brick1/brick
> Brick2: 10.0.6.101:/gluster/brick1/brick
> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
> Brick4: 10.0.6.100:/gluster/brick2/brick
> Brick5: 10.0.6.101:/gluster/brick2/brick
> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
> Brick7: 10.0.6.100:/gluster/brick3/brick
> Brick8: 10.0.6.101:/gluster/brick3/brick
> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
> Brick10: 10.0.6.100:/gluster/brick4/brick
> Brick11: 10.0.6.101:/gluster/brick4/brick
> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
> Brick13: 10.0.6.100:/gluster/brick5/brick
> Brick14: 10.0.6.101:/gluster/brick5/brick
> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
> Brick16: 10.0.6.100:/gluster/brick6/brick
> Brick17: 10.0.6.101:/gluster/brick6/brick
> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
> Brick19: 10.0.6.100:/gluster/brick7/brick
> Brick20: 10.0.6.101:/gluster/brick7/brick
> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
> Options Reconfigured:
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> performance.client-io-threads: off
> server.allow-insecure: on
> client.event-threads: 8
> storage.owner-gid: 36
> storage.owner-uid: 36
> server.event-threads: 16
> features.shard-block-size: 5GB
> features.shard: on
> transport.address-family: inet
> nfs.disable: yes
>
> Any suggestions?
>
> -- Ian
>
> ------ Original Message ------
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Krutika Dhananjay" <kdhananj at redhat.com>
> Cc: "Ian Halliday" <ihalliday at ndevix.com>; "gluster-user" <gluster-users at gluster.org>; "Nithya Balachandran" <nbalacha at redhat.com>
> Sent: 3/26/2018 2:37:21 AM
> Subject: Re: [Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
>
> Ian,
>
> Do you have a reproducer for this bug? If not a specific one, a general
> outline of what operations were done on the file will help.
>
> regards,
> Raghavendra
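When collecting logs for the bug report, it may help to condense the MSGID 109009 warnings into one entry per shard. The following is a rough Python sketch under stated assumptions: the regular expression is written against the two message shapes quoted in this thread ("gfid different on data file on" and "gfid differs on subvolume") and is not taken from the Gluster source, and `collect_mismatches` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches both warning shapes seen in the thread; the "= " after
# "gfid local"/"gfid node" is optional because some quoted lines lack it.
MISMATCH_RE = re.compile(
    r"(?P<path>/\.shard/[0-9a-f-]{36}\.\d+): "
    r"gfid (?:different on data file on|differs on subvolume) "
    r"(?P<subvol>[^,\s]+), "
    r"gfid local (?:= )?(?P<local>[0-9a-f-]{36}), "
    r"gfid node (?:= )?(?P<node>[0-9a-f-]{36})"
)

def collect_mismatches(log_text):
    """Map each shard path to the set of (local gfid, node gfid) pairs logged."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        m = MISMATCH_RE.search(line)
        if m:
            found[m["path"]].add((m["local"], m["node"]))
    return dict(found)

# Two entries copied from the client log in the thread.
SAMPLE = """\
[2018-04-03 02:07:57.979851] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
[2018-04-03 02:07:57.980716] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = b1e3f299-32ff-497e-918b-090e957090f6, gfid node = 55f86aa0-e7a0-4075-b46b-a11f8bdbbceb
"""

mismatches = collect_mismatches(SAMPLE)
for path, pairs in sorted(mismatches.items()):
    for local, node in sorted(pairs):
        print(path, "local", local, "node", node)
```

A shard whose entries show more than one distinct non-zero "local" gfid over time, as shard 3 does in the full log, suggests the linkto file was recreated with a new gfid between lookups.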