Strahil
2019-Nov-28 05:46 UTC
[Gluster-users] Stale File Handle Errors During Heavy Writes
I have already tried disabling sharding on a test oVirt volume... The results were devastating for the app, so please do not disable sharding.

Best Regards,
Strahil Nikolov

On Nov 27, 2019 20:55, Olaf Buitelaar <olaf.buitelaar at gmail.com> wrote:
> Hi Tim,
>
> That issue also seems to point to a stale file. Best, I suppose, is to first determine whether you indeed have the same shard on different sub-volumes, where on one of the sub-volumes the file size is 0KB and the sticky bit is set. If so, we suffer from the same issue, and you can clean those files up, so the `rm` command should start working again.
> Essentially you should consider the volume unhealthy until you have resolved the stale files; only then should you continue file operations. Remounting the client shouldn't make a difference, since the issue is at brick/sub-volume level.
>
> The last comment I received from Krutika:
> "I haven't had the chance to look into the attachments yet. I got another customer case on me.
> But from the description, it seems like the linkto file (the one with a 'T') and the original file don't have the same gfid?
> It's not wrong for those 'T' files to exist. But they're supposed to have the same gfid.
> This is something that needs DHT team's attention.
> Do you mind raising a bug in bugzilla.redhat.com against glusterfs and component 'distribute' or 'DHT'?"
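As an illustration of the check described above (0KB, sticky-bit copies of a shard, and Krutika's point about matching gfids), a minimal sketch to run as root on each brick host; the brick path is a placeholder, adjust it to your layout:

  #!/bin/bash
  # Hypothetical brick path - adjust to your layout.
  BRICK=/data/gluster/brick1/brick

  # 0-byte files with the sticky bit set under .shard are DHT link ('T') file candidates.
  find "$BRICK/.shard" -type f -size 0 -perm -1000 | while read -r f; do
      echo "== $f"
      # This gfid should match the gfid of the full-size shard on the other sub-volume.
      getfattr -n trusted.gfid -e hex "$f" 2>/dev/null
      # The linkto xattr names the sub-volume the file points to.
      getfattr -n trusted.glusterfs.dht.linkto -e text "$f" 2>/dev/null
  done

Compare the printed gfid with the one on the real shard on the other sub-volume before removing anything, and keep a copy of what you delete, as Olaf's script does with its --backup option.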
> For me, replicating it was easiest by running xfs_fsr (which is very write intensive on fragmented volumes) from within a VM, but it could also happen with a simple yum install, a docker run (with a new image), a general test with dd or mkfs.xfs, or just at random, which was the normal case. But I have to say my workload is mostly write intensive, like yours.
>
> Sharding in general is a nice feature: it allows your files to be broken up into pieces, which is also its biggest danger. If anything goes haywire, it's currently practically impossible to stitch all those pieces together again, since no tool for this seems to exist. That's the nice thing about non-sharded volumes: they are just files. If you really wanted to, I suppose it could be done, but it would be very painful.
> With the files being in shards it allows for much more equal distribution. Also heals seem to resolve much quicker.
> I'm also running non-sharded volumes with files of 100GB+, and those heals can take significantly longer. I also sometimes have issues with those non-sharded volumes, though I don't remember any stale files.
> But if you don't need it, you might be better off disabling it. However, I believe you're never allowed to turn off sharding on a sharded volume, since it will corrupt your data.
>
> Best Olaf
>
> Op wo 27 nov. 2019 om 19:19 schreef Timothy Orme <torme at ancestry.com>:
>>
>> Hi Olaf,
>>
>> Thanks so much for sharing this, it's hugely helpful, if only to make me feel less like I'm going crazy. I'll see if there's anything I can add to the bug report. I'm trying to develop a test to reproduce the issue now.
>>
>> We're running this in a sort of interactive HPC environment, so these errors are a bit hard for us to handle systematically, and they have a tendency to be quite disruptive to folks' work.
>>
>> I've run into other issues with sharding as well, such as this: https://lists.gluster.org/pipermail/gluster-users/2019-October/037241.html
>>
>> I'm wondering then if maybe sharding isn't quite stable yet and it's more sensible for me to just disable this feature for now? I'm not quite sure what other implications that might have, but all the issues I've run into so far as a new Gluster user seem like they're related to shards.
>>
>> Thanks,
>> Tim
>> ________________________________
>> From: Olaf Buitelaar <olaf.buitelaar at gmail.com>
>> Sent: Wednesday, November 27, 2019 9:50 AM
>> To: Timothy Orme <torme at ancestry.com>
>> Cc: gluster-users <gluster-users at gluster.org>
>> Subject: [EXTERNAL] Re: [Gluster-users] Stale File Handle Errors During Heavy Writes
>>
>> Hi Tim,
>>
>> I've been suffering from this for a long time as well. I'm not sure if it's exactly the same situation, since your setup is different, but it seems similar.
>> I've filed this bug report, which you might be able to enrich: https://bugzilla.redhat.com/show_bug.cgi?id=1732961
>> To solve the stale files I've made this bash script, https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986 (it's slightly outdated), which you could use as inspiration. It basically removes the stale files as suggested here: https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html . Please be aware the script won't work if you have 2 (or more) bricks of the same volume on the same server (since it always takes the first path found).
>> I invoke the script via ansible like this (since the script needs to run on all bricks):
>>
>> - hosts: host1,host2,host3
>>   tasks:
>>     - shell: 'bash /root/clean-stale-gluster-fh.sh --host="{{ intif.ip | first }}" --volume=ovirt-data --backup="/backup/stale/gfs/ovirt-data" --shard="{{ item }}" --force'
>>       with_items:
>>         - 1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451
>>
>> Fortunately for me the issue seems to have disappeared: it's now been about a month since I received one, while before it was about every other day.
>> The biggest thing that seemed to resolve it was more disk space. While there was also plenty before, the gluster volume was at about 85% full, and the individual disks had about 20-30% free on an 8TB disk array, with servers in the mix with smaller disk arrays but similar available space (in percent). I'm now at a much lower percentage.
>> So my latest running theory is that it has something to do with how gluster allocates the shards: since it's based on the hash, it might want to place a shard in a certain sub-volume, but then comes to the conclusion there is not enough space there and writes a marker to redirect it to another sub-volume (I think this is the stale file). However, rebalances don't fix this issue. Also, this still doesn't seem to explain why most stale files always end up in the first sub-volume.
>> Unfortunately I've no proof this is actually the root cause, besides that the symptom "disappeared" once gluster had more space to work with.
>>
>> Best Olaf
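Since Olaf's working theory above involves sub-volumes running low on space, it may be worth checking the per-brick numbers before and after the errors appear; a minimal sketch, assuming the volume is called scratch (taken from the "3-scratch-dht" prefix in the logs below) and a hypothetical brick path:

  # Per-brick total/free disk space and inode counts as gluster reports them.
  gluster volume status scratch detail

  # Or directly on each server, against the brick filesystem.
  df -h /data/gluster/brick1/brick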
>> Op wo 27 nov. 2019 om 02:38 schreef Timothy Orme <torme at ancestry.com>:
>>>
>>> Hi All,
>>>
>>> I'm running a 3x2 cluster, v6.5. Not sure if it's relevant, but I also have sharding enabled.
>>>
>>> I've found that when under heavy write load, clients start erroring out with "stale file handle" errors, on files not related to the writes.
>>>
>>> For instance, when a user is running a simple wc against a file, it will bail during that operation with "stale file".
>>>
>>> When I check the client logs, I see errors like:
>>>
>>> [2019-11-26 22:41:33.565776] E [MSGID: 109040] [dht-helper.c:1336:dht_migration_complete_check_task] 3-scratch-dht: 24d53a0e-c28d-41e0-9dbc-a75e823a3c7d: failed to lookup the file on scratch-dht [Stale file handle]
>>> [2019-11-26 22:41:33.565853] W [fuse-bridge.c:2827:fuse_readv_cbk] 0-glusterfs-fuse: 33112038: READ => -1 gfid=147040e2-a6b8-4f54-8490-f0f3df29ee50 fd=0x7f95d8d0b3f8 (Stale file handle)
>>>
>>> I've seen some bugs or other threads referencing similar issues, but couldn't really discern a solution from them.
>>>
>>> Is this caused by some consistency issue with metadata while under load, or something else? I don't see the issue when heavy reads are occurring.
>>>
>>> Any help is greatly appreciated!
>>>
>>> Thanks!
>>> Tim
>>> ________
>>>
>>> Community Meeting Calendar:
>>>
>>> APAC Schedule -
>>> Every 2nd and 4th Tuesday at 11:30 AM IST
>>> Bridge: https://bluejeans.com/441850968
>>>
>>> NA/EMEA Schedule -
>>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>>> Bridge: https://bluejeans.com/441850968
>>>
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
Olaf Buitelaar
2019-Nov-28 08:40 UTC
[Gluster-users] Stale File Handle Errors During Heavy Writes
Yeah, so the right procedure would be to set up a new volume without sharding and copy everything over.

On Thu, 28 Nov 2019, 06:45 Strahil, <hunter86_bg at yahoo.com> wrote:
> I have already tried disabling sharding on a test oVirt volume... The results were devastating for the app, so please do not disable sharding.
>
> Best Regards,
> Strahil Nikolov
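For completeness, a rough sketch of the procedure Olaf describes (a new volume created without sharding, data copied over through the client mounts, never directly on the bricks); all host, brick, and volume names here are hypothetical:

  # Create and start a new volume; sharding is off by default on new volumes.
  gluster volume create scratch-new replica 3 \
      host1:/data/newbrick/brick host2:/data/newbrick/brick host3:/data/newbrick/brick
  gluster volume start scratch-new

  # Mount the old and new volumes on a client and copy via the FUSE mounts.
  mount -t glusterfs host1:/scratch /mnt/old
  mount -t glusterfs host1:/scratch-new /mnt/new
  rsync -aHAX --progress /mnt/old/ /mnt/new/

For VM images the copy would have to happen while the VMs are down, and whether dropping sharding is wise at all is exactly what Strahil cautions about above.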