Hi,
sorry, I was on vacation and just found it in my inbox again.
> On 25.09.2023 at 23:32, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>
> Hi,
>
> I found this one in my spam folder and I can't find any follow-up communication.
> Have you found the problem?
Sort of. I just noticed that the volume stopped syncing completely while my
other volume did fine. So in the end I suspected some parameters to be the
culprit and repeated everything again and again, adding a single parameter at a
time. When I added

performance.stat-prefetch: 'off'

it stopped syncing. Looking in more detail I've found that I have the setting
twice. Before that setting there is a

performance.stat-prefetch: 'on'

so it is switched on, and when it is switched off again the volume stops syncing.
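
As a quick sanity check, assuming the standard gluster CLI (the volume name is
a placeholder), querying the effective value shows which of the two entries
wins, and a reset removes the override altogether:

    gluster volume get <volname> performance.stat-prefetch    # effective value as clients see it
    gluster volume reset <volname> performance.stat-prefetch  # drop the setting, fall back to the default
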
That is what I could find out, and then I had to take countermeasures because
it was blocking the development of our product. While downgrading glusterfs I
saw that the client version was still on 10000, which made downgrading easier
but is maybe additional context to the problem.
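
For reference, the op-versions involved can be inspected with the standard CLI;
the second command prints the highest version the installed binaries support:

    gluster volume get all cluster.op-version      # version the cluster currently operates at
    gluster volume get all cluster.max-op-version  # maximum supported by the installed binaries
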
There was also the problem that when creating the replica volume it appeared as
Distributed-Replicate, which made the 11 release not feasible to use.
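
For context (my understanding, not verified on release 11): the reported type
normally depends on the brick count, so a volume created with exactly as many
bricks as the replica count should show as plain Replicate, e.g.

    gluster volume create myvol replica 3 host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1

(hostnames and paths are placeholders), while a multiple of the replica count
yields Distributed-Replicate. A 1xN volume being reported as
Distributed-Replicate would therefore indeed look like a regression.
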
And there does not seem to be a fix release. The release page at
https://www.gluster.org/release-schedule/ shows the last release was on the
14th of February, and the noted 3-month maintenance does not seem to have taken place.
regards,
Norbert
> Usually it's either a bug or a cache problem.
> To rule out the former, upgrade the clients and the servers, and update the op-version to the max.
>
> Best Regards,
> Strahil Nikolov
>
>
> Sent from Yahoo Mail for iPhone
>
> On Friday, August 18, 2023, 11:41 PM, Norbert Hartl <norbert at hartl.name> wrote:
>
> I found another file that went stale and monitored it for hours, but the content did not update. While one client machine progressed with the content, the second machine always had the same content. I touched the file, and this works in both directions, updating the timestamp, but the content stays old on machine2.
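>
> A quick way to verify this from both clients, assuming standard tools and a placeholder mount path:
>
>     stat /mnt/volume/path/to/file     # compare mtimes on both machines
>     md5sum /mnt/volume/path/to/file   # compare content checksums
>
> Matching timestamps with differing checksums would confirm that the content itself is stale.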
>
> Isn't that weird?
>
> Norbert
>
>> On 18.08.2023 at 13:53, Norbert Hartl <norbert at hartl.name> wrote:
>>
>
> Hi,
>
> I have been using glusterfs for approx. 18 months and need some help detecting the culprit for stale file content. Stale file content here means that I read the same file from two clients and get different content.
>
> I have a distributed volume
>
> Volume Name: apptivegrid
> Type: Distribute
> Volume ID: 7087ee24-6603-477a-a822-29d011bca78e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.1.2.1:/bricks/apptivegrid-base
> Brick2: 10.1.2.8:/bricks/apptivegrid-base
> Options Reconfigured:
> performance.cache-invalidation: on
> performance.cache-samba-metadata: on
> performance.strict-o-direct: on
> performance.open-behind: off
> performance.read-ahead: off
> performance.write-behind: off
> performance.readdir-ahead: off
> performance.parallel-readdir: off
> performance.quick-read: off
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.flush-behind: off
> performance.client-io-threads: off
> locks.mandatory-locking: off
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
>
> and sometimes I get stale file content as explained above. The file that I discovered having the problem is a small file of 32 bytes. The change to that file is an increased version number, so I could verify that the data is changed at the right position and also see that it has old content on one of the machines. I made the above settings to the volume in order to be sure this cannot happen.
>
> I created that volume a while ago and did not change the layout of the bricks (but upgraded from gluster 10 to 11 last week). Running a rebalance command showed a lot of errors and failures in the status report of the rebalance. I also get quite a lot of this error in the logfile ('settings' is the small file I'm talking about):
>
> [2023-08-17 21:57:31.910494 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-apptivegrid-client-0: remote operation failed. [{path=/62/03/8c/62038cd116e9a6857794aa14/settings}, {gfid=1d38410a-1c14-4346-a7e5-68856ed310e9}, {errno=2}, {error=No such file or directory}]
>
> and this
>
> [2023-08-17 21:57:31.902676 +0000] I [MSGID: 109018] [dht-common.c:1838:dht_revalidate_cbk] 0-apptivegrid-dht: Mismatching layouts for /62/03/8c/62038cd116e9a6857794aa14, gfid = f7f8eef0-bc19-4936-8c0c-fd0a497c5e69
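>
> As I understand it, mismatching-layout warnings are usually addressed with a fix-layout run (standard CLI, volume name from above) before attempting a full rebalance:
>
>     gluster volume rebalance apptivegrid fix-layout start
>     gluster volume rebalance apptivegrid status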
>
> This morning I found another occurrence of a stale file which I wanted to diagnose, but a couple of minutes later it seemed to have healed itself. In order to diagnose it I had shut down the processes that could access it, to be sure. So I have no idea what did the refresh: my action (releasing fds/locks) or a timeout.
>
> In order to better estimate what the culprit is, I would need to verify/falsify some of my assumptions:
>
> - the process P1 on a machine opens a couple of files and keeps them open until 30 minutes of inactivity on them. If a process P2 running on another machine changed the same file, P1 would see those changes on the next read, right? So my assumption is that an open fd can get content updates, and that open fds do not prevent files from being updated on the client machine that has P1 running
> - same question for locks. I take advisory locks on the small file for updating (see the sketch after this list). This shouldn't conflict with a content update, as I assume that glusterfs does not lock the ranges it updates
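>
> To make the locking assumption concrete, here is a minimal sketch of the advisory-lock-then-read pattern with flock(1), using a placeholder path:
>
>     flock /mnt/volume/settings -c 'cat /mnt/volume/settings'   # hold an advisory lock while reading
>
> Since the lock is advisory, it only affects cooperating processes that also take it; it should not stop the client from seeing content updates.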
>
> If my assumptions are valid, I would suspect a cache on the client could be the culprit. I read that there is a default cache of 32MB for small files. But then I thought this would be invalidated by an upcall, as cache-invalidation is on.
>
> Is there a command to flush the client cache? Unmounting and mounting again isn't nice but should work.
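>
> Lacking a dedicated flush command, the remount route would look like this (the mount point is a placeholder, and I am assuming the brick host also serves the volfile):
>
>     umount /mnt/apptivegrid
>     mount -t glusterfs 10.1.2.1:/apptivegrid /mnt/apptivegrid
>
> As far as I understand, echo 3 > /proc/sys/vm/drop_caches only drops the kernel page and dentry caches; the translator caches live inside the client process itself, which is why a remount is the sure way to clear them.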
>
> I hope the information provided is sufficient.
>
> thanks in advance,
>
> Norbert
>
>