Hi Strahil, today we have the same number of clients on all nodes, but the problem persists. I have the impression that it gets more frequent as the server capacity fills up; now we are having at least one incident per day.
Regards,
Martin

On Mon, Oct 26, 2020 at 8:09 AM Martín Lorenzo <mlorenzo at gmail.com> wrote:
> Hi Strahil, thanks for your reply,
> I had one node with 13 clients, the rest with 14. I've just restarted the
> services on that node, now I have 14, let's see what happens.
> Regarding the samba repos, I wasn't aware of that, I was using the CentOS
> main repo. I'll check them out.
> Best Regards,
> Martin
>
> On Tue, Oct 20, 2020 at 3:19 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>> Do you have the same amount of clients connected to each brick?
>>
>> I guess something like this can show it:
>>
>> gluster volume status VOL clients
>> gluster volume status VOL client-list
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Tuesday, 20 October 2020 at 15:41:45 GMT+3, Martín Lorenzo <mlorenzo at gmail.com> wrote:
>>
>> Hi, I have the following problem: I have a distributed replicated cluster
>> set up with samba and CTDB, over FUSE mount points.
>> I am having inconsistencies across the FUSE mounts; users report that
>> files are disappearing after being copied/moved. When I take a look at the
>> mount points on each node, they don't display the same data.
>>
>> #### faulty mount point ####
>> [root at gluster6 ARRIBA GENTE martes 20 de octubre]# ll
>> ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file or directory
>> ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file or directory
>> total 633723
>> drwxr-xr-x. 5 arribagente PN        4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
>> -rw-r--r--. 1 arribagente PN   648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
>> -?????????? ? ?           ?            ?            ? PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
>> -?????????? ? ?           ?            ?            ? PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>>
>> ### healthy mount point ###
>> [root at gluster7 ARRIBA GENTE martes 20 de octubre]# ll
>> total 3435596
>> drwxr-xr-x. 5 arribagente PN         4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
>> -rw-r--r--. 1 arribagente PN    648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
>> -rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
>> -rw-r--r--. 1 arribagente PN    784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>>
>> - So far the only way to solve this is to create a directory in the
>> healthy mount point, on the same path:
>> [root at gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola
>>
>> - When you refresh the other mount point, the issue is resolved:
>> [root at gluster6 ARRIBA GENTE martes 20 de octubre]# ll
>> total 3435600
>> drwxr-xr-x. 5 arribagente PN         4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
>> drwxr-xr-x. 2 root        root       4096 Oct 20 08:45 hola
>> -rw-r--r--. 1 arribagente PN    648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
>> -rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
>> -rw-r--r--. 1 arribagente PN    784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>>
>> Interestingly, the error occurs on the mount point where the files were
>> copied. They don't show up as pending heal entries. I have around 15 people
>> using them over samba; I think I'm having this issue reported every two days.
>>
>> I have an older cluster with similar issues, a different gluster version,
>> but a very similar topology (4 bricks, initially two bricks then expanded).
>> Please note, the bricks aren't the same size (but their replicas are),
>> so my other suspicion is that rebalancing has something to do with it.
>>
>> I'm trying to reproduce it over a small virtualized cluster; so far no
>> results.
>>
>> Here are the cluster details:
>> four nodes, replica 2, plus one arbiter hosting 2 bricks.
>> I have 2 bricks with ~20 TB capacity and the other pair is ~48 TB.
>>
>> Volume Name: tapeless
>> Type: Distributed-Replicate
>> Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 2 x (2 + 1) = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
>> Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
>> Brick3: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick (arbiter)
>> Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
>> Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
>> Brick6: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick (arbiter)
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> performance.client-io-threads: on
>> nfs.disable: on
>> transport.address-family: inet
>> features.quota: on
>> features.inode-quota: on
>> features.cache-invalidation: on
>> features.cache-invalidation-timeout: 600
>> performance.cache-samba-metadata: on
>> performance.stat-prefetch: on
>> performance.cache-invalidation: on
>> performance.md-cache-timeout: 600
>> network.inode-lru-limit: 200000
>> performance.nl-cache: on
>> performance.nl-cache-timeout: 600
>> performance.readdir-ahead: on
>> performance.parallel-readdir: on
>> performance.cache-size: 1GB
>> client.event-threads: 4
>> server.event-threads: 4
>> performance.normal-prio-threads: 16
>> performance.io-thread-count: 32
>> performance.write-behind-window-size: 8MB
>> storage.batch-fsync-delay-usec: 0
>> cluster.data-self-heal: on
>> cluster.metadata-self-heal: on
>> cluster.entry-self-heal: on
>> cluster.self-heal-daemon: on
>> performance.write-behind: on
>> performance.open-behind: on
>>
>> Log section from the faulty mount point. I think the [File exists] entries
>> are from people trying to copy the missing files over and over:
>>
>> [2020-10-20 11:31:03.034220] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:32:06.684329] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:33:02.191863] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:34:05.841608] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:35:20.736633] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644
>> [2020-10-20 11:35:20.741213] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644. sources=[0] 1  sinks=2
>> [2020-10-20 11:35:04.278043] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644" repeated 3 times between [2020-10-20 11:35:20.736633] and [2020-10-20 11:35:26.733298]
>> The message "I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644. sources=[0] 1  sinks=2" repeated 3 times between [2020-10-20 11:35:20.741213] and [2020-10-20 11:35:26.737629]
>> [2020-10-20 11:36:02.548350] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:36:57.365537] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7
>> [2020-10-20 11:36:57.370824] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7. sources=[0] 1  sinks=2
>> [2020-10-20 11:37:01.363925] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7
>> [2020-10-20 11:37:01.368069] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7. sources=[0] 1  sinks=2
>> The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 3 times between [2020-10-20 11:36:02.548350] and [2020-10-20 11:37:36.389208]
>> [2020-10-20 11:38:07.367113] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:39:01.595981] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:40:04.184899] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:41:07.833470] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:42:01.871621] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:43:04.399194] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:44:04.558647] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:44:15.953600] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
>> [2020-10-20 11:44:15.953819] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
>> [2020-10-20 11:44:15.954072] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
>> [2020-10-20 11:44:15.954680] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043294: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> [2020-10-20 11:44:15.963175] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043306: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> [2020-10-20 11:44:15.971839] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043318: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> [2020-10-20 11:44:16.010242] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043403: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> [2020-10-20 11:44:16.020291] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043415: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> [2020-10-20 11:44:16.028857] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043427: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.953600] and [2020-10-20 11:44:16.027785]
>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.953819] and [2020-10-20 11:44:16.028331]
>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.954072] and [2020-10-20 11:44:16.028355]
>> [2020-10-20 11:45:03.572106] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:45:40.080010] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 2 times between [2020-10-20 11:45:40.080010] and [2020-10-20 11:47:10.871801]
>> [2020-10-20 11:48:03.913129] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:49:05.082165] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:50:06.725722] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:51:04.254685] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:52:07.903617] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:53:01.420513] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-0: performing metadata selfheal on 3c316533-5f47-4267-ac19-58b3be305b94
>> [2020-10-20 11:53:01.428657] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-0: Completed metadata selfheal on 3c316533-5f47-4267-ac19-58b3be305b94. sources=[0]  sinks=1 2
>> The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 3 times between [2020-10-20 11:52:07.903617] and [2020-10-20 11:53:12.037835]
>> [2020-10-20 11:54:02.208354] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:55:04.360284] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:56:09.508092] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:57:02.580970] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>> [2020-10-20 11:58:06.230698] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
>>
>> Let me know if you need something else. Thank you for your support!
>> Best Regards,
>> Martin Lorenzo
>>
>> ________
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
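The symptom described above is the same directory listing differently on two FUSE mounts, so it can be caught mechanically instead of waiting for user reports. Below is a minimal sketch of such a check; the function name and the example mount paths are illustrative, not from the thread:

```shell
#!/usr/bin/env bash
# compare_listings: print "OK" if two directories expose identical entry
# names, otherwise print a MISMATCH line followed by the differing names.
# Intended to be run against the same path under two FUSE mounts.
compare_listings() {
    local a="$1" b="$2"
    # Sort both listings so ordering differences don't cause false alarms.
    if diff <(ls -1A -- "$a" | sort) <(ls -1A -- "$b" | sort) >/dev/null; then
        echo "OK: listings match"
    else
        echo "MISMATCH between $a and $b"
        diff <(ls -1A -- "$a" | sort) <(ls -1A -- "$b" | sort) || true
    fi
}

# Hypothetical usage, e.g. from a cron job on one node with both volumes
# reachable (paths are placeholders):
#   compare_listings /mnt/tapeless_gluster6/SOMEDIR /mnt/tapeless_gluster7/SOMEDIR
```

Requires bash (process substitution). A cron wrapper could mail the MISMATCH output, which would also give a timestamped record of when the divergence appears relative to copies and rebalances.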
Have you tried to reduce the cache timeouts? I can't find your gluster version in the thread - can you share again OS + gluster version?

Best Regards,
Strahil Nikolov
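For reference, the timeouts Strahil is asking about correspond to the three 600-second values in the volume options posted earlier (performance.md-cache-timeout, performance.nl-cache-timeout, features.cache-invalidation-timeout). A sketch of dialing them down with `gluster volume set`; the option names come from the posted volume info, but the values here are illustrative, not a recommendation from the thread:

```shell
# Lower the 600 s cache windows shown in the volume info. A shorter window
# narrows the time during which a stale cached (negative) lookup can hide
# a freshly copied file on another mount. Illustrative values only.
gluster volume set tapeless performance.md-cache-timeout 60
gluster volume set tapeless performance.nl-cache-timeout 60
gluster volume set tapeless features.cache-invalidation-timeout 60

# If stale negative lookups are suspected specifically, disabling the
# negative-lookup cache entirely is a common isolation test:
# gluster volume set tapeless performance.nl-cache off
```

These are cluster-wide settings with a performance cost for Samba workloads, so they are best treated as a diagnostic step rather than a permanent fix.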