Artem Russakovskii
2020-Jun-21 18:11 UTC
[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages
There's been 0 progress or attention to this issue in a month on github or otherwise. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | @ArtemR <http://twitter.com/ArtemR> On Thu, May 21, 2020 at 12:43 PM Artem Russakovskii <archon810 at gmail.com> wrote:> I've also moved this to github: > https://github.com/gluster/glusterfs/issues/1257. > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | @ArtemR <http://twitter.com/ArtemR> > > > On Fri, May 15, 2020 at 2:51 PM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Hi, >> >> I see the team met up recently and one of the discussed items was issues >> upgrading to v7. What were the results of this discussion? >> >> Is the team going to respond to this thread with their thoughts and >> analysis? >> >> Thanks. >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | @ArtemR <http://twitter.com/ArtemR> >> >> >> On Mon, May 4, 2020 at 10:23 PM Strahil Nikolov <hunter86_bg at yahoo.com> >> wrote: >> >>> On May 4, 2020 4:26:32 PM GMT+03:00, Amar Tumballi <amar at kadalu.io> >>> wrote: >>> >On Sat, May 2, 2020 at 10:49 PM Artem Russakovskii >>> ><archon810 at gmail.com> >>> >wrote: >>> > >>> >> I don't have geo replication. >>> >> >>> >> Still waiting for someone from the gluster team to chime in. They >>> >used to >>> >> be a lot more responsive here. Do you know if there is a holiday >>> >perhaps, >>> >> or have the working hours been cut due to Coronavirus currently? >>> >> >>> >> >>> >It was Holiday on May 1st, and 2nd and 3rd were Weekend days! And also >>> >I >>> >guess many of Developers from Red Hat were attending Virtual Summit! >>> > >>> > >>> > >>> >> I'm not inclined to try a v6 upgrade without their word first. >>> >> >>> > >>> >Fair bet! I will bring this topic in one of the community meetings, and >>> >ask >>> >developers if they have some feedback! I personally have not seen these >>> >errors, and don't have a hunch on which patch would have caused an >>> >increase >>> >in logs! >>> > >>> >-Amar >>> > >>> > >>> >> >>> >> On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at yahoo.com> >>> >> wrote: >>> >> >>> >>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii < >>> >>> archon810 at gmail.com> wrote: >>> >>> >The good news is the downgrade seems to have worked and was >>> >painless. >>> >>> > >>> >>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and >>> >almost >>> >>> >immediately there are no heal pending entries anymore. 
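For reference, the downgrade described above boils down to a couple of commands. A minimal sketch, assuming the 5.13 packages are still available in the configured zypper repositories and that glusterd is the systemd unit managing the node (the volume name is taken from later in this thread):

  zypper install --oldpackage glusterfs-5.13          # pin the node back to 5.13
  systemctl restart glusterd                          # restart the management daemon; brick and fuse processes may need a restart too
  gluster volume heal apkmirror_data1 info summary    # confirm the heal-pending counts drain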
>>> >>> > >>> >>> >The only things still showing up in the logs, besides some healing >>> >is >>> >>> >0-glusterfs-fuse: >>> >>> >writing to fuse device failed: No such file or directory: >>> >>> >==> mnt-androidpolice_data3.log <=>>> >>> >[2020-05-01 16:54:21.085643] E >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] >>> >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> >>> >>> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) >>> >0-glusterfs-fuse: >>> >>> >writing to fuse device failed: No such file or directory >>> >>> >==> mnt-apkmirror_data1.log <=>>> >>> >[2020-05-01 16:54:21.268842] E >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a] >>> >>> >(--> >>> >>> >>> >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb] >>> >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (--> >>> >>> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] ))))) >>> >0-glusterfs-fuse: >>> >>> >writing to fuse device failed: No such file or directory >>> >>> > >>> >>> >It'd be very helpful if it had more info about what failed to write >>> >and >>> >>> >why. >>> >>> > >>> >>> >I'd still really love to see the analysis of this failed upgrade >>> >from >>> >>> >core >>> >>> >gluster maintainers to see what needs fixing and how we can upgrade >>> >in >>> >>> >the >>> >>> >future. >>> >>> > >>> >>> >Thanks. >>> >>> > >>> >>> >Sincerely, >>> >>> >Artem >>> >>> > >>> >>> >-- >>> >>> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC >>> >>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR> >>> >>> > >>> >>> > >>> >>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii >>> ><archon810 at gmail.com> >>> >>> >wrote: >>> >>> > >>> >>> >> I do not have snapshots, no. I have a general file based backup, >>> >but >>> >>> >also >>> >>> >> the other 3 nodes are up. >>> >>> >> >>> >>> >> OpenSUSE 15.1. >>> >>> >> >>> >>> >> If I try to downgrade and it doesn't work, what's the brick >>> >>> >replacement >>> >>> >> scenario - is this still accurate? >>> >>> >> >>> >>> > >>> >>> >>> > >>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick >>> >>> >> >>> >>> >> Any feedback about the issues themselves yet please? >>> >Specifically, is >>> >>> >> there a chance this is happening because of the mismatched >>> >gluster >>> >>> >> versions? Though, what's the solution then? >>> >>> >> >>> >>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov >>> ><hunter86_bg at yahoo.com> >>> >>> >> wrote: >>> >>> >> >>> >>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii < >>> >>> >>> archon810 at gmail.com> wrote: >>> >>> >>> >If more time is needed to analyze this, is this an option? Shut >>> >>> >down >>> >>> >>> >7.5, >>> >>> >>> >downgrade it back to 5.13 and restart, or would this screw >>> >>> >something up >>> >>> >>> >badly? I didn't up the op-version yet. >>> >>> >>> > >>> >>> >>> >Thanks. 
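The op-version detail above is what makes a rollback viable at all: the cluster op-version can be raised but not lowered, so leaving it at the 5.x value keeps downgraded binaries compatible. A quick way to inspect it with the standard CLI (a sketch, nothing cluster-specific):

  gluster volume get all cluster.op-version        # op-version the cluster currently runs at
  gluster volume get all cluster.max-op-version    # highest op-version the installed binaries support
  # gluster volume set all cluster.op-version <N>  # the one-way bump, only once every node is upgraded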
>>> >>> >>> > >>> >>> >>> >Sincerely, >>> >>> >>> >Artem >>> >>> >>> > >>> >>> >>> >-- >>> >>> >>> >Founder, Android Police <http://www.androidpolice.com>, APK >>> >Mirror >>> >>> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC >>> >>> >>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR> >>> >>> >>> > >>> >>> >>> > >>> >>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii >>> >>> >>> ><archon810 at gmail.com> >>> >>> >>> >wrote: >>> >>> >>> > >>> >>> >>> >> The number of heal pending on citadel, the one that was >>> >upgraded >>> >>> >to >>> >>> >>> >7.5, >>> >>> >>> >> has now gone to 10s of thousands and continues to go up. >>> >>> >>> >> >>> >>> >>> >> Sincerely, >>> >>> >>> >> Artem >>> >>> >>> >> >>> >>> >>> >> -- >>> >>> >>> >> Founder, Android Police <http://www.androidpolice.com>, APK >>> >>> >Mirror >>> >>> >>> >> <http://www.apkmirror.com/>, Illogical Robot LLC >>> >>> >>> >> beerpla.net | @ArtemR <http://twitter.com/ArtemR> >>> >>> >>> >> >>> >>> >>> >> >>> >>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem Russakovskii >>> >>> >>> ><archon810 at gmail.com> >>> >>> >>> >> wrote: >>> >>> >>> >> >>> >>> >>> >>> Hi all, >>> >>> >>> >>> >>> >>> >>> >>> Today, I decided to upgrade one of the four servers >>> >(citadel) we >>> >>> >>> >have to >>> >>> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4 replicate, and fuse >>> >>> >mounts >>> >>> >>> >(I sent >>> >>> >>> >>> the full details earlier in another message). If everything >>> >>> >looked >>> >>> >>> >OK, I >>> >>> >>> >>> would have proceeded the rolling upgrade for all of them, >>> >>> >following >>> >>> >>> >the >>> >>> >>> >>> full heal. >>> >>> >>> >>> >>> >>> >>> >>> However, as soon as I upgraded and restarted, the logs >>> >filled >>> >>> >with >>> >>> >>> >>> messages like these: >>> >>> >>> >>> >>> >>> >>> >>> [2020-04-30 21:39:21.316149] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> [2020-04-30 21:39:21.382891] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> [2020-04-30 21:39:21.442440] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> [2020-04-30 21:39:21.445587] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> [2020-04-30 21:39:21.571398] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> [2020-04-30 21:39:21.668192] E >>> >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc >>> >actor >>> >>> >>> >>> (1298437:400:17) failed to complete successfully >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> The message "I [MSGID: 108031] >>> >>> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk] >>> >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local >>> >read_child >>> >>> >>> >>> androidpolice_data3-client-3" repeated 10 times between >>> >>> >[2020-04-30 >>> >>> >>> >>> 21:46:41.854675] and [2020-04-30 21:48:20.206323] >>> >>> >>> >>> The message "W [MSGID: 114031] >>> >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-client-1: remote operation 
failed >>> >>> >[Transport >>> >>> >>> >endpoint >>> >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 >>> >>> >>> >21:46:32.129567] >>> >>> >>> >>> and [2020-04-30 21:48:29.905008] >>> >>> >>> >>> The message "W [MSGID: 114031] >>> >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed >>> >>> >[Transport >>> >>> >>> >endpoint >>> >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 >>> >>> >>> >21:46:32.129602] >>> >>> >>> >>> and [2020-04-30 21:48:29.905040] >>> >>> >>> >>> The message "W [MSGID: 114031] >>> >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-client-2: remote operation failed >>> >>> >[Transport >>> >>> >>> >endpoint >>> >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 >>> >>> >>> >21:46:32.129512] >>> >>> >>> >>> and [2020-04-30 21:48:29.905047] >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Once in a while, I'm seeing this: >>> >>> >>> >>> ==> bricks/mnt-hive_block4-androidpolice_data3.log <=>>> >>> >>> >>> [2020-04-30 21:45:54.251637] I [MSGID: 115072] >>> >>> >>> >>> [server-rpc-fops_v2.c:1681:server4_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-server: 5725811: SETATTR / >>> >>> >>> >>> >>> >>> >>> > >>> >>> >>> >>> >>> > >>> >>> >>> > >>> androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png >>> >>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557), client: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1, >>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation >>> >not >>> >>> >>> >permitted] >>> >>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID: 115072] >>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR / >>> >>> >>> >>> androidpolice.com/public/wp-content/uploads >>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, >>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation >>> >not >>> >>> >>> >permitted] >>> >>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID: 115072] >>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR / >>> >>> >>> >>> androidpolice.com/public/wp-content/uploads >>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, >>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation >>> >not >>> >>> >>> >permitted] >>> >>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID: 115072] >>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR / >>> >>> >>> >>> androidpolice.com/public/wp-content/uploads >>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> 
>>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, >>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation >>> >not >>> >>> >>> >permitted] >>> >>> >>> >>> >>> >>> >>> >>> There's also lots of self-healing happening that I didn't >>> >expect >>> >>> >at >>> >>> >>> >all, >>> >>> >>> >>> since the upgrade only took ~10-15s. >>> >>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] >>> >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal >>> >on >>> >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461 >>> >>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] >>> >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal >>> >on >>> >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461. sources=[3] sinks=0 1 >>> >2 >>> >>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] >>> >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal >>> >on >>> >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296 >>> >>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] >>> >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal >>> >on >>> >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296. sources=[3] sinks=0 1 >>> >2 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> I'm also seeing "remote operation failed" and "writing to >>> >fuse >>> >>> >>> >device >>> >>> >>> >>> failed: No such file or directory" messages >>> >>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] >>> >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata >>> >selfheal >>> >>> >on >>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2] >>> >sinks=3 >>> >>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID: 114031] >>> >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed >>> >>> >[Operation >>> >>> >>> >not >>> >>> >>> >>> permitted] >>> >>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID: 114031] >>> >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk] >>> >>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed >>> >>> >[Operation >>> >>> >>> >not >>> >>> >>> >>> permitted] >>> >>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID: 108031] >>> >>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk] >>> >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local >>> >read_child >>> >>> >>> >>> androidpolice_data3-client-2 >>> >>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] >>> >>> >>> >>> 0-androidpolice_data3-replicate-0: performing metadata >>> >selfheal >>> >>> >on >>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591 >>> >>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID: 108026] >>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] >>> >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata >>> >selfheal >>> >>> >on >>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. 
sources=0 1 [2] >>> >sinks=3 >>> >>> >>> >>> [2020-04-30 21:46:37.245599] E >>> >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] >>> >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> >>> >>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) >>> >>> >0-glusterfs-fuse: >>> >>> >>> >>> writing to fuse device failed: No such file or directory >>> >>> >>> >>> [2020-04-30 21:46:50.864797] E >>> >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] >>> >>> >>> >>> (--> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] >>> >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> >>> >>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) >>> >>> >0-glusterfs-fuse: >>> >>> >>> >>> writing to fuse device failed: No such file or directory >>> >>> >>> >>> >>> >>> >>> >>> The number of items being healed is going up and down >>> >wildly, >>> >>> >from 0 >>> >>> >>> >to >>> >>> >>> >>> 8000+ and sometimes taking a really long time to return a >>> >value. >>> >>> >I'm >>> >>> >>> >really >>> >>> >>> >>> worried as this is a production system, and I didn't observe >>> >>> >this in >>> >>> >>> >our >>> >>> >>> >>> test system. 
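When the pending counts swing around like this, a per-brick counter is often quicker to poll than the full summary. A sketch using the standard heal CLI (volume name from the thread, polling interval arbitrary):

  gluster volume heal apkmirror_data1 statistics heal-count               # pending entries per brick
  watch -n 30 'gluster volume heal apkmirror_data1 statistics heal-count'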
>>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> gluster v heal apkmirror_data1 info summary >>> >>> >>> >>> Brick nexus2:/mnt/nexus2_block1/apkmirror_data1 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 27 >>> >>> >>> >>> Number of entries in heal pending: 27 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 27 >>> >>> >>> >>> Number of entries in heal pending: 27 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 27 >>> >>> >>> >>> Number of entries in heal pending: 27 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick citadel:/mnt/citadel_block1/apkmirror_data1 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 8540 >>> >>> >>> >>> Number of entries in heal pending: 8540 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> gluster v heal androidpolice_data3 info summary >>> >>> >>> >>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 1 >>> >>> >>> >>> Number of entries in heal pending: 1 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick forge:/mnt/forge_block4/androidpolice_data3 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 1 >>> >>> >>> >>> Number of entries in heal pending: 1 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick hive:/mnt/hive_block4/androidpolice_data3 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 1 >>> >>> >>> >>> Number of entries in heal pending: 1 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> Brick citadel:/mnt/citadel_block4/androidpolice_data3 >>> >>> >>> >>> Status: Connected >>> >>> >>> >>> Total Number of entries: 1149 >>> >>> >>> >>> Number of entries in heal pending: 1149 >>> >>> >>> >>> Number of entries in split-brain: 0 >>> >>> >>> >>> Number of entries possibly healing: 0 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> What should I do at this point? The files I tested seem to >>> >be >>> >>> >>> >replicating >>> >>> >>> >>> correctly, but I don't know if it's the case for all of >>> >them, >>> >>> >and >>> >>> >>> >the heals >>> >>> >>> >>> going up and down, and all these log messages are making me >>> >very >>> >>> >>> >nervous. >>> >>> >>> >>> >>> >>> >>> >>> Thank you. 
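For anyone landing in the same state, the usual next checks are to list what is actually pending, confirm nothing is in split-brain, and let the self-heal daemon work through the index. A sketch with the standard CLI (volume name from the thread):

  gluster volume heal apkmirror_data1 info               # list the gfids/paths still pending
  gluster volume heal apkmirror_data1 info split-brain   # double-check the split-brain count really is 0
  gluster volume heal apkmirror_data1                    # trigger an index heal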
>>> >>> >>> >>> >>> >>> >>> >>> Sincerely, >>> >>> >>> >>> Artem >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> >>> >>> Founder, Android Police <http://www.androidpolice.com>, APK >>> >>> >Mirror >>> >>> >>> >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> >>> >>> >>> beerpla.net | @ArtemR <http://twitter.com/ArtemR> >>> >>> >>> >>> >>> >>> >>> >> >>> >>> >>> >>> >>> >>> I's not supported , but usually it works. >>> >>> >>> >>> >>> >>> In worst case scenario, you can remove the node, wipe gluster >>> >on >>> >>> >the >>> >>> >>> node, reinstall the packages and add it - it will require full >>> >heal >>> >>> >of the >>> >>> >>> brick and as you have previously reported could lead to >>> >performance >>> >>> >>> degradation. >>> >>> >>> >>> >>> >>> I think you are on SLES, but I could be wrong . Do you have >>> >btrfs or >>> >>> >LVM >>> >>> >>> snapshots to revert from ? >>> >>> >>> >>> >>> >>> Best Regards, >>> >>> >>> Strahil Nikolov >>> >>> >>> >>> >>> >> >>> >>> >>> >>> Hi Artem, >>> >>> >>> >>> You can increase the brick log level following >>> >>> >>> > >>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level >>> >>> but keep in mind that logs grow quite fast - so don't keep them >>> >above the >>> >>> current level for too much time. >>> >>> >>> >>> >>> >>> Do you have a geo replication running ? >>> >>> >>> >>> About the migration issue - I have no clue why this happened. Last >>> >time I >>> >>> skipped a major release(3.12 to 5.5) I got a huge trouble (all >>> >files >>> >>> ownership was switched to root) and I have the feeling that it >>> >won't >>> >>> happen again if you go through v6. >>> >>> >>> >>> Best Regards, >>> >>> Strahil Nikolov >>> >>> >>> >> ________ >>> >> >>> >> >>> >> >>> >> Community Meeting Calendar: >>> >> >>> >> Schedule - >>> >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> >> Bridge: https://bluejeans.com/441850968 >>> >> >>> >> Gluster-users mailing list >>> >> Gluster-users at gluster.org >>> >> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> >>> >>> Hey Artem, >>> >>> I just checked if the 'replica 4' is causing the issue , but that's not >>> true (tested with 1 node down, but it's the same situation). >>> >>> I created 4 VMs on CentOS 7 & Gluster v7.5 (brick has only noatime mount >>> option) and created a 'replica 4' volume. >>> Then I created a dir and placed 50000 very small files there via: >>> for i in {1..50000}; do echo $RANDOM > $i ; done >>> >>> The find command 'finds' them in 4s and after some tuning I have managed >>> to lower it to 2.5s. >>> >>> What has caused some improvement was: >>> A) Activated the rhgs-random-io tuned profile which you can take from >>> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm >>> B) using noatime for the mount option and if you use SELINUX you could >>> use the 'context=system_u:object_r:glusterd_brick_t:s0' mount option to >>> prevent selinux context lookups >>> C) Activation of the gluster group of settings 'metadata-cache' or >>> 'nl-cache' brought 'find' to the same results - lowered from 3.5s to 2.5s >>> after an initial run. >>> >>> I know that I'm not compairing apples to apples , but still it might >>> help. >>> >>> I would like to learn what actually gluster does when a 'find' or 'ls' >>> is invoked, as I doubt it just executes it on the bricks. 
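Strahil's suggestions in these replies map onto a handful of commands. This is only a sketch: the volume name is an example, the DEBUG level and tuned profile follow his description and the linked redhat-storage-server package, and the SELinux mount option applies only when SELinux is in use:

  gluster volume set apkmirror_data1 diagnostics.brick-log-level DEBUG   # temporarily raise brick logging; revert soon, logs grow fast
  tuned-adm profile rhgs-random-io                                       # profile shipped in the linked SRPM
  gluster volume set apkmirror_data1 group metadata-cache                # predefined caching option group
  gluster volume set apkmirror_data1 group nl-cache                      # the alternative group he compared
  # brick mount options: noatime, plus context=system_u:object_r:glusterd_brick_t:s0 to avoid SELinux context lookups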
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>
Mahdi Adnan
2020-Jun-21 19:02 UTC
[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages
I think if it's reproducible then someone can look into it. Can you list the
steps to reproduce it?

On Sun, Jun 21, 2020 at 9:12 PM Artem Russakovskii <archon810 at gmail.com>
wrote:

> There's been 0 progress or attention to this issue in a month on github or
> otherwise.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | @ArtemR <http://twitter.com/ArtemR>

--
Respectfully
Mahdi
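The reproduction implied by the thread, condensed as a sketch (the versions and volume layout come from Artem's report; host names, brick paths and the volume name are placeholders):

  # 1. four servers on glusterfs 5.13, one 'replica 4' volume, fuse-mounted clients
  gluster volume create testvol replica 4 n1:/bricks/b1 n2:/bricks/b1 n3:/bricks/b1 n4:/bricks/b1
  gluster volume start testvol
  # 2. upgrade exactly one server to 7.5 and restart its gluster processes, leaving cluster.op-version at the 5.x value
  # 3. on the upgraded node, watch the brick/fuse logs and the heal counts:
  gluster volume heal testvol info summary
  # per the report: rpcsvc "failed to complete successfully" errors, SETATTR [Operation not permitted] messages, and a climbing heal-pending count on the upgraded brick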