Artem Russakovskii
2020-May-15 21:51 UTC
[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages
Hi, I see the team met up recently and one of the discussed items was issues upgrading to v7. What were the results of this discussion? Is the team going to respond to this thread with their thoughts and analysis? Thanks. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | @ArtemR <http://twitter.com/ArtemR> On Mon, May 4, 2020 at 10:23 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:> On May 4, 2020 4:26:32 PM GMT+03:00, Amar Tumballi <amar at kadalu.io> wrote: > >On Sat, May 2, 2020 at 10:49 PM Artem Russakovskii > ><archon810 at gmail.com> > >wrote: > > > >> I don't have geo replication. > >> > >> Still waiting for someone from the gluster team to chime in. They > >used to > >> be a lot more responsive here. Do you know if there is a holiday > >perhaps, > >> or have the working hours been cut due to Coronavirus currently? > >> > >> > >It was Holiday on May 1st, and 2nd and 3rd were Weekend days! And also > >I > >guess many of Developers from Red Hat were attending Virtual Summit! > > > > > > > >> I'm not inclined to try a v6 upgrade without their word first. > >> > > > >Fair bet! I will bring this topic in one of the community meetings, and > >ask > >developers if they have some feedback! I personally have not seen these > >errors, and don't have a hunch on which patch would have caused an > >increase > >in logs! > > > >-Amar > > > > > >> > >> On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at yahoo.com> > >> wrote: > >> > >>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii < > >>> archon810 at gmail.com> wrote: > >>> >The good news is the downgrade seems to have worked and was > >painless. > >>> > > >>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and > >almost > >>> >immediately there are no heal pending entries anymore. > >>> > > >>> >The only things still showing up in the logs, besides some healing > >is > >>> >0-glusterfs-fuse: > >>> >writing to fuse device failed: No such file or directory: > >>> >==> mnt-androidpolice_data3.log <=> >>> >[2020-05-01 16:54:21.085643] E > >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] > >>> >(--> > >>> > >>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] > >>> >(--> > >>> > >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] > >>> >(--> > >>> > >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] > >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> > >>> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) > >0-glusterfs-fuse: > >>> >writing to fuse device failed: No such file or directory > >>> >==> mnt-apkmirror_data1.log <=> >>> >[2020-05-01 16:54:21.268842] E > >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] > >>> >(--> > >>> > >>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d] > >>> >(--> > >>> > >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a] > >>> >(--> > >>> > >>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb] > >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (--> > >>> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] ))))) > >0-glusterfs-fuse: > >>> >writing to fuse device failed: No such file or directory > >>> > > >>> >It'd be very helpful if it had more info about what failed to write > >and > >>> >why. 
> >>> > > >>> >I'd still really love to see the analysis of this failed upgrade > >from > >>> >core > >>> >gluster maintainers to see what needs fixing and how we can upgrade > >in > >>> >the > >>> >future. > >>> > > >>> >Thanks. > >>> > > >>> >Sincerely, > >>> >Artem > >>> > > >>> >-- > >>> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror > >>> ><http://www.apkmirror.com/>, Illogical Robot LLC > >>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR> > >>> > > >>> > > >>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii > ><archon810 at gmail.com> > >>> >wrote: > >>> > > >>> >> I do not have snapshots, no. I have a general file based backup, > >but > >>> >also > >>> >> the other 3 nodes are up. > >>> >> > >>> >> OpenSUSE 15.1. > >>> >> > >>> >> If I try to downgrade and it doesn't work, what's the brick > >>> >replacement > >>> >> scenario - is this still accurate? > >>> >> > >>> > > >>> > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick > >>> >> > >>> >> Any feedback about the issues themselves yet please? > >Specifically, is > >>> >> there a chance this is happening because of the mismatched > >gluster > >>> >> versions? Though, what's the solution then? > >>> >> > >>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov > ><hunter86_bg at yahoo.com> > >>> >> wrote: > >>> >> > >>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii < > >>> >>> archon810 at gmail.com> wrote: > >>> >>> >If more time is needed to analyze this, is this an option? Shut > >>> >down > >>> >>> >7.5, > >>> >>> >downgrade it back to 5.13 and restart, or would this screw > >>> >something up > >>> >>> >badly? I didn't up the op-version yet. > >>> >>> > > >>> >>> >Thanks. > >>> >>> > > >>> >>> >Sincerely, > >>> >>> >Artem > >>> >>> > > >>> >>> >-- > >>> >>> >Founder, Android Police <http://www.androidpolice.com>, APK > >Mirror > >>> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC > >>> >>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR> > >>> >>> > > >>> >>> > > >>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii > >>> >>> ><archon810 at gmail.com> > >>> >>> >wrote: > >>> >>> > > >>> >>> >> The number of heal pending on citadel, the one that was > >upgraded > >>> >to > >>> >>> >7.5, > >>> >>> >> has now gone to 10s of thousands and continues to go up. > >>> >>> >> > >>> >>> >> Sincerely, > >>> >>> >> Artem > >>> >>> >> > >>> >>> >> -- > >>> >>> >> Founder, Android Police <http://www.androidpolice.com>, APK > >>> >Mirror > >>> >>> >> <http://www.apkmirror.com/>, Illogical Robot LLC > >>> >>> >> beerpla.net | @ArtemR <http://twitter.com/ArtemR> > >>> >>> >> > >>> >>> >> > >>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem Russakovskii > >>> >>> ><archon810 at gmail.com> > >>> >>> >> wrote: > >>> >>> >> > >>> >>> >>> Hi all, > >>> >>> >>> > >>> >>> >>> Today, I decided to upgrade one of the four servers > >(citadel) we > >>> >>> >have to > >>> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4 replicate, and fuse > >>> >mounts > >>> >>> >(I sent > >>> >>> >>> the full details earlier in another message). If everything > >>> >looked > >>> >>> >OK, I > >>> >>> >>> would have proceeded the rolling upgrade for all of them, > >>> >following > >>> >>> >the > >>> >>> >>> full heal. 
> >>> >>> >>> > >>> >>> >>> However, as soon as I upgraded and restarted, the logs > >filled > >>> >with > >>> >>> >>> messages like these: > >>> >>> >>> > >>> >>> >>> [2020-04-30 21:39:21.316149] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> [2020-04-30 21:39:21.382891] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> [2020-04-30 21:39:21.442440] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> [2020-04-30 21:39:21.445587] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> [2020-04-30 21:39:21.571398] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> [2020-04-30 21:39:21.668192] E > >>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc > >actor > >>> >>> >>> (1298437:400:17) failed to complete successfully > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> The message "I [MSGID: 108031] > >>> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk] > >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local > >read_child > >>> >>> >>> androidpolice_data3-client-3" repeated 10 times between > >>> >[2020-04-30 > >>> >>> >>> 21:46:41.854675] and [2020-04-30 21:48:20.206323] > >>> >>> >>> The message "W [MSGID: 114031] > >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] > >>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed > >>> >[Transport > >>> >>> >endpoint > >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 > >>> >>> >21:46:32.129567] > >>> >>> >>> and [2020-04-30 21:48:29.905008] > >>> >>> >>> The message "W [MSGID: 114031] > >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] > >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed > >>> >[Transport > >>> >>> >endpoint > >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 > >>> >>> >21:46:32.129602] > >>> >>> >>> and [2020-04-30 21:48:29.905040] > >>> >>> >>> The message "W [MSGID: 114031] > >>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk] > >>> >>> >>> 0-androidpolice_data3-client-2: remote operation failed > >>> >[Transport > >>> >>> >endpoint > >>> >>> >>> is not connected]" repeated 264 times between [2020-04-30 > >>> >>> >21:46:32.129512] > >>> >>> >>> and [2020-04-30 21:48:29.905047] > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> Once in a while, I'm seeing this: > >>> >>> >>> ==> bricks/mnt-hive_block4-androidpolice_data3.log <=> >>> >>> >>> [2020-04-30 21:45:54.251637] I [MSGID: 115072] > >>> >>> >>> [server-rpc-fops_v2.c:1681:server4_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-server: 5725811: SETATTR / > >>> >>> >>> > >>> >>> > > >>> >>> > >>> > > >>> > > > androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png > >>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557), client: > >>> >>> >>> > >>> >>> > >>> >>> > >>> > >>> > > >>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1, > >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation > >not > >>> >>> 
>permitted] > >>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID: 115072] > >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR / > >>> >>> >>> androidpolice.com/public/wp-content/uploads > >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: > >>> >>> >>> > >>> >>> > >>> >>> > >>> > >>> > > >>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, > >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation > >not > >>> >>> >permitted] > >>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID: 115072] > >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR / > >>> >>> >>> androidpolice.com/public/wp-content/uploads > >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: > >>> >>> >>> > >>> >>> > >>> >>> > >>> > >>> > > >>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, > >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation > >not > >>> >>> >permitted] > >>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID: 115072] > >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR / > >>> >>> >>> androidpolice.com/public/wp-content/uploads > >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client: > >>> >>> >>> > >>> >>> > >>> >>> > >>> > >>> > > >>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2, > >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation > >not > >>> >>> >permitted] > >>> >>> >>> > >>> >>> >>> There's also lots of self-healing happening that I didn't > >expect > >>> >at > >>> >>> >all, > >>> >>> >>> since the upgrade only took ~10-15s. > >>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal > >on > >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461 > >>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] > >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal > >on > >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461. sources=[3] sinks=0 1 > >2 > >>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal > >on > >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296 > >>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] > >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal > >on > >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296. sources=[3] sinks=0 1 > >2 > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> I'm also seeing "remote operation failed" and "writing to > >fuse > >>> >>> >device > >>> >>> >>> failed: No such file or directory" messages > >>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] > >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata > >selfheal > >>> >on > >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. 
sources=0 1 [2] > >sinks=3 > >>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID: 114031] > >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed > >>> >[Operation > >>> >>> >not > >>> >>> >>> permitted] > >>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID: 114031] > >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk] > >>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed > >>> >[Operation > >>> >>> >not > >>> >>> >>> permitted] > >>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID: 108031] > >>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk] > >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local > >read_child > >>> >>> >>> androidpolice_data3-client-2 > >>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > >>> >>> >>> 0-androidpolice_data3-replicate-0: performing metadata > >selfheal > >>> >on > >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591 > >>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID: 108026] > >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal] > >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata > >selfheal > >>> >on > >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2] > >sinks=3 > >>> >>> >>> [2020-04-30 21:46:37.245599] E > >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] > >>> >>> >>> (--> > >>> >>> > >>> > >>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] > >>> >>> >>> (--> > >>> >>> >>> > >>> >>> > >>> > >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] > >>> >>> >>> (--> > >>> >>> >>> > >>> >>> > >>> > >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] > >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> > >>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) > >>> >0-glusterfs-fuse: > >>> >>> >>> writing to fuse device failed: No such file or directory > >>> >>> >>> [2020-04-30 21:46:50.864797] E > >>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W] > >>> >>> >>> (--> > >>> >>> > >>> > >>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d] > >>> >>> >>> (--> > >>> >>> >>> > >>> >>> > >>> > >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a] > >>> >>> >>> (--> > >>> >>> >>> > >>> >>> > >>> > >>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb] > >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--> > >>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) > >>> >0-glusterfs-fuse: > >>> >>> >>> writing to fuse device failed: No such file or directory > >>> >>> >>> > >>> >>> >>> The number of items being healed is going up and down > >wildly, > >>> >from 0 > >>> >>> >to > >>> >>> >>> 8000+ and sometimes taking a really long time to return a > >value. > >>> >I'm > >>> >>> >really > >>> >>> >>> worried as this is a production system, and I didn't observe > >>> >this in > >>> >>> >our > >>> >>> >>> test system. 
> >>> >>> >>> > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> gluster v heal apkmirror_data1 info summary > >>> >>> >>> Brick nexus2:/mnt/nexus2_block1/apkmirror_data1 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 27 > >>> >>> >>> Number of entries in heal pending: 27 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 27 > >>> >>> >>> Number of entries in heal pending: 27 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 27 > >>> >>> >>> Number of entries in heal pending: 27 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick citadel:/mnt/citadel_block1/apkmirror_data1 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 8540 > >>> >>> >>> Number of entries in heal pending: 8540 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> gluster v heal androidpolice_data3 info summary > >>> >>> >>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 1 > >>> >>> >>> Number of entries in heal pending: 1 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick forge:/mnt/forge_block4/androidpolice_data3 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 1 > >>> >>> >>> Number of entries in heal pending: 1 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick hive:/mnt/hive_block4/androidpolice_data3 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 1 > >>> >>> >>> Number of entries in heal pending: 1 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> Brick citadel:/mnt/citadel_block4/androidpolice_data3 > >>> >>> >>> Status: Connected > >>> >>> >>> Total Number of entries: 1149 > >>> >>> >>> Number of entries in heal pending: 1149 > >>> >>> >>> Number of entries in split-brain: 0 > >>> >>> >>> Number of entries possibly healing: 0 > >>> >>> >>> > >>> >>> >>> > >>> >>> >>> What should I do at this point? The files I tested seem to > >be > >>> >>> >replicating > >>> >>> >>> correctly, but I don't know if it's the case for all of > >them, > >>> >and > >>> >>> >the heals > >>> >>> >>> going up and down, and all these log messages are making me > >very > >>> >>> >nervous. > >>> >>> >>> > >>> >>> >>> Thank you. > >>> >>> >>> > >>> >>> >>> Sincerely, > >>> >>> >>> Artem > >>> >>> >>> > >>> >>> >>> -- > >>> >>> >>> Founder, Android Police <http://www.androidpolice.com>, APK > >>> >Mirror > >>> >>> >>> <http://www.apkmirror.com/>, Illogical Robot LLC > >>> >>> >>> beerpla.net | @ArtemR <http://twitter.com/ArtemR> > >>> >>> >>> > >>> >>> >> > >>> >>> > >>> >>> I's not supported , but usually it works. 
> >>> >>> > >>> >>> In worst case scenario, you can remove the node, wipe gluster > >on > >>> >the > >>> >>> node, reinstall the packages and add it - it will require full > >heal > >>> >of the > >>> >>> brick and as you have previously reported could lead to > >performance > >>> >>> degradation. > >>> >>> > >>> >>> I think you are on SLES, but I could be wrong . Do you have > >btrfs or > >>> >LVM > >>> >>> snapshots to revert from ? > >>> >>> > >>> >>> Best Regards, > >>> >>> Strahil Nikolov > >>> >>> > >>> >> > >>> > >>> Hi Artem, > >>> > >>> You can increase the brick log level following > >>> > > > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level > >>> but keep in mind that logs grow quite fast - so don't keep them > >above the > >>> current level for too much time. > >>> > >>> > >>> Do you have a geo replication running ? > >>> > >>> About the migration issue - I have no clue why this happened. Last > >time I > >>> skipped a major release(3.12 to 5.5) I got a huge trouble (all > >files > >>> ownership was switched to root) and I have the feeling that it > >won't > >>> happen again if you go through v6. > >>> > >>> Best Regards, > >>> Strahil Nikolov > >>> > >> ________ > >> > >> > >> > >> Community Meeting Calendar: > >> > >> Schedule - > >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >> Bridge: https://bluejeans.com/441850968 > >> > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > Hey Artem, > > I just checked if the 'replica 4' is causing the issue , but that's not > true (tested with 1 node down, but it's the same situation). > > I created 4 VMs on CentOS 7 & Gluster v7.5 (brick has only noatime mount > option) and created a 'replica 4' volume. > Then I created a dir and placed 50000 very small files there via: > for i in {1..50000}; do echo $RANDOM > $i ; done > > The find command 'finds' them in 4s and after some tuning I have managed > to lower it to 2.5s. > > What has caused some improvement was: > A) Activated the rhgs-random-io tuned profile which you can take from > ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm > B) using noatime for the mount option and if you use SELINUX you could > use the 'context=system_u:object_r:glusterd_brick_t:s0' mount option to > prevent selinux context lookups > C) Activation of the gluster group of settings 'metadata-cache' or > 'nl-cache' brought 'find' to the same results - lowered from 3.5s to 2.5s > after an initial run. > > I know that I'm not comparing apples to apples , but still it might help. > > I would like to learn what actually gluster does when a 'find' or 'ls' is > invoked, as I doubt it just executes it on the bricks. > > Best Regards, > Strahil Nikolov >
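Turning the suggestions quoted above into commands, a minimal sketch could look like the following. It assumes a volume named androidpolice_data3 (taken from this thread); the device, filesystem type and fstab layout are placeholders, and none of this is a confirmed fix for the 5.13-to-7.5 problems discussed here.

# Item A: activate the random-io tuned profile (requires the redhat-storage-server tuned profiles to be installed).
tuned-adm profile rhgs-random-io

# Item B: example brick mount with noatime and a fixed SELinux context
# (device, filesystem and mount point are illustrative):
# /dev/vgdata/brick1  /mnt/hive_block4  xfs  noatime,context="system_u:object_r:glusterd_brick_t:s0"  0 0

# Item C: apply the predefined gluster option groups.
gluster volume set androidpolice_data3 group metadata-cache
gluster volume set androidpolice_data3 group nl-cache

# Brick log level for debugging, per the Red Hat guide linked above (default is INFO);
# raise it temporarily, then revert.
gluster volume set androidpolice_data3 diagnostics.brick-log-level DEBUG
gluster volume set androidpolice_data3 diagnostics.brick-log-level INFO

As noted above, a raised log level should be reverted promptly, since the logs grow quickly.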
Artem Russakovskii
2020-May-21 19:43 UTC
[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages
I've also moved this to github: https://github.com/gluster/glusterfs/issues/1257. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | @ArtemR <http://twitter.com/ArtemR>
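For the rolling-upgrade question that runs through this thread (and the GitHub issue above), a minimal per-node pre-flight check might look like the sketch below. The volume names come from the thread; the commands are standard gluster CLI, and the whole thing is an illustration rather than an official upgrade procedure.

for vol in apkmirror_data1 androidpolice_data3; do
    # Per-brick heal state; move on to the next node only when everything is at zero.
    gluster volume heal "$vol" info summary
    gluster volume heal "$vol" statistics heal-count
done

# All peers should show 'Peer in Cluster (Connected)'.
gluster peer status

# Current and maximum supported cluster op-version (left unchanged during the attempt described above).
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

The upgrade guides recommend bumping the op-version only after every node has been upgraded and the heal counts have settled, which matches the "I didn't up the op-version yet" note earlier in the thread.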